Papers by Keyword: Information Gain

Authors: Liu Liu, Bao Sheng Wang, Qiu Xi Zhong, Hai Liang Hu
Abstract: Rough set theory is a popular mathematical tool for handling vagueness and uncertainty, and it has been used to remove redundant attributes and data. Decision trees are widely used in data mining because they are efficient, fast, and easy to understand for data classification. Many approaches have been used to generate a decision tree. In this paper, a novel and effective algorithm for decision tree construction is introduced. The algorithm is based on the core of the discernibility matrix in rough set theory and on the degree of consistent dependence, and it improves node selection in the decision tree. Compared with ID3, this approach reduces the time and space complexity of decision tree generation. The article closes with an example that demonstrates the algorithm's advantages.
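The abstract above builds on the core of the discernibility matrix. As a minimal sketch of that one concept only (not the paper's algorithm, and without the degree of consistent dependence), the core can be read off as the condition attributes that are the sole difference between some pair of objects with different decisions. The function name `discernibility_core` and the toy weather table are hypothetical, chosen for illustration.

```python
from itertools import combinations


def discernibility_core(rows, condition_attrs, decision_attr):
    """Return the condition attributes that occur as singleton entries
    of the discernibility matrix (the rough-set core)."""
    core = set()
    for a, b in combinations(rows, 2):
        if a[decision_attr] == b[decision_attr]:
            continue  # only pairs with different decisions must be discerned
        differing = [attr for attr in condition_attrs if a[attr] != b[attr]]
        if len(differing) == 1:
            # This attribute is the only way to tell the pair apart,
            # so it cannot be dropped without losing consistency.
            core.add(differing[0])
    return core


# Hypothetical decision table, for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no", "humid": "high", "play": "no"},
    {"outlook": "sunny", "windy": "no", "humid": "normal", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "humid": "high", "play": "no"},
    {"outlook": "rain", "windy": "no", "humid": "normal", "play": "yes"},
]
core = discernibility_core(rows, ["outlook", "windy", "humid"], "play")
print(core)
```

In this toy table the first two rows differ only in `humid`, so `humid` belongs to the core; a tree builder in the spirit of the abstract could prefer such attributes when choosing nodes.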
Authors: Yu Ling Ma
Abstract: With the spread of information technology across society and the rapid replacement of large-capacity storage equipment, the amount of data in every field is growing exponentially. Reportedly, the amount of data accumulated by Shandong Hi-speed Group is very large. These data serve everyday needs such as query, retrieval, statistics, and reporting. More important, however, is how to discover useful information in this ocean of data, information that can be used in practice, for example to support decisions. This paper is written against that background. Data mining is a powerful tool for acquiring knowledge from massive data. Commonly used data mining methods include decision trees, support vector machines, Bayesian decision theory, artificial neural networks, k-nearest neighbors, and association rule mining. In this paper, we design a recommendation system for highway ETC cards based on decision tree theory. The system predicts whether a car owner is a potential ETC customer by analyzing vehicle information. Experiments show that the accuracy of the recommendation system exceeds 90%, so it can provide effective information for promoting China's ETC card.
Authors: Xiao Juan Chen, Zhi Gang Zhang, Yue Tong
Abstract: As the classical decision tree classification algorithm, ID3 is known for its high classification speed, strong learning ability, and easy construction. In classification, however, its tendency to choose attributes that have many values limits its practicality. This paper presents an improved algorithm based on expected information entropy and an Association Function in place of the traditional information gain. The improved algorithm modifies the expected information entropy using the improved Association Function and the number of attribute values. The experimental results show that the improved algorithm yields more reasonable and more effective rules.
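For context on the bias described above, here is a minimal sketch of the traditional information gain criterion that ID3 uses. It does not implement the paper's Association Function. The helper names and the four-row table are hypothetical; the point is only that an attribute with a unique value per row (`id`) receives the maximum possible gain even though it cannot generalize.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy of the target minus the weighted entropy after splitting on attr."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


# Hypothetical data: "id" is unique per row, "windy" is unrelated to the label.
rows = [
    {"id": 1, "windy": "yes", "play": "yes"},
    {"id": 2, "windy": "no", "play": "yes"},
    {"id": 3, "windy": "yes", "play": "no"},
    {"id": 4, "windy": "no", "play": "no"},
]
gain_id = information_gain(rows, "id", "play")
gain_windy = information_gain(rows, "windy", "play")
print(gain_id, gain_windy)
```

Splitting on `id` leaves every subset pure, so its gain equals the full entropy of the labels, while `windy` gains nothing. Corrections such as the one proposed in the abstract aim to penalize this kind of many-valued attribute.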
Authors: Ming Hua Jiang, Xiao Suo Luo
Abstract: In this paper, the ID3 decision tree algorithm is introduced and used to classify computer operation achievement, providing useful information to students and teachers.
Authors: Yan Qin Zhao, Zhen Chong Wang, Cheng Zhi Jiang, Wei Lai Hao, Guo Quan Wang
Abstract: ID3 is a classic classification method in data mining, but it has several limitations, the most notable being that it cannot deal with constant values. This paper introduces a new Improved_ID3 algorithm and applies it to the coal mine safety monitoring process. Production use has shown that Improved_ID3 not only handles constant attribute values but also creates rules more efficiently. Its application in coal mine safety monitoring is of great significance.
Authors: Ming Liu
Abstract: Excellent suppliers are the guarantee for the smooth operation of a green supply chain, and a scientific green supplier evaluation system is the foundation of supplier selection. The article analyzes the characteristics of the supplier selection criteria system in the green supply chain and lists the common evaluation criteria. To construct a more reasonable evaluation system, information gain analysis is introduced to select the criteria reasonably.
Authors: Zong Jie Wang, Yi Liu, Zhong Jian Wang
Abstract: Co-occurring words emphasize the internal relations between words, so using them can compensate for the shortcomings that stem from the assumption of the Bayesian algorithm. To build the Token Dictionary, the Information Gain algorithm is used to choose Tokens, and a Synonyms Dictionary is used to acquire more Tokens. Through extensive training, the matching scores of the Tokens are counted; the valuable Tokens are selected according to their matching rate, and the Token Dictionary is established. The proposed method is applied in an e-mail classification experiment, and the results show that the accuracy of the spam filter improves noticeably.
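For reference, one commonly used form of the information gain of a token $t$ in text classification is given below. It is offered as background on the token selection step, not as the paper's exact definition; here $c$ ranges over the classes (for example spam and non-spam) and $\bar{t}$ denotes the absence of the token.

```latex
IG(t) = -\sum_{c} P(c)\,\log P(c)
        + P(t)\sum_{c} P(c \mid t)\,\log P(c \mid t)
        + P(\bar{t})\sum_{c} P(c \mid \bar{t})\,\log P(c \mid \bar{t})
```

Tokens with a higher $IG(t)$ reduce more of the uncertainty about the class, so a dictionary builder would typically keep the top-ranked tokens.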
Authors: Yu Zhou, Guo Qi Wei, He Kun Guo
Abstract: Knowledge of the permeability distribution is critical to a successful reservoir model. Nuclear Magnetic Resonance (NMR) measurements can be used for permeability prediction because the T2 relaxation time is proportional to pore size. Because conventional estimators have difficulty modeling the complex relationship between permeability and NMR measurements, an intelligent technique that uses an artificial neural network and a genetic algorithm to estimate permeability from NMR measurements is developed. The neural network is used as a nonlinear regression method to develop a transformation between permeability and the NMR measurements. The genetic algorithm is used to select the best parameters and initial values for the neural network, which addresses two major problems of the network: local minima and experience-dependent parameter selection. The information gain principle is introduced to select the neural network's input parameters automatically from the data. The technique is demonstrated with an application to well data from Northeast China. The results show that the refined technique gives more accurate and reliable reservoir permeability estimates than conventional methods. This intelligent technique can be used as a powerful tool for estimating permeability from NMR logs in the oil and gas industry.
Authors: Nian Li, Li Yin, Qing Xi Peng
Abstract: The Internet has experienced profound changes. A large amount of user-generated content provides valuable information to the public. Customers usually express their opinions when shopping online. After they finish a review, they give an overall rating to the product or service. In this paper, we focus on the review rating prediction problem. Previous studies usually regard this problem as a regression problem. We take a new machine learning approach: a learning-to-rank method is used to tackle the prediction. After feature selection, a maximum entropy classifier is employed to solve the multi-class classification problem. A real-life dataset was crawled to verify the proposed method. Empirical studies demonstrate that the proposed method outperforms the baseline methods.
Authors: Jing Dong Wang
Abstract: Traditional decision trees are based on the information gain of the decision attribute, but sometimes the information gain changes dynamically with different values of the decision attribute. This paper proposes a decision forest algorithm based on feature counting, derives a method for calculating the information gain for dynamic values of the decision attribute, and establishes a decision forest model on specific data sets. The experiments indicate that the decision forest classification model based on feature counting has higher classification accuracy.