Papers by Keyword: Feature Selection

Paper Title / Page

Abstract: A single machine learning algorithm shows shortcomings when the data environment changes during application. This article proposes a heteromorphic ensemble learning model composed of Bayes, support vector machine (SVM), and decision tree classifiers, which classifies P2P traffic by the voting principle. The experiment shows that the model significantly improves classification accuracy and has good stability.
2693
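The voting principle mentioned in the abstract above can be illustrated with a minimal sketch. It assumes simple majority voting over three classifiers; the abstract does not say whether votes are weighted. The three rule functions and the flow field names (`mean_pkt_size`, `dst_port`, `n_peers`) are hypothetical stand-ins for trained Bayes, SVM, and decision tree models, not the authors' implementation.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by the most classifiers.

    Ties are broken in favor of the label that appears first in `predictions`.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def ensemble_classify(classifiers, sample):
    """Apply every classifier to the sample and combine the labels by voting."""
    return majority_vote([clf(sample) for clf in classifiers])


# Hypothetical stand-ins for the three trained base models.
def bayes_stub(flow):
    return "p2p" if flow["mean_pkt_size"] > 500 else "other"


def svm_stub(flow):
    return "p2p" if flow["dst_port"] > 1024 else "other"


def tree_stub(flow):
    return "p2p" if flow["n_peers"] > 10 else "other"


flow = {"mean_pkt_size": 800, "dst_port": 80, "n_peers": 25}
label = ensemble_classify([bayes_stub, svm_stub, tree_stub], flow)
print(label)
```

Here two of the three stand-in models vote for the same label, so the ensemble follows them even though the third disagrees.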
Abstract: The traditional random forest algorithm struggles to classify small-sample data sets well. During repeated random selection, few samples are drawn, so the resulting trees differ very little from one another; this drowns out correct decisions, increases the model's generalization error, and lowers the prediction rate. Given the sample size of the sepsis case data, this paper chooses the parameters used in random forest modeling by interval division: the feature interval is divided into a high-correlation interval and an uncertain-correlation interval, and data are selected from each of the two intervals for modeling. This ultimately reduces the model's generalization error and improves prediction accuracy.
1416
Abstract: Fault diagnosis is essential to the safe operation of hydraulic generator units (HGU). Because of the complexity of HGU, the vast amount of measured data, and the redundant information, the accuracy and timeliness of fault diagnosis are severely limited. Feature selection is currently an effective way to break through this bottleneck. Based on the specific characteristics of HGU faults, this paper proposes a hierarchical feature selection method based on classification tree (HFSMCT). HFSMCT selects the most effective feature for each branch node using filter evaluation criteria and a heuristic search strategy, and all the selected features together constitute the final feature set. Moreover, HFSMCT is easy to design and implement, and it performs well in both computational efficiency and accuracy. The simulation results also show that HFSMCT is well suited to HGU fault diagnosis.
398
Abstract: Feature selection is an effective pre-processing technique that facilitates text mining in high-dimensional feature spaces. In recent years, many effective redundant feature selection methods have been proposed from different motivations. However, no comparative experimental study of redundant feature selection methods in the field of text mining has been reported yet. To fill this gap, the paper presents an extensive empirical comparative study on the task of text classification. The experimental results indicate that 3-way Mutual Information represents redundancy much better than traditional 2-way Mutual Information, since 3-way Mutual Information takes the label information into account. As a result, redundant feature selection methods based on 3-way Mutual Information outperform the other methods.
1258
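The contrast between 2-way and 3-way Mutual Information in the abstract above can be made concrete with a small sketch. The abstract does not define the 3-way measure, so this assumes it is the interaction information I(X;Y;C) = I(X;Y) - I(X;Y|C), where C is the class label; sign conventions for this quantity vary in the literature. The toy columns below are invented for illustration only.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (bits) of one or more equal-length discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_info(x, y):
    """2-way mutual information I(X;Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_info(x, y, c):
    """I(X;Y|C): what X and Y still share once the label C is known."""
    return entropy(x, c) + entropy(y, c) - entropy(x, y, c) - entropy(c)


def three_way_mutual_info(x, y, c):
    """Assumed definition: interaction information I(X;Y) - I(X;Y|C)."""
    return mutual_info(x, y) - conditional_mutual_info(x, y, c)


label = [0, 0, 1, 1]
# Pair A: two identical features that both copy the label.
a1, a2 = [0, 0, 1, 1], [0, 0, 1, 1]
# Pair B: two identical features that are independent of the label.
b1, b2 = [0, 1, 0, 1], [0, 1, 0, 1]

print(mutual_info(a1, a2), three_way_mutual_info(a1, a2, label))
print(mutual_info(b1, b2), three_way_mutual_info(b1, b2, label))
```

Both pairs share one bit under the 2-way measure, so it cannot tell them apart. Under the assumed 3-way definition, only pair A scores above zero, because only its shared information concerns the label.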
Abstract: Building on research into flow identification algorithms based on statistical features, this paper proposes a statistical feature selection algorithm that reduces the number of features used in identification and increases the speed of flow identification. The experimental results show that the algorithm effectively reduces the number of features and improves identification efficiency.
1709
Abstract: With the rapid development of the Internet and the emergence of social media services, many users are becoming creators of social information. However, ordinary manual processing cannot cope with such a large number of subjective messages. As a new kind of social media service, the micro blog has been widely accepted and can be used for sentiment analysis. This paper compares the performance of three machine learning methods on sentiment analysis of Chinese micro blogs. We also propose an improved feature selection method that increases classification accuracy. Experimental results show that SVM performs close to Naïve Bayes and that both are better than logistic regression in most cases.
1219
Abstract: Synthetic aperture radar (SAR) is a type of microwave remote sensing imaging radar with many advantages. It also has shortcomings, however, such as speckle noise and directional sensitivity. Reducing their impact on SAR image processing and applications is an important task, especially when extracting features of ground objects. The Contourlet transform is a multi-scale, multi-direction transform that also provides a sparse representation. This paper studies Contourlet transform theory and its decomposition structure, and then uses the transform to extract SAR image features. Experimental results show that the Contourlet transform can effectively extract SAR image features.
431
Abstract: The assessment of risk of default on credit is important for financial institutions. Logistic regression and discriminant analysis are techniques traditionally used in credit rating for determining likelihood to default based on consumer application and credit reference agency data. We test support vector machines (SVM) against these traditional methods on a large credit card database. We find that they are competitive and can be used as the basis of a feature selection method to discover those features that are most significant in determining risk of default.
573
Abstract: To cope with the increasing size of the training corpus and to meet the requirements of incremental learning, this paper introduces a feature selection algorithm for the maximum entropy model into research on Chinese word segmentation, and designs and implements a Chinese word segmentation system based on incremental learning. The experimental results show that the system gradually improves segmentation accuracy during incremental learning without spending time on retraining.
3469
Abstract: Because network data are large in volume and complex in representation, traditional network security behavior recognition systems suffer from high redundancy and high dimensionality, which consume more resources and require more computation. To solve this problem, we perform feature selection. This article presents a consensus decision-making method that combines well-known existing feature selection algorithms to obtain a more reasonable result and ranks the features in order of importance, so that an appropriate set of features can be selected under different conditions. Tests with SVM (Support Vector Machine) as the classification algorithm show that the method effectively improves recognition accuracy with fewer features and gives more stable results.
2188
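Combining several feature selection algorithms into one importance ordering, as the abstract above describes, can be sketched with a simple rank aggregation. The abstract does not specify the consensus rule, so this sketch assumes a Borda count, which is a swapped-in technique rather than the authors' method. The feature names and the three input rankings are hypothetical.

```python
def borda_consensus(rankings):
    """Merge several feature rankings into one consensus ordering.

    Each ranking lists the same features, most important first. A feature
    earns (n - 1 - position) points per ranking; ties in the total score
    are broken alphabetically so the result is deterministic.
    """
    if not rankings:
        raise ValueError("at least one ranking is required")
    features = list(rankings[0])
    n = len(features)
    scores = {f: 0 for f in features}
    for ranking in rankings:
        if len(ranking) != n or set(ranking) != set(features):
            raise ValueError("all rankings must contain the same features")
        for position, feature in enumerate(ranking):
            scores[feature] += n - 1 - position
    return sorted(features, key=lambda f: (-scores[f], f))


# Hypothetical outputs of three different feature selection algorithms.
ranking_a = ["duration", "bytes", "pkts", "port"]
ranking_b = ["bytes", "duration", "port", "pkts"]
ranking_c = ["bytes", "pkts", "duration", "port"]

consensus = borda_consensus([ranking_a, ranking_b, ranking_c])
print(consensus)
top_two = consensus[:2]  # keep fewer features when resources are tight
```

Because the output is a full ordering, the number of retained features can be chosen afterwards to suit the available resources.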