Feature Selection | p. 2 | Scientific.Net

Classification of P2P Traffic Based on a Heteromorphic Ensemble Learning Model

Authors: Li Ding, Li Mao, Xiao Feng Wang

Abstract: A single machine learning algorithm can perform poorly when the data environment changes during application. This article proposes a heteromorphic ensemble learning model, composed of Bayes, support vector machine (SVM) and decision tree classifiers, that classifies P2P traffic by voting. Experiments show that the model significantly improves classification accuracy and has good stability.

2693
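The voting step can be illustrated with a minimal sketch of hard (majority) voting. The three base classifiers below are hypothetical threshold rules standing in for trained Bayes, SVM and decision-tree models, and the two-element flow vector is an invented toy feature. The abstract does not say how ties or vote weights are handled, so here a tie simply goes to the tied label that was voted for first.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Any callable mapping a feature vector to a class label counts as a classifier.
Classifier = Callable[[Sequence[float]], str]


def majority_vote(classifiers: List[Classifier], x: Sequence[float]) -> str:
    """Hard voting: return the label predicted by the most base classifiers.

    Ties go to the tied label voted for first, following the order of
    `classifiers` (an assumption, not taken from the paper).
    """
    if not classifiers:
        raise ValueError("at least one classifier is required")
    votes = [clf(x) for clf in classifiers]
    counts = Counter(votes)
    top = max(counts.values())
    return next(label for label in votes if counts[label] == top)


# Hypothetical stand-ins for trained models. Toy flow vector:
# x[0] = mean packet size in bytes, x[1] = 1 if a high port is used else 0.
def bayes_like(x: Sequence[float]) -> str:
    return "p2p" if x[0] > 500 else "other"


def svm_like(x: Sequence[float]) -> str:
    # A fixed linear decision function w.x + b > 0 with invented weights.
    return "p2p" if 0.002 * x[0] + x[1] - 1.5 > 0 else "other"


def tree_like(x: Sequence[float]) -> str:
    return "p2p" if x[1] >= 1 else "other"


if __name__ == "__main__":
    models = [bayes_like, svm_like, tree_like]
    print(majority_vote(models, [800, 0]))  # p2p (two of three votes)
```

Because each base model only has to return a label, the same combiner works whatever learners are plugged in, which is the point of a heterogeneous ensemble.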

Feature Selection and Optimization of Random Forest Modeling

Authors: Min Zhu, Jing Xia, Mo Lei Yan, Sheng Yu Zhang, Guo Long Cai, Jing Yan, Gang Min Ning

Abstract: The traditional random forest algorithm struggles to classify small-sample data sets well. During repeated random selection, only a few samples are drawn, so the resulting trees differ very little from one another; this drowns out correct decisions, increases the generalization error of the model and lowers the prediction rate. Given the limited sample size of the sepsis case data, this paper uses interval division to choose the parameters for random forest modeling: the features are divided into a high-correlation interval and an uncertain-correlation interval, and data are selected from each of the two intervals for modeling. This ultimately reduces the generalization error of the model and improves prediction accuracy.

1416
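One way to picture the interval division is the sketch below. It assumes the absolute Pearson correlation between each feature and a binary label as the correlation measure, and a hypothetical cut-off `threshold=0.5` separating the high-correlation interval from the uncertain-correlation interval; the abstract gives neither the paper's actual criterion nor its interval bounds. Each tree then draws its feature subset from both intervals, with hypothetical per-interval counts `k_high` and `k_uncertain`.

```python
import math
import random
from typing import List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def split_intervals(
    columns: Sequence[Sequence[float]],
    labels: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off between the two intervals
) -> Tuple[List[int], List[int]]:
    """Split feature indices into (high-correlation, uncertain-correlation)."""
    high: List[int] = []
    uncertain: List[int] = []
    for j, column in enumerate(columns):
        if abs(pearson(column, labels)) >= threshold:
            high.append(j)
        else:
            uncertain.append(j)
    return high, uncertain


def draw_tree_features(
    high: List[int],
    uncertain: List[int],
    k_high: int,
    k_uncertain: int,
    rng: random.Random,
) -> List[int]:
    """Draw one tree's feature subset, taking features from both intervals."""
    chosen = rng.sample(high, min(k_high, len(high)))
    chosen += rng.sample(uncertain, min(k_uncertain, len(uncertain)))
    return sorted(chosen)


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    columns = [
        [0.1, 0.2, 0.1, 0.9, 0.8, 0.9],  # tracks the label closely
        [1, 1, 1, 1, 1, 1],              # constant
        [3, 1, 2, 2, 3, 1],              # uncorrelated with the label
    ]
    high, uncertain = split_intervals(columns, labels)
    print(high, uncertain)  # [0] [1, 2]
    print(draw_tree_features(high, uncertain, 1, 1, random.Random(0)))
```

Forcing every tree to include features from the uncertain interval is one way to keep the trees from becoming near-copies of each other on a small sample.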

A Hierarchical Feature Selection Method Based on Classification Tree for HGU Fault Diagnosis

Authors: Xiao Yue Chen, Jian Zhong Zhou, Xiao Min Xu, Yong Chuan Zhang

Abstract: Fault diagnosis is essential to the safe operation of hydraulic generator units (HGU). Because of the complexity of HGU, the vast amount of measured data and the redundant information, the accuracy and real-time performance of fault diagnosis are severely limited. Feature selection is currently an effective way to break through this bottleneck. Based on the specific characteristics of HGU faults, this paper proposes a hierarchical feature selection method based on classification tree (HFSMCT). HFSMCT selects the most effective feature for each branch node using filtering evaluation criteria and a heuristic search strategy, and all the selected features together constitute the final feature set. HFSMCT is also easy to design and implement, and it performs well in both computational efficiency and accuracy. Simulation results show that HFSMCT is well suited to HGU fault diagnosis.

398
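A minimal sketch of per-node feature selection in a classification tree is shown below. It assumes discrete (already binned) features, information gain as the filtering evaluation criterion, and greedy best-first choice at each node as the heuristic search; the feature names, toy data and the `max_depth` stopping rule are hypothetical. The union of features chosen at the branch nodes forms the final feature set.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Set, Tuple

Row = Dict[str, str]  # one sample: feature name -> discrete value
Branches = Dict[str, Tuple[List[Row], List[str]]]


def entropy(labels: Sequence[str]) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def partition(rows: Sequence[Row], labels: Sequence[str], feature: str) -> Branches:
    """Group samples (and their labels) by the value of `feature`."""
    branches: Branches = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = branches.setdefault(row[feature], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    return branches


def info_gain(rows: Sequence[Row], labels: Sequence[str], feature: str) -> float:
    n = len(labels)
    remainder = sum(
        len(sub_labels) / n * entropy(sub_labels)
        for _, sub_labels in partition(rows, labels, feature).values()
    )
    return entropy(labels) - remainder


def select_features(
    rows: Sequence[Row],
    labels: Sequence[str],
    features: List[str],
    max_depth: int = 5,
    _depth: int = 0,
    _selected: Optional[Set[str]] = None,
) -> Set[str]:
    """Greedily pick the best feature at each node and collect all picks."""
    selected: Set[str] = set() if _selected is None else _selected
    # Stop at pure nodes, when features run out, or at the depth limit.
    if len(set(labels)) <= 1 or not features or _depth >= max_depth:
        return selected
    gains = {f: info_gain(rows, labels, f) for f in features}
    best = max(features, key=lambda f: gains[f])
    if gains[best] <= 1e-12:  # no feature separates this node any further
        return selected
    selected.add(best)
    remaining = [f for f in features if f != best]
    for sub_rows, sub_labels in partition(rows, labels, best).values():
        select_features(sub_rows, sub_labels, remaining, max_depth, _depth + 1, selected)
    return selected


if __name__ == "__main__":
    rows = [
        {"vibration": "high", "temperature": "low", "noise": "a"},
        {"vibration": "high", "temperature": "high", "noise": "b"},
        {"vibration": "low", "temperature": "high", "noise": "a"},
        {"vibration": "low", "temperature": "low", "noise": "b"},
        {"vibration": "low", "temperature": "low", "noise": "a"},
        {"vibration": "low", "temperature": "high", "noise": "b"},
    ]
    labels = ["fault", "fault", "fault", "normal", "normal", "fault"]
    print(sorted(select_features(rows, labels, ["vibration", "temperature", "noise"])))
    # ['temperature', 'vibration']
```

The irrelevant `noise` feature is never chosen because some other feature always has a higher gain at every node where it competes.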

Redundant Feature Selection Methods in Text Classification

Authors: Su Fen Chen

Abstract: Feature selection is an effective pre-processing technique that facilitates text mining in high-dimensional feature spaces. In recent years, many effective redundant feature selection methods have been proposed from different motivations. However, no comparative experimental study of redundant feature selection methods in text mining has yet been reported. To fill this gap, this paper presents an extensive empirical comparison on the task of text classification. The experimental results indicate that 3-way Mutual Information represents redundancy much better than traditional 2-way Mutual Information, because 3-way Mutual Information takes the label information into account. As a result, redundant feature selection methods based on 3-way Mutual Information outperform the other methods.

1258
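To make the distinction concrete, the sketch below estimates both quantities from discrete samples. It assumes that 3-way Mutual Information means the interaction information I(X;Y;C) = I(X;Y) - I(X;Y|C) between two features X, Y and the class C (sign conventions differ across the literature), with probabilities taken as empirical frequencies; the toy columns are invented. Two identical features look fully redundant under 2-way MI, yet if neither carries label information their 3-way MI is zero.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(*columns: Sequence[Hashable]) -> float:
    """Empirical joint entropy (in bits) of one or more aligned discrete columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum(c / n * math.log2(c / n) for c in Counter(joint).values())


def mutual_info(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """2-way MI: I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_info(x, y, c) -> float:
    """I(X;Y|C) = H(X,C) + H(Y,C) - H(X,Y,C) - H(C)."""
    return entropy(x, c) + entropy(y, c) - entropy(x, y, c) - entropy(c)


def three_way_mutual_info(x, y, c) -> float:
    """3-way MI (interaction information): I(X;Y;C) = I(X;Y) - I(X;Y|C)."""
    return mutual_info(x, y) - conditional_mutual_info(x, y, c)


if __name__ == "__main__":
    label = [0, 0, 1, 1]
    noise = [0, 1, 0, 1]  # independent of the label
    # A duplicated, label-irrelevant feature: maximal 2-way redundancy...
    print(mutual_info(noise, noise))                   # 1.0
    # ...but no redundancy with respect to the label.
    print(three_way_mutual_info(noise, noise, label))  # 0.0
```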

Flow Feature Selection Method Based on Statistics

Authors: Kai Min Song, Xun Yi Ren

Abstract: Building on research into flow identification algorithms based on statistical features, this paper proposes a statistical feature selection algorithm that reduces the number of features used in identification and thereby increases the speed of flow identification. Experimental results show that the algorithm effectively reduces the number of features and improves identification efficiency.

1709
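The abstract does not name the statistic used, so the sketch below swaps in a common statistical filter, the Fisher score (between-class scatter divided by within-class scatter for each feature), and keeps the `k` highest-scoring flow features. The flow feature names, values and `k` are hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List, Sequence


def fisher_score(values: Sequence[float], labels: Sequence[str]) -> float:
    """Between-class scatter of one feature divided by its within-class scatter."""
    overall = mean(values)
    between = within = 0.0
    for cls in set(labels):
        group = [v for v, y in zip(values, labels) if y == cls]
        between += len(group) * (mean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    if within == 0:
        # Every class is constant: infinite score if class means differ, else 0.
        return float("inf") if between > 0 else 0.0
    return between / within


def select_top_k(
    features: Dict[str, Sequence[float]], labels: Sequence[str], k: int
) -> List[str]:
    """Names of the k features with the highest Fisher score (ties by name)."""
    scores = {name: fisher_score(vals, labels) for name, vals in features.items()}
    return sorted(scores, key=lambda name: (-scores[name], name))[:k]


if __name__ == "__main__":
    labels = ["p2p", "p2p", "p2p", "web", "web", "web"]
    flows = {
        "pkt_len_mean": [1200, 1300, 1250, 300, 350, 320],
        "iat_std": [5, 7, 6, 6, 5, 7],
        "duration": [10, 11, 12, 1, 2, 3],
    }
    print(select_top_k(flows, labels, 2))  # ['pkt_len_mean', 'duration']
```

A filter of this kind scores each feature once, independently of any classifier, which keeps the selection step cheap compared with the identification itself.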

Sentiment Analysis of Chinese Micro Blog Using Machine Learning and an Improved Feature Selection Method

Authors: Jia Hao Chen, Jian Hua Wu

Abstract: With the rapid development of the Internet and the emergence of social media services, many users are becoming creators of social information. However, manual processing cannot cope with the large number of subjective messages. As a new kind of social media service, micro blogs have been widely adopted and can be used for sentiment analysis. This paper compares the performance of three machine learning methods on sentiment analysis of Chinese micro blogs. We also propose an improved feature selection method that increases classification accuracy. Experimental results show that SVM performs close to Naïve Bayes, and both outperform logistic regression in most cases.

1219
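The abstract does not describe the improved feature selection method. For reference, the sketch below shows a standard baseline for text feature selection instead: chi-square term scoring over documents represented as sets of already-segmented words. It is not the authors' method. The tiny micro-blog corpus and the labels `pos`/`neg` are invented.

```python
from typing import List, Sequence, Set


def chi_square(
    docs: Sequence[Set[str]], labels: Sequence[str], term: str, target: str
) -> float:
    """Chi-square statistic of the 2x2 table (term present/absent) x (class target/other)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        in_class = label == target
        if present and in_class:
            a += 1
        elif present:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def top_terms(
    docs: Sequence[Set[str]], labels: Sequence[str], target: str, k: int
) -> List[str]:
    """The k terms most associated (positively or negatively) with `target`."""
    vocab = sorted(set().union(*docs))
    scores = {t: chi_square(docs, labels, t, target) for t in vocab}
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    # Each post is assumed to be segmented into words already.
    docs = [{"很", "喜欢"}, {"喜欢", "手机"}, {"好", "喜欢"},
            {"很", "差"}, {"手机", "差"}, {"失望", "差"}]
    labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
    print(top_terms(docs, labels, "pos", 2))
```

Chi-square is symmetric in the two classes, so a strongly negative word such as "差" ranks as high as a strongly positive one; both are useful inputs for an SVM or Naïve Bayes classifier.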

Feature Analysis and Selection of SAR Image Based on Contourlet Transform

Authors: Shi Qi Huang, Pei Feng Su, Yi Ting Wang

Abstract: Synthetic aperture radar (SAR) is a kind of microwave remote sensing imaging radar with many advantages, but it also has shortcomings such as speckle noise and directional sensitivity. Reducing their impact on SAR image processing and applications, especially on extracting features of ground objects, is an important task. The Contourlet transform is a multi-scale, multi-directional transform that also provides a sparse representation. This paper studies Contourlet transform theory and its decomposition structure, and then applies it to extract SAR image features. Experimental results show that the Contourlet transform can effectively extract SAR image features.

431

SVM-Based Credit Rating and Feature Selection

Authors: Yu Qiang Qin, Yu Dong Qi, Hui Ying

Abstract: Assessing the risk of default on credit is important for financial institutions. Logistic regression and discriminant analysis are the techniques traditionally used in credit rating to determine the likelihood of default from consumer application and credit reference agency data. We test support vector machines (SVM) against these traditional methods on a large credit card database. We find that SVMs are competitive and can serve as the basis of a feature selection method that identifies the features most significant in determining the risk of default.

573
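A minimal sketch of using an SVM as the basis for feature selection: train a soft-margin linear SVM by full-batch sub-gradient descent on the regularized hinge loss, then rank features by the absolute value of their weights. This assumes a linear kernel and standardized features (weight magnitudes are only comparable on a common scale). The learning rate, regularization strength, epoch count and toy credit data are hypothetical, and the abstract does not state the paper's exact ranking criterion.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],       # labels in {-1, +1}
    lam: float = 0.01,      # L2 regularization strength (hypothetical)
    lr: float = 0.1,        # constant step size (hypothetical)
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Full-batch sub-gradient descent on lam/2*||w||^2 + mean hinge loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def rank_features(w: Sequence[float]) -> List[int]:
    """Feature indices ordered by decreasing |weight|."""
    return sorted(range(len(w)), key=lambda j: -abs(w[j]))


if __name__ == "__main__":
    # Hypothetical standardized features: [payment_delay_z, account_age_z];
    # +1 = defaulted, -1 = repaid.
    X = [[2.0, 0.1], [1.5, -0.2], [1.0, 0.3], [-1.0, 0.2], [-1.5, -0.1], [-2.0, -0.3]]
    y = [1, 1, 1, -1, -1, -1]
    w, b = train_linear_svm(X, y)
    print(rank_features(w))  # [0, 1]
```

Features at the bottom of this ranking are candidates for removal; retraining on the reduced set checks that accuracy holds.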

Research on the Chinese Word Segmentation System Based on Incremental Learning

Authors: Fan Jin Mai, Shi Tong Wu, Lai Yue Wang

Abstract: To cope with the growing size of the training corpus and meet the requirements of incremental learning, this paper introduces a maximum entropy feature selection algorithm into Chinese word segmentation, and designs and implements a Chinese word segmentation system based on incremental learning. Experimental results show that the system's segmentation accuracy improves gradually during incremental learning, without spending time retraining from scratch.

3469

Network Security Behavior Recognition Based on Consensus Decision-Making Feature Selection

Authors: Yang Yu, Li Mao, Xiao Feng Wang

Abstract: Because network data are voluminous and complex in representation, traditional network security behavior recognition systems suffer from high redundancy and high dimensionality, which consume more resources and increase computation. To address this problem, we perform feature selection. This article presents a consensus decision-making method that combines well-known existing feature selection algorithms to obtain a more reasonable result, and ranks the features by importance so that an appropriate subset can be selected under different conditions. Tested with SVM (Support Vector Machine) as the classification algorithm, the method improves recognition accuracy with fewer features and yields more stable results.

2188
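The consensus step can be sketched with Borda-count rank aggregation, a common way to merge several rankings. The abstract does not say which consensus rule the authors use, so Borda counting is a swapped-in assumption. Each input is a complete ranking of the same features produced by one feature selection algorithm (most important first); the feature names and rankings below are hypothetical.

```python
from typing import Dict, List, Sequence


def borda_consensus(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Merge complete rankings of the same features into one consensus ranking."""
    if not rankings:
        raise ValueError("need at least one ranking")
    features = set(rankings[0])
    for ranking in rankings:
        if set(ranking) != features or len(ranking) != len(features):
            raise ValueError("every ranking must order the same features exactly once")
    m = len(features)
    scores: Dict[str, int] = {f: 0 for f in features}
    for ranking in rankings:
        for position, feature in enumerate(ranking):
            # The top position earns m points, the last position earns 1.
            scores[feature] += m - position
    # Highest total first; ties broken alphabetically for determinism.
    return sorted(features, key=lambda f: (-scores[f], f))


def select_top(rankings: Sequence[Sequence[str]], k: int) -> List[str]:
    """Keep the k most important features under the consensus ranking."""
    return borda_consensus(rankings)[:k]


if __name__ == "__main__":
    rankings = [
        ["duration", "src_bytes", "protocol", "flag"],  # e.g. from algorithm A
        ["src_bytes", "duration", "flag", "protocol"],  # e.g. from algorithm B
        ["src_bytes", "protocol", "duration", "flag"],  # e.g. from algorithm C
    ]
    print(borda_consensus(rankings))  # ['src_bytes', 'duration', 'protocol', 'flag']
```

Because the consensus is a full ranking, the number of retained features can be chosen afterwards, for example by evaluating the SVM on successive prefixes of the list.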
