Research on Classification and Redundant Information Filtering of Massive Data in Big Database

Abstract:

This paper studies the precise classification of massive data in big databases, together with the filtering of redundant information that interferes with classification. In traditional data classification methods, the frequency points are concentrated and frequent classification points are difficult to eliminate. Node classification techniques with low adaptivity reject nodes in regions of high disturbance and deep attenuation, which greatly limits both classification precision and immunity to disturbance. A new optimized data classification method and a redundant information model are proposed based on chaotic probability analysis. The classification error rate is mapped to a probability density function using the channel mapping function method, and the classification probability is allocated according to this density. The chaotic probability analysis method generates a random sequence that reflects the essential features of the data and meets the demands of random frequency classification. Data clustering and optimized classification are then realized. Simulations were run on the KDD_CUP2009 experimental big database, and the results show that the proposed method classifies each type of data effectively. Compared with the traditional neural network fuzzy c-means method, the classification precision improved by 17.8%. This indicates that the model and algorithm classify well and can be applied in engineering practice to tasks such as data mining, fault diagnosis, and target recognition.
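
For context, the fuzzy c-means baseline named in the abstract can be sketched as follows. This is the standard textbook algorithm only, not the paper's chaotic probability method, which the abstract does not specify in enough detail to reproduce. The function name `fuzzy_c_means`, the fuzzifier `m = 2.0`, the stopping tolerance, and the toy data are all assumptions made for illustration.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means on a list of equal-length numeric tuples.

    Returns (centers, memberships); memberships[i][j] is the degree to
    which points[i] belongs to cluster j, and each row sums to 1.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Centers are membership-weighted means, with weights u^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)))

        # Memberships follow from squared distances to each center.
        shift = 0.0
        for i in range(n):
            d2 = [sum((points[i][d] - centers[j][d]) ** 2 for d in range(dim))
                  for j in range(c)]
            if min(d2) == 0.0:
                # The point coincides with a center: assign it fully there.
                k = d2.index(0.0)
                new = [1.0 if j == k else 0.0 for j in range(c)]
            else:
                inv = [dj ** (-1.0 / (m - 1.0)) for dj in d2]
                total = sum(inv)
                new = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new

        # Stop once no membership changed by more than the tolerance.
        if shift < tol:
            break
    return centers, u


# Toy data: two well-separated groups of 2-D points (made up for the demo).
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, memberships = fuzzy_c_means(points, c=2)

# Harden the fuzzy memberships into labels by taking the largest degree.
labels = [row.index(max(row)) for row in memberships]
```

Unlike hard clustering, each point keeps a graded membership in every cluster. The labels are taken only at the end, which is where a comparison of classification precision like the one reported in the abstract would be made.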

Info:

Periodical:

Advanced Materials Research (Volumes 791-793)

Pages:

1419-1422

Online since:

September 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved
