A Distributed Processing Framework of Incremental Text Clustering under the Background of Big Data

Zhen Tan; Yi Fan Chen; Zhong Lin Shi; Bin Ge; Yan Li Hu; Da Quan Tang; Hai Kuo Zhang

doi:10.4028/www.scientific.net/AMR.1049-1050.1421



Abstract:

In the era of big data, data volumes expand rapidly, and existing incremental text clustering algorithms suffer a sharp decline in efficiency as time passes and the data volume grows. Because of their poor timeliness and robustness, these algorithms are hard to apply in practice. In this paper, we propose a distributed framework for the Single-Pass algorithm based on MapReduce. Experimental results show that the incremental text clustering is accurate and that the framework effectively improves both computing efficiency and the timeliness of results. The algorithm has strong prospects in big data settings.
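
The abstract does not give the details of the framework, so the following is only a minimal sketch of how a Single-Pass clustering step might be split into a map phase and a reduce phase. It assumes cosine similarity over raw term-frequency vectors, whitespace tokenization, and an arbitrary similarity threshold of 0.5. The partitions are processed in a single Python process rather than on a real MapReduce cluster, and all function names (`single_pass`, `map_phase`, `reduce_phase`, `distributed_single_pass`) are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(items, threshold):
    """Single-Pass clustering over (member_ids, vector) pairs in arrival order.

    Each item joins the most similar existing cluster if the similarity
    reaches the threshold; otherwise it starts a new cluster. The centroid
    is kept as a summed term vector, which is enough for cosine similarity
    because cosine is invariant to scaling.
    """
    clusters = []
    for member_ids, vec in items:
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["centroid"].update(vec)      # add term counts to the centroid
            best["members"].extend(member_ids)
        else:
            clusters.append({"centroid": Counter(vec), "members": list(member_ids)})
    return clusters


def map_phase(partition, threshold):
    """Assumed map step: cluster one partition of (doc_id, text) pairs locally."""
    items = [([doc_id], Counter(text.lower().split())) for doc_id, text in partition]
    return single_pass(items, threshold)


def reduce_phase(local_results, threshold):
    """Assumed reduce step: merge local clusters by clustering their centroids."""
    items = [(c["members"], c["centroid"]) for clusters in local_results for c in clusters]
    return single_pass(items, threshold)


def distributed_single_pass(docs, n_partitions, threshold):
    """Simulate the map and reduce phases sequentially in one process."""
    partitions = [docs[i::n_partitions] for i in range(n_partitions)]
    local_results = [map_phase(p, threshold) for p in partitions]
    return reduce_phase(local_results, threshold)


# Illustrative input only; these documents are not from the paper.
docs = [
    (0, "apple banana apple"),
    (1, "mapreduce hadoop cluster"),
    (2, "banana apple fruit"),
    (3, "hadoop mapreduce job"),
]
result = distributed_single_pass(docs, n_partitions=2, threshold=0.5)
```

With this input, documents 0 and 2 end up in one cluster and documents 1 and 3 in another. As with any Single-Pass variant, the outcome depends on arrival order and on the threshold, so a real deployment would need to tune both.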


Info:

Periodical:

Advanced Materials Research (Volumes 1049-1050)

Pages:

1421-1426

DOI:

https://doi.org/10.4028/www.scientific.net/AMR.1049-1050.1421


Online since:

October 2014

Authors:

Zhen Tan, Yi Fan Chen, Zhong Lin Shi, Bin Ge, Yan Li Hu, Da Quan Tang, Hai Kuo Zhang*

Keywords:

Big Data, Distributed Single-Pass Algorithm, Incremental Text Clustering, MapReduce


* - Corresponding Author
