A Distributed Processing Framework of Incremental Text Clustering under the Background of Big Data

Abstract:

In the era of big data, data volumes expand rapidly, and the efficiency of existing incremental text clustering algorithms declines sharply as time passes and the data volume grows. Because of their poor timeliness and robustness, these algorithms are hard to apply in practice. In this paper, we propose a distributed framework for the Single-Pass algorithm based on MapReduce. Experiments on incremental text clustering show that the results are accurate and that the framework effectively improves both computing efficiency and the timeliness of the results. The algorithm has good prospects for big data applications.
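For readers unfamiliar with Single-Pass clustering, the following is a minimal single-machine sketch of the classic technique, not the authors' distributed framework (the preview does not describe how the work is split across MapReduce tasks). It assumes whitespace-tokenized bag-of-words vectors, cosine similarity, and a similarity threshold of 0.5; the function names `cosine` and `single_pass` and all of these choices are illustrative assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass(docs, threshold=0.5):
    """Cluster documents in arrival order, reading each one exactly once.

    Each document joins the most similar existing cluster if the
    similarity reaches `threshold` (an assumed value); otherwise it
    opens a new cluster. Returns (labels, clusters).
    """
    clusters = []  # assumed representation: summed term counts per cluster
    labels = []
    for text in docs:
        vec = Counter(text.lower().split())
        best_idx, best_sim = -1, 0.0
        for idx, centroid in enumerate(clusters):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= threshold:
            # Incremental update: fold the document into the cluster summary.
            clusters[best_idx].update(vec)
            labels.append(best_idx)
        else:
            clusters.append(Counter(vec))
            labels.append(len(clusters) - 1)
    return labels, clusters
```

Each new document is compared against every existing cluster, so the cost per document grows with the number of clusters. This is consistent with the efficiency decline the abstract attributes to existing incremental algorithms as data volume increases.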

Info:

Periodical:

Advanced Materials Research (Volumes 1049-1050)

Pages:

1421-1426

Online since:

October 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
