Improving Frequent-Term Based Text Clustering with Word Belief Network

Abstract:

The frequent-term based text clustering (FTC) algorithm can be applied to a news topic clustering system to help users quickly locate topics and articles of interest. However, it is difficult to set the support threshold used when mining association rules. This paper builds a word belief network that satisfies the basic properties of small-world networks, and uses these small-world characteristics to improve the FTC algorithm and speed up text clustering. The paper also proposes incorporating an inverted index into the algorithm, which simplifies programming and improves runtime efficiency. Experimental results show that the system finds current hot news topics efficiently and lets users locate the document collections they are interested in.
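As an illustration of why an inverted index suits frequent-term clustering, the following minimal Python sketch computes the support of a term set by intersecting posting sets instead of rescanning documents. It is not the authors' implementation: the function names, the toy documents, and the `min_support` value are all assumptions made for this example, and the word belief network itself is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def build_inverted_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in set(terms):
            index[term].add(doc_id)
    return dict(index)


def cover(index, term_set):
    """Return the documents containing every term in term_set."""
    postings = [index.get(term, set()) for term in term_set]
    if not postings:
        return set()
    return set.intersection(*postings)


def frequent_term_pairs(index, n_docs, min_support):
    """Return term pairs whose cover reaches min_support (a fraction of n_docs)."""
    frequent = sorted(
        term for term, ids in index.items() if len(ids) / n_docs >= min_support
    )
    result = {}
    for a, b in combinations(frequent, 2):
        shared = index[a] & index[b]
        if len(shared) / n_docs >= min_support:
            result[(a, b)] = shared
    return result


# Hypothetical toy corpus: document id -> list of already-tokenized terms.
docs = {
    1: ["election", "vote", "senate"],
    2: ["election", "vote", "poll"],
    3: ["football", "goal", "league"],
    4: ["football", "league", "transfer"],
}

index = build_inverted_index(docs)
candidates = frequent_term_pairs(index, n_docs=len(docs), min_support=0.5)
for terms, doc_ids in candidates.items():
    print(terms, sorted(doc_ids))
```

Each frequent term set and its cover form a candidate cluster in the FTC sense. The result still depends on the hand-picked `min_support`, which is the difficulty the abstract points to.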

Pages: 207-214

Online since: September 2013

Copyright: © 2013 Trans Tech Publications Ltd. All Rights Reserved
