Topical Concept Based Text Clustering Method

Yi Ding; Xian Fu

doi:10.4028/www.scientific.net/AMR.532-533.939

Abstract:

Text clustering typically operates in a high-dimensional space, which makes it difficult in virtually all practical settings. In addition, given a particular clustering result, it is usually hard to explain why the text clusters were constructed the way they were. To address both problems, this paper proposes a topical-concept-based method for Chinese document clustering called Document Features Indexing Clustering (DFIC), which can identify topics accurately and cluster documents according to these topics. In DFIC, “topic elements” are defined and extracted for indexing base clusters. Additionally, document features are investigated and exploited. Experimental results show that DFIC achieves higher precision (92.76%) than some widely used traditional clustering methods.
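The general idea of indexing base clusters by shared topic terms can be illustrated with a small sketch. This is not the authors' DFIC algorithm, whose details are not given on this page. It assumes that documents are already segmented into tokens, that a "topic element" is simply a token shared by at least `min_docs` documents, and that each document is assigned to the shared token it mentions most often. The function names `build_base_clusters` and `assign_documents` are hypothetical.

```python
from collections import defaultdict


def build_base_clusters(doc_terms, min_docs=2):
    """Index documents by term; keep terms shared by at least min_docs documents.

    Assumption: a kept term stands in for a "topic element" and the set of
    documents containing it stands in for a "base cluster".
    """
    index = defaultdict(set)
    for doc_id, terms in doc_terms.items():
        for term in set(terms):
            index[term].add(doc_id)
    return {term: ids for term, ids in index.items() if len(ids) >= min_docs}


def assign_documents(doc_terms, base_clusters):
    """Label each document with the base-cluster term it mentions most often.

    Ties are broken by larger base cluster, then alphabetically, so the
    result is deterministic. Documents with no shared term get None.
    """
    assignment = {}
    for doc_id, terms in doc_terms.items():
        candidates = [t for t in set(terms) if t in base_clusters]
        if not candidates:
            assignment[doc_id] = None
            continue
        assignment[doc_id] = min(
            candidates,
            key=lambda t: (-terms.count(t), -len(base_clusters[t]), t),
        )
    return assignment


# Toy corpus (invented for illustration only).
docs = {
    "d1": ["stock", "market", "stock", "bank"],
    "d2": ["stock", "price", "stock", "market"],
    "d3": ["football", "match", "goal", "football"],
    "d4": ["football", "league", "goal"],
}

base = build_base_clusters(docs)
labels = assign_documents(docs, base)
print(labels)
```

Because every cluster is named by the term that indexes it, the label itself serves as an explanation of why the documents were grouped, which is the interpretability concern raised in the abstract.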

Info:

Periodical:

Advanced Materials Research (Volumes 532-533)

Pages:

939-943

DOI:

https://doi.org/10.4028/www.scientific.net/AMR.532-533.939

Online since:

June 2012

Authors:

Yi Ding, Xian Fu

Keywords:

Clusters Indexing, Document Clustering, Topical Concept
