A Text Hybrid Clustering Algorithm Based on HowNet Semantics

Zheng Yu Zhu; Shu Jia Dong; Chun Lei Yu; Jie He

doi:10.4028/www.scientific.net/KEM.474-476.2071


Abstract:

Many existing text clustering algorithms overlook the semantic relations between words and therefore compute text similarity less accurately. This paper proposes a new text hybrid clustering algorithm (HCA) based on HowNet semantics. It computes the semantic similarity of words from their semantic concept descriptions in HowNet, then combines these word similarities with maximum weight matching on a bipartite graph to obtain a semantic-based text similarity. HCA is built on this new text similarity by combining an improved genetic algorithm with the k-medoids algorithm. Comparative experiments show that 1) HCA achieves better clustering quality than two existing traditional clustering algorithms, and 2) when text cosine similarity is replaced with the new semantic-based text similarity, the quality of all three clustering algorithms improves significantly.
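The matching step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the word similarity table is invented for illustration (the paper derives these scores from HowNet), the function names are hypothetical, the matching is found by exhaustive search rather than a dedicated bipartite matching algorithm, and normalizing by the longer word list is an assumption, since the abstract does not give the formula.

```python
from itertools import permutations

# Hypothetical word-to-word similarity scores in [0, 1]. In the paper these
# come from HowNet concept descriptions; the values here are invented.
TOY_WORD_SIM = {
    ("car", "automobile"): 0.9,
    ("car", "road"): 0.4,
    ("driver", "road"): 0.5,
    ("driver", "automobile"): 0.3,
}


def word_sim(a, b):
    """Look up a symmetric toy similarity; identical words score 1.0."""
    if a == b:
        return 1.0
    return TOY_WORD_SIM.get((a, b), TOY_WORD_SIM.get((b, a), 0.0))


def max_weight_matching(words_a, words_b):
    """Return the largest total weight over all one-to-one word pairings.

    Exhaustive search over permutations: acceptable for a handful of words,
    but factorial in general, so a real system would use a polynomial-time
    assignment solver instead.
    """
    short, long_ = sorted((words_a, words_b), key=len)
    best = 0.0
    for perm in permutations(long_, len(short)):
        total = sum(word_sim(s, t) for s, t in zip(short, perm))
        best = max(best, total)
    return best


def text_similarity(words_a, words_b):
    """Matching weight divided by the longer list length (an assumption)."""
    if not words_a or not words_b:
        return 0.0
    longest = max(len(words_a), len(words_b))
    return max_weight_matching(words_a, words_b) / longest
```

For the word lists `["car", "driver"]` and `["automobile", "road"]`, the best pairing is car/automobile with driver/road, so the score is (0.9 + 0.5) / 2. A cosine similarity over raw term vectors would give these two texts a score of zero because they share no words, which is the gap the semantic measure is meant to close.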


Info:

Periodical:

Key Engineering Materials (Volumes 474-476)

Pages:

2071-2078

DOI:

https://doi.org/10.4028/www.scientific.net/KEM.474-476.2071


Online since:

April 2011

Authors:

Zheng Yu Zhu, Shu Jia Dong, Chun Lei Yu, Jie He

Keywords:

Genetic Algorithm (GA), HowNet, Maximum Weight Matching, Semantic Similarity, Text Clustering

