An Improved Genetic Algorithm for Text Clustering

Abstract:

The genetic algorithm (GA) is a self-adaptive probabilistic search method for solving optimization problems and has been applied widely in science and engineering. In this paper, we propose an improved variable string length genetic algorithm (IVGA) for text clustering. The algorithm automatically evolves the optimal number of clusters while also producing a proper clustering of the data set. Each chromosome is encoded with special indices that indicate the location of each gene. More effective versions of the evolutionary steps automatically adjust the balance between population diversity and selective pressure across generations. The superiority of IVGA over the conventional variable string length genetic algorithm (VGA) is demonstrated by its ability to provide proper text clustering.
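To make the idea of a variable string length chromosome concrete, the following is a minimal sketch and not the IVGA described in the paper. It assumes each chromosome is simply a list of cluster centers whose length can change, uses a Davies-Bouldin-style index as the fitness, and applies plain truncation selection with elitism. The index-based gene encoding and the adaptive evolutionary steps mentioned in the abstract are not reproduced. All function names, parameters, and the toy 2-D points (standing in for document vectors) are hypothetical.

```python
import math
import random


def assign(points, centers):
    """Group points by the index of their nearest center (Euclidean distance)."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def davies_bouldin(points, centers):
    """Davies-Bouldin-style index, lower is better.

    Assumption: scatter is measured around the chromosome's centers,
    not around recomputed centroids. Empty clusters are ignored.
    """
    clusters = assign(points, centers)
    live = [(c, m) for c, m in zip(centers, clusters) if m]
    if len(live) < 2:
        return math.inf
    scatter = [sum(math.dist(p, c) for p in m) / len(m) for c, m in live]
    total = 0.0
    for i, (ci, _) in enumerate(live):
        total += max(
            (scatter[i] + scatter[j]) / math.dist(ci, cj)
            for j, (cj, _) in enumerate(live)
            if j != i
        )
    return total / len(live)


def mutate(chromosome, points, rng, k_min=2, k_max=10):
    """Variable-length mutation: add a center, drop one, or keep the length."""
    child = list(chromosome)
    move = rng.choice(("add", "drop", "keep"))
    if move == "add" and len(child) < k_max:
        child.append(rng.choice(points))
    elif move == "drop" and len(child) > k_min:
        child.pop(rng.randrange(len(child)))
    return child


def evolve(points, generations=60, pop_size=20, seed=0):
    """Evolve center lists of varying length; return the fittest one found."""
    rng = random.Random(seed)
    population = [rng.sample(points, rng.randint(2, 6)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda ch: davies_bouldin(points, ch))
        parents = population[: pop_size // 2]  # truncation selection, elitist
        children = [
            mutate(rng.choice(parents), points, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return min(population, key=lambda ch: davies_bouldin(points, ch))


def demo_points():
    """Three well-separated 2-D blobs standing in for document vectors."""
    offsets = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.3), (0.4, 0.4)]
    blobs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [(bx + dx, by + dy) for bx, by in blobs for dx, dy in offsets]
```

Calling `evolve(demo_points())` returns a list of centers whose length was chosen by the search rather than fixed in advance. For real text data, the points would typically be replaced by term-weight vectors and the Euclidean distance by a similarity measure suited to documents.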

Info:

Periodical:

Advanced Materials Research (Volumes 989-994)

Pages:

1853-1856

Online since:

July 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
