Research on Building Methods of Hierarchical Structure in Text Classification

Abstract:

There are always semantic hierarchical relationships among the categories in text classification, so it is natural to organize documents according to a hierarchical structure. Based on the confusion matrix, this paper adopts two different algorithms, hierarchical clustering and confusion category, to build a hierarchical structure of document categories, and then uses hierarchical classification to carry out experiments. The results show that the confusion category strategy is superior to the hierarchical clustering strategy, and that both recall and precision are improved compared with flat classification.
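The abstract names the two strategies but does not spell out their details. As a rough illustration only, the following Python sketch shows one way a confusion matrix could drive each of them. Every function name (`confusion_similarity`, `agglomerative_hierarchy`, `confusion_groups`) is hypothetical, and so are the key choices. Inter-class similarity is assumed to be the mean of the two row-normalized off-diagonal confusion rates. The hierarchical-clustering strategy is approximated by average-linkage agglomerative clustering. The confusion-category strategy is approximated by grouping classes whose similarity exceeds a threshold, taking connected components of the resulting graph. The four-class confusion matrix is invented toy data, not a result from the paper.

```python
from itertools import combinations


def confusion_similarity(cm):
    """Symmetric inter-class similarity from a confusion matrix (assumed definition).

    cm[i][j] = number of documents of true class i predicted as class j.
    Rows are normalized so large classes do not dominate; sim[i][j] is the
    mean of the two off-diagonal confusion rates between classes i and j.
    """
    n = len(cm)
    rates = []
    for row in cm:
        total = sum(row)
        rates.append([x / total if total else 0.0 for x in row])
    return [[0.0 if i == j else (rates[i][j] + rates[j][i]) / 2
             for j in range(n)] for i in range(n)]


def agglomerative_hierarchy(cm, labels):
    """Average-linkage agglomerative clustering over class similarity.

    Repeatedly merges the two most-confused clusters and returns the root
    of the resulting binary tree as nested tuples of labels.
    """
    sim = confusion_similarity(cm)
    # Each cluster is (subtree, list of member class indices).
    clusters = [(label, [i]) for i, label in enumerate(labels)]

    def linkage(a, b):
        # Average similarity over all cross-cluster class pairs.
        ma, mb = clusters[a][1], clusters[b][1]
        return sum(sim[i][j] for i in ma for j in mb) / (len(ma) * len(mb))

    while len(clusters) > 1:
        # combinations() yields pairs with a < b.
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(*pair))
        merged = ((clusters[a][0], clusters[b][0]),
                  clusters[a][1] + clusters[b][1])
        clusters.pop(b)  # pop the larger index first so index a stays valid
        clusters.pop(a)
        clusters.append(merged)
    return clusters[0][0]


def confusion_groups(cm, labels, threshold):
    """Group classes whose pairwise similarity exceeds `threshold`.

    Uses union-find to take connected components of the thresholded
    similarity graph; returns a sorted list of label groups.
    """
    sim = confusion_similarity(cm)
    parent = list(range(len(labels)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(labels)), 2):
        if sim[i][j] > threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i, label in enumerate(labels):
        groups.setdefault(find(i), []).append(label)
    return sorted(groups.values())


# Hypothetical toy data: sports/games and finance/economy are often confused.
labels = ["sports", "games", "finance", "economy"]
cm = [
    [80, 15, 3, 2],
    [20, 75, 3, 2],
    [2, 3, 70, 25],
    [1, 4, 30, 65],
]
print(agglomerative_hierarchy(cm, labels))
# (('finance', 'economy'), ('sports', 'games'))
print(confusion_groups(cm, labels, threshold=0.1))
# [['finance', 'economy'], ['sports', 'games']]
```

In this sketch, classes that are frequently confused end up under a common parent node. A top-level classifier then only has to separate the coarse groups, and a second-level classifier resolves the confusable classes within each group.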
