Realization of Text Categorization for Small-Scaled Dataset

Abstract:

Text categorization tests and comparison tests are carried out on a small-scale dataset. When no training set is available, texts are categorized without training by matching their indexed keywords against the experts' subject terms, giving a large-category accuracy of 0.82. When only a small training set is available, the feature vectors obtained from training are added to the experts' subject terms before categorization, giving a large-category accuracy of 0.94 and a level-3 accuracy of 0.73. These results are satisfactory.
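The two settings described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the category names, subject terms, training documents, and the helper names `categorize` and `augment` are all hypothetical, and simple keyword overlap and keyword frequency stand in for whatever matching and feature-vector methods the paper actually uses.

```python
from collections import Counter

# Hypothetical expert subject terms per category (illustrative, not from the paper).
SUBJECT_TERMS = {
    "computing": {"algorithm", "software", "network"},
    "materials": {"alloy", "polymer", "ceramic"},
}


def categorize(keywords, subject_terms):
    """Return the category whose subject terms overlap most with the
    document's indexed keywords, or None if no term matches."""
    keyword_set = set(keywords)
    scores = {cat: len(terms & keyword_set) for cat, terms in subject_terms.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def augment(subject_terms, labeled_docs, top_n=2):
    """Return a copy of subject_terms in which each category is extended with
    the top_n most frequent keywords from its labeled training documents.
    Assumption: keyword frequency is a stand-in for trained feature vectors."""
    counts = {cat: Counter() for cat in subject_terms}
    for keywords, cat in labeled_docs:
        counts[cat].update(keywords)
    return {
        cat: terms | {word for word, _ in counts[cat].most_common(top_n)}
        for cat, terms in subject_terms.items()
    }


# Hypothetical small training set: (indexed keywords, category label).
TRAINING_DOCS = [
    (["compiler", "software"], "computing"),
    (["compiler", "parser"], "computing"),
    (["sintering", "ceramic"], "materials"),
    (["sintering", "alloy"], "materials"),
]

AUGMENTED_TERMS = augment(SUBJECT_TERMS, TRAINING_DOCS)
```

In this sketch, a document indexed only with "compiler" matches no category under the original subject terms, but is assigned to "computing" once the terms learned from the small training set are added.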

Info:

Periodical: Advanced Materials Research (Volumes 532-533)

Pages: 1239-1242

Online since: June 2012

Copyright: © 2012 Trans Tech Publications Ltd. All Rights Reserved
