The Research on Tibetan Text Classification Based on N-Gram Model

Abstract:

Compared with traditional text classification models, the Tibetan text classification approach studied here adopts the N-Gram model in place of word-level units. In other words, word segmentation is not required during classification, and feature selection and extensive preprocessing are avoided. This paper examines N-Gram models in depth and discusses the selection of the parameter N in the model, using a Naïve Bayes Multinomial classifier.
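The idea in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption: the n-grams are taken over raw characters (the paper may use a different unit, such as syllables), smoothing is add-one, and the training strings are toy English placeholders rather than Tibetan data. The names `char_ngrams` and `NGramMultinomialNB` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n):
    """Return all overlapping character n-grams of text (no word segmentation)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NGramMultinomialNB:
    """Multinomial Naive Bayes over n-gram counts with add-alpha smoothing.

    Illustrative only; not the authors' implementation.
    """

    def __init__(self, n=2, alpha=1.0):
        self.n = n          # n-gram length, the parameter N discussed in the abstract
        self.alpha = alpha  # smoothing constant (assumed, not from the paper)

    def fit(self, texts, labels):
        self.class_docs = Counter(labels)         # documents per class
        self.gram_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        self.totals = {c: sum(self.gram_counts[c].values()) for c in self.class_docs}
        return self

    def predict(self, text):
        n_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.class_docs:
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.class_docs[c] / n_docs)
            denom = self.totals[c] + self.alpha * vocab_size
            for g in char_ngrams(text, self.n):
                if g in self.vocab:  # n-grams unseen in training are skipped
                    score += math.log((self.gram_counts[c][g] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Toy placeholder corpus (English stand-ins, not data from the paper).
train_texts = [
    "goal match team score",
    "team wins the match",
    "stock market price",
    "market price falls",
]
train_labels = ["sport", "sport", "finance", "finance"]

clf = NGramMultinomialNB(n=2).fit(train_texts, train_labels)
print(clf.predict("the team score"))
print(clf.predict("stock price"))
```

To explore the effect of N in this sketch, the same classifier can be refit with different values of `n` and scored on held-out text.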

Info:

Pages:

1896-1900

Online since:

March 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
