The Research on Tibetan Text Classification Based on N-Gram Model

Deng Zhou; Wen Huang He; Tao Tao Wu

doi:10.4028/www.scientific.net/AMM.543-547.1896

Abstract:

Compared with traditional text classification models, the Tibetan text classification approach based on the N-Gram model applies the N-Gram model at the word level. In other words, word segmentation is not required during text classification, and feature selection and extensive pre-processing are also avoided. This paper studies N-Gram models in depth and discusses the selection of the parameter N in the model, using a Naïve Bayes Multinomial classifier.
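
As a rough illustration of the idea in the abstract, the sketch below classifies text from overlapping N-grams with a multinomial Naive Bayes model, without any word segmentation step. It is not the authors' implementation: the choice of character-level N-grams, the Laplace smoothing constant `alpha`, the names `char_ngrams`, `NgramMultinomialNB`, and `accuracy`, and the toy Latin-letter strings are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text` (no segmentation)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramMultinomialNB:
    """Multinomial Naive Bayes over character n-gram counts (hypothetical helper)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n          # n-gram length, the parameter N discussed in the abstract
        self.alpha = alpha  # Laplace smoothing constant (an assumption)

    def fit(self, texts, labels):
        self.class_docs = Counter(labels)    # documents per class
        self.counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.vocab.update(grams)
        self.totals = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        return self

    def predict(self, text):
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.class_docs:
            # log prior plus smoothed log likelihood of each known n-gram
            score = math.log(self.class_docs[c] / total_docs)
            denom = self.totals[c] + self.alpha * vocab_size
            for g in char_ngrams(text, self.n):
                if g in self.vocab:  # n-grams unseen in training are skipped
                    score += math.log((self.counts[c][g] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


def accuracy(clf, texts, labels):
    """Fraction of `texts` whose predicted class matches `labels`."""
    hits = sum(clf.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


if __name__ == "__main__":
    # Toy placeholder data, not a Tibetan corpus.
    train_x = ["abababab", "babababa", "xyxyxyxy", "yxyxyxyx"]
    train_y = ["A", "A", "B", "B"]
    test_x, test_y = ["ababab", "xyxy"], ["A", "B"]
    # Compare several values of N on held-out data.
    for n in (1, 2, 3):
        clf = NgramMultinomialNB(n=n).fit(train_x, train_y)
        print(n, accuracy(clf, test_x, test_y))
```

Varying `n` and comparing held-out accuracy, as in the loop at the end, is one simple way to explore the choice of N; on a real corpus the unit of the N-gram (character, syllable, or other) would need to match the paper's definition.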

Info:

Periodical:

Applied Mechanics and Materials (Volumes 543-547)

Pages:

1896-1900

DOI:

https://doi.org/10.4028/www.scientific.net/AMM.543-547.1896

Online since:

March 2014

Authors:

Deng Zhou*, Wen Huang He, Tao Tao Wu

Keywords:

Corpus, Naive Bayes Multinomial Classifier, N-Gram Model, Text Classification

* - Corresponding Author
