Discriminate Chinese Word Segmenter with Global and Context Features

Abstract:

Chinese word segmentation is the basis for all subsequent natural language processing applications. Corpus-based statistical methods have become the predominant approach. However, training corpora are insufficient, especially in certain domains. We therefore introduce global features and context features in order to achieve nearly the same performance with a much smaller training corpus. Experimental results show that our approach significantly outperforms the original feature sets on the same training data. Model training time is also reduced. In addition, these features do not depend on a particular classifier, so our method can easily be adapted to other models.
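The abstract does not spell out its feature templates. The following is a minimal sketch, assuming the common character-based formulation of discriminative segmentation: each character receives a B/M/E/S tag (begin, middle, end of a word, or single-character word), and each character is described by features drawn from a small window around it. The "global" feature here, a bucketed corpus-wide frequency of the current character bigram, is a hypothetical stand-in chosen for illustration, not the paper's definition. All function and feature names (`words_to_tags`, `build_bigram_counts`, `char_features`, `G_C0C1_bucket`) are likewise hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def words_to_tags(words: List[str]) -> List[str]:
    """Convert a segmented sentence into per-character B/M/E/S tags."""
    tags: List[str] = []
    for w in words:
        if len(w) == 1:
            tags.append("S")  # single-character word
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def build_bigram_counts(corpus: List[str]) -> Counter:
    """Hypothetical global statistic: character-bigram counts over raw sentences."""
    counts: Counter = Counter()
    for sent in corpus:
        for i in range(len(sent) - 1):
            counts[sent[i:i + 2]] += 1
    return counts


def char_features(sent: str, i: int, bigram_counts: Counter) -> Dict[str, float]:
    """Features for character sent[i], returned as a plain dict (no classifier assumed)."""
    pad = "##" + sent + "##"  # '#' marks positions outside the sentence
    j = i + 2  # index of sent[i] inside the padded string
    feats: Dict[str, float] = {}
    # Local context features: character unigrams in a window of +/-2.
    for k in range(-2, 3):
        feats[f"C{k}={pad[j + k]}"] = 1.0
    # Local context features: the two bigrams that contain the current character.
    feats[f"C-1C0={pad[j - 1:j + 1]}"] = 1.0
    feats[f"C0C1={pad[j:j + 2]}"] = 1.0
    # Hypothetical global feature: log2-bucketed corpus frequency of bigram C0C1.
    freq = bigram_counts[sent[i:i + 2]] if i + 1 < len(sent) else 0
    bucket = 0 if freq == 0 else min(int(math.log2(freq)) + 1, 5)
    feats[f"G_C0C1_bucket={bucket}"] = 1.0
    return feats


if __name__ == "__main__":
    counts = build_bigram_counts(["喜欢喜欢", "我喜欢"])
    print(words_to_tags(["我", "喜欢"]))
    print(char_features("我喜欢", 1, counts))
```

Because the features come back as ordinary string-to-value dictionaries, the same extractor could feed a CRF toolkit, a perceptron, or a maximum-entropy classifier, which is one way to read the abstract's point that the features do not depend on the classifier.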

Info:

Pages:

267-272

Online since:

September 2012

Copyright:

© 2012 Trans Tech Publications Ltd. All Rights Reserved
