CRFs Based Chinese Word Segmentation


Abstract:

Chinese word segmentation is a fundamental problem in natural language processing. Conditional Random Fields (CRFs) are undirected graphical models. They can incorporate a wide variety of features and so make full use of the information in the text. This paper therefore applies CRFs to Chinese word segmentation. It first defines the CRF model and presents its parameter learning method and inference algorithms. It then introduces the tagging scheme that is widely used in Chinese word segmentation. Experiments use the SIGHAN Bakeoff 2005 corpora, and the approach performs well on both the MSRA and PKU corpora. The F-measures are 0.964 and 0.943, respectively, and the out-of-vocabulary recall rates (R_OOV) are 0.705 and 0.765.
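The abstract names three technical pieces: a tagging scheme, CRF inference, and the Bakeoff metrics (F-measure and R_OOV). The Python sketch below shows one way these pieces could fit together. It rests on several assumptions that the abstract does not confirm:

- It uses the common four-tag BMES scheme (Begin, Middle, End, Single).
- It decodes with Viterbi over log-space potentials that a trained linear-chain CRF would supply. Parameter learning is not shown.
- It treats a gold word as OOV when that word is missing from the training vocabulary.

All function names (`words_to_tags`, `tags_to_words`, `viterbi`, `evaluate`) are hypothetical. They are not taken from the paper.

```python
from typing import Dict, List, Sequence, Set, Tuple

TAGS = ["B", "M", "E", "S"]  # Begin/Middle/End of a multi-character word; Single
# Hypothetical BMES scheme: tag bigrams that keep a tag sequence well formed.
ALLOWED = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
           ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}

def words_to_tags(words: Sequence[str]) -> List[str]:
    """Turn a segmented sentence into one BMES tag per character."""
    tags: List[str] = []
    for w in words:
        tags.extend(["S"] if len(w) == 1 else ["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags

def tags_to_words(chars: str, tags: Sequence[str]) -> List[str]:
    """Rebuild words from characters and tags; a word closes at E or S."""
    words: List[str] = []
    current = ""
    for ch, t in zip(chars, tags):
        current += ch
        if t in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # keep a dangling B/M fragment rather than dropping it
        words.append(current)
    return words

def viterbi(emit: List[Dict[str, float]],
            trans: Dict[Tuple[str, str], float]) -> List[str]:
    """Highest-scoring tag path of a linear-chain CRF from log-space potentials.

    emit[i][y]: summed weighted state-feature score of tag y at character i.
    trans[(p, y)]: transition score; pairs absent from trans are forbidden.
    """
    if not emit:
        return []
    neg = float("-inf")
    # A sentence may only start with B or S.
    score = [{y: (emit[0][y] if y in ("B", "S") else neg) for y in TAGS}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(emit)):
        cur: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for y in TAGS:
            # Best predecessor tag for y, using the previous column's scores.
            prev = max(TAGS, key=lambda p: score[-1][p] + trans.get((p, y), neg))
            cur[y] = score[-1][prev] + trans.get((prev, y), neg) + emit[i][y]
            ptr[y] = prev
        score.append(cur)
        back.append(ptr)
    # A sentence may only end with E or S; follow back-pointers from there.
    path = [max(("E", "S"), key=lambda y: score[-1][y])]
    for i in range(len(emit) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

def word_spans(words: Sequence[str]) -> List[Tuple[str, int, int]]:
    """(word, start, end) character offsets for each word in a sentence."""
    spans, start = [], 0
    for w in words:
        spans.append((w, start, start + len(w)))
        start += len(w)
    return spans

def evaluate(gold: List[List[str]], pred: List[List[str]],
             train_vocab: Set[str]) -> Tuple[float, float, float, float]:
    """Word-level precision, recall, F-measure and OOV recall (R_OOV)."""
    correct = n_gold = n_pred = oov_total = oov_hit = 0
    for g_words, p_words in zip(gold, pred):
        pred_spans = {(s, e) for _, s, e in word_spans(p_words)}
        n_pred += len(pred_spans)
        for w, s, e in word_spans(g_words):
            n_gold += 1
            hit = (s, e) in pred_spans  # correct only if the exact span matches
            correct += hit
            if w not in train_vocab:  # assumed OOV definition
                oov_total += 1
                oov_hit += hit
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    r_oov = oov_hit / oov_total if oov_total else 0.0
    return precision, recall, f, r_oov
```

For example, tagging 我爱北京天安门 as S S B E B M E recovers the words 我 / 爱 / 北京 / 天安门. The start and end constraints inside `viterbi` ensure that a single-character input can only be decoded as S.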

Info:

Pages:

4376-4379

Online since:

May 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
