Identification of Chinese Unknown Word Based on Finite Multi-List Method

Abstract:

An unknown word is a word that is not included in the sub_word vocabulary but must still be cut out by the word segmentation program. Personal names, place names and translated names are the major types of unknown words. Unknown Chinese words are a difficult problem in natural language processing and contribute to the low rate of correct segmentation. This paper introduces the finite multi-list method, which uses the capability of word fragments to form a word, together with their location in the word tree, to process unknown Chinese words. In the experiment, the recall is 70.67% and the correct rate is 43.65%. The results show that unknown Chinese word identification based on the finite multi-list method is feasible.
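The two figures reported above can be illustrated with a minimal sketch of how such scores are commonly computed. It assumes that "correct rate" corresponds to precision (correctly identified unknown words divided by all words the system proposed) and that recall is correctly identified unknown words divided by all unknown words in the reference. The abstract does not define either term, so this reading is an assumption. The function name `precision_recall` and the toy word lists are hypothetical and are not taken from the paper's experiment.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of unknown words.

    Assumption: "correct rate" in the abstract is read as precision.
    """
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)  # words proposed AND truly unknown
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: three true unknown words (two personal names,
# one place name) and four candidates proposed by an identifier.
gold = {"张伟", "洛杉矶", "奥巴马"}
predicted = {"张伟", "奥巴马", "的确", "伟在"}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2%} recall={r:.2%}")
```

In this toy case the identifier finds two of the three true unknown words but also proposes two spurious candidates, so recall is higher than precision. That is the same direction as the gap between the reported 70.67% and 43.65%.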

Info:

Periodical:

Key Engineering Materials (Volumes 474-476)

Pages:

460-465

Online since:

April 2011

Copyright:

© 2011 Trans Tech Publications Ltd. All Rights Reserved
