Identification of Chinese Unknown Word Based on Finite Multi-List Method
An unknown word is a word that is not included in the segmentation vocabulary but must still be cut out by the word segmentation program. Personal names, place names, and translated names are the major types of unknown words. Unknown Chinese words are a difficult problem in natural language processing and contribute to the low rate of correct segmentation. This paper introduces the finite multi-list method, which uses a word fragment's capability to form a word and its location in the word tree to process unknown Chinese words. In the experiment, the recall is 70.67% and the correct rate is 43.65%. The results show that unknown Chinese word identification based on the finite multi-list method is feasible.
B. Sun et al., "Identification of Chinese Unknown Word Based on Finite Multi-List Method", Key Engineering Materials, Vols. 474-476, pp. 460-465, 2011