Identification of Chinese Unknown Word Based on Finite Multi-List Method

Abstract:


An unknown word is a word that is not included in the sub_word vocabulary but must nevertheless be segmented out by the word segmentation program. Personal names, place names and transliterated names make up the majority of unknown words. Unknown Chinese words are a difficult problem in natural language processing and a major cause of low segmentation accuracy. This paper introduces the finite multi-list method, which processes unknown Chinese words using the word-forming capability of word fragments and their location in the word tree. In the experiment, recall is 70.67% and the correct rate (precision) is 43.65%. The results show that unknown Chinese word identification based on the finite multi-list method is feasible.
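The abstract reports recall and correct rate but no combined score. The sketch below assumes the usual set-based definitions (correct rate read as precision, that is, correctly identified unknown words divided by all identified words) and computes the F1 harmonic mean from the two reported figures. The function names and the small example word sets are hypothetical illustrations, not taken from the paper.

```python
def precision_recall_f1(gold: set, predicted: set) -> tuple[float, float, float]:
    """Set-based precision, recall and F1 for identified unknown words (assumed definitions)."""
    correct = len(gold & predicted)  # unknown words identified correctly
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy data: gold unknown words vs. words proposed by a segmenter.
    gold = {"张三", "北京路", "约翰逊"}
    predicted = {"张三", "约翰逊", "路口", "三北"}
    print(precision_recall_f1(gold, predicted))

    # Figures reported in the abstract: precision 43.65%, recall 70.67%.
    print(round(f1_from(0.4365, 0.7067), 4))  # 0.5397
```

Under these assumptions the reported figures correspond to an F1 of roughly 54%. The gap between recall and precision suggests that the method proposes many candidate words, of which fewer than half are correct.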

Info:

Periodical:

Key Engineering Materials (Volumes 474-476)

Edited by:

Garry Zhu

Pages:

460-465

DOI:

10.4028/www.scientific.net/KEM.474-476.460

Citation:

B. Sun et al., "Identification of Chinese Unknown Word Based on Finite Multi-List Method", Key Engineering Materials, Vols. 474-476, pp. 460-465, 2011

Online since:

April 2011
