Chinese New Word Identification Using N-Gram and PPM Models
New word identification is one of the difficult problems in Chinese information processing. This paper presents a new method for identifying new words. First, the text is segmented using an N-Gram model; then PPM (Prediction by Partial Matching) is used to identify the new words in the text; finally, the newly identified words are added to the dictionary, which is updated using an LRU (Least Recently Used) policy. In experiments comparing the method with three well-known word segmentation systems, it improves the precision and recall of new word identification to a certain extent.
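The abstract names the three stages but none of their parameters. This Python sketch chains them under assumptions of our own, not the authors' design. It uses a unigram (N = 1) segmenter and an order-2 PPM character model with escape method C and no exclusions. Candidate new words are runs of two or more single-character tokens, accepted when their average code length is at most a hypothetical threshold `max_bits`. The dictionary has a fixed capacity and evicts entries in LRU order. All names (`LRUDictionary`, `segment`, `PPMModel`, `find_new_words`, `max_len`, `max_bits`) are hypothetical. This is an illustration of the techniques, not the paper's implementation.

```python
import math
from collections import OrderedDict, defaultdict


class LRUDictionary:
    """Hypothetical fixed-capacity lexicon (word -> frequency) that evicts the least recently used word."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.words = OrderedDict()  # ordered from least to most recently used

    def touch(self, word, freq=1):
        # Add or refresh a word, bump its frequency, and mark it most recently used.
        self.words[word] = self.words.get(word, 0) + freq
        self.words.move_to_end(word)
        if len(self.words) > self.capacity:
            self.words.popitem(last=False)  # evict the least recently used entry

    def __contains__(self, word):
        return word in self.words


def segment(text, lexicon, max_len=4):
    """Unigram (N = 1) maximum-probability segmentation by dynamic programming."""
    total = sum(lexicon.words.values()) or 1
    best = [0.0] + [-math.inf] * len(text)  # best[i]: best log-probability of text[:i]
    back = [0] * (len(text) + 1)            # back[i]: start index of the last word in text[:i]
    for i in range(1, len(text) + 1):
        for j in range(max(0, i - max_len), i):
            w = text[j:i]
            if w in lexicon:
                lp = math.log(lexicon.words[w] / total)
            elif i - j == 1:
                lp = math.log(0.5 / total)  # floor probability for characters not in the lexicon
            else:
                continue
            if best[j] + lp > best[i]:
                best[i], back[i] = best[j] + lp, j
    words, i = [], len(text)
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


class PPMModel:
    """Order-k character model using PPM escape method C, without exclusions."""

    def __init__(self, order=2):
        self.order = order
        self.counts = defaultdict(lambda: defaultdict(int))  # context -> next char -> count
        self.alphabet = set()

    def train(self, text):
        for i, ch in enumerate(text):
            self.alphabet.add(ch)
            for k in range(self.order + 1):
                if i - k >= 0:
                    self.counts[text[i - k:i]][ch] += 1

    def prob(self, context, ch):
        # Try the longest context first; escape to shorter contexts when ch is unseen.
        p_escape = 1.0
        for k in range(min(self.order, len(context)), -1, -1):
            table = self.counts.get(context[len(context) - k:])
            if not table:
                continue  # context never seen: drop to a shorter one at no cost
            n, d = sum(table.values()), len(table)
            if ch in table:
                return p_escape * table[ch] / (n + d)
            p_escape *= d / (n + d)
        return p_escape / (len(self.alphabet) + 1)  # order -1: uniform, one extra slot for unseen chars

    def bits_per_char(self, s):
        # Average code length of s in bits, predicting each character from the preceding ones in s.
        return -sum(math.log2(self.prob(s[:i], s[i])) for i in range(len(s))) / len(s)


def find_new_words(tokens, model, max_bits):
    """Merge runs of single-character tokens; keep runs the PPM model finds predictable."""
    found, run = [], []
    for tok in tokens + [""]:  # the empty sentinel flushes the final run
        if len(tok) == 1:
            run.append(tok)
            continue
        if len(run) >= 2:
            candidate = "".join(run)
            if model.bits_per_char(candidate) <= max_bits:
                found.append(candidate)
        run = []
    return found
```

In use, one would train `PPMModel` on a raw corpus, segment each sentence, pass the tokens to `find_new_words`, and call `touch` on every accepted word so that later segmentations keep it as a single token. The LRU policy bounds the dictionary by dropping words that have not been seen recently. The threshold `max_bits` and the PPM order would have to be tuned on held-out data.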
Yongping Zhang, Linhua Zhou and Elwin Mao
D. Li et al., "Chinese New Word Identification Using N-Gram and PPM Models", Applied Mechanics and Materials, Vol. 109, pp. 612-616, 2012