Chinese New Word Identification Using N-Gram and PPM Models

Abstract:

New word identification is one of the difficult problems in Chinese information processing. This paper presents a new method for identifying new words. First, the text is segmented using an N-Gram model; then PPM (Prediction by Partial Matching) is used to identify the new words in the text; finally, the newly identified words are added to the dictionary, which is updated using an LRU (Least Recently Used) strategy. In experiments comparing it with three well-known word segmentation systems, this method improves the precision and recall of new word identification to a certain extent.
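
The three stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, because the abstract does not specify the N-gram order, the PPM variant, how candidates are extracted, or any thresholds. The sketch assumes a unigram (N = 1) word model decoded with the Viterbi algorithm, an order-2 PPM character model using escape method C without exclusions, a hypothetical candidate rule (runs of two or more single-character tokens whose mean internal PPM probability reaches `threshold`), and a fixed-capacity LRU dictionary. The names `segment`, `PPM`, `identify_new_words` and `LRUDictionary`, the 0.1 pseudo-count and the 0.3 threshold are all illustrative.

```python
import math
from collections import OrderedDict, defaultdict


def segment(text, word_counts):
    """Viterbi decoding under a unigram (N = 1) word model; an illustrative choice."""
    total = sum(word_counts.values())
    max_len = max(len(w) for w in word_counts)
    # best[i] = (log-prob of the best split of text[:i], start index of its last word)
    best = [(0.0, 0)] + [(-math.inf, 0)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            # Hypothetical pseudo-count 0.1 lets unknown single characters through.
            count = word_counts.get(word, 0.1 if len(word) == 1 else 0)
            if count == 0:
                continue
            score = best[start][0] + math.log(count / total)
            if score > best[end][0]:
                best[end] = (score, start)
    words, end = [], len(text)
    while end > 0:  # follow back-pointers from the end of the text
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


class PPM:
    """Order-k PPM character model with escape method C (PPMC), no exclusions."""

    def __init__(self, order=2):
        self.order = order
        self.counts = defaultdict(lambda: defaultdict(int))  # context -> char -> count
        self.alphabet = set()

    def train(self, text):
        self.alphabet.update(text)
        for i, ch in enumerate(text):
            for k in range(min(self.order, i) + 1):  # contexts of length 0..order
                self.counts[text[i - k:i]][ch] += 1

    def prob(self, ch, context):
        escape_mass = 1.0
        for k in range(min(self.order, len(context)), -1, -1):  # longest context first
            table = self.counts.get(context[len(context) - k:])
            if not table:
                continue  # context never seen: fall back without an escape cost
            total, distinct = sum(table.values()), len(table)
            if ch in table:
                return escape_mass * table[ch] / (total + distinct)
            escape_mass *= distinct / (total + distinct)  # PPMC escape probability
        return escape_mass / (len(self.alphabet) + 1)  # order -1: uniform fallback


def identify_new_words(text, word_counts, ppm, threshold=0.3):
    """Hypothetical rule: runs of >= 2 single-character tokens become candidates,
    kept if their mean internal PPM probability reaches `threshold`."""
    candidates, run = [], []
    for tok in segment(text, word_counts) + [""]:  # "" sentinel flushes the last run
        if len(tok) == 1:
            run.append(tok)
            continue
        if len(run) >= 2:
            candidates.append("".join(run))
        run = []
    new_words = []
    for cand in candidates:
        probs = [ppm.prob(cand[j], cand[:j]) for j in range(1, len(cand))]
        if sum(probs) / len(probs) >= threshold:
            new_words.append(cand)
    return new_words


class LRUDictionary:
    """Fixed-capacity new-word dictionary that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.words = OrderedDict()  # word -> use count, oldest first

    def use(self, word):
        self.words[word] = self.words.get(word, 0) + 1
        self.words.move_to_end(word)  # mark as most recently used
        if len(self.words) > self.capacity:
            self.words.popitem(last=False)  # evict the least recently used word


if __name__ == "__main__":
    model = PPM(order=2)
    model.train("我们喜欢微博" * 3)
    cache = LRUDictionary(capacity=1000)
    for w in identify_new_words("猫狗我们喜欢微博", {"我们": 10, "喜欢": 8}, model):
        cache.use(w)
    print(list(cache.words))  # prints ['微博']
```

The idea assumed here is that a new word such as 微博 tends to be broken into single characters by a dictionary-based segmenter, while a character model trained on text containing it finds those characters highly predictable from one another; the unrelated pair 猫狗 scores far lower and is rejected. In a complete system, the words kept by `LRUDictionary` would presumably be merged back into `word_counts` before the next segmentation pass; the sketch leaves that step out.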

Info:

Pages: 612-616

Online since: October 2011

Copyright: © 2012 Trans Tech Publications Ltd. All Rights Reserved
