Tibetan-Chinese Named Entity Extraction Based on Comparable Corpus

Abstract:

Tibetan-Chinese named entity extraction is a foundation of cross-language information processing and provides a basis for research on machine translation and cross-language information retrieval. In this paper, we use the multi-language links of Wikipedia to obtain a Tibetan-Chinese comparable corpus, and combine sentence length, word matching and entity boundary words to identify parallel sentences. We then extract Tibetan-Chinese named entities from the comparable corpus in three ways: (1) extracting natural labeling information; (2) acquiring the links between Tibetan entries and Chinese entries; (3) using a sequence intersection method, which comprises sentence representation, Chinese named entity recognition and intersection of the corresponding Tibetan sentences. The results show that the extraction method based on a comparable corpus is effective.
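
The abstract does not give the formula used to combine sentence length and word matching, so the following is only a minimal sketch of how two such signals might be combined to pick candidate parallel sentences. The function names, the weights `w_len` and `w_match`, the `threshold`, and the toy `lexicon` structure are all assumptions made for illustration, not the paper's method; the entity boundary word feature is omitted because the abstract does not describe it.

```python
def length_score(src_tokens, tgt_tokens):
    """Ratio of the shorter sentence length to the longer one, in [0, 1]."""
    if not src_tokens or not tgt_tokens:
        return 0.0
    shorter, longer = sorted((len(src_tokens), len(tgt_tokens)))
    return shorter / longer


def match_score(src_tokens, tgt_tokens, lexicon):
    """Fraction of source tokens with a lexicon translation found in the target.

    `lexicon` is a hypothetical bilingual dictionary: source token -> set of
    target tokens.
    """
    if not src_tokens:
        return 0.0
    tgt_set = set(tgt_tokens)
    hits = sum(1 for tok in src_tokens if lexicon.get(tok, set()) & tgt_set)
    return hits / len(src_tokens)


def pair_score(src_tokens, tgt_tokens, lexicon, w_len=0.4, w_match=0.6):
    """Weighted sum of the two signals; the weights are illustrative assumptions."""
    return (w_len * length_score(src_tokens, tgt_tokens)
            + w_match * match_score(src_tokens, tgt_tokens, lexicon))


def best_pairs(src_sents, tgt_sents, lexicon, threshold=0.5):
    """For each source sentence, keep its best-scoring target if above threshold."""
    pairs = []
    for i, src in enumerate(src_sents):
        scored = [(pair_score(src, tgt, lexicon), j)
                  for j, tgt in enumerate(tgt_sents)]
        if not scored:
            continue
        score, j = max(scored)
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    # Placeholder tokens stand in for segmented Tibetan ("t_*") and Chinese
    # ("c_*") words; they are not real vocabulary.
    lexicon = {"t_city": {"c_city"}, "t_river": {"c_river"}}
    tibetan = [["t_city", "t_is", "t_big"], ["t_river", "t_long"]]
    chinese = [["c_river", "c_long", "c_very"], ["c_city", "c_big", "c_is"]]
    for i, j, score in best_pairs(tibetan, chinese, lexicon):
        print(i, j, round(score, 3))
```

A greedy best-match per source sentence keeps the sketch short; it does not enforce one-to-one alignment, which a fuller implementation would likely need.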

Pages: 1202-1205

Online since: June 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
