Tibetan-Chinese Named Entity Extraction Based on Comparable Corpus

Yuan Sun; Qian Zhao

doi:10.4028/www.scientific.net/AMM.571-572.1202


Abstract:

Tibetan-Chinese named entity extraction is a foundation of cross-language information processing and supports research in machine translation and cross-language information retrieval. In this paper, we use the multi-language links of Wikipedia to build a Tibetan-Chinese comparable corpus, and we combine sentence length, word matching, and entity boundary words to obtain parallel sentences. We then extract Tibetan-Chinese named entities from the comparable corpus in three ways: (1) extracting natural labeling information; (2) acquiring the links between Tibetan entries and Chinese entries; and (3) applying a sequence intersection method, which comprises sentence representation, Chinese named entity recognition, and intersection of the corresponding Tibetan sentences. The results show that the extraction method based on the comparable corpus is effective.
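The abstract names the sequence intersection step but does not spell it out. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that several Tibetan sentences have already been paired with Chinese sentences containing the same recognized Chinese entity, and that each Tibetan sentence is already split into tokens. Under that assumption, the longest token run shared by all of the Tibetan sentences is taken as a candidate translation of the entity. The function names `contiguous_runs` and `sequence_intersection` and the sample tokens are hypothetical placeholders, not corpus data.

```python
from typing import List, Sequence, Set, Tuple

Run = Tuple[str, ...]


def contiguous_runs(tokens: Sequence[str]) -> Set[Run]:
    """Return every contiguous token run of one sentence as a set of tuples."""
    n = len(tokens)
    return {tuple(tokens[i:j]) for i in range(n) for j in range(i + 1, n + 1)}


def sequence_intersection(sentences: Sequence[Sequence[str]]) -> List[Run]:
    """Return the longest token runs that occur in every sentence.

    Assumption: all sentences are believed to mention the same entity,
    so a run shared by all of them is a candidate for that entity.
    """
    if not sentences:
        return []
    shared = contiguous_runs(sentences[0])
    for tokens in sentences[1:]:
        shared &= contiguous_runs(tokens)
    if not shared:
        return []
    longest = max(len(run) for run in shared)
    # Sort so that ties are reported in a stable order.
    return sorted(run for run in shared if len(run) == longest)


# Placeholder tokens standing in for segmented Tibetan sentences.
tibetan_sentences = [
    ["t1", "lha", "sa", "t2"],
    ["lha", "sa", "t3", "t4"],
    ["t5", "t6", "lha", "sa"],
]
print(sequence_intersection(tibetan_sentences))  # [('lha', 'sa')]
```

Enumerating all runs is quadratic in sentence length, which is acceptable for single sentences. A real system would also need to filter out shared function words, which this sketch does not attempt.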


Info:

Periodical: Applied Mechanics and Materials (Volumes 571-572)

Pages: 1202-1205

DOI: https://doi.org/10.4028/www.scientific.net/AMM.571-572.1202

Online since: June 2014

Authors: Yuan Sun, Qian Zhao

Keywords: Comparable Corpus, Sequence Intersection, Tibetan-Chinese Named Entity, Wikipedia

