Chinese-Uyghur Sentences Alignment Using Multiple Clues

Abstract:

This paper introduces a new method for Chinese-Uyghur sentence alignment that applies a two-step procedure. In the first step, multiple clues, such as proper names, technical terms, numbers, punctuation marks, location information, and length information, are used to generate anchor sentences that satisfy certain conditions. In the second step, the texts are divided into segments, using the anchor sentences as boundaries, and the sentences within each segment are aligned with a length-based approach. By segmenting the texts in this way, the method avoids complex computation and error propagation. Experimental results show that the method achieves an average accuracy of 95.2% on multi-domain texts.
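
The second step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the length model or the anchor conditions, so the anchors are assumed to be given as index pairs, and the names `BEAD_PENALTY`, `length_cost`, `align_segment`, and `align_with_anchors`, along with all penalty values, are hypothetical. The length penalty is a simplified stand-in for a proper length-based model.

```python
from typing import List, Sequence, Tuple

Bead = Tuple[List[int], List[int]]  # (source sentence indices, target sentence indices)

# Allowed bead shapes (source count, target count) with assumed, illustrative penalties.
BEAD_PENALTY = {(1, 1): 0.0, (1, 0): 3.0, (0, 1): 3.0, (2, 1): 1.0, (1, 2): 1.0}


def length_cost(src_len: int, tgt_len: int, ratio: float) -> float:
    """Simplified penalty that grows as tgt_len departs from ratio * src_len."""
    expected = src_len * ratio
    return abs(tgt_len - expected) / (expected + tgt_len + 1.0) ** 0.5


def align_segment(src: Sequence[int], tgt: Sequence[int], ratio: float = 1.0) -> List[Bead]:
    """Align one segment by dynamic programming over sentence lengths."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEAD_PENALTY.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                step = penalty + length_cost(sum(src[i:ni]), sum(tgt[j:nj]), ratio)
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end of the segment to its start.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]


def align_with_anchors(
    src: Sequence[int],
    tgt: Sequence[int],
    anchors: Sequence[Tuple[int, int]],
    ratio: float = 1.0,
) -> List[Bead]:
    """Split both texts at the anchor pairs and align each segment independently.

    `anchors` is assumed to be sorted and strictly increasing in both indices.
    """
    result: List[Bead] = []
    start_s, start_t = 0, 0

    def add_segment(end_s: int, end_t: int) -> None:
        for s_ids, t_ids in align_segment(src[start_s:end_s], tgt[start_t:end_t], ratio):
            result.append(([start_s + k for k in s_ids], [start_t + k for k in t_ids]))

    for a_s, a_t in anchors:
        add_segment(a_s, a_t)
        result.append(([a_s], [a_t]))  # the anchor pair itself is a 1-1 bead
        start_s, start_t = a_s + 1, a_t + 1
    add_segment(len(src), len(tgt))  # tail after the last anchor
    return result


# Sentence lengths (e.g. in characters) for a toy source and target text.
source_lengths = [10, 20, 30, 15, 25]
target_lengths = [11, 19, 31, 8, 8, 24]
alignment = align_with_anchors(source_lengths, target_lengths, anchors=[(2, 2)])
```

Because each dynamic-programming table covers only one segment, a misalignment cannot cross an anchor, and the total table size is the sum of the per-segment products rather than the product of the full text lengths.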

Info:

Periodical:

Advanced Materials Research (Volumes 989-994)

Pages:

4990-4995

Online since:

July 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
