Research on Sentence Segmentation with Conjunctions in Patent Machine Translation

Abstract:

Processing long sentences is a difficult problem in machine translation, and previous work has used punctuation to address it. In this paper, we present a rule-based method that segments sentences at conjunctions to improve the machine translation of long sentences in patent text. We divide conjunctions into different LEVELs according to the semantic features of the verbs that precede and follow them. We then formulate a set of rules based on these LEVELs to segment long Chinese sentences into shorter ones. In experiments on 10 complete patent documents containing 901 conjunctions, our method achieves an overall accuracy of over 89%. The results indicate that our method can effectively improve the translation of long patent sentences.
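The abstract does not give the LEVEL definitions or the segmentation rules, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes the input is already tokenized and tagged, assigns each conjunction a toy level by counting whether a verb appears before it in the current clause and directly after it, and splits only at conjunctions that reach a chosen level. The names `conjunction_level`, `segment`, and `min_level`, the tag set, and the three-level scale are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical input format: (word, tag) pairs from an upstream Chinese
# tokenizer/tagger. Tags: "V" = verb, "C" = conjunction, "N" = anything else.
Token = Tuple[str, str]


def conjunction_level(tokens: List[Token], i: int, clause_start: int) -> int:
    """Toy LEVEL for the conjunction at index i (not the paper's definition).

    0: no verb on either side, 1: verb on one side only, 2: verb on both sides.
    "Before" means anywhere in the current clause; "after" means the next token.
    """
    verb_before = any(tag == "V" for _, tag in tokens[clause_start:i])
    verb_after = i + 1 < len(tokens) and tokens[i + 1][1] == "V"
    return int(verb_before) + int(verb_after)


def segment(tokens: List[Token], min_level: int = 2) -> List[List[str]]:
    """Split at every conjunction whose toy LEVEL is at least min_level."""
    segments: List[List[str]] = []
    clause_start = 0
    for i, (_, tag) in enumerate(tokens):
        if tag == "C" and conjunction_level(tokens, i, clause_start) >= min_level:
            segments.append([word for word, _ in tokens[clause_start:i]])
            clause_start = i + 1  # the splitting conjunction itself is dropped
    segments.append([word for word, _ in tokens[clause_start:]])
    return [s for s in segments if s]


# Roughly: "The device heats water and oil, and conveys steam."
sentence: List[Token] = [
    ("该装置", "N"), ("加热", "V"), ("水", "N"), ("和", "C"),
    ("油", "N"), ("并且", "C"), ("输送", "V"), ("蒸汽", "N"),
]
for part in segment(sentence):
    print(" ".join(part))
```

In this toy example, 和 joins two nouns and is left in place, while 并且 introduces a second predicate and becomes a split point. A real system would need semantic verb features and a much richer rule set than this surface check.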

Pages: 4605-4609

Online since: February 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
