Research on Bilingual Corpus Based Machine Translation

Shuang Wang

doi:10.4028/www.scientific.net/AMM.687-691.1683

Abstract:

This work proposes several methods for automatically acquiring bilingual corpora from different websites, including the "iciba" website, CNKI, and a patent network. It describes the methods and procedures used to acquire each kind of corpus. Because the sites have different characteristics, we use a different acquisition method for each, and achieve fast and accurate automatic collection of a large-scale bilingual corpus. For the "iciba" website, the main tool is the Nutch crawler, which performs relatively well: retrieval is accurate and the retrieved content is highly relevant. In addition, rather than trying to harvest bilingual text from the entire Internet, we take a new approach: we access the basic information of scholarly theses in CNKI to obtain a large-scale, high-quality English-Chinese bilingual corpus. The result is a gigabyte-scale aligned bilingual corpus, which manual evaluation found to be very accurate. The corpus lays the groundwork for further research on cross-language information retrieval.
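
As a rough illustration of the CNKI idea described above, the following Python sketch pairs the Chinese and English fields of a paper's basic information into aligned segments. It is a minimal sketch under stated assumptions, not the author's implementation: the `PaperRecord` layout, the function names, and the length-ratio thresholds are all hypothetical, and the abstract does not say how records are fetched or filtered.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class PaperRecord:
    """Hypothetical layout of one paper's basic information."""
    title_zh: str
    title_en: str
    abstract_zh: str
    abstract_en: str


def plausible_pair(zh: str, en: str,
                   min_ratio: float = 1.0, max_ratio: float = 8.0) -> bool:
    """Accept a pair only if both sides are non-empty and the ratio of
    English characters to Chinese characters is within assumed bounds.
    The bounds are illustrative guesses, not values from the paper."""
    zh, en = zh.strip(), en.strip()
    if not zh or not en:
        return False
    ratio = len(en) / len(zh)
    return min_ratio <= ratio <= max_ratio


def aligned_pairs(records: Iterable[PaperRecord]) -> List[Tuple[str, str]]:
    """Turn each record into up to two (Chinese, English) segment pairs:
    one for the title and one for the abstract."""
    pairs: List[Tuple[str, str]] = []
    for rec in records:
        candidates = (
            (rec.title_zh, rec.title_en),
            (rec.abstract_zh, rec.abstract_en),
        )
        for zh, en in candidates:
            if plausible_pair(zh, en):
                pairs.append((zh.strip(), en.strip()))
    return pairs


# Made-up sample data: the second record has no English title and an
# implausibly long English "abstract", so both of its pairs are dropped.
sample = [
    PaperRecord("机器翻译", "machine translation",
                "本文研究双语语料库。", "This paper studies bilingual corpora."),
    PaperRecord("语料库", "", "摘要", "x" * 100),
]

for zh, en in aligned_pairs(sample):
    print(zh, "|", en)
```

A length-ratio filter is only one simple way to reject mismatched fields; the manual evaluation mentioned in the abstract would still be needed to judge real alignment quality.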

Info:

Periodical:

Applied Mechanics and Materials (Volumes 687-691)

Pages:

1683-1686

DOI:

https://doi.org/10.4028/www.scientific.net/AMM.687-691.1683

Online since:

November 2014

Authors:

Shuang Wang

Keywords:

Corpus, Statistical Machine Translation, Tree-Tree Translation Model, Word Alignment
