iTagger: Part-of-Speech Tagging Based on SBCB Learning Algorithm

Xiao Dong Zeng; Lidia S. Chao; Derek F. Wong; Liang Ye He

doi:10.4028/www.scientific.net/AMM.284-287.3449


Abstract:

Part-of-speech (POS) tagging, or POS disambiguation, is a practical problem in the natural language processing (NLP) community, especially in the development of machine translation systems. Errors made by a POS tagging system can interfere with the subsequent analytical tasks in the translation process and thereby degrade the overall translation quality. This paper presents a novel POS tagging system, iTagger, built on the Selecting Base Classifiers on Bagging (SBCB) learning algorithm. In this work, POS tagging is treated as a classification problem. Features such as the surrounding context of ambiguous candidates, n-gram information, lexical items, and linguistic clues are extracted automatically from the annotated corpus. The proposed system is compared against two state-of-the-art tagging methods, the Hidden Markov Model (HMM) and Maximum Entropy. Empirical results on the English Brown corpus, the Portuguese Tycho Brahe corpus, and the Chinese Treebank corpus show that iTagger is competitive. Moreover, iTagger has been released to the public as a library and a tool for various development and application purposes.
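
To make the classification framing concrete, the following is a minimal sketch of POS tagging with context-window features and a bagged ensemble from which a subset of base classifiers is selected. It is not the authors' SBCB algorithm or the iTagger API: the feature set, the toy `FeatureVoteClassifier`, the rule of keeping the base classifiers with the best held-out accuracy, and every function name are assumptions chosen only for illustration.

```python
import random
from collections import Counter, defaultdict


def extract_features(words, i):
    """Context-window features for token i (hypothetical feature set)."""
    word = words[i]
    prev_word = words[i - 1] if i > 0 else "<s>"
    next_word = words[i + 1] if i + 1 < len(words) else "</s>"
    return [
        "word=" + word.lower(),
        "prev=" + prev_word.lower(),
        "next=" + next_word.lower(),
        "bigram=" + prev_word.lower() + "_" + word.lower(),
        "suffix2=" + word[-2:].lower(),
        "is_capitalized=" + str(word[0].isupper()),
    ]


class FeatureVoteClassifier:
    """Toy base classifier: each feature votes for the tags it was seen with."""

    def __init__(self):
        self.counts = defaultdict(Counter)
        self.default = None

    def fit(self, examples):
        tag_counts = Counter()
        for feats, tag in examples:
            tag_counts[tag] += 1
            for f in feats:
                self.counts[f][tag] += 1
        self.default = tag_counts.most_common(1)[0][0]
        return self

    def predict(self, feats):
        votes = Counter()
        for f in feats:
            votes.update(self.counts.get(f, {}))
        return votes.most_common(1)[0][0] if votes else self.default


def to_examples(tagged_sentences):
    """Turn tagged sentences into (features, tag) classification examples."""
    examples = []
    for sent in tagged_sentences:
        words = [w for w, _ in sent]
        for i, (_, tag) in enumerate(sent):
            examples.append((extract_features(words, i), tag))
    return examples


def train_bagging(examples, n_estimators, rng):
    """Train one base classifier per bootstrap sample of the training data."""
    models = []
    for _ in range(n_estimators):
        sample = [rng.choice(examples) for _ in examples]
        models.append(FeatureVoteClassifier().fit(sample))
    return models


def accuracy(model, examples):
    return sum(model.predict(f) == t for f, t in examples) / len(examples)


def select_base_classifiers(models, held_out, keep):
    """Stand-in selection rule (an assumption): keep the best held-out scorers."""
    ranked = sorted(models, key=lambda m: accuracy(m, held_out), reverse=True)
    return ranked[:keep]


def ensemble_predict(models, feats):
    """Majority vote over the selected base classifiers."""
    return Counter(m.predict(feats) for m in models).most_common(1)[0][0]


# Toy annotated corpus (invented for this sketch, not from the paper).
tagged = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
train_examples = to_examples(tagged[:3])
held_out = to_examples(tagged[3:])

models = train_bagging(train_examples, n_estimators=7, rng=random.Random(0))
selected = select_base_classifiers(models, held_out, keep=3)
print(ensemble_predict(selected, extract_features(["the", "cat", "sleeps"], 1)))
```

Selecting on held-out accuracy is only one plausible way to prune a bagged ensemble; the paper's SBCB procedure may use a different criterion, and a real tagger would use a much richer feature set and a stronger base learner.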

Info:

Periodical: Applied Mechanics and Materials (Volumes 284-287)

Pages: 3449-3453

Online since: January 2013

Keywords: Machine Learning (ML), Part-of-Speech Tagging, POS Tagging, SBCB
