Finding Out Biological Terms from Texts with CRFs for Reinforcement Learning

Abstract:

The rapid growth of the biological literature has driven research on text mining, which aims to extract biological knowledge from unstructured documents. Most biological text mining efforts depend on first identifying biological terms such as gene and protein names, so identifying these terms effectively has become an important problem in bioinformatics. Conditional random fields (CRFs), an important machine learning method, are graphical models of the probability of labels given observations. They have traditionally been trained on a set of observation and label pairs. Here we use CRFs within reinforcement learning, a class of temporal learning algorithms. The labels therefore become actions that update the environment and affect the next observation. From the reinforcement learning point of view, CRFs thus provide a way to model joint actions in a decentralized Markov decision process, defining how agents can communicate with one another to choose the optimal joint action. We trained and tested the proposed approach on the GENIA corpus. The results show that the system identifies biological terms in text effectively: on six classes of biological terms, it achieves an average precision of 90.8%, an average recall of 90.6%, and an average F1 score of 90.6%. These results compare favorably with those of many other biological named entity recognition systems.
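The abstract describes CRFs as graphical models of the probability of labels given observations. As a rough illustration of that standard formulation only (a plain, supervised linear-chain CRF, not the reinforcement-learning variant proposed in this paper), the sketch below scores a label sequence, normalizes the score with the forward algorithm to obtain p(y | x), and decodes the most probable sequence with Viterbi. The BIO label set, the example tokens, and all weights are hypothetical and hand-set rather than trained on GENIA; in a real tagger the emission scores would come from weighted feature functions over the text.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(emissions, transitions, labels):
    """Unnormalized log-score of one label sequence y given observations x:
    the sum of per-token emission weights and label-to-label transition weights."""
    score = emissions[0][labels[0]]
    for t in range(1, len(labels)):
        score += transitions[labels[t - 1]][labels[t]] + emissions[t][labels[t]]
    return score


def log_partition(emissions, transitions):
    """log Z(x) over every possible label sequence, via the forward algorithm."""
    n_labels = len(emissions[0])
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(n_labels)]) + emissions[t][j]
            for j in range(n_labels)
        ]
    return logsumexp(alpha)


def log_prob(emissions, transitions, labels):
    """log p(y | x) = score(x, y) - log Z(x)."""
    return sequence_score(emissions, transitions, labels) - log_partition(emissions, transitions)


def viterbi(emissions, transitions):
    """Most probable label sequence (as label indices) under the CRF."""
    n_labels = len(emissions[0])
    delta = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_delta, pointers = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels), key=lambda i: delta[i] + transitions[i][j])
            new_delta.append(delta[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        delta = new_delta
        backpointers.append(pointers)
    path = [max(range(n_labels), key=lambda j: delta[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical BIO labels and hand-set weights (illustrative, not learned from GENIA).
LABELS = ["O", "B-protein", "I-protein"]
TOKENS = ["binds", "IL-2", "receptor"]
EMISSIONS = [            # EMISSIONS[t][label]: weight of each label at token t
    [2.0, 0.5, -1.0],    # "binds"
    [0.0, 2.5, 0.5],     # "IL-2"
    [0.5, 0.0, 1.5],     # "receptor"
]
TRANSITIONS = [          # TRANSITIONS[prev][next]
    [0.5, 0.0, -10.0],   # from O (O -> I-protein strongly penalized)
    [0.0, -1.0, 1.0],    # from B-protein
    [0.0, -0.5, 0.5],    # from I-protein
]

if __name__ == "__main__":
    best = viterbi(EMISSIONS, TRANSITIONS)
    print([LABELS[i] for i in best])  # ['O', 'B-protein', 'I-protein']
    print(math.exp(log_prob(EMISSIONS, TRANSITIONS, best)))
```

Running it prints the decoded tags followed by the model probability of that sequence. The large negative O to I-protein weight shows how transition features can discourage ill-formed BIO sequences; these numbers are illustrative and are not values from the paper.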

Info:

Pages: 1345-1350

Online since: September 2012

Copyright: © 2012 Trans Tech Publications Ltd. All Rights Reserved
