Physico-Chemically Weighted Kernel for SVM Protein Classification

Abstract:

This paper proposes a novel kernel for protein classification that takes into account both the physico-chemical properties of amino acids and motif information. A similarity matrix is constructed from an AAindex2 substitution matrix, which measures the distance between pairs of amino acids. This similarity matrix is combined with motif content, which assigns importance to parts of the protein sequences, to form the new kernel. Numerical examples indicate that the resulting string-based kernel, used with an SVM classifier, performs significantly better than the traditional spectrum kernel method.
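To make the comparison concrete, the following minimal Python sketch shows the baseline spectrum kernel (an inner product of k-mer counts) next to one possible similarity-weighted variant. It is an illustration under stated assumptions, not the paper's construction: the names `PROP`, `toy_similarity` and `weighted_kernel` are hypothetical, the property values are made up rather than taken from AAindex2, and the motif weighting is omitted because the preview does not specify it.

```python
from collections import Counter
import math

# Hypothetical scalar property per residue (NOT real AAindex2 values).
PROP = {"A": 0.0, "V": 0.5, "D": 2.0}


def kmer_counts(seq, k):
    """Count every contiguous k-mer in seq (the spectrum feature map)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3):
    """Baseline spectrum kernel: inner product of k-mer count vectors."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(n * cy[u] for u, n in cx.items())


def toy_similarity(prop, scale=1.0):
    """Map residue distances |p_a - p_b| to similarities exp(-d / scale).

    Assumption: a Laplacian-type similarity on a scalar property, chosen
    here because it yields a positive semidefinite residue matrix.
    """
    return {(a, b): math.exp(-abs(pa - pb) / scale)
            for a, pa in prop.items() for b, pb in prop.items()}


def weighted_kernel(x, y, sim, k=3):
    """Sum over all k-mer pairs of the product of per-position similarities."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    total = 0.0
    for u, nu in cx.items():
        for v, nv in cy.items():
            w = 1.0
            for a, b in zip(u, v):
                w *= sim[a, b]  # similarity of the aligned residues
            total += nu * nv * w
    return total


SIM = toy_similarity(PROP)
print(spectrum_kernel("AVAV", "AVA", k=2))       # exact k-mer matches only
print(weighted_kernel("AVAV", "AVD", SIM, k=2))  # near matches also count
```

With an identity similarity (1 for equal residues, 0 otherwise) the weighted variant reduces to the spectrum kernel, so the sketch differs from the baseline only in giving partial credit to k-mers made of physico-chemically similar residues. A Gram matrix built this way could be passed to an SVM that accepts precomputed kernels.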

Info:

Pages:

385-390

Online since:

August 2012

Copyright:

© 2012 Trans Tech Publications Ltd. All Rights Reserved
