Identification and Analysis of Single- and Multiple-Region Mitotic Protein Complexes by Grouping Gene Ontology Terms


Abstract:

Many mitotic proteins are assembled into protein supercomplexes in three regions, the midbody, centrosome and kinetochore (MCK), each of which plays a distinctive role in modulating mitosis. However, more than 16% of mitotic proteins are located in multiple regions. Identifying mitotic proteins in advance will help elucidate the molecular regulatory mechanisms of this organelle. A few ensemble-classifier methods can address this problem, but they typically fuse various complementary features, among which Gene Ontology (GO) terms play an important role; the GO-term search space, however, is massive and sparse. This motivates the present work, an easily implemented method named mMck-GO that identifies a small number of GO terms and uses them with support vector machine (SVM) and k-nearest neighbor (KNN) classifiers to predict single- and multiple-region MCK proteins. Using a simple grouping scheme based on an SVM classifier, mMck-GO assembles the GO terms into several groups according to the number of training proteins each term annotates, and then determines which set of top-grouped GO terms performs best. A new MCK protein dataset containing 701 proteins (611 single-region and 90 multiple-region) is established in this work. No MCK protein shares more than 25% pairwise sequence identity with any other protein in the same region. On this dataset, the GO term with the largest annotation count annotates 49.2% of the training protein sequences, whereas 56.5% of the GO terms annotate only a single protein sequence. This illustrates both the sparsity of GO terms and the effectiveness of top-grouped GO terms in distinguishing MCK proteins. Accordingly, a small group consisting of the top 134 GO terms is identified, and mMck-GO fuses these GO terms with amino acid composition (AAC) as input features, yielding accuracies of 71.66% and 69.18%, the latter in independent testing.
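The feature-construction idea above (count how many training proteins each GO term annotates, keep only the most frequently annotating "top group" of terms, then append amino acid composition) can be sketched as follows. This is a minimal illustration, not the authors' mMck-GO code: the threshold-based grouping in the hypothetical `candidate_sets` helper is only one assumed reading of "groups according to their numbers of annotated proteins", and all function names are illustrative. The SVM/KNN step that scores each candidate set is not shown.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# The 20 standard amino acids, in a fixed order for the AAC vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def go_annotation_counts(annotations: Dict[str, Set[str]]) -> Counter:
    """Count how many training proteins each GO term annotates."""
    counts: Counter = Counter()
    for terms in annotations.values():
        counts.update(set(terms))  # a term counts once per protein
    return counts


def sparsity_stats(annotations: Dict[str, Set[str]]) -> Tuple[float, float]:
    """Return (fraction of proteins covered by the most frequent GO term,
    fraction of GO terms that annotate exactly one protein).
    Assumes at least one protein carrying at least one GO term."""
    counts = go_annotation_counts(annotations)
    max_cover = max(counts.values()) / len(annotations)
    singletons = sum(1 for c in counts.values() if c == 1) / len(counts)
    return max_cover, singletons


def rank_go_terms(counts: Counter) -> List[str]:
    """Most frequently annotating terms first; ties broken by GO id."""
    return sorted(counts, key=lambda term: (-counts[term], term))


def candidate_sets(counts: Counter, thresholds: Iterable[int]) -> Dict[int, List[str]]:
    """Hypothetical grouping: for each threshold t, keep every GO term that
    annotates at least t training proteins (one 'top group' per t)."""
    ranked = rank_go_terms(counts)
    return {t: [term for term in ranked if counts[term] >= t] for t in thresholds}


def aac(sequence: str) -> List[float]:
    """Amino acid composition: relative frequency of each standard residue."""
    residue_counts = Counter(sequence.upper())
    total = sum(residue_counts[a] for a in AMINO_ACIDS)
    return [residue_counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def feature_vector(protein_terms: Set[str], sequence: str, selected: List[str]) -> List[float]:
    """Binary indicators for the selected GO terms, followed by 20 AAC values."""
    go_part = [1.0 if term in protein_terms else 0.0 for term in selected]
    return go_part + aac(sequence)
```

In use, each candidate set from `candidate_sets` would be turned into feature vectors and scored with a classifier such as an SVM (for example via LIBSVM), keeping the set that performs best; the sparsity statistics reported above correspond to the two values returned by `sparsity_stats` on the real training annotations.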
The top 30 GO terms comprise eight, eight and 14 GO terms belonging to the molecular-function, biological-process and cellular-component branches, respectively. Besides centrosome and kinetochore, the 14 cellular-component GO terms are relevant to subcellular compartments, microtubules, membranes and the spindle, with GO:0005737 (cytoplasm) ranked first. The eight molecular-function GO terms include GO:0005515 (protein binding), GO:0000166 (nucleotide binding) and GO:0005524 (ATP binding). Most of the eight biological-process GO terms are relevant to the cell cycle, cell division and mitosis, but two GO terms, GO:0045449 and GO:0045449, are relevant to the regulation of transcription and to transport processes, which helps clarify the molecular regulatory mechanisms of this organelle. Combined with other feature types, the top-grouped GO terms can serve as an indispensable feature set for solving multiple-class problems in the investigation of biological functions.
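A per-branch breakdown like the eight/eight/14 split of the top 30 GO terms can be reproduced by looking up each ranked term's namespace in the Gene Ontology's OBO file, where every `[Term]` stanza carries an `id:` line and a `namespace:` line (`biological_process`, `molecular_function` or `cellular_component`). The sketch below is a hedged, minimal parser for just those two fields; the function names are hypothetical, and the ranked list is assumed to come from annotation counts on the training set.

```python
from collections import Counter
from typing import Dict, List, Optional


def parse_obo_namespaces(obo_text: str) -> Dict[str, str]:
    """Map each GO id to its namespace, reading only [Term] stanzas."""
    namespaces: Dict[str, str] = {}
    in_term = False
    current_id: Optional[str] = None
    for raw_line in obo_text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas describe GO terms.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id:"):
            current_id = line[len("id:"):].strip()
        elif in_term and line.startswith("namespace:") and current_id:
            namespaces[current_id] = line[len("namespace:"):].strip()
    return namespaces


def tally_branches(ranked_terms: List[str], namespaces: Dict[str, str], top_n: int) -> Counter:
    """Count how many of the top_n ranked GO terms fall in each GO branch;
    ids missing from the ontology text are counted as 'unknown'."""
    return Counter(namespaces.get(term, "unknown") for term in ranked_terms[:top_n])
```

Applied to a full ontology file, `tally_branches(ranked, namespaces, 30)` would give the per-branch counts for the top 30 terms. Obsolete terms remain in OBO releases (flagged with `is_obsolete: true`), so retired identifiers still resolve to a namespace.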

Pages: 277-285

Online since: September 2013

Copyright: © 2013 Trans Tech Publications Ltd. All Rights Reserved
