MicroRNA Prediction Based on Sample Classification Imbalance

Abstract:

MicroRNAs (miRNAs) play important regulatory roles in animals and plants by targeting mRNA for cleavage or translational repression. MiRNAs are identified mainly by two kinds of methods: biological experiments and computational prediction. MiRNAs that have very low expression levels, or that are expressed only at specific stages, are difficult to find by biological experiments. Computational approaches, especially machine learning approaches, can effectively overcome these difficulties. The support vector machine (SVM), one of the effective machine learning approaches, performs well on miRNA prediction. At present, the number of experimentally validated miRNA precursors is limited, whereas the number of sequence segments that resemble real miRNA precursors runs into the millions. This disparity causes class imbalance when an SVM is trained on the samples. In this paper, the authors apply ensemble learning to address this problem and achieve satisfactory performance.
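The abstract does not say which ensemble scheme the authors used, so the following is only a minimal sketch of one common way to combine ensemble learning with SVMs on imbalanced data: split the majority (pseudo-precursor) class into chunks the size of the minority (real-precursor) class, train one classifier per balanced subset, and combine them by majority vote. Everything here is an assumption made for illustration: the two-dimensional toy features, the helper names `train_linear_svm`, `balanced_subsets`, and `ensemble_predict`, and the use of a simple Pegasos-style linear SVM in place of a full kernel SVM library such as LIBSVM.

```python
import random

def train_linear_svm(samples, labels, epochs=50, lam=0.01, seed=0):
    """Linear SVM trained by Pegasos-style sub-gradient descent on the hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * (len(samples[0]) + 1)  # last weight acts as the bias term
    t = 0
    for _ in range(epochs):
        order = list(range(len(samples)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            x = list(samples[i]) + [1.0]
            margin = labels[i] * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrink
            if margin < 1.0:  # sample violates the margin: take a hinge step
                w = [wj + eta * labels[i] * xj for wj, xj in zip(w, x)]
    return w

def balanced_subsets(positives, negatives, seed=0):
    """Yield (samples, labels) pairs: all positives plus one equal-sized negative chunk."""
    rng = random.Random(seed)
    shuffled = negatives[:]
    rng.shuffle(shuffled)
    k = len(positives)
    for start in range(0, len(shuffled) - k + 1, k):
        chunk = shuffled[start:start + k]
        yield positives + chunk, [1] * k + [-1] * k

def ensemble_predict(models, point):
    """Majority vote over the member classifiers; returns +1 or -1."""
    x = list(point) + [1.0]
    votes = sum(
        1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1 for w in models
    )
    return 1 if votes > 0 else -1

# Hypothetical toy data: 20 "real" samples against 400 "pseudo" samples.
rng = random.Random(42)
positives = [(rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5)) for _ in range(20)]
negatives = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(400)]

# One classifier per balanced subset, each seeing every positive sample.
models = [
    train_linear_svm(x, y, seed=i)
    for i, (x, y) in enumerate(balanced_subsets(positives, negatives))
]

recall = sum(ensemble_predict(models, p) == 1 for p in positives) / len(positives)
specificity = sum(ensemble_predict(models, n) == -1 for n in negatives) / len(negatives)
print(f"members={len(models)} recall={recall:.2f} specificity={specificity:.2f}")
```

Because every member sees all minority samples alongside a different slice of the majority class, no single classifier is trained on a skewed class ratio, yet the ensemble as a whole still uses every negative sample.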

Info:

Pages:

1252-1257

Online since:

July 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
