A Novel Efficient Mining Algorithm for Frequent Patterns on Biological Multiple Sequence

Abstract:

To overcome the shortcomings of traditional algorithms for mining frequent patterns in multiple biological sequences, this paper proposes the MSPM algorithm. MSPM starts mining from longer patterns, which avoids generating large numbers of short patterns. It then extends these primary frequent patterns using a prefix tree built over them, which avoids generating many irrelevant candidate patterns. Experimental results show that MSPM improves performance while still producing effective mining results.
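The abstract does not give algorithmic details, so the following is only a minimal illustrative sketch of the two ideas it names: starting from longer "primary" patterns, and extending them with the help of a prefix tree. It is not the authors' MSPM implementation. The function names, the fixed primary length `k`, the definition of support as the number of sequences containing a pattern, and the one-symbol extension rule are all assumptions made for illustration.

```python
from collections import defaultdict


def primary_patterns(sequences, k, min_support):
    """Length-k substrings found in at least min_support sequences.
    (Assumed definition of a 'primary frequent pattern'.)"""
    seen_in = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for i in range(len(seq) - k + 1):
            seen_in[seq[i:i + k]].add(idx)
    return {p for p, ids in seen_in.items() if len(ids) >= min_support}


def build_prefix_tree(patterns):
    """Nested-dict prefix tree (trie) over the primary patterns."""
    root = {}
    for p in patterns:
        node = root
        for ch in p:
            node = node.setdefault(ch, {})
    return root


def next_symbols(tree, prefix):
    """Symbols that can follow `prefix` inside some primary pattern."""
    node = tree
    for ch in prefix:
        node = node.get(ch)
        if node is None:
            return []
    return list(node)


def mine_patterns(sequences, k, min_support):
    """Hypothetical extension loop: grow each frequent pattern by one symbol,
    but only when its last k symbols would again form a primary pattern."""
    primary = primary_patterns(sequences, k, min_support)
    tree = build_prefix_tree(primary)
    frequent = set(primary)
    frontier = set(primary)
    while frontier:
        grown = set()
        for p in frontier:
            suffix = p[len(p) - (k - 1):]  # last k-1 symbols of p
            for ch in next_symbols(tree, suffix):
                candidate = p + ch
                support = sum(1 for s in sequences if candidate in s)
                if support >= min_support:
                    grown.add(candidate)
        frequent |= grown
        frontier = grown
    return frequent
```

For example, `mine_patterns(["ACGTAC", "TACGTA", "GACGTT"], k=3, min_support=3)` returns the primary patterns `ACG` and `CGT` plus the extended pattern `ACGT`. Consulting the prefix tree before counting support means that extensions whose trailing `k` symbols are not themselves frequent are never generated, which is one plausible reading of how irrelevant candidates are avoided.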

Info:

Pages: 3697-3701

Online since: December 2010

Copyright: © 2011 Trans Tech Publications Ltd. All Rights Reserved
