An Optimized LCP Table Based Algorithm for Frequent String Mining

Abstract:

Given m databases D1, ..., Dm of strings, the goal of frequent string mining is to find all strings that satisfy given constraints with respect to every one of the string databases. In this paper, we propose a data structure for constructing the suffix and LCP tables that effectively reduces the total space consumed by string mining. We demonstrate its use by optimizing the algorithm proposed by Kügel et al. [7] and present the improved algorithm. In our algorithm, the space consumption is proportional to the length of the longest string across all databases. A comprehensive set of performance experiments shows that the processing rate also improves, because the new data structure holds fewer items.
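The abstract does not describe the proposed data structure in detail, so the following is only a naive Python sketch of the shared underlying idea: concatenate every string of every database with unique separators, build a generalized suffix array and LCP table, and read per-database frequencies off suffix-array intervals. Several points are assumptions, not taken from the paper: freq(φ, Di) is taken to be the number of strings in Di that contain φ, the constraints are taken to have the form min_i ≤ freq(φ, Di) ≤ max_i, the suffix array is built by plain sorting, the LCP table uses Kasai et al.'s algorithm, and the names `build_text`, `suffix_array`, `lcp_table` and `mine` are hypothetical. It does not reproduce the paper's space bound.

```python
# Hypothetical sketch, NOT the paper's optimized data structure or the
# algorithm of Kügel et al. [7]: naive generalized suffix array + LCP table
# (Kasai et al.) used to mine substrings under assumed per-database bounds.
from typing import Dict, List, Optional, Sequence, Tuple

Owner = Optional[Tuple[int, int]]  # (database index, string index); None = separator


def build_text(databases: Sequence[Sequence[str]]) -> Tuple[List[int], List[Owner], List[int]]:
    """Concatenate all strings, each followed by its own unique separator.

    Characters map to code points (>= 0) and separators to distinct negative
    integers, so no common prefix of two suffixes can cross a string boundary.
    """
    text: List[int] = []
    owner: List[Owner] = []
    stop: List[int] = []  # index of the separator ending each position's string
    sep = -1
    for d, db in enumerate(databases):
        for s, string in enumerate(db):
            end = len(text) + len(string)
            for ch in string:
                text.append(ord(ch))
                owner.append((d, s))
                stop.append(end)
            text.append(sep)
            owner.append(None)
            stop.append(end)
            sep -= 1
    return text, owner, stop


def suffix_array(text: List[int]) -> List[int]:
    # Naive O(n^2 log n) construction by sorting suffixes; fine only for a sketch.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_table(text: List[int], sa: List[int]) -> List[int]:
    """Kasai et al.: lcp[i] = LCP of suffixes sa[i-1] and sa[i]; lcp[0] = 0."""
    n = len(text)
    rank = [0] * n
    for i, p in enumerate(sa):
        rank[p] = i
    lcp = [0] * n
    h = 0
    for p in range(n):  # visit suffixes in text order so h shrinks by at most 1
        if rank[p] > 0:
            q = sa[rank[p] - 1]
            while p + h < n and q + h < n and text[p + h] == text[q + h]:
                h += 1
            lcp[rank[p]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def mine(databases: Sequence[Sequence[str]], min_freq: Sequence[int],
         max_freq: Sequence[int]) -> Dict[str, List[int]]:
    """Return {substring: per-database frequencies} for every substring that
    satisfies min_freq[d] <= freq <= max_freq[d] in every database d."""
    text, owner, stop = build_text(databases)
    sa = suffix_array(text)
    lcp = lcp_table(text, sa)
    n, m = len(text), len(databases)
    result: Dict[str, List[int]] = {}
    for i, p in enumerate(sa):
        if owner[p] is None:  # suffix starts with a separator
            continue
        # Prefixes longer than lcp[i] first occur at rank i, so every distinct
        # substring is visited exactly once; stop[p] - p keeps it in one string.
        for length in range(lcp[i] + 1, stop[p] - p + 1):
            j = i + 1
            while j < n and lcp[j] >= length:
                j += 1
            # sa[i:j] holds exactly the suffixes starting with text[p:p+length].
            docs = {owner[sa[k]] for k in range(i, j)}
            freq = [0] * m
            for d, _ in docs:
                freq[d] += 1
            if all(min_freq[d] <= freq[d] <= max_freq[d] for d in range(m)):
                result["".join(map(chr, text[p:p + length]))] = freq
    return result


if __name__ == "__main__":
    dbs = [["abab", "abc"], ["bab", "cc"]]
    # Substrings contained in both strings of D1 and in at most one string of D2.
    print(mine(dbs, min_freq=[2, 0], max_freq=[2, 1]))
    # prints {'a': [2, 1], 'ab': [2, 1], 'b': [2, 1]}
```

Each distinct substring is checked once, at the first suffix-array rank where it occurs, but the interval scan makes the sketch roughly cubic and it keeps the whole concatenated text in memory. It therefore only illustrates what is computed, not how the cited methods or the proposed data structure compute it efficiently.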

Info:

Pages:

653-658

Online since:

January 2010

Copyright:

© 2010 Trans Tech Publications Ltd. All Rights Reserved

References:

[1] Fischer J, Heun V, Kramer S. Optimal string mining under frequency constraints. In: Fürnkranz J, Scheffer T, Spiliopoulou M (eds) PKDD. Volume 4213 of Lecture Notes in Computer Science. Springer, pp. 139-150 (2006).

DOI: 10.1007/11871637_17

[2] Abouelhoda MI, Kurtz S, Ohlebusch E. Replacing suffix trees with enhanced suffix arrays. J Discrete Algorithms 2(1): 53-86 (2004).

DOI: 10.1016/s1570-8667(03)00065-0

[3] Maaß MG. Computing suffix links for suffix trees and arrays. Inf Process Lett 101(6): 250-254 (2007).

DOI: 10.1016/j.ipl.2005.12.012

[4] Jeon JE, Park H, Kim DK. Efficient construction of generalized suffix arrays by merging suffix arrays. J KISS: Comput Syst Theor 32(6): 268-278 (2005).

[5] Fischer J. Linear frequent string miner and emerging substring miner (PKDD'06). http://www.bio.ifi.lmu.de/~fischer/frequentLinear.tgz (2007).

[6] Fischer J, Heun V. A new succinct representation of RMQ-information and improvements in the enhanced suffix array. In: Chen B, Paterson M, Zhang G (eds) ESCAPE. Volume 4614 of Lecture Notes in Computer Science. Springer, pp. 459-470 (2007).

DOI: 10.1007/978-3-540-74450-4_41

[7] Kügel A, Ohlebusch E. A space efficient solution to the frequent string mining problem for many databases. Data Mining and Knowledge Discovery 17(1): 24-38 (2008).

DOI: 10.1007/s10618-008-0110-5

[8] Hui LCK. Color set size problem with application to string matching. In: Apostolico A (1992).

[9] NEWT Taxonomy Browser. http://www.ebi.ac.uk/newt (2007).
