Suffix Tree Constructing Algorithm for Datasets with Discrete Contents

Zhen Zhong Jin; Zheng Huang; Hua Zhang

doi:10.4028/www.scientific.net/AMM.733.867

Paper Titles

Optimization Analysis of Supply Logistics in Automobile Manufacturing Based on Internet of Things
p.846

Design and Implementation of Meteorological Observation Instrument Virtual Assembly System Using Blender
p.850

An Optimized Low-Power-Consumption Tracking Algorithm Based on Navigation Message Structure for Weak Signals
p.854

Calculations for the Number of Titanic Sets
p.863

Suffix Tree Constructing Algorithm for Datasets with Discrete Contents
p.867

A Kriging Model-Based Approximation Technique for Predictive Design
p.871

Solving Computational Expensive Problems with a Response Surface Surrogate Models
p.876

Surrogating for High Dimensional Computationally Expensive Multi-Modal Functions with Elliptical Basis Function Models
p.880

Study on Approximation Methods with Chebyshev Polynomial Models
p.885

HomeApplied Mechanics and MaterialsApplied Mechanics and Materials Vol. 733Suffix Tree Constructing Algorithm for Datasets...

Suffix Tree Constructing Algorithm for Datasets with Discrete Contents

Abstract:

The suffix tree is a useful data structure constructed for indexing strings. However, when it comes to large datasets of discrete contents, most existing algorithms become very inefficient. Discrete datasets are need to be indexed in many fields like record analysis, data analyze in sensor network, association analysis etc. This paper presents an algorithm, STD, which stands for Suffix Tree for Discrete contents, that performs very efficiently with discrete input datasets. It imports several wonderful intermediate data structures for discrete strings; we also take care of the situation that the discrete input strings have similar characteristics. Moreover, STD keeps the advantages of existing implementations which are for successive input strings. Experiments were taken to evaluate the performance and shown that the method works well.

You might also be interested in these eBooks

Engineering Decisions for Industrial Development

View Preview

Info:

Periodical:

Applied Mechanics and Materials (Volume 733)

Pages:

867-870

DOI:

https://doi.org/10.4028/www.scientific.net/AMM.733.867

Citation:

Cite this paper

Online since:

February 2015

Authors:

Zhen Zhong Jin, Zheng Huang, Hua Zhang

Keywords:

Discrete Content, Indexing, Suffix Tree

Export:

RIS, BibTeX

Price:

Permissions CCC:

Request Permissions

Permissions PLS:

Request Permissions

Сopyright:

Citation:

References

[1] McCreight E M: A space-economical suffix tree construction algorithm [J]. Journal of the ACM (JACM), Vol. 23 (1976) No. 2, p.262.

DOI: 10.1145/321941.321946

Google Scholar

[2] Charras C, Lecroq T: Handbook of exact string matching algorithms [M] (King's College Publications, 2004).

Google Scholar

[3] Mansour E, Allam A, Skiadopoulos S, et al: ERA: efficient serial and parallel suffix tree construction for very long strings [J]. Proceedings of the VLDB Endowment, Vol. 5 (2011) No. 1, p.49.

DOI: 10.14778/2047485.2047490

Google Scholar

[4] Ukkonen E: On-line construction of suffix trees [J]. Algorithmica, Vol. 14 (1995) No. 3, p.249.

DOI: 10.1007/bf01206331

Google Scholar

[5] Hunt E, Atkinson M P, Irving R W: Database indexing for large DNA and protein sequence collections [J]. The VLDB Journal, Vol. 11 (2002) No. 3, p.256.

DOI: 10.1007/s007780200064

Google Scholar

[6] Tata S, Hankins R A, Patel J M: Practical suffix tree construction[C]. Proceedings of the Thirtieth international conference on Very large data bases (Toronto, Canada, August 31-September 3, 2004), Vol. 30 (2004) p.36.

DOI: 10.1016/b978-012088469-8.50007-3

Google Scholar

[7] Tian Y, Tata S, Hankins R A, et al: Practical methods for constructing suffix trees [J]. The VLDB Journal, Vol. 14 (2005), No. 3, p.281.

DOI: 10.1007/s00778-005-0154-8

Google Scholar

[8] Phoophakdee B, Zaki M J: Genome-scale disk-based suffix tree indexing[C]. Proceedings of the 2007 ACM SIGMOD international conference on Management of data. ACM, (2007) p.833.

DOI: 10.1145/1247480.1247572

Google Scholar

[9] Phoophakdee B, Zaki M J: TRELLIS+: an effective approach for indexing genome-scale sequences using suffix trees[C]. Pacific Symposium on Biocomputing, Vol. 13 (2008), p.90.

DOI: 10.1142/9789812776136_0011

Google Scholar

[10] Ghoting A, Makarychev K: Serial and parallel methods for I/O efficient suffix tree construction[C]. Proceedings of the 2009 ACM SIGMOD International Conference on Management of data. ACM, (2009) p.827.

DOI: 10.1145/1559845.1559931

Google Scholar

[11] Barsky M, Stege U, Thomo A, et al: Suffix trees for very large genomic sequences[C]. Proceedings of the 18th ACM conference on Information and knowledge management. ACM, (2009), p.1417.

DOI: 10.1145/1645953.1646134

Google Scholar

[12] Weiner P. Linear pattern matching algorithms[C], Switching and Automata Theory, 1973. SWAT'08. IEEE Conference Record of 14th Annual Symposium on. IEEE, (1973) p.1.

DOI: 10.1109/swat.1973.13

Google Scholar

[13] Schürmann K B, Stoye J: Suffix tree construction and storage with limited main memory [J]. (2003).

Google Scholar

[14] Japp R: The top-compressed suffix tree[C]. 21st Annual British National Conference on Databases. (2004).

Google Scholar