Fast Fuzzy Search for Mixed Data Using Locality Sensitive Hashing

Abstract:

The rapid growth in data volume calls for efficient techniques for finding data similar to a query. It is sometimes useful to specify the data of interest with fuzzy constraints. When data objects contain both numerical and categorical attributes, it is usually difficult to define a commonly accepted distance measure between them. Without an efficient indexing structure, searching for specific data objects is costly because a linear search must be conducted over the whole data set. This paper proposes a method that combines locality sensitive hashing with fuzzy constrained queries to search big data for objects of interest. The method builds a locality sensitive hashing-based index using only the continuous attributes, collects a small number of candidate data objects against which the query is examined, and then evaluates each candidate's degree of satisfaction of the fuzzy constrained query to determine the data objects that satisfy it.
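
The three steps named in the abstract (index the continuous attributes, collect candidates, evaluate satisfaction degrees) can be illustrated with a minimal sketch. This is not the paper's implementation, and every concrete choice in it is an assumption made for illustration: the random-projection hash family, triangular membership functions, the use of each fuzzy constraint's peak as the probe point, the minimum as the conjunction operator, and all names (`ProjectionLSH`, `satisfaction`, `fuzzy_search`) and sample records.

```python
import random
from collections import defaultdict

def triangular(a, b, c):
    """Assumed membership shape: 0 outside (a, c), rising to 1 at the peak b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

class ProjectionLSH:
    """Hypothetical random-projection LSH over the continuous attributes only."""
    def __init__(self, dim, n_tables=8, n_hashes=2, width=8.0, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.tables = []
        for _ in range(n_tables):
            funcs = [([rng.gauss(0, 1) for _ in range(dim)], rng.uniform(0, width))
                     for _ in range(n_hashes)]
            self.tables.append((funcs, defaultdict(list)))

    def _key(self, funcs, x):
        # Each hash is floor((a . x + b) / width); a table key concatenates them.
        return tuple(int((sum(a * v for a, v in zip(vec, x)) + b) // self.width)
                     for vec, b in funcs)

    def add(self, idx, x):
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, x)].append(idx)

    def candidates(self, x):
        # Union of the buckets that x falls into across all tables.
        found = set()
        for funcs, buckets in self.tables:
            found.update(buckets.get(self._key(funcs, x), ()))
        return found

def satisfaction(rec, numeric_query, categorical_query):
    """Degree to which a record satisfies all constraints (minimum as conjunction)."""
    degrees = [mu(rec[attr]) for attr, (_, mu) in numeric_query.items()]
    degrees += [memb.get(rec[attr], 0.0) for attr, memb in categorical_query.items()]
    return min(degrees)

def fuzzy_search(records, index, numeric_attrs, numeric_query, categorical_query, threshold):
    # Probe the index at the peaks of the numeric constraints (an assumption),
    # then score only the retrieved candidates instead of scanning every record.
    probe = [numeric_query[attr][0] for attr in numeric_attrs]
    hits = {}
    for i in index.candidates(probe):
        degree = satisfaction(records[i], numeric_query, categorical_query)
        if degree >= threshold:
            hits[i] = degree
    return hits

# Made-up mixed records: two continuous attributes and one categorical attribute.
records = [
    {"age": 30.0, "income": 52.0, "city": "Seoul"},
    {"age": 31.0, "income": 50.0, "city": "Busan"},
    {"age": 65.0, "income": 20.0, "city": "Seoul"},
]
numeric_attrs = ("age", "income")
index = ProjectionLSH(dim=len(numeric_attrs))
for i, rec in enumerate(records):
    index.add(i, [rec[attr] for attr in numeric_attrs])

# Query: "age about 30, income about 52, city preferably Seoul".
numeric_query = {
    "age": (30.0, triangular(20.0, 30.0, 40.0)),
    "income": (52.0, triangular(40.0, 52.0, 64.0)),
}
categorical_query = {"city": {"Seoul": 1.0, "Busan": 0.4}}
hits = fuzzy_search(records, index, numeric_attrs, numeric_query, categorical_query, 0.5)
print(hits)  # prints {0: 1.0}
```

In this sketch the categorical attribute never enters the hash functions; it only affects the satisfaction degree computed for each candidate, which mirrors the abstract's description of indexing the continuous attributes alone. Because hashing is approximate, a record that satisfies the query can be missed if it shares no bucket with the probe point, so the number of tables and the bucket width trade recall against candidate-set size.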

Info:

Pages: 321-325

Online since: November 2013

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
