A Projection-Based Locality-Sensitive Hashing Technique for Reducing False Negatives

Keon Myung Lee

doi:10.4028/www.scientific.net/AMM.263-266.1341

Paper Titles

Study on Intelligent Monitoring and Controlling System in Oil Drilling Optimization
p.1319

The Simulation Study on Environmental Field of Livestock and Poultry Establishments under Two Winds Conditions
p.1323

Research on Dynamic Micro-Extended-Analog-Computer Array Controller
p.1327

An Embedded Optical Vector Matrix Multiplication Processor
p.1334

A Projection-Based Locality-Sensitive Hashing Technique for Reducing False Negatives
p.1341

A Semantic-Oriented Method for Ontology Merging
p.1347

Application of Visual Basic and FORTRAN Mixed-Language Programming to Visualization Technology for Hydro-Meteorological Model
p.1352

Campus Network Upgrade and Optimization Based on OPNET
p.1356

Comparative Research of Two XML Document Storage Modes Based on Relational Database
p.1360

HomeApplied Mechanics and MaterialsApplied Mechanics and Materials Vols. 263-266A Projection-Based Locality-Sensitive Hashing...

A Projection-Based Locality-Sensitive Hashing Technique for Reducing False Negatives

Abstract:

It is challenging to efficiently find similar pairs of objects when the number of objects is huge. The locality-sensitive hashing techniques have been developed to address this issue. They employ the hash functions to map objects into buckets, where similar objects have high chances to fall into the same buckets. This paper is concerned with a locality-sensitive hashing technique, the projection-based method, which is applicable to the Euclidean distance-based similar pair identification problem. It proposes an extended method which allows an object to be hashed to more than one bucket by introducing additional hashing functions. From the experimental studies, it has been shown that the proposed method could provide better performance compared to the projection-based method.

You might also be interested in these eBooks

View Preview

Info:

Periodical:

Applied Mechanics and Materials (Volumes 263-266)

Pages:

1341-1346

DOI:

https://doi.org/10.4028/www.scientific.net/AMM.263-266.1341

Citation:

Cite this paper

Online since:

December 2012

Authors:

Keon Myung Lee*

Keywords:

Continuous Data, Data Analysis, Locality Sensitive Hashing, Similar Pair Identification

Export:

RIS, BibTeX

Price:

Permissions CCC:

Request Permissions

Permissions PLS:

Request Permissions

Сopyright:

Citation:

* - Corresponding Author

References

[1] A. Rajaraman and J. D. Ullman: Mining of Massive Datasets, Cambridge University Press (2012).

Google Scholar

[2] U. Manber: Finding similar files in a large file system, Proc. USENIX Conference (1994) 1–10.

Google Scholar

[3] A. Z. Broder: On the resemblance and containment of documents, Proc. Compression and Complexity of Sequence (1997) 21–29.

Google Scholar

[4] A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher: Min-wise independent permutations, ACM Symposium on Theory of Computing (1998) 327–336.

DOI: 10.1145/276698.276781

Google Scholar

[5] A. Andoni and P. Indyk: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions, Comm. ACM, 51(1) (2008) 117–122.

DOI: 10.1145/1327452.1327494

Google Scholar

[6] M. S. Charika: Similarity estimation techniques from rounding algorithms, ACM Symposium on Theory of Computing (2002) 380–388.

Google Scholar

[7] K. M. Lee, K. M. Lee: Fuzzy Technique-based Identification of Close and Distant Clusters in Clustering, Int. J. of Fuzzy Logic and Intelligent Systems 11(3) (2011) 165–170.

DOI: 10.5391/ijfis.2011.11.3.165

Google Scholar

[8] M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni: Locality-sensitive hashing scheme based on p-stable distribution, Symp. on Computational Geometry (2004) 253–262.

DOI: 10.1145/997817.997857

Google Scholar

[9] M. Theodbald, J. Siddhaarth, and A. Paepcke: SpotSigs - robust and efficient near duplicate detection in large web collections, 31st Annual ACM SIGIR Conference, Singapore, July (2008).

DOI: 10.1145/1390334.1390431

Google Scholar

[10] S. Chauddhuri, V. Ganti, and R. Kaushik: A primitive operattor for similarity joins in data cleaning, Proc. Int. Conf. on Data Engineering (2006).

DOI: 10.1109/icde.2006.9

Google Scholar

[11] F. Hao, J. Daugman, and P. Zielinski: A fast search algorithm for a large fuzzy database, IEEE Trans. on Information Forensics and Security 3(2) (2008).

DOI: 10.1109/tifs.2008.920726

Google Scholar

[12] M. Potthast and B. Stein: New issues in near-duplicate detection, Data Analysis, Machine Learning and Applications, Springer (2008) 601–609.

DOI: 10.1007/978-3-540-78246-9_71

Google Scholar

[13] A. Frank, A. Asuncion: UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science (2010).

Google Scholar