[1]
A. Rajaraman and J. D. Ullman: Mining of Massive Datasets, Cambridge University Press (2012).
Google Scholar
[2]
U. Manber: Finding similar files in a large file system, Proc. USENIX Conference (1994) 1–10.
Google Scholar
[3]
A. Z. Broder: On the resemblance and containment of documents, Proc. Compression and Complexity of Sequence (1997) 21–29.
Google Scholar
[4]
A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher: Min-wise independent permutations, ACM Symposium on Theory of Computing (1998) 327–336.
DOI: 10.1145/276698.276781
Google Scholar
[5]
A. Andoni and P. Indyk: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions, Comm. ACM, 51(1) (2008) 117–122.
DOI: 10.1145/1327452.1327494
Google Scholar
[6]
M. S. Charika: Similarity estimation techniques from rounding algorithms, ACM Symposium on Theory of Computing (2002) 380–388.
Google Scholar
[7]
K. M. Lee, K. M. Lee: Fuzzy Technique-based Identification of Close and Distant Clusters in Clustering, Int. J. of Fuzzy Logic and Intelligent Systems 11(3) (2011) 165–170.
DOI: 10.5391/ijfis.2011.11.3.165
Google Scholar
[8]
M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni: Locality-sensitive hashing scheme based on p-stable distribution, Symp. on Computational Geometry (2004) 253–262.
DOI: 10.1145/997817.997857
Google Scholar
[9]
M. Theodbald, J. Siddhaarth, and A. Paepcke: SpotSigs - robust and efficient near duplicate detection in large web collections, 31st Annual ACM SIGIR Conference, Singapore, July (2008).
DOI: 10.1145/1390334.1390431
Google Scholar
[10]
S. Chauddhuri, V. Ganti, and R. Kaushik: A primitive operattor for similarity joins in data cleaning, Proc. Int. Conf. on Data Engineering (2006).
DOI: 10.1109/icde.2006.9
Google Scholar
[11]
F. Hao, J. Daugman, and P. Zielinski: A fast search algorithm for a large fuzzy database, IEEE Trans. on Information Forensics and Security 3(2) (2008).
DOI: 10.1109/tifs.2008.920726
Google Scholar
[12]
M. Potthast and B. Stein: New issues in near-duplicate detection, Data Analysis, Machine Learning and Applications, Springer (2008) 601–609.
DOI: 10.1007/978-3-540-78246-9_71
Google Scholar
[13]
A. Frank, A. Asuncion: UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science (2010).
Google Scholar