System of Fuzzy Duplicates Detection

Abstract:

This paper discusses the problem of detecting fuzzy duplicates. We outline the basic approaches to detecting text duplicates and review existing methods of fuzzy duplicate detection. We present a fuzzy duplicate detection algorithm based on the shingle method and describe a modification of it: rather than considering the full text of a document, we propose to use a processed and filtered copy. We also present the structure of a system for fuzzy duplicate detection, which checks for text duplication both in an internal database and on the Internet.
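The shingle-based comparison mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the shingle length `k=3`, the function names, and the use of Jaccard similarity as the resemblance measure are all assumptions made for illustration, since the abstract does not specify the filtering rules or the scoring.

```python
import re

# Hypothetical stop-word list; the abstract does not say what the paper filters out.
STOP_WORDS = frozenset({"a", "an", "the", "of", "in", "on", "and", "or", "to", "is", "are"})


def preprocess(text):
    """Lowercase the text, drop punctuation, and remove stop words (assumed filter)."""
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def shingles(words, k=3):
    """Return the set of overlapping k-word windows (shingles) of a word list."""
    if len(words) < k:
        # A text shorter than k words yields a single short shingle, or nothing.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(text_a, text_b, k=3):
    """Jaccard similarity of the two shingle sets, a value in [0, 1]."""
    a = shingles(preprocess(text_a), k)
    b = shingles(preprocess(text_b), k)
    if not a or not b:
        # Design choice for this sketch: an empty text matches nothing.
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    original = "The system checks text duplications in the internal database."
    edited = "System checks text duplications in an internal database!"
    print(resemblance(original, edited))
```

Because filtering happens before shingling, the two sentences in the usage example reduce to the same word sequence and are scored as duplicates despite differing in articles, case, and punctuation. This sketch compares shingle sets directly; at scale, shingle-based methods usually compare hashes of shingles instead.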

Info:

Pages: 1503-1507

Online since: January 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
