System of Fuzzy Duplicates Detection

Ekaterina V. Sharapova; Ruslan V. Sharapov

doi:10.4028/www.scientific.net/AMM.490-491.1503

Abstract:

This paper discusses the problem of fuzzy duplicate detection. We outline the basic approaches to detecting text duplicates and review existing methods of fuzzy duplicate detection. We present a fuzzy duplicate detection algorithm based on the shingle method and describe a modification of it: rather than the full text of a document, the algorithm considers a processed and filtered copy. We also present the structure of a system for fuzzy duplicate detection, which checks for duplicated text both in an internal database and on the Internet.
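
The abstract does not give the details of the algorithm, so the following is only a minimal sketch of a generic shingle-based comparison applied to a filtered copy of the text. The stop-word list, the shingle size of three words, the SHA-256 hashing, and the function names `preprocess`, `shingles`, and `resemblance` are all assumptions made for illustration and are not taken from the paper.

```python
import hashlib
import re

# Hypothetical stop-word list; the abstract does not say what is filtered out.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "to", "is", "are"}


def preprocess(text):
    """Lowercase the text, drop punctuation, and remove stop words (assumed filtering)."""
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def shingles(words, size=3):
    """Return the set of hashed word n-grams (shingles) of the given size."""
    if not words:
        return set()
    if len(words) < size:
        grams = [" ".join(words)]
    else:
        grams = [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]
    return {hashlib.sha256(g.encode("utf-8")).hexdigest() for g in grams}


def resemblance(text_a, text_b, size=3):
    """Jaccard similarity of the two documents' shingle sets, in the range [0, 1]."""
    a = shingles(preprocess(text_a), size)
    b = shingles(preprocess(text_b), size)
    if not a or not b:
        return 0.0  # nothing to compare
    return len(a & b) / len(a | b)
```

Under these assumptions, two texts that differ only in punctuation and stop words produce identical shingle sets, and a pair would be flagged as fuzzy duplicates when `resemblance` exceeds a chosen threshold. The threshold is likewise a parameter the abstract does not specify.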

Info:

Periodical:

Applied Mechanics and Materials (Volumes 490-491)

Pages:

1503-1507

DOI:

https://doi.org/10.4028/www.scientific.net/AMM.490-491.1503

Online since:

January 2014

Authors:

Ekaterina V. Sharapova, Ruslan V. Sharapov

Keywords:

Duplicate Detection, Fuzzy Duplicate, Shingle, Text
