Apply Language Nature Rhythm to Large Scale Duplicated Text Detection

Fan Chen; Zhi Yong Feng; Geng Zhao

doi:10.4028/www.scientific.net/AMR.457-458.635

Abstract:

Detecting duplicated text in large-scale Web content is an urgent problem. This paper proposes a duplicated text detection algorithm based on language rhythm. The method extracts the natural rhythm marked by punctuation in the text and builds a rhythm comparison matrix to carry out duplication detection for each paragraph. Unlike other algorithms, which are based on word analysis, this algorithm offers high accuracy and low complexity.
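The idea in the abstract can be illustrated with a minimal sketch. The abstract does not define the rhythm or the comparison matrix precisely, so everything here is an assumption made for illustration: the rhythm of a paragraph is taken to be the list of character counts of the segments between punctuation marks, the comparison matrix marks segment pairs of equal length, and the score is the longest diagonal run of matches divided by the shorter rhythm length. The names `PUNCT`, `rhythm`, `compare_matrix`, `longest_diagonal`, and `similarity` are hypothetical and do not come from the paper.

```python
import re

# Assumed punctuation set (ASCII plus common Chinese marks); not taken from the paper.
PUNCT = re.compile(r"[,.;:!?，。；：！？、]")


def rhythm(paragraph):
    """Return the lengths of the non-empty segments between punctuation marks."""
    segments = PUNCT.split(paragraph)
    return [len(s.strip()) for s in segments if s.strip()]


def compare_matrix(a, b):
    """Build a 0/1 matrix: m[i][j] is 1 when a[i] and b[j] are equal lengths."""
    return [[1 if x == y else 0 for y in b] for x in a]


def longest_diagonal(m):
    """Length of the longest run of 1s along any diagonal of the matrix.

    A diagonal run corresponds to a contiguous stretch of segments whose
    lengths match in both paragraphs.
    """
    best = 0
    prev = [0] * (len(m[0]) + 1) if m else []
    for row in m:
        cur = [0] * (len(row) + 1)
        for j, v in enumerate(row):
            if v:
                cur[j + 1] = prev[j] + 1
                best = max(best, cur[j + 1])
        prev = cur
    return best


def similarity(p, q):
    """Assumed score: longest shared rhythm run over the shorter rhythm length."""
    a, b = rhythm(p), rhythm(q)
    if not a or not b:
        return 0.0
    return longest_diagonal(compare_matrix(a, b)) / min(len(a), len(b))
```

For example, `rhythm("Hello, world. Foo bar!")` yields `[5, 5, 7]`, and comparing a paragraph with itself gives a score of `1.0`. Because only segment lengths are compared, no word segmentation or dictionary lookup is involved, which is consistent with the abstract's contrast with word-based methods; the specific scoring rule is a design choice of this sketch.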

Info:

Periodical:

Advanced Materials Research (Volumes 457-458)

Pages:

635-640

Online since:

January 2012

Keywords:

Duplicated Text Detection, Language Nature Rhythm, Punctuation
