A Practical Algorithm for Plagiarism Detection Based on Search Engine


Abstract:

Under China's current quantitative academic evaluation system, many scholars and graduate students tend to plagiarize from the web. Efficient plagiarism detection requires a massive text collection that can be accessed easily, cheaply and quickly. Some algorithms rely on rapidly growing online databases, such as the Chinese CNKI database. We introduce an algorithm that detects plagiarism quantitatively, based on natural language segmentation and the precise (exact-phrase) retrieval function of a search engine. The source text is segmented into sentences at punctuation marks. Each sentence is submitted to the search engine as a single keyword enclosed in quotes. The similarity between the source file and web information is computed as the proportion of sentences for which the search engine returns a match. Experiments show that the algorithm is practical and feasible.
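The pipeline described in the abstract (split at punctuation, query each sentence in quotes, score by the share of matched sentences) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the set of splitting punctuation, the minimum fragment length `min_len`, and the helper names are hypothetical, and no real search engine API is called. Instead, `make_fake_search` is a hypothetical stand-in that treats a quoted query as matched when the phrase occurs verbatim in a small local corpus. In this sketch the similarity is simply `matched / total` over the segmented sentences.

```python
import re
from typing import Callable, List

# Assumption: split on sentence-level punctuation only, covering full-width
# Chinese marks and ASCII equivalents. The abstract just says "punctuation marks".
_SPLIT_PATTERN = re.compile(r"[。！？；.!?;\n]+")


def segment_sentences(text: str, min_len: int = 5) -> List[str]:
    """Split text at punctuation and drop fragments shorter than min_len
    (min_len is a hypothetical filter for trivially short pieces)."""
    pieces = (p.strip() for p in _SPLIT_PATTERN.split(text))
    return [p for p in pieces if len(p) >= min_len]


def quoted_query(sentence: str) -> str:
    """Wrap a sentence in double quotes to request an exact-phrase match."""
    return f'"{sentence}"'


def similarity(text: str, search_has_match: Callable[[str], bool],
               min_len: int = 5) -> float:
    """Fraction of segmented sentences whose quoted query reports a match."""
    sentences = segment_sentences(text, min_len)
    if not sentences:
        return 0.0
    matched = sum(1 for s in sentences if search_has_match(quoted_query(s)))
    return matched / len(sentences)


def make_fake_search(corpus: List[str]) -> Callable[[str], bool]:
    """Hypothetical stand-in for a web search engine: a quoted query matches
    if the unquoted phrase appears verbatim in any corpus document."""
    def search(query: str) -> bool:
        phrase = query[1:-1]  # remove the surrounding quotes added by quoted_query
        return any(phrase in doc for doc in corpus)
    return search


if __name__ == "__main__":
    web = ["Plagiarism detection compares documents against a large collection. Other text."]
    doc = ("Plagiarism detection compares documents against a large collection. "
           "This sentence is original.")
    # One of the two sentences is found verbatim in the fake "web".
    print(similarity(doc, make_fake_search(web)))  # 0.5
```

Splitting English text on `.` is a simplification (it breaks on decimals and abbreviations); for Chinese text the full-width marks such as `。` give cleaner sentence boundaries. In practice `search_has_match` would wrap a real search engine query and check whether any results come back, subject to that engine's rate limits and terms of use.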


Info:

Pages:

2287-2290

Online since:

July 2011

Copyright:

© 2011 Trans Tech Publications Ltd. All Rights Reserved
