A Novel Method for Text Similarity Calculation

Cai Rui; Li Fei; Chen Bin; Quan Cong

doi:10.4028/www.scientific.net/AMR.660.202


Abstract:

The traditional vector space model for text similarity calculation does not take word order into consideration, which biases its results. This paper puts forward a text similarity calculation method that combines the longest common subsequence with the traditional vector space model, so that both word order and word frequency are taken into account. Word order information is drawn from the longest common subsequence and the common substrings of the two texts, while word frequency information is captured by the term weights of the traditional vector space model. The proposed method was tested on part of a dataset collected by a web crawler, and the results demonstrate its effectiveness.
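The idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual formula, so everything specific here is an assumption: whitespace tokenization, raw term-frequency weights for the vector space part, normalizing the longest common subsequence by the longer text's length, and a linear mix controlled by a hypothetical parameter `alpha`. The function names are illustrative only.

```python
import math
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def cosine_tf(a, b):
    """Cosine similarity of raw term-frequency vectors (order-blind)."""
    fa, fb = Counter(a), Counter(b)
    dot = sum(fa[t] * fb[t] for t in fa.keys() & fb.keys())
    norm_a = math.sqrt(sum(v * v for v in fa.values()))
    norm_b = math.sqrt(sum(v * v for v in fb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_similarity(text_a, text_b, alpha=0.5):
    """Assumed linear mix of an order-aware and a frequency-based score.

    alpha is a hypothetical weight, not a value taken from the paper.
    """
    a, b = text_a.lower().split(), text_b.lower().split()
    if not a or not b:
        return 0.0
    order_sim = lcs_length(a, b) / max(len(a), len(b))
    freq_sim = cosine_tf(a, b)
    return alpha * order_sim + (1 - alpha) * freq_sim


if __name__ == "__main__":
    s1 = "the cat sat on the mat"
    s2 = "the mat sat on the cat"
    print(cosine_tf(s1.split(), s2.split()))
    print(combined_similarity(s1, s2))
```

The two example sentences contain exactly the same words, so the frequency-only score treats them as identical, while the subsequence term lowers the combined score because the word order differs. This is the kind of bias the abstract attributes to the plain vector space model.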


Info:

Periodical: Advanced Materials Research (Volume 660)

Pages: 202-206

DOI: https://doi.org/10.4028/www.scientific.net/AMR.660.202

Online since: February 2013

Authors: Cai Rui, Li Fei, Chen Bin, Quan Cong

Keywords: Feature Vector, Micro-Blog, TDT, Text Similarity

