Paper Titles

Unified Information Modeling for Achieving High-Performance Information Sharing in the Information Grid
p.369

A GPS-Less Cell-Based Localization Technique for Wireless Sensor Networks
p.376

The Design of Inter-Regional Fire and Rescue Mobilization System
p.383

Network Manufacturing Technology Based on Cloud Computing
p.390

A New Document Representation Using a Unified Graph to Document Similarity Search
p.394

Model Checking of Constraint-Based Workflow Based on Linear Temporal Logic
p.401

The Waterfilling Based Subcarrier and Power Allocation for OFDM Cognitive Radio Networks
p.406

Improvement and Research of Lean Production in Car-Seat Assembly Line
p.415

Action Mechanism of Psychological Empowerment on OCBs
p.420

A New Document Representation Using a Unified Graph to Document Similarity Search

Abstract:

Document similarity search retrieves a ranked list of documents similar to a query document from a text corpus or from pages on the web. Most previous research on searching for similar documents, however, focuses on classifying documents by their content. To address this, we propose a novel retrieval approach that represents each document in the corpus as an undirected graph. The study also considers a unified graph in conjunction with multiple graphs to improve the quality of similar-document search. Experimental results on the Reuters-21578 data set show that the proposed system performs better than the traditional approach.
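
The abstract does not say how the document graphs are built or compared, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each document becomes an undirected term co-occurrence graph (terms within a small sliding window are linked) and that two graphs are compared by the Jaccard overlap of their edge sets. The function names, the `window` parameter, and the toy corpus are all hypothetical, and the unified graph mentioned in the abstract is not modeled here.

```python
from itertools import combinations
import re


def document_graph(text, window=3):
    """Build an undirected co-occurrence graph as a set of unordered term pairs.

    Assumption: terms that appear within `window` consecutive tokens are linked.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    edges = set()
    for start in range(len(tokens)):
        span = tokens[start:start + window]
        for a, b in combinations(span, 2):
            if a != b:
                # frozenset makes the edge undirected: {a, b} == {b, a}
                edges.add(frozenset((a, b)))
    return edges


def graph_similarity(edges_a, edges_b):
    """Jaccard overlap of two edge sets; 0.0 when both graphs are empty."""
    union = edges_a | edges_b
    if not union:
        return 0.0
    return len(edges_a & edges_b) / len(union)


def rank_similar(query, corpus, window=3):
    """Return (doc_id, score) pairs sorted by descending similarity to the query."""
    query_graph = document_graph(query, window)
    scored = [
        (doc_id, graph_similarity(query_graph, document_graph(text, window)))
        for doc_id, text in corpus.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy corpus, invented for illustration only (not Reuters-21578 data).
corpus = {
    "d1": "oil prices rise as crude supply falls",
    "d2": "crude oil prices rise sharply",
    "d3": "the central bank cuts interest rates",
}
ranking = rank_similar("oil prices rise", corpus)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Storing edges as a set of `frozenset` pairs keeps the graph undirected without a graph library. A weighted variant could count co-occurrences instead, at the cost of needing a different similarity measure.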

Info:

Periodical:

Advanced Materials Research (Volume 601)

Pages:

394-400

DOI:

https://doi.org/10.4028/www.scientific.net/AMR.601.394

Online since:

December 2012

Authors:

Taeh Wan Kim, Ho Cheol Jeon, Joong Min Choi

Keywords:

Document Similarity Search, Graph Representation, Information Retrieval, Text Mining

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved
