A Modified Cosine Similarity for Cross Language Information Retrieval

Abstract:

Among the millions of documents available on the Internet, many contain similar content but are written in different languages by different authors. Unfortunately, existing search engines do not retrieve all the documents that are relevant to a single-language query. Several researchers have therefore put considerable effort into overcoming this problem. The major problems of a cross-language search engine include 1) how to store information in a unified model and represent the information in multilingual documents effectively, and 2) how to rank the retrieved multilingual documents and present them to the user in the right order. This paper addresses the first problem with an ontology model and presents a new ranking technique for a cross-language information retrieval (CLIR) system. A weighting scheme for keywords in the ontology and in document sections is introduced. The cosine similarity formula is modified specifically to support CLIR. The experimental results show that the modified formula obtains more efficient ranking results than the existing method.
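The abstract does not give the modified formula, so the following is only a minimal sketch of the general idea: standard cosine similarity computed over term vectors whose weights depend on the document section in which a keyword appears. Everything specific here is an assumption made for illustration and is not taken from the paper: the section names and weights in `SECTION_WEIGHTS`, the helper names `weighted_vector`, `cosine`, and `rank`, and the premise that keywords in every language have already been mapped to shared, language-independent concept identifiers (for example through an ontology lookup).

```python
import math
from collections import Counter

# Hypothetical section weights, chosen only for illustration.
# The paper's actual weighting scheme is not stated in the abstract.
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def weighted_vector(sections):
    """Build a concept-weight vector from {section_name: [concept_ids]}.

    Each occurrence of a concept adds the weight of the section it
    appears in; unknown sections fall back to a weight of 1.0.
    """
    vec = Counter()
    for name, concepts in sections.items():
        weight = SECTION_WEIGHTS.get(name, 1.0)
        for concept in concepts:
            vec[concept] += weight
    return vec


def cosine(a, b):
    """Standard cosine similarity between two sparse vectors (dicts)."""
    dot = sum(a[c] * b[c] for c in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


def rank(query_concepts, docs):
    """Return (doc_id, score) pairs sorted by descending similarity.

    query_concepts: list of concept ids derived from the query.
    docs: {doc_id: {section_name: [concept_ids]}}.
    """
    query_vec = Counter(query_concepts)
    scored = [
        (doc_id, cosine(query_vec, weighted_vector(sections)))
        for doc_id, sections in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under these assumptions, a document that mentions a query concept in its title scores higher than one that mentions it only in the body, because the title occurrence contributes a larger component to the document vector.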

Info:

Periodical:

Advanced Materials Research (Volumes 931-932)

Pages:

1348-1352

Online since:

May 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved

