The Research and Application in Intelligent Document Retrieval Based on Text Quantification and Subject Mapping

Abstract:

Document retrieval is an important means of academic exchange and of acquiring new knowledge. The traditional retrieval method is to choose the corresponding database category and match the input keywords. This method returns a large number of documents, which makes it hard for users to find the most relevant one. This paper puts forward a text quantification method that mines the features of each element in a document: word concept, position-function weight, improved weight characteristic value, text-distribution-function weight, and text element length. Combining these five features gives each word's contribution to the document, and every document in the database is stored numerically as the contributions of its elements. The paper also designs a subject mapping scheme. First, a similarity calculation method based on contribution and association rules is designed; using this method, the documents in the database are clustered, and a feature extraction method then finds the subject of each class. At search time, the description the user inputs is quantified and automatically mapped to a class by subject mapping, and a sequence of documents is then retrieved by computing the similarity between the description and the features of the documents in that class. Experiments show that the scheme is intelligent and accurate and improves retrieval speed.
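The retrieval flow described in the abstract (quantify, map to a class, rank within the class) can be illustrated with a minimal Python sketch. It is not the paper's implementation, and everything in it is an assumption made for illustration: the function names (quantify, cosine, class_subject, retrieve) are hypothetical, only two stand-in features (relative frequency and a title-position boost) replace the five features, plain cosine similarity replaces the contribution and association-rule similarity, the class subject is taken as the average of the member vectors, and the classes are supplied by hand instead of being produced by clustering.

```python
import math
from collections import Counter, defaultdict

def quantify(text, title_words=()):
    """Map each word to a contribution score (illustrative weights, not the paper's)."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    contrib = {}
    for word, count in counts.items():
        freq = count / total                              # stand-in for a frequency-based weight
        position = 2.0 if word in title_words else 1.0    # stand-in for the position weight
        contrib[word] = freq * position
    return contrib

def cosine(a, b):
    """Cosine similarity between two sparse contribution vectors (assumed measure)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def class_subject(vectors):
    """Average the member vectors to get a class-level subject vector (assumed)."""
    acc = defaultdict(float)
    for vec in vectors:
        for k, v in vec.items():
            acc[k] += v / len(vectors)
    return dict(acc)

def retrieve(query, classes):
    """classes: {label: {doc_id: contribution vector}}; returns (label, ranked doc ids)."""
    q = quantify(query)
    subjects = {label: class_subject(list(docs.values()))
                for label, docs in classes.items()}
    # Subject mapping: pick the class whose subject vector is closest to the query.
    best = max(subjects, key=lambda label: cosine(q, subjects[label]))
    # Rank only the documents inside the chosen class.
    ranked = sorted(classes[best].items(),
                    key=lambda item: cosine(q, item[1]), reverse=True)
    return best, [doc_id for doc_id, _ in ranked]

# Hand-made classes standing in for the output of the clustering step.
classes = {
    "retrieval": {
        "d1": quantify("document retrieval with text clustering"),
        "d2": quantify("query similarity for document search"),
    },
    "materials": {
        "d3": quantify("steel alloy fatigue strength"),
    },
}
best, ranking = retrieve("text clustering retrieval", classes)
print(best, ranking)
```

Restricting the ranking step to one class is what would reduce the number of similarity computations per query in such a design; the sketch makes no claim about the speed or accuracy figures reported in the paper.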

Info:

Periodical:

Advanced Materials Research (Volumes 605-607)

Pages:

2561-2568

Online since:

December 2012

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved
