A Document Feature Extraction Method Based on Concept-Word List

Abstract:

The Vector Space Model (VSM) usually describes a document under the assumption that words have no semantic relationship to one another, that is, that they are mutually orthogonal. To correct the inaccurate document descriptions that result, this paper proposes a new document description method that introduces the concept-word and calculates the semantic similarity between words using the HowNet ontology database. Comparative experiments show that the new method both improves document feature description in VSM and significantly reduces the dimension of the document vector. The research is useful for document clustering, query word expansion in Web information retrieval, and personalized services in e-business applications.
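
The general idea of replacing word-level dimensions with concept-level ones can be illustrated with a minimal sketch. It is not the authors' algorithm: the greedy grouping rule, the 0.8 threshold, the function names, and the toy similarity table (a stand-in for HowNet-based scores) are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical pairwise similarities standing in for HowNet-based scores.
TOY_SIMILARITY = {
    frozenset(("car", "automobile")): 0.95,
    frozenset(("car", "vehicle")): 0.85,
    frozenset(("automobile", "vehicle")): 0.85,
    frozenset(("price", "cost")): 0.90,
}


def similarity(a, b):
    """Return a similarity in [0, 1]; unknown pairs are treated as unrelated."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def build_concept_list(vocabulary, threshold=0.8):
    """Greedily group words: a word joins the first concept whose
    representative word is at least `threshold` similar to it."""
    concepts = {}  # representative word -> member words
    for word in vocabulary:
        for representative in concepts:
            if similarity(word, representative) >= threshold:
                concepts[representative].append(word)
                break
        else:
            concepts[word] = [word]  # no match: the word starts a new concept
    return concepts


def concept_vector(tokens, concepts):
    """Count tokens per concept, giving one vector dimension per concept."""
    word_to_concept = {
        word: representative
        for representative, members in concepts.items()
        for word in members
    }
    counts = Counter(
        word_to_concept[token] for token in tokens if token in word_to_concept
    )
    return [counts[representative] for representative in concepts]


vocabulary = ["car", "automobile", "vehicle", "price", "cost", "engine"]
concepts = build_concept_list(vocabulary)
tokens = ["car", "automobile", "cost", "engine", "vehicle"]
vector = concept_vector(tokens, concepts)
print(len(vocabulary), "words ->", len(concepts), "concepts:", vector)
```

In this toy setting, six vocabulary words collapse into three concept dimensions, and synonyms such as "car" and "automobile" contribute to the same component instead of being treated as orthogonal.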

Pages: 386-392

Online since: June 2011

Copyright: © 2011 Trans Tech Publications Ltd. All Rights Reserved
