A Document Feature Extraction Method Based on Concept-Word List
When a document is represented in the Vector Space Model (VSM), words are usually assumed to have no semantic relationship with one another, that is, to be mutually orthogonal. To correct the resulting inaccuracy in document description, this paper proposes a new document description method that introduces concept-words, built by calculating the semantic similarity between words with the HowNet ontology database. Comparative experiments show that the new method not only effectively improves document feature description in VSM but also significantly reduces the dimension of document vectors. The method is applicable to document clustering, query expansion in Web information retrieval, and personalized services in e-business applications.
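To make the orthogonality problem and the dimension reduction concrete, the sketch below compares plain VSM cosine similarity with a concept-word variant. It is an illustration under stated assumptions, not the paper's algorithm. HowNet is not queried: the hypothetical `SIMILARITY` table stands in for HowNet-based word similarity. The greedy grouping in `build_concept_words` and its `threshold` parameter are also hypothetical choices made only for this example.

```python
import math
from collections import defaultdict

# HYPOTHETICAL word-pair similarities standing in for HowNet-based scores.
SIMILARITY = {
    frozenset({"car", "automobile"}): 0.95,
}


def word_similarity(a, b):
    """Return the similarity of two words (1.0 for identical words)."""
    if a == b:
        return 1.0
    return SIMILARITY.get(frozenset({a, b}), 0.0)


def build_concept_words(words, threshold):
    """Greedily map each word to the first concept whose representative
    word is at least `threshold` similar; otherwise start a new concept.
    (Assumed grouping rule, for illustration only.)"""
    representatives = []
    mapping = {}
    for w in words:
        for rep in representatives:
            if word_similarity(w, rep) >= threshold:
                mapping[w] = rep
                break
        else:
            representatives.append(w)
            mapping[w] = w
    return mapping


def concept_vector(term_weights, mapping):
    """Merge term weights into concept-word dimensions by summing them."""
    vec = defaultdict(float)
    for term, weight in term_weights.items():
        vec[mapping[term]] += weight
    return dict(vec)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


doc1 = {"car": 2.0, "engine": 1.0}
doc2 = {"automobile": 2.0, "engine": 1.0}
vocabulary = sorted(set(doc1) | set(doc2))
mapping = build_concept_words(vocabulary, threshold=0.9)
c1 = concept_vector(doc1, mapping)
c2 = concept_vector(doc2, mapping)

print(len(vocabulary), len(set(mapping.values())))          # 3 2
print(round(cosine(doc1, doc2), 3), round(cosine(c1, c2), 3))  # 0.2 1.0
```

In plain VSM, "car" and "automobile" occupy orthogonal dimensions, so the two documents share only "engine" and score 0.2. Once the two synonyms are merged into one concept-word, the documents score 1.0 and the vector shrinks from three dimensions to two.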
Z. Y. Zhu et al., "A Document Feature Extraction Method Based on Concept-Word List", Advanced Materials Research, Vol. 267, pp. 386-392, 2011