Chinese Web Page Classification Based on Vector Space Model

Abstract:

Chinese web page classification is an active research area in data mining. This paper proposes a Chinese web page classification algorithm based on the vector space model. The algorithm uses supervised machine learning to implement a web page classifier. It combines text frequency with feature extraction methods and improves the traditional TF-IDF weighting formula. The results show that the classifier is feasible and effective.
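
The abstract does not give the improved weighting formula, so the following is only a minimal sketch of the traditional TF-IDF baseline in a vector space model, not the authors' method. It assumes the pages have already been segmented into word tokens; the function names `tfidf_vectors` and `cosine` and the sample tokens are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a dict of term -> TF-IDF weight.

    Uses the textbook form tf * log(N / df); the paper's improved
    formula is not stated in the abstract and is not reproduced here.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical pre-segmented pages: two about sports, one about finance.
docs = [
    ["体育", "足球", "比赛"],
    ["财经", "股票", "市场"],
    ["足球", "联赛", "比赛"],
]
vectors = tfidf_vectors(docs)
sports_vs_sports = cosine(vectors[0], vectors[2])
sports_vs_finance = cosine(vectors[0], vectors[1])
```

In this sketch, a supervised classifier could assign a new page to the category whose training vectors are most similar to it; terms that occur in every document receive a weight of zero and do not influence the comparison.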

Info:

Periodical:

Advanced Materials Research (Volumes 846-847)

Pages:

1801-1804

Online since:

November 2013

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
