Visual KEA: A Visual Model Based on Keywords Extraction Algorithm for Hub Pages


Abstract:

Automatically extracting keywords from web pages is very important for focused crawlers. Considerable research already exists on automatic keyword extraction from content-intensive web pages. However, extracting keywords automatically from hyperlink-intensive web pages (hub pages) remains a challenge. Page authors often use various means of visual emphasis to make terms related to the page's subject stand out. This paper therefore proposes a visual model of web pages, DOM-PIXEL, which treats each DOM leaf node of a web page as an image element described by a vision vector; each component of the vector corresponds to one means of visual emphasis, and the pixel value is derived from the node's visual energy. The pixel value reflects the relevance of the corresponding DOM node to the subject. The parts emphasized by the page author are highlighted with a particular “color” in the DOM-PIXEL image. The keyword extraction algorithm then only needs to find these “particular points” with a particular “color” automatically. Owing to the intrinsic noise resistance of DOM-PIXEL and its visual energy transfer rule, the visual-model-based keyword extraction algorithm (VisualKEA) proposed in this paper significantly improves performance on hub pages.
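The idea of a vision vector per DOM leaf node can be illustrated with a minimal Python sketch. It is not the paper's implementation: the feature names, the weights, the weighted-sum "energy", and the mean-plus-deviation threshold are all assumptions chosen for illustration, since the abstract gives neither the actual vector components nor the visual energy transfer rule.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Hypothetical emphasis features; the abstract does not list the real components.
FEATURES = ("bold", "large_font", "colored", "heading")
# Hypothetical weights; the paper's visual energy rule is not given in the abstract.
WEIGHTS = (1.0, 1.5, 0.5, 2.0)


@dataclass
class LeafNode:
    text: str
    vision: tuple  # one 0/1 component per entry in FEATURES


def pixel_value(node):
    """Assumed visual energy: weighted sum of the vision vector components."""
    return sum(w * v for w, v in zip(WEIGHTS, node.vision))


def particular_points(nodes, k=1.0):
    """Return nodes whose pixel value exceeds mean + k * std of all pixel values."""
    values = [pixel_value(n) for n in nodes]
    threshold = mean(values) + k * pstdev(values)
    return [n for n, v in zip(nodes, values) if v > threshold]


# A toy hub page: mostly plain links plus one visually emphasized entry.
nodes = [
    LeafNode("Home", (0, 0, 0, 0)),
    LeafNode("Contact", (0, 0, 0, 0)),
    LeafNode("About", (0, 0, 0, 0)),
    LeafNode("Machine Learning Courses", (1, 1, 0, 1)),
    LeafNode("Login", (0, 0, 1, 0)),
]
highlighted = [n.text for n in particular_points(nodes)]
```

In this toy page only the bold, large, heading-styled link rises above the threshold, while the plain navigation links and the merely colored "Login" link are treated as background. A relative threshold is used here so that a page where everything is emphasized does not mark every node as a keyword candidate; whether the paper does something similar is not stated in the abstract.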


Info:

Periodical: Advanced Materials Research (Volumes 532-533)

Pages: 1593-1599

Online since: June 2012

Copyright: © 2012 Trans Tech Publications Ltd. All Rights Reserved
