Visual KEA: A Visual Model Based on Keywords Extraction Algorithm for Hub Pages

Hao Peng; Zhen Chen

doi:10.4028/www.scientific.net/AMR.532-533.1593

Abstract:

Automatically extracting keywords from web pages is important for focused crawlers (spiders). Much research already addresses automatic keyword extraction from content-intensive web pages. However, extracting keywords automatically from hyperlink-intensive web pages (hub pages) remains a challenge. Page authors often use a variety of visual emphasis techniques to make terms related to the page's subject stand out. This paper therefore proposes a visual model of web pages, DOM-PIXEL, which treats each DOM leaf node of a web page as a picture element (pixel) described by a visual vector, in which each component corresponds to one means of visual emphasis. The pixel value is derived from the node's visual energy and reflects the relevance of the corresponding DOM node to the subject. The parts emphasized by the page author are highlighted with a particular “color” in the DOM-PIXEL image. The keyword extraction algorithm then only needs to find these “particular points” with a particular “color” automatically. Owing to the intrinsic noise resistance of DOM-PIXEL and its visual energy transfer rule, the visual-model-based keyword extraction algorithm (VisualKEA) proposed in this paper significantly improves performance on hub pages.
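The idea of scoring DOM leaf nodes by visual emphasis can be illustrated with a minimal Python sketch. It is not the authors' DOM-PIXEL model or VisualKEA algorithm: the abstract does not specify the emphasis features, the energy formula, or the transfer rule. The feature set (`FEATURES`), the tag mapping (`TAG_TO_FEATURE`), the weights (`WEIGHTS`), and the rule that a text leaf inherits emphasis from every enclosing tag are all assumptions made here for illustration.

```python
from html.parser import HTMLParser

# Hypothetical emphasis features and weights; the abstract names neither.
FEATURES = ("heading", "bold", "italic", "link")
TAG_TO_FEATURE = {
    "h1": "heading", "h2": "heading", "h3": "heading",
    "b": "bold", "strong": "bold",
    "i": "italic", "em": "italic",
    "a": "link",
}
WEIGHTS = {"heading": 3.0, "bold": 2.0, "italic": 1.0, "link": 0.5}


class LeafCollector(HTMLParser):
    """Collects text leaves together with the emphasis tags enclosing them."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open emphasis tags
        self.leaves = []      # list of (text, visual_vector) pairs

    def handle_starttag(self, tag, attrs):
        if tag in TAG_TO_FEATURE:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Remove the most recent matching open tag, tolerating sloppy markup.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Assumed rule: a leaf inherits emphasis from every enclosing tag.
        active = {TAG_TO_FEATURE[t] for t in self.open_tags}
        vector = tuple(1 if f in active else 0 for f in FEATURES)
        self.leaves.append((text, vector))


def visual_energy(vector):
    """Weighted sum of the vector components (an assumed scoring rule)."""
    return sum(WEIGHTS[f] * v for f, v in zip(FEATURES, vector))


def top_leaves(html, k=3):
    """Return the k text leaves with the highest assumed visual energy."""
    parser = LeafCollector()
    parser.feed(html)
    parser.close()
    scored = [(visual_energy(vec), text) for text, vec in parser.leaves]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]
```

On a hub-style page made mostly of links, a call such as `top_leaves(page_html, k=5)` ranks text that sits inside headings or bold links above plain link text. This mirrors, in a very simplified way, the abstract's notion of finding the points that the page author emphasized.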

Info:

Periodical:

Advanced Materials Research (Volumes 532-533)

Pages:

1593-1599

DOI:

https://doi.org/10.4028/www.scientific.net/AMR.532-533.1593

Online since:

June 2012

Authors:

Hao Peng, Zhen Chen

Keywords:

Automatic Keyword Extraction, DOM-PIXEL, Visual Energy, Visual Energy Transferring Rule, Visual Vector
