Visual KEA: A Visual Model Based on Keywords Extraction Algorithm for Hub Pages
Abstract:
Automatically extracting keywords from web pages is very important for focused spiders. A considerable amount of research already addresses automatic keyword extraction from content-intensive web pages. However, automatically extracting keywords from hyperlink-intensive web pages (hub pages) remains a challenge. Web page authors often use various kinds of visual emphasis to make terms related to the subject stand out. This paper therefore proposes a visual model of web pages, DOM-PIXEL, which treats each DOM leaf node of a web page as an image element described by a vision vector, in which each component corresponds to one means of visual emphasis; the pixel value is derived from the visual energy. The pixel value reflects the relevance of the corresponding DOM node to the subject. The parts emphasized by the page author are highlighted with a particular “color” in the DOM-PIXEL image. The only requirement for the keyword extraction algorithm is then to find these “particular points” with a particular “color” automatically. Owing to the intrinsic anti-noise ability of DOM-PIXEL and its visual energy transfer rule, the visual-model-based keyword extraction algorithm (VisualKEA) proposed in this paper significantly improves performance on hub pages.
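The idea of a vision vector per DOM leaf node can be illustrated with a minimal Python sketch. It is not the paper's implementation: the set of emphasis tags, their weights, the weighted-sum "visual energy", and the mean-plus-deviation threshold are all assumptions made for illustration, and the visual energy transfer rule mentioned in the abstract is not modeled, since the abstract does not specify it.

```python
from html.parser import HTMLParser
from statistics import mean, pstdev

# Hypothetical emphasis tags and weights; the abstract does not list them.
EMPHASIS_WEIGHTS = {"b": 1.0, "strong": 1.0, "em": 0.5, "h1": 2.0, "h2": 1.5}


class LeafCollector(HTMLParser):
    """Collect text leaves together with a 0/1 vector of enclosing emphasis tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open tags
        self.leaves = []      # list of (text, vision_vector)

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the most recent matching tag, tolerating unbalanced HTML.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            # One component per (assumed) means of visual emphasis.
            vector = [1.0 if t in self.open_tags else 0.0 for t in EMPHASIS_WEIGHTS]
            self.leaves.append((text, vector))


def visual_energy(vector):
    """Assumed scoring: weighted sum of the vision vector components."""
    return sum(w * x for w, x in zip(EMPHASIS_WEIGHTS.values(), vector))


def highlighted_leaves(html, k=1.0):
    """Return leaf texts whose energy exceeds mean + k * std (assumed rule)."""
    parser = LeafCollector()
    parser.feed(html)
    energies = [visual_energy(v) for _, v in parser.leaves]
    if not energies:
        return []
    threshold = mean(energies) + k * pstdev(energies)
    return [text for (text, _), e in zip(parser.leaves, energies) if e > threshold]
```

On a link list where a single anchor text is wrapped in `<b>`, only that text is returned; on a page where every leaf has the same emphasis, nothing stands out and the result is empty.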
Info:
Pages: 1593-1599
Online since: June 2012
Copyright: © 2012 Trans Tech Publications Ltd. All Rights Reserved