An Improved VIPS-Based Algorithm of Extracting Web Content

Long Li; An Min Zhou; Yong Fang; Liang Liu; Qian Wu

doi:10.4028/www.scientific.net/AMM.651-653.1806

Paper Titles

A Software Engineering Method Based on Bionic Components
p.1776

A Study of Knowledge Management System Based on Data Resources of Medical Industry
p.1784

An Efficient Intra-Cluster MAC Protocol in Underwater Acoustic Sensor Networks
p.1790

An Event Bridge Framework for Modeling and Simulating Networked Hybrid Dynamic Behaviors
p.1798

An Improved VIPS-Based Algorithm of Extracting Web Content
p.1806

Complex Network Theory in the Application of Optimization Topology Network
p.1811

Cooperative Space-Time Network Coding for Multi-Sources Distributed Cooperative Network
p.1816

Design and Implementation of an Open Source Content Management System
p.1821

Design and Implementation of Software Configuration Tool for Integrated Modular Avionics
p.1827

Abstract:

The paper studies the VIPS algorithm and improves on it, since VIPS suffers from complex rules and low performance. Taking advantage of the fact that Web 2.0 pages are typically structured with DIV elements, and combining this with a statistics-based method, the paper introduces DVIPS, an algorithm for extracting the main content of web pages.
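The abstract does not spell out DVIPS's actual rules or statistics. As a rough, hypothetical illustration of the two ingredients it names (treating DIV elements as candidate content blocks and ranking them with simple text statistics), the Python sketch below scores each DIV by its own text length, discounted by the share of that text sitting inside links, and discards link-heavy blocks as likely navigation. The class `DivStats`, the function `pick_main_div`, the link-ratio threshold of 0.5 and the scoring formula are all assumptions made for illustration; none of them comes from the paper, and the sketch uses none of VIPS's visual cues.

```python
from html.parser import HTMLParser


class DivStats(HTMLParser):
    """Hypothetical helper: per-<div> counts of text and link-text characters.

    Text is credited only to the innermost open <div>, so an outer wrapper
    does not inherit the text of the blocks nested inside it.
    """

    def __init__(self):
        super().__init__()
        self.stack = []   # indices into self.blocks for currently open divs
        self.blocks = []  # one dict per <div>, in document order
        self.in_link = 0  # nesting depth of open <a> tags
        self.skip = 0     # nesting depth of open <script>/<style> tags

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            a = dict(attrs)
            label = a.get("id") or a.get("class") or ""
            self.blocks.append({"id": label, "text": 0, "link": 0})
            self.stack.append(len(self.blocks) - 1)
        elif tag == "a":
            self.in_link += 1
        elif tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.stack:
            self.stack.pop()
        elif tag == "a" and self.in_link:
            self.in_link -= 1
        elif tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        n = len(data.strip())
        if self.skip or not self.stack or n == 0:
            return  # ignore script/style, text outside any div, whitespace
        block = self.blocks[self.stack[-1]]
        block["text"] += n
        if self.in_link:
            block["link"] += n


def pick_main_div(html, max_link_ratio=0.5):
    """Return the stats dict of the highest-scoring <div>, or None."""
    parser = DivStats()
    parser.feed(html)
    parser.close()
    best, best_score = None, 0.0
    for block in parser.blocks:
        if block["text"] == 0:
            continue
        ratio = block["link"] / block["text"]
        if ratio > max_link_ratio:
            continue  # mostly link text: likely a navigation bar
        score = block["text"] * (1.0 - ratio)
        if score > best_score:
            best, best_score = block, score
    return best


page = """
<html><body>
<div id="nav"><a href="/">Home</a> <a href="/news">News</a></div>
<div id="content"><p>The main article text is long enough to dominate
the page, with only <a href="/x">one link</a> inside it.</p></div>
<div id="footer">Copyright <a href="/c">2014</a></div>
</body></html>
"""

main = pick_main_div(page)
print(main["id"])  # content
```

Crediting text only to the innermost open DIV keeps a page-wide wrapper DIV from winning by default; a fuller implementation would also need to merge sibling blocks when the main content is split across several DIVs.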

Info:

Periodical:

Applied Mechanics and Materials (Volumes 651-653)

Pages:

1806-1810

DOI:

https://doi.org/10.4028/www.scientific.net/AMM.651-653.1806

Online since:

September 2014

Keywords:

Page Extract, Statistics, VIPS Algorithm, Web2.0
