An Improved VIPS-Based Algorithm of Extracting Web Content

Abstract:

This paper studies the VIPS (vision-based page segmentation) algorithm and addresses two of its deficiencies: complex rules and low performance. Exploiting the DIV-based structure typical of Web 2.0 pages, and combining it with a method based on statistical information, the paper introduces DVIPS, an algorithm for extracting the main content of web pages.
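The abstract gives no details of DVIPS itself. As a rough illustration of the general idea it names (treating DIV elements as candidate blocks and ranking them with simple text statistics), a minimal sketch might look like the following. The class `DivBlockScorer`, the function `main_content_block`, and the score (text length minus link-text length) are assumptions made for this example and are not taken from the paper.

```python
from html.parser import HTMLParser


class DivBlockScorer(HTMLParser):
    """Hypothetical helper: records text and link-text length per <div>."""

    def __init__(self):
        super().__init__()
        self.stack = []      # <div> blocks that are currently open
        self.blocks = []     # <div> blocks that have been closed
        self.link_depth = 0  # greater than 0 while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.stack.append({"id": dict(attrs).get("id"), "text": 0, "link": 0})
        elif tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.stack:
            self.blocks.append(self.stack.pop())
        elif tag == "a" and self.link_depth:
            self.link_depth -= 1

    def handle_data(self, data):
        n = len(data.strip())
        if n == 0 or not self.stack:
            return
        # Credit the text only to the innermost open <div>.
        block = self.stack[-1]
        block["text"] += n
        if self.link_depth:
            block["link"] += n


def main_content_block(html):
    """Return the <div> with the most non-link text, or None if there is none."""
    parser = DivBlockScorer()
    parser.feed(html)
    parser.close()
    # Assumed score: characters of text outside links.
    return max(parser.blocks, key=lambda b: b["text"] - b["link"], default=None)


page = """
<div id="nav"><a href="/">Home</a> <a href="/news">News</a></div>
<div id="article">A long paragraph of body text that is not inside any link.</div>
<div id="footer"><a href="/about">About us</a></div>
"""
print(main_content_block(page)["id"])  # prints: article
```

Navigation and footer blocks consist almost entirely of link text, so they score near zero, while the article block keeps its full text length. A real extractor would need more statistics than this single score, for example punctuation density or block position.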

Info:

Pages: 1806-1810

Online since: September 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
