Research of the Web Information Extraction Technology on Tourism Theme

Abstract:

With the development of web technology, dynamic web pages and personalized page content have become increasingly common. Page content now varies widely and page structures differ greatly from one site to another, so traditional approaches to web information extraction have difficulty adapting. This paper proposes a web information extraction method based on an extended XPath policy, derived from an analysis of the structural features of tourism-themed web pages. The method avoids the defects of traditional web information extraction techniques: it is simple and practical, cleans pages efficiently and accurately, and reduces system overhead.
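The abstract does not describe the extended XPath policy itself, so the following is only a minimal sketch of plain XPath-based extraction, the general technique the method builds on. The page fragment, class names, rule table, and function name are all hypothetical, and the sketch uses the limited XPath subset in Python's standard library rather than the paper's extension.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML fragment standing in for a tourism listing page.
# Real pages are rarely well-formed and would need an HTML-tolerant parser first.
PAGE = """
<html>
  <body>
    <div class="nav"><a href="/home">Home</a></div>
    <div class="spot">
      <h2 class="name">Example Lake</h2>
      <span class="price">Free</span>
    </div>
    <div class="spot">
      <h2 class="name">Example Museum</h2>
      <span class="price">60</span>
    </div>
    <div class="ad">Book cheap flights now</div>
  </body>
</html>
"""

# Hypothetical extraction rules: one path selects each record node,
# the others select fields relative to that node.
RECORD_PATH = ".//div[@class='spot']"
FIELD_PATHS = {
    "name": "./h2[@class='name']",
    "price": "./span[@class='price']",
}


def extract_records(page_source):
    """Return one dict per record node, ignoring navigation and ad blocks."""
    root = ET.fromstring(page_source.strip())
    records = []
    for node in root.findall(RECORD_PATH):
        record = {}
        for field, path in FIELD_PATHS.items():
            match = node.find(path)
            has_text = match is not None and match.text
            record[field] = match.text.strip() if has_text else None
        records.append(record)
    return records


records = extract_records(PAGE)
```

Because the rules select only the record nodes, the navigation and advertisement blocks never reach the output, which is the sense in which path-based extraction also cleans the page.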

Info:

Pages:

503-506

Online since:

September 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
