Study on Web Information Intelligent Extraction for Agricultural Product Quantity Security System

Abstract:

Web information is the main data source for the agricultural product quantity security system, which provides comprehensive analysis and early warning for national agriculture based on large amounts of basic data. This paper presents a Web information extraction architecture and a novel approach to wrapper construction. The wrapper is intelligent in that it distinguishes and extracts both intensive and sparse data in Web pages in a single pass. During wrapper construction, hierarchical clustering is used to determine the key information nodes, and DOM techniques and heuristic rules are applied to generate extraction expressions according to the type of data. Experiments on a large number of Web pages from different Web sites indicate that the extraction method achieves high recall and precision and is feasible and efficient.
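The abstract does not give the wrapper's details, so the following is only a minimal sketch of the general idea of separating intensive from sparse data using DOM structure. It groups text nodes by their tag path and treats paths that repeat often as intensive (list-like or table-like) data. Grouping by identical tag path is a simple stand-in for the paper's hierarchical clustering, and the names `PathCollector`, `split_intensive_sparse`, and the `min_repeats` threshold are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class PathCollector(HTMLParser):
    """Record each non-empty text node together with its DOM tag path."""

    # Tags that never have a closing tag, so they are not pushed on the stack.
    VOID = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []                   # currently open tags
        self.by_path = defaultdict(list)  # tag path -> list of text values

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy markup.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.by_path["/".join(self.stack)].append(text)


def split_intensive_sparse(html, min_repeats=3):
    """Split text nodes into intensive and sparse groups.

    Assumption (not from the paper): a tag path that holds at least
    `min_repeats` text nodes is treated as intensive data.
    """
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    intensive = {p: t for p, t in collector.by_path.items() if len(t) >= min_repeats}
    sparse = {p: t for p, t in collector.by_path.items() if len(t) < min_repeats}
    return intensive, sparse
```

On a page with one heading and a small price table, the table cells would share a single tag path and fall into the intensive group, while the heading would be reported as sparse. A real wrapper would then need an extraction expression per group, which this sketch does not attempt.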

Info:

Periodical:

Advanced Materials Research (Volumes 108-111)

Pages:

222-227

Online since:

May 2010

Copyright:

© 2010 Trans Tech Publications Ltd. All Rights Reserved
