Study on Web Information Intelligent Extraction for Agricultural Product Quantity Security System
Web information is the main data source for the agricultural product quantity security system, which uses large amounts of basic data to provide comprehensive analysis and early warning for national agriculture. This paper presents a Web information extraction architecture and a novel approach to wrapper construction. The wrapper is intelligent in that it distinguishes and extracts both intensive and sparse data in Web pages in a single pass. During wrapper construction, hierarchical clustering is used to determine the key information nodes, and DOM techniques and heuristic rules are applied to generate extraction expressions according to the type of data. Experiments on a large number of Web pages from different Web sites indicate that the extraction method is feasible and efficient, with high recall and precision.
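To make the distinction between intensive and sparse data concrete, the following is a minimal Python sketch, not the authors' wrapper. It assumes well-formed markup that the standard-library `xml.etree.ElementTree` parser can read, and it replaces the hierarchical clustering step with a much simpler heuristic: a node whose children all share the same tag structure is treated as an intensive (record-like) region, while an isolated leaf with text is treated as sparse data. The sample page, the names `signature`, `classify`, and `extract`, and the `min_repeats` threshold are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page: two sparse fields and one intensive (tabular) region.
PAGE = """<html><body>
<h1>Grain price report</h1>
<p>2009-06-01</p>
<table>
<tr><td>Wheat</td><td>1.82</td></tr>
<tr><td>Maize</td><td>1.55</td></tr>
<tr><td>Rice</td><td>2.10</td></tr>
</table>
</body></html>"""


def signature(node):
    """Tag structure of a node: its own tag plus the tags of its children."""
    return (node.tag, tuple(child.tag for child in node))


def classify(node, path, min_repeats=3, out=None):
    """Walk the DOM tree and emit (kind, path expression) pairs.

    Heuristic (an assumption of this sketch): at least `min_repeats`
    children with identical signatures mark an intensive region.
    """
    if out is None:
        out = []
    path = f"{path}/{node.tag}"
    children = list(node)
    sigs = {signature(child) for child in children}
    if len(children) >= min_repeats and len(sigs) == 1:
        out.append(("intensive", f"{path}/{children[0].tag}"))
        return out
    if not children and (node.text or "").strip():
        out.append(("sparse", path))
    for child in children:
        classify(child, path, min_repeats, out)
    return out


def extract(root, kind, expr):
    """Apply a generated path expression, relative to the root element."""
    relative = "." + expr[len("/" + root.tag):]
    nodes = root.findall(relative)
    if kind == "intensive":
        # One list of cell texts per repeated record.
        return [[(cell.text or "").strip() for cell in row] for row in nodes]
    return [(node.text or "").strip() for node in nodes]


root = ET.fromstring(PAGE)
for kind, expr in classify(root, ""):
    print(kind, expr, extract(root, kind, expr))
```

A real wrapper would need an HTML parser that tolerates malformed pages and a similarity measure between subtrees rather than exact signature equality, which is where a clustering step such as the one described above would come in.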
S. D. Zhang et al., "Study on Web Information Intelligent Extraction for Agricultural Product Quantity Security System", Advanced Materials Research, Vols. 108-111, pp. 222-227, 2010