Study on Web Information Intelligent Extraction for Agricultural Product Quantity Security System

Shu Dong Zhang; Y. Qin; N.M. Yao

doi:10.4028/www.scientific.net/AMR.108-111.222

Abstract:

Web information is the main data source for the agricultural product quantity security system, which draws on large amounts of basic data to provide comprehensive analysis and early warning for national agriculture. This paper presents a Web information extraction architecture and a novel approach to wrapper construction. The wrapper is intelligent in that it distinguishes both intensive and sparse data in Web pages and extracts them in a single pass. During wrapper construction, hierarchical clustering is used to determine the key information nodes, and DOM techniques and heuristic rules are applied to generate extraction expressions suited to each type of data. Experiments on a large number of Web pages from different Web sites indicate that the extraction method is feasible and efficient, with high recall and precision.
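
The abstract does not give the clustering procedure or the heuristic rules, so the following is only an illustrative sketch of the general idea of separating intensive from sparse data by DOM position. It swaps in a much simpler technique than the paper's hierarchical clustering: text nodes are grouped by their DOM tag path, and a path that repeats often is treated as intensive data. The names `PathCollector`, `split_dense_sparse`, and the `min_repeat` threshold are assumptions made for illustration and do not come from the paper.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class PathCollector(HTMLParser):
    """Record every non-empty text node together with its DOM tag path."""

    def __init__(self):
        super().__init__()
        self.stack = []                    # currently open tags
        self.by_path = defaultdict(list)   # tag path -> list of text nodes

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.by_path["/" + "/".join(self.stack)].append(text)


def split_dense_sparse(html, min_repeat=3):
    """Assumed heuristic: a tag path seen at least min_repeat times is
    'intensive' data (e.g. table cells); all other paths are 'sparse'."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    dense = {p: t for p, t in collector.by_path.items() if len(t) >= min_repeat}
    sparse = {p: t for p, t in collector.by_path.items() if len(t) < min_repeat}
    return dense, sparse
```

Calling `split_dense_sparse` on a page with a heading and a three-row table would place the table cells under one repeated path and the heading under a path of its own. In a fuller wrapper, each path would then be turned into an extraction expression; a repeat-count threshold is only a stand-in for whatever clustering criterion the authors actually use.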