Vision-Based Label Extraction and Matching



Label extraction and matching is the basis of many form-based web applications. This paper proposes a vision-based element-label matching approach. First, the factors that affect label matching are analyzed in depth; then a method is proposed for reconstructing the query interface by analyzing its HTML code. Finally, element-label matching is performed by jointly considering tag, text semantics and position features. Experiments on 278 query interfaces in 8 typical domains demonstrate the feasibility and effectiveness of the proposed approach.
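The abstract does not give the scoring formulas, so the following is only a hypothetical sketch of how tag, text and position features might be combined to pick a label for each form element. Everything in it is an assumption for illustration, not the authors' method. This includes the `Box`/`Element`/`Label` types, the rendered coordinates (assumed to come from a layout engine), the use of `<label for="...">` as the tag feature, word-overlap (Jaccard) as a stand-in for text semantics, the distance scaling, the "left of or above" bonus, and the weights.

```python
from __future__ import annotations

import math
import re
from dataclasses import dataclass


@dataclass
class Box:
    """Rendered bounding box in page pixels (assumed input from a layout engine)."""
    x: float
    y: float
    w: float
    h: float

    @property
    def cx(self) -> float:
        return self.x + self.w / 2

    @property
    def cy(self) -> float:
        return self.y + self.h / 2


@dataclass
class Element:
    name: str     # HTML name attribute of the form control
    elem_id: str  # HTML id attribute
    box: Box


@dataclass
class Label:
    text: str
    box: Box
    for_id: str = ""  # filled in when the text comes from <label for="...">


def tokens(s: str) -> set[str]:
    """Lower-case word tokens; splits camelCase, snake_case and punctuation."""
    s = re.sub(r"([a-z])([A-Z])", r"\1 \2", s)
    return {t for t in re.split(r"[^a-z0-9]+", s.lower()) if t}


def tag_score(label: Label, elem: Element) -> float:
    # Tag feature (assumed): an explicit <label for="id"> binding is strong evidence.
    return 1.0 if label.for_id and label.for_id == elem.elem_id else 0.0


def text_score(label: Label, elem: Element) -> float:
    # Text feature (hypothetical stand-in for semantic similarity):
    # Jaccard overlap between label words and the element's name/id words.
    a = tokens(label.text)
    b = tokens(elem.name) | tokens(elem.elem_id)
    return len(a & b) / len(a | b) if a | b else 0.0


def position_score(label: Label, elem: Element) -> float:
    # Position feature: closer box centres score higher; labels left of or
    # above the element get a 1.5x bonus (an assumed layout convention).
    d = math.hypot(label.box.cx - elem.box.cx, label.box.cy - elem.box.cy)
    score = 1.0 / (1.0 + d / 100.0)
    if label.box.cx <= elem.box.x or label.box.cy <= elem.box.y:
        score *= 1.5
    return score


def match_labels(elements, labels, w_tag=2.0, w_text=1.0, w_pos=0.5):
    """Assign each element the label with the highest weighted score (weights are illustrative)."""
    def combined(label: Label, elem: Element) -> float:
        return (w_tag * tag_score(label, elem)
                + w_text * text_score(label, elem)
                + w_pos * position_score(label, elem))

    result = {}
    for elem in elements:
        best = max(labels, key=lambda lab: combined(lab, elem), default=None)
        result[elem.name] = best.text if best is not None else None
    return result


if __name__ == "__main__":
    elements = [
        Element("title", "q_title", Box(120, 10, 150, 20)),
        Element("author", "q_author", Box(120, 40, 150, 20)),
    ]
    labels = [
        Label("Title:", Box(10, 10, 60, 20)),
        Label("Author name:", Box(10, 40, 90, 20), for_id="q_author"),
        Label("Search books", Box(10, -30, 200, 20)),  # a heading, not a label
    ]
    print(match_labels(elements, labels))
```

In the demo, "Title:" is chosen for the `title` field on text overlap and position, and "Author name:" for the `author` field mainly through its `for` binding. The nearby heading "Search books" is not chosen for either field. This sketch picks the best label per element independently, so two elements could share a label. A fuller matcher would likely enforce a one-to-one assignment.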



Edited by: Helen Zhang and David Jin




C. M. Wu et al., "Vision-Based Label Extraction and Matching", Advanced Materials Research, Vol. 459, pp. 155-160, 2012

Online since: January 2012



