Web Image Classification Using an Optimized Feature Set

Abstract:

Redundant images, which are abundant in World Wide Web pages, need to be removed so that the pages can be transformed or simplified for display on small-screen devices. Classifying the removable images on a Web page according to the uniqueness of their content allows a simpler representation of the page. For such classification, machine-learning methods can be used to categorize images into two groups: eliminable and non-eliminable. We use two representative learning methods, the Naïve Bayesian classifier and C4.5 decision trees. For Web image classification, we propose new features that have expressive power for the Web images to be classified. We apply image samples to the two classifiers and analyze the results. In addition, we propose an algorithm that constructs an optimized subset of the whole feature set, containing the features most influential for classification. Using the optimized feature set is found to improve classification accuracy markedly.
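
As an illustration only, the idea of picking an influential feature subset for a two-class image classifier can be sketched in Python. The abstract does not describe the proposed algorithm, so the sketch substitutes a generic greedy forward (wrapper) selection around a small Gaussian Naïve Bayes classifier; it is not the authors' method, and C4.5 is omitted. The two features (a size-like value and a pure-noise value), the synthetic samples, and all function names are hypothetical.

```python
import math
import random

def fit_gaussian_nb(X, y, cols):
    """Estimate class priors and per-feature (mean, variance) on the chosen columns."""
    model = {}
    for label in set(y):
        rows = [[x[c] for c in cols] for x, t in zip(X, y) if t == label]
        stats = []
        for values in zip(*rows):
            mean = sum(values) / len(values)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in values) / len(values) + 1e-6
            stats.append((mean, var))
        model[label] = (len(rows) / len(y), stats)
    return model

def predict(model, x, cols):
    """Return the class with the highest log-posterior (features assumed independent)."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for c, (mean, var) in zip(cols, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x[c] - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(train, valid, cols):
    """Train on `train` using only `cols` and return accuracy on `valid`."""
    model = fit_gaussian_nb(train[0], train[1], cols)
    hits = sum(predict(model, x, cols) == t for x, t in zip(*valid))
    return hits / len(valid[1])

def forward_select(train, valid, n_features):
    """Greedy wrapper selection: keep adding the feature that most improves accuracy."""
    selected, best = [], 0.0
    while len(selected) < n_features:
        candidates = [c for c in range(n_features) if c not in selected]
        score, col = max((accuracy(train, valid, selected + [c]), c) for c in candidates)
        if score <= best:  # stop when no remaining feature helps
            break
        selected.append(col)
        best = score
    return selected, best

def make_samples(n, rng):
    """Synthetic data (an assumption): column 0 is informative, column 1 is noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.choice(["eliminable", "non-eliminable"])
        size = rng.gauss(20, 8) if label == "eliminable" else rng.gauss(200, 60)
        X.append([size, rng.gauss(0, 1)])
        y.append(label)
    return X, y

rng = random.Random(0)
train = make_samples(200, rng)
valid = make_samples(100, rng)
selected, best = forward_select(train, valid, n_features=2)
print("selected feature columns:", selected, "validation accuracy:", round(best, 3))
```

In this toy setting the informative column is picked first, and the noise column is kept only if it happens to raise the validation accuracy. A wrapper approach like this evaluates each candidate subset with the classifier itself, which is simple but costs one training run per candidate.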

Info:

Periodical:

Key Engineering Materials (Volumes 277-279)

Pages:

361-368

Online since:

January 2005

Copyright:

© 2005 Trans Tech Publications Ltd. All Rights Reserved
