Multi-Channel MKL for Video Human Action Recognition

Abstract:

Human action recognition in videos plays an important role in computer vision and image understanding. This paper proposes a method that combines a multi-channel bag of visual words with multiple kernel learning (MKL). Each video is described by a multi-channel bag of visual words, and an MKL classifier performs the action classification. Each kernel function of the classifier corresponds to one video channel, which avoids noise interference from the other channels. The proposed approach improves the ability to distinguish easily confused actions. Experiments on the KTH dataset show that the method achieves a high average recognition rate, comparable with that of state-of-the-art methods.
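The per-channel kernel idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the channels, the codebook construction, or the kernel type, so the channel names, the nearest-codeword quantization, and the histogram intersection kernel below are all assumptions. The channel weights are also fixed by hand here, whereas an MKL classifier would learn them during training.

```python
import math


def bovw_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword and
    return an L1-normalized bag-of-visual-words histogram."""
    counts = [0.0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda k: math.dist(d, codebook[k]))
        counts[nearest] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


def describe_video(descriptors_by_channel, codebooks):
    """Build one histogram per channel (hypothetical channel names)."""
    return {ch: bovw_histogram(descs, codebooks[ch])
            for ch, descs in descriptors_by_channel.items()}


def intersection_kernel(h1, h2):
    """Histogram intersection kernel (an assumed choice of base kernel)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def combined_kernel(video_a, video_b, weights):
    """Weighted sum of per-channel kernels. Each base kernel only sees
    the histograms of its own channel, so noise in one channel cannot
    leak into another channel's similarity score.
    Assumption: weights are non-negative and given, not learned."""
    return sum(w * intersection_kernel(video_a[ch], video_b[ch])
               for ch, w in weights.items())


# Toy data: two hypothetical channels with tiny two-word codebooks.
codebooks = {
    "channel_a": [(0.0, 0.0), (1.0, 1.0)],
    "channel_b": [(0.0,), (1.0,)],
}
weights = {"channel_a": 0.5, "channel_b": 0.5}

video_1 = describe_video(
    {"channel_a": [(0.1, 0.0), (0.9, 1.0)], "channel_b": [(0.1,)]},
    codebooks)
video_2 = describe_video(
    {"channel_a": [(0.0, 0.1)], "channel_b": [(0.9,)]},
    codebooks)

similarity = combined_kernel(video_1, video_2, weights)
```

In a full pipeline, the matrix of `combined_kernel` values over all training videos would be passed to a kernel classifier such as an SVM, with the weights optimized jointly with the classifier.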

Info:

Pages:

1571-1574

Online since:

August 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
