An Adaptive Stacked Denoising Auto-Encoder Architecture for Human Action Recognition

Abstract:

In this paper, a stacked denoising auto-encoder architecture with an adaptive learning rate is presented for human action recognition based on skeleton features. First, a Kinect sensor is used to capture skeleton images and extract skeleton features. Then, an adaptive stacked denoising auto-encoder with three hidden layers is constructed and pre-trained in an unsupervised manner, yielding a set of trained weights. Finally, a neural network is constructed for action recognition, with the pre-trained weights used as its initial values in place of random initialization. Experimental results on a Kinect dataset of human actions collected in our experiments show that the proposed method achieves better robustness and accuracy than classic classification methods.
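
The pipeline summarized in the abstract (greedy denoising pre-training of three hidden layers, followed by a supervised classifier initialized with the pre-trained weights rather than random values) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the layer sizes, corruption level, placeholder data, and the use of Adam to stand in for the paper's adaptive learning-rate scheme are all assumptions.

# Minimal sketch (PyTorch): denoising pre-training + supervised fine-tuning.
# Layer sizes, corruption level, and Adam as a stand-in for the paper's
# adaptive learning rate are illustrative assumptions, not the paper's values.
import torch
import torch.nn as nn

sizes = [60, 128, 64, 32]      # skeleton-feature dimension + three hidden layers (assumed)
noise, n_classes = 0.3, 8      # corruption level / number of action classes (assumed)
encoders = [nn.Linear(sizes[i], sizes[i + 1]) for i in range(3)]

def pretrain_layer(enc, data, epochs=20):
    """Greedily pre-train one layer: corrupt the input, encode, decode,
    and minimize the reconstruction error against the clean input."""
    dec = nn.Linear(enc.out_features, enc.in_features)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(epochs):
        corrupted = data * (torch.rand_like(data) > noise)      # masking noise
        loss = nn.functional.mse_loss(dec(torch.sigmoid(enc(corrupted))), data)
        opt.zero_grad(); loss.backward(); opt.step()
    return torch.sigmoid(enc(data)).detach()                    # input to the next layer

# Unsupervised, layer-wise pre-training on (placeholder) skeleton features.
features = torch.rand(256, sizes[0])
h = features
for enc in encoders:
    h = pretrain_layer(enc, h)

# Supervised classifier initialized with the pre-trained weights instead of
# random values, then fine-tuned on (placeholder) action labels.
classifier = nn.Sequential(
    encoders[0], nn.Sigmoid(),
    encoders[1], nn.Sigmoid(),
    encoders[2], nn.Sigmoid(),
    nn.Linear(sizes[3], n_classes),
)
labels = torch.randint(0, n_classes, (256,))
opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
for _ in range(50):
    loss = nn.functional.cross_entropy(classifier(features), labels)
    opt.zero_grad(); loss.backward(); opt.step()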

Info:

Periodical:

Pages:

403-409

Online since:

September 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
