Video Based Visual Speech Feature Model Construction

Abstract:

This paper presents a solution for constructing a Chinese visual speech feature model based on hidden Markov models (HMMs). We propose and discuss three kinds of visual speech representation: lip geometrical features, lip motion features, and lip texture features. A model that combines local LBP (local binary pattern) and global DCT (discrete cosine transform) texture information performs better than either feature alone. Likewise, a model that combines local LBP and geometrical information performs better than either feature alone. Based on the viseme recognition rates obtained from the models, the paper shows that an HMM, which describes the dynamics of speech, coupled with the combined feature describing global and local texture, is the best model.
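The combination of local and global texture information described above can be illustrated with a minimal sketch. The abstract does not specify the LBP variant, the number of DCT coefficients retained, or any normalization, so everything below is an assumption: a basic 8-neighbour LBP histogram, an orthonormal 2D DCT-II truncated to a low-frequency block, and plain concatenation. The function names (`lbp_histogram`, `dct_lowfreq`, `combined_feature`) and the `keep` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def lbp_histogram(img):
    """Local texture: normalised 256-bin histogram of basic 8-neighbour LBP codes."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.int64)
    # Neighbour offsets, clockwise from top-left; each one contributes one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def dct_lowfreq(img, keep=4):
    """Global texture: top-left keep x keep block of the 2D DCT-II coefficients."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    coeffs = dct_matrix(h) @ img @ dct_matrix(w).T
    return coeffs[:keep, :keep].ravel()


def combined_feature(img, keep=4):
    """Assumed fusion rule: concatenate the local (LBP) and global (DCT) vectors."""
    return np.concatenate([lbp_histogram(img), dct_lowfreq(img, keep)])


# Example: one feature vector per grey-level lip-region frame (synthetic data here).
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(16, 24))
feature = combined_feature(frame)
```

In a recognizer of the kind the abstract describes, one such vector per video frame would form the observation sequence passed to the HMM. How the paper weights or normalizes the two parts before fusion is not stated here.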

Info:

Pages:

1367-1371

Online since:

June 2012

Copyright:

© 2012 Trans Tech Publications Ltd. All Rights Reserved
