Noise-Robust Voice Activity Detector Based on Four States-Based HMM

Abstract:

Voice activity detection (VAD) is increasingly essential in noisy environments for achieving accurate speech recognition. In this paper, we propose a method based on a left-right hidden Markov model (HMM) to identify the start and end of speech. Instead of the existing two-state approach, the method builds two models, one for non-speech and one for speech, each of which may include several states. We also analyze other features of speech and non-speech, such as pitch index, pitch magnitude, and fractal dimension. We compare the VAD results of the proposed algorithm with those of a two-state HMM. Experiments show that the proposed method performs better than the two-state HMM in VAD, especially in low signal-to-noise ratio (SNR) environments.
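To make the idea of multi-state left-right models concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical layout of two non-speech states (N1, N2) followed by two speech states (S1, S2), a cyclic stay-or-advance transition matrix, and a single Gaussian-distributed log-energy feature per frame. The state names, transition probabilities, and emission parameters are illustrative assumptions; the abstract does not specify them. Frames are labeled by Viterbi decoding.

```python
import math

# Hypothetical four-state layout: two non-speech states, then two speech states.
STATES = ["N1", "N2", "S1", "S2"]
IS_SPEECH = [False, False, True, True]

# Assumed left-right (cyclic) topology: each state may only stay or advance.
TRANS = [
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.9, 0.1, 0.0],
    [0.0, 0.0, 0.9, 0.1],
    [0.1, 0.0, 0.0, 0.9],
]
START = [1.0, 0.0, 0.0, 0.0]  # assume every recording begins in non-speech

# Assumed Gaussian emission parameters (mean, variance) of a log-energy feature.
EMIT = [(0.0, 1.0), (0.0, 1.0), (5.0, 1.0), (5.0, 1.0)]


def log(p):
    """Natural log that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_vad(features):
    """Return one speech/non-speech flag per frame via Viterbi decoding."""
    n = len(STATES)
    score = [log(START[s]) + log_gauss(features[0], *EMIT[s]) for s in range(n)]
    back = []
    for x in features[1:]:
        new_score, pointers = [], []
        for s in range(n):
            # Best predecessor of state s under the allowed transitions.
            prev = max(range(n), key=lambda p: score[p] + log(TRANS[p][s]))
            new_score.append(score[prev] + log(TRANS[prev][s]) + log_gauss(x, *EMIT[s]))
            pointers.append(prev)
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the highest-scoring final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [IS_SPEECH[s] for s in path]


if __name__ == "__main__":
    print(viterbi_vad([0, 0, 0, 5, 5, 5, 5, 0, 0, 0]))
```

Under these assumed parameters, the topology forces at least two consecutive frames in each class, so an isolated one-frame energy spike such as `[0, 0, 5, 0, 0]` is decoded as non-speech throughout. This is one plausible reason a multi-state model could be more robust than a two-state HMM at low SNR, although the paper's actual design may differ.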

Info:

Pages:

743-748

Online since:

September 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved
