Voice Activity Detection Based on Multiple Statistical Models

Abstract:

A key issue in practical speech processing is achieving voice activity detection (VAD) that is robust to background noise. Most statistical model-based approaches assume that the discrete Fourier transform (DFT) coefficients follow a Gaussian distribution, an assumption that deviates from what is actually observed. For a class of VAD algorithms based on the Gaussian and Laplacian models, we incorporate the complex Laplacian probability density function into our analysis of the statistical properties. Because noise type and level affect the statistical characteristics of the speech signal differently, our approach copes with time-varying environments by adaptively selecting an appropriate statistical model online. The performance of the proposed VAD approaches in a stationary noise environment is evaluated using an objective measure.
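
As a rough illustration of the ideas in the abstract, the sketch below shows two pieces in Python. The first is the standard per-bin log likelihood ratio under the complex Gaussian model, log Λ = γξ/(1+ξ) − log(1+ξ), where γ is the a posteriori SNR and ξ the a priori SNR. The second is one possible way to choose between a Gaussian and a Laplacian model online, by comparing their log-likelihoods on recent DFT coefficient samples at matched variance. The selection rule, all function names, and the threshold are assumptions made for illustration; the abstract does not specify the selection criterion used in the paper.

```python
import math


def gaussian_log_lr(power, noise_var, xi):
    """Per-bin log likelihood ratio under the complex Gaussian model.

    power     : |X_k|^2, squared magnitude of the noisy DFT coefficient
    noise_var : noise variance in that bin
    xi        : a priori SNR (assumed to be estimated elsewhere)
    """
    gamma = power / noise_var  # a posteriori SNR
    return gamma * xi / (1.0 + xi) - math.log(1.0 + xi)


def frame_is_speech(powers, noise_vars, xis, threshold=0.0):
    """Average the per-bin log LRs and compare against a threshold.

    The threshold value is a hypothetical tuning parameter.
    """
    llrs = [gaussian_log_lr(p, n, x) for p, n, x in zip(powers, noise_vars, xis)]
    return sum(llrs) / len(llrs) > threshold


def gaussian_loglik(samples, var):
    """Log-likelihood of zero-mean real samples under a Gaussian."""
    n = len(samples)
    return -0.5 * n * math.log(2.0 * math.pi * var) - sum(x * x for x in samples) / (2.0 * var)


def laplacian_loglik(samples, var):
    """Log-likelihood under a zero-mean Laplacian with the same variance.

    A Laplacian with scale b has variance 2 * b**2, so b = sqrt(var / 2).
    """
    b = math.sqrt(var / 2.0)
    n = len(samples)
    return -n * math.log(2.0 * b) - sum(abs(x) for x in samples) / b


def select_model(samples):
    """Pick the better-fitting model for recent coefficient samples.

    Illustrative rule only (an assumption, not the paper's criterion):
    estimate the variance from the samples, then keep whichever model
    gives the higher log-likelihood.
    """
    var = sum(x * x for x in samples) / len(samples)
    if gaussian_loglik(samples, var) >= laplacian_loglik(samples, var):
        return "gaussian"
    return "laplacian"
```

In use, `select_model` would be called on a sliding window of the real (or imaginary) parts of recent DFT coefficients, and the chosen model would determine which likelihood ratio the detector applies to the next frames. Only the Gaussian ratio is written out here; a Laplacian counterpart would replace `gaussian_log_lr` when that model is selected.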

Info:

Periodical:

Advanced Materials Research (Volumes 181-182)

Pages:

765-769

Online since:

January 2011

Copyright:

© 2011 Trans Tech Publications Ltd. All Rights Reserved
