Combining Speech Enhancement and Cepstral Mean Normalization for LPC Cepstral Coefficients

Abstract:

A mismatch between training and testing conditions in noisy environments often causes a drastic drop in the performance of a speech recognition system. Robust feature coefficients can reduce this sensitivity to mismatch during the recognition stage. In this paper, we investigate the noise robustness of LPC cepstral coefficients (LPCC) by combining speech enhancement with feature post-processing. At the front end, speech enhancement in the wavelet domain removes noise components from the noisy signal. The enhancement combines the discrete wavelet transform (DWT), wavelet packet decomposition (WPD), multi-threshold processing, and other steps to obtain the estimated speech. The feature post-processing applies cepstral mean normalization (CMN) to compensate for the signal distortion and residual noise of the enhanced signal in the cepstral domain. The performance of digit speech recognition systems is evaluated under noisy conditions using the NOISEX-92 database. The experimental results show that the presented method improves performance in adverse noise environments compared with previous features.
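
The CMN step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common per-utterance form of CMN, in which the mean of each cepstral coefficient over all frames is subtracted from every frame; the abstract does not say which variant the authors use. The function name `cepstral_mean_normalize` and the toy feature values are hypothetical and are not taken from the paper. The sketch shows why the technique helps: a fixed distortion that appears as a constant offset in the cepstral domain disappears after normalization.

```python
from typing import List


def cepstral_mean_normalize(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-utterance mean of each cepstral coefficient.

    `frames` is a list of feature vectors, one per analysis frame.
    This is the plain per-utterance variant, assumed for illustration.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of coefficient k taken across all frames of the utterance.
    means = [sum(frame[k] for frame in frames) / n_frames for k in range(n_coeffs)]
    return [[frame[k] - means[k] for k in range(n_coeffs)] for frame in frames]


# Toy cepstral features (hypothetical values): 3 frames, 2 coefficients.
clean = [[1.0, 0.5], [2.0, -0.5], [3.0, 1.5]]

# A fixed distortion modelled as a constant offset added to every frame.
channel_offset = [0.25, -1.0]
distorted = [[c + o for c, o in zip(frame, channel_offset)] for frame in clean]

clean_cmn = cepstral_mean_normalize(clean)
distorted_cmn = cepstral_mean_normalize(distorted)

# After CMN the two feature sequences agree, so the offset has been removed.
max_diff = max(
    abs(a - b)
    for fa, fb in zip(clean_cmn, distorted_cmn)
    for a, b in zip(fa, fb)
)
```

In this toy case `max_diff` is zero up to floating-point rounding. Residual noise left over after enhancement is not a pure constant offset, so in practice CMN can only reduce it, not cancel it.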

Info:

Periodical:

Key Engineering Materials (Volumes 474-476)

Pages:

349-354

Online since:

April 2011

Copyright:

© 2011 Trans Tech Publications Ltd. All Rights Reserved
