Voice Activity Detection with Decision Trees in Noisy Environments

Abstract:

An improved scheme based on the double-threshold method is proposed for robust endpoint detection in noisy environments. First, the distribution of the zero crossing rate (ZCR) of the preprocessed signal is examined, and the speech signal is divided into parts so that decision trees can select appropriate thresholds on the basis of that ZCR distribution. The double-threshold method, which weights energy and ZCR differently depending on the situation, is then applied in each case to decide whether an input segment is speech or non-speech. Simulation results indicate that the proposed decision-tree method yields more accurate endpoint detection than the traditional double-threshold method.
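For orientation, the traditional double-threshold baseline that the abstract compares against can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the decision-tree threshold selection is not described in the abstract and is left out, and the frame length, hop size, threshold values, and function names (`frame_features`, `double_threshold_vad`) are all assumptions chosen for the example.

```python
import math

def frame_features(signal, frame_len=160, hop=80):
    """Return per-frame short-time energy and zero crossing rate (ZCR)."""
    energies, zcrs = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
        # A crossing is counted whenever consecutive samples differ in sign.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        zcrs.append(crossings / (frame_len - 1))
    return energies, zcrs

def _grow(speech, accept):
    """Extend speech regions into neighbouring frames that satisfy accept(i)."""
    grown = list(speech)
    n = len(grown)
    for i in range(1, n):                 # grow towards later frames
        if grown[i - 1] and not grown[i] and accept(i):
            grown[i] = True
    for i in range(n - 2, -1, -1):        # grow towards earlier frames
        if grown[i + 1] and not grown[i] and accept(i):
            grown[i] = True
    return grown

def double_threshold_vad(energies, zcrs, e_high, e_low, zcr_thr):
    """Label frames as speech (True) with fixed, hand-picked thresholds."""
    # Stage 1: frames above the high energy threshold are confidently speech.
    speech = [e > e_high for e in energies]
    # Stage 2: widen each region while energy stays above the low threshold.
    speech = _grow(speech, lambda i: energies[i] > e_low)
    # Stage 3: widen further while ZCR is high (e.g. unvoiced consonants).
    speech = _grow(speech, lambda i: zcrs[i] > zcr_thr)
    return speech

# Synthetic test signal (assumed 8 kHz): silence, a 200 Hz tone, silence.
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(1600)]
signal = [0.0] * 800 + tone + [0.0] * 800

energies, zcrs = frame_features(signal)
labels = double_threshold_vad(energies, zcrs, e_high=60.0, e_low=10.0, zcr_thr=0.5)
first = labels.index(True)
last = len(labels) - 1 - labels[::-1].index(True)
print(first, last)  # 9 29
```

In this sketch the thresholds are fixed constants. The proposed method would instead choose them per signal segment from the ZCR distribution, which is why fixed values like these tend to degrade as the noise level changes.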

Info:

Pages: 749-752

Online since: October 2011

Copyright: © 2012 Trans Tech Publications Ltd. All Rights Reserved
