Connected Mandarin Digit Speech Recognition Using Two-Layer Acoustic Universal Structure

Abstract:

Because Chinese words are monosyllabic and many of their pronunciations are easily confused, connected Mandarin digit speech recognition (CMDSR) is a challenging task in speech recognition. This paper applies a novel acoustic representation of speech, the acoustic universal structure (AUS), in which non-linguistic variations such as vocal tract length, lines and noise are largely removed. A two-layer matching strategy based on AUS models of speech, comprising digit-level and string-level AUS models, is proposed for CMDSR. The recognition system for connected Mandarin digits is described in detail, and experimental results show that the proposed method achieves a higher recognition rate.
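The abstract does not say how the AUS is computed. As a minimal sketch, assume each speech event is modeled as a Gaussian and the structure is the matrix of pairwise Bhattacharyya distances between events; this formulation, the function names `bhattacharyya` and `structure`, and the toy data are all assumptions made for illustration, not details taken from the paper. Under that assumption, an invertible affine distortion of the feature space (a stand-in for speaker or channel differences) leaves the distance matrix unchanged, which is the sense in which non-linguistic variation can be removed.

```python
import numpy as np


def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two multivariate Gaussians."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    # Mean term: Mahalanobis-like distance under the averaged covariance.
    term_mean = 0.125 * diff @ np.linalg.solve(cov, diff)
    # Covariance term, computed with log-determinants for stability.
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term_cov = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return term_mean + term_cov


def structure(means, covs):
    """Symmetric matrix of pairwise distances between speech events."""
    n = len(means)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = bhattacharyya(means[i], covs[i], means[j], covs[j])
            dist[i, j] = dist[j, i] = d
    return dist


# Toy data (hypothetical): four speech events in a 3-dimensional feature space.
rng = np.random.default_rng(0)
dim, n_events = 3, 4
means = [rng.normal(size=dim) for _ in range(n_events)]
covs = []
for _ in range(n_events):
    a = rng.normal(size=(dim, dim))
    covs.append(a @ a.T + dim * np.eye(dim))  # symmetric positive definite

# Hypothetical non-linguistic distortion x -> A x + b with invertible A.
A = np.array([[2.0, 0.5, 0.0],
              [0.0, 1.5, 0.3],
              [0.1, 0.0, 1.0]])
b = np.array([0.7, -1.2, 0.4])
warped_means = [A @ m + b for m in means]
warped_covs = [A @ c @ A.T for c in covs]

original = structure(means, covs)
warped = structure(warped_means, warped_covs)
structures_match = np.allclose(original, warped)
print("structure preserved under the affine distortion:", structures_match)
```

A two-layer scheme in the spirit of the abstract could compare such matrices once per digit and once per whole string, but how the paper combines the two layers is not stated in this preview.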

Info:

Periodical:

Advanced Materials Research (Volumes 846-847)

Pages:

1380-1383

Online since:

November 2013

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
