A Novel Weighted Dynamic Time Warping for Light Weight Speaker-Dependent Speech Recognition in Noisy and Bad Recording Conditions

Abstract:

Lightweight speaker-dependent (SD) automatic speech recognition (ASR) is a promising solution to two problems: the risk of disclosing personal information, and the difficulty of obtaining training material for many seldom-used English words and (often non-English) names. The dynamic time warping (DTW) algorithm is the state of the art for small-footprint SD ASR applications, which have limited storage space and a small vocabulary. In our previous work, we developed two fast and accurate DTW variations for clean speech data. However, speech recognition in adverse conditions remains a major challenge. To improve recognition accuracy in noisy and bad recording conditions, such as a recording volume that is too high or too low, we introduce a novel weighted DTW method. This method defines a feature index for each time frame of the training data and then applies it in the core DTW process to tune the final alignment score. In extensive experiments on one representative SD dataset of three speakers' recordings, our method achieves better accuracy than DTW, with a 0.5% relative reduction of error rate (RRER) on clean speech data and a 7.5% RRER on speech recorded in noisy and bad conditions. To the best of our knowledge, our new weighted DTW is the first weighted DTW method specially designed for speech data in noisy and bad recording conditions.
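
The abstract does not say how the feature index is computed or exactly how it enters the alignment, so the following is only a minimal sketch of the general idea: a standard DTW recursion in which each training (template) frame carries a weight that scales its local distance. The function names `frame_distance` and `weighted_dtw`, the `weights` parameter, and the choice of Euclidean distance are illustrative assumptions, not the authors' method.

```python
import math
from typing import Optional, Sequence


def frame_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (e.g. MFCC frames)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_dtw(
    template: Sequence[Sequence[float]],
    query: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Accumulated DTW cost with one weight per template frame.

    `weights` is a hypothetical stand-in for a per-frame feature index;
    with all weights equal to 1.0 this reduces to plain DTW.
    """
    n, m = len(template), len(query)
    if n == 0 or m == 0:
        raise ValueError("template and query must be non-empty")
    if weights is None:
        weights = [1.0] * n
    if len(weights) != n:
        raise ValueError("need exactly one weight per template frame")

    # prev[j] holds the best accumulated cost for the previous template frame.
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            # Local cost scaled by the weight of the current template frame.
            cost = weights[i - 1] * frame_distance(template[i - 1], query[j - 1])
            # Allowed steps: vertical, horizontal, diagonal.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]
```

In a small-vocabulary recognizer of this kind, the query would be scored against each stored template and the label of the lowest-cost template returned. Down-weighting a template frame reduces its influence on the final alignment score.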

Info:

Periodical:

Applied Mechanics and Materials (Volumes 490-491)

Pages:

1347-1355

DOI:

https://doi.org/10.4028/www.scientific.net/AMM.490-491.1347

Online since:

January 2014

Authors:

Xiang Lilan Zhang*, Ji Ping Sun, Xu Hui Huang, Zhi Gang Luo

Keywords:

Computing Methodologies, Feature Index, Natural Language Processing, Speech Recognition, Weighted DTW

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved

* - Corresponding Author
