Long Mandarin Spoken Term Detection Using Two-Stage Search

Abstract:

To make efficient use of collections of speech recordings, the ability to search for spoken terms in the speech stream is essential. Although Chinese spoken term detection (STD) does not suffer from the out-of-vocabulary (OOV) problem as English does, it is still hard to retrieve long spoken terms that contain four or more characters. In this paper, we detail our approach to long Mandarin spoken term detection, which combines a search on the inverted index produced by a speech recognizer with a linear scan on syllable confusion networks. First, we split the long spoken terms into syllables and search for the syllables in the inverted index file to get the segments that may contain the long spoken terms. Then we use a linear scan algorithm on syllable confusion networks (SCNs). On two Mandarin conversational telephone speech sets, we compare the performance of the proposed method with that of baseline syllable-based systems; our approach gives satisfactory performance gains over them.
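The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the index layout, the candidate-selection rule, or the scoring function, so the `min_hits` filter, the product-of-posteriors score, the `FLOOR` back-off value, and all function names here are assumptions. An SCN is modeled as a list of slots, each a dict mapping a syllable to its posterior.

```python
from collections import defaultdict

FLOOR = 1e-4  # assumed back-off posterior for a syllable missing from a slot


def build_inverted_index(transcripts):
    """Map each syllable to the ids of segments whose 1-best transcript contains it."""
    index = defaultdict(set)
    for seg_id, syllables in transcripts.items():
        for syl in syllables:
            index[syl].add(seg_id)
    return index


def candidate_segments(index, term, min_hits):
    """Stage 1: keep segments containing at least min_hits distinct term syllables."""
    hits = defaultdict(int)
    for syl in set(term):
        for seg_id in index.get(syl, ()):
            hits[seg_id] += 1
    return {seg_id for seg_id, n in hits.items() if n >= min_hits}


def scan_scn(scn, term):
    """Stage 2: slide the term over the SCN slots; return (best score, start slot)."""
    best_score, best_start = 0.0, -1
    for start in range(len(scn) - len(term) + 1):
        score = 1.0
        for offset, syl in enumerate(term):
            # Multiply the posteriors of the term's syllables in aligned slots.
            score *= scn[start + offset].get(syl, FLOOR)
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


def two_stage_search(term, index, scns, min_hits, threshold):
    """Run stage 1 on the index, then stage 2 only on the surviving segments."""
    results = []
    for seg_id in sorted(candidate_segments(index, term, min_hits)):
        score, start = scan_scn(scns[seg_id], term)
        if score >= threshold:
            results.append((seg_id, start, score))
    return results


# Toy data (invented for illustration): the 1-best transcript of seg1
# misrecognizes "ren" as "reng", but the SCN still carries "ren".
transcripts = {
    "seg1": ["zhong", "guo", "reng", "min"],
    "seg2": ["ni", "hao"],
}
scns = {
    "seg1": [
        {"zhong": 0.9, "zong": 0.1},
        {"guo": 0.8, "gou": 0.2},
        {"reng": 0.6, "ren": 0.4},
        {"min": 0.7, "ming": 0.3},
    ],
    "seg2": [{"ni": 0.9}, {"hao": 0.9}],
}
term = ["zhong", "guo", "ren", "min"]
index = build_inverted_index(transcripts)
print(two_stage_search(term, index, scns, min_hits=3, threshold=0.1))
```

In this toy setup, requiring only three of the four syllables in stage 1 lets the segment through despite the 1-best error, and the SCN scan in stage 2 then finds the full term among the competing hypotheses. Requiring all four syllables would drop the segment before the scan.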

Pages: 2720-2723

Online since: August 2013

Copyright: © 2013 Trans Tech Publications Ltd. All Rights Reserved
