Adjustment Method between Phonological Attributes and Phone Boundaries

Abstract:

Two kinds of imperfections, namely detection errors and the asynchrony between phonological attributes and phone boundaries, can substantially reduce the recognition accuracy of a detection-based automatic speech recognition system. To address these problems, this paper proposes an adjustment method between phonological attributes and phone boundaries. First, prior knowledge of the corpus is combined with the detection results; then the asynchronies in the phone boundary regions are compensated and the detection errors are corrected. In addition, selectively deleting some frames that contain errors improves the precision of the phone models. With this adjustment method, the phoneme recognition rate improves by 1.4% in Conditional Random Field (CRF) based phone classification experiments on TIMIT.
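The abstract gives no algorithmic detail, so the following Python sketch is only one hypothetical reading of the steps it names, not the authors' implementation. It assumes a single binary attribute (voicing), frame-level phone labels in TIMIT style, and a canonical phone-to-attribute table standing in for the "prior knowledge of the corpus"; the function `adjust_attribute` and its parameters `window` and `max_fix` are illustrative names. A mismatch within `window` frames of a phone boundary that agrees with the neighbouring phone is treated as asynchrony and compensated; any other mismatch is treated as a detection error, with short error runs corrected and longer runs deleted from the training frames.

```python
from typing import Dict, List, Tuple


def adjust_attribute(
    phones: List[str],
    detected: List[int],
    canonical: Dict[str, int],
    window: int = 2,
    max_fix: int = 1,
) -> Tuple[List[int], List[bool], List[str]]:
    """Hypothetical adjustment of one binary attribute track against phone labels.

    Returns (adjusted values, keep mask, per-frame tag). Tags:
      "ok"      detected value matches the canonical value of the frame's phone
      "async"   boundary mismatch explained by the neighbouring phone (compensated)
      "fixed"   error run of length <= max_fix (corrected to the canonical value)
      "deleted" longer error run (frame dropped from training via keep mask)
    """
    n = len(phones)
    # Frame indices where a new phone segment starts.
    bounds = [t for t in range(1, n) if phones[t] != phones[t - 1]]
    adjusted = list(detected)
    tags = ["ok"] * n

    for t in range(n):
        expected = canonical[phones[t]]
        if detected[t] == expected:
            continue
        # Start of this frame's segment, and start of the next segment.
        left = max((b for b in bounds if b <= t), default=None)
        right = min((b for b in bounds if b > t), default=None)
        neighbours = []
        if left is not None and t - left < window:
            neighbours.append(phones[left - 1])  # previous phone
        if right is not None and right - t <= window:
            neighbours.append(phones[right])  # next phone
        if any(detected[t] == canonical[q] for q in neighbours):
            adjusted[t] = expected  # compensate the asynchrony
            tags[t] = "async"
        else:
            tags[t] = "error"

    # Correct short error runs; mark long error runs for deletion.
    keep = [True] * n
    t = 0
    while t < n:
        if tags[t] != "error":
            t += 1
            continue
        end = t
        while end < n and tags[end] == "error":
            end += 1
        short_run = (end - t) <= max_fix
        for i in range(t, end):
            if short_run:
                adjusted[i] = canonical[phones[i]]
                tags[i] = "fixed"
            else:
                keep[i] = False
                tags[i] = "deleted"
        t = end
    return adjusted, keep, tags


if __name__ == "__main__":
    # Toy /s aa t/ utterance; voicing: 0 = unvoiced, 1 = voiced.
    phones = ["s"] * 3 + ["aa"] * 8 + ["t"] * 3
    canonical = {"s": 0, "aa": 1, "t": 0}
    detected = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
    adjusted, keep, tags = adjust_attribute(phones, detected, canonical)
    print(tags)
```

In the toy example, frame 3 (a late voicing onset) and frame 12 (voicing carried into /t/) are tagged `async`, the isolated mismatch at frame 5 is `fixed`, and the two-frame mismatch at frames 7 and 8 is `deleted`. Treating only boundary-adjacent mismatches that match a neighbour as asynchrony is a simplifying assumption; the paper may use a different criterion.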

Info:

Pages: 316-321

Online since: October 2013

Copyright: © 2013 Trans Tech Publications Ltd. All Rights Reserved