Formant Speech Synthesis Based on Trainable Model

Abstract:

The authors proposed a trainable formant synthesis method based on a multi-channel Hidden Trajectory Model (HTM). In this method, the phonetic targets, formant trajectories, and spectral states from the oral, nasal, voiceless, and background channels were designed to form hierarchical hidden layers, from which spectra were generated as observable features. In model training, the phonemic targets were learned from one hour of training speech, and the phoneme boundaries were also aligned. The experimental results showed that speech could be reconstructed from the trained formant model with a source-filter synthesizer.
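The source-filter synthesis step mentioned above can be illustrated with a minimal sketch: a crude impulse-train voicing source passed through a cascade of second-order formant resonators (the classic Klatt-style formulation). This is not the authors' trainable HTM; the formant and bandwidth values below are hypothetical, roughly /a/-like numbers chosen only for illustration.

```python
import numpy as np

def resonator_coeffs(freq, bw, fs):
    """Klatt-style second-order resonator: y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    C = -np.exp(-2 * np.pi * bw / fs)
    B = 2 * np.exp(-np.pi * bw / fs) * np.cos(2 * np.pi * freq / fs)
    A = 1 - B - C  # normalize for unity gain at DC
    return A, B, C

def apply_resonator(x, freq, bw, fs):
    """Filter x through one resonator centered at freq with bandwidth bw."""
    A, B, C = resonator_coeffs(freq, bw, fs)
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = A * x[n]
        if n >= 1:
            y[n] += B * y[n - 1]
        if n >= 2:
            y[n] += C * y[n - 2]
    return y

def synthesize_vowel(formants, bandwidths, f0=120.0, dur=0.3, fs=16000):
    """Excite a cascade of formant resonators with an impulse-train source."""
    n = int(dur * fs)
    src = np.zeros(n)
    src[::int(fs / f0)] = 1.0  # crude glottal pulse train at f0
    out = src
    for f, bw in zip(formants, bandwidths):
        out = apply_resonator(out, f, bw, fs)
    return out / np.max(np.abs(out))  # peak-normalize

# Hypothetical /a/-like formant frequencies (Hz) and bandwidths (Hz)
wave = synthesize_vowel([730, 1090, 2440], [90, 110, 170])
```

In a trainable system such as the one described here, the formant frequencies and bandwidths would not be fixed constants but trajectories generated frame by frame from the learned hidden layers.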

Info:

Pages: 1334-1337

Online since: February 2013

Copyright: © 2013 Trans Tech Publications Ltd. All Rights Reserved

References:

[1] D. H. Klatt, Review of text-to-speech conversion for English, J. Acoust. Soc. Am. 82 (1987) 737-793.

[2] K. Tokuda, T. Masuko, et al., An Algorithm for Speech Parameter Generation from Continuous Mixture HMMs with Dynamic Features, EUROSPEECH'95, Madrid, Spain (1995).

DOI: 10.21437/eurospeech.1995-173

[3] R. E. Donovan, Trainable Speech Synthesis, Ph.D. thesis, Cambridge University (1996).

[4] J. Bridle, et al., An investigation of segmental hidden dynamic models of speech coarticulation for automatic speech recognition, in Final Report for the 1998 Workshop on Language Engineering, Center for Language and Speech Processing (1998).

[5] M. J. Russell and P. J. B. Jackson, A Multiple-level Linear/Linear segmental HMM with a formant-based intermediate layer, Computer Speech and Language, 19 (2005) 205-225.

DOI: 10.1016/j.csl.2004.08.001

[6] L. Deng, D. Yu, and A. Acero, A Bidirectional Target-Filtering Model of Speech Coarticulation and Reduction: Two-Stage Implementation for Phonetic Recognition, IEEE Trans. Audio, Speech and Language Proc. 14 (2006) 256-265.

DOI: 10.1109/tsa.2005.854107
