Design of Tibetan Continuous Speech Corpus Based on Triphone
Abstract:
The performance of a large-vocabulary continuous speech recognition system depends largely on the quality of its speech corpus, and sentence selection is the key step in corpus design. Taking the Tibetan Amdo dialect of XiaHe as the research object, this paper builds a continuous speech corpus based on triphones. First, we collected a text corpus of 1000 thousand Tibetan sentences and transcribed them into IPA according to their actual pronunciation in the XiaHe dialect. We then summarized the structure of triphone junctures and used a text-processing platform to analyze statistically the combination types and frequencies of triphones in the corpus. Finally, taking into account both the coverage rate and the sparseness of triphones and class-triphones, we designed a corpus extraction algorithm and implemented automatic corpus selection.
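The abstract does not give the extraction algorithm itself, so the following is only a minimal sketch of one common way to select sentences for triphone coverage: a greedy pass that repeatedly picks the sentence contributing the most not-yet-covered triphones per phone. The function names, the `sil` boundary symbol, and the length-normalized score are illustrative assumptions, not the authors' method, and the sketch does not model class-triphones or sparseness.

```python
def triphones(phones):
    """Return the (left, centre, right) triphones of one phone sequence.

    The "sil" padding at sentence boundaries is an assumption of this sketch.
    """
    padded = ["sil"] + list(phones) + ["sil"]
    return [tuple(padded[i - 1:i + 2]) for i in range(1, len(padded) - 1)]


def select_sentences(corpus, target_coverage=1.0):
    """Greedily choose sentence indices until the target triphone coverage is met.

    corpus: list of phone sequences (e.g. IPA symbols per sentence).
    Returns (chosen indices in selection order, achieved coverage rate).
    """
    sent_tris = [set(triphones(s)) for s in corpus]
    all_tris = set().union(*sent_tris)
    covered, chosen = set(), []
    remaining = set(range(len(corpus)))

    while remaining and all_tris and len(covered) / len(all_tris) < target_coverage:
        # Score = new triphones per phone; ties go to the lower index.
        best = max(
            remaining,
            key=lambda i: (len(sent_tris[i] - covered) / max(len(corpus[i]), 1), -i),
        )
        gain = sent_tris[best] - covered
        if not gain:  # no remaining sentence adds anything new
            break
        covered |= gain
        chosen.append(best)
        remaining.remove(best)

    coverage = len(covered) / len(all_tris) if all_tris else 0.0
    return chosen, coverage


if __name__ == "__main__":
    # Toy phone sequences, purely illustrative.
    toy = [["a", "b", "c"], ["a", "b"], ["c", "a", "b"]]
    print(select_sentences(toy))
```

Normalizing the gain by sentence length favours short, triphone-dense sentences, which keeps the recording script compact. A fuller version would also weight rare triphones to address sparseness.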
Info:
Periodical:
Pages:
2245-2248
Online since:
September 2014
Authors:
Keywords:
Copyright:
© 2014 Trans Tech Publications Ltd. All Rights Reserved