Digital Storytelling Book Generator with MIDI-to-Singing

Abstract:

Digital storytelling books are an important source of knowledge for blind people, but creating them usually takes a great deal of time and effort. To read books from electronic content, automatic procedures can be incorporated into a speech synthesis system. In this paper, we give a practical description of a digital storytelling book generator that combines a free Text-to-Speech (TTS) program with a MIDI-to-Singing toolkit. A degree of emotional customization of the TTS output is obtained through time-pitch manipulation of the synthesized acoustic waveform. MIDI-to-Singing voices can be generated automatically, with special emphasis on lyrical or storytelling-style content, which tends to be unappealing when read by the monotonous voices of traditional TTS programs. In the rule-based approach, rules describing how the pitch frequency behaves over time generate the time-pitch values, and the pitch values fluctuate within a range that depends on the intended emotion. MIDI-to-Singing voice synthesis maps these pitch frequency values onto the 12-semitone musical scale and extracts the semitone intervals for each emotional state. In the current version of the system, a user can style the synthesized voice by selecting either a standard male or female voice combined with one of 12 predefined expressive styles: Neutral, Monotonic, Lowly-pitched, Highly-pitched, Rising-pitched, Falling-pitched, Happy, Sad, Fear, Anger, Randomly-pitched, and Melody-aligning (singing), which uses a small set of musical notes. A subjective test shows that synthetic conversations produced with MIDI-to-Singing and customized styles are preferred over those from traditional TTS and are rated more natural, intelligible and enjoyable. Finally, the resulting digital talking recordings are available on the website for comparing human speech with MIDI-to-Singing synthesized speech.
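
The abstract does not spell out the rules themselves, so the Python below is only a minimal sketch of the general idea under stated assumptions: 12-tone equal temperament with A4 = 440 Hz as MIDI note 69, a hypothetical range of 4 semitones, and simple linear rules for the Monotonic, Rising-pitched, Falling-pitched and Randomly-pitched styles. The function names (`style_contour`, `hz_to_midi`, `quantize`, `semitone_intervals`) are illustrative and are not taken from the authors' toolkit. The sketch generates a pitch contour for a style, snaps it to the 12-semitone scale and extracts the semitone intervals; it does not modify any audio.

```python
import math
import random

# Assumption: 12-tone equal temperament tuned to A4 = 440 Hz, with A4 as
# MIDI note 69 (the usual MIDI convention).
A4_HZ = 440.0
A4_MIDI = 69


def hz_to_midi(f0_hz: float) -> int:
    """Map a pitch frequency (Hz) to the nearest semitone as a MIDI note number."""
    if f0_hz <= 0:
        raise ValueError("pitch frequency must be positive")
    return math.floor(A4_MIDI + 12 * math.log2(f0_hz / A4_HZ) + 0.5)


def midi_to_hz(note: int) -> float:
    """Inverse mapping: the equal-tempered frequency of a MIDI note."""
    return A4_HZ * 2 ** ((note - A4_MIDI) / 12)


def semitone_intervals(notes: list[int]) -> list[int]:
    """Signed semitone steps between consecutive notes."""
    return [b - a for a, b in zip(notes, notes[1:])]


def style_contour(style: str, base_hz: float, n_points: int,
                  span_semitones: float = 4.0, seed: int = 0) -> list[float]:
    """Hypothetical rules producing an F0 contour (Hz) for a few of the styles.

    Each rule gives a semitone offset from base_hz at normalized time t in [0, 1];
    the offset always stays within [-span_semitones, +span_semitones].
    """
    rng = random.Random(seed)  # seeded so the "random" style is reproducible
    contour = []
    for i in range(n_points):
        t = i / (n_points - 1) if n_points > 1 else 0.0
        if style == "monotonic":
            offset = 0.0
        elif style == "rising":
            offset = span_semitones * t
        elif style == "falling":
            offset = -span_semitones * t
        elif style == "random":
            offset = rng.uniform(-span_semitones, span_semitones)
        else:
            raise ValueError(f"unknown style: {style!r}")
        contour.append(base_hz * 2 ** (offset / 12))
    return contour


def quantize(contour: list[float]) -> list[int]:
    """Snap every contour point to the 12-semitone scale."""
    return [hz_to_midi(f) for f in contour]


if __name__ == "__main__":
    for style in ("monotonic", "rising", "falling", "random"):
        notes = quantize(style_contour(style, base_hz=200.0, n_points=5))
        print(style, notes, semitone_intervals(notes))
```

For example, the rising style starting at 200 Hz with five points quantizes to notes 55 through 59, a chain of one-semitone steps, while the falling style gives the mirror image. Keeping the rules in the semitone domain makes the per-style range directly comparable to musical intervals.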

Info:

Pages: 441-445

Online since: December 2011

Copyright: © 2012 Trans Tech Publications Ltd. All Rights Reserved
