Comparison of Mel Frequency Cepstrum Coefficient and Perceptual Linear Predictive in Perceptual Measurement of Chinese Initials

Abstract:

Much work has been done on improving performance by proposing new speech features and new perceptual measurements. However, such work usually focuses on only one of the two aspects. In this paper, we study the relationship between them; that is, we examine which acoustic features, or combinations of features, are most consistent with the actual perception of Chinese initials. We propose a method that measures the acoustic distance while keeping it monotonically related to the perceptual distance of Chinese initials. We first define the acoustic distance and the perceptual distance between different Chinese initials, and select a suitable combination of acoustic features and two compatible distance metrics by conducting clustering analysis on samples of all types of Chinese initials using MFCC and PLP. Based on data provided by the General Hospital of the People's Liberation Army, we then calculate the acoustic distance and the perceptual distance. Finally, we calculate Spearman's rho between the two types of distance for each of the two calculation methods. The experimental results show that, with the selected acoustic features, there is a relatively strong monotonic relationship between the two types of distance.
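The final step of the abstract, measuring how monotonically the acoustic distance tracks the perceptual distance, can be illustrated with a minimal sketch of Spearman's rho. This is not the authors' implementation: the function names and the distance values below are hypothetical and invented for illustration, and the sketch assumes that each pair of initials contributes one acoustic distance and one perceptual distance.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical distances for five pairs of initials (made-up numbers,
# not data from the paper).
acoustic = [0.12, 0.35, 0.27, 0.80, 0.55]
perceptual = [0.10, 0.30, 0.40, 0.90, 0.60]
rho = spearman_rho(acoustic, perceptual)
print(round(rho, 3))  # 0.9 for this toy data
```

Because rho depends only on ranks, it is unchanged by any strictly increasing rescaling of either distance, which is why it suits a claim about a monotonic rather than a linear relationship.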

Pages: 291-297

Online since: September 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved
