Classification of Chinese Popular Songs Using a Fusion Scheme of GMM Model Estimate and Formant Feature Analysis

Abstract:

In this paper, a fusion scheme that combines Gaussian mixture model (GMM) calculations with formant feature analysis, called GMM-Formant, is proposed for the classification of Chinese popular songs. Automatic classification of popular music is generally performed with two main categories of techniques: model-based and feature-based approaches. Among model-based classification techniques, the GMM is widely used for its simplicity; in feature-based music recognition, the formant parameter is an important acoustic feature for evaluation. The proposed GMM-Formant method uses linear interpolation to combine GMM likelihood estimates with formant evaluation results, adjusting the likelihood score derived from the GMM calculations according to the formant feature evaluation outcome. By considering both model-based and feature-based techniques for song classification, GMM-Formant yields a more reliable classification result and therefore maintains satisfactory recognition accuracy. Experimental results obtained from a data set of numerous Chinese popular songs show the superiority of the proposed GMM-Formant.

Keywords: Song classification; Gaussian mixture model; Formant feature; GMM-Formant.
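The preview gives no fusion details beyond linear interpolation, so the following Python sketch is only a minimal illustration of such a score-level fusion: a class-conditional GMM log-likelihood is interpolated with a formant-based score using an assumed weight ALPHA. The feature extraction, the formant scorer, and the value of ALPHA are all assumptions for illustration, not the paper's actual settings.

import numpy as np
from sklearn.mixture import GaussianMixture

# Assumed interpolation weight on the model-based (GMM) score; the paper's
# actual weighting is not given in this preview.
ALPHA = 0.7

def gmm_formant_score(gmm: GaussianMixture,
                      features: np.ndarray,
                      formant_score: float) -> float:
    """Linearly interpolate a GMM likelihood estimate with a formant score.

    features      -- (n_frames, n_dims) acoustic feature vectors of one song
    formant_score -- feature-based evaluation result, assumed already
                     normalised to the scale of the average log-likelihood
    """
    gmm_score = gmm.score(features)  # mean per-frame log-likelihood
    return ALPHA * gmm_score + (1.0 - ALPHA) * formant_score

def classify(features: np.ndarray,
             formant_scores: dict,
             class_gmms: dict) -> str:
    """Return the class label whose fused GMM-Formant score is highest."""
    fused = {label: gmm_formant_score(gmm, features, formant_scores[label])
             for label, gmm in class_gmms.items()}
    return max(fused, key=fused.get)

In this reading, an ALPHA close to 1 trusts the model-based GMM estimate, while smaller values shift weight toward the feature-based formant evaluation.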

Info:

Pages: 1006-1009

Online since: December 2013

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
