Applied Mechanics and Materials Vols. 333-335

DOI:

https://doi.org/10.4028/www.scientific.net/AMM.333-335


Paper Title Page

Language-Pair Scoring Method Based on SVM for Language Recognition

Authors: Xian Liang Wang, Zhi Gang Wu, Ruo Hua Zhou, Yong Hong Yan

Abstract: Support vector machine (SVM) one vs. rest classification and the Gaussian backend classifier are commonly used for language recognition. When traditional one vs. one classification is used, the LDA matrix of the Gaussian backend classifier is often singular and the recognition performance is very poor. In this paper, we present a language-pair scoring method that re-models the one vs. one scores of the SVM, which improves performance significantly. Our experiments are carried out on the NIST 2011 language recognition evaluation 30s data corpus. Results indicate that the proposed language-pair scoring method performs as well as or better than traditional one vs. rest classification for ivector and SVM-GSV language recognition systems. The experimental period is also shortened, and the linear fusion of the proposed method with one vs. rest obtains significantly better performance.

737

Automatic Transcription of Piano Music Using Audio-Vision Fusion

Authors: Yu Long Wan, Zhi Gang Wu, Ruo Hua Zhou, Yong Hong Yan

Abstract: Over the last decade, many sophisticated and application-specific methods have been proposed for the transcription of polyphonic music. However, performance seems to have reached a limit. This paper describes a high-performance piano transcription system with two main contributions. Firstly, a new onset detection method is proposed that uses a specific energy envelope matched filter, which has proved very suitable for piano music. Secondly, a computer-vision method is proposed to enhance audio-only piano music transcription by recognizing the player's hands on the piano keyboard. We carried out comparative experiments for onset detection and for the overall system, based on the MAPS database and the video database. The results were compared with the best piano transcription system in MIREX 2008, which still held the best performance on the piano subset up to MIREX 2012. The results show that the system substantially outperforms the state-of-the-art method.

742

An Improved Direct Current Prediction Method for Intra Coding

Authors: Chang Nian Chen, Rui Zhu

Abstract: Direct current (DC) mode is one of the primary prediction modes for intra coding. In the coding standard, this mode does not take into account the spatial correlations between neighboring pixels, which limits its coding performance. To address this problem, an improved prediction method is proposed in this paper. In the prediction scheme, each predicted pixel uses its available neighboring samples, and its value is set to the mean of these references. The reference samples come not only from other encoded blocks but also from pixels already predicted in the current block. Experimental results demonstrate that the proposed method improves both coding performance and coding efficiency compared with the existing method.

749
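The prediction scheme described in the abstract above can be illustrated with a minimal sketch. The abstract does not say which neighbors are used or in what order, so the choice of the left and upper neighbors in raster order is an assumption, and the function names `dc_predict` and `neighbor_mean_predict` are hypothetical. This is not the authors' implementation.

```python
def dc_predict(top, left):
    """Conventional DC mode: every pixel gets the mean of all reference samples."""
    refs = list(top) + list(left)
    dc = sum(refs) / len(refs)
    return [[dc] * len(top) for _ in left]


def neighbor_mean_predict(top, left):
    """Illustrative variant (assumption): each pixel is the mean of its left and
    upper neighbors, which are reference samples on the block border and
    previously predicted pixels inside the block."""
    rows, cols = len(left), len(top)
    pred = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            up = top[c] if r == 0 else pred[r - 1][c]
            lf = left[r] if c == 0 else pred[r][c - 1]
            pred[r][c] = (up + lf) / 2
    return pred


# Hypothetical 2x2 block with made-up reference samples.
flat = dc_predict([10, 10], [20, 20])
graded = neighbor_mean_predict([10, 10], [20, 20])
```

With these made-up references, the conventional mode fills the block with a single value, while the neighbor-based variant produces values that vary with position inside the block.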

Improved Algorithm for Pitch Detection and Harmonic Separation

Authors: Yi Zhao, Sheng Zhang, Xiao Kang Lin

Abstract: In this paper, we propose a new algorithm for pitch detection and an approach to harmonic separation based on pitch detection. First, we introduce the pitch detection algorithm. It consists mainly of five parts: mean value removal, extraction of alternative pitch periods, search for the best pitch transfer path, accurate pitch period search with a time-varying filter, and search for the fractional pitch period. We then present a harmonic separation algorithm based on the pitch detection. The pitch detection algorithm and the harmonic separation algorithm proposed in this paper are mutually beneficial. Experimental results show that the new pitch detection algorithm achieves higher accuracy and, compared with some other algorithms, has better noise immunity. The harmonic separation algorithm can separate each harmonic signal accurately.

753
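The first two stages named in the abstract above (mean value removal and extraction of alternative pitch periods) can be sketched as follows. The abstract does not say how candidate periods are extracted, so plain autocorrelation is swapped in here as a common stand-in; the function names, lag range, and test signal are all hypothetical.

```python
import math


def remove_mean(signal):
    """Subtract the average so a DC offset does not bias the correlation."""
    mean = sum(signal) / len(signal)
    return [x - mean for x in signal]


def candidate_periods(signal, min_lag, max_lag, k=3):
    """Return the k lags (in samples) with the largest autocorrelation.
    Autocorrelation is an assumption, not the method from the paper."""
    x = remove_mean(signal)
    scores = []
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        scores.append((r, lag))
    scores.sort(reverse=True)
    return [lag for _, lag in scores[:k]]


# Synthetic tone with a period of 20 samples plus a constant offset.
tone = [0.5 + math.sin(2 * math.pi * n / 20) for n in range(200)]
candidates = candidate_periods(tone, min_lag=5, max_lag=60)
```

For this synthetic tone the strongest candidate is the true period of 20 samples. The later stages in the abstract (path search, time-varying filtering, fractional period search) would refine such candidates and are not covered by this sketch.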

Acoustic Events Detection in Dissimilarity Measurement Space

Authors: Lin Bin Jia, Lin Li, Rong Nie

Abstract: The paper considers the problem of detecting acoustic events in a robust manner. A dissimilarity measurement is used to measure the distance between acoustic samples. This distance then replaces the Euclidean distance in building the detection model with the SVM algorithm. All of the well-known features are considered, and the model is built as an ensemble over feature subsets. Experiments are conducted to detect events under a variety of environmental sounds. The results demonstrate the robustness of the ensemble method with dissimilarity measurement. The detection model has been shown to perform comparably to human listeners.

764

Ensemble Learning Approach with Application to Chinese Dialect Identification

Authors: Yu Guo Xia, Ming Liang Gu

Abstract: In this paper, we propose an ensemble-learning-based approach to identifying Chinese dialects. The method first uses Gaussian Mixture Models and N-gram language models to produce a set of base learners. Then two typical ensemble learning approaches, Bagging and AdaBoost, are used to combine the base learners and determine the dialect category. An ANN is selected as the weak learner. The experimental results show that the ensemble approach not only greatly enhances the performance of the system but also reduces the conflict between the amount of training data and the number of model parameters.

769

Effect of Speaking Rate on Chinese Speech Recognition for Foreign Students

Authors: Shui Hui Cui, Jian Xin Peng

Abstract: Three room impulse responses with different mid-frequency reverberation times were obtained at a listening position in a classroom model through room acoustical simulation. Chinese speech recognition tests for foreign students were performed under different speaking rate conditions using an auralization method. The effect of reverberation time and speaking rate on Chinese speech recognition for foreign students was investigated. The results showed that reverberation time and speaking rate had a significant effect on the Chinese recognition scores of foreign students. The shorter the reverberation time and the slower the speaking rate, the higher the recognition scores; conversely, the longer the reverberation time and the faster the speaking rate, the lower the recognition scores.

775

Design and Realization of the Birdsong Analysis and Identification System

Authors: Jun Zhou, Xing Wu, Yi Lin Chi, Pan Nan, Xu Wang, Yi Liu

Abstract: Based on the LabVIEW platform, a birdsong analysis system, Song Lab, was developed; it offers a variety of analysis functions and can handle a large volume of data files. This paper examines the key techniques of the system's design and development, including large data processing, color adjustment, and the processing of feature information. Large data processing addresses the efficiency of data reading, color adjustment controls the appearance of the spectrogram plot, and feature information processing supports the extraction, analysis, and annotation of characteristic information. Finally, the birdsong analysis and identification system is built and verified with sampled data. The results indicate that the system produces accurate analyses and can provide good technical support for research in the field of animal acoustics.

779

A Novel Rate Control Algorithm for H.264/AVC Standard

Authors: Shu Qian He, Chun Shi, Zheng Jie Deng

Abstract: In this paper, we describe our rate-distortion (R-D) optimization framework for the H.264/AVC standard. We find that the R-D characteristics of the H.264 video signal in traditional transform-based video coding systems should be modeled separately for the texture and header components. Based on the proposed model, an R-D cost estimation function is also proposed to give a more accurate R-Q model. Building on these ideas, a rate control (RC) algorithm is developed for the H.264 encoder under a constant bit rate constraint. Experimental results show that the new scheme achieves better bit rate control and R-D performance than previously proposed approaches.

783

A Novel Header Bits Estimation Scheme for H.264/AVC Standard

Authors: Shu Qian He, Zheng Jie Deng, Chun Shi

Abstract: Rate estimation is useful for many H.264/AVC applications, including rate-distortion optimization (RDO) for fast mode decision and precise rate control. In this paper, we propose a new header rate prediction model and an adaptive algorithm that estimate the total number of coding bits for rate control more accurately than previously proposed methods. The header bit rate is modeled as a linear combination of the number of mode blocks and the sum of the absolute values of all motion vectors for each block. Based on the proposed model, a header rate estimation function is also proposed to give more accurate rate-distortion rate control. The proposed schemes achieve better rate-distortion and rate control results than previously proposed approaches.

787
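The linear header-bit model described in the abstract above can be sketched as a two-coefficient least-squares fit. The abstract gives neither the coefficients nor the adaptive update rule, so the fitting procedure, the absence of an intercept, the function names, and the sample numbers below are all assumptions made for illustration.

```python
def fit_header_model(samples):
    """Fit header_bits ~ a * n_blocks + b * sum_abs_mv by least squares.
    `samples` is a list of (n_blocks, sum_abs_mv, header_bits) tuples.
    Solves the 2x2 normal equations directly (no intercept, by assumption)."""
    sxx = sum(n * n for n, _, _ in samples)
    sxy = sum(n * m for n, m, _ in samples)
    syy = sum(m * m for _, m, _ in samples)
    sxb = sum(n * bits for n, _, bits in samples)
    syb = sum(m * bits for _, m, bits in samples)
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("samples do not determine both coefficients")
    a = (sxb * syy - sxy * syb) / det
    b = (sxx * syb - sxy * sxb) / det
    return a, b


def estimate_header_bits(a, b, n_blocks, sum_abs_mv):
    """Apply the fitted linear model to one macroblock or frame."""
    return a * n_blocks + b * sum_abs_mv


# Made-up observations generated from header_bits = 3 * n_blocks + 2 * sum_abs_mv.
observed = [(1, 0, 3), (2, 5, 16), (4, 1, 14)]
a, b = fit_header_model(observed)
predicted = estimate_header_bits(a, b, n_blocks=3, sum_abs_mv=4)
```

In an encoder, such coefficients would typically be refreshed from recently coded frames; that adaptive step is not shown here because the abstract does not describe it.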