Parameter Optimization in KPCA for Rotating Machinery Feature Vector Dimensionality Reduction

This research aims to reveal how the kernel function and its parameters affect the performance of kernel principal component analysis (KPCA) for dimensionality reduction. KPCA was performed on nine feature databases using different kernel functions and a series of equally spaced kernel parameters. Relation charts between the kernel parameters and the number of kernel principal components were constructed. It was found that the Gaussian kernel with a parameter above 2^5 is the best choice for rotating machinery feature vector dimensionality reduction using KPCA. This study provides a reference and guideline for the application of KPCA in rotating machinery fault diagnosis.


Introduction
The original features extracted from test data in mechanical fault diagnosis can characterize the device state [1]. Some features are valid for classification while others are not. Feature selection and extraction are therefore necessary: removing invalid features and reducing the dimension of the feature space ensures the performance of the classifiers and reduces classification complexity [2]. PCA is a common method for feature dimension reduction, but it only forms linear combinations of features for feature extraction. The nonlinear association between different fault categories and feature vectors is difficult for PCA to handle [3][4]. Kernel principal component analysis (KPCA) is a nonlinear extension of PCA developed using the kernel method. It performs better than PCA on nonlinear problems. Since rotating machinery failures often show nonlinear behavior, methods based on KPCA are more suitable for rotating machinery fault feature dimensionality reduction [5][6].
Because KPCA introduces a kernel function, the kernel function and its parameters must be selected. Existing studies have shown that the kernel function and its parameters have a significant influence on the performance of KPCA [7][8]. To address this problem, Wang Xinfeng, Qiu Jing et al. proposed a method that searches a finite range for an approximate kernel parameter value based on the similarity between a given reference matrix and the kernel matrix [9]. However, the kernel function and parameters of the reference matrix were assumed, and a suitable reference matrix may not be found in practical applications. Wei Xiuye et al. proposed optimizing the kernel parameter with a particle swarm optimization algorithm [10]. Intelligent algorithms for kernel parameter optimization have been widely studied for SVM, but they only yield a final optimal value and cannot reveal the rules governing how the kernel function and its parameters affect the performance of KPCA [11][12]. This paper takes nine feature libraries as the objects of analysis: time-domain statistical features, energy features and entropy features in different frequency bands based on wavelet packet decomposition and empirical mode decomposition, and time-series features from rolling bearings, together with multi-sensor energy features, entropy features, and correlation dimension features from gears. KPCA was performed on the nine databases using different kernel functions and a series of equally spaced kernel parameters. The rules for kernel function and parameter optimization in KPCA were summarized, which can provide a reference and guideline for kernel function and parameter selection in KPCA for rotating machinery fault diagnosis.
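The sweep experiment described above can be sketched as follows. This is an illustrative reconstruction only: the random data, the parameter grid, and the polynomial kernel form k(x, y) = (x·y + 1)^d are our assumptions, not the paper's actual feature libraries or settings.

```python
import numpy as np

def kernel_matrix(X, kind, p):
    """Kernel matrix on samples X; p is sigma (Gaussian) or degree (polynomial)."""
    if kind == "gaussian":
        sq = (X**2).sum(1)[:, None] + (X**2).sum(1)[None, :] - 2.0 * X @ X.T
        return np.exp(-sq / (2.0 * p**2))
    # assumed inhomogeneous polynomial kernel k(x, y) = (x.y + 1)^p
    return (X @ X.T + 1.0) ** p

def n_kpcs(X, kind, p, ratio=0.85):
    """Number of kernel principal components whose cumulative
    contribution first reaches the given ratio."""
    l = len(X)
    J = np.eye(l) - np.ones((l, l)) / l          # centering matrix I - (1/l) 1 1^T
    w = np.linalg.eigvalsh(J @ kernel_matrix(X, kind, p) @ J)[::-1]
    w = np.clip(w, 0.0, None)                    # clip negative numerical noise
    c = np.cumsum(w) / w.sum()
    return int(np.searchsorted(c, ratio) + 1)

# equally spaced (here: power-of-two) parameter sweep, as in the experiment
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 8))                 # stand-in feature library
for sigma in (1, 2, 4, 8, 16, 32, 64):
    print(sigma, n_kpcs(X, "gaussian", sigma))
```

With a very small Gaussian parameter the kernel matrix is close to the identity, so the variance is spread over almost all l components; as the parameter grows the count drops and settles, which is the qualitative behavior the paper reports.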

Theory
KPCA generalizes linear PCA to the nonlinear case by using the kernel method. The idea of KPCA is to first map the original input vectors x_t into a high-dimensional feature space through a mapping Φ(x). Assuming the dimension of the feature space is larger than the number of training samples l, KPCA solves the eigenvalue problem

C u = λ u,    (1)

where C is the sample covariance matrix of the mapped samples Φ(x) and u is the corresponding eigenvector. Eq. (1) can be transformed into the eigenvalue problem

K α = l λ α,    (2)

where K is the l × l kernel matrix. Each element of K equals the inner product of two mapped vectors Φ(x_i) and Φ(x_j) in the high-dimensional feature space. These dot products Φ(x_i) · Φ(x_j) are computed directly through a kernel function k(x_i, x_j), so the mapping Φ never has to be evaluated explicitly.

Mechatronics and Information Technology
Finally, based on the estimated α_i, the k-th principal component score for a sample x_t is calculated by

s_k(x_t) = Σ_{i=1}^{l} α_i^k k(x_i, x_t).    (3)

In addition, to center the mapped sample vectors in the feature space, the kernel matrix K on the training set and K_t on the testing set in Eq. (3) are modified, respectively, by

K̃ = (I − (1/l) 1_l 1_l^T) K (I − (1/l) 1_l 1_l^T),
K̃_t = (K_t − (1/l) 1_{l_t} 1_l^T K) (I − (1/l) 1_l 1_l^T),

where I is the l-dimensional identity matrix, l_t is the number of testing data points, 1_l and 1_{l_t} represent column vectors whose elements are all ones, with lengths l and l_t respectively, and K_t is the l_t × l kernel matrix for the testing data points. From Eq. (3) it can be found that KPCA can extract more principal components than PCA, since the maximal number of principal components in KPCA is l instead of the input dimension m. As in PCA, the dimension of s_t can also be reduced in KPCA by considering only the first several eigenvectors. By using the kernel method to implement nonlinear PCA, the other properties of PCA described for Eq. (1) are retained.
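The centering formulas and the projection of Eq. (3) can be sketched in a short NumPy implementation. This is a minimal illustration assuming a Gaussian kernel; the function and variable names are ours, not the paper's.

```python
import numpy as np

def rbf(X, Y, sigma):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sq / (2.0 * sigma**2))

def kpca_fit_transform(X, Xt, sigma, n_components):
    """Center K and Kt as in the text, solve the eigenproblem on the
    centered kernel matrix, and project the test samples (Eq. (3))."""
    l, lt = len(X), len(Xt)
    K = rbf(X, X, sigma)
    Kt = rbf(Xt, X, sigma)                        # lt x l kernel matrix
    J = np.eye(l) - np.ones((l, l)) / l           # I - (1/l) 1 1^T
    Kc = J @ K @ J                                # centered training kernel
    Ktc = (Kt - np.ones((lt, l)) / l @ K) @ J     # centered testing kernel
    lam, A = np.linalg.eigh(Kc)                   # ascending eigenvalues
    lam = lam[::-1][:n_components]                # keep the leading ones
    A = A[:, ::-1][:, :n_components]
    A = A / np.sqrt(np.clip(lam, 1e-12, None))    # unit-norm feature-space u
    return Ktc @ A                                # principal component scores
```

Projecting the training set onto the leading eigenvectors gives zero-mean scores, which is exactly what the centering step is for.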

Results and discussion
In practical applications, PCA/KPCA usually retains the principal components whose cumulative contribution percentage exceeds 0.85, because the information of the original feature vectors is concentrated in the principal components with high contribution ratios. After executing KPCA on the nine feature libraries, the relations between the Gaussian and polynomial kernel parameters and the number of kernel principal components whose cumulative percentage exceeds 0.85 are shown in Figure 2 and Figure 3.
1. For the Gaussian kernel, the parameter has a strong influence on the dimensionality reduction performance of KPCA. With an inappropriate kernel parameter, the number of kernel principal components can exceed 100 even though the original feature vector dimension is below 10. As the kernel parameter increases, the number of kernel principal components decreases. When the parameter reaches 2^5, the number of kernel principal components begins to converge.
Advanced Engineering Forum Vols. 2-3
2. For the polynomial kernel, the parameter has a smaller influence on the dimensionality reduction performance of KPCA than for the Gaussian kernel. As the kernel parameter increases, the number of kernel principal components decreases and finally converges.
3. Cluster analysis found that the kernel principal components obtained with the Gaussian kernel give a better clustering result than those obtained with the polynomial kernel.
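The 0.85 cumulative-contribution criterion used above can be computed directly from the kernel eigenvalues. A small helper (the names are ours, not the paper's):

```python
import numpy as np

def n_components_for_ratio(eigvals, threshold=0.85):
    """Smallest k such that the first k eigenvalues carry at least
    `threshold` of the total variance; negative numerical noise is clipped."""
    w = np.sort(np.clip(np.asarray(eigvals, dtype=float), 0.0, None))[::-1]
    ratios = np.cumsum(w) / w.sum()
    return int(np.searchsorted(ratios, threshold) + 1)
```

For example, eigenvalues [5, 3, 1, 1] give cumulative ratios 0.5, 0.8, 0.9, 1.0, so three components are needed to pass 0.85.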

Conclusion
The Gaussian kernel with a parameter above 2^5 is the best choice for rotating machinery feature vector dimensionality reduction using KPCA.

(a) Feature library 1
Fig. 2. The impact of Gaussian kernel parameters on the performance of KPCA
Fig. 3. The impact of polynomial kernel parameters on the performance of KPCA