Unsupervised Kernel Learning Vector Quantization

In this paper, we propose an unsupervised kernel learning vector quantization (UKLVQ) algorithm that combines the concepts of the kernel method and traditional unsupervised learning vector quantization (ULVQ). We first use the definition of the shadow kernel to give a general representation of the UKLVQ method, and we then easily implement the UKLVQ algorithm with a well-defined objective function in which traditional ULVQ becomes a special case of UKLVQ. We also analyze the robustness of our proposed learning algorithm by means of a sensitivity curve. In our simulations, the UKLVQ with Gaussian kernel has a bounded sensitivity curve and is thus robust to noise. The robustness and accuracy of the proposed UKLVQ algorithm are also demonstrated via numerical examples.


Introduction
The objective of vector quantization is to find a set of reference vectors $W_1, \dots, W_c$ that best represents a data set. The k-means clustering algorithm is the best-known batch-type vector quantization method that minimizes the mean squared error cost function [1,2]. The problem with these batch-version quantization algorithms is that the process may not converge to an optimal configuration for the input data set [3]. However, by adapting a competitive learning neural network to vector quantization, the convergence rate and quantization quality can be improved.
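As a point of reference for the batch-type approach discussed above, a minimal k-means vector-quantization sketch might look as follows (the initialization scheme and iteration count are illustrative assumptions, not prescribed by the paper):

```python
import numpy as np

def kmeans_vq(X, c, n_iter=100, seed=0):
    """Batch k-means vector quantization: find c reference vectors
    minimizing the mean squared quantization error."""
    rng = np.random.default_rng(seed)
    # illustrative initialization: sample c data points as initial reference vectors
    W = X[rng.choice(len(X), size=c, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest reference vector
        labels = np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(-1), axis=1)
        # move each reference vector to the mean of its assigned points
        for i in range(c):
            if np.any(labels == i):
                W[i] = X[labels == i].mean(axis=0)
    return W
```

As the text notes, this batch process depends heavily on the initial reference vectors, which motivates the online competitive-learning alternatives below.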
Self-organizing map (SOM) [4,5] is a two-layer feed-forward competitive learning neural network that can discover the topological structure hidden in data and display it in one- or two-dimensional space. It is also widely used in the vector quantization problem [6][7][8]. Suppose that $W_k$ is the specified $k$th reference vector and the feature vector $X(t)$ is input at time $t$; self-organization then proceeds by a two-step learning rule. First, a winner neuron $k$ is determined using the nearest-neighbor condition

$$\|X(t) - W_k(t)\| = \min_{1 \le i \le c} \|X(t) - W_i(t)\|.$$

This means that the weight of neuron $k$ matches best with $X(t)$. Second, all neurons are updated using the learning formula

$$W_i(t+1) = W_i(t) + \alpha(t)\, h_{ki}(t)\, [X(t) - W_i(t)],$$

where $\alpha(t)$ is the learning-rate factor, confined to decrease monotonically with time $t$, and $h_{ki}(t)$ is the neighborhood function of the winner $k$. The selection of $\alpha(t)$ for which $W_i(t)$ is an asymptotically unbiased estimate can be found in Ref. [9]. The most often used neighborhood function is the Gaussian function [10,11]. Unsupervised learning vector quantization (ULVQ) adopts the winner-take-all neighborhood function, updating only the winner neuron $k$ via

$$W_k(t+1) = W_k(t) + \alpha(t)\, [X(t) - W_k(t)],$$

to find a set of reference vectors $W_1, \dots, W_c$ so that the expected error value

$$E = \int \min_{1 \le i \le c} \|x - W_i\|^2 f(x)\, dx$$

is minimized, where $f(x)$ denotes the probability density function of $X$. In this paper, we combine the concepts of the kernel method and traditional ULVQ to develop the unsupervised kernel learning vector quantization (UKLVQ) algorithm. In Section 2, we first give an overview of the kernel method and the shadow kernel, then introduce our UKLVQ algorithm at the end of the section. We then illustrate the robustness of the algorithm by means of numerical examples in Section 3. Finally, we present our conclusions in Section 4.
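The two-step SOM learning rule described above, with a Gaussian neighborhood function over a one-dimensional grid of neurons, might be sketched as follows (the grid layout and the decay schedules for the learning rate and neighborhood width are illustrative assumptions):

```python
import numpy as np

def som_1d(X, c, alpha0=0.5, sigma0=2.0, seed=0):
    """One online pass of a 1-D SOM: find the winner by the nearest-neighbor
    condition, then update all neurons with a Gaussian neighborhood h_ki."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=c, replace=False)].astype(float)
    grid = np.arange(c)                                  # neuron positions on a 1-D grid
    for t, x in enumerate(X, start=1):
        k = np.argmin(((x - W) ** 2).sum(axis=1))        # step 1: winner neuron
        alpha = alpha0 / t                               # monotonically decreasing rate
        sigma = max(sigma0 / t, 0.5)                     # shrinking neighborhood width
        h = np.exp(-((grid - grid[k]) ** 2) / (2 * sigma ** 2))  # Gaussian h_ki
        W += alpha * h[:, None] * (x - W)                # step 2: update all neurons
    return W
```

Replacing the Gaussian `h` with an indicator on the winner alone gives the winner-take-all rule of ULVQ.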

Unsupervised Kernel Learning Vector Quantization
The advantages of using the kernel methods are that a robust non-Euclidean distance measure is induced and the robustness of the algorithm to noise and outliers is enhanced. In this section, we give an overview of these kernel methods and then introduce our proposed unsupervised kernel learning vector quantization (UKLVQ) algorithm.
Kernel Methods. Kernel methods have been widely studied and applied in pattern recognition and function approximation, and they are one of the most important subjects in machine learning [12][13][14]. The common concept underlying the kernel method is the transformation of the data space into a high-dimensional feature space $F$ via a map $\Phi : X \to F$, where the inner products in $F$ can be represented by a Mercer kernel function defined on the data space. A kernel $K(x, y)$ in the feature space can then be denoted

$$K(x, y) = \langle \Phi(x), \Phi(y) \rangle,$$

which is the inner product of $\Phi(x)$ and $\Phi(y)$. We choose kernel functions that satisfy the conditions $K(x, x) = 1$ and $K(x, y) = K(y, x)$. The most commonly used kernel is the Gaussian kernel

$$K(x, y) = \exp\!\big(-\|x - y\|^2 / \sigma^2\big).$$

One way to implement the kernel method in the k-means algorithm is to transform the data space into a high-dimensional feature space and restrict each cluster center to being the sample/weighted mean of the data points in the feature space [15][16][17][18][19][20]. That is, the squared distance between a data point and a cluster center can be computed from

$$\|\Phi(x) - \bar{\Phi}_i\|^2 = K(x, x) - \frac{2}{N_i} \sum_{x_j \in C_i} K(x, x_j) + \frac{1}{N_i^2} \sum_{x_j, x_l \in C_i} K(x_j, x_l),$$

Information Technology for Manufacturing Systems III
where $N_i$ denotes the number of points $x_j$ that belong to cluster $C_i$. Since we do not partition an online data set during the learning process, this implementation is not available in an online sequential learning algorithm. Moreover, Zhang and Chen [12] have stated that these kernel clustering methods, with cluster centers lying in the feature space, may lack clear and intuitive descriptions.

The Proposed Kernel Learning Algorithm. The second way to implement the kernel method is to transform both the data points and the cluster centers into the feature space, resulting in the kernel-induced distance

$$\|\Phi(x) - \Phi(W_i)\|^2 = K(x, x) - 2K(x, W_i) + K(W_i, W_i) = 2\,\big(1 - K(x, W_i)\big).$$

The objective of the unsupervised kernel learning vector quantization (UKLVQ) algorithm is to minimize the kernel-based expected error value

$$E_K = \int \min_{1 \le i \le c} \big(1 - K(x, W_i)\big) f(x)\, dx. \qquad (9)$$

For a more general representation of the UKLVQ algorithm, we now define the shadow kernels. Let $H(x, y) = h(\|x - y\|^2)$ be a kernel on the data space, and define its shadow kernel by

$$K(x, y) = 1 - v \int_0^{\|x - y\|^2} h(s)\, ds, \qquad (10)$$

where $v$ is a constant. We then get $\partial\big(1 - K(x, W)\big)/\partial W = -2v\, H(x, W)\,(x - W)$, and the gradient descent update equation for UKLVQ is given, for the winner neuron $k$, as

$$W_k(t+1) = W_k(t) + \alpha(t)\, H\big(X(t), W_k(t)\big)\, [X(t) - W_k(t)]. \qquad (11)$$

According to the definition of the shadow kernel, update equation (11) using kernel $H$ will try to optimize the corresponding expected error objective (9) with shadow kernel $K$. (Note that one can adopt any suitable kernel $H$ satisfying the conditions $H(x, x) = 1$ and $H(x, y) = H(y, x)$ to create a UKLVQ algorithm with update equation (11).) However, UKLVQ algorithms with an unknown shadow kernel will not optimize the expected error objective (9).
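The kernel quantities used in this section can be sketched concretely with the Gaussian kernel (the helper names below are illustrative, not the paper's notation):

```python
import numpy as np

def K(x, y, sigma=1.0):
    """Gaussian kernel; note K(x, x) = 1 and K(x, y) = K(y, x)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.exp(-((x - y) ** 2).sum() / sigma ** 2)

def kernel_kmeans_dist2(x, cluster, sigma=1.0):
    """Squared feature-space distance between Phi(x) and the mean of Phi
    over one cluster's points (the kernel k-means form)."""
    N = len(cluster)
    return (K(x, x, sigma)
            - 2.0 / N * sum(K(x, z, sigma) for z in cluster)
            + 1.0 / N ** 2 * sum(K(u, v, sigma) for u in cluster for v in cluster))

def kernel_dist2(x, w, sigma=1.0):
    """Squared feature-space distance when the center is also mapped into
    the feature space: ||Phi(x) - Phi(w)||^2 = 2 (1 - K(x, w))."""
    return 2.0 * (1.0 - K(x, w, sigma))
```

The first distance needs the full cluster membership, which is why it does not suit online learning; the second depends only on the current point and one reference vector.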
If we choose the Gaussian kernel $H(x, y) = \exp(-\|x - y\|^2 / \sigma^2)$, then its shadow kernel is again the Gaussian kernel (taking $v = 1/\sigma^2$), and our proposed UKLVQ algorithm will try to optimize the kernel-based expected error value using the following gradient descent update equation:

$$W_k(t+1) = W_k(t) + \alpha(t)\, \exp\!\big(-\|X(t) - W_k(t)\|^2 / \sigma^2\big)\, [X(t) - W_k(t)].$$

In this paper, we adopt the Gaussian kernel function in our UKLVQ algorithm, and the above update equation is used in all our simulations. The following is another example of UKLVQ.
If we instead choose the flat kernel $H(x, y) = 1$ for all $x$ and $y$, the algorithm uses the following gradient descent update equation:

$$W_k(t+1) = W_k(t) + \alpha(t)\, [X(t) - W_k(t)].$$

This indicates that the traditional ULVQ is a special case of the UKLVQ with flat kernel.
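A minimal online sketch of the UKLVQ update rule, with the kernel passed as a parameter (the 1/t learning-rate decay, `sigma`, and the initialization scheme are illustrative choices, not the paper's exact protocol):

```python
import numpy as np

def uklvq(X, c, kernel, alpha0=0.5, seed=0):
    """Online UKLVQ: the winner's update is scaled by the kernel value
    H(x, W_k), so far-away points (outliers) have little influence."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=c, replace=False)].astype(float)
    for t, x in enumerate(X, start=1):
        k = np.argmin(((x - W) ** 2).sum(axis=1))          # winner neuron
        W[k] += (alpha0 / t) * kernel(x, W[k]) * (x - W[k])
    return W

def gaussian(x, w, sigma=1.0):
    """Gaussian kernel H; H(x, x) = 1."""
    return np.exp(-((x - w) ** 2).sum() / sigma ** 2)

flat = lambda x, w: 1.0   # flat kernel: uklvq(..., flat) is exactly ULVQ
```

With `kernel=flat`, each step reduces to `W[k] += alpha * (x - W[k])`, the traditional ULVQ rule; with `kernel=gaussian`, the kernel factor shrinks toward zero for distant inputs, which is the source of the robustness analyzed in the next section.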

Robust Analysis and Numerical Examples
Although a good clustering algorithm should have the ability to tolerate noise and outliers, only a small portion of the existing literature discusses the robustness of unsupervised learning algorithms.
Many criteria, such as sensitivity curves, breakdown points, local-shift sensitivity, gross-error sensitivity, and influence functions, can be used to measure robustness. We will now use the sensitivity curve (SC) to discuss the robustness of our UKLVQ online learning algorithm. The sensitivity curve of an estimate $\hat\mu$ for the sample $x_1, \dots, x_n$ is defined by

$$SC(x_0) = (n + 1)\,\big(\hat\mu(x_1, \dots, x_n, x_0) - \hat\mu(x_1, \dots, x_n)\big)$$

as a function of the location $x_0$ of the outlier. The SC statistic denotes the influence of an outlier point on the estimate. If there is a set of location parameters $\mu_1, \dots, \mu_c$ to be estimated, we define a corresponding statistic $SC^*$, aggregating the sensitivity curves of the individual estimators, to measure the influence of an outlier point on these estimators. An estimator with a bounded sensitivity curve is only mildly affected by any single outlier; however, if the sensitivity curve is unbounded, an extreme outlier may cause problems. We will now give a simple example to simulate the $SC^*$ statistic of the UKLVQ and ULVQ.
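One common form of the sensitivity curve (the paper's exact scaling convention may differ) can be computed as:

```python
import numpy as np

def sensitivity_curve(estimator, x, x0):
    """SC(x0): scaled change in an estimate caused by adding a single
    outlier at location x0 to the sample x_1, ..., x_n."""
    n = len(x)
    return (n + 1) * (estimator(np.append(x, x0)) - estimator(x))
```

For the sample mean, SC grows linearly in the outlier location and is therefore unbounded, while for a robust estimator such as the median it remains bounded; the simulations below make the same comparison between ULVQ and UKLVQ.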
We first generate 100 random samples from the standard normal distribution $N(0, 1)$ and then add an outlier point $x_0$ to the data set, with the coordinate of the outlier taking each value in $\{1, 2, 3, \dots, 50\}$ in turn. Since the input order of the outlier influences the learning performance, we consider two ways of inputting it. The first is to input the outlier in order 1, so that the data are presented in the order $\{x_0, x_1, \dots, x_{100}\}$. The second is to input the outlier in order 101, so that the data are presented in the order $\{x_1, \dots, x_{100}, x_0\}$. The sensitivity curves of ULVQ for these two learning orders are denoted ULVQ_1 and ULVQ_101, respectively. Figure 1 shows that the sensitivity curve of ULVQ depends on the learning order and is a monotonically increasing function of the outlier coordinate. The resulting sensitivity curves of the UKLVQ for the two learning orders are virtually identical and remain bounded as the outlier coordinate grows, as shown in Fig. 1. The details of these simulation results for the outlier coordinates $\{1, 2, 3, \dots, 20\}$ are listed in Table 1.
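A simplified one-prototype version of this experiment can be sketched as follows (the learning-rate schedule and kernel width are illustrative choices; the paper's full experiment uses both input orders and a range of outlier coordinates):

```python
import numpy as np

def run_lvq(x, sigma=None, alpha0=0.5):
    """One-prototype online pass: plain ULVQ when sigma is None,
    Gaussian-kernel UKLVQ otherwise."""
    w = float(x[0])
    for t, xt in enumerate(x[1:], start=2):
        g = 1.0 if sigma is None else np.exp(-(xt - w) ** 2 / sigma ** 2)
        w += (alpha0 / t) * g * (xt - w)
    return w

rng = np.random.default_rng(0)
sample = rng.standard_normal(100)
with_outlier = np.append(sample, 50.0)     # outlier input in order 101
sc_ulvq = 101 * abs(run_lvq(with_outlier) - run_lvq(sample))
sc_uklvq = 101 * abs(run_lvq(with_outlier, sigma=1.0) - run_lvq(sample, sigma=1.0))
# sc_ulvq grows with the outlier coordinate; sc_uklvq stays near zero
```

The kernel factor `g` vanishes for a far-away outlier, so the UKLVQ prototype is essentially unmoved, which is the bounded-sensitivity behavior reported in Fig. 1.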
The above example shows the robustness of our proposed UKLVQ algorithm. We also tested its accuracy in parameter estimation. Figure 2(a) shows a histogram of 500 data points drawn from a three-class normal mixture. The MSE values obtained by ULVQ, UKLVQ, and k-means versus time $t$ are shown in Fig. 2(b). We then added an outlier point with coordinate 50 to the mixture data, which resulted in the MSE values shown in Fig. 2(c). Since the MSE values of ULVQ with the outlier are always greater than 10, we did not include it in Fig. 2(c). Moreover, k-means is a batch-version algorithm, so its MSE values do not change after the algorithm converges. This example demonstrates the robustness and accuracy of the UKLVQ algorithm in parameter estimation.

Figure 3(a) shows a two-dimensional 16-group data set with sample size n = 800, consisting of 50 points in each of 16 clusters. The data points in each group are generated uniformly from each rectangle. We also added a noisy point with coordinate (10, 10), denoted by an asterisk. The dots and solid circles represent the data points and the initial values, respectively. The vector quantization results (solid circles) obtained by k-means, ULVQ, and UKLVQ are illustrated in Figs. 3(b), 3(c), and 3(d), respectively. In this example, the k-means results cannot properly represent the data structure, although k-means works well if the initial values are properly specified. This is a common problem of batch-version algorithms: with a poor initialization, the quantization process may not converge to an optimal configuration for the input data set. Although ULVQ is less sensitive to initialization than k-means, it is influenced by the outlier. By adopting the kernel method in ULVQ, however, the UKLVQ becomes robust to both initialization and noise, with good quantization results.

Advanced Engineering Forum Vols. 6-7

Conclusions
We combined the concepts of kernel method and learning vector quantization to propose an unsupervised kernel learning vector quantization (UKLVQ) algorithm. In order to have a general representation of the UKLVQ method, we used the definition of shadow kernel to connect the relationship of the learning rule and kernel-based expected error objective. According to this general representation of the UKLVQ, we could easily implement a UKLVQ algorithm with a well-defined objective function-resulting in the traditional unsupervised learning vector quantization (ULVQ) with flat kernel becoming a special case of UKLVQ. We also analyzed the robustness of our proposed learning algorithm using sensitivity curves. In our simulations, the UKLVQ with Gaussian kernel had a bounded sensitivity curve, which indicates that it is robust to noise. The results of numerical examples also indicate that our proposed UKLVQ algorithm is robust and accurate.