Stochastic Gradient Descent Based K-Means Algorithm on Large Scale Data Clustering

Abstract:

With the rapid advance of data collection and storage techniques, it has become easy to acquire data sets containing tens of millions or even billions of instances. How to extract useful or interesting information from such data has become a pressing issue. The traditional k-means clustering algorithm is widely used in the data mining community. First, k clustering centres are randomly initialized. Then, every instance is assigned to one of the k classes according to its distances to the clustering centres. Finally, each clustering centre is updated to the mean of its constituent instances. This process is iterated until convergence. At every iteration, the distance matrix from all instances to the k clustering centres must be computed, which is very time-consuming on large scale data sets. To address this issue, in this paper we propose a fast optimization algorithm based on stochastic gradient descent (SGD). At each iteration, the algorithm randomly selects an instance, finds its nearest clustering centre, and updates that centre immediately. Experimental results show that the proposed method achieves competitive clustering results at a lower time cost.
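
The per-instance update described above is essentially an online (SGD-style) k-means step. The following Python sketch illustrates the idea; the 1/count learning-rate schedule and all names (sgd_kmeans, n_iters) are illustrative assumptions on our part, not the authors' exact formulation.

    import numpy as np

    def sgd_kmeans(X, k, n_iters=100000, seed=0):
        # Online (SGD-style) k-means: one randomly drawn instance per update.
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        # Randomly initialize the k clustering centres from the data itself.
        centres = X[rng.choice(n, size=k, replace=False)].astype(float)
        counts = np.zeros(k)  # per-centre update counts, used as a step-size schedule
        for _ in range(n_iters):
            x = X[rng.integers(n)]                            # randomly choose an instance
            j = np.argmin(((centres - x) ** 2).sum(axis=1))   # find its nearest centre
            counts[j] += 1
            eta = 1.0 / counts[j]                             # decaying learning rate (assumed schedule)
            centres[j] += eta * (x - centres[j])              # update that centre immediately
        return centres

Unlike batch k-means, which recomputes the full n-by-k distance matrix over all instances at every iteration, each step here touches a single instance and a single centre, which is where the time saving comes from.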

Info:

Pages: 1342-1345

Online since: November 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
