Multi-Class Logistic Regression Classifier with BFGS Method

Abstract:

The classical multi-class logistic regression classifier uses Newton's method to optimize its loss function, which incurs expensive computations and an unstable iteration process. In this work, we apply the state-of-the-art optimization technique BFGS to train multi-class logistic regression and compare it with Newton's method on classification accuracy over 25 datasets. The experimental results show that BFGS achieves better classification accuracy than Newton's method while having lower time complexity. Finally, we observe that the logistic regression classifier trained with BFGS performs comparably to the SVM classifier.
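Concretely, training a multi-class (softmax) logistic regression model with BFGS can be sketched as follows. This is an illustrative example using synthetic data and `scipy.optimize.minimize`, not the paper's experimental setup; the toy dataset and the small L2 regularizer (which keeps the loss strictly convex) are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic 3-class toy data (illustrative only; not the paper's 25 datasets)
rng = np.random.default_rng(0)
n, d, k = 90, 2, 3
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(n // k, d)) for c in range(k)])
X = np.hstack([X, np.ones((n, 1))])   # append a bias column
d += 1
y = np.repeat(np.arange(k), n // k)
lam = 1e-2                            # assumed L2 regularizer; makes the loss strictly convex

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)          # subtract row max for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def nll_and_grad(w_flat):
    """Regularized negative log-likelihood of softmax regression and its gradient."""
    W = w_flat.reshape(d, k)
    P = softmax(X @ W)
    loss = -np.log(P[np.arange(n), y]).mean() + 0.5 * lam * np.sum(W * W)
    G = X.T @ (P - np.eye(k)[y]) / n + lam * W    # (d, k) gradient matrix
    return loss, G.ravel()

# BFGS builds up an approximation of the inverse Hessian from successive
# gradients, avoiding Newton's explicit Hessian computation and inversion.
res = minimize(nll_and_grad, np.zeros(d * k), jac=True, method='BFGS')
W = res.x.reshape(d, k)
acc = (np.argmax(X @ W, axis=1) == y).mean()
```

The design choice mirrors the paper's motivation: Newton's method must form and solve a dense Hessian system at each step, whereas BFGS updates an inverse-Hessian approximation from gradient differences alone, which is cheaper per iteration and avoids the instability of an ill-conditioned Hessian.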

Info:

Pages: 1912-1916

Online since: June 2013

Copyright: © 2013 Trans Tech Publications Ltd. All Rights Reserved
