Intrinsic Dimensional Correlation Discretization for Mining Task

Abstract:

Discretization is a necessary preprocessing step for many mining tasks and a means of improving the performance of many machine learning algorithms. Existing techniques focus mainly on one-dimensional discretization in low-dimensional data spaces. In this paper, we present an intrinsic dimensional correlation discretization technique for high-dimensional data. The approach first estimates the intrinsic dimensionality (ID) of the data by maximum likelihood estimation (MLE). It then projects the data onto the eigenspace of the estimated (lower) ID by principal component analysis (PCA), which uncovers the correlation structure latent in the multivariate data. All dimensions of the data are thus transformed into a new, independent eigenspace of the ID, where each dimension can be discretized separately under the Bayes discretization model using the MODL discretization method. We also design a heuristic framework to search for a better discretization scheme. Experiments demonstrate that our approach yields a significant improvement in the mean learning accuracy of classifiers over traditional discretization methods.
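The first two stages of the pipeline sketched in the abstract can be illustrated in a few lines of NumPy: the Levina-Bickel MLE estimator of intrinsic dimensionality, followed by a PCA projection onto that many eigen-dimensions. This is a minimal sketch under the assumptions of the cited methods ([9], [10]); the function names `mle_intrinsic_dim` and `pca_project` are illustrative, and the final MODL discretization step is omitted.

```python
import numpy as np

def mle_intrinsic_dim(X, k=10):
    """Levina-Bickel MLE estimate of intrinsic dimensionality.

    X: (n, d) data matrix; k: number of nearest neighbours used per point.
    """
    # Pairwise Euclidean distances between all points.
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt(np.sum(diffs ** 2, axis=-1))
    # Sort each row; column 0 is the point itself (distance 0), so skip it.
    knn = np.sort(dists, axis=1)[:, 1:k + 1]          # (n, k) NN distances
    # Per-point estimate: m_k(x) = [(1/(k-1)) * sum_j log(T_k / T_j)]^{-1}
    logs = np.log(knn[:, -1:] / knn[:, :-1])          # (n, k-1) log ratios
    m = (k - 1) / np.sum(logs, axis=1)
    # Average the per-point estimates over the sample.
    return float(np.mean(m))

def pca_project(X, dim):
    """Project centred data onto its top `dim` principal components."""
    Xc = X - X.mean(axis=0)
    # Eigen-decomposition of the sample covariance matrix.
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(vals)[::-1][:dim]              # largest eigenvalues first
    return Xc @ vecs[:, order]
```

For data lying on a 2-dimensional linear subspace embedded in a higher-dimensional ambient space, `mle_intrinsic_dim` returns a value close to 2, and each column of the projected matrix can then be discretized independently.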

Pages:

548-554

Online since:

September 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved

References:

[1] H. Liu, F. Hussain, C. L. Tan, and M. Dash, Discretization: an enabling technique, Data Mining and Knowledge Discovery, vol. 6, no. 4, p.393–423, (2002).

[2] C. T. Su and J. H. Hsu, An extended chi2 algorithm for discretization of real value attributes, IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 3, p.437–441, (2005).

DOI: 10.1109/tkde.2005.39

[3] C. J. Tsai, C. I. Lee, and W. P. Yang, A discretization algorithm based on class-attribute contingency coefficient, Information Sciences, vol. 178, p.714–731, (2008).

DOI: 10.1016/j.ins.2007.09.004

[4] U. Fayyad and K. Irani, Multi-interval discretization of continuous-valued attributes for classification learning, In Proc. Thirteenth International Joint Conference on Artificial Intelligence. San Mateo, CA: Morgan Kaufmann, p.1022–1027, (1993).

[5] M. Boulle, MODL: A Bayes optimal discretization method for continuous attributes, Machine Learning, vol. 65, p.131–165, (2006).

DOI: 10.1007/s10994-006-8364-x

[6] R. M. Jin, Y. Breitbart, and C. Muoh, Data discretization unification, In Proc. Seventh IEEE International Conference on Data Mining (ICDM Best Paper), p.183–192, (2007).

DOI: 10.1109/icdm.2007.35

[7] S. D. Bay, Multivariate discretization for set mining, Knowledge and Information Systems, vol. 3, no. 4, p.491–512, (2001).

DOI: 10.1007/pl00011680

[8] M. Mehta, S. Parthasarathy, and H. Yang, Toward unsupervised correlation preserving discretization, IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 8, p.1–14, (2005).

DOI: 10.1109/tkde.2005.153

[9] I. T. Jolliffe, Principal component analysis, Springer-Verlag, New York, (1986).

[10] E. Levina and P. J. Bickel, Maximum likelihood estimation of intrinsic dimension, Advances in Neural Information Processing Systems, vol. 17, (2005).

[11] J. Ramirez and F. G. Meyer, Machine learning for seismic signal processing: Seismic phase classification on a manifold, Proceedings of 10th International Conference on Machine Learning and Applications, p.382–388, (2011).

DOI: 10.1109/icmla.2011.91

[12] Z. Pawlak, Rough sets, International Journal of Computer and Information Sciences, vol. 11, no. 5, p.341–356, (1982).

[13] S. Hettich and S. D. Bay, The UCI KDD Archive, http://kdd.ics.uci.edu/, (1999).