A Spheriform Quantization Method Based on Sub-Region Inherent Dimension

Abstract:

Quantization (discretization) methods are an essential preprocessing step for operations such as learning and classification in machine learning and data mining, since many mining and learning methods in these fields require the data set to be partitioned into discrete features. In this paper, we propose a spheriform quantization method based on sub-region inherent dimension, which derives the number and size of quantization intervals in a data-driven way. The method assumes that a quantized cluster of points can be contained in a spheriform space of lower intrinsic dimension m and expected radius. The sample points in the spheriform are obtained by adaptively selecting the neighborhood of an initial observation according to the sub-region inherent dimension. Experimental results and analysis on UCI real data sets demonstrate that, when evaluated with the C4.5 decision tree, our method achieves significantly higher classification accuracy than traditional quantization methods.
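The abstract's notion of sub-region inherent (intrinsic) dimension is commonly estimated from nearest-neighbor distances. As an illustration only, not necessarily the estimator used by the authors, the following sketch implements the Levina-Bickel maximum-likelihood estimator of ref. [7], with the MacKay-Ghahramani bias correction; the function name and the neighborhood size k are our own choices:

```python
import numpy as np

def mle_intrinsic_dimension(points, k=10):
    """Levina-Bickel MLE of intrinsic dimension (illustrative sketch).

    For each point, the local estimate is built from log-ratios of the
    distance to its k-th nearest neighbor over the distances to closer
    neighbors; the global estimate averages the local ones.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distance matrix (fine for small n).
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    local = []
    for i in range(n):
        # Sorted distances to the k nearest neighbors, skipping the
        # zero self-distance in position 0.
        nn = np.sort(dists[i])[1:k + 1]
        # Sum of log(T_k / T_j) for j = 1 .. k-1.
        log_ratios = np.log(nn[-1] / nn[:-1])
        # (k - 2) instead of (k - 1) is the bias-corrected variant.
        local.append((k - 2) / np.sum(log_ratios))
    return float(np.mean(local))
```

On a point cloud lying on a 2-D plane embedded in a higher-dimensional space, the estimate comes out close to 2; a sub-region whose estimate is much lower than the ambient dimension is a candidate for the lower-dimensional spheriform containment the abstract describes.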

Info:

Pages: 4244-4247

Online since: May 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved

References:

[1] J. R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, Calif.: Morgan Kaufmann, (1993).

[2] J. Dougherty, R. Kohavi, and M. Sahami, Supervised and unsupervised discretization of continuous features, in Proc. 12th International Conference on Machine Learning, p.194–202, (1995).

DOI: 10.1016/b978-1-55860-377-6.50032-3

[3] C. T. Su and J. H. Hsu, An extended chi2 algorithm for discretization of real value attributes, IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 3, p.437–441, (2005).

DOI: 10.1109/tkde.2005.39

[4] C. J. Tsai, C. I. Lee, and W. P. Yang, A discretization algorithm based on class-attribute contingency coefficient, Information Sciences, vol. 178, no. 19, p.714–731, (2008).

DOI: 10.1016/j.ins.2007.09.004

[5] U. Fayyad and K. Irani, Multi-interval discretization of continuous-valued attributes for classification learning, In Proc. Thirteenth International Joint Conference on Artificial Intelligence. San Mateo, CA: Morgan Kaufmann, p.1022–1027, (1993).

[6] I. T. Jolliffe, Principal component analysis, Springer-Verlag, New York, (1986).

[7] E. Levina and P. J. Bickel, Maximum likelihood estimation of intrinsic dimension, Advances in Neural Information Processing Systems, vol. 17, (2005).