Partition Real Data in Decision Tree Using Statistical Criterion

Abstract:

Partition methods for real-valued data play an important role in decision tree algorithms in data mining and machine learning, because these algorithms require attribute values to be discrete. In this paper, we propose a novel partition method for real-valued data in decision trees based on a statistical criterion. The method constructs a statistical criterion to identify accurate merging intervals. In addition, we present a heuristic partition algorithm that achieves the desired partition, with the aim of improving the performance of decision tree algorithms. Empirical experiments on real data from the UCI repository show that the new algorithm generates a better partition scheme and improves the classification accuracy of the C4.5 decision tree compared with existing algorithms.
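The abstract describes a bottom-up discretization that merges adjacent intervals using a statistical criterion. The paper's exact criterion is not given in this preview, so the sketch below assumes a chi-square statistic in the spirit of ChiMerge-style merging; the names `chi2_stat` and `chimerge` and the stopping rule `max_intervals` are illustrative assumptions, not the authors' method.

```python
from collections import Counter

def chi2_stat(a: Counter, b: Counter, classes) -> float:
    """Chi-square statistic for the class-count tables of two adjacent
    intervals; a low value means the intervals are statistically similar."""
    n_a, n_b = sum(a.values()), sum(b.values())
    total = n_a + n_b
    stat = 0.0
    for c in classes:
        col = a[c] + b[c]  # total count of class c in both intervals
        for counts, n in ((a, n_a), (b, n_b)):
            expected = n * col / total
            if expected > 0:
                stat += (counts[c] - expected) ** 2 / expected
    return stat

def chimerge(values, labels, max_intervals=6):
    """Start with one interval per distinct value, repeatedly merge the
    adjacent pair with the lowest chi-square, and return the interior
    cut points once max_intervals intervals remain."""
    pairs = sorted(zip(values, labels))
    classes = set(labels)
    intervals = []  # list of (lower_bound, Counter of class counts)
    for v, y in pairs:
        if intervals and intervals[-1][0] == v:
            intervals[-1][1][y] += 1
        else:
            intervals.append((v, Counter({y: 1})))
    while len(intervals) > max_intervals:
        stats = [chi2_stat(intervals[i][1], intervals[i + 1][1], classes)
                 for i in range(len(intervals) - 1)]
        i = stats.index(min(stats))  # most similar adjacent pair
        merged = intervals[i][1] + intervals[i + 1][1]
        intervals[i:i + 2] = [(intervals[i][0], merged)]
    return [lo for lo, _ in intervals[1:]]
```

For example, `chimerge([1, 2, 3, 4, 5, 6, 7, 8], ['a', 'a', 'a', 'b', 'b', 'b', 'a', 'a'], max_intervals=3)` yields the cut points `[4, 7]`, splitting the range where the class distribution changes.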

Info:

Pages: 1469-1472

Online since: August 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved
