Partition Real Data in Decision Tree Using Statistical Criterion

Abstract:

Partition methods for real-valued data play an important role in decision tree algorithms in data mining and machine learning, because these algorithms require attribute values to be discrete. In this paper, we propose a novel partition method for real-valued data in decision trees based on a statistical criterion. The method constructs a statistical criterion to identify accurate merging intervals. In addition, we present a heuristic partition algorithm that achieves the desired partition, with the aim of improving the performance of decision tree algorithms. Empirical experiments on real data from the UCI repository show that the new algorithm generates a better partition scheme and improves the classification accuracy of the C4.5 decision tree compared with existing algorithms.
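The abstract describes a bottom-up discretization that merges adjacent intervals using a statistical criterion. The paper's exact criterion is not given in this preview, so the sketch below assumes a chi-square statistic in the spirit of ChiMerge-style merging; the names `chi2_stat` and `chimerge` and the stopping rule `max_intervals` are illustrative assumptions, not the authors' method.

```python
from collections import Counter

def chi2_stat(a: Counter, b: Counter, classes) -> float:
    """Chi-square statistic for the class-count tables of two adjacent
    intervals; a low value means the intervals are statistically similar."""
    n_a, n_b = sum(a.values()), sum(b.values())
    total = n_a + n_b
    stat = 0.0
    for c in classes:
        col = a[c] + b[c]  # total count of class c in both intervals
        for counts, n in ((a, n_a), (b, n_b)):
            expected = n * col / total
            if expected > 0:
                stat += (counts[c] - expected) ** 2 / expected
    return stat

def chimerge(values, labels, max_intervals=6):
    """Start with one interval per distinct value, repeatedly merge the
    adjacent pair with the lowest chi-square, and return the interior
    cut points once max_intervals intervals remain."""
    pairs = sorted(zip(values, labels))
    classes = set(labels)
    intervals = []  # list of (lower_bound, Counter of class counts)
    for v, y in pairs:
        if intervals and intervals[-1][0] == v:
            intervals[-1][1][y] += 1
        else:
            intervals.append((v, Counter({y: 1})))
    while len(intervals) > max_intervals:
        stats = [chi2_stat(intervals[i][1], intervals[i + 1][1], classes)
                 for i in range(len(intervals) - 1)]
        i = stats.index(min(stats))  # most similar adjacent pair
        merged = intervals[i][1] + intervals[i + 1][1]
        intervals[i:i + 2] = [(intervals[i][0], merged)]
    return [lo for lo, _ in intervals[1:]]
```

For example, `chimerge([1, 2, 3, 4, 5, 6, 7, 8], ['a', 'a', 'a', 'b', 'b', 'b', 'a', 'a'], max_intervals=3)` yields the cut points `[4, 7]`, splitting the range where the class distribution changes.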

Info:

Pages: 1469-1472

Online since: August 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved
