A New Data Classification Algorithm for Data-Intensive Computing Environments

Qi Zhi Deng; Long Bo Zhang; Xin Qian; Ya Li Chen; Feng Ying Wang

doi:10.4028/www.scientific.net/AMR.756-759.3318

Abstract:

To improve the scalability of data processing and the data availability that data mining techniques require in data-intensive computing, this paper presents a new tree learning method. By introducing MapReduce, the SPRINT-based tree learning method scales well when handling large datasets. We define the split-point selection process as a series of distributed computations, each implemented with the MapReduce model. A new data structure, the class distribution table, is introduced to assist in computing the histograms. Experiments and analysis of the results show that the algorithm has strong data mining processing capability in data-intensive computing environments.
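The abstract does not give the algorithm's details, so the following is only an illustrative sketch of the general idea it describes: choosing a split point with the Gini index from per-class counts that are built in a map step and merged in a reduce step. It runs in a single process and does not use a real MapReduce framework. The function names (`map_partition`, `reduce_tables`, `best_split`) and the layout of the count table are assumptions made for this example. They are not the authors' class distribution table or their implementation.

```python
from collections import defaultdict
from functools import reduce


def gini(counts):
    """Gini impurity of a class-count mapping: 1 - sum(p_c ** 2)."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def map_partition(records):
    """Map step (hypothetical): count (value, label) pairs in one partition."""
    table = defaultdict(int)
    for value, label in records:
        table[(value, label)] += 1
    return table


def reduce_tables(a, b):
    """Reduce step (hypothetical): merge two partial count tables."""
    merged = defaultdict(int, a)
    for key, count in b.items():
        merged[key] += count
    return merged


def best_split(partitions):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split.

    Returns (None, inf) when fewer than two distinct values are present.
    """
    table = reduce(reduce_tables, map(map_partition, partitions), defaultdict(int))

    # Overall class totals, used to derive the counts above each threshold.
    total = defaultdict(int)
    for (_, label), count in table.items():
        total[label] += count
    n = sum(total.values())

    below = defaultdict(int)  # running class counts for value <= threshold
    best = (None, float("inf"))
    values = sorted({value for value, _ in table})
    for v in values[:-1]:  # the largest value would leave the right side empty
        for label in total:
            below[label] += table.get((v, label), 0)
        above = {label: total[label] - below[label] for label in total}
        n_below = sum(below.values())
        score = (n_below / n) * gini(below) + ((n - n_below) / n) * gini(above)
        if score < best[1]:
            best = (v, score)
    return best
```

For example, calling `best_split([[(1, "a"), (2, "a")], [(3, "b"), (4, "b")]])` treats each inner list as one data partition of `(attribute value, class label)` pairs and returns the threshold whose two sides have the lowest weighted Gini impurity.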

Info:

Periodical:

Advanced Materials Research (Volumes 756-759)

Pages:

3318-3323

DOI:

https://doi.org/10.4028/www.scientific.net/AMR.756-759.3318

Online since:

September 2013

Authors:

Qi Zhi Deng, Long Bo Zhang, Xin Qian, Ya Li Chen, Feng Ying Wang

Keywords:

Data-Intensive, Gini Index, MapReduce, SPRINT
