Equivalence Class Based Parallel Algorithm for Mining MFI

Article Preview

Abstract:

We present a novel and powerful parallel algorithm, PMFI, for mining all the maximal frequent itemsets from a big database. PMFI utilizes novel technologies to make the I/O overhead down drastically. The key principle is to utilize prefix-based equivalence classes to decompose the search space. It distributes the work among the processors by equivalence class weights. It re-represents the database with vertical format, so the frequency counting can be done by simple tid-list intersection operations. It bases a novel serial algorithm MaxMining which utilizes multiple-level backtrack pruning strategy, so that each processor can count the maximal frequent itemsets independently by selectively duplicating the pieces of database. These techniques eliminate the need for synchronization. The dynamic load balance schema is applied in PMFI, it would be hopeful to achieve better performance.

You might also be interested in these eBooks

Info:

Periodical:

Pages:

1712-1715

Citation:

Online since:

January 2015

Authors:

Export:

Price:

Permissions CCC:

Permissions PLS:

Сopyright:

© 2015 Trans Tech Publications Ltd. All Rights Reserved

Share:

Citation:

* - Corresponding Author

[1] Xingdong Wu, Gong-Qing Wu, Wei Ding. Data Mining with Big Data[J]. IEEE Transactions on Knowledge and Data Engineering, Vol. 26, No. 1, January 2014. pp.97-107.

DOI: 10.1109/tkde.2013.109

Google Scholar

[2] A. Rajaraman and J. Ullman, Mining of Massive Data Sets. Cambridge University Press, (2011).

Google Scholar

[3] A. Labrinidis and H. Jagadish, Challenges and Opportunities with Big Data, Proc. VLDB Endowment, vol. 5, no. 12, 2012, p.2032-(2033).

DOI: 10.14778/2367502.2367572

Google Scholar

[4] E. Schadt, The Changing Privacy Landscape in the Era of Big Data, Molecular Systems, Vol. 8, (2012).

Google Scholar

[5] C. Wang, S.S.M. Chow, Q. Wang, K. Ren, and W. Lou, Privacy-preserving Public Auditing for Secure Cloud Storage, IEEE Trans. Computers, vol. 62, no. 2 , Feb. 2013: pp.362-375.

DOI: 10.1109/tc.2011.245

Google Scholar

[6] Zdanowicz, J. Detecting money laundering and terrorist financing via data mining. Communications of the ACM, 2004, pp.53-55.

DOI: 10.1145/986213.986239

Google Scholar

[7] Stephen Lavelle, Fabricating Synthetic Data in Support of Training for Domestic Terrorist Activity Data Mining Research[D]. Naval Postgraduate School, Monterey, CA, (2010).

Google Scholar

[8] Office of the Director of National Intelligence. Data Mining Report. Office of the Director of National Intelligence, (2009).

Google Scholar

[9] L. Zeng, L. Li, L. Duan, K. Lu, Z. Shi, M. Wang, Distributed data mining: A survey. Information Technology and Management, 2012, pp.403-409.

Google Scholar

[10] A. Ghoting, P. Kambadur, E. Pednault, and R. Kannan. NIMBLE: A toolkit for the implementation of parallel data mining and machine learning algorithms on mapreduce. in: Proc. of 2011 ACM SIGKDD, pp.334-342.

DOI: 10.1145/2020408.2020464

Google Scholar

[11] Fumarola, Fabio, A parallel algorithm for approximate frequent itemset mining using MapReduce, in: proceedings of 2014 International Conference on High Performance Computing & Simulation (H PCS), Bologna, Italy 21-25 , July 2014 , pp.335-342.

DOI: 10.1109/hpcsim.2014.6903705

Google Scholar

[12] Farzanyar Z. , Cercone N. , Accelerating Frequent Itemsets Mining on the Cloud: A MapReduce -Based Approach, in: proceedings of 2013 IEEE 13th International Conference on Data Mining Workshops (ICDMW), pp.592-598.

DOI: 10.1109/icdmw.2013.106

Google Scholar