Research on Parallel Association Rules Mining Algorithm Based on Hadoop

Abstract:

The purpose of association rules mining is to find rules that meet the minimum support and minimum confidence thresholds in a large quantity of data. To find valid association rules efficiently, we comprehensively analyze several well-known parallel association rules mining algorithms and propose a new parallel association rules mining algorithm, ABH (Array Based on Hadoop), built on the cloud computing platform. ABH scans the database only once and uses a 0/1 array to represent each transaction and to record the frequency of identical transactions. Moreover, by exploiting the random-access property of arrays and the special nature of frequent itemsets, ABH effectively reduces the number of candidate frequent itemsets and finds the frequent itemsets quickly. We compared ABH experimentally with two classical algorithms, CD and DD, and found that ABH outperforms both.
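The abstract does not give the details of ABH, so the following is only a minimal single-process sketch of the ideas it names, not the authors' algorithm and not a Hadoop job. It assumes that the "special nature of the frequent itemset" refers to the downward-closure property (every subset of a frequent itemset is frequent). The function names `encode`, `support`, and `frequent_itemsets` are hypothetical, and `min_support` is taken as an absolute transaction count.

```python
from collections import Counter
from itertools import combinations


def encode(transactions, items):
    """Scan the data once: map each transaction to a 0/1 tuple over `items`
    and count how often each identical tuple occurs."""
    index = {item: i for i, item in enumerate(items)}
    rows = Counter()
    for transaction in transactions:
        bits = [0] * len(items)
        for item in transaction:
            bits[index[item]] = 1
        rows[tuple(bits)] += 1
    return index, rows


def support(itemset, index, rows):
    """Support count of `itemset`, read from the compressed rows by
    direct (random-access) indexing into each 0/1 tuple."""
    positions = [index[item] for item in itemset]
    return sum(n for bits, n in rows.items() if all(bits[p] for p in positions))


def frequent_itemsets(transactions, items, min_support):
    """Level-wise search over the encoded rows (assumed pruning rule:
    a candidate is kept only if all of its (k-1)-subsets are frequent)."""
    index, rows = encode(transactions, items)
    frequent = {}
    level = [frozenset([item]) for item in items]
    k = 1
    while level:
        survivors = []
        for candidate in level:
            count = support(candidate, index, rows)
            if count >= min_support:
                frequent[candidate] = count
                survivors.append(candidate)
        k += 1
        known = set(survivors)
        # Join frequent (k-1)-itemsets into k-itemset candidates.
        joined = {a | b for a in survivors for b in survivors if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        level = [
            c for c in joined
            if all(frozenset(s) in known for s in combinations(c, k - 1))
        ]
    return frequent
```

For example, calling `frequent_itemsets([{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b"}], ["a", "b", "c"], 2)` stores the two identical transactions as a single 0/1 row with a count of 2, so later support counts iterate over three distinct rows rather than four transactions.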

Pages: 3625-3631

Online since: March 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
