A Hybrid Algorithm of Mining Closed Itemsets for Large Databases

Abstract:

Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data in databases. Mining closed large itemsets extends association rule mining: it aims to find the subset of large itemsets that is sufficient to represent all large itemsets. In this paper, we design a hybrid approach that takes the characteristics of the data into account to mine closed large itemsets efficiently. Two features of market basket data are considered: the number of items is large, and the number of items associated with each item is small. By combining the cut-point method with the hash concept, the new algorithm finds the closed large itemsets efficiently. Simulation results show that the new algorithm outperforms the FP-CLOSE algorithm in both execution time and storage space.
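
To make the notion of a closed large itemset concrete, the following is a minimal brute-force sketch, not the hybrid cut-point/hash algorithm of the paper. It assumes the usual definition: a large (frequent) itemset is closed when no proper superset has the same support count. The function names, the `min_support` parameter, and the toy transactions are illustrative assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for all itemsets meeting min_support.

    Brute-force enumeration for illustration only; it is exponential in the
    number of items and is NOT the algorithm proposed in the paper.
    """
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            count = sum(1 for t in transactions if cset <= t)
            if count >= min_support:
                result[cset] = count
                found_any = True
        if not found_any:
            # No frequent itemset of this size, so none of a larger size.
            break
    return result


def closed_itemsets(frequent):
    """Keep only itemsets that have no proper superset with equal support."""
    return {
        s: c
        for s, c in frequent.items()
        if not any(s < other and c == oc for other, oc in frequent.items())
    }


# Hypothetical market-basket data (assumption, not from the paper).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "milk", "eggs"}),
    frozenset({"bread", "eggs"}),
    frozenset({"milk"}),
]

frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)

for itemset, count in sorted(closed.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

In this toy data, `{eggs}` is large but not closed, because `{bread, eggs}` occurs in exactly the same transactions. The closed set therefore stores fewer itemsets while still allowing the support of every large itemset to be recovered.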

Pages:

292-296

Online since:

December 2011

Copyright:

© 2012 Trans Tech Publications Ltd. All Rights Reserved
