Maximal Frequent Item Sequences Mining of Datasets with few Attributes and Large Instances
This work proposes a new fast algorithm named MFISS-FG (maximal frequent item sequence sets fast generating) finding maximal frequent item sequences from relational database. It adapts to datasets with few attributes and large instances. Itemset is defined as IS (item sequence) for mining. Two lists called ISL (Item Sequence List) and FISL (Frequent Item Sequence List) are created by scanning database once for dividing n-IS into two categories depending on whether the IS to achieve minimum support number (n is the number of attributes). SIS (Sub item sequences) whose n-superset is in ISL are generated by recursion to make sure that each k-SIS appeared before its (k+1)-superset (k range from 1 to n-1). As current k-SIS being joined to FISL, its (k-1)-SIS are pruned (k range from 2 to n-1). At last, all SISs whose n-superset is in FISL are pruned from FISL to hold all maximal frequent item sequences. We compare our MFISS-FG and FP-Growth by a set of time-consuming experiments to prove the superiority of MFISS-FG both not only with increasing datasets but also with changing mini-support.
L. J. Zhou et al., "Maximal Frequent Item Sequences Mining of Datasets with few Attributes and Large Instances", Applied Mechanics and Materials, Vols. 44-47, pp. 3304-3308, 2011