Rare Association Rule Mining


Petr Glos
Knowledge Discovery Lab
Faculty of Informatics
Masaryk University
glos@ics.muni.cz
Brno, October 18th 2011

Agenda
•References
•Introduction
•MSApriori
•RSAA
•Clustering for pre-procesing
•Temporal sequence associations
•Co-Location Patterns with Rare Events
•
•
•Questions
•
–

References
•Chandola V., Banerjee A. and Kumar V. Anomaly Detection: A Survey. ACM Computing Surveys, Vol. 41,
No. 3, July 2009
•
•Hyunyoon Y., Danshim H.,Buhyun H.,Keun H. R. Mining association rules on significant rare data
using relative support, The Journal of Systems and Software 67,  Elsevier, 2003
•
•Koh Y. S., Pears R.  Rare Association Rule Mining via Transaction Clustering, In Proc. Seventh
Australasian Data Mining Conference (AusDM 2008), Glenelg, South Australia. CRPIT
•
•Chen J., He H., Williams G., Jin H. Temporal Sequence Associations for Rare Events, Advances in
Knowledge Discovery and Data Mining, Lecture Notes in Computer Science, 2004, Volume 3056/2004,
235-239
•
•Vilalta R., Ma S. Predicting Rare Events in Temporal Domains, Proccedings of the 2002 IEEE
International Conference on Data Mining
•
•
•
•
•
–

Introduction
•Association  Rule Mining
–Rule Ant(ecendent) => Con(sequent)  X  => Y
–High Support                                    a/(a+b+c+d)
–High Confidence                               a/(a+b)
–Supp > minSupp => frequent itemsets
–Conf > minConf => rules
–Apriori algorithm – k-1 itemsets => k itemsets
–
–
–
•Rare Association  Rule Minining
–Low Support
–High Confidence
–Supp < minSupp => rare itemsets
–Conf > minConf =>  “rare” rules
–Apriori algorithm extension or modification
–Seeking frequent patterns with occurrences before rare events
•
–
•
–
Suc
-Suc
Con
a
b
-Con
c
d

Multiple Support Apriori Algorithm
MSApriori
•Support depends on frequency of data items
•Minimum item support MIS for data item i
• MIS(i) = MI(i) if MI(i) > LS
•   = LS otherwise
•MI(i) =  β * f(i)
•0 <= β <= 1
•f(i)   data frequency
•LS    least support
•
•

Relative Support Apriori Algorithm
RSAA
•Significant rare data is one which its frequency in the database does not satisfy the minimum
support but appears associated with the specific data in high proportion of its frequency.
•
•1st support – used in process of frequent items discovery
•2nd support / used in process of rare items discovery
•1st support > 2nd support
•
•Relative support
• Rsup(i1,…ik) =  max{ sup(i1,…ik)/sup(i1) ,…, sup(i1,…ik)/sup(ik) }
•
•Group of itemsets satisfied 1st support
•Group of itemsets not satisfied 1st but satisfied 2nd support
•Iteration process to generate  “rare itemset” candidates
•
•
–

Rare Association Rule Mining via Transaction Clustering
•Pre-process by clustering transactions before performing association rule mining
–Common set of large items – min support treshold
–Seed Generation Phase  - based on relative support
–Allocation Phase – based on Jaccard similarity
•
•Apriori-Inverse on clusters generated
•
•minsup < sup(i) < maxsup
•
•
–

Temporal Sequence Associations for Rare Events
Predicting Rare Events in Temporal Domains
•Collection of entities εi  є E  (i=1,...,n)
•Event sequence – si = { (ei1, ti1) ,…, (eij, tij),…, (eini, tini) },
• (eij event type, ti1 timestamp)
•Target events T – events of given type from E
•Time window [ts,te], constant length
•Windowed segment  { (eip, tip) ,…, (eiq, tiq) },
•ts <=tip <= tip+1 <= … <= tiq <=te
•Target segment – window segment with first occurrence time of target event
•Supp(p) in T
•Risk ratio
•Interesting patterns for target events
•Seeking frequent patterns for occurences of rare events
•
–
–
•
–
•
–
•
–

Mining Co-Location Patterns with Rare Events from Spatial Datasets
•Co-Location Pattern C – group of spatial feature/events that are frequently co-located in the same
region.
•Spatial feature f is rare if its instances are substantially less than those of other features in
a co-location.
•
•Participation ratio - Wherever the feature f is observed, with probability pr(C,f), all other
features in C are also observed in neighbor-set.
•Participation index  - Wherever any feature from C is observed, with probability of at least
PI(C), all other features in C can be observed in neighbor-set.
•
•Seeking of Co-Location Patterns
•Modification of Apriori algorithm
•maxPrune algorithm
–
•
–
–
•
–
•
–
•
–

–
–Questions ?
–
–
–Thank you for your attention.
–
–
–
–
–