Outline Motivation VFDR Multi-class Experiments Summary Very Fast Decision Rules for Multi-class Problems P. Kosina1 J. Gama2 1LIAAD-INESC Porto, FI MU Brno 2LIAAD-INESC Porto, FEP-University of Porto 27th Symposium On Applied Computing Outline Motivation VFDR Multi-class Experiments Summary Contents Motivation VFDR Multi-class Experiments Summary Outline Motivation VFDR Multi-class Experiments Summary Data Streams • Highly detailed, automatic, rapid data feeds. • Radar: meteorological observations. • Satellite: geodetics, radiation. • Astronomical surveys: optical, radio,. • Internet: traffic logs, user queries, email, financial, • Sensor networks: many more observation points ... • Most of these data will never be seen by a human! • Need for near-real time analysis of data feeds. Outline Motivation VFDR Multi-class Experiments Summary Data Stream Computational Model Data stream processing algorithms require: • small constant time per record; • restricted use of main memory; • be able to build a model using at most one scan of the data; • be able to detect and react to concept drift; • make a usable model available at anytime; • produce a model with similar performance to the one that would be obtained by the corresponding memory based algorithm, operating without the above constraints. Outline Motivation VFDR Multi-class Experiments Summary Decision Trees and Rule Sets • Decision trees and Rule Sets are almost equivalent: • High degree of interpretability Outline Motivation VFDR Multi-class Experiments Summary Rule Sets • Decision trees • A decision tree covers all the instance space • Each node has a context defined by previous nodes in the path • Large decision trees are difficult to understand because of a specific context established by the antecedent nodes • Rules • Each rule covers a specific region of the instance space • The Union of all rules can be smaller than the Universe • Rules can be interpretable per si: • Remove conditions in a rule without removing in another rule. • Loss the distinction between tests near the root and near the leaves. • Advantage of rule sets: modularity and consequently interpretability Outline Motivation VFDR Multi-class Experiments Summary Rule Learning Systems • Learning as search • Two approaches: • From the most general to the most specific: Top-down Model driven • From the most specific to the most general: Bottom-up Data driven • In this work we focus on top-down learning decision rules in multi-class classification problems. Outline Motivation VFDR Multi-class Experiments Summary Multi-class Rule Learning Systems Different strategies to handle multi-class problems: • direct multi-class find the best literal that discriminates between all the classes • one vs. all find the best literal that discriminates one (positive) class from all the other classes • all vs. all transform c-class problem into c×(c−1) 2 two-class problems learn a classifier for each pair of classes. Outline Motivation VFDR Multi-class Experiments Summary Rule Learning Systems direct multi-class one vs. all all vs. all Outline Motivation VFDR Multi-class Experiments Summary VFDR Learning • VFDR is one-pass, any-time, incremental algorithm for data stream classification • The classifier incrementally learns from labeled examples - creates new rules, - expands existing ones • A rule covers an example when all the literals are true for the example • A rule set is a set of rules plus a default rule • Default rule’s statistics are updated if none of the rules in the set covers the example Outline Motivation VFDR Multi-class Experiments Summary VFDR Rule Sets • VFDR starts from a default rule {} → L where L is a structure containing the sufficient statistics to expand rule and the information needed to classify examples. • A rule has the form of {set of literals} → Lr A rule covers an example when all the literals are true for the example • A rule set is a set of rules plus a default rule • Only the labeled examples covered by a rule update its Lr • Default rule’s statistics are updated if none of the rules in the set covers the example Outline Motivation VFDR Multi-class Experiments Summary VFDR Multi-class • Rule Expansion • Rule expansion considers new literals in a one vs. all fashion • Uses FOIL gain computed on its Lr • The expansion of a rule is controlled by Hoeffding bound • Prediction • Simple prediction strategy: uses the class distribution in Lr and selects the majority class • Bayes prediction strategy: select the class that maximizes posteriori probability given by the Bayes rule using the statistics in Lr . Outline Motivation VFDR Multi-class Experiments Summary VFDR Multi-class - FOIL’ Gain • Change in gain between rule r and a candidate rule r Gain(r , r) = s × log2 N+ N − log2 N+ N N: number of examples covered by r N+: number of positive examples covered by r N : number of examples covered by r N+ number of positive examples covered by r s % of true positives in r that are still true positives in r Measures the effect of adding another literal in the rule. Outline Motivation VFDR Multi-class Experiments Summary VFDR Multi-class - FOIL’ Gain Normalized gain: GainNorm(r , r) = Gain(r , r) N+ × − log2 N+ N Outline Motivation VFDR Multi-class Experiments Summary VFDR Multi-class - Two Approaches • VFDR-MC learns two types of rule sets: • Unordered • Rules are independent • All rules that cover an example update their statistics • Prediction using a weighted sum classification strategy • Ordered • First rule that covers an example updates its statistics • Prediction uses a first hit classification strategy Outline Motivation VFDR Multi-class Experiments Summary VFDR MC: Illustrative Example STAGGER SET: Size = {small,medium,large}, Color = {red,green,blue}, Shape = {circle, square, triangle} If Size = small and Color = red then true else false {} −→ L Outline Motivation VFDR Multi-class Experiments Summary VFDR MC: Illustrative Example STAGGER SET: Size = {small,medium,large}, Color = {red,green,blue}, Shape = {circle, square, triangle} If Size = small and Color = red then true else false 1 : Size = small −→ true 2 : Size = medium −→ false Outline Motivation VFDR Multi-class Experiments Summary VFDR MC: Illustrative Example STAGGER SET: Size = {small,medium,large}, Color = {red,green,blue}, Shape = {circle, square, triangle} If Size = small and Color = red then true else false 1 : Size = small −→ true expands for each class: and(Size = small; Color = red; ) −→ true and(Size = small; Color = green; ) −→ false 2 : Size = medium −→ false Outline Motivation VFDR Multi-class Experiments Summary VFDR MC: Illustrative Example STAGGER SET: Size = {small,medium,large}, Color = {red,green,blue}, Shape = {circle, square, triangle} If Size = small and Color = red then true else false 1 : and(Size = small; Color = red; ) −→ true 2 : Size = medium −→ false 3 : and(Size = small; Color = green; ) −→ false Outline Motivation VFDR Multi-class Experiments Summary Datasets Table: Datasets Set type attributes noise (%) training set size test SEA artificial numerical 10 100,000 100,000 LED artificial nominal 10 1,000,000 100,000 RT artificial nominal 0 1,000,000 100,000 Hyperplane artificial numerical 5 1,000,000 100,000 Covertype real mix ? 464,810 116,202 KDDCup99 real mix ? 4,898,431 311,029 Outline Motivation VFDR Multi-class Experiments Summary Results - prequential error Outline Motivation VFDR Multi-class Experiments Summary Results - Train and Test Outline Motivation VFDR Multi-class Experiments Summary Results - Train and Test Outline Motivation VFDR Multi-class Experiments Summary Results • VFDR-MC algorithms correctly learn simpler nominal tasks • VFDR-MC has improved accuracy for numerical and mixed sets • For multi-class problems due to one vs. all approach • Also for two-class problems due to the fact it can learn the concept faster by inducing more rules • Using Naive Bayes within the rules facilitates faster learning curve and better any-time prediction capabilities • Disadvantage is that VFDR-MCu may produce large rule sets Outline Motivation VFDR Multi-class Experiments Summary Summary • We introduced ordered and unordered VFDR-MC rule classifier for data streams • Uses FOIL gain that distinguishes one class from all the others • Hoeffding Bound guarantees with given confidence that new literal is the best one • Achieves promising results on data with stationary distribution • Has high potential for extensions to handle non-stationary data