Outline Motivation VFDR Multi-class Experiments Summary
Very Fast Decision Rules for Multi-class Problems
P. Kosina1 J. Gama2
1LIAAD-INESC Porto, FI MU Brno
2LIAAD-INESC Porto, FEP-University of Porto
27th Symposium On Applied Computing
Outline Motivation VFDR Multi-class Experiments Summary
Contents
Motivation
VFDR Multi-class
Experiments
Summary
Outline Motivation VFDR Multi-class Experiments Summary
Data Streams
• Highly detailed, automatic, rapid data feeds.
• Radar: meteorological observations.
• Satellite: geodetics, radiation.
• Astronomical surveys: optical, radio,.
• Internet: traﬃc logs, user queries, email, ﬁnancial,
• Sensor networks: many more observation points ...
• Most of these data will never be seen by a human!
• Need for near-real time analysis of data feeds.
Outline Motivation VFDR Multi-class Experiments Summary
Data Stream Computational Model
Data stream processing algorithms require:
• small constant time per record;
• restricted use of main memory;
• be able to build a model using at most one scan of the data;
• be able to detect and react to concept drift;
• make a usable model available at anytime;
• produce a model with similar performance to the one that
would be obtained by the corresponding memory based
algorithm, operating without the above constraints.
Outline Motivation VFDR Multi-class Experiments Summary
Decision Trees and Rule Sets
• Decision trees and Rule Sets are almost equivalent:
• High degree of interpretability
Outline Motivation VFDR Multi-class Experiments Summary
Rule Sets
• Decision trees
• A decision tree covers all the instance space
• Each node has a context deﬁned by previous nodes in the path
• Large decision trees are diﬃcult to understand because of a
speciﬁc context established by the antecedent nodes
• Rules
• Each rule covers a speciﬁc region of the instance space
• The Union of all rules can be smaller than the Universe
• Rules can be interpretable per si:
• Remove conditions in a rule without removing in another rule.
• Loss the distinction between tests near the root and near the
leaves.
• Advantage of rule sets: modularity and consequently
interpretability
Outline Motivation VFDR Multi-class Experiments Summary
Rule Learning Systems
• Learning as search
• Two approaches:
• From the most general to the most speciﬁc:
Top-down
Model driven
• From the most speciﬁc to the most general:
Bottom-up
Data driven
• In this work we focus on top-down learning decision rules in
multi-class classiﬁcation problems.
Outline Motivation VFDR Multi-class Experiments Summary
Multi-class Rule Learning Systems
Diﬀerent strategies to handle multi-class problems:
• direct multi-class
ﬁnd the best literal that discriminates between all the classes
• one vs. all
ﬁnd the best literal that discriminates one (positive) class
from all the other classes
• all vs. all
transform c-class problem into c×(c−1)
2 two-class problems
learn a classiﬁer for each pair of classes.
Outline Motivation VFDR Multi-class Experiments Summary
Rule Learning Systems
direct multi-class one vs. all all vs. all
Outline Motivation VFDR Multi-class Experiments Summary
VFDR Learning
• VFDR is one-pass, any-time, incremental algorithm for data
stream classiﬁcation
• The classiﬁer incrementally learns from labeled examples
- creates new rules,
- expands existing ones
• A rule covers an example when all the literals are true for the
example
• A rule set is a set of rules plus a default rule
• Default rule’s statistics are updated if none of the rules in the
set covers the example
Outline Motivation VFDR Multi-class Experiments Summary
VFDR Rule Sets
• VFDR starts from a default rule
{} → L where L is a structure containing
the suﬃcient statistics to expand rule and the information
needed to classify examples.
• A rule has the form of {set of literals} → Lr
A rule covers an example when all the literals are true for the
example
• A rule set is a set of rules plus a default rule
• Only the labeled examples covered by a rule update its Lr
• Default rule’s statistics are updated if none of the rules in the
set covers the example
Outline Motivation VFDR Multi-class Experiments Summary
VFDR Multi-class
• Rule Expansion
• Rule expansion considers new literals in a one vs. all fashion
• Uses FOIL gain computed on its Lr
• The expansion of a rule is controlled by Hoeﬀding bound
• Prediction
• Simple prediction strategy:
uses the class distribution in Lr and selects the majority class
• Bayes prediction strategy:
select the class that maximizes posteriori probability given by
the Bayes rule using the statistics in Lr .
Outline Motivation VFDR Multi-class Experiments Summary
VFDR Multi-class - FOIL’ Gain
• Change in gain between rule r and a candidate rule r
Gain(r , r) = s × log2
N+
N
− log2
N+
N
N: number of examples covered by r
N+: number of positive examples covered by r
N : number of examples covered by r
N+ number of positive examples covered by r
s % of true positives in r that are still true positives in r
Measures the eﬀect of adding another literal in the rule.
Outline Motivation VFDR Multi-class Experiments Summary
VFDR Multi-class - FOIL’ Gain
Normalized gain:
GainNorm(r , r) =
Gain(r , r)
N+ × − log2
N+
N
Outline Motivation VFDR Multi-class Experiments Summary
VFDR Multi-class - Two Approaches
• VFDR-MC learns two types of rule sets:
• Unordered
• Rules are independent
• All rules that cover an example update their statistics
• Prediction using a weighted sum classiﬁcation strategy
• Ordered
• First rule that covers an example updates its statistics
• Prediction uses a ﬁrst hit classiﬁcation strategy
Outline Motivation VFDR Multi-class Experiments Summary
VFDR MC: Illustrative Example
STAGGER SET:
Size = {small,medium,large}, Color = {red,green,blue}, Shape = {circle, square,
triangle}
If Size = small and Color = red then true else false
{} −→ L
Outline Motivation VFDR Multi-class Experiments Summary
VFDR MC: Illustrative Example
STAGGER SET:
Size = {small,medium,large}, Color = {red,green,blue}, Shape = {circle, square,
triangle}
If Size = small and Color = red then true else false
1 : Size = small −→ true
2 : Size = medium −→ false
Outline Motivation VFDR Multi-class Experiments Summary
VFDR MC: Illustrative Example
STAGGER SET:
Size = {small,medium,large}, Color = {red,green,blue}, Shape = {circle, square,
triangle}
If Size = small and Color = red then true else false
1 : Size = small −→ true expands for each class:
and(Size = small; Color = red; ) −→ true
and(Size = small; Color = green; ) −→ false
2 : Size = medium −→ false
Outline Motivation VFDR Multi-class Experiments Summary
VFDR MC: Illustrative Example
STAGGER SET:
Size = {small,medium,large}, Color = {red,green,blue}, Shape = {circle, square,
triangle}
If Size = small and Color = red then true else false
1 : and(Size = small; Color = red; ) −→ true
2 : Size = medium −→ false
3 : and(Size = small; Color = green; ) −→ false
Outline Motivation VFDR Multi-class Experiments Summary
Datasets
Table: Datasets
Set type attributes noise (%) training set size test
SEA artiﬁcial numerical 10 100,000 100,000
LED artiﬁcial nominal 10 1,000,000 100,000
RT artiﬁcial nominal 0 1,000,000 100,000
Hyperplane artiﬁcial numerical 5 1,000,000 100,000
Covertype real mix ? 464,810 116,202
KDDCup99 real mix ? 4,898,431 311,029
Outline Motivation VFDR Multi-class Experiments Summary
Results - prequential error
Outline Motivation VFDR Multi-class Experiments Summary
Results - Train and Test
Outline Motivation VFDR Multi-class Experiments Summary
Results - Train and Test
Outline Motivation VFDR Multi-class Experiments Summary
Results
• VFDR-MC algorithms correctly learn simpler nominal tasks
• VFDR-MC has improved accuracy for numerical and mixed sets
• For multi-class problems due to one vs. all approach
• Also for two-class problems due to the fact it can learn the
concept faster by inducing more rules
• Using Naive Bayes within the rules facilitates faster learning
curve and better any-time prediction capabilities
• Disadvantage is that VFDR-MCu may produce large rule sets
Outline Motivation VFDR Multi-class Experiments Summary
Summary
• We introduced ordered and unordered VFDR-MC rule classiﬁer
for data streams
• Uses FOIL gain that distinguishes one class from all the others
• Hoeﬀding Bound guarantees with given conﬁdence that new
literal is the best one
• Achieves promising results on data with stationary distribution
• Has high potential for extensions to handle non-stationary
data