Ensemble Methods - University Of Waikato

Transcription

Ensemble Methods
Albert Bifet
May 2012

COMP423A/COMP523A Data Stream Mining

Outline
1. Introduction
2. Stream Algorithmics
3. Concept drift
4. Evaluation
5. Classification
6. Ensemble Methods
7. Regression
8. Clustering
9. Frequent Pattern Mining
10. Distributed Streaming

Data Streams: Big Data & Real Time

Ensemble Learning: The Wisdom of Crowds
Diversity of opinion, independence, decentralization, aggregation.

Bagging
Example: dataset of 4 instances: A, B, C, D
Classifier 1: B, A, C, B
Classifier 2: D, B, A, D
Classifier 3: B, A, C, B
Classifier 4: B, C, B, B
Classifier 5: D, C, A, C
Bagging builds a set of M base models, with a bootstrap sample created by drawing random samples with replacement.

Bagging
Example: dataset of 4 instances: A, B, C, D
Classifier 1: A, B, B, C
Classifier 2: A, B, D, D
Classifier 3: A, B, B, C
Classifier 4: B, B, B, C
Classifier 5: A, C, C, D
Bagging builds a set of M base models, with a bootstrap sample created by drawing random samples with replacement.

Bagging
Example: dataset of 4 instances: A, B, C, D
Classifier 1: A, B, B, C: A(1) B(2) C(1) D(0)
Classifier 2: A, B, D, D: A(1) B(1) C(0) D(2)
Classifier 3: A, B, B, C: A(1) B(2) C(1) D(0)
Classifier 4: B, B, B, C: A(0) B(3) C(1) D(0)
Classifier 5: A, C, C, D: A(1) B(0) C(2) D(1)
Each base model's training set contains each of the original training examples K times, where P(K = k) follows a binomial distribution.
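A minimal sketch of this bootstrap step in plain Python (the dataset and printing format mirror the toy example above; nothing here is from MOA):

    import random
    from collections import Counter

    dataset = ["A", "B", "C", "D"]

    # One bootstrap sample per base model: len(dataset) draws with replacement.
    for m in range(5):
        sample = sorted(random.choice(dataset) for _ in range(len(dataset)))
        counts = Counter(sample)
        print(f"Classifier {m + 1}: {', '.join(sample)}: "
              + " ".join(f"{x}({counts[x]})" for x in dataset))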

Bagging
[Figure: Poisson(1) distribution.]
Each base model's training set contains each of the original training examples K times, where P(K = k) follows a binomial distribution. For large training sets, this binomial distribution tends to a Poisson(1) distribution, which is what online bagging samples from.

Oza and Russell's Online Bagging for M models
1: Initialize base models h_m for all m ∈ {1, 2, ..., M}
2: for all training examples do
3:   for m = 1, 2, ..., M do
4:     Set w = Poisson(1)
5:     Update h_m with the current example with weight w
6: anytime output:
7: return hypothesis: h_fin(x) = arg max_{y ∈ Y} Σ_{t=1}^{T} I(h_t(x) = y)
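A sketch of this algorithm in Python follows. It is illustrative only: the Poisson sampler uses Knuth's method, and the base learner's learn_one/predict_one interface is an assumption, not MOA's actual API.

    import math
    import random
    from collections import Counter

    def poisson(lam):
        # Knuth's method; adequate for the small lambdas used here.
        L = math.exp(-lam)
        k, p = 0, 1.0
        while True:
            p *= random.random()
            if p <= L:
                return k
            k += 1

    class OnlineBagging:
        def __init__(self, base_learner_factory, M=10):
            self.models = [base_learner_factory() for _ in range(M)]

        def learn_one(self, x, y):
            for h in self.models:
                w = poisson(1.0)        # line 4: w ~ Poisson(1)
                for _ in range(w):      # line 5: update with weight w
                    h.learn_one(x, y)

        def predict_one(self, x):
            # Anytime output: majority vote over the M base models.
            votes = Counter(h.predict_one(x) for h in self.models)
            return votes.most_common(1)[0][0]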

Hoeffding Option Tree
A regular Hoeffding tree containing additional option nodes that allow several tests to be applied, leading to multiple Hoeffding trees as separate paths.

Random Forests (Breiman, 2001)
Adding randomization to decision trees:
- the input training set is obtained by sampling with replacement, like Bagging
- the nodes of the tree may only use a fixed number of random attributes to split (see the sketch below)
- the trees are grown without pruning
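A one-function sketch of the per-node attribute restriction, assuming attributes are indexed 0..m-1 (the function name is illustrative):

    import random

    def split_candidates(n_attributes, k):
        # A tree node may only consider a fixed-size random subset
        # of the attributes when choosing its split.
        return random.sample(range(n_attributes), k)

    print(split_candidates(10, 3))  # e.g. [7, 2, 5]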

Accuracy Weighted Ensemble
Mining concept-drifting data streams using ensemble classifiers. Wang et al. 2003
- Process chunks of instances of size W
- Build a new classifier for each chunk
- Remove old classifiers
- Weight each classifier using its error (sketched below):

      w_i = MSE_r - MSE_i

  where

      MSE_r = Σ_c p(c) (1 - p(c))²

  and

      MSE_i = (1 / |S_n|) Σ_{(x,c) ∈ S_n} (1 - f_c^i(x))²
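A sketch of the weighting step, assuming a classifier exposes a predict_proba_one(x) -> {class: probability} method (an assumed interface; f_c^i(x) is the probability classifier i assigns to the true class c):

    def mse_r(class_priors):
        # Error of a random classifier that predicts according to the
        # class distribution p(c) of the current chunk.
        return sum(p * (1.0 - p) ** 2 for p in class_priors.values())

    def mse_i(classifier, chunk):
        # chunk is a list of (x, c) pairs drawn from S_n.
        return sum(
            (1.0 - classifier.predict_proba_one(x).get(c, 0.0)) ** 2
            for x, c in chunk
        ) / len(chunk)

    def awe_weight(classifier, chunk, class_priors):
        # Classifiers better than random get positive weight.
        return mse_r(class_priors) - mse_i(classifier, chunk)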

ADWIN Bagging
ADWIN: an adaptive sliding window whose size is recomputed online according to the rate of change observed.
ADWIN has rigorous guarantees (theorems):
- on the ratio of false positives and false negatives
- on the relation between the size of the current window and the change rate
ADWIN Bagging: when a change is detected, the worst classifier is removed and a new classifier is added.

ADWIN Bagging for M models
1: Initialize base models h_m for all m ∈ {1, 2, ..., M}
2: for all training examples do
3:   for m = 1, 2, ..., M do
4:     Set w = Poisson(1)
5:     Update h_m with the current example with weight w
6:   if ADWIN detects change in error of one of the classifiers then
7:     Replace the classifier with the highest error with a new one
8: anytime output:
9: return hypothesis: h_fin(x) = arg max_{y ∈ Y} Σ_{t=1}^{T} I(h_t(x) = y)
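Continuing the OnlineBagging sketch above, a hedged version of ADWIN Bagging. The detector objects stand in for real ADWIN instances; their update(error) -> bool and estimate() methods are assumptions, and ADWIN itself is not implemented here.

    class AdwinBagging(OnlineBagging):
        def __init__(self, base_learner_factory, detector_factory, M=10):
            super().__init__(base_learner_factory, M)
            self.new_model = base_learner_factory
            self.new_detector = detector_factory
            self.detectors = [detector_factory() for _ in range(M)]

        def learn_one(self, x, y):
            change = False
            for m, h in enumerate(self.models):
                err = 0.0 if h.predict_one(x) == y else 1.0
                change = self.detectors[m].update(err) or change
                for _ in range(poisson(1.0)):
                    h.learn_one(x, y)
            if change:
                # Replace the classifier with the highest estimated error.
                worst = max(range(len(self.models)),
                            key=lambda m: self.detectors[m].estimate())
                self.models[worst] = self.new_model()
                self.detectors[worst] = self.new_detector()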

Leveraging Bagging for Evolving Data Streams
Randomization is a powerful tool to increase accuracy and diversity.
There are three ways of using randomization:
- manipulating the input data
- manipulating the classifier algorithms
- manipulating the output targets

Input Randomization
[Figure: Poisson distribution P(X = k) for λ = 1, 6, and 10, plotted for k = 0..20.]

ECOC Output Randomization
Table: Example matrix of random output codes for 3 classes and 6 classifiers

                  Class 1  Class 2  Class 3
    Classifier 1     0        0        1
    Classifier 2     0        1        1
    Classifier 3     1        0        0
    Classifier 4     1        1        0
    Classifier 5     1        0        1
    Classifier 6     0        1        0
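A sketch of how such a code matrix is used: each classifier m is trained on the binary relabeling given by its row, and a prediction is decoded by nearest Hamming distance to a class column (classes are 0-indexed here):

    # Rows: classifiers 1..6; columns: classes 1..3 (the table above).
    CODE = [
        [0, 0, 1],
        [0, 1, 1],
        [1, 0, 0],
        [1, 1, 0],
        [1, 0, 1],
        [0, 1, 0],
    ]

    def binary_label(m, c):
        # The binary label classifier m sees for an instance of class c.
        return CODE[m][c]

    def decode(bits):
        # Pick the class whose column of the code matrix is closest
        # in Hamming distance to the classifiers' binary outputs.
        n_classes = len(CODE[0])
        return min(range(n_classes),
                   key=lambda c: sum(b != CODE[m][c]
                                     for m, b in enumerate(bits)))

    print(decode([0, 1, 1, 1, 0, 1]))  # -> 1, i.e. Class 2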

Leveraging Bagging for Evolving Data Streams
Leveraging Bagging
- using Poisson(λ)
Leveraging Bagging MC
- using Poisson(λ) and Random Output Codes
Fast Leveraging Bagging ME
- if an instance is misclassified: weight = 1
- if not: weight = e_T / (1 - e_T)
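The instance-weighting rules as a sketch, reusing the poisson helper from the Online Bagging sketch; the reading of e_T as the current error estimate is an assumption:

    def leveraging_weight(lam=6.0):
        # Leveraging Bagging: more aggressive resampling than Poisson(1).
        return poisson(lam)

    def me_weight(misclassified, e_T):
        # Fast Leveraging Bagging ME: full weight on mistakes, reduced
        # weight e_T / (1 - e_T) otherwise (e_T: current error estimate).
        return 1.0 if misclassified else e_T / (1.0 - e_T)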

Empirical evaluation

                             Hours
    Hoeffding Tree            0.01
    Online Bagging            2.98
    ADWIN Bagging             1.48
    Leveraging Bagging       20.17
    Leveraging Bagging MC    22.04
    Leveraging Bagging ME     0.87

- Leveraging Bagging: using Poisson(λ)
- Leveraging Bagging MC: using Poisson(λ) and Random Output Codes
- Leveraging Bagging ME: using weight 1 if misclassified, otherwise e_T / (1 - e_T)

Boosting
The Strength of Weak Learnability, Schapire '90
A boosting algorithm transforms a weak learner into a strong one.

Boosting
A formal description of boosting (Schapire):
- given a training set (x_1, y_1), ..., (x_m, y_m)
- y_i ∈ {-1, +1} is the correct label of instance x_i ∈ X
- for t = 1, ..., T:
  - construct distribution D_t
  - find weak classifier h_t : X → {-1, +1} with small error ε_t = Pr_{D_t}[h_t(x_i) ≠ y_i] on D_t
- output final classifier
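The slide leaves D_t and the final combination abstract; AdaBoost is the canonical instantiation, sketched below. Here weak_learn is an assumed callable that fits a classifier h: x -> {-1, +1} to the weighted examples.

    import math

    def adaboost(examples, weak_learn, T):
        m = len(examples)
        D = [1.0 / m] * m                          # D_1 is uniform
        hs, alphas = [], []
        for _ in range(T):
            h = weak_learn(examples, D)            # weak classifier on D_t
            eps = sum(D[i] for i, (x, y) in enumerate(examples)
                      if h(x) != y)                # weighted error on D_t
            alpha = 0.5 * math.log((1.0 - eps) / eps)
            Z = 0.0                                # normalizer for D_{t+1}
            for i, (x, y) in enumerate(examples):
                D[i] *= math.exp(-alpha * y * h(x))
                Z += D[i]
            D = [d / Z for d in D]
            hs.append(h)
            alphas.append(alpha)
        # Final classifier: sign of the weighted vote.
        return lambda x: 1 if sum(a * h(x) for a, h in zip(alphas, hs)) >= 0 else -1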

Boosting
Oza and Russell's Online Boosting
1: Initialize base models h_m for all m ∈ {1, 2, ..., M}, λ_m^sc = 0, λ_m^sw = 0
2: for all training examples do
3:   Set "weight" of example λ_d = 1
4:   for m = 1, 2, ..., M do
5:     Set k = Poisson(λ_d)
6:     for n = 1, 2, ..., k do
7:       Update h_m with the current example
8:     if h_m correctly classifies the example then
9:       λ_m^sc ← λ_m^sc + λ_d
10:      ε_m = λ_m^sw / (λ_m^sc + λ_m^sw)
11:      λ_d ← λ_d · 1 / (2(1 - ε_m))  {decrease λ_d}
12:    else
13:      λ_m^sw ← λ_m^sw + λ_d
14:      ε_m = λ_m^sw / (λ_m^sc + λ_m^sw)
15:      λ_d ← λ_d · 1 / (2 ε_m)  {increase λ_d}
16: anytime output:
17: return hypothesis: h_fin(x) = arg max_{y ∈ Y} Σ_{m: h_m(x) = y} log((1 - ε_m) / ε_m)
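A Python sketch mirroring the pseudocode above; it reuses the poisson helper and math import from the Online Bagging sketch, and the base-learner interface is again an assumption.

    class OnlineBoosting:
        def __init__(self, base_learner_factory, M=10):
            self.models = [base_learner_factory() for _ in range(M)]
            self.lam_sc = [0.0] * M   # weight of correctly classified examples
            self.lam_sw = [0.0] * M   # weight of misclassified examples

        def learn_one(self, x, y):
            lam_d = 1.0
            for m, h in enumerate(self.models):
                for _ in range(poisson(lam_d)):
                    h.learn_one(x, y)
                if h.predict_one(x) == y:
                    self.lam_sc[m] += lam_d
                    eps = self.lam_sw[m] / (self.lam_sc[m] + self.lam_sw[m])
                    lam_d *= 1.0 / (2.0 * (1.0 - eps))   # decrease lam_d
                else:
                    self.lam_sw[m] += lam_d
                    eps = self.lam_sw[m] / (self.lam_sc[m] + self.lam_sw[m])
                    lam_d *= 1.0 / (2.0 * eps)           # increase lam_d

        def predict_one(self, x):
            # Weighted vote with weight log((1 - eps_m) / eps_m).
            scores = {}
            for m, h in enumerate(self.models):
                total = self.lam_sc[m] + self.lam_sw[m]
                eps = self.lam_sw[m] / total if total > 0 else 0.5
                if 0.0 < eps < 1.0:
                    yhat = h.predict_one(x)
                    scores[yhat] = scores.get(yhat, 0.0) + math.log((1 - eps) / eps)
            return max(scores, key=scores.get) if scores else None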

Stacking
Use a classifier to combine the predictions of base classifiers.
- Example: use a perceptron to do stacking
Restricted Hoeffding Trees
Trees for all possible attribute subsets of size k:

    C(m, k) = m! / (k! (m - k)!) = C(m, m - k) subsets

Example for 10 attributes:

    C(10, 1) = 10, C(10, 2) = 45, C(10, 3) = 120, C(10, 4) = 210, C(10, 5) = 252
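The subset counts can be checked directly; math.comb is in the Python standard library from 3.8:

    import math

    # Number of Restricted Hoeffding Trees over m = 10 attributes:
    for k in range(1, 6):
        print(f"C(10, {k}) = {math.comb(10, k)}")
    # C(10, 1) = 10 ... C(10, 5) = 252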
