Ensemble Techniques: Introduction to Data Mining, 2nd Edition, by Tan, Steinbach, Karpatne, and Kumar

Transcription

Data Mining: Ensemble Techniques
Introduction to Data Mining, 2nd Edition
by Tan, Steinbach, Karpatne, Kumar

Ensemble Methods
- Construct a set of base classifiers learned from the training data
- Predict the class label of test records by combining the predictions made by multiple classifiers (e.g., by taking a majority vote)

Example: Why Do Ensemble Methods Work?

Necessary Conditions for Ensemble Methods
- Ensemble methods work better than a single base classifier if:
  1. All base classifiers are independent of each other
  2. All base classifiers perform better than random guessing (error rate < 0.5 for binary classification)
- [Figure: classification error for an ensemble of 25 base classifiers as a function of the base classifier error rate, assuming their errors are uncorrelated; a sketch of this calculation follows below]
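The figure above can be reproduced directly from the binomial argument: if 25 base classifiers err independently with rate ε, the majority vote is wrong only when 13 or more of them are wrong. A minimal sketch of that calculation (the function name ensemble_error is mine, not from the slides):

```python
from math import comb

def ensemble_error(eps, n_classifiers=25):
    """Probability that a majority vote of n independent base classifiers is wrong,
    given each has error rate eps (binary classification)."""
    k_min = n_classifiers // 2 + 1  # at least 13 of 25 must err for the vote to err
    return sum(comb(n_classifiers, k) * eps**k * (1 - eps)**(n_classifiers - k)
               for k in range(k_min, n_classifiers + 1))

print(ensemble_error(0.35))  # ~0.06: far better than the 0.35 of a single classifier
print(ensemble_error(0.5))   # 0.5: voting does not help if the bases are random guesses
```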

Rationale for Ensemble Learning
- Ensemble methods work best with unstable base classifiers
  - Classifiers that are sensitive to minor perturbations in the training set, due to high model complexity
  - Examples: unpruned decision trees, ANNs, ...

Bias-Variance Decomposition
- Analogous problem of reaching a target y by firing projectiles from x (regression problem)
- For classification, the generalization error of model m can be given by:
  gen.error(m) = c1 + bias(m) + c2 × variance(m)
  where c1 and c2 are constants

Bias-Variance Trade-off and Overfitting
- [Figure: trade-off curve showing the underfitting and overfitting regimes]
- Ensemble methods try to reduce the variance of complex models (with low bias) by aggregating the responses of multiple base classifiers

General Approach of Ensemble Learning
- Combine the base classifiers' predictions using majority vote or weighted majority vote (weighted according to their accuracy or relevance)

Constructing Ensemble Classifiers
- By manipulating the training set
  - Examples: bagging, boosting, random forests
- By manipulating the input features
  - Example: random forests
- By manipulating the class labels
  - Example: error-correcting output coding
- By manipulating the learning algorithm
  - Example: injecting randomness into the initial weights of an ANN

Bagging (Bootstrap AGGregatING)
- Bootstrap sampling: sampling with replacement
- [Table: the original data records 1-10 and three bagging rounds, each a bootstrap sample of 10 records drawn with replacement]
- Build a classifier on each bootstrap sample
- The probability of a training instance being selected in a bootstrap sample is 1 - (1 - 1/n)^n (n: number of training instances), which approaches 0.632 when n is large (illustrated in the sketch below)
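A small illustration of bootstrap sampling and the 0.632 figure in plain Python (the helper name bootstrap_sample and the record IDs are illustrative):

```python
import random

def bootstrap_sample(records):
    """Draw a bootstrap sample: n draws with replacement from n records."""
    n = len(records)
    return [random.choice(records) for _ in range(n)]

print(bootstrap_sample(list(range(1, 11))))  # 10 record IDs; duplicates are expected

# A given record appears in roughly 63.2% of bootstrap samples when n is large:
n, trials = 1000, 2000
hits = sum(1 in bootstrap_sample(list(range(1, n + 1))) for _ in range(trials))
print(hits / trials)  # close to 1 - (1 - 1/n)**n, i.e. about 0.632
```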

Bagging Algorithm
- [Figure: pseudocode of the bagging algorithm; a sketch in code follows below]

Bagging Example
- Consider a 1-dimensional data set with ten records, x = 0.1, 0.2, ..., 1.0, and class labels y in {+1, -1}
- The base classifier is a decision stump (a decision tree of size 1)
  - Decision rule: x <= k versus x > k
  - The split point k is chosen based on entropy
  - [Figure: decision stump with test x <= k, left leaf y_left (True branch) and right leaf y_right (False branch)]
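A compact sketch of bagging with decision stumps as base classifiers, in the spirit of the algorithm on the slide. The helper names and the class labels in ys are my own illustration (the slides' exact labels are not reproduced here), and the stump picks its split by training error rather than entropy for brevity:

```python
import random

def fit_stump(xs, ys):
    """Fit a 1-D decision stump: pick the split k and leaf labels with the
    fewest training errors (the slides choose k by entropy instead)."""
    best = None
    for k in sorted(set(xs)):
        for left, right in [(1, -1), (-1, 1), (1, 1), (-1, -1)]:
            errs = sum((left if x <= k else right) != y for x, y in zip(xs, ys))
            if best is None or errs < best[0]:
                best = (errs, k, left, right)
    _, k, left, right = best
    return lambda x: left if x <= k else right

def bagging(xs, ys, n_rounds=10):
    """Train one stump per bootstrap sample; predict by majority vote."""
    stumps = []
    for _ in range(n_rounds):
        idx = [random.randrange(len(xs)) for _ in range(len(xs))]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    # Majority vote = sign of the sum of the individual stump predictions
    return lambda x: 1 if sum(s(x) for s in stumps) >= 0 else -1

xs = [i / 10 for i in range(1, 11)]          # x = 0.1, 0.2, ..., 1.0
ys = [1, 1, 1, -1, -1, -1, -1, 1, 1, 1]      # assumed labels, for illustration only
model = bagging(xs, ys)
print([model(x) for x in xs])
```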

Bagging Example
- [Tables: the bootstrap samples (x, y pairs) drawn in bagging rounds 1 through 5, together with the decision stump trained on each sample]

Bagging Example
- [Tables: the bootstrap samples and decision stumps for bagging rounds 6 through 10]
- Summary of the trained decision stumps:
  [Table: for each of the 10 rounds, the chosen split point, left-leaf class, and right-leaf class; later rounds use split point 0.75 (left class -1, right class +1) and round 10 uses split point 0.05 (both classes +1)]

Bagging Example
- Use the majority vote (the sign of the sum of the stump predictions) to determine the class predicted by the ensemble classifier
- [Table: predictions of each round's stump at x = 0.1 through x = 1.0, their sum, and the resulting predicted class]
- Bagging can also increase the complexity (representation capacity) of simple classifiers such as decision stumps

Boosting
- An iterative procedure to adaptively change the distribution of the training data by focusing more on previously misclassified records
  - Initially, all N records are assigned equal weights (for being selected for training)
  - Unlike bagging, the weights may change at the end of each boosting round

Boosting
- Records that are wrongly classified will have their weights increased in the next round
- Records that are classified correctly will have their weights decreased in the next round
- [Table: the original data records 1-10 and the records sampled in boosting rounds 1-3]
- Example: record 4 is hard to classify
  - Its weight is increased, so it is more likely to be chosen again in subsequent rounds

AdaBoost
- Base classifiers: C1, C2, ..., CT
- Error rate of a base classifier Ci: the weighted fraction of training records that Ci misclassifies
- Importance of a classifier: α_i = (1/2) ln((1 - ε_i) / ε_i)
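The importance α_i grows quickly as the base classifier's error rate drops. A quick check of α_i = (1/2) ln((1 - ε_i) / ε_i) for a few error rates (plain Python, function name mine):

```python
from math import log

def importance(eps):
    """AdaBoost importance of a base classifier with weighted error rate eps."""
    return 0.5 * log((1 - eps) / eps)

for eps in [0.05, 0.25, 0.45, 0.5]:
    print(eps, round(importance(eps), 3))
# 0.05 -> 1.472  (accurate classifier, large weight)
# 0.25 -> 0.549
# 0.45 -> 0.1    (barely better than random, small weight)
# 0.5  -> 0.0    (no better than random, zero weight)
```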

AdaBoost Algorithm
- Weight update: after round i, the weight of each record is multiplied by exp(-α_i) if it was classified correctly and by exp(α_i) if it was misclassified, and the weights are then renormalized
- If any intermediate round produces an error rate higher than 50%, the weights are reverted back to 1/n and the resampling procedure is repeated
- Classification: each base classifier Ci votes with weight α_i, and the ensemble predicts the class with the largest total weighted vote

AdaBoost Algorithm
- [Figure: pseudocode of the AdaBoost algorithm; a sketch in code follows below]
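A compact sketch of these steps, reusing the fit_stump helper from the bagging sketch above. Weighted training is approximated by resampling records according to their weights, as the slides describe; all names are mine and this is only an illustration of the algorithm, not the book's code:

```python
import random
from math import log, exp

def adaboost(xs, ys, n_rounds=3):
    """Train decision stumps on weighted resamples; combine them with weights alpha."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []                                        # (alpha, stump) pairs
    for _ in range(n_rounds):
        idx = random.choices(range(n), weights=w, k=n)   # resample by weight
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        eps = sum(w[j] for j in range(n) if stump(xs[j]) != ys[j])
        if eps > 0.5:                                    # worse than random:
            w = [1.0 / n] * n                            # revert weights to 1/n
            continue                                     # and resample again
        alpha = 0.5 * log((1 - eps) / max(eps, 1e-10))
        ensemble.append((alpha, stump))
        # Increase weights of misclassified records, decrease the rest, renormalize
        w = [wj * exp(alpha if stump(x) != y else -alpha)
             for wj, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wj / total for wj in w]
    # Weighted vote: the sign of the alpha-weighted sum of stump predictions
    return lambda x: 1 if sum(a * s(x) for a, s in ensemble) >= 0 else -1
```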

AdaBoost Example
- Consider the same 1-dimensional data set
- The base classifier is a decision stump
  - Decision rule: x <= k versus x > k
  - The split point k is chosen based on entropy
  - [Figure: decision stump with test x <= k, left leaf y_left and right leaf y_right]

AdaBoost Example
- Training sets for the first 3 boosting rounds: [Tables: the records sampled in boosting rounds 1-3]
- Summary of the trained decision stumps:

  Round | Split Point | Left Class | Right Class | alpha
  1     | 0.75        | -1         | 1           | 1.738
  2     | 0.05        | 1          | 1           | 2.7784
  3     | 0.3         | 1          | -1          | 4.1195
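These three stumps and their α values fully determine the boosted ensemble. A quick check in plain Python of the combined score Σ_i α_i C_i(x) at each training point (the sign of the score is the predicted class):

```python
# The three boosted stumps from the summary table above: (split point, left, right, alpha)
stumps = [(0.75, -1, 1, 1.738), (0.05, 1, 1, 2.7784), (0.3, 1, -1, 4.1195)]

def score(x):
    """Alpha-weighted vote sum of the three stumps; its sign is the predicted class."""
    return sum(alpha * (left if x <= k else right) for k, left, right, alpha in stumps)

for x in [i / 10 for i in range(1, 11)]:
    print(x, round(score(x), 3))
# x = 0.1..0.3 -> 5.16, x = 0.4..0.7 -> -3.079, x = 0.8..1.0 -> 0.397
```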

AdaBoost Example
- Weights assigned to the training records in each round:

  Round | x=0.1 | x=0.2 | x=0.3 | x=0.4 | x=0.5 | x=0.6 | x=0.7 | x=0.8 | x=0.9 | x=1.0
  1     | 0.1   | 0.1   | 0.1   | 0.1   | 0.1   | 0.1   | 0.1   | 0.1   | 0.1   | 0.1
  2     | 0.311 | 0.311 | 0.311 | 0.01  | 0.01  | 0.01  | 0.01  | 0.01  | 0.01  | 0.01
  3     | 0.029 | 0.029 | 0.029 | 0.228 | 0.228 | 0.228 | 0.228 | 0.009 | 0.009 | 0.009

- Classification: the weighted vote sum Σ_i α_i C_i(x), whose sign gives the predicted class:

  x   | 0.1  | 0.2  | 0.3  | 0.4   | 0.5   | 0.6   | 0.7   | 0.8   | 0.9   | 1.0
  Sum | 5.16 | 5.16 | 5.16 | -3.08 | -3.08 | -3.08 | -3.08 | 0.397 | 0.397 | 0.397

Random Forest Algorithm
- Construct an ensemble of decision trees by manipulating the training set as well as the features
  - Use a bootstrap sample to train every decision tree (similar to bagging)
  - Use the following tree induction algorithm: at every internal node of the decision tree, randomly sample p attributes for selecting the split criterion; repeat this procedure until all leaves are pure (unpruned tree); a sketch follows below
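A hedged sketch of this procedure using scikit-learn, whose RandomForestClassifier implements exactly these two ingredients (a bootstrap sample per tree plus a random subset of attributes at every split); the dataset and parameter values are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,     # number of trees, each trained on a bootstrap sample
    max_features="sqrt",  # p attributes sampled at every internal node
    bootstrap=True,
    random_state=0,
)                         # trees are grown unpruned by default (max_depth=None)
rf.fit(X_tr, y_tr)
print(rf.score(X_te, y_te))
```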

Characteristics of Random Forest

Gradient Boosting
- Constructs a series of models
  - The models can be any predictive model that has a differentiable loss function
  - Commonly, trees are the chosen model
- XGBoost (extreme gradient boosting) is a popular package because of its impressive performance
- Boosting can be viewed as optimizing the loss function by iterative functional gradient descent
- Implementations of various boosted algorithms are available in Python, R, Matlab, and more (one is sketched below)
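One such implementation, sketched with scikit-learn's GradientBoostingClassifier (xgboost's XGBClassifier exposes a similar fit/predict interface); the dataset and parameters are illustrative, not taken from the slides:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Each round fits a small tree to the gradient of the loss (functional gradient descent)
gb = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=3,
                                random_state=0)
gb.fit(X_tr, y_tr)
print(gb.score(X_te, y_te))
```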
