CSE4334/5334 Data Mining

Transcription

CSE4334/5334 Data Mining7 Classification: Naïve Bayes ClassifierChengkai LiDepartment of Computer Science and EngineeringUniversity of Texas at ArlingtonFall 2018 (Slides courtesy of Pang-Ning Tan, Michael Steinbach and Vipin Kumar)

Bayes ClassifierA probabilistic framework for solving classification problemsP( A, C )Conditional Probability: P(C A) P( A)P( A, C )P( A C ) P(C )Bayes theorem:P( A C ) P(C )P(C A) P( A)2Copyright 2007-2018 The University of Texas at Arlington. All Rights Reserved.

Example of Bayes TheoremGiven:o A doctor knows that meningitis causes stiff neck 50% of the timeo Prior probability of any patient having meningitis is 1/50,000o Prior probability of any patient having stiff neck is 1/20If a patient has stiff neck, what’s the probabilityhe/she has meningitis?P( S M ) P( M ) 0.5 1 / 50000P( M S ) 0.0002P( S )1 / 203Copyright 2007-2018 The University of Texas at Arlington. All Rights Reserved.

Bayesian ClassifiersConsider each attribute and class label as random variablesGiven a record with attributes (A1, A2, ,An)o Goal is to predict class Co Specifically, we want to find the value of C that maximizes P(C A1,A2, ,An )Can we estimate P(C A1, A2, ,An ) directly from data?4Copyright 2007-2018 The University of Texas at Arlington. All Rights Reserved.

Bayesian ClassifiersApproach:o compute the posterior probability P(C A1, A2, , An) for allvalues of C using the Bayes theoremP (C A1 A2 ! An ) P ( A1 A2 ! An C ) P (C )P ( A1 A2 ! An )o Choose value of C that maximizes5P(C A1, A2, , An)o Equivalent to choosing value of C that maximizesP(A1, A2, , An C) P(C)How to estimate P(A1, A2, , An C )?Copyright 2007-2018 The University of Texas at Arlington. All Rights Reserved.

Naïve Bayes ClassifierAssume independence among attributes Ai whenclass is given:o P(A1, A2, , An C) P(A1 Cj) P(A2 Cj) P(An Cj)o Can estimate P(Ai Cj) for all Ai and Cj.o New point is classified to Cj if P(Cj) P P(Ai Cj) is maximal.66Copyright 2007-2018 The University of Texas at Arlington. All Rights Reserved.

icoricoruoussHowcato lass: P(C) Nc/No e.g., P(No) 7/10, P(Yes) 3/10For discrete attributes:P(Ai Ck) Aik / Nco where Aik is number ofinstances having attribute Ai andbelongs to class Cko Examples:P(Status Married No) 4/7P(Refund Yes Yes) 0Copyright 2007-2018 The University of Texas at Arlington. All Rights Reserved.

How to Estimate Probabilities from Data?For continuous attributes:o Discretize the range into bins and thus transform the attribute into an ordinalattribute.o Probability density estimation:o Assume attribute follows a normal distributiono Use data to estimate parameters of distribution(e.g., mean and standard deviation)o Once probability distribution is known, can use it to estimate the conditionalprobability P(Ai c)8Copyright 2007-2018 The University of Texas at Arlington. All Rights Reserved.

llsHow to Estimate Probabilities from gle85KYes9NoMarried75KNo10NoSingle90KYessNormal distribution:P( Ai c j ) -12ps2ije( Ai - µ ij ) 22s ij2o One for each (Ai,cj) pairFor (Income, Class No):o If Class Noo sample mean 110o sample variance 29751P( Income 120 No) e2p (54.54)(120 -110 ) 22 ( 2975 )Copyright 2007-2018 The University of Texas at Arlington. All Rights Reserved. 0.0072

Example of Naïve Bayes ClassifierGiven a Test Record:X (Refund No, Married, Income 120K)P(X Class No) P(Refund No Class No) P(Married Class No) P(Income 120K Class No) 4/7 4/7 0.0072 0.0024P(X Class Yes) P(Refund No Class Yes) P(Married Class Yes) P(Income 120K Class Yes) 1 0 1.2 10-9 0Since P(X No)P(No) P(X Yes)P(Yes)Therefore P(No X) P(Yes X) Class No10Copyright 2007-2018 The University of Texas at Arlington. All Rights Reserved.

Naïve Bayes ClassifierIf one of the conditional probability is zero, then theentire expression becomes zeroProbability estimation:NOriginal : P ( Ai C ) icNcN ic 1Laplace : P ( Ai C ) Nc cc: number of classesp: prior probabilitym: parameterN ic mpm - estimate : P ( Ai C ) Nc m11Copyright 2007-2018 The University of Texas at Arlington. All Rights Reserved.

Example of Naïve Bayes ClassifierNameGive eopard sharkturtlepenguinporcupineeelsalamandergila esyesnonoyesnononononoyesnoGive Birth12yesCan FlynonononononoyesyesnononononononononoyesnoyesCan FlynoLive in Water Have snoyesyesyesnoyesyesyesyesnoyesLive in Water Have ammalsClass?A: attributesM: mammalsN: non-mammals6 6 2 2P ( A M ) 0.067 7 7 71 10 3 4P ( A N ) 0.004213 13 13 137P ( A M ) P ( M ) 0.06 0.0212013P ( A N ) P ( N ) 0.004 0.002720P(A M)P(M) P(A N)P(N) MammalsCopyright 2007-2018 The University of Texas at Arlington. All Rights Reserved.

Naïve Bayes (Summary)Robust to isolated noise pointsHandle missing values by ignoring the instance during probabilityestimate calculationsRobust to irrelevant attributesIndependence assumption may not hold for some attributeso Use other techniques such as Bayesian Belief Networks (BBN)13Copyright 2007-2018 The University of Texas at Arlington. All Rights Reserved.

How to Estimate Probabilities from Data? For continuous attributes: oDiscretizethe range into bins and thus transform the attribute into an ordinal attribute. oProbability density estimation: oAssume attribute follows a normal distribution oUse data to estimate parameters of distribution (e.g., mean and standard deviation)