Machine Learning with WEKA

Transcription

Machine Learning with WEKA
Eibe Frank
Department of Computer Science, University of Waikato, New Zealand

WEKA: A Machine Learning Toolkit
The Explorer
  Classification and Regression
  Clustering
  Association Rules
  Attribute Selection
  Data Visualization
The Experimenter
The KnowledgeFlow GUI
Conclusions

WEKA: the bird
Copyright: Martin Kramer (mkramer@wxs.nl)
2/22/2011  University of Waikato

WEKA: the software
Machine learning/data mining software written in Java (distributed under the GNU General Public License)
Used for research, education, and applications
Complements “Data Mining” by Witten & Frank
Main features:
  Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  Graphical user interfaces (incl. data visualization)
  Environment for comparing learning algorithms

WEKA: versions
There are several versions of WEKA:
  WEKA 3.0: “book version” compatible with description in data mining book
  WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only)
  WEKA 3.3: “development version” with lots of improvements
This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)

WEKA only deals with “flat” files

@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
...,229,yes,present
38,female,non_anginal,?,no,not_present
...
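The ARFF layout shown above (a header of @relation and @attribute declarations followed by comma-separated @data rows) can be read with very little code. The following is a minimal sketch in Python, not WEKA's own loader. The function name `parse_arff` is hypothetical, the sample is a shortened version of the slide's file, and the sketch assumes unquoted names and values, numeric and nominal attributes only, and `?` for missing values.

```python
def parse_arff(text):
    """Parse a simple ARFF string into (relation, attributes, rows).

    Illustrative only: handles numeric and nominal attributes,
    '?' for missing values, and '%' comment lines.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (name, kind), value in zip(attributes, values):
                if value == "?":
                    row.append(None)          # missing value
                elif kind == "numeric":
                    row.append(float(value))
                else:
                    row.append(value)         # nominal label
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                # Nominal attribute: keep the list of allowed labels.
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


# Shortened sample (fewer attributes than the slide) for illustration.
SAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute cholesterol numeric
@attribute class { present, not_present}
@data
63,male,233,not_present
38,female,?,not_present
"""

relation, attributes, rows = parse_arff(SAMPLE)
```

Each row comes back as a flat list with one entry per attribute, which is what "flat file" means here: one table, one instance per line, no nested or relational structure.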


Explorer: pre-processing the data
Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
Data can also be read from a URL or from an SQL database (using JDBC)
Pre-processing tools in WEKA are called “filters”
WEKA contains filters for:
  Discretization, normalization, resampling, attribute selection, transforming and combining attributes, ...
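To make two of the listed filter types concrete, here is a minimal sketch in Python of normalization and equal-width discretization applied to one numeric attribute. It is not WEKA's filter code: the function names and the sample ages are made up for illustration, and equal-width binning is only one of several possible discretization strategies.

```python
def normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize_equal_width(values, bins):
    """Map each value to a bin index 0..bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]


ages = [20.0, 30.0, 40.0, 60.0]          # made-up values
scaled = normalize(ages)                  # [0.0, 0.25, 0.5, 1.0]
binned = discretize_equal_width(ages, 2)  # [0, 0, 1, 1]
```

Both functions take a column of values and return a transformed column of the same length, which mirrors how a filter turns one dataset into another without changing the number of instances.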


Explorer: building “classifiers”
Classifiers in WEKA are models for predicting nominal or numeric quantities
Implemented learning schemes include:
  Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, ...
“Meta”-classifiers include:
  Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, ...
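As an illustration of one of the listed scheme families, the following Python sketch shows an instance-based (k-nearest-neighbour) classifier predicting a nominal class. It is a toy under stated assumptions, not WEKA's implementation: the function name `knn_predict`, the Euclidean distance, and the training points are all chosen here for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=1):
    """Predict a nominal class by majority vote among the k nearest instances.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    neighbours = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Made-up training instances with two numeric attributes each.
train = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((5.0, 5.0), "b"),
    ((5.5, 4.5), "b"),
]

near_a = knn_predict(train, (1.1, 1.0), k=3)   # two "a" votes against one "b"
near_b = knn_predict(train, (5.2, 5.1), k=1)
```

Predicting a numeric quantity instead would replace the majority vote with an average of the neighbours' target values.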


Explorer: clustering data
WEKA contains “clusterers” for finding groups of similar instances in a dataset
Implemented schemes are:
  k-Means, EM, Cobweb, X-means, FarthestFirst
Clusters can be visualized and compared to “true” clusters (if given)
Evaluation based on log-likelihood if clustering scheme produces a probability distribution
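The first scheme in the list, k-means, can be sketched in a few lines. The Python below is a plain version of the standard assign-then-update loop, not WEKA's clusterer. The function name, the sample points, and the choice to pass initial centroids explicitly (so the run is deterministic) are assumptions made for this example.

```python
import math


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)   # keep the centre of an empty cluster
        if new_centroids == centroids:
            break                           # converged
        centroids = new_centroids
    return centroids, assignment


# Made-up 2-d points forming two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
centres, labels = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
```

When "true" cluster labels are available, the returned assignment can be compared against them, which is the kind of comparison the slide mentions.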


Explorer: finding associations
WEKA contains an implementation of the Apriori algorithm for learning association rules
  Works only with discrete data
Can identify statistical dependencies between groups of attributes:
  milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
Apriori can compute all rules that have a given minimum support and exceed a given confidence
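The two numbers attached to a rule such as "milk, butter ⇒ bread, eggs" can be computed directly from transaction data. The Python sketch below shows only that computation, not Apriori's candidate generation, and it is not WEKA code. The function names and the five baskets are made up, and support is taken as an absolute count, which matches the slide's "support 2000".

```python
def support(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent
    that also contain the consequent."""
    covered = support(transactions, antecedent)
    if covered == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / covered


# Made-up shopping baskets.
baskets = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "eggs"},
    {"bread"},
]

rule_support = support(baskets, {"milk", "butter", "bread", "eggs"})
rule_confidence = confidence(baskets, {"milk", "butter"}, {"bread", "eggs"})
```

Here the rule is supported by 2 baskets and holds in 2 of the 3 baskets that contain both milk and butter. A rule miner would keep it only if both values clear the user-supplied thresholds.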


Explorer: attribute selection
Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
Attribute selection methods contain two parts:
  A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  An evaluation method: correlation-based, wrapper, information gain, chi-squared, ...
Very flexible: WEKA allows (almost) arbitrary combinations of these two
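One combination of the two parts is a ranking search paired with an information-gain evaluator. The Python below is a small sketch of that pairing for nominal attributes, not WEKA's implementation. The function names and the four instances are invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Evaluation method: reduction in class entropy after splitting on `attribute`."""
    labels = [r[target] for r in rows]
    remainder = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


def rank_attributes(rows, attributes, target):
    """Search method: order attributes from most to least informative."""
    return sorted(attributes,
                  key=lambda a: information_gain(rows, a, target),
                  reverse=True)


# Made-up instances: 'angina' determines the class, 'sex' carries no information.
rows = [
    {"angina": "yes", "sex": "male", "class": "present"},
    {"angina": "yes", "sex": "female", "class": "present"},
    {"angina": "no", "sex": "male", "class": "not_present"},
    {"angina": "no", "sex": "female", "class": "not_present"},
]

ranking = rank_attributes(rows, ["sex", "angina"], "class")
gain_angina = information_gain(rows, "angina", "class")
gain_sex = information_gain(rows, "sex", "class")
```

Keeping the evaluator and the search as separate functions reflects the point of the slide: either part could be swapped out without touching the other.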


Explorer: data visualization
Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
  To do: rotating 3-d visualizations (Xgobi-style)
Color-coded class values
“Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
“Zoom-in” function


Conclusion: try it yourself!
WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
  Also has a list of projects based on WEKA
WEKA contributors:
  Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang
