Machine Learning With Rattle And R - Togaware

Machine Learning with Rattle and R
open source data science
Graham Williams
Director of Data Science, Asia Pacific, Microsoft
“Science resolves the whole into parts, the organism into organs, the obscure into the known. Science gives us knowledge, but only philosophy can give us wisdom [to] synthesize knowledge to resolve the obscure into the known.”
After the philosopher Durant.

Outline
AI and ML – Trees and Ensembles
Open Source ML – R and Rattle
Elastic Data Science - Azure Linux Data Science VM

What is Machine Learning, Actually?
How to identify patterns in the data – associated with the outcome of interest – Rain Tomorrow?
Recursive Partitioning aka Divide and Conquer aka Map and Reduce
One of the earliest algorithms and still going very strong!
Even though deep learning with massive data and massive compute characterises our current AI/ML surge.
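To make "recursive partitioning" concrete, here is a minimal Python sketch that grows a small decision tree by repeatedly choosing the split that most reduces Gini impurity. It is an illustration only, not how Rattle or any R package implements tree building. The data set (humidity, pressure, and a "rain tomorrow" label) and all function names are invented for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that lowers weighted impurity most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must put observations on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow(rows, labels, depth=0, max_depth=3):
    """Divide and conquer: split the data, then recurse on each part."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": grow([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": grow([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

# Hypothetical observations: (humidity %, pressure hPa) -> rain tomorrow?
rows = [(85, 1005), (90, 1002), (60, 1020), (55, 1018), (88, 1019), (58, 1004)]
labels = ["yes", "yes", "no", "no", "no", "no"]

tree = grow(rows, labels)
print(predict(tree, (92, 1000)))
```

The tree needs two levels here because neither humidity nor pressure separates the classes on its own, which is where the recursion earns its keep.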

Ensembles as Foundation for all Learning
Introduced the concept of an ensemble of decision trees (1987 Australian AI Conference)
Idea of multiple models was challenging at the time
Why not just build the single best model?
Ensembles are now the approach of choice (Journeys to Data Mining)
Everything is an ensemble. (hammers and nails?)
Rattle and Other Data Mining Tales in Journeys to Data Mining, Experiences from 15 Renowned Researchers, Springer, 2012, 211-230.
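One common way to build an ensemble of trees can be sketched in a few lines of Python. The example below uses bagging (bootstrap samples plus a majority vote over one-split trees), chosen here purely for illustration; the slides do not say which ensemble method the 1987 work used. The data and all function names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels):
    """Learn a one-split tree: the (feature, threshold) with fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:
        # No valid split (all sampled rows identical): always predict the majority.
        majority = Counter(labels).most_common(1)[0][0]
        return lambda row: majority
    _, f, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab

def fit_ensemble(rows, labels, n_models=25, seed=42):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(rows)) for _ in rows]
        models.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models

def vote(models, row):
    """Combine the models by simple majority vote."""
    return Counter(m(row) for m in models).most_common(1)[0][0]

# Hypothetical observations: (humidity %, pressure hPa) -> rain tomorrow?
rows = [(85, 1005), (90, 1002), (60, 1020), (55, 1018), (88, 1019), (58, 1004)]
labels = ["yes", "yes", "no", "no", "no", "no"]

models = fit_ensemble(rows, labels)
print(vote(models, (92, 1000)))
```

Each stump is weak on its own, since a single split cannot capture the interaction between humidity and pressure. The vote is what lets many imperfect models stand in for the "single best model".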

What is R?
Language Platform
– The most popular statistical programming language
– A data visualisation tool
– Open source
Community
– 3 million users?
– Taught in most universities
– New and recent graduates use it
– Thriving user groups worldwide
Ecosystem
– 10,000 contributed packages
– Rich application & platform integration

#1 Software for Advanced Analytics
R Usage Growth: Rexer Data Miner Survey, 2007-2015
– 76% of data scientists report using R
– 36% select R as their primary tool
Language Popularity: IEEE Spectrum Top Programming Languages
“C is No. 1, but big data is still the big winner”

Rattle – GUI for Data Mining
Using
– Glade point and click GUI builder (XML)
– RGtk2 bindings for the cross platform GUI
– R to implement all the callbacks
– 20,000 downloads per month
Log tab collects documented, formatted R scripts as a starting point for real work in R

Demo: Rattle Data Mining
First Model in 4 Clicks

Open Source R but
– In-Memory Operation
– Data Movement & Duplication
– Lack of Parallelism

Alpha Version: Rattle with MRS
Now supports Microsoft R Server – Big Data
No limit on the dataset sizes
Parallel data processing and model building

Linux Data Science Virtual Machine - Azure
https://aka.ms/linuxdsvm
Vowpal Wabbit
CNTK
Rattle

Resources
Overview of the Linux Data Science Virtual Machine
https://aka.ms/linuxdsvm
Essentials Guide to Setting Up a Linux DSVM and R and RStudio and Rattle
https://aka.ms/ldsvm
Rattle Home Page
https://rattle.togaware.com

18.03.2017