Mastering Machine Learning With Python In Six Steps


Mastering Machine Learning with Python in Six Steps
A Practical Implementation Guide to Predictive Data Analytics Using Python
Manohar Swamynathan


Mastering Machine Learning with Python in Six Steps
Manohar Swamynathan
Bangalore, Karnataka, India

ISBN-13 (pbk): 978-1-4842-2865-4
DOI 10.1007/978-1-4842-2866-1
ISBN-13 (electronic): 978-1-4842-2866-1
Library of Congress Control Number: 2017943522

Copyright 2017 by Manohar Swamynathan

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Cover image designed by Freepik

Managing Director: Welmoed Spahr
Editorial Director: Todd Green
Acquisitions Editor: Celestin Suresh John
Development Editor: Anila Vincent and James Markham
Technical Reviewer: Jojo Moolayil
Coordinating Editor: Sanchita Mandal
Copy Editor: Karen Jameson
Compositor: SPi Global
Indexer: SPi Global
Artist: SPi Global

Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail orders-ny@springer-sbm.com, or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science+Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.

For information on translations, please e-mail rights@apress.com, or ss titles may be purchased in bulk for academic, corporate, or promotional use. eBook versions and licenses are also available for most titles. For more information, reference our Print and eBook Bulk Sales web page at http://www.apress.com/bulk-sales.

Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub via the book’s product page, located at www.apress.com/978-1-4842-2865-4. For more detailed information, please visit http://www.apress.com/source-code.

Printed on acid-free paper

Contents at a Glance

About the Author
About the Technical Reviewer
Acknowledgments
Introduction
Chapter 1: Step 1 – Getting Started in Python
Chapter 2: Step 2 – Introduction to Machine Learning
Chapter 3: Step 3 – Fundamentals of Machine Learning
Chapter 4: Step 4 – Model Diagnosis and Tuning
Chapter 5: Step 5 – Text Mining and Recommender Systems
Chapter 6: Step 6 – Deep and Reinforcement Learning
Chapter 7: Conclusion
Index

Contents

About the Author
About the Technical Reviewer
Acknowledgments
Introduction

Chapter 1: Step 1 – Getting Started in Python
  The Best Things in Life Are Free
  The Rising Star
  Python 2.7.x or Python 3.4.x?
  Windows Installation
  OSX Installation
  Linux Installation
  Python from Official Website
  Running Python
  Key Concepts
  Python Identifiers
  Keywords
  My First Python Program
  Code Blocks (Indentation & Suites)
  Basic Object Types
  When to Use List vs. Tuples vs. Set vs. Dictionary
  Comments in Python
  Multiline Statement
  Basic Operators
  Control Structure
  Lists
  Tuple
  Sets
  Dictionary
  User-Defined Functions
  Module
  File Input/Output
  Exception Handling
  Endnotes

Chapter 2: Step 2 – Introduction to Machine Learning
  History and Evolution
  Artificial Intelligence Evolution
  Different Forms
  Statistics
  Data Mining
  Data Analytics
  Data Science
  Statistics vs. Data Mining vs. Data Analytics vs. Data Science
  Machine Learning Categories
  Supervised Learning
  Unsupervised Learning
  Reinforcement Learning
  Frameworks for Building Machine Learning Systems
  Knowledge Discovery Databases (KDD)
  Cross-Industry Standard Process for Data Mining
  SEMMA (Sample, Explore, Modify, Model, Assess)
  KDD vs. CRISP-DM vs. SEMMA
  Machine Learning Python Packages
  Data Analysis Packages
  NumPy
  Pandas
  Matplotlib
  Machine Learning Core Libraries
  Endnotes

Chapter 3: Step 3 – Fundamentals of Machine Learning
  Machine Learning Perspective of Data
  Scales of Measurement
  Nominal Scale of Measurement
  Ordinal Scale of Measurement
  Interval Scale of Measurement
  Ratio Scale of Measurement
  Feature Engineering
  Dealing with Missing Data
  Handling Categorical Data
  Normalizing Data
  Feature Construction or Generation
  Exploratory Data Analysis (EDA)
  Univariate Analysis
  Multivariate Analysis
  Supervised Learning – Regression
  Correlation and Causation
  Fitting a Slope
  How Good Is Your Model?
  Polynomial Regression
  Multivariate Regression
  Multicollinearity and Variation Inflation Factor (VIF)
  Interpreting the OLS Regression Results
  Regression Diagnosis
  Regularization
  Nonlinear Regression
  Supervised Learning – Classification
  Logistic Regression
  Evaluating a Classification Model Performance
  ROC Curve
  Fitting Line
  Stochastic Gradient Descent
  Regularization
  Multiclass Logistic Regression
  Generalized Linear Models
  Supervised Learning – Process Flow
  Decision Trees
  Support Vector Machine (SVM)
  k Nearest Neighbors (kNN)
  Time-Series Forecasting
  Unsupervised Learning Process Flow
  Clustering
  K-means
  Finding Value of k
  Hierarchical Clustering
  Principal Component Analysis (PCA)
  Endnotes

Chapter 4: Step 4 – Model Diagnosis and Tuning
  Optimal Probability Cutoff Point
  Which Error Is Costly?
  Rare Event or Imbalanced Dataset
  Known Disadvantages
  Which Resampling Technique Is the Best?
  Bias and Variance
  Bias
  Variance
  K-Fold Cross-Validation
  Stratified K-Fold Cross-Validation
  Ensemble Methods
  Bagging
  Feature Importance
  RandomForest
  Extremely Randomized Trees (ExtraTree)
  How Does the Decision Boundary Look?
  Bagging – Essential Tuning Parameters
  Boosting
  Example Illustration for AdaBoost
  Gradient Boosting
  Boosting – Essential Tuning Parameters
  Xgboost (eXtreme Gradient Boosting)
  Ensemble Voting – Machine Learning’s Biggest Heroes United
  Hard Voting vs. Soft Voting
  Stacking
  Hyperparameter Tuning
  GridSearch
  RandomSearch
  Endnotes

Chapter 5: Step 5 – Text Mining and Recommender Systems
  Text Mining Process Overview
  Data Assemble (Text)
  Social Media
  Step 1 – Get Access Key (One-Time Activity)
  Step 2 – Fetching Tweets
