Python Data Science - EBook-Shop Der Quolibris GmbH


Python for Data Science

Python for Data Science, 2nd Edition
by John Paul Mueller and Luca Massaron

Python for Data Science For Dummies, 2nd Edition

Published by: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, www.wiley.com

Copyright 2019 by John Wiley & Sons, Inc., Hoboken, New Jersey

Media and software compilation copyright 2019 by John Wiley & Sons, Inc. All rights reserved.

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Trademarks: Wiley, For Dummies, the Dummies Man logo, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and may not be used without written permission. Python is a registered trademark of Python Software Foundation Corporation. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING LEGAL, ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE SERVICES OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHOR SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. THE FACT THAT AN ORGANIZATION OR WEBSITE IS REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE AUTHOR OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.

For general information on our other products and services, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002. For technical support, please es.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2018967877

ISBN: 978-1-119-54762-4; ISBN: 978-1-119-54766-2 (ebk); ISBN: 978-1-119-54764-8 (ebk)

Manufactured in the United States of America

10 9 8 7 6 5 4 3 2 1

Contents at a Glance

Introduction

Part 1: Getting Started with Data Science and Python
  CHAPTER 1: Discovering the Match between Data Science and Python
  CHAPTER 2: Introducing Python’s Capabilities and Wonders
  CHAPTER 3: Setting Up Python for Data Science
  CHAPTER 4: Working with Google Colab

Part 2: Getting Your Hands Dirty with Data
  CHAPTER 5: Understanding the Tools
  CHAPTER 6: Working with Real Data
  CHAPTER 7: Conditioning Your Data
  CHAPTER 8: Shaping Data
  CHAPTER 9: Putting What You Know in Action

Part 3: Visualizing Information
  CHAPTER 10: Getting a Crash Course in MatPlotLib
  CHAPTER 11: Visualizing the Data

Part 4: Wrangling Data
  CHAPTER 12: Stretching Python’s Capabilities
  CHAPTER 13: Exploring Data Analysis
  CHAPTER 14: Reducing Dimensionality
  CHAPTER 15: Clustering
  CHAPTER 16: Detecting Outliers in Data

Part 5: Learning from Data
  CHAPTER 17: Exploring Four Simple and Effective Algorithms
  CHAPTER 18: Performing Cross-Validation, Selection, and Optimization
  CHAPTER 19: Increasing Complexity with Linear and Nonlinear Tricks
  CHAPTER 20: Understanding the Power of the Many

Part 6: The Part of Tens
  CHAPTER 21: Ten Essential Data Resources
  CHAPTER 22: Ten Data Challenges You Should Take

Index

Table of Contents

INTRODUCTION
  About This Book
  Foolish Assumptions
  Icons Used in This Book
  Beyond the Book
  Where to Go from Here

PART 1: GETTING STARTED WITH DATA SCIENCE AND PYTHON

CHAPTER 1: Discovering the Match between Data Science and Python
  Defining the Sexiest Job of the 21st Century
    Considering the emergence of data science
    Outlining the core competencies of a data scientist
    Linking data science, big data, and AI
    Understanding the role of programming
  Creating the Data Science Pipeline
    Preparing the data
    Performing exploratory data analysis
    Learning from data
    Visualizing
    Obtaining insights and data products
  Understanding Python’s Role in Data Science
    Considering the shifting profile of data scientists
    Working with a multipurpose, simple, and efficient language
  Learning to Use Python Fast
    Loading data
    Training a model
    Viewing a result

CHAPTER 2: Introducing Python’s Capabilities and Wonders
  Why Python?
    Grasping Python’s Core Philosophy
    Contributing to data science
    Discovering present and future development goals

  Working with Python
    Getting a taste of the language
    Understanding the need for indentation
    Working at the command line or in the IDE
  Performing Rapid Prototyping and Experimentation
  Considering Speed of Execution
  Visualizing Power
  Using the Python Ecosystem for Data Science
    Accessing scientific tools using SciPy
    Performing fundamental scientific computing using NumPy
    Performing data analysis using pandas
    Implementing machine learning using Scikit-learn
    Going for deep learning with Keras and TensorFlow
    Plotting the data using matplotlib
    Creating graphs with NetworkX
    Parsing HTML documents using Beautiful Soup

CHAPTER 3: Setting Up Python for Data Science
  Considering the Off-the-Shelf Cross-Platform Scientific Distributions
    Getting Continuum Analytics Anaconda
    Getting Enthought Canopy Express
    Getting WinPython
  Installing Anaconda on Windows
  Installing Anaconda on Linux
  Installing Anaconda on Mac OS X
  Downloading the Datasets and Example Code
    Using Jupyter Notebook
    Defining the code repository
    Understanding the datasets used in this book

CHAPTER 4: Working with Google Colab
  Defining Google Colab
    Understanding what Google Colab does
    Considering the online coding difference
    Using local runtime support
  Getting a Google Account
    Creating the account
    Signing in
  Working with Notebooks
    Creating a new notebook
    Opening existing notebooks
    Saving notebooks
    Downloading notebooks

  Performing Common Tasks
    Creating code cells
    Creating text cells
    Creating special cells
    Editing cells
    Moving cells
  Using Hardware Acceleration
  Executing the Code
  Viewing Your Notebook
    Displaying the table of contents
    Getting notebook information
    Checking code execution
  Sharing Your Notebook
  Getting Help

PART 2: GETTING YOUR HANDS DIRTY WITH DATA

CHAPTER 5: Understanding the Tools
  Using the Jupyter Console
    Interacting with screen text
    Changing the window appearance
    Getting Python help
    Getting IPython help
    Using magic functions
    Discovering objects
  Using Jupyter Notebook
    Working with styles
    Restarting the kernel
    Restoring a checkpoint
  Performing Multimedia and Graphic Integration
    Embedding plots and other images
    Loading examples from online sites
    Obtaining online graphics and multimedia

CHAPTER 6: Working with Real Data
  Uploading, Streaming, and Sampling Data
    Uploading small amounts of data into memory
    Streaming large amounts of data into memory
    Generating variations on image data
    Sampling data in different ways
  Accessing Data in Structured Flat-File Form
    Reading from a text file
    Reading CSV delimited format
    Reading Excel and other Microsoft Office files

  Sending Data in Unstructured File Form
  Managing Data from Relational Databases
  Interacting with Data from NoSQL Databases
  Accessing Data from the Web

CHAPTER 7: Conditioning Your Data
  Juggling between NumPy and pandas
    Knowing when to use NumPy
    Knowing when to use pandas
  Validating Your Data
    Figuring out what’s in your data
    Removing duplicates
    Creating a data map and data plan
  Manipulating Categorical Variables
    Creating categorical variables
    Renaming levels
    Combining levels
  Dealing with Dates in Your Data
    Formatting date and time values
    Using the right time transformation
  Dealing with Missing Data
    Finding the missing data
    Encoding missingness
    Imputing missing data
  Slicing and Dicing: Filtering and Selecting Data
    Slicing rows
    Slicing columns
    Dicing
  Concatenating and Transforming
    Adding new cases and variables
    Removing data
    Sorting and shuffling
  Aggregating Data at Any Level

CHAPTER 8: Shaping Data
  Working with HTML Pages
    Parsing XML and HTML
    Using XPath for data extraction
  Working with Raw Text
    Dealing with Unicode
    Stemming and removing stop words
    Introducing regular expressions
  Using the Bag of Words Model and Beyond
    Understanding the bag of words model

    Working with n-grams
    Implementing TF-IDF transformations
  Working with Graph Data
    Understanding the adjacency matrix
    Using NetworkX basics

CHAPTER 9: Putting What You Know in Action
  Contextualizing Problems and Data
    Evaluating a data science problem
    Researching solutions
    Formulating a hypothesis
    Preparing your data
  Considering the Art of Feature Creation
    Defining feature creation
    Combining variables
    Understanding binning and discretization
    Using indicator variables
    Transforming distributions
  Performing Operations on Arrays
    Using vectorization
    Performing simple arithmetic on vectors and matrices
    Performing matrix vector multiplication
    Performing matrix multiplication

PART 3: VISUALIZING INFORMATION
