Table Of Contents - PythonAnywhere

PYTHON FOR DATA ANALYSIS USING NUMPY & PANDAS
Notes by Michael Brothers
http://titaniumventures.net/library/
REV 0121

Table of Contents

What's What
    Vocabulary
    Jupyter Notebook Tips & Tricks

NUMPY
    Documentation
    Creating Arrays
    Array Data Types
    Built-in Array Construction Methods
    Random Number Arrays
    2D Random Number Arrays
    Array Attributes and Methods
    Array Arithmetic
    Array Arithmetic with Scalars
    Broadcasting
    Summary Statistics on Arrays
    Axis Logic
    Universal Array Functions (ufuncs)
    Binary Functions (require two arrays)
    Array Slices
    Slicing a 2D Array
    Fancy Indexing
    Conditional Selection
    Random Choice Arrays
    Insert an element into an array
    Append an element to an array
    Reassign values with broadcasting
    Array Transposition
    Array Dot Products
    Using numpy.where
    Any and All for processing Boolean arrays
    Sorting arrays

PANDAS
    Documentation

WORKING WITH SERIES
    Creating a Series
    Creating a Series with axis labels
    Creating a Series from a dictionary
    Converting a Series to a Python dictionary
    Passing an index with the dictionary can reload a Series in a new order
    Adding two Series together
    Naming Series Indexes
    Selecting, Changing Series Entries
    Checking for Unique Values and their Counts
    Removing Rows
    Removing Rows Permanently
    Rank and Sort

        Sort by Index using .sort_index
        Sort by Value using .sort_values
        Rank

WORKING WITH DATAFRAMES
    Creating a DataFrame
        from a numpy array
        from a dictionary
        from a Series
        from a random set
    Get column and index labels

EXPLORATORY DATA ANALYSIS
    Display a specific number of rows
    Display a random collection of rows
    Selecting columns
    Creating a new column
    Removing a column
    Permanently removing a column
    Selecting rows
    Selecting subsets of rows and columns
    Selecting slices of rows and columns
    Conditional Selection
    Selections based on two conditions
    Selections based on categorical data
    Summary Statistics on DataFrames
    Grabbing a row based on min/max values
    Unique Values and Value Counts
    Identifying, Removing Duplicate Rows
    Filtering using Between
    Filtering by Largest & Smallest Values
    Transposing data
    Sorting by values along either axis

INDEXING
    Setting a named index
    Resetting an index

INDEX HIERARCHY
    Constructing a multilevel index from a list of tuples
    Constructing a multilevel index from the product of two collections
    Using a multilevel index when constructing a DataFrame
    Making selections on a multilevel DataFrame
    Examining index levels
    Renaming index levels
    Selecting a cross-section
    Using slicers
    Swapping index levels
    Sorting index levels
    Constructing a multilevel index directly by passing in a list of arrays
    Making selections on a multi-level Series

COLUMN HIERARCHY
    Adding column level names
    Swapping rows and columns
    Operations on column levels

    Avoid chained indexing

MISSING DATA
    Finding, Dropping missing data in a Series
    Dropping missing data in a DataFrame (Be Careful!)
    Filling in missing data points

APPLYING FUNCTIONS TO DATA
    Running aggregate methods on selected columns
        Applying methods to a single column
        Applying methods to multiple columns
    Running user-defined functions on selected columns
        using a single column
        using multiple columns
    Running multiple functions on selected columns
    Running user-defined aggregate functions on selected columns

GROUPBY ON DATAFRAMES
    Create a GroupBy object
    GroupBy methods (aggregate functions) return a DataFrame
    GroupBy sorting
    Running aggregate methods on selected columns
    Running multiple aggregate methods on selected columns
    Group by multiple column keys
    Group one column by multiple column keys
    Assign keys to a column and group by them instead
    Iterate over groups
    Iteration across multiple keys
    Create a dictionary from grouped data pieces
    Apply GroupBy using Dictionaries and Series

PIVOTING DATAFRAMES
    DataFrame.pivot
    DataFrame.pivot_table
    Cross Tabulation

STACKING

UNSTACKING
    Unstacking a MultiIndex DataFrame
    Unstacking a MultiIndex Series returns a DataFrame

COMBINING DATAFRAMES

APPEND

CONCATENATE
    In numpy, to concatenate two or more arrays
    In pandas, to concatenate two or more Series
    Concatenate two or more DataFrames – columns match
    Concatenate two or more DataFrames – indices match
    Add a hierarchical index using "keys"

MERGE
    Merging on multiple keys
    Merge key indicator
    Handle duplicate key names with suffixes

JOIN

HANDLING OVERLAPPING DATA
    The Long Way, using numpy's where method
    The Shortcut, using pandas' combine_first method

DATA INPUT/OUTPUT - READING & WRITING FILES
    Determine the current working directory in Jupyter
    Setting path names

CSV (Comma Separated Value) FILES

EXCEL FILES
    Writing to Excel
    Writing multiple sheets to the same Excel file

JSON (JavaScript Object Notation) FILES

HTML FILES
    To target a specific table on the page

BINARY FILES
    Saving an array to a binary (.npy) file
    Saving multiple arrays into a zip (.npz) file
    Loading multiple arrays

TEXT FILES

FROM THE CLIPBOARD
    Reading a DataFrame from a webpage using edit/copy

ADDITIONAL PANDAS OPERATIONS

REPLACE

MAP
    Adding a Series to an existing DataFrame

RENAME index and column labels
    Dictionary method
    Function method

REINDEX
    Interpolating values between indices
    Inserting rows by reindexing on a DataFrame
    Inserting columns by reindexing on a DataFrame

BINNING

Related content can be found in these companion files:
    Python Programming
    Python Data Visualizations
    Python Probability and Statistics
    Python Machine Learning

Using this guide

Code is written in a fix