Introduction To Data Science

Introduction to Data Science
GIRI NARASIMHAN, SCIS, FIU

Momentos Survey
- Survey Consent: https://users.cs.fiu.edu/ rse Code: 295MFN
- Survey link: tinyurl.com/premomentospre
- Personal Code: XXXX
Giri Narasimhan, 6/26/18

Case History
- MovieLens1M.ipynb

NumPy: numerical computing packages
- Fast and efficient multidimensional array object, ndarray
- Functions for element-wise array computations and array operations
- Tools for reading and writing array-based data sets to disk
- Linear algebra operations, Fourier transform, and random number generation
- Tools for integrating C, C++, and Fortran code with Python
- NumPy arrays are a more efficient way of storing and manipulating data and are better for passing data between algorithms. Libraries in C or Fortran can operate on NumPy arrays without copying any data.
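A minimal sketch of a few of the NumPy features listed above (element-wise computation, linear algebra, random number generation, and sharing memory instead of copying). The array values and variable names are made up for illustration and do not come from the slides.

```python
import numpy as np

# Hypothetical sample data: a 2 x 3 array of floats.
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# Element-wise computation: no explicit Python loop is needed.
doubled = data * 2
col_sums = data.sum(axis=0)

# Linear algebra: multiply the array by its transpose (2x3 @ 3x2 -> 2x2).
gram = data @ data.T

# Random number generation with a fixed seed for repeatability.
rng = np.random.default_rng(0)
samples = rng.normal(size=5)

# Basic slicing returns a view on the same memory, not a copy.
first_row = data[0]
shares = np.shares_memory(first_row, data)

print(doubled)
print(col_sums)
print(gram)
print(shares)
```

Because `first_row` is a view, writing to it would also change `data`; call `.copy()` when an independent array is needed.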

Pandas: package for structured data
- DataFrame: more general than R’s data.frame
- Combines NumPy arrays with manipulations similar to spreadsheets and relational databases
- Sophisticated indexing facilities
- Reshape, slice and dice, aggregations, subselections, etc.
- Time series processing functionality
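The operations named above can be sketched on a tiny table. This is only an illustration: the `ratings` table, its column names, and its values are hypothetical and are not taken from the MovieLens notebook.

```python
import pandas as pd

# Hypothetical ratings table; names and values are made up.
ratings = pd.DataFrame({
    "user": ["ann", "ann", "bob", "bob", "cat"],
    "movie": ["A", "B", "A", "C", "B"],
    "rating": [5, 3, 4, 2, 4],
})

# Subselection with a boolean mask, much like a spreadsheet filter.
high = ratings[ratings["rating"] >= 4]

# Aggregation: mean rating per movie, similar to SQL GROUP BY.
mean_by_movie = ratings.groupby("movie")["rating"].mean()

# Reshape: users as rows, movies as columns; missing pairs become NaN.
wide = ratings.pivot(index="user", columns="movie", values="rating")

# Time series: a date-indexed Series resampled into 2-day sums.
dates = pd.date_range("2018-06-26", periods=4, freq="D")
ts = pd.Series([1, 2, 3, 4], index=dates)
two_day = ts.resample("2D").sum()

print(high)
print(mean_by_movie)
print(wide)
print(two_day)
```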

pandas DataFrames

Index objects

More on Index

SciPy: scientific computing packages
- scipy.integrate: numerical integration routines and differential equation solvers
- scipy.linalg: linear algebra, matrix decompositions extending beyond numpy.linalg
- scipy.optimize: function optimizers (minimizers) and root finding algorithms
- scipy.signal: signal processing tools
- scipy.sparse: sparse matrices and sparse linear system solvers
- scipy.special: wrapper around SPECFUN, a Fortran library implementing many common mathematical functions, such as the gamma function
- scipy.stats: standard continuous and discrete probability distributions (density functions, samplers, continuous distribution functions), various statistical tests, and more descriptive statistics
- scipy.weave: tool for using inline C code to accelerate array computations (removed from SciPy in version 0.15; now distributed as the separate weave package)
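A short sketch touching four of the subpackages listed above. The functions integrated, solved, and minimized here are arbitrary examples chosen because their answers are easy to check by hand; they are not from the slides.

```python
import numpy as np
from scipy import integrate, linalg, optimize, stats

# scipy.integrate: integrate x**2 from 0 to 3 (the exact answer is 9).
area, abs_err = integrate.quad(lambda x: x ** 2, 0.0, 3.0)

# scipy.linalg: solve the 2 x 2 linear system A x = b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)

# scipy.optimize: minimize (t - 2)**2, whose minimum is at t = 2.
result = optimize.minimize_scalar(lambda t: (t - 2.0) ** 2)

# scipy.stats: cumulative probability of the standard normal at 0.
p = stats.norm.cdf(0.0)

print(area, x, result.x, p)
```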

matplotlib: for visualization
- Matplotlib: Python library for publication-quality visualizations
- Creator: John D. Hunter; now maintained by a team of developers
- Can be used in notebooks with interactive features: zoom in on a section of the plot and pan around using the toolbar in the plot window.
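A minimal plotting sketch. It assumes a non-interactive setting, so it selects the Agg backend and writes the figure into an in-memory buffer; in a notebook one would typically drop the `matplotlib.use` line and let the figure display inline. The sine curve is an arbitrary example.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: no display available; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Sample the curve at 100 evenly spaced points.
x = np.linspace(0.0, 2.0 * np.pi, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.set_title("Sine curve")
ax.legend()

# Render the figure as PNG bytes instead of opening a window.
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=150)
png_bytes = buf.getvalue()
```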

Two kinds of data structures
- Structured
  - Lists: Arrays, Tables and Spreadsheets
  - Strings
  - Matrices: Images
  - Dictionaries: for Associations (Key, Value) Pairs
  - Time Series & Trajectories
  - Audio, Video
- Unstructured, e.g., text
- Maps: (functions, data) pair
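One way to picture several of these categories with plain Python built-ins. This is only an illustrative sketch: every value below is made up, and the choice of built-in type for each category is an assumption rather than something the slide specifies.

```python
# Hypothetical examples of structured data.
scores = [88, 92, 79]                    # list / array
name = "MovieLens"                       # string
image = [[0, 255], [255, 0]]             # matrix: a 2 x 2 grayscale image
ratings = {"ann": 5, "bob": 4}           # dictionary: (key, value) pairs
series = [("2018-06-26", 1.5),           # time series: (timestamp, value)
          ("2018-06-27", 1.7)]

# Unstructured data: free text with no fixed schema.
review = "A surprisingly good movie, though a little long."
word_count = len(review.split())

print(scores, name, image, ratings, series, word_count)
```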
