CS 102: Working With Data


CS 102: Working with Data: Tools and Techniques
Spring 2020

Course Staff
Instructor: Jennifer Widom
Course Assistants: Leo Mehr (head), Kyle D’Souza, Tara Iyer, Aamir Rasheed

Zoom Lecture Protocol
All students on mute; video on preferred
1) For general questions: use the “Everyone” chat; Prof. Widom will keep an eye on it
2) For private questions: chat to one of the four TAs, preferably Kyle or Aamir
3) For Prof. Widom’s questions to the class: use the “raise hand” feature; you will be called on and unmuted

What’s This Course About?
“Aimed at non-CS undergraduate and graduate students who want to learn a variety of tools and techniques for working with data. Many of the world's biggest discoveries and decisions in science, technology, business, medicine, politics, and society as a whole, are now being made on the basis of analyzing data sets. This course provides a broad and practical introduction to working with data: data analysis techniques including databases, data mining, machine learning, and data visualization; data analysis tools including spreadsheets, Tableau, relational databases and SQL, Python, and R; introduction to network analysis and unstructured data. Tools and techniques are hands-on but at a cursory level, providing a basis for future exploration and application. Prerequisites: comfort with basic logic and mathematical concepts, along with high school AP computer science, CS106A, or other equivalent programming experience.”



Who Should Take It?
“Aimed at non-CS undergraduate and graduate students who want to learn a variety of tools and techniques for working with data.”
“Prerequisites: comfort with basic logic and mathematical concepts, along with high school AP computer science, CS106A, or other equivalent programming experience.”



Who Shouldn’t Take It?
Computer Science or MCS students (except by petition)

Who’s Taking It: Spring 2020
Undergraduate, Masters, MBA, MD, PhD
All seven of Stanford’s schools, 42 different majors:
American Studies; Asian American Studies; Biology; Business Administration; Chemistry; Civil Engineering; Civil & Environmental Engineering; Comparative Studies in Race & Ethnicity; Comparative Literature; Computer Science; Earth System Science; Earth Systems; East Asian Studies; Economics; Education; Electrical Engineering; Engineering; Energy Resources Engineering; English; Environment and Resources; Environmental Systems Engineering; Feminist, Gender, & Sexuality Studies; Geological Sciences; History; Human Biology; Individually Designed Major; International Relations; Law; Linguistics; Management; Management Science & Engineering; Materials Science & Engineering; Math & Computational Science; Mechanical Engineering; Medicine; Philosophy; Political Science; Public Policy; Science, Technology, & Society; Sociology; Theater and Performance Studies; Undeclared

Who’s Taking It [four slides of charts]

Ordering of Course Topics
§ Data Analysis & Visualization Using Spreadsheets
§ Advanced Data Visualization Using Tableau
§ Relational Databases and SQL
§ Python for Data Analysis & Visualization
§ Machine Learning: Regression, Classification, Clustering
§ Using Python for Machine Learning
§ The R Language
§ Data Mining Algorithms
§ Data Mining Using Python (and SQL)
§ Network Analysis
§ Unstructured Data
§ Correlation and Causation

Assigned Work
Assignment #1: Spreadsheets for Data Analysis and Visualization (assigned April 13, due April 20)
Project #1: Personal Data Analysis (assigned April 13, due April 27 and May 18)
Assignment #2: Data Visualization Using Tableau, SQL (assigned April 20, due April 30)
Assignment #3: Python for Data Analysis and Visualization (assigned April 30, due May 9)
Assignment #4: Machine Learning, R Language (assigned May 18, due May 25)
Project #2: Movie-Rating Predictions (assigned May 18, due June 1)
Assignment #5: Data Mining, Network Analysis (assigned May 28, due June 5)

Exams
Exam #1: May 12, during class time*
Exam #2: June 9, during class time*
*Alternate times (but not dates) available by petition

Honor Code
Under the Honor Code at Stanford, you are expected to submit your own original work for assignments, projects, and exams. On many occasions when working on assignments or projects (but never exams!) it is useful to ask others (the instructor, the TAs, or other students) for hints, or to talk generally about aspects of the assignment. Such activity is both acceptable and encouraged, but you must indicate on all submitted work any assistance that you received. Any assistance received that is not given proper citation will be considered a violation of the Honor Code. In any event, you are responsible for understanding, writing up, and being able to explain all work that you submit. The course staff will aggressively pursue all suspected cases of Honor Code violations, and they will be handled through official University channels.

Logistics
§ Units: 4 for undergraduates, 3-4 for graduates
§ WAYS requirement: Applied Quantitative Reasoning (WAY-AQR)
§ Textbook? No. Readings? Recommended.
§ Class “attendance”: Expected
  - Hands-on activities
  - Only cursory notes
  - All class material fair game for exams

Logistics
§ Grading: Letter grades calculated; C- or above for S (Satisfactory), otherwise NC (No Credit)
§ Grade weighting: 1/3 each for assignments, projects, and exams
§ Graded on a curve? Not really
§ Late policy: 10% deduction for up to 24 hours late, 30% for up to 48 hours late; four free late days

Office Hours
TA office hours: via Zoom, 15 hours per week; times can vary. Always check the course calendar for times and links!
Prof. Widom office hours: via Zoom, Wednesdays 4:00-5:00 PM (usually)

Online
Website: http://cs102.stanford.edu
Canvas: Zoom lectures and help sessions, recordings posted afterward
Piazza: announcements, Q&A (private and public), discussion
Gradescope: assignment submission & grading

For Thursday’s Class
1) Get set up on Google Drive if you’re not already
2) Download the Europe city temperatures data from the course website (three files)
3) Copy the data files into Google Drive and make sure you can open them with Google Sheets
4) Be prepared to work on your computer alongside the video
Set-up help session on Wednesday

CS 102: Working with Data: Tools and Techniques
Questions?
