CS 102: Working with Data
Tools and Techniques
Spring 2020
Course Staff
Instructor: Jennifer Widom
Course Assistants: Leo Mehr (head), Kyle D’Souza, Tara Iyer, Aamir Rasheed
Zoom Lecture Protocol
All students on mute; video on preferred
1) For general questions: use the “Everyone” chat; Prof. Widom will keep an eye on it
2) For private questions: chat with one of the four TAs, preferably Kyle or Aamir
3) For Prof. Widom’s questions to the class: use the “raise hand” feature; you will be called on and unmuted
What’s This Course About?
“Aimed at non-CS undergraduate and graduate students who want to learn a variety of tools and techniques for working with data. Many of the world's biggest discoveries and decisions in science, technology, business, medicine, politics, and society as a whole, are now being made on the basis of analyzing data sets. This course provides a broad and practical introduction to working with data: data analysis techniques including databases, data mining, machine learning, and data visualization; data analysis tools including spreadsheets, Tableau, relational databases and SQL, Python, and R; introduction to network analysis and unstructured data. Tools and techniques are hands-on but at a cursory level, providing a basis for future exploration and application. Prerequisites: comfort with basic logic and mathematical concepts, along with high school AP computer science, CS106A, or other equivalent programming experience.”
Who Should Take It?
Non-CS undergraduate and graduate students who want to learn a variety of tools and techniques for working with data.
Prerequisites: comfort with basic logic and mathematical concepts, along with high school AP computer science, CS106A, or other equivalent programming experience.
Who Shouldn’t Take It?
Computer Science or MCS students (except by petition)
Who’s Taking It – Spring 2020
Undergraduate, Masters, MBA, MD, PhD
All seven of Stanford’s schools, 42 different majors:
American Studies; Asian American Studies; Biology; Business Administration; Chemistry; Civil Engineering; Civil & Environmental Engineering; Comparative Studies in Race & Ethnicity; Comparative Literature; Computer Science; Earth System Science; Earth Systems; East Asian Studies; Economics; Education; Electrical Engineering; Engineering; Energy Resources Engineering; English; Environment and Resources; Environmental Systems Engineering; Feminist, Gender, & Sexuality Studies; Geological Sciences; History; Human Biology; Individually Designed Major; International Relations; Law; Linguistics; Management; Management Science & Engineering; Materials Science & Engineering; Math & Computational Science; Mechanical Engineering; Medicine; Philosophy; Political Science; Public Policy; Science, Technology, & Society; Sociology; Theater and Performance Studies; Undeclared
Ordering of Course Topics
- Data Analysis & Visualization Using Spreadsheets
- Advanced Data Visualization Using Tableau
- Relational Databases and SQL
- Python for Data Analysis & Visualization
- Machine Learning – Regression, Classification, Clustering
- Using Python for Machine Learning
- The R Language
- Data Mining Algorithms
- Data Mining Using Python (and SQL)
- Network Analysis
- Unstructured Data
- Correlation and Causation
Assigned Work
Assignment/Project                                              | Assigned | Due
Assignment #1: Spreadsheets for Data Analysis and Visualization | April 13 | April 20
Project #1: Personal Data Analysis                              | April 13 | April 27, May 18
Assignment #2: Data Visualization Using Tableau, SQL            | April 20 | April 30
Assignment #3: Python for Data Analysis and Visualization       | April 30 | May 9
Assignment #4: Machine Learning, R Language                     | May 18   | May 25
Project #2: Movie-Rating Predictions                            | May 18   | June 1
Assignment #5: Data Mining, Network Analysis                    | May 28   | June 5
Exams
Exam    | Date
Exam #1 | May 12, during class time*
Exam #2 | June 9, during class time*
*Alternate times (but not dates) available by petition
Honor Code
Under the Honor Code at Stanford, you are expected to submit your own original work for assignments, projects, and exams. On many occasions when working on assignments or projects (but never exams!) it is useful to ask others (the instructor, the TAs, or other students) for hints, or to talk generally about aspects of the assignment. Such activity is both acceptable and encouraged, but you must indicate on all submitted work any assistance that you received. Any assistance received that is not given proper citation will be considered a violation of the Honor Code. In any event, you are responsible for understanding, writing up, and being able to explain all work that you submit. The course staff will aggressively pursue all suspected cases of Honor Code violations, and they will be handled through official University channels.
Logistics
- Units: 4 for undergraduates, 3-4 for graduates
- WAYS requirement: Applied Quantitative Reasoning (WAY-AQR)
- Textbook? No. Readings? Recommended
- Class “attendance”: expected
  - Hands-on activities
  - Only cursory notes
  - All class material is fair game for exams
Logistics
- Grading: letter grades calculated; C- or above for S, otherwise NC
- Grade weighting: 1/3 each for assignments, projects, and exams
- Graded on a curve? Not really
- Late policy: 10%/30% penalty for up to 24/48 hours late; four free late days
Office Hours
TA office hours: via Zoom, 15 hours per week; times can vary. Always check the course calendar for times and links!
Prof. Widom office hours: via Zoom, Wednesdays 4:00-5:00 PM (usually)
Online
- Website: http://cs102.stanford.edu
- Canvas: Zoom lectures and help sessions; recordings posted afterward
- Piazza: announcements, Q&A (private and public), discussion
- Gradescope: assignment submission & grading
For Thursday’s Class
1) Get set up on Google Drive if you’re not already
2) Download the Europe city temperatures data from the course website (three files)
3) Copy the data files into Google Drive and make sure you can open them with Google Sheets
4) Be prepared to work on your computer alongside the video
Set-up help session on Wednesday
Questions?