Master Degree in Data Science

Master Degree in Data Science, University of Milano-Bicocca

Departments involved
  Dipartimento di Informatica, Sistemistica e Comunicazione (DISCo)
  Dipartimento di Economia, Metodi quantitativi e Strategie di impresa
  Dipartimento di Statistica e Metodi quantitativi

The three stakeholders, students, companies, teachers: how to boost cooperation among them?
  Courses and Labs
  Early Bird Initiative
  Kaggle

Students

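Statistics on enrolled students - 1
By cultural area (chart)
  Category labels as transcribed (some truncated): "Economia e ca", Mathematics, "Scienze Com", Philosophy
  Values as transcribed (mapping to categories not recoverable): 0,9  0,9  0,9  0,9  5,5  7,3  38,5  16,5  19,3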

Statistics on enrolled students - 2
By geographical area of origin (chart)
  Categories: Bicocca, other Milan, Lombardy, other North, Centre, South
  Values as transcribed (mapping to categories not recoverable): 1,1  9,9  7,7  14,3  55,4  5,5  25,3

The three stakeholders, students, companies, teachers: how to boost cooperation among them?
  Courses and Labs

From S. Ceri, EDBT Venice, March 2017
How big is the genome?
  As a string: 700 MByte
  As raw data: 200 GByte
  As called mutations: 125 MByte
How many genomes will be sequenced in 5 years?
  Estimates: order of 5-20 million
Very big data problem
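As a rough back-of-the-envelope check of the "very big data" claim, the sketch below multiplies the per-genome sizes quoted on the slide by the estimated 5 to 20 million genomes. Decimal units (1 PB = 10^9 MB) and the helper name `total_petabytes` are assumptions made for illustration, not part of the original slide.

```python
# Per-genome sizes quoted on the slide, in megabytes (decimal units assumed).
GENOME_SIZE_MB = {
    "string": 700,            # as a string: 700 MB
    "raw data": 200_000,      # as raw data: 200 GB
    "called mutations": 125,  # as called mutations: 125 MB
}

MB_PER_PB = 10**9  # 1 petabyte = 10^9 megabytes (decimal convention)


def total_petabytes(size_mb: int, genomes: int) -> float:
    """Total storage in petabytes for `genomes` genomes of `size_mb` MB each."""
    return size_mb * genomes / MB_PER_PB


if __name__ == "__main__":
    # Lower and upper bound of the slide's estimate: 5 and 20 million genomes.
    for label, size_mb in GENOME_SIZE_MB.items():
        low = total_petabytes(size_mb, 5_000_000)
        high = total_petabytes(size_mb, 20_000_000)
        print(f"{label}: {low:g} to {high:g} PB")
```

Under these assumptions the raw data alone would be on the order of 1,000 to 4,000 PB, that is, 1 to 4 exabytes.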

From small data to big data
  (figure; axis labels: Broadness of observed reality, Time, Depth in knowledge of observed reality)


Courses
  (diagram split into First year and Second year; elective groups are marked "1 among 2", "1 among 3" and "1 among 4")
  Courses
    CYB – Cybersecurity for data science
    IP – Signal and image processing
    DM&DV – Data management and visualization
    MLDM – Machine Learning & Decision Models
    JSI – Juridical & Social Issues in Information Society
    STDA – Statistical modelling
    TIDS – Technological infrastructures for data science
    FC – Foundam. in Comp. Sc.
    DS – Data Semantics
    IS – Information Systems
    FS – Found. in Stat. & P.
    WM&CM – Web marketing & Communication Management
    TMS – Text mining and search
    ASM – Advanced statistical methodologies for Big Data
    SDM – Streaming data management and time series analysis
    EDS – Economics for Data Science
    SMA – Social Media Analytics
    SS – Service Science
    BI – Business Intelligence
    EW – Expert Week
  Labs
    DSL – Data Science Lab
    IL – Industry Lab
    Data Science Lab in Environment & Physics
      BDGIS – Big Data in Geographical Information Systems
      BDPhis – Big data management and analysis in physics research
    Data Science Lab in Biosciences
      BDB&B – Big data in biotechnology & biosciences
      MSBD – Making sense of biological data
    Data Science Lab in Medicine
      BDM1 – Big Data in Health Care
      BDM2 – Medical imaging & big data
    Data Science Lab in Business & Marketing
      BDBF – Big data in Business and Finance
      BDBP – Big data in Behavioural Psychology
    Data Science Lab in Public Policies & Services
      BDPHe – Big Data in Public Health
      BDPS – Big Data in Public and Social Services

Courses by track
  Legend: Common courses, Analytical track, Business track
  (the course diagram of the Courses slide is repeated with this legend)

Scientific areas
  Legend: Computer Science, Statistics, Socio-economic, Mixed
  (the course diagram of the Courses slide is repeated with this legend)

DSc – a dynamic, evolving science
  Legend: Statistics, Computer Science, Socio-economic, Mixed
  (the course diagram of the Courses slide is repeated with this legend)

Four V's of Big Data
  Volume
  Velocity
  Variety
  Value

Change of Paradigm in Data Management Systems
  (figure; axes: Volume, from Small Data to Big Data, and Velocity, from long-term changing data to streaming data)
  Small Data: SQL, traditional DBMS
  Big Data: NoSQL, Hadoop MapReduce (plus: distributed file system)
  Streaming data: Spark (plus: in-memory processing)
  Hadoop & Spark
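To make the MapReduce model named on this slide more concrete, here is a minimal single-process sketch of its three steps (map, shuffle, reduce) applied to word counting. It uses only the Python standard library and does not call Hadoop or Spark; the function names are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return reduce_phase(shuffle_phase(pairs))
```

On a real cluster the map and reduce tasks would run in parallel on different nodes, reading and writing a distributed file system; this sketch only shows the data flow.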

The four Vs:
1. VOLume
2. VELocity
3. VARiety
4. VALue
  (the course diagram of the Courses slide is repeated, with courses tagged VOL, VEL, VAR or VAL)

Data types
  Tables
    Relational (keys, referential integrity, etc.)
    Weak semantics (e.g. CSV)
  Texts
    Loosely structured
    Semistructured (e.g. XML)
  Signals (from the Internet of Things)
  Images (e.g. X-ray, security, etc.)
  Graphs
    Mathematical - syntactic
    Knowledge - semantic
  Open data & Linked Open Data
  Maps & Remote sensing & Georeferenced data
  Mixed (Web data)
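A small sketch may help to contrast three of the data types listed on this slide: a table with weak semantics (CSV), a semistructured text (XML) and a knowledge graph stored as triples. All names and values are invented for illustration, and only the Python standard library is used.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Table with weak semantics: a CSV file carries no keys or types, only text.
CSV_TEXT = "student,area\nAnna,Statistics\nLuca,Informatics\n"
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Semistructured text: XML nests fields, and an element may be missing.
XML_TEXT = (
    "<students>"
    "<student name='Anna'><area>Statistics</area></student>"
    "<student name='Luca'/>"
    "</students>"
)
root = ET.fromstring(XML_TEXT)
# findtext returns None when a student has no <area> child.
areas = {s.get("name"): s.findtext("area") for s in root.findall("student")}

# Knowledge graph: (subject, predicate, object) triples, the shape used by
# Linked Open Data. The identifiers here are hypothetical.
triples = [
    ("Anna", "enrolledIn", "DataScience"),
    ("DataScience", "offeredBy", "ExampleUniversity"),
]
objects_of_anna = [o for s, p, o in triples if s == "Anna"]
```

The CSV rows are flat dictionaries of strings, the XML view tolerates a missing field, and the triples can be queried by subject, predicate or object.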

Main Data Types
  Legend: Tables & Series; Signals and images; Knowledge graphs; Loosely structured & semistructured texts; Maps & georeferenced data; Not relevant
  (the course diagram of the Courses slide is repeated with this legend)

Traditional analysis life cycle vs new analysis life cycle of digital (big) data
  Figure labels: Big Data life cycle, Traditional life cycle
  Phases legible in the transcript: [...]tion, Analysis, Diffusion
  Cross-cutting activities: Semantics, Quality, Learning, Value

Phases of the life cycle and main feedbacks
1. Access
2. Management
3. Visualization
4. Analysis
5. Diffusion

Phases of the life cycle - detail
1. Access & Acquisition
     Search
     Selection
     Acquisition
2. Management
     Filtering
     Quality assessment
     Semantic interpretation & enrichment
     Matching & integration
3. Visualization
4. Analysis
     Descriptive analysis: what happened or what is happening
     Diagnostic analysis: why it happened or why it is happening
     Predictive analysis: what will happen
     Prescriptive analysis: what to do to achieve the goal
5. Diffusion
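The four kinds of analysis in phase 4 can be illustrated on a toy data set. The sketch below is a simplification under stated assumptions: the numbers are invented, and a straight-line least-squares fit stands in for the diagnostic step, which shows an association rather than a proven cause.

```python
from statistics import mean

# Invented toy data: advertising spend and sales for five months.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

# Descriptive: what happened? Summarise the observed sales.
average_sales = mean(sales)

# Diagnostic: why did it happen? Fit sales = intercept + slope * ad_spend
# by ordinary least squares to see how sales move with spend.
x_bar, y_bar = mean(ad_spend), mean(sales)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(ad_spend, sales)) / sum(
    (x - x_bar) ** 2 for x in ad_spend
)
intercept = y_bar - slope * x_bar


# Predictive: what will happen at a spend level not yet observed?
def predict_sales(spend: float) -> float:
    return intercept + slope * spend


# Prescriptive: what to do to achieve the goal? Invert the fitted line
# to find the spend that reaches a target level of sales.
def spend_needed(target_sales: float) -> float:
    return (target_sales - intercept) / slope
```

Each step reuses the result of the previous one, which mirrors the ordering of the four analysis types on the slide.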

Main Phases of the Life Cycle
  Legend: Access & Acquisition, Management, Visualization, Analysis, Diffusion & Usage, All
  (the course diagram of the Courses slide is repeated with this legend; here DM&DV is split into "Data management" and "Data visualization", and SDM into "Streaming data management" and "Time series analysis")

Main Platforms and [...]
  Platforms named on the slide: Python, R, SQL, SAS, Hadoop, Spark, Kaggle, BPMN, RDF & SPARQL
  (the course diagram of the Courses slide is repeated, with courses and labs tagged by platform)

The three stakeholders, students, companies, teachers: how to boost cooperation among them?
  Kaggle

Kaggle: a platform managing data challenges
It allows students to:
  Participate in dataset-specific competitions organized by companies
  Develop data science skills through practical experience on datasets provided by companies
  Get academic credits
  Learn about job offers
Prof. Stella will provide further detail soon

Students Portfolio
  ce-portfolio-newcomers-guide-data-scientist

The three stakeholders, students, companies, teachers: how to boost cooperation among them?
  Early Bird Initiative

Early Bird Initiative
Opportunities of collaboration for companies
  Training activities
    1. Testimonials and case studies
    2. Teaching in the first year «Data Science Lab» and in the second year «Industry Lab»
    3. Hackathons
    4. Certifications
  Internships
  Final thesis

Other types of contributions from companies
To students
  Scholarships
  Grants for
    1. Internships in Italian companies
    2. Internships in European universities or companies (Erasmus programs)
    3. Internships in non-European universities or companies (Extra programs)
  Degree awards
Training services
  Access to big data infrastructures
Communication and marketing
  Endorsement
  Donations (with tax benefit)

Erasmus and Double Degrees
  Strong effort to establish Erasmus agreements and Double Degrees
  Prof. Pasi will provide further detail soon

Start-ups
  All students should consider the opportunity to create a startup
  This is one of the topics of the Expert Week

Want to know more?
  Access http://datascience.disco.unimib.it/

Timetable
  See www.disco.unimib.it
