Big Data Is Not About The Data!

Transcription

Big Data is Not About the Data!

Gary King
Institute for Quantitative Social Science
Harvard University
(Talk at the Golden Seeds Innovation Summit, New York City, 1/30/2013)

Gary King (Harvard), Big Analytics, 1 / 10

The Data in Big Data

1. Unstructured text: emails, speeches, reports, social media updates, web pages, newspapers, scholarly literature, product reviews
2. Commerce: credit cards, sales, real estate transactions, RFIDs
3. Geographic location: cell phones, Fastlane, garage cameras
4. Health information: digital medical records, hospital admittances, accelerometers & other devices in cell phones
5. Biological sciences: genomics, proteomics, metabolomics, brain imaging producing huge numbers of person-level variables
6. Satellite imagery: increasing in scope, resolution, and availability
7. Electoral activity: ballot images, precinct-level results, individual-level registration, primary participation, campaign contributions
8. Web surfing artifacts: clicks, searches, and advertising clickthroughs, multiplayer games, virtual worlds
9. 90% of all data ever created was created last year
10. Popular versions: MoneyBall, SuperCrunchers, The Numerati

The Value in Big Data: the Analytics

Data:
- becoming commoditized
- easy to come by; often a free byproduct of IT improvements
- Ignore it & your company will still have more every year
- With a bit of effort: huge data production increases

Where the Value is: the Analytics
- Output can be highly customized
- Moore’s law (doubling speed/power every 18 months) v. 1000x increase with one algorithm
- $2M computer v. 2 hours of algorithm design
- Low cost; little infrastructure; mostly human capital needed
- Innovative analytics: enormously better than off-the-shelf approaches
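The Moore's law comparison on this slide can be made concrete with a little arithmetic. The sketch below assumes the 18-month doubling period quoted on the slide; the helper name `months_to_reach_speedup` is hypothetical and only illustrates how long hardware doubling alone would take to match a 1000x algorithmic gain.

```python
import math


def months_to_reach_speedup(speedup: float, doubling_months: float = 18.0) -> float:
    """Months of steady hardware doubling needed to reach a given speedup factor."""
    # Each doubling multiplies speed by 2, so we need log2(speedup) doublings.
    doublings = math.log2(speedup)
    return doublings * doubling_months


# 1000x is the algorithmic gain mentioned on the slide.
months = months_to_reach_speedup(1000.0)
print(f"{months:.0f} months, about {months / 12:.1f} years")
```

Under these assumptions, a 1000x gain needs about ten doublings, which is roughly 15 years of hardware progress.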

Examples of what’s now possible

- Opinions of activists: A few thousand interviews -> billions of political opinions in social media posts (1B every 3 days)
- Exercise: A survey: “How many times did you exercise last week?” -> 500K people carrying cell phones with accelerometers
- Social contacts: A survey: “Please tell me your 5 best friends” -> continuous record of phone calls, emails, text messages, bluetooth, social media connections, address books
- Economic development in developing countries: Dubious or nonexistent governmental statistics -> satellite images of human-generated light at night, road networks, other infrastructure
- Expert-vs-analytics contests: Whenever enough information is quantified, a right answer exists, and good analytics are applied: analytics wins
- In each: without new analytics, the data are useless

How to Read a Billion Blog Posts & Classify Deaths w/o Physicians

Examples of Bad Analytics:
- Physicians’ “Verbal Autopsy” analysis
- Sentiment analysis via word counts

Different problems, Same Analytics Solution:
- Key to both methods: classifying (deaths, social media posts)
- Key to both goals: estimating %’s

Modern Data Analytics: New method for estimating %’s led to:
1. Worldwide cause-of-death estimates for
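To illustrate why estimating percentages differs from classifying individual items, here is a minimal sketch. It is not the method referred to on the slide; it uses a standard textbook correction for a binary classifier with known error rates. The function names, the error rates (`tpr`, `fpr`), and the counts are all assumptions made up for the example.

```python
def classify_and_count(predictions):
    """Naive estimate: the share of items the classifier labels positive."""
    return sum(predictions) / len(predictions)


def adjusted_count(observed_rate, tpr, fpr):
    """Correct the naive share using the classifier's known error rates.

    The observed positive rate mixes true and false positives:
        observed = tpr * p + fpr * (1 - p)
    Solving for the true proportion p gives the formula below.
    """
    if tpr == fpr:
        raise ValueError("classifier carries no information (tpr == fpr)")
    estimate = (observed_rate - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, estimate))  # keep the result a valid proportion


# Hypothetical corpus: 1000 posts, 100 truly positive (true share = 0.10).
# Assumed classifier: catches 80% of positives, mislabels 20% of negatives.
predictions = [1] * 80 + [0] * 20 + [1] * 180 + [0] * 720

naive = classify_and_count(predictions)
corrected = adjusted_count(naive, tpr=0.80, fpr=0.20)
print(f"naive share: {naive:.2f}, corrected share: {corrected:.2f}")
```

In this made-up case the classifier gets 80% of individual posts right, yet the naive share (0.26) is far from the true 0.10, while the corrected estimate recovers it. Good item-level classification does not by itself give good percentage estimates.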

The Solvency of Social Security

- Successful: single largest government program; lifted a whole generation out of poverty; extremely popular
- Solvency: depends on mortality forecasts: If retirees receive benefits longer than expected, the Trust Fund runs out
- SSA data: little change other than updates for 75 years
- SSA analytics:
  - Few statistical improvements for 75 years
  - Ignore risk factors (smoking, obesity)
  - Mostly informal (subject to error & political influence)
- Forecasts: inaccurate, inconsistent, overly optimistic
- New customized analytics we developed:
  - Logical consistency (e.g., older people have higher mortality)
  - More accurate forecasts
  - Trust fund needs $1 trillion more than SSA thought
  - Other applications to insurance industry, public health, etc.
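As an illustration of what a logical-consistency constraint might look like, the sketch below forces a set of forecast mortality rates to be nondecreasing with age. It uses the generic pool-adjacent-violators procedure, which is an assumption chosen for the example and not the method developed by the speaker; the function name and the rates are hypothetical.

```python
def enforce_nondecreasing(values):
    """Return the least-squares closest nondecreasing sequence.

    Pool-adjacent-violators: whenever a value is lower than the block
    before it, merge the two and replace both with their average.
    """
    blocks = []  # each block holds [sum of values, number of values]
    for value in values:
        blocks.append([value, 1])
        # Merge backwards while the previous block's mean exceeds this one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    result = []
    for total, count in blocks:
        result.extend([total / count] * count)
    return result


# Hypothetical raw forecasts for four consecutive age groups; the third
# group is (implausibly) forecast to have lower mortality than the second.
raw_rates = [0.010, 0.015, 0.013, 0.030]
consistent_rates = enforce_nondecreasing(raw_rates)
print(consistent_rates)
```

The two inconsistent age groups are pooled to their average, so the adjusted forecasts never show an older group with lower mortality than a younger one.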

Reading and Writing Technology

Writing Technology: Big changes
- Then: Quill tip pen & expensive paper
- Now: Microsoft Word, Google docs, etc.

Reading Technology: Little change (ripe for disruption)
- Then: 50, 100, 300 years ago: Get book; read cover to cover
- Now:
  - How often do you read a book cover-to-cover for work?
  - We collect 100s of documents, read a few, delude ourselves into thinking we understand them all

Goal: understanding from unstructured data (hardest part of big data)
More data isn’t helpful! Novel analytics needed.

Computer-Assisted Reading (Consilience)

- To understand many documents, humans create categories to represent conceptualization, insight, etc.
- Most firms: impose fixed categorizations to tally customer complaints, sort reports, retrieve information
- Bad Analytics:
  - Unassisted Human Categorization: time consuming; huge efforts trying not to innovate!
  - Fully Automated “Cluster Analysis”: Many widely available, but none work (computers don’t know what you want!)
