Big Data - Deloitte

Transcription

Big Data
Challenges and Success Factors
Deloitte Analytics
Your data, inside out

Big Data refers to the set of problems (and the technologies subsequently developed to solve them) that are hard or expensive to solve in traditional relational databases
Big Data: common definition
- Handling 10 TB of data
- Very high throughput systems
- Data with a changing structure or with no structure at all
- Business requirements differ from the relational database model
- Massive processing
However, there is no single agreed definition, and each enterprise is at a different maturity level in its potential Big Data journey.
2013 Deloitte Touche Tohmatsu

Today, Big Data's position in the Hype Cycle is affected by an excess of marketing messages, coming both from genuine players and from illusionists
- Overused marketing term, with solutions brought in as a plug & play panacea
- Over-hyped, with few actual client references in the common business world
- Buzz concentrated on social media websites / search engines

Big Data is not just a marketing term: it is a reality with a solid history and an evolutionary path. Only its adoption in common business is not yet mature
- Big Data technologies trade the consistency and integrity of RDBMS technologies for flexibility
- Big Data technologies are complementary to RDBMS technologies, not a replacement for them
Timeline:
- Attempts to use multiple RDBMS through sharding
- 2004: Google releases paper describing MapReduce
- 2005: Hadoop, an open-source implementation of MapReduce, is created
- 2006: Google releases paper describing Big Table
- 2008: Yahoo announces that a 10,000-core Linux Hadoop cluster is powering search
- 2010: Facebook announces that its Hadoop cluster has 21 PB of storage. By July 27, 2011 the data had grown to 30 PB; by June 13, 2012 it had grown to 100 PB; as of November 8, 2012 the warehouse grows by roughly half a PB per day
- Multiple commercial distributions of Hadoop target the enterprise
- 2013: Increase in Big Data powered projects
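The MapReduce model named in the timeline above can be illustrated with a minimal single-process sketch. This is not Hadoop's API or Google's implementation: the function names `map_phase`, `shuffle` and `reduce_phase` are hypothetical, and the word-count task is simply a common teaching example.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    # In a real cluster each document could be mapped on a different machine;
    # here the map step is a plain loop (an assumption made for brevity).
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data big value"]))
```

Because each call to `map_phase` depends only on its own document, and each call to `reduce_phase` only on its own key, both steps can in principle be spread over many machines, which is what makes the model suitable for distributed clusters.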

The challenge starts with processing: Big Data solutions can be classified based on the expected service levels being addressed and the type of underlying data
Processing Solutions
- Traditional General Purpose Processing Systems
  - T: Traditional
  - I: In-Memory Appliances
- Distributed Computing Architectures
  - M: MPP (Massively Parallel Processing)
  - C: Distributed Cluster Systems
[Figure: Common Usage Scenarios for Processing Systems. Axes: Turnaround Time / Processing Velocity (Batch; Near Real Time - Real Time), Variety (Structured, Semistructured, Unstructured), Data volume, Costs. Systems placed in the grid: Traditional, MPP, Specialized MPP, In-Memory, Distributed Cluster, Specialized Systems (Hybrid Solutions).]

... and continues with analysis: from the analysis standpoint, Big Data can be approached with several applications specialized for specific needs
- Syntactic Analysis and text mining, with statistical models for keyword and key topic detection
- Semantic Analysis with a Natural Language Processing engine and ontologies, to map concepts in a specific context
- Visual representation of data to communicate information clearly and effectively through graphical means for an immediate ‘capture’
- Dynamic and easy-to-use reports for freely navigating across data with no predefined paths
- Search: tools with the ability to combine structured and unstructured data from disparate systems and automatically organize information for search, discovery, and analysis
- Traditional Reporting: static reporting for standardized access to institutional and predefined information

Big Data is often described by the 3 V’s: Velocity, Volume and Variety. Each V represents a hard problem for traditional databases
Velocity: Frequency of generation is too high to be managed traditionally
- 48 hours of video uploaded every minute to YouTube
- 2M queries on Google every minute
- 47K app downloads per minute via iTunes
Volume: The growth of world data is exponential
- 2.5 zettabytes of world data in 2010
- 8 zettabytes of world data in 2015
- 35 zettabytes of world data in 2020
Variety: Big Data can be structured and unstructured
- Web / Social Media
- Machine to Machine
- Big Transaction Data
- Biometric
- Human Generated
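The claim that world data grows exponentially can be made concrete with a short calculation. The sketch below assumes the slide's figures read as 2.5 zettabytes in 2010 and 35 zettabytes in 2020, and that growth compounds at a constant annual rate; the function name `implied_annual_growth` is hypothetical.

```python
def implied_annual_growth(start_value, end_value, years):
    """Compound annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Assumed readings of the slide: 2.5 ZB in 2010 and 35 ZB in 2020.
rate = implied_annual_growth(2.5, 35, 2020 - 2010)

if __name__ == "__main__":
    print(f"Implied annual growth: {rate:.1%}")
```

Under these assumptions the implied rate is roughly 30% per year.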

However, additional V’s are being proposed, to generate greater value: as the world of data grows, so does the challenge.
Veracity: Establishing trust in data
- One third of business leaders do not trust the information they use
- Uncertainty is due to inconsistency, ambiguity, latency and approximation
Viability: Relevance and Feasibility?
- Hypothesis validation to determine if the data will have a meaningful impact
- Long-term rewards and better outcomes from hidden relationships in data
Value: Measuring return on investments
- Costs: there is a serious risk of simply creating Big Costs without creating strong value
- Insights: sophisticated queries, counterintuitive insights and unique learning

Big Data can enhance the customer view by exploiting the potential of hidden meanings
More data sources, more insights
- The flexibility of Big Data technologies allows the usage of both
  - Internal and external data
  - Structured and unstructured data
- Big Data can provide a whole new set of information, in order to reach an all-encompassing and multi-level customer view
[Figure: customer data sources mapped on two axes, Internal / External and Structured / Unstructured.]
- Unstructured: Social media, Customer contact notes, Customer survey text, Contracts, Scanned documents, Web crawling, E-mail, Web Cust. Experience
- Internal: Log vey results, Customer contact logs, Name, Address details
- External: External (credit and risk) agency data, Marketing data, Socio-demographic, Price benchmark comparisons
- Interaction Data: Email, Chat transcription, Call Center notes, Web analytics, In person dialogues
- Attitudinal Data: Options, Preferences, Needs and Desires, Market Research, Social Media
- Behavioural Data: Orders, Transactions, Payment History, Usage History
- Descriptive Data: Attributes, Characteristics, Self Declared Info, Social Geo / Demographics info
- Structured: Transaction history

The power of Big Data extends well beyond a social-media-centric view: the following industries have already identified scenarios requiring Big Data solutions
Industries: TMT, Financial Services, Energy, Insurance, Retail, Manufacturing
Data sources:
- Sensor-generated Data
- Forums Data
- Digital Channels Data
- Social Channels Data
- Geomapping
- Market Survey
- Weather Forecasts
- Company Ecosystem Data
- Vehicle Traffic Data

For each industry it is possible to evaluate enhancements based on Refinement, Exploration and Enrichment of existing ...
Manufacturing, Insurance
- Log Analysis / Ad Optimization
- Cross Channel Analytics
- Loyalty Program Optimization
- Churn prediction
- Fraud scenarios identification
- Risk Modeling & Fraud Identification
- Trade Performance Analytics
- Production Optimization
- Consumption prediction
- Supply Chain Optimization
- Asset maintenance
- Multi-party Fraud Scenario Investigation
- Weather Impact Analysis
Explore, Enrich
- Social Networks Analysis
- Event Analytics
- Brand and Sentiment Analysis
- Dynamic Pricing / recommendation Engines
- Session / Content Optimization
- Fraud Detection
- Market needs
- Market Analysis
- Consumers behavior
- Targeting
- Surveillance and Fraud Detection
- Customer Risk Analysis
- Real-time upsell, cross sales marketing offers
- Grid Failure Prevention
- Smart Meters
- Individual Power Grid
- Customer Churn Analysis
- Dynamic Delivery
- Replacement parts
- Customer Risk Analysis
- Dynamic Insurance Plan (sensor enabled)
- Insurance Premium Determination

Big Data comes with lots of challenges: Big Data provides opportunities; however, there are challenges that need to be addressed and overcome (1/2)
Strategy: Determine a strategy for leveraging the benefits of Big Data
- Determine business drivers and whether Big Data can play a role in providing better insight
- Define criteria for evaluating return on investments
Talent: Identify and acquire the skill sets required to understand and leverage Big Data to add value
- Acquire Data Scientists with expertise in math, statistics, data engineering, pattern recognition, advanced computing, visualization and modeling
- Organize a team of business analysts with strong knowledge of the company ecosystem

Big Data comes with lots of challenges: Big Data provides opportunities; however, there are challenges that need to be addressed and overcome (2/2)
Scalability
- Flexibility of infrastructure to interact with extreme volume / variety of data formats
- Cost and effort associated with scalability
Integration
- Increasing data volume, variety, and complexity results in increased time and investments to remove barriers to compiling, managing and leveraging data across multiple platforms / systems
Deployment
- Identifying the best software and hardware solutions and determining the best overall infrastructure solution: internal, external, or a combination
- Transitioning from legacy systems to newer technology
Analytics
- Considerable time and money invested to create algorithms that scale to big data volume and variety and improve user experience
Data Quality
- Compromise of quality due to volume and variety of data
- Cost of maintaining all data quality dimensions: Completeness, Validity, Integrity, Consistency, Timeliness, and Accuracy
Governance
- Identifying relevant data protection requirements and developing an appropriate governance strategy
- Reevaluation of internal and external data policies and regulatory environment
Privacy
- Privacy issues related to direct and indirect use of big data sources
- Evolving security implications of big data
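Two of the data quality dimensions named above, Completeness and Validity, can be measured with very little code. The following is a minimal sketch under stated assumptions: the scoring definitions, the function names `completeness` and `validity`, and the sample customer records are all hypothetical and invented for illustration; the slides do not prescribe any particular metric.

```python
def completeness(records, required_fields):
    """Share of required field slots that are present and non-empty."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0
    filled = sum(
        1
        for record in records
        for field in required_fields
        if record.get(field) not in (None, "")
    )
    return filled / total


def validity(records, field, is_valid):
    """Share of non-missing values of `field` that pass the rule `is_valid`."""
    values = [r[field] for r in records if r.get(field) not in (None, "")]
    if not values:
        return 1.0
    return sum(1 for v in values if is_valid(v)) / len(values)


# Hypothetical customer records, invented for this example.
customers = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "", "age": 210},
    {"id": 3, "email": "c@example.com", "age": None},
]

completeness_score = completeness(customers, ["id", "email", "age"])
age_validity = validity(customers, "age", lambda age: 0 <= age <= 120)

if __name__ == "__main__":
    print(f"completeness={completeness_score:.2f}, age validity={age_validity:.2f}")
```

Tracking such scores per data source is one way to make the cost of maintaining quality visible as volume and variety grow; the remaining dimensions (Integrity, Consistency, Timeliness, Accuracy) generally need reference data or cross-source comparison and are not covered here.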

These challenges require a strong roadmap, which begins with decision makers and their crunchy questions, and proceeds to data sources and technologies
1 - Strategic plan: Identify strategic priorities
2 - Identify Opportunities: Brainstorm and ask crunchy questions
3 - Determine data sources. Assess:
  - Data and application landscape including archives
  - Analytics and BI capabilities including skills
  - New technology adoptions
  - IT strategy, priorities, policies, budget and investments
  - Current projects
  - Current data, analytics and BI problems
4 - Identify / Define Use Cases: Based on the assessments and business priorities, identify and prioritize big data use cases
5 - Pilots and Prototypes: Identify tools, technologies and processes for use cases and implement pilots and prototypes
6 - Adopt in Production: Prioritize and implement successful, high value initiatives in production

Every Big Data project starts with a short planning and scoping phase (step 1)
1. Evaluate current situation
2. Conduct Analysis
3. Formulate Strategy
4. Create Transformation plan
[Figure labels: Mission; Vision; Values. Situation Assessment. Key Issues. Analysis of internal sources. Analysis of external sources. Interviews. Workshops. Creativity and ideas. Think out of the box. Brainstorm sessions, Workshops and Analyses. Strategic Options. BI & rategic Big Data Plan. Implementation Plan Writing.]

... and goes on to identify strategic opportunities by asking crunchy questions about “sticky” business issues (step 2)
Customers and social media
- What’s the buzz about your company online, and how could it impact sales?
- What are analysts saying about your organization? What about customers and online influencers?
- Who are the next 1,000 customers you’ll lose, and why?
- Which trade promotion programs have the highest impact on profitability?
- What factors most influence customer loyalty? Why?
- How do factors such as politics and demographics affect the price your customers are willing to pay?
- Which factors have the most adverse effects on customer satisfaction?
Sustainability and supply chain
- Which facilities are using more energy than they should?
- Which suppliers are at risk of going out of business?
- What is the impact of shipping costs on pricing?
- Which locations offer the best options for setting up your next distribution center?
Employees and risk
- Which new-hire characteristics best reflect your organization’s risk intelligence profile? Which are most likely to steal from you?
- Why do high-potential employees leave your company? What would cause them to stay?

Bringing Big Data into the current Business Ecosystem leads to a multitude of difficult questions to be answered (1/2) (step 3)
- What data sources should be collected and how can they be acquired efficiently?
- Should retention be provided for those data?
- How intensively will those data be processed?
- How is data quality managed across so many sources of data, many of which come from outside the organization, such as public social networks?
- What structure can be derived from non-traditional data sources (documents, Web logs, video streams, etc.) to make storage, analysis, and ultimately decision-making easier?
- How can non-traditional unstructured data be integrated with data stored in traditional transactional systems?
- How can decision-makers comprehend the results of analyzing so much data quickly enough to act?
- What data governance is appropriate when analysis is distributed, needs change, and data definitions and schemas evolve over time?
- What architectures and algorithms can be used to decompose problems and data for rapid execution in parallel environments?

Bringing Big Data into the current Business Ecosystem leads to a multitude of difficult questions to be answered (2/2) (step 3)
- What levels of availability and reliability are possible in mission-critical applications, given that data volumes are so large?
- Is specialized hardware required for a particular need, or can low-cost commodi
