From Cost Center to Profit Center – Data Management Best Approaches

Transcription

From Cost Center to Profit Center – Data Management Best Approaches
July 13, 2016

RedPoint Global Inc. 2016 Confidential

Overview of RedPoint Global
  Launched in 2006
  Founded and staffed by industry veterans
  Headquarters: Wellesley, Massachusetts
  Offices in US, UK, Australia, Philippines
  Global customer base
  Serves most major industries

RedPoint Data Management Ranks High in Gartner Critical Capabilities Report
  Product or Service Scores for Operational/Transactional Data Quality
  Product or Service Scores for Data Integration
  [Charts: scores shown are 4.21 and 4.41]

Big Data Can Become Big Information

What Needs to ...
  ... of data
  Organize
  Link
  Information
  ... execution

Attributes of Information
  RELEVANT: Information must l... information is often ...ll.
  ACCURATE: This one is obvious. Inaccuracy of ... a clear cost benefit.
  ...formation, but this is also what drives the use if successful

Current State of Data ...
  ...DUCTION MODE
  Denormalizing
  ...filing
  Normalizing
  ...value

When Data Prep is ...
  ...ODE
  Time spent tuning algorithm: 80%

The Elephant in the Room

Skills Gap
  Severe shortage of MR or Spark skilled resources
  Very expensive resources and hard to retain
  Inconsistent skills lead to inconsistent results
  Underutilizes existing resources

Maturity & Governance
  A nascent technology ecosystem around Hadoop
  ...tionality
  New applications are not enterprise class
  Legacy applications have built short-term capabilities

Data Into Information
  ...ormation
  ...pectives
  ...ended use of the data

Key Data Mastering Functionality Needed for Fast Data Prep

ETL & ELT
  Profiling, reads/writes, transformations
  Single project for all jobs

Data Quality
  Cleanse data
  Parsing, correction
  Geo-spatial analysis

Master Key Management
  Create keys
  Track changes
  Maintain matches over time

Web Services Integration
  Consume and publish
  HTTP/HTTPS protocols
  XML/JSON/SOAP formats

Integration & Matching
  Grouping
  Fuzzy match

Process Automation & Operations
  Job scheduling, monitoring, notifications
  Central point of control
  Meta Data Management

Hadoop Integration
  Pure YARN integration into Hadoop
  No coding data quality

Java SDK Layer
  Java SDK for rapid development
  Public project incubator for project sharing
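The "Fuzzy match" and "Master Key Management" items above can be illustrated with a small sketch. This is not RedPoint's method, which the slides do not describe; it assumes a plain Levenshtein edit distance as the similarity measure and uses hypothetical names (FuzzyKeySketch, editDistance, assignKeys, maxDistance) purely for illustration. Each record receives the key of the first earlier record it resembles, otherwise a new key is created.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FuzzyKeySketch {

    /** Levenshtein edit distance, computed with two rolling rows. */
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Assigns each record a master key (assumed scheme: index of the first
     * matching representative). A record matches when its normalized form is
     * within maxDistance edits of a representative; otherwise it becomes a
     * new representative with a new key.
     */
    public static int[] assignKeys(List<String> records, int maxDistance) {
        int[] keys = new int[records.size()];
        List<String> representatives = new ArrayList<>();
        for (int r = 0; r < records.size(); r++) {
            String normalized = records.get(r).trim().toLowerCase();
            int key = -1;
            for (int k = 0; k < representatives.size() && key < 0; k++) {
                if (editDistance(normalized, representatives.get(k)) <= maxDistance) {
                    key = k;
                }
            }
            if (key < 0) {
                representatives.add(normalized);
                key = representatives.size() - 1;
            }
            keys[r] = key;
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> records = List.of("Jon Smith", "John Smith", "Mary Jones", "jon smith ");
        System.out.println(Arrays.toString(assignKeys(records, 2)));
    }
}
```

Comparing every record against every representative is quadratic in the worst case, so a sketch like this only suits small inputs; grouping candidates first (the "Grouping" item above) is the usual way to limit the comparisons.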

Benchmarks – Project ...

MapReduce
(... entire code which totals nearly 150 lines):

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // WordOffset is a custom key type defined elsewhere in the full program.
    public static class MapClass extends Mapper<WordOffset, Text, Text, IntWritable> {
        private final static String delimiters = "',./ ?;:\"[]{}- ()&*% # !@ \\ «»¡ ¶·¿";
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Emits (token, 1) for every token in the input line.
        public void map(WordOffset key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer itr = new StringTokenizer(line, delimiters);
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

Pig
Sample Pig script without the UDF:

    SET pig.maxCombinedSplitSize 67108864;
    SET pig.splitCombination true;
    A = LOAD '/testdata/pg/*/*/*';
    B = FOREACH A GENERATE FLATTEN(TOKENIZE((chararray) $0)) AS word;
    C = FOREACH B GENERATE UPPER(word) AS word;
    D = GROUP C BY word;
    E = FOREACH D GENERATE COUNT(C) AS occurrences, group;
    F = ORDER E BY occurrences DESC;
    STORE F INTO '/user/cleonardi/pg/pig-count';

150 lines of MR code            50 lines of script code            0 lines of code
6 hours of development          3 hours of development             15 minutes of development
6 minutes runtime               15 minutes runtime                 3 minutes runtime
Needs extensive optimization    User-defined functions needed      No tuning or optimization required
                                before running script
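For readers who want to see the benchmark task itself without a cluster, the following is a minimal single-machine sketch of the same word count: tokenize, upper-case, count, and order by descending count, mirroring the steps of the Pig script. It is an illustration only, not code from the presentation. The class name WordCountSketch, its method names, and the delimiter set are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountSketch {

    // Assumed delimiter set for illustration; not the one used in the benchmark.
    private static final String DELIMITERS = " \t\n\r',./?;:\"[]{}-()&*%#!@\\";

    /** Tokenizes every line, upper-cases each token, and counts occurrences. */
    public static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line, DELIMITERS);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken().toUpperCase(), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Orders entries by descending count; ties are broken alphabetically. */
    public static List<Map.Entry<String, Integer>> sortedByCount(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the quick brown fox", "The lazy dog; the end.");
        for (Map.Entry<String, Integer> e : sortedByCount(countWords(lines))) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

The in-memory HashMap plays the role that the shuffle and reduce phases play in MapReduce, which is why this version only works for inputs that fit on one machine.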

Intel’s POV on Data Quality Outside ...
  ...r Data Quality and MDM
  Costly process in time and money

RedPoint’s Marketing Data Lake
  [Diagram labels:]
  Data Ingestion
  Data Lake
  Specialized Analytic Databases & Caches
  Production RDBMS Databases
  Persistent Entity Resolution, Linkage and Keying
  YARN
  Matching, MDM in cluster
  Process native document or tabular data
  ...n Purpose Built Data Structures

For Additional ...
  ... can:
  ...uality report
  View Customer Case studies
  Request a Free Trial
