Hadoop Use Cases With MicroStrategy - Meetup

Transcription

Hadoop Use Cases with MicroStrategy
egyitakahashi@microstrategy.com


Big Data Drivers
Growth:
o volume of data
o numbers of sources
o […]ng (“wrangling”) capabilities

Financial Services – Big Data
[…]odes, risk levels, etc. in one place. Re-[…]
[Diagram labels: Party 1, Client 1, Client 2, Account 1, Account 2, Marketing Household 1; Party 2, Client 3, Account 3, Account 4, Marketing Household 2]

User Personas
Personas: Business Intelligence, Technical Data Analyst, Data Scientist
Skillsets
o Database & Hadoop (SQL, HQL)
o Programming and Scripting (Java, Perl, Python, etc.)
o Statistics Tools (R, SAS)
Access
o ODBC (SQL)
o HIVE (HQL)
Data Assets
o in-memory data results / cubes
o Subject Areas in Hadoop
o Summary Level RDBMS
o Flat Tables
o Subject Areas: Net New Money, FA, Revenue, Accounts
Confidential & Private

Business Analyst
Responsible for acquiring, synthesizing, evaluating the data and representing the needs of the business.
Skillsets
o Deep knowledge of the organization and the mechanics of how the enterprise makes money
o Able to run reports from the BI tool via visual data exploration, ad hoc reports
o Can narrate the state of the business based on historical and present-day data volumes
Data Assets
o Requires fast and responsive data access
o Primary touchpoint will be in-memory result sets

Quality Assurance
Responsible for reconciling the data at the various levels.
Skillsets
o Able to run reports from the BI tool via visual data exploration, ad hoc reports
o SQL, HQL
o Data analysis
o ETL systems
Data Assets
o Must be able to run checks of the in-memory data sets against the underlying relational tables
o Understands ETL processes and basic business […]
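The Quality Assurance check of in-memory data sets against the underlying relational tables could look roughly like the following. This is a minimal sketch, not the presenter's method: it assumes the in-memory result set has been exported as a dictionary of totals, uses sqlite3 as a stand-in for the RDBMS, and the table `accounts`, its columns, the sample rows, and the helper `reconcile` are all hypothetical names made up for illustration.

```python
import sqlite3


def reconcile(cube_totals, conn, tolerance=0.01):
    """Compare in-memory totals with totals recomputed from the relational table.

    Returns a list of (key, cube_value, table_value) tuples, one per mismatch.
    A key that is missing on either side is reported with None for that side.
    """
    rows = conn.execute(
        "SELECT subject_area, SUM(amount) FROM accounts GROUP BY subject_area"
    ).fetchall()
    table_totals = {area: total for area, total in rows}

    mismatches = []
    for key in sorted(set(cube_totals) | set(table_totals)):
        cube_value = cube_totals.get(key)
        table_value = table_totals.get(key)
        missing = cube_value is None or table_value is None
        if missing or abs(cube_value - table_value) > tolerance:
            mismatches.append((key, cube_value, table_value))
    return mismatches


# Stand-in relational table (hypothetical data, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (subject_area TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("Revenue", 100.0), ("Revenue", 50.0), ("Net New Money", 75.0)],
)

# Hypothetical totals as they might be read from an in-memory cube.
cube_totals = {"Revenue": 150.0, "Net New Money": 80.0}
issues = reconcile(cube_totals, conn)
print(issues)
```

Aggregating on the database side keeps the comparison at the same level as the cube, so only the summarized values travel to the QA script.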

Technical Data Analyst
Serves as a bridge between IT and the business. Understands the fundamentals of the business but also has strong technical skills.
Skillsets
o Pig
o Understanding of MapReduce
o SQL, HQL
o R or SAS
Data Assets
o Has access to all data sources, and is comfortable blending data results from each
o Understands the data structures in the RDBMS and Hadoop
o Can create, drop, truncate tables and populate tables in his/her own […]

Data Scientist
Advanced data analyst who has a programming and statistics background, as well as a good understanding of the business and the fundamental drivers.
Skillsets
o Machine learning – supervised, unsupervised
o Advanced statistical analysis
o Facility with different types of programming languages and ingesting various forms of data (JSON, XML, unstructured text or logs)
Data Assets
o Incorporates external data including APIs (SOAP, REST, etc.) and blends with the core data repositories
o Has rights to create, truncate, and drop tables in his/her own […]
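Ingesting external data in more than one format and blending it with core data, as described for the Data Scientist, might be sketched like this. It is an illustration under stated assumptions only: the payloads, the field names (`account_id`, `score`, `revenue`), and the helper functions are hypothetical, and a real feed would arrive from an API rather than from inline strings.

```python
import json
import xml.etree.ElementTree as ET


def parse_json_records(text):
    """Parse a JSON array of objects into normalized records."""
    return [
        {"account_id": item["account_id"], "score": float(item["score"])}
        for item in json.loads(text)
    ]


def parse_xml_records(text):
    """Parse <account id="..."><score>...</score></account> elements."""
    root = ET.fromstring(text)
    return [
        {"account_id": el.get("id"), "score": float(el.findtext("score"))}
        for el in root.findall("account")
    ]


def blend(core_rows, external_rows):
    """Left-join external scores onto core rows by account_id.

    Core rows without an external match get score=None.
    """
    score_by_id = {row["account_id"]: row["score"] for row in external_rows}
    return [dict(row, score=score_by_id.get(row["account_id"])) for row in core_rows]


# Hypothetical external payloads in two formats.
json_payload = '[{"account_id": "A1", "score": 0.9}]'
xml_payload = '<accounts><account id="A2"><score>0.4</score></account></accounts>'

# Hypothetical rows from the core data repository.
core = [
    {"account_id": "A1", "revenue": 100},
    {"account_id": "A2", "revenue": 250},
    {"account_id": "A3", "revenue": 10},
]

external = parse_json_records(json_payload) + parse_xml_records(xml_payload)
blended = blend(core, external)
print(blended)
```

Normalizing each source into the same record shape before joining keeps the blending step independent of the input format.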

WORKFLOW SCENARIO
User (technical data analyst) starts with high-level data and drills to lower-level detail in Oracle, then Hadoop.
1. A user who is interested in a subject area runs a cube-based dashboard.
2. The user navigates from the dashboard to a grid report and manipulates the data (sorts, in-line filtering, pivot) until a subset of the data has been chunked out.
3. Depending on the level(s) of the report, the user can right-click on an attribute and drill off of the cube to lower-level data.
4. At the lowest level the user can drill across via a template, passing a set of values to a FFSQL-based report. The template connects to Hadoop either through the Hive connector or a native driver.
Leverages the drill-to-[…]
Business Intelligence – MicroStrategy: MicroStrategy iCube (data exploration, in-memory filtering) → Grid Report (SQL) → Drill-to-template (HQL)
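Step 4 of the workflow, passing a set of values from the drilled report into an HQL query, can be illustrated with a small query builder. This is a sketch under assumptions and does not show MicroStrategy's own template mechanism: the function `build_drill_query`, the table `accounts_detail`, and the column names are hypothetical, and the `%s` placeholder style is assumed to match whatever Hive driver eventually executes the query.

```python
import re

# Conservative pattern for table and column names (letters, digits, _ and .).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def build_drill_query(table, filters):
    """Return (hql, params) for a detail query filtered by the drilled values.

    `filters` maps a column name to the list of values selected in the report.
    Values are returned separately as parameters instead of being spliced
    into the query text.
    """
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"invalid table name: {table!r}")

    clauses, params = [], []
    for column, values in filters.items():
        if not _IDENTIFIER.fullmatch(column):
            raise ValueError(f"invalid column name: {column!r}")
        if not values:
            raise ValueError(f"no values passed for column: {column!r}")
        placeholders = ", ".join(["%s"] * len(values))
        clauses.append(f"{column} IN ({placeholders})")
        params.extend(values)

    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT * FROM {table} WHERE {where}", params


# Hypothetical values carried over from the grid report selection.
hql, params = build_drill_query(
    "accounts_detail", {"household_id": [1, 2], "region": ["West"]}
)
print(hql)
print(params)
```

Keeping the selected values as parameters, and validating identifiers separately, avoids building the detail query by raw string concatenation.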

Life Sciences – Analytics
[Diagram labels: […]I / Data Access; Symphony Health (SHA); Database; Source; Subject Areas; Analytics; SHA; IMS; […]ortality; medical […]; […]blic; Government]

The Data Warehouse – built for standard reporting and routine analyses
o Take the known lines of thought and build highly engineered systems around them
o Take only the data required to create the final information
o Document the final data in the reports and ad hoc environments
o Maintain tight control over, and access to, final data sets in the Data Warehouse and Data Marts
o Data can have a significant lag but is acceptable for operational purposes
o Maintain a high level of QA throughout data creation
[Diagram labels: Source 1, Source […], […]surance, Reports, Data Warehouse, Ad hoc; text fragments: "[…]dded in the code", "[…]y accompanied by a good business glossary for metrics"]

Analytic Environments – built for rapid acquisition and agility for analytics
o Start with hypotheses and iterate through them - faster iterations lead to more insights
o Search and explore all of the data across its lifecycle stages to find the answers
o Perform work in a sandbox outside the DWs and DMs
o Data sets are not perfect but are directionally correct
o Deliver results quickly
o Work with current information for immediate business decisions
[Diagram labels: Search and Browse, Raw Source Data, Integrated and Standardized Data, 3rd Party, Analytics Sandbox, Analytics, Data Warehouse]

Data […]
[…]ts/Explore-the-Data/Dataset-Downloads.html#

Oct 27, 2015 · Hadoop Use Cases with MicroStrategy