Self-service BI For Big Data Applications Using Apache Drill

Transcription

Self-service BI for big data applications using Apache Drill
2015 MapR Technologies

Data Is Doubling Every Two Years
Unstructured data will account for more than 80% of the data collected by organizations.
[Chart: Total Data Stored, 1980 to 2020. Labels: Semi-structured data, Structured data, IT resources]
Source: Human-Computer Interaction & Knowledge Discovery in Complex Unstructured, Big Data

Data Increasingly Stored in Non-Relational Datastores
RELATIONAL DATABASES
- Development: planned (release cycle months-years)
- Database: fixed schema; DBA controls structure
NON-RELATIONAL DATASTORES
- Structured, semi-structured and unstructured data
- Development: iterative (release cycle days-weeks)
- Database: dynamic / flexible schema; application controls structure
[Timeline: 1980 to 2020]

How To Bring SQL Into An Unstructured Future?
Familiarity of SQL
- SQL
- BI (Tableau, MicroStrategy, etc.)
- Low latency
- Scalability
Agility & Flexibility of NoSQL
- No schema management: HDFS (Parquet, JSON, etc.), HBase
- No transform or silos of data

Industry's First Schema-free SQL Engine for Big Data

Apache Drill Brings Flexibility & Performance
Access to any data type, any data source
- Relational
- Nested data
- Schema-less
Rapid time to insights
- Query data in-situ
- No schemas required
- Easy to get started
Integration with existing tools
- ANSI SQL
- BI tool integration
Scale in all dimensions
- TB-PB of scale
- 1000's of users
- 1000's of nodes
Granular security
- Authentication
- Row/column level controls
- De-centralized

Extending Self Service to Schema-free data
[Chart: Agility & Business Value across use cases for BI, by era]
- 1980s-1990s: IT-Driven BI (IT-created reports, spreadsheets)
- 2000s: IT-Driven BI, plus Self-Service BI (analyst-driven with IT support for ETL)
- Now: IT-Driven BI and Self-Service BI, plus Schema-Free Data Exploration (analyst-driven with no IT dependency)

Enabling "As-It-Happens" Business with Instant Analytics
Governed approach
- Hadoop data, Data [...]
- Repeated for new business questions and source data evolution
- Total time to insight: weeks to months
Exploratory approach
- Hadoop data, Users
- Total time to insight: minutes

Drill's Role in the Enterprise Data Architecture
- Raw data: JSON, CSV, ...
- "Optimized" data: Parquet, ...
- Centrally-structured data: schemas in Hive Metastore
- Relational data: highly-structured data
Exploration (known and unknown questions)
Other tools shown: Oracle, Teradata; Hive, Impala, Spark SQL

Business Benefits
Rapid time-to-value for business analysts:
SQL specialists and BI analysts can query any dataset, including complex nested data, instantly, versus waiting several weeks for data preparation by IT.
Efficiency with easy governance for IT:
IT can avoid unnecessary ETL cycles and schema maintenance activities, but still ensure governance through easy-to-deploy granular access controls.
Accelerated big data adoption for businesses:
Organizations can use the existing and large SQL talent base and tools to rapidly discover new business insights from big data.

Quick Tour
Self-Service Data Exploration with Apache Drill

Data is growing fast and is scattered across various silos:
- Customers: CSV files
- Website click logs: JSON files
- Product database: MapR-DB NoSQL

Apache Drill: SQL in a Non-Relational World
DON'T WANT
- Create and maintain schemas in advance: HDFS (Parquet, JSON, etc.), HBase
- Transform, copy, or move data
WANT
- ANSI SQL
- BI (Tableau, MicroStrategy, etc.)
- Low latency
- Scalability
- Agility

Closing The Gap Between Different Datasources using Drill
Customers (CSV)
- Cust id, Customer name, State, Gender, Agg rev, Age, Membership
Website click logs (JSON)
- Trans id, Sess date, Cust id, Device, Prod id, Purch flag
Product database (NoSQL: HBase / MapR-DB)
- Prod id, Product name, Category, Price
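
To make the idea of joining the three sources concrete, here is a minimal sketch in plain Python. It is not Drill and does not use any Drill API; it only mimics, with the standard library, the kind of join the slide describes: customers from CSV, click logs from JSON, and products from a key-value store, linked by cust id and prod id. All sample values, the snake_case field names, and the function name purchases_by_state are hypothetical.

```python
import csv
import io
import json

# Hypothetical sample data shaped like the three sources on the slide.
CUSTOMERS_CSV = """cust_id,customer_name,state
1,Alice,CA
2,Bob,NY
"""

CLICKS_JSON_LINES = """\
{"trans_id": 100, "cust_id": 1, "prod_id": 10, "purch_flag": true}
{"trans_id": 101, "cust_id": 2, "prod_id": 11, "purch_flag": false}
{"trans_id": 102, "cust_id": 1, "prod_id": 11, "purch_flag": true}
"""

# A dict stands in for a NoSQL table keyed by prod_id (assumption).
PRODUCTS = {
    10: {"product_name": "Laptop", "category": "Electronics", "price": 900.0},
    11: {"product_name": "Desk", "category": "Furniture", "price": 250.0},
}


def purchases_by_state(customers_csv, clicks_json_lines, products):
    """Join clicks to customers and products, summing purchase value per state."""
    customers = {
        int(row["cust_id"]): row
        for row in csv.DictReader(io.StringIO(customers_csv))
    }
    totals = {}
    for line in clicks_json_lines.splitlines():
        if not line.strip():
            continue
        click = json.loads(line)
        if not click.get("purch_flag"):
            continue  # keep only clicks that ended in a purchase
        customer = customers.get(click["cust_id"])
        product = products.get(click["prod_id"])
        if customer is None or product is None:
            continue  # inner-join semantics: drop unmatched rows
        state = customer["state"]
        totals[state] = totals.get(state, 0.0) + product["price"]
    return totals


result = purchases_by_state(CUSTOMERS_CSV, CLICKS_JSON_LINES, PRODUCTS)
```

With Drill the same question would be asked as a single SQL query across the three stores; the sketch only shows which keys tie the datasets together.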

Demo

In lieu of the live demonstration, please find links below:
- Apache Drill with Tableau (4:28): https://www.youtube.com/watch?v=EH0 vRTAkyk
- Twitter analytics with Apache Drill and MicroStrategy (5:02): https://www.youtube.com/watch?v=-gqwgahtc2Y
- Analyzing JSON and Packet Data with SAP Lumira and Apache Drill: https://www.youtube.com/watch?v=s-fEATDI2wA

Access control that scales
- PAM authentication
- User impersonation
- Fine-grained row and column level access control with Drill views; no centralized security repository required
[Diagram: users reach Drill View 1 and Drill View 2, which sit on top of Files, HBase, and Hive]

Granular security permissions through Drill views
Raw file (/raw/cards.csv), for owner and admins
- Columns: Name, City, State, Credit Card #
- Sample row: Dave, San [...]
Business Analyst view
- Sample row: Dave, San Jose, CA
Data Scientist view (includes Credit Card #)
- Sample row: Dave, San [...], 1374-1111-1111-1111
A view is not a physical data copy.
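
The view-based masking on this slide can be illustrated with a small sketch. It uses SQLite from the Python standard library as a stand-in, not Drill, and it does not enforce any permissions; it only shows how two views over one raw table can expose different columns without copying data. The table name, view names, masking rule, and the raw card number are hypothetical (the slide's raw number did not survive extraction).

```python
import sqlite3


def build_demo():
    """Create an in-memory table plus two views that expose different columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cards (name TEXT, city TEXT, state TEXT, card_number TEXT)"
    )
    # Hypothetical raw row; the card number is made up for illustration.
    conn.execute(
        "INSERT INTO cards VALUES ('Dave', 'San Jose', 'CA', '1374-0000-0000-0000')"
    )
    # Business analysts see no card number at all.
    conn.execute(
        "CREATE VIEW business_analyst_view AS SELECT name, city, state FROM cards"
    )
    # Data scientists see a masked number: the first four digits are kept
    # and the rest is replaced by a fixed pattern (assumed masking rule).
    conn.execute(
        "CREATE VIEW data_scientist_view AS "
        "SELECT name, city, state, "
        "substr(card_number, 1, 4) || '-1111-1111-1111' AS card_number "
        "FROM cards"
    )
    return conn


conn = build_demo()
analyst_rows = conn.execute("SELECT * FROM business_analyst_view").fetchall()
scientist_rows = conn.execute("SELECT * FROM data_scientist_view").fetchall()
```

Because both views are defined as queries over the raw table, neither one stores a second copy of the data, which is the point the slide makes about Drill views.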

Case Studies

Self-Service Data Exploration
Direct access to any data store from familiar tools - ANSI SQL compatible
Use cases: Raw Data Exploration, JSON Analytics, DWH Offload
Data sources: {JSON}, Parquet, Text Files, Files, Directories, Hive, HBase

Data Warehouse Offload with Drill & MapR
Ultimately replace existing expensive SQL analytics platform with Hadoop
OBJECTIVES
- Mine credit card data and compare consumer shopping habits
- Require internal SQL specialists to gain instant access to data at all times
CHALLENGES
- Want to preserve instant access to data but at a lower price point
- Need a system that is reliable, does not lose data and is fast
- Must be able to leverage the SQL skill sets in the company
SOLUTION
- Apache Drill allows interactive analysis on large datasets with MapR as the underlying platform that meets scale, reliability and data protection needs
- SQL users did not have to learn Pig, HiveQL or any other language and continue to use Tableau and Squirrel on top of Drill
BUSINESS IMPACT POTENTIAL
- Hadoop and Drill dramatically reduce the price point to less than 1,000 / TB
- MapR platform with Drill delivers reliability and performance for the end users
- Leverage existing BI and SQL skill-sets on Hadoop without retraining

Telecom OEM application with Drill & MapR
Leverage Drill's JSON capabilities to create revenue-generating IoT services
OBJECTIVES
- Offer service to mobile operators to proactively monitor and improve their subscriber experience
- Instant availability of data from diverse and disparate sources
CHALLENGES
- Data is very diverse and dynamic, using JSON as the key format
- Require interactive, ad-hoc analysis capabilities via standard BI tools such as Tableau and Spotfire
SOLUTION
- Apache Drill is being used to build the engine for the interactive experience
- Drill allows SQL queries on incoming JSON structures natively without requiring any centralized schema definitions
- Drill connects to all BI tools using standard ODBC connectors
BUSINESS IMPACT POTENTIAL
- Provide new revenue-generating services to mobile operators
- Enable deeper, instant intelligence about the networks and users
- Reduce maintenance costs: no IT intervention required for schema changes
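
As a rough illustration of querying JSON without a predeclared schema, the sketch below selects nested fields from records whose shapes differ. It is plain Python, not Drill, and every name in it (EVENTS, get_path, select, the field names) is hypothetical. It only mirrors the idea that a missing field yields an empty value instead of a schema error.

```python
import json

# Hypothetical incoming events; note that the records do not share one shape.
EVENTS = """\
{"subscriber": {"id": 1, "plan": "gold"}, "signal": {"rssi": -70}}
{"subscriber": {"id": 2}, "signal": {"rssi": -101, "cell": "A7"}}
{"subscriber": {"id": 3, "plan": "basic"}, "dropped_calls": 4}
"""


def get_path(record, dotted_path):
    """Return the value at a dotted path, or None if any step is missing."""
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def select(lines, columns, where=None):
    """Project dotted-path columns from JSON lines, optionally filtering rows."""
    rows = []
    for line in lines.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if where is None or where(record):
            rows.append(tuple(get_path(record, column) for column in columns))
    return rows


# Subscribers with a weak signal; records without a signal reading are skipped.
weak_signal = select(
    EVENTS,
    ["subscriber.id", "signal.rssi"],
    where=lambda r: (get_path(r, "signal.rssi") or 0) < -90,
)

# Plans per subscriber; a missing plan comes back as None.
plans = select(EVENTS, ["subscriber.id", "subscriber.plan"])
```

New fields such as "cell" or "dropped_calls" can appear in the input without any change to the query code, which is the maintenance benefit the slide attributes to schema-free querying.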

Recap: Apache Drill enables Self Service SQL for Big data
AGILITY: Instant insights to big data
- Direct queries on self-describing data
- No schemas or ETL required
FLEXIBILITY: One interface for Hadoop & NoSQL
- Query HBase and other NoSQL stores
- Use SQL to natively operate on complex data types (such as JSON)
FAMILIARITY: Existing skills & technologies
- Leverage ANSI SQL skills and BI tools
- Plug-n-play with Hive schema, file formats, UDFs

Learn more and get started with Apache Drill
New to MapR and/or Drill?
- Get started with Free MapR On Demand training
- Test Drive Drill on cloud with Amazon EMR
- Learn how to use Drill with Hadoop using MapR sandbox
Ready to play with your data?
- Try out Apache Drill in 10 mins guide on your desktop
- Download Drill for your MapR cluster and start exploration
- Use both with relational and JSON datasets
Comprehensive tutorials and documentation available
Ask questions
- user@drill.apache.org

Thank you
[...]oom@mapr.com
Social handles: maprtech, MapRTechnologies, maprtech

Backup Slides

MapR with Drill is Top-Ranked SQL-on-Hadoop
Key:
- Number indicates company's relative strength across all vectors
- Size of ball indicates company's relative strength along individual vector
"Like other vendors' offerings, Drill handles BI and interactive queries with great aplomb, but it is designed to serve these workloads with data complexity that goes well beyond the flat structured data that other SQL-on-Hadoop systems deal with."
Source: Gigaom Research, 2015

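SQL technologies available on MapR
Compared: Drill, Hive, Impala, Spark SQL
Row groups on the slide: Key Use Cases, Data Sources, SQL / BI tools, Platform
Key use cases
- Drill: Self-service data exploration; interactive BI / ad-hoc queries
- Hive: Batch / ETL / long-running jobs
- Impala: Interactive BI / ad-hoc queries
- Spark SQL: SQL as part of Spark pipelines / advanced analytic workflows
Files support
- Drill: Parquet, JSON, Text, all Hive file formats
- Hive: Yes (all Hive file formats)
- Impala: Yes (Parquet, Sequence, RC, Text, AVRO)
- Spark SQL: Parquet, JSON, Text, all Hive file formats
HBase/MapR-DB
- Drill: Yes
- Hive: Yes, performance issues
- Impala: Yes, performance issues
- Spark SQL: Same as Hive
Beyond [...] (row truncated in the transcription)
Dynamic schema
- Drill: Yes; Hive: No; Impala: No; Spark SQL: Limited
Hive Meta store
- Drill: Yes; Hive: Yes; Impala: Yes; Spark SQL: Yes
SQL support
- Drill: ANSI SQL; Hive: HiveQL; Impala: HiveQL; Spark SQL: ANSI SQL (limited) & HiveQL
Client support
- Drill: ODBC/JDBC; Hive: ODBC/JDBC; Impala: ODBC/JDBC; Spark SQL: ODBC/JDBC
Beyond [...]mited (row truncated in the transcription)
Latency
- Drill: Low; Hive: Medium; Impala: Low; Spark SQL: Low (in-memory) / Medium
Concurrency
- Drill: High; Hive: Medium; Impala: High; Spark SQL: Medium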

Key Reasons for Selecting MapR
Respondents who had prior experience with another Hadoop distribution*
* Apache Hadoop, Cloudera or Hortonworks

MapR: The Only Platform Architected For Big, Fast, Reliable
[Platform diagram, three layers]
Apache Hadoop and OSS ecosystem (your choice of SQL)
- Execution engines by category: Batch; ML, Graph; SQL; NoSQL & Search; Streaming
- Data governance and operations: Data Integration & Access; Workflow & Data Governance; Provisioning & coordination
- Components named: Tez, Drill, Spark, Cascading, GraphX, Spark SQL, Pig, MLLib, Impala, Solr, Storm, HttpFS, Savannah, MapReduce v1 & [...], Sentry, Oozie, ZooKeeper
MapR Data Platform (random read/write)
- MapR-FS (HDFS and NFS APIs)
- MapR-DB (high-performance NoSQL)
Callouts
- Trillion files
- Security
- Open source projects 'inherit' MapR's platform attributes
- First new database designed for operational real-time
- 2-11x faster
- More efficient use of infrastructure (30-50% lower TCO)
- Industry's only mirroring, point-in-time consistent snapshots

MapR: Best Solution for Customer Success
- High Growth
- Best Product
- 700 Customers
- Premier Investors
- Apache Open Source
Figures shown: 2X, 2X, 140%, 90%
Labels shown: Growth In Direct Customers; Growth In Annual Subscriptions (ACV); Dollar-based Net Expansion; Subscription Licenses; Software Margins
