Building The Enterprise Data Lake With Cloudera & Cisco

Transcription

Building the Enterprise Data Lake with Cloudera & Cisco
Prepared by: Marilyn Tan, Country Manager Singapore; Xue Daming, Senior Systems Engineer
Cloudera, Inc. All rights reserved.

Digital Transformation with Data

Data is Transforming the World!
Themes: Connected World; More Data; Internet of Things; Industry 4.0 (4th); Data as Production Factor; Smart Apps; Digital Democracy & Security; The New UX; Rise of Open Source
- Smart things / devices
- New analytics
- Connected experience
- New use cases
- Machine learning & AI
- New architectures
- New data sources
- By 2020: 20.8b devices
- Data virtualization
- Ubiquitous computing: everywhere & on every device (voice, VR, AR, mobile, wearables)
- IoT: 1.7 trillion in 2020
- Data science
- Digital sharing economy: open data & algorithms
- Enterprise-ready open source (e.g. Apache)
- Digital (distributed) trust (esp. blockchain)

The 9 year Cloudera journey
Upper track (2008, 2011, 2012, 2013, 2014, 2016), in slide order:
- Cloudera founded by Mike Olson, Amr Awadallah, Jeff Hammerbacher and Christophe Bisciglia; joined by Doug Cutting (2009)
- Cloudera reaches 100 production customers
- Cloudera Enterprise 4: the standard for Hadoop in the enterprise
- Cloudera expands beyond MR and HBase, introducing Impala, Solr and Spark
- Cloudera focuses on security and governance with Navigator 2, and on cloud with Director
- Navigator Optimizer general availability; improved cloud coverage with AWS, Azure and GCP clouds
Product markers: Cloudera Enterprise 4; CDH / CM; Enterprise Data Hub; Altus; CDSW
Lower track (2009, 2011, 2012, 2014, 2015), in slide order:
- CDH: first commercial Apache Hadoop distribution, and Cloudera Manager
- Cloudera University expands to 140 countries
- Support implements follow-the-sun model
- Cloudera Connect reaches 300 partners across SI, hardware, and software partners
- Cloudera introduces the Enterprise Data Hub and Cloudera Enterprise 5
- Cloudera includes Kafka, Kudu and RecordService within Cloudera Enterprise
2017: Cloudera acquired Fast Forward Labs, announced PaaS Altus, Data Science Workbench, Shared Data Experience (SDX), and more to come!

What Happens Next: A Decade of Hadoop
18 projects and beyond

Gartner Analytics Ascendancy Model
Figure: questions plotted by Value against Difficulty: "Why did H" (label cut off in the source), "What will happen?", "What will make it happen?"

Cloudera & Cisco Enterprise Data Lake Innovation

Cisco UCS Integrated Infrastructure with Cloudera for IoT
Data Analytics
- Real-Time Data Inject (CoAP/MQTT.XMPP)
- Fog: C800/UCS Mini/UCS C240; ISR 8x9 with 4G LTE and Dual 802.11na/g/n (WiFi) Radios; managed by Cisco Fog Director
- Real-Time Data Store: UCS C220/C240
- Data Processing: Kafka, Cisco UCS C240
- Data Aggregator: Cisco UCS C240
- Speed Layer
- Batch Layer
- Batch Big Data Store: UCS C240/C3160
- Serving Layer
Cisco UCS at all layers, fully validated architectures with all major players
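The speed, batch, and serving layers named on this slide follow the general lambda-architecture pattern. The sketch below is a minimal plain-Python illustration of that pattern only, not of the Cisco or Cloudera products: the event shape, the `device` field, and all function and class names are assumptions made for the example.

```python
from collections import Counter

def build_batch_view(master_dataset):
    """Batch layer: recompute a complete view from all stored events."""
    return Counter(event["device"] for event in master_dataset)

class SpeedLayer:
    """Speed layer: keep an incremental view of events the batch run has not seen."""
    def __init__(self):
        self.realtime_view = Counter()

    def ingest(self, event):
        self.realtime_view[event["device"]] += 1

def serve(batch_view, realtime_view, device):
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch_view[device] + realtime_view[device]

# Hypothetical events: two devices already in the batch store, one fresh event.
master_dataset = [{"device": "truck-1"}, {"device": "truck-1"}, {"device": "truck-2"}]
batch_view = build_batch_view(master_dataset)

speed = SpeedLayer()
speed.ingest({"device": "truck-1"})

total_truck_1 = serve(batch_view, speed.realtime_view, "truck-1")
total_truck_2 = serve(batch_view, speed.realtime_view, "truck-2")
```

In a real deployment the batch view would be rebuilt periodically and the speed layer's view discarded for the events the new batch view already covers; that hand-off is left out here to keep the sketch short.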

Fabric Centric Design
- High Performance: 40 GB/s Ethernet; 320 GB/s per chassis
- Unified Fabric: single cable for network, storage, and management traffic
- UCS Manager: management, Ethernet, storage
- Easy to Scale: single point of management; add cables for bandwidth vs. fabric type

Management Simplicity
Big Data: Management Consistency
- Hundreds of servers
- Thousands of management points
Simplified Scalability
- Easily scale your infrastructure from a few servers to thousands of servers with a fully integrated infrastructure
- UCS Service Profile
- Cisco ACI Application Profile
Centralized Management
- Service Profiles for servers: manage all servers centrally
- Application Profiles for network: manage all network centrally

The enterprise platform for machine learning
Pattern recognition: drive customer insights
- Market segmentation
- Customer 360
- Next best offer
- Churn analysis & prevention
Detection: protect business
- Cybersecurity
- Fraud
- Anti-money laundering
- Risk modeling & assessment
- Spam detection
Prediction: connect products & services (IoT)
- Predictive maintenance
- Genomics & personalized medicine
- Predicting and preventing disease
- Natural language
500 customers run on

Machine learning requires a complete stack.
Business Integration
Prepare
- Load external data
- Process structured data
- Process unstructured data
- Process streaming data
- Cleanse data
- Vectorize data
- Tools: Batch Processing, Stream Processing, Interactive SQL, Search Tools, Text/Image Processing
Analyze Data
- Diagnose/treat data issues
- Design experiments
- Partition data
- Engineer features
- Train and validate models
- Evaluate and assess models
- Tools: Analytic Languages, ML Libraries
Deploy
- Publish to BI/Viz
- Deploy to batch scoring
- Deploy to real-time scoring
- Deploy to scoring API
- Manage models
- Monitor model performance
- Tools: BI/Viz, Interactive SQL, Batch Processing, Stream Processing, Operational DB
Administration, Governance and Security
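As a rough illustration of three of the analysis steps listed on this slide (partition data, train a model, evaluate it), here is a minimal standard-library Python sketch. The toy readings, the one-feature threshold classifier, and every function name are assumptions chosen for brevity; a real project would use the analytic languages and ML libraries the slide refers to.

```python
import random

def partition(rows, train_fraction=0.75, seed=0):
    """Shuffle a copy of the rows and split them into train and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def train(rows):
    """Fit a one-feature threshold classifier: the midpoint between class means."""
    positives = [x for x, label in rows if label == 1]
    negatives = [x for x, label in rows if label == 0]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2

def predict(threshold, x):
    """Classify a reading as 1 when it lies above the learned threshold."""
    return 1 if x > threshold else 0

def evaluate(threshold, rows):
    """Return the fraction of rows whose predicted label matches the true label."""
    correct = sum(predict(threshold, x) == label for x, label in rows)
    return correct / len(rows)

# Hypothetical toy data: readings 0, 5, ..., 95; label 1 marks a reading above 50.
data = [(float(x), 1 if x > 50 else 0) for x in range(0, 100, 5)]
train_rows, test_rows = partition(data)
model = train(train_rows)
accuracy = evaluate(model, test_rows)
```

Holding the test rows out of training is the point of the partition step: the accuracy is measured on data the model has not seen.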

A complete, integrated enterprise platform
Cloudera Enterprise Data Hub
Cloudera Distribution for Hadoop

Cloudera Data Science Workbench
Supports data science end-to-end
- Full access to data
- Secure self-service provisioning
- Containerized environments
- Supports Python, R, and Scala
- Automates: workflow, version control, collaboration, sharing

CDSW Benefits
Data Scientists
- Web browser, no desktop footprint
- Use R, Python, or Scala
- Install any library or framework
- Isolated project environments
- Direct access to data in secure clusters
- Share insights with team
- Reproducible, collaborative research
- Automate and monitor data pipelines
- Built-in job scheduling
IT
- Support self-service data science
- Full platform security
- Kerberos authentication
- Run on-premises or in the cloud

Deep learning in Cloudera with Apache Spark
Spark Packages
- Two packages: CaffeOnSpark, TensorFlowOnSpark
- Developed by Yahoo
- Python and Scala APIs
- All DL architectures
- Integrated pipeline
- Run on existing clusters
- Training and inference
DL4J
- Open source DL library
- Developed by Skymind
- Built on JVMs
- Supports CPUs and GPUs
- Java, Scala, Python APIs
- Training and inference
- Imports models from: TensorFlow, Caffe, Torch, Theano
- Runs on existing clusters
BigDL
- Deep learning framework
- Developed by Intel
- Supports CPUs only
- Leverages Intel MKL
- Scala, Python APIs
- Imports models from: TensorFlow, Caffe, Torch
- Runs on existing clusters

New! Accelerated deep learning on-demand with GPUs
“Our data scientists want GPUs, but we can’t find a way to deliver multi-tenancy. If they go to the cloud on their own, it’s expensive and we lose governance.”
- Extend existing CDSW benefits to GPU-optimized deep learning tools
- Schedule & share GPU resources
- Train on GPUs, deploy on CPUs
- Works on-premises or cloud
Multi-tenant GPU support on-premises or cloud
Diagram labels: Data Science Workbench, CDH, CPU, GPU; single-node training; distributed training, scoring

Enterprise Data Lake Architecture

Canonical Ingestion & Spark Streaming Analytics with Cisco Big Data Analytics Platform
- Integrate with Apache Spark Streaming for real-time analysis of data
- Write back to Kafka for further processing or to send to an application layer
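The flow on this slide (read from Kafka, analyze in micro-batches, write results back to Kafka) can be simulated in plain Python. The sketch below does not use the Spark or Kafka APIs: in-memory lists stand in for topics, and the record fields, batch size, and function names are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

def micro_batches(records, batch_size):
    """Yield consecutive slices of the input, mimicking streaming micro-batches."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

def analyze(batch):
    """Average the readings per sensor within a single micro-batch."""
    by_sensor = defaultdict(list)
    for record in batch:
        by_sensor[record["sensor"]].append(record["value"])
    return [
        {"sensor": sensor, "avg": mean(values), "count": len(values)}
        for sensor, values in sorted(by_sensor.items())
    ]

def run_pipeline(input_topic, output_topic, batch_size=3):
    """Consume the input 'topic' batch by batch and append results to the output 'topic'."""
    for batch in micro_batches(input_topic, batch_size):
        output_topic.extend(analyze(batch))
    return output_topic

# Hypothetical sensor readings standing in for messages on an ingestion topic.
raw_topic = [
    {"sensor": "a", "value": 10.0},
    {"sensor": "b", "value": 4.0},
    {"sensor": "a", "value": 14.0},
    {"sensor": "b", "value": 6.0},
]
enriched_topic = run_pipeline(raw_topic, [])
```

Writing the aggregates to a second topic rather than straight to an application keeps the analysis step decoupled, so further processing stages or an application layer can each consume the results independently.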

Proposed Architecture for Enterprise Data Platform
Diagram labels: AI Platform; Data Sources (Meteorological Data, Sensors age, Geospatial); Advanced Analytics; Data Visualization; Data Warehouse (BW HANA, BPC, any SAP NW); Enterprise Data Warehouse

Big Data Blueprints: Cisco Validated Designs
Cisco Validated Designs with Cloudera
What you get
- Industry-leading partnerships
- Tested and validated reference architectures to meet performance, capacity, and scale
- Joint engineering lab
- Extensive options for data management (Hadoop, MPP, and NoSQL) to meet your business needs
- Solution bundles optimized for cost of ownership and ease of ordering
Solution designed, tested, and documented to facilitate faster, more reliable, and more predictable customer deployments.

Our Customers’ Success Stories

Case Study: Data-Driven Products
Transportation » Predictive Maintenance » Improved Service » Data Driven Products
Using Predictive Maintenance to Improve Performance and Reduce Fleet Downtime
- OnCommand Connection is collecting telematics and geolocation data across the fleet
- Reduced maintenance costs to .03 per mile from .12-.15 per mile
- Centralizing data from 13 systems with varying frequency and semantic definitions
- Real-time visibility of 250,000 trucks in order to improve uptime and vehicle performance
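To put the maintenance figures on this slide in relative terms, the short Python calculation below converts the stated per-mile costs into a percentage reduction. The function name is an assumption, and the slide gives no currency, so the values are treated as plain numbers.

```python
def reduction_pct(before, after):
    """Percentage by which a cost fell, relative to the starting cost."""
    return (before - after) / before * 100

# Slide figures: cost per mile fell from .12-.15 to .03.
low_end = reduction_pct(0.12, 0.03)   # starting from the lower earlier cost
high_end = reduction_pct(0.15, 0.03)  # starting from the higher earlier cost

print(f"Reduction: {low_end:.0f}% to {high_end:.0f}%")
```

On the slide's numbers this works out to a reduction of roughly 75% to 80% in maintenance cost per mile.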

Case Study: Data-Driven Products, Process
Travel & Transportation » Smart Buildings » Predictive Maintenance » Advanced Analytics
Smart Buildings - Preventative Maintenance
Using Sensors & IoT to Improve Passenger Safety and Airport Efficiency
Challenge: Improve traveler satisfaction and safety by reducing downtime for critical operational machinery
Solution:
- Cloudera on Azure to capture, secure, and correlate sensor (IoT) data collected from escalators, elevators, and baggage carousels
- Provide necessary fixes to prevent unplanned downtime

Case Study: 2016 Data Impact Award Winner
State of Kentucky Department of Transportation
Smart Cities
Enabling the State of Kentucky to manage snow and ice events in real time
Challenge: Needed a more efficient approach to inclement weather road management
Solution:
- Real-time weather response system that incorporates real-time data from Waze, HERE, ESRI’s GeoEvent processor, and Automatic Vehicle Locations (providing sensor data from salt trucks)
- KYTC aggregates 15-20 million records every day and processes more than a million records per second

Data Protection & Governance
Stages:
- 0: Data in Isolation
- 1: Behavior and Transaction Fusion
- 2: Expanded Data Surface Area
- 3: EDH: Secure Data Vault
Slide labels, in order:
- Fully compliance ready
- Audit-ready & protected
- Data in application silos
- Limited insights
- Summarized
- Basic security controls: authorization, authentication, comprehensive auditing
- Data security & governance: lineage visibility, metadata discovery, encryption & key management
- Audit ready for: PCI, PII
- Full encryption, key management, transparency, and enforcement for all data at-rest and data-in-motion
Security Compliance & Risk Mitigation

Why Cloudera
The Platform for Next-Generation Analytics
Cloudera Enterprise delivers the capabilities required by the largest enterprises, spanning analytics, security, governance, and management. We make Hadoop fast, easy, and secure.
The Experience to Help You Succeed
No one knows Hadoop like Cloudera. As the first Hadoop company, Cloudera is the world’s leading contributor to and provider of enterprise Hadoop, with experience you can rely on to help you succeed.
Open Innovation
Our unique hybrid open source strategy enables us to lead the enterprise expansion of the Hadoop ecosystem, driving innovative new capabilities and open standards in the community.

Thank you
marilyn@cloudera.com 65 9822 2338
daming@cloudera.com 65 9368 2316
