Azure Databricks

Azure Databricks: The Best Platform to Run ML and AI

Organizations are looking to analytics to transform their businesses. With the help of concepts such as AI and machine learning, organizations see not only ways to make huge gains in terms of reducing costs, but also transformative changes through new revenue streams.

Yet only 1% of organizations today are able to take advantage of the capabilities of AI. It is the siloed nature of analytics that stifles success. Azure Databricks accelerates innovation by breaking down the silos between people, processes and infrastructure.

This whitepaper explains what makes Azure Databricks unique and how you can use it to transform your business and solve your analytics problems.

Transform your Business with Analytics

Three key elements are needed for a successful analytics program:

BIG DATA
First of all, the ability to ingest and analyze all of your relevant data in your analytics processes is key. Many organizations find they can only access certain data silos, or can only load some of the data for processing. Many find that they can only process a small percentage of their data on a weekly basis, causing them to fall farther and farther behind in the effort to understand what their data is telling them. It takes the right infrastructure to enable access to your insights.

CLOUD INFRASTRUCTURE
Cloud infrastructure is required to make your processes economical. It is especially useful in big data analytics, where large analytics runs spin up and down constantly. With a cloud infrastructure, you can spin up massive analytics jobs and then shut them down again, paying only for the processing you use. It also requires the right processes to scale your analytics, so that scheduled analytics jobs run to completion reliably.

ARTIFICIAL INTELLIGENCE
Realizing the huge innovative leaps that make transformation possible requires the ability to build on the work of others. Data Engineers, Data Scientists, and Business Analysts need to be able to collaborate to bring the right business problem and question, the right data set, and the right analytical model into play to answer questions. This requires the ability for people to collaborate quickly and efficiently.
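The pay-for-what-you-use point made under cloud infrastructure can be illustrated with a little arithmetic. The sketch below compares a cluster left running all week with one that is started only for scheduled runs. Every figure in it (hourly rate, node count, run schedule) is a hypothetical assumption chosen for illustration, not an Azure or Databricks price.

```python
# Hypothetical figures for illustration only; these are not real prices.
HOURLY_RATE_PER_NODE = 0.50  # assumed cost of one node for one hour
NODES = 20                   # assumed cluster size for a large analytics run
HOURS_PER_WEEK = 24 * 7


def weekly_cost(nodes: int, hours: float, rate: float) -> float:
    """Cost of running `nodes` machines for `hours` at `rate` per node-hour."""
    return nodes * hours * rate


# Cluster left running all week.
always_on = weekly_cost(NODES, HOURS_PER_WEEK, HOURLY_RATE_PER_NODE)

# Cluster started only for the runs: assume 3 hours per run, 5 runs per week.
on_demand = weekly_cost(NODES, 3 * 5, HOURLY_RATE_PER_NODE)

print(f"always-on: {always_on:.2f}")
print(f"on-demand: {on_demand:.2f}")
```

Under these assumed numbers the on-demand cluster is billed for 15 hours a week instead of 168. The real saving depends entirely on actual rates and on how long the jobs run.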

Challenges

Organizations come across fundamental challenges in achieving their analytics goals:

DATA VOLUME
Managing the volumes of data needed to effectively train machine and deep learning models.

SECURITY ASSURANCE
A reliable, secure, and trusted cloud to run your analytics. A lack of cohesive security features can put operations at risk, introduce vulnerabilities and jeopardize compliance.

SILOED PROCESS AND PEOPLE
Technology limitations negatively impact productivity and collaboration of data science teams.

Bringing data and the analytics engine together is the key to this transformation.

Challenge: Data Volume

The flow of data seems never-ending and comes from internal and external sources alike, such as line-of-business or CRM systems, social channels, internet bots, mobile devices and IoT sensors, to name only a few.

- This “any data, anywhere” flow can overwhelm the operational capacity of an organization.
- Organizations want to use all of the data to build better analytical models and provide the context for decisions.
- AI systems need large volumes of data to test and refine models.
- In some cases, it takes weeks to analyze just a fraction of the data, which puts the organization further and further behind.

Challenge: Security Assurance

Keeping data safe is critical to an organization’s reputation. Without the appropriate security infrastructure, threats and vulnerabilities proliferate and the integrity of the entire company can become compromised.

- User authentication and security policy enforcement become a performance bottleneck and stifle collaboration.
- Oversight of user activity can become resource intensive, negatively impacting productivity.
- Hindered productivity can often lead to users bypassing processes, which can introduce more threats and vulnerabilities.

Challenge: Siloed Process and People

Data Scientists use tools they are familiar with to create models and run analytics. Data Engineers use a different set of tools to blend and clean data.

- Fragmented workflows can create massive inefficiencies for big data and artificial intelligence initiatives.
- A lack of process automation from data ingestion to production can greatly reduce the speed of innovation.
- Siloed analytics kills the productivity of data science teams. It is very difficult to explore data, train AI models and solve business problems with a disjointed analytics platform. The very speed of innovation requires the team to pivot: as it understands the data, the question being pursued morphs and changes, requiring fast collaboration to keep up.
- Resource inefficiencies become performance bottlenecks and cost drivers when projects must be spun up and spun down with regularity.

A technology infrastructure that is unable to meet these demands can cause productivity failures at scale.

The Azure Databricks Solution

Azure Databricks is a fast, easy and collaborative Apache Spark-based analytics platform optimized for Azure. It was created to bring Databricks’ Machine Learning, AI and Big Data technology to the trusted Azure cloud platform.

- Designed in collaboration with the team that started the Spark research project at UC Berkeley (which later became Apache Spark) for optimal performance on the Azure cloud.
- Data pipelines ensure analytics can be performed against growing volumes of data from multiple sources.
- Uniquely streamlined workflows and interactive workspaces enable collaboration between data scientists, data engineers, and business analysts.
- Native integration with Azure services brings enterprise-grade Azure security, including Azure Active Directory integration, compliance, and enterprise-grade SLAs.

THE RAPID ASCENSION OF APACHE SPARK

- Matei Zaharia started the Spark research project at UC Berkeley in 2009.
- Replaced MapReduce as the de facto data processing engine for big data analytics.
- Includes libraries for SQL, streaming, machine learning and graph.
- Largest open source community in big data (1000 contributors from 250 orgs).
- Trusted by some of the largest enterprises (Netflix, Yahoo, Facebook, eBay, Alibaba).
- Databricks contributes 75% of the code, 10x more than any other company.
- Over 365,000 Meetup members around the world.

How it Works: Azure Databricks

The Azure Databricks service sits inside the Azure cloud. You can access all your Azure data sources to apply the power of the Azure Databricks analytics engine, and distribute your results by writing to visual dashboards or back to data warehouses for access.

[Diagram: Azure Databricks architecture. Data sources: IoT streaming data, cloud storage, data warehouses, Hadoop storage. Collaborative Workspace (Enhance Productivity): data engineer, data scientist, business analyst. Deploy Production Jobs & Workflows: multi-stage pipelines, job scheduler, notification & logs. Optimized Databricks Runtime Engine (Scale Without Limits): Databricks I/O, Apache Spark, serverless. Outputs: machine learning models, BI tools, data exports, data warehouses, REST APIs. Foundation: Build on Secure & Trusted Cloud.]

Azure Databricks: Scale without Limits

Azure Databricks is optimized from the ground up for performance and cost-efficiency to scale your business and handle the demands of Big Data.

OPERATE AT MASSIVE SCALE WITHOUT LIMITS, GLOBALLY
Databricks enables your analytics processes to scale up and down automatically, enabling you to process all of your data at once.

ACCELERATE DATA PROCESSING
Take analytics processes from weeks to hours or minutes with the fastest Spark engine built around speed, ease of use, and sophisticated analytics.

OPTIMIZED PERFORMANCE
Improve performance by as much as 10-100x over traditional Apache Spark deployments with performance optimizations including caching, indexing, and advanced query optimization.
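As a rough illustration of scaling up and down automatically, the sketch below builds a cluster definition with an autoscaling range and an idle shutdown timer. The field names follow the Databricks Clusters REST API; the helper function, the cluster name, the worker counts and the placeholder values are assumptions made for this example and do not come from this whitepaper. The code only builds and prints the definition; it does not call any service.

```python
import json


def autoscaling_cluster_spec(name: str, min_workers: int, max_workers: int,
                             idle_minutes: int) -> dict:
    """Build a cluster definition that scales between two worker counts.

    Hypothetical helper for illustration; it only assembles a dictionary.
    """
    if not 0 <= min_workers <= max_workers:
        raise ValueError("expected 0 <= min_workers <= max_workers")
    return {
        "cluster_name": name,
        # Placeholders: choose values offered in your own workspace.
        "spark_version": "<runtime-version>",
        "node_type_id": "<azure-vm-size>",
        # The cluster may grow and shrink within this range.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Shut the cluster down after this many idle minutes.
        "autotermination_minutes": idle_minutes,
    }


spec = autoscaling_cluster_spec("nightly-analytics", 2, 16, 30)
print(json.dumps(spec, indent=2))
```

A definition like this would typically be submitted through the workspace UI or the REST API. Keeping the minimum low and setting an idle timeout is one way to avoid paying for capacity between runs.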

Azure Databricks: Build on a Secure, Trusted Cloud

Azure Databricks is uniquely architected to protect your data and business with enterprise-level security that aligns with any compliance requirements your organization may have.

REGULATE ACCESS
Set fine-grained user permissions to Azure Databricks Notebooks, clusters, jobs, and data.

SIMPLIFY SECURITY AND IDENTITY CONTROL
Built-in integration with Azure Active Directory takes advantage of your existing roles and security settings.

BUILD WITH CONFIDENCE
Azure Databricks is backed by unmatched support, compliance and SLAs on the most-trusted cloud platform.

Azure Databricks: Increase Productivity & Collaboration

Azure Databricks delivers the best of Azure and Apache Spark so that data science teams can be immediately productive.

INSTANT PRODUCTIVITY
Users can launch a new Spark environment on Azure with a single click.

SEAMLESS COLLABORATION
A unified workspace provides interactive Notebooks and dashboards for real-time collaboration. Features that range from seeing where colleagues are working in Notebooks to adding comments enable users to work synchronously or asynchronously.

SHARABLE INSIGHTS
With rich Power BI integration, interactive visualizations can be shared across the organization, allowing for instant feedback and leading quickly to the next business question.

Sample Use Cases

These are just a few examples of the types of valuable analytics use cases you can address with Azure Databricks.

USE CASES
- Personalized Recommendations
- Effective Customer Retention
- Consumer Engagement Analysis
- Risk and Fraud Analysis
- Inventory Allocation

DATA
- Customer Profiles, Viewing History, Online Activity, Content Sources, Channels
- Customer Profiles, Online Activity, Content Distribution, Services Data
- Transaction Data, Demographics, Purchasing History, Trends
- Transactions, Subscriptions, Demographics, Credit Data
- Content Metadata, Ratings, Comments, Social Media Activity

ANALYSES
- Personalized Viewing and Engagement Experience
- Quality of Service and Operational Efficiency
- Real-time Anomaly Detection
- Predict Audience Interests
- Demand-Elasticity
- Social Network Analysis
- Click-path Optimization
- Market Basket Analysis
- Fraud Prevention
- Network Performance & Optimization
- Next-best Content Analysis
- Customer Behavior Analysis
- Improved Real-time Ad Targeting
- Click-through Analysis
- Customer Spend & Risk Analysis
- Data Relationship Maps
- Pricing Predictions
- Promotion Events Time Series Analysis
- Nielsen Ratings and Projections
- Multi-channel Marketing Attribution
- Mobile Spatial Analytics

OUTCOMES
- Faster Innovation for Customer Experience
- Improved Consumer Outcomes and Increased Revenue
- Risk Management With Machine Learning
- Predictive Analytics Transforms Growth
- Improved Consumer Engagement With Machine Learning

Azure Databricks

By removing common analytics limitations, Azure Databricks allows organizations to innovate faster than ever before.

- Azure Databricks addresses the data volume issue with a highly scalable analytics engine. Processes that used to take weeks run in hours or minutes with Azure Databricks.
- Integrated with Azure security, Azure Databricks provides fine-grained security control that keeps data safe while enhancing productivity.
- It enables people and processes to collaborate in a single Notebook, enabling faster iteration to get to the right answer more quickly.

How to Get Started

STEP 1: Acquire and prep your data
READ: Azure Databricks for Data Engineering

STEP 2: Prep and train your ML models
READ: Simplify Machine Learning Tech Note

STEP 3: Deploy models to production
READ: How to Productionize Your Machine Learning Models Using Apache Spark

Get started today: see tutorials and example videos at databricks.com/azure
