Building Streaming Data Pipelines Using Azure Cloud

Transcription

Rolf Tesmer
Microsoft Australia
Azure Cloud Solution Architect, Data Analytics AI
LinkedIn: https://www.linkedin.com/in/rolftesmer/
Blog: https://mrfoxsql.wordpress.com/
Building Streaming Data Pipelines Using Azure Cloud Services

My Assumptions for Today
Azure Portal

Basic Program for Today

Introduction

Why is data so important?
Because there’s just so much of it!
CLOUD
MOBILE

On-Prem vs IaaS vs PaaS vs SaaS – Which One?
Serverless

Azure Services – Which re101Cards/default.html

And so what exactly is a “data pipeline” anyway?
A pipeline is a set of data processing elements connected in series, where the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in time-sliced fashion.
A data pipeline is the software that consolidates data from multiple sources and makes it available to be used strategically.
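The definition above (each element's output is the next element's input) can be illustrated with a minimal Python sketch using chained generators. The element names, the `device` and `value` fields, and the alert limit are all hypothetical and chosen only for illustration; they do not come from the slides.

```python
import json
from typing import Iterable, Iterator


def parse(lines: Iterable[str]) -> Iterator[dict]:
    """Element 1: turn raw JSON text into event dictionaries."""
    for line in lines:
        yield json.loads(line)


def keep_valid(events: Iterable[dict]) -> Iterator[dict]:
    """Element 2: drop events that lack a numeric 'value' field."""
    for event in events:
        if isinstance(event.get("value"), (int, float)):
            yield event


def add_status(events: Iterable[dict], limit: float = 3.0) -> Iterator[dict]:
    """Element 3: tag each event as 'alert' or 'ok' (limit is an assumption)."""
    for event in events:
        status = "alert" if event["value"] > limit else "ok"
        yield {**event, "status": status}


# Hypothetical input: three raw messages, one of them missing a value.
raw = [
    '{"device": "a", "value": 1.5}',
    '{"device": "b"}',
    '{"device": "c", "value": 4.2}',
]

# The elements are connected in series: parse -> keep_valid -> add_status.
results = list(add_status(keep_valid(parse(raw))))
for event in results:
    print(event)
```

Because generators are lazy, each event flows through all three elements before the next one is read, which loosely mirrors the time-sliced execution mentioned in the definition.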

What is the LAMBDA architecture?
[Diagram labels: Data sources; Real-time message ingestion; Edge Computing; Stream processing; Data storage; Batch processing; Analytical data store; Analytics and reporting; BATCH LAYER; SPEED LAYER; SERVING LAYER]
m/en-us/services/iot-edge/
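The batch, speed and serving layers named on this slide can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the architecture shown in the talk: the function names, the `device` field and the per-device count are all hypothetical.

```python
from collections import Counter


def batch_view(archive_events):
    """Batch layer: recompute per-device counts from the full archive."""
    return Counter(event["device"] for event in archive_events)


def speed_view(recent_events):
    """Speed layer: count events that the last batch run has not yet seen."""
    return Counter(event["device"] for event in recent_events)


def serve(batch, speed):
    """Serving layer: merge both views so a query sees all events."""
    return batch + speed


# Hypothetical data: three archived events and two that just arrived.
archive = [{"device": "a"}, {"device": "a"}, {"device": "b"}]
recent = [{"device": "b"}, {"device": "c"}]

merged = serve(batch_view(archive), speed_view(recent))
print(dict(merged))
```

The design point is that the batch view is complete but stale, the speed view is fresh but partial, and the serving layer combines the two.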

Where did this come from, and why do we care?
1. Customers are on a multi-year transformational journey
2. Many data sources are not static or at rest
3. Solutions cannot wait for data to be landed before using it
4. Building pipelines
   Historically: complex, costly capital investment, time consuming
   Today: fast, simple, “fit for purpose” services, same data platform
As modern day Data Professionals we have to deal with it

What exactly is the Data Platform Nowadays?

What was the data platform?
Up until 5 years ago it was typically a relational platform that included relational-like services (OLTP, OLAP, DW, ETL, MDM), often on-prem or in a hosted DC, and rarely hosted with external public cloud providers.
Occasionally it included special projects (e.g. Big Data, NoSQL)
-exactly-is-the-data-platform-nowadays/

What is the data platform now?

Data Pipeline Services in Azure

What are some of the Azure pipeline services?
[Diagram: “Ridiculous Example Architecture”, arranged into SPEED LAYER, BATCH LAYER and SERVING LAYER. Labels: Incoming Data Flow; Event Hubs (AEH), Cloud Data Ingestion Point; Stream Analytics; Serverless Custom Code & Functions; Machine Learning API Calls; Cognitive API Calls; Intelligent Services; Operationalised Data Science; Logic App, Business Workflow / Logic; High Speed NoSQL Distributed Data Layer; Storage blob, Data Archive; Data Lake; Data Factory, Data Movement / Orchestration (Scheduled Pull, Full Load, Selective Load); SQL Database (ASDB) and SQL Data Warehouse (ASDW), Structured Storage; Reporting / Visualisation (Real-Time Report, Trend Report, Analytics Report)]

Demos / Examples: Let's see some Azure pipelines!

Demonstration: Mobile G-Force Solution
[Diagram labels: Mobile; IoT Hub; Event (JSON); Stream Analytics; G-Force Prediction API (Machine Learning); REAL-TIME Event Telemetry Report (Streaming Dataset); Azure Storage blob, Event Archive (JSON); SQL Database, All Events (CSV) and Alert Events (CSV); ON-DEMAND Event Trend Report (SQL Query); Alert Event, G-Force 3 (JSON); Event Hubs; EVENT-DRIVEN New Event Trigger (JSON); Function; Twilio Phone Call]
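The alert path in this demo (events flagged at “G-Force 3” leading to an event-driven phone call) can be approximated with a small routing sketch. Everything here is an assumption made for illustration: the `gforce` field name, the use of a strict greater-than comparison (the operator is not legible on the slide), and the `route` function itself. No Azure or Twilio API is called.

```python
ALERT_THRESHOLD = 3.0  # assumption: the slide reads "G-Force 3"; operator unknown


def route(events, threshold=ALERT_THRESHOLD):
    """Send every event to the 'all events' store and copy high-G events to alerts."""
    all_events, alerts = [], []
    for event in events:
        all_events.append(event)
        if event["gforce"] > threshold:  # hypothetical field name
            alerts.append(event)
    return all_events, alerts


# Hypothetical telemetry from a mobile device.
telemetry = [
    {"device": "phone-1", "gforce": 0.9},
    {"device": "phone-1", "gforce": 3.4},
    {"device": "phone-1", "gforce": 2.1},
]

all_events, alerts = route(telemetry)
print(len(all_events), "events stored,", len(alerts), "alert(s) raised")
```

In the demo the equivalent filtering appears to sit in the streaming layer, with the alert output triggering a function; the sketch only shows the split between the full stream and the alert subset.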

Demonstration: Mobile G-Force Solution

Other Examples: High Scale Web Search Telemetry
[Diagram labels: SH Data Streaming (West Europe); Servers sending JSON events; Event Hub SHIngress, Telemetry Input (Max: 3900/sec, Avg: 2300/sec); Stream Analytics, JSON Events Stream (HOT Path); Batch (COLD Path), AVRO Event Archive in Blob Store / ADLS SHEventStore, Data Archive (Avg: 56GB/day); Azure SQL Database SHEventHistory, Short Term Store (Max: 3900/sec, Avg: 2300/sec; 5 days 1b rows; 1 year 72b rows); SQL SP ref data; Stream Analytics Aggregation Path (3900/sec in, 1/min out, 1 Min Window); Power BI Real Time Dashboards, Realtime Stream (200K rows moving window), 1/min (200K tumbling window), On-Demand reports; Logic App Status Report 1/hour, JSON Report; Service Bus Queue SHSBQEgress 1/hour; Hourly alerts; ASA 2 sec; ASA SQL 5 sec; (troy.earle)]

Web Search Telemetry – Total Events (By Day)
AVG Workload
1,410,000,000 / week
201,000,000 / day
8,392,000 / hour
139,000 / min
2,330 / sec
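The average workload figures on this slide follow from the weekly total by simple division. A short Python check, assuming only the weekly figure from the slide (the variable names are illustrative):

```python
WEEKLY_EVENTS = 1_410_000_000  # weekly total taken from the slide

per_day = WEEKLY_EVENTS / 7
per_hour = per_day / 24
per_minute = per_hour / 60
per_second = per_minute / 60

print(f"{per_day:,.0f} / day")
print(f"{per_hour:,.0f} / hour")
print(f"{per_minute:,.0f} / min")
print(f"{per_second:,.0f} / sec")
```

This gives roughly 201.4 million per day, 8.39 million per hour, 139,900 per minute and 2,331 per second, which agrees with the rounded values shown on the slide.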

Web Search Telemetry – Events/Sec (By Hour)
600% increase over 9 hours

When is scale an issue? What do you mean by “scale”?
IoT Device – Streaming Telemetry Workload:
29,000 / sec
2,505,600,000 / day
914,544,000,000 / year
Lambda principles still apply!!
Ingestion: handling the “peak” rate without latency/delay/error
Processing/Speed: need data granularity, or are aggregate windows OK
Storage/Batch: need adhoc on-demand data engineering, or recurring
Serving: what granularity is important, what decisions will be made
Question: can you pre-process at “the edge”?
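The question of aggregate windows and pre-processing at the edge can be made concrete with a small sketch. The code below first reproduces the daily and yearly volumes from the 29,000 events per second on the slide, then shows one possible form of pre-aggregation: a tumbling (fixed, non-overlapping) window average. The function name, the 60-second window and the sample readings are assumptions for illustration only.

```python
from collections import defaultdict

EVENTS_PER_SECOND = 29_000  # peak rate taken from the slide
events_per_day = EVENTS_PER_SECOND * 86_400
events_per_year = events_per_day * 365


def tumbling_window_avg(readings, window_seconds=60):
    """Average (timestamp_seconds, value) readings per fixed, non-overlapping window."""
    buckets = defaultdict(list)
    for timestamp, value in readings:
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Hypothetical readings from one device: (seconds since start, value).
readings = [(0, 1.0), (30, 3.0), (61, 2.0), (119, 4.0), (120, 5.0)]
summary = tumbling_window_avg(readings)

print(f"{events_per_day:,} / day, {events_per_year:,} / year")
print(summary)
```

Sending one aggregate per window instead of every raw reading reduces ingestion volume, at the cost of the data granularity that the Processing/Speed point asks about.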

So where to from here?
Wrap up and summary

What’s next for the data platform? And what does this mean for us Data Professionals?
4. Customer “expectation”
This is the “Domain of the Data Professional”

Where can I try this out – or learn more?
Vehicle nd-Forecasting-3
Developing IoT Solutions with Azure ions-azure-iot-microsoft-dev225x
Processing Real-Time Data Streams in
rating Big Data with Azure Data g-data-azure-data-microsoft-dat223-3x-0

Your Homework: Twitter Social Media Analytics
[Diagram labels: Twitter (Tweets, @Handles, #Tags); Social Media Pipeline in Azure Public Cloud (Region: Australia SE); Logic App (Check Twitter Every 3 min); Function .Net (C#); Azure Cognitive Services Text Analytic API (Region: West US), Sentiment and Key Phrases; Azure Machine Learning ML Models (Region: Southeast Asia, optional); Azure SQL DB Sentiment Schema (Tweet Data, Sentiment, Key Phrases); Power BI Desktop (On-Prem), Power BI Data Connection; powerbi.com (Office 365), New Power BI Reports, Executive Marketing Dashboards, C Level Dashboards; On demand Data Science (optional)]
Social solution-templates/brand-management-twitter/

[End of Presentation]

Appendix

Appendix and Lambda -exactly-is-the-data-platform-nowadays/

Other Examples: Business Incident Management
[Diagram labels: Azure Cloud (Region: Australia South East); External – Business Event Message (JSON), Mobility; Secure endpoint; Azure Event Hub (PaaS), JSON msg (max 256KB), 1 Event/Msg; Azure Stream Analytics (PaaS), (push) Event msg (tabular data), (push) Original JSON msg; Azure Blob Store (RA-GRS) (private) (PaaS), folder structure example \EventMessages\yyyy\mm\dd ID Seq msg.csv; Azure SQL DW / DB (PaaS), stg tables, dw tables; Azure VM (IaaS), SQL SSIS, SQL SSAS (cubes), SQL SSRS, SQL Agent Scheduler; Business App Database (SQL) (pull); External Ref Data, External Data Sources (pull, SQL SSIS); Live/Batch Reporting (tabular data) (pull), Users; powerbi.com (SaaS), In-Stream Reporting (push) (Region: Southeast Asia), Cortana; Power BI Desktop (on-prem author), New Reports and Datasets; Future options: Azure Machine Learning / R, Live/Batch ML (R Language API calls); On demand Data Science; HDInsight (on demand analytics) (PaaS)]

Where can I find even more examples of this se?categories ["10"]&orderby freshness desc

cture/regions/

Microsoft Azure Data Services
transactional processing
rich query
managed as a service
elastic scale
schema-free data model
Internet accessible http/rest
arbitrary data formats

Azure Relational Database Platform (PaaS)
[Diagram labels: Database Services Platform; SQL Database; SQL Data Warehouse; CosmosDB (NoSQL) JSON Doc DB; Scale/Sizing Based on “Throughput Units”; Scale/Sizing Based on Cores ING!; Power BI, App Services, Data Factory, Analytics, ML, Cognitive, Bot; Azure Compute; Azure Storage]
Intelligent: Advisors, Tuning, Monitoring
Flexible: On-demand scaling, Resource governance
Trusted: HA/DR, Backup/Restore, Security, Audit, Isolation
Global Azure with 50 -stores

Get Azure?
Your Enterprise Agreement (EA)
  Various options – Currently being set up and configured for MLC
  Would be linked to your corporate identity/login/account
Azure 30 day free account up to 260 (time boxed to 30 days)
  https://azure.microsoft.com/en-au/free/
  Would be linked to your personal identity/login/account
MSDN
  There are free monthly Azure credits within MSDN subscriptions. Rolls over month to
  ber-offers/msdn-benefits-details/
  Would be linked to your corporate identity/login/account
Research Programs and Grants
  Free credits available for specific research programs
  rogram/microsoft-azure-for-research/
  You can apply for a Microsoft Azure for Research Grant
  default.aspx

Learn Azure? Free Online Training
edX – Free online courses on Microsoft Azure
45 Free Azure Courses: https://www.edx.org/course?search query azure
Introduction to Azure - azure-microsoft-azure201x
Architecting Azure Solutions - azure-solutions-microsoft-dev205bx-3
Developing Azure Solutions - ure-solutions-microsoft-dev233-1
Developing Apps and Bots - apps-bots-microsoft-dat211x-1
Deliver a DW in the Cloud - se-cloud-microsoft-dat220x-0
Delivery Big Data Solutions with Machine Learning - utions-azure-microsoft-dat228x
Provision SQL Databases in Azure - azure-sql-server-microsoft-dat219x-0

Learn Azure? Free 1x day In-Person
y/events/?Country Australia&query azure discovery day

Learn Azure? Patterns, Blogs and Feedback
Need Azure Patterns and Guidance? Check out the Azure Architecture Centre!
  Azure Architecture Centre - e/
  Reference Architectures - e/reference-architectures/
  Application Architectures - e/guide/
  Azure Design Patterns - e/patterns/
  Azure Service Roadmap - https://azure.microsoft.com/en-us/roadmap/
  Data Architecture Guide - e/data-guide/
Need Some Azure Updates? Subscribe to the Global Azure Blog & Update Feed!
  Blog - https://azure.microsoft.com/en-us/blog/
  Updates - https://azure.microsoft.com/en-us/updates/
You Have a Cool New Azure Idea? Submit it to Azure Ideas & Feedback!
  eedback

Certify in Azure?
Azure Certifications Overview - tion-overview.aspx
Detailed Guide - 3E78546B-4C71-9EC3-2CB7751444BF/MCP Cert Paths 01 01 18.pdf
Recommended

Certify in Azure?
MCSE Cloud Platform & Infrastructure

Certify in Azure?
MCSE Data Management & Analytics
