TMX Ben Sharma - Data.zaloni


Governing your cloud-based enterprise data lake
Selwyn Collaco, Chief Data Officer, TMX
Ben Sharma, Founder and CEO, Zaloni

TMX Data Strategy
Unlocking The Business Value Of DATA, An Enterprise Asset

Enterprise Data Strategy
“Our strategy is that we use DATA as an Enterprise (TMX) Asset to unlock Business Value and enable Growth”
1. Consolidated Data Asset
2. Secure & Accessible
3. Advanced Analytics & Visualization

Foundational Enablers for Transforming TMX Data Assets

Update On Our Journey
“Unlocking business value through our DATA”
Cloud
Client C360
Holistic view of our customers through TMX
Analytics as a Service
Use case driven search based analytics
Enterprise Data Platform
Enterprise Data Lake – Enterprise Wide data repository
- Ad-hoc analytical capability in milliseconds
- Real-time calculations
- White label for monetization opportunities
Data Platform supporting Business Insights
Technology enabling new Business Capabilities
- Research & Product Development
- Improved data fulfilment
Advanced Analytics
Data Governance
Enterprise wide BI and Data Visualization
- Reduce time and effort for Reporting & Analysis
- Self-serve BI and Empowered business users
- Improved Quality and Better insights
Governance process framework for managing data
Enterprise BI
- Enable cross sell & up sell
- Enterprise-Wide Account management
- Enterprise Client Relationship view
- Data ownership
- Data classification
- Data quality

Enterprise Data & Analytics Platform
ENABLING NEW CAPABILITIES: SINGLE SOURCE OF DATA | DATA MARKET PLACE | 360° VIEW | ADVANCED ANALYTICS & VISUALIZATION
TMX Data Assets: Trading Data (Equities & Derivatives), Security Related Data, ETF Broker, Market Data, Clearing Data, Listing Information, Billing / Finance Data, Order book, Customer Data (Salesforce), Alternate Data
TMX Enterprise Data Repository (Data Lake):
- Ingest: Batch/Streaming; File/Relational; Internal/Externally Hosted/Third Party
- Govern: Master and Metadata Management; Data Lineage; Data Security & Access controls; Data Catalog; Data Quality
- Distribute: Self-Service Data Consumption; Data Discovery
Analyst Workbench: Advanced Analytics, Analytics, Visualization, Data Prep, Data Profiling & Data Preparation, GRAPEVINE Plug & Play Applications, Custom Solutions, Analytics As a Service, Self-Service BI & Reporting, Pre-defined Analytic Templates
GOVERNANCE: DATA POLICIES & OVERSIGHT | DATA DEFINITION | DATA MANAGEMENT | DATA CLASSIFICATION

Non-Invasive Data Governance Approach
In order to develop a Data Governance structure that is right-sized for TMX, the proposed Enterprise Data Governance structure has been developed based on key observations identified through stakeholder interviews, which provided insights into key opportunities and challenges, the data domain structure¹ and in-flight data initiatives.
TMX Group:
- Global Solutions, Insights and Analytics (GSIA)
- Global Equity Capital Markets (includes Equity Trading & Listings)
- Montreal Exchange (MX) & Regulatory (MXR)
- CDS / CDCC
- TSX Trust
- Shorcan
- Enterprise Strategy
- Global Technology Solutions (GTS)
- Finance
- Marketing
- Legal & ERM
- Human Resources
Note: Working sessions did not include Trayport, as it was a recent acquisition.
Key Opportunities and Challenges Identified: Duplicative Data; Third Party Data Management; Lack of data ownership; Data accessibility and security; Data Sharing Capabilities; Data retention / access to historical data. An ongoing responsibility of the Data Governance bodies will be to prioritize and address data opportunities and challenges.
Data Domains: Trading – Deriv.; Trading – Equities; Trading – FI; Listings; Clearing – Deriv.; Clearing – Equities; Regulatory – Deriv.; Finance; Marketing; Customer; Legal; Human Resources; Risk; Tech. Operations; External/Alternative
In-flight Initiatives: Manual Processes; Client 360; Advanced Analytics Platform; Issuer Services; Excel; Workday; Tableau; Atlas
Enterprise Data Governance Structure: Based on the identified key opportunities and challenges, as well as TMX’s data domain structure, a two-tier structure is created. Owners for each domain have been identified and will serve as the foundation for the EDGC; an in-flight project will be selected for preliminary implementation.
¹ Data domains and data domain owners are further detailed in the TMX Data Asset Catalogue.

Enterprise Data Platform – Key Technology Options
There are a number of technology solutions available in the marketplace for building the Enterprise Data Platform. The technologies required for the Data Platform fall into THREE categories. The chart describes the categories and the key options within each category.
Technology Platform:
- On-Premise Servers
- Cloud – AWS*, Azure
Big Data Hadoop Distribution:
- Cloudera (On-prem, Cloud)
- Hortonworks (On-prem, Cloud)
- EMR (AWS)
Big Data Management Tool:
- Informatica (Enterprise ETL)
- IBM DataStage (Enterprise ETL)
- Zaloni (Data Lake)
Legend: Technology Selected

Transform your business with an integrated self-service data platform
September 12, 2018

Today’s data confusion
Zaloni Confidential and Proprietary - Provided under NDA

The key is an integrated self-service data platform
1. Manage data across on-prem and cloud environments
2. Metadata management to find useful data and enable governance
3. Automation of data pipelines for scale and efficiency
4. Improve time to insight with self-service for data consumers
Diagram: Advanced Analytics, BI, Enterprise Reporting, Applications / Data Service / Zaloni Data Platform / Compute / Storage
© Zaloni Inc. 2018, All rights reserved

No matter where you are on your data lake journey.
Stages shown: Traditional Architecture / Data Warehouse; Data Swamp / Data Ponds; Governed Data Lake / Scale Out Architecture
Modernize your platform:
- Need more agility and flexibility
- Need advanced analytics to reduce time to insight
- Do not want the complexity of building and integrating the platform
Improve governance and adoption:
- Early Hadoop adopters
- Not sure what data is in the swamp
- Need to right-size governance
- Data duplicated across ponds
Enterprise adoption:
- Need to demonstrate ROI
Self-service data platform:
- Acquire useful data from across enterprise
- Improved visibility and understanding via metadata management
- Ensure security/privacy of sensitive data
- Scalable production data lake for new and improved business insights

Typical data lake reference architecture
Sources: Relational Data Stores (OLTP/ODS/DW); Logs (or other unstructured data); Sensors (or other time series data); Social and shared data
Data Lake zones:
- Transient Landing Zone: Temporary store of source data. Consumers are IT, Data Stewards. Implemented in highly regulated industries.
- Raw Zone: Original source data ready for consumption. Single source of truth with history. Consumers are developers, data stewards, some data scientists.
- Trusted Zone: Standardized on corporate governance/quality policies. Consumers are anyone with appropriate role-based access.
- Refined Zone: Data required for LOB specific views, transformed from existing certified data. Consumers are anyone with appropriate role-based access.
- Sandbox: Ad-hoc exploratory analysis. Data scientists, data consumers with proper privileges.
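The zone progression above can be sketched as a small promotion gate. This is a minimal illustration under stated assumptions, not ZDP's implementation: the `Zone` ordering, the `Dataset` record and the rule that only quality-checked data may enter the trusted zone are hypothetical. The sandbox is left out because it sits beside the main flow rather than in it.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Zone(IntEnum):
    """Hypothetical ordering of the main-flow zones (sandbox deliberately excluded)."""
    TRANSIENT = 0
    RAW = 1
    TRUSTED = 2
    REFINED = 3


@dataclass
class Dataset:
    name: str
    zone: Zone = Zone.TRANSIENT
    dq_passed: bool = False  # assumed flag set by an upstream data quality step
    history: list = field(default_factory=list)


def promote(ds: Dataset) -> Dataset:
    """Move a dataset one zone forward, enforcing illustrative governance gates."""
    if ds.zone == Zone.REFINED:
        raise ValueError(f"{ds.name} is already in the refined zone")
    target = Zone(ds.zone + 1)
    # Assumed policy: data must pass quality checks before it becomes "trusted"
    if target == Zone.TRUSTED and not ds.dq_passed:
        raise PermissionError(f"{ds.name} has not passed data quality checks")
    ds.history.append((ds.zone.name, target.name))  # simple audit trail of moves
    ds.zone = target
    return ds


if __name__ == "__main__":
    trades = Dataset("trades")
    promote(trades)  # transient -> raw
    trades.dq_passed = True
    promote(trades)  # raw -> trusted
    print(trades.zone.name, trades.history)
```

Raising an error instead of silently skipping a gate makes a blocked promotion visible to data stewards, and the recorded history gives a minimal lineage of how the dataset moved between zones.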

Zaloni’s integrated self-service data platform (ZDP) accelerates time to insight
Provides the foundation for your data initiatives: Enable | Govern | Engage
Capabilities: Batch Ingestion, Streaming Ingestion, Metadata Capture, Auto Discovery, Data Quality, Data Marketplace, Data Lineage, Self-Service, Operations, Data Mastering, Data Privacy/Security, Data Enrichment, Data Lifecycle Management
Cloud, On-Premises, Hybrid: infrastructure and cloud-platform agnostic

Enable the Data Lake with ZDP
- Scale out ingestion of data
  - Batch and streaming ingestion
  - DB ingestion with support for full/incremental updates
  - Metadata capture and integration with Data Pipeline
  - Ingestion Wizard simplifies ingestion and creates reusable workflows
- Automated data inventory
  - Crawls the data store to catalog datasets
  - Automatically detects metadata in several cases
- Workflow and orchestration
  - Robust workflow management feature with scalable and extensible architecture
- Monitor the health of the data lake
  - Single unified log repository for all data lake operations
  - Dashboards for operational health of the data lake
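DB ingestion with full/incremental updates is often done with a high-watermark column. The sketch below assumes that technique, a hypothetical `updated_at` timestamp column stored as ISO-8601 text, and Python's built-in `sqlite3` standing in for the source database; it illustrates the pattern only and is not how ZDP implements ingestion.

```python
import sqlite3
from typing import Optional


def ingest(conn: sqlite3.Connection, table: str, ts_col: str,
           watermark: Optional[str] = None) -> tuple[list[dict], Optional[str]]:
    """Full load when watermark is None; otherwise fetch only rows newer than it.

    Returns the fetched rows and the watermark to store for the next run.
    """
    conn.row_factory = sqlite3.Row  # rows can be read by column name
    # Identifiers are interpolated directly, so they must come from trusted config
    sql = f"SELECT * FROM {table}"
    params: tuple = ()
    if watermark is not None:
        sql += f" WHERE {ts_col} > ?"
        params = (watermark,)
    sql += f" ORDER BY {ts_col}"
    rows = [dict(r) for r in conn.execute(sql, params)]
    # Keep the old watermark if nothing new arrived
    new_watermark = rows[-1][ts_col] if rows else watermark
    return rows, new_watermark


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trades (id INTEGER, updated_at TEXT)")
    conn.executemany("INSERT INTO trades VALUES (?, ?)",
                     [(1, "2018-09-01"), (2, "2018-09-02")])
    rows, wm = ingest(conn, "trades", "updated_at")       # full load
    conn.execute("INSERT INTO trades VALUES (3, '2018-09-03')")
    rows, wm = ingest(conn, "trades", "updated_at", wm)   # incremental load
    print(rows, wm)
```

A strict `>` comparison can miss late-arriving rows that share the last seen timestamp; pipelines commonly re-read a small overlap window or use a monotonically increasing key instead.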

Govern the Data Lake with ZDP
- Metadata management
  - Business, technical and operational metadata
  - Captures lineage from ingestion to consumption
  - Integration with Enterprise Metadata repositories
- Security and access control
  - Masking and Tokenization of sensitive data
  - RBAC for data lake artifacts – Entities, DQ rules, ingestion with push down to underlying security framework
  - Supports kerberized environment
- Data quality
  - Rules Engine for DQ
  - DQ reporting and analysis
  - Automation of DQ remediation process
- Data enrichment
  - Drag and drop enrichment of the data
  - Support for data operations – JOIN, FILTER, UNION, etc.
  - Out of the box format conversion, parsers with ability to extend with custom implementations
- Data lifecycle management
  - Policy based DLM
  - Leverages HDFS storage tiers and supports S3 interface
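The data quality and masking/tokenization bullets can be illustrated together. The sketch below uses a generic rule-engine pattern plus two common protection techniques (partial masking and HMAC-based deterministic tokenization), swapped in for illustration; `Rule`, `run_rules`, `tokenize`, `mask` and the sample columns are hypothetical and do not describe ZDP's rules engine or its tokenization scheme.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Rule:
    """A hypothetical declarative data quality rule on a single column."""
    name: str
    column: str
    check: Callable[[Any], bool]


def run_rules(records: list[dict], rules: list[Rule]) -> tuple[list[dict], list[dict]]:
    """Split records into those passing every rule and a report of failures."""
    passed, failures = [], []
    for i, rec in enumerate(records):
        broken = [r.name for r in rules if not r.check(rec.get(r.column))]
        if broken:
            failures.append({"row": i, "rules": broken})
        else:
            passed.append(rec)
    return passed, failures


def tokenize(value: str, key: bytes) -> str:
    """Deterministic one-way token via HMAC-SHA256: equal inputs give equal tokens."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]


def mask(value: str, keep: int = 4) -> str:
    """Replace all but the last `keep` characters with '*'."""
    cut = max(len(value) - keep, 0)
    return "*" * cut + value[cut:]


if __name__ == "__main__":
    rules = [
        Rule("account_present", "account", lambda v: bool(v)),
        Rule("qty_positive", "qty", lambda v: isinstance(v, int) and v > 0),
    ]
    records = [{"account": "123456789", "qty": 10},
               {"account": None, "qty": -5}]
    good, report = run_rules(records, rules)
    key = b"demo-key-only"  # assumption: a real key would come from a secrets store
    safe = [{**r, "account": mask(r["account"]),
             "account_token": tokenize(r["account"], key)} for r in good]
    print(report)
    print(safe)
```

Deterministic tokens keep joins working across datasets, since the same account always maps to the same token; the trade-off is that anyone holding the key can test guesses, so the key itself must be protected.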

Engage the Business with ZDP
- Data marketplace
  - Rich data catalog that aggregates business, technical and operational metadata
  - Global search allows data consumers and producers to search across data lake zones, projects, workflows, data quality rules, transformations and other related content
  - Annotate, Tag, Rate and create custom metadata
  - Workspaces for collaboration
- Data provisioning
  - Shopping cart experience
  - Sandbox provisioning into the data lake or RDBMS
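The global search, tagging and rating features can be sketched as an in-memory catalog. This is a minimal illustration assuming hypothetical `CatalogEntry` fields and a simple substring-plus-tag match ranked by average rating; a production marketplace would query a search index over persisted metadata rather than scan a list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CatalogEntry:
    """Hypothetical catalog record combining technical and business metadata."""
    name: str
    kind: str            # e.g. "entity", "workflow", "dq_rule"
    zone: str            # e.g. "raw", "trusted", "refined"
    description: str = ""
    tags: set[str] = field(default_factory=set)
    ratings: list[int] = field(default_factory=list)

    @property
    def avg_rating(self) -> float:
        return sum(self.ratings) / len(self.ratings) if self.ratings else 0.0


class Catalog:
    def __init__(self) -> None:
        self.entries: list[CatalogEntry] = []

    def add(self, entry: CatalogEntry) -> None:
        self.entries.append(entry)

    def search(self, term: str, zone: Optional[str] = None) -> list[CatalogEntry]:
        """Case-insensitive match on name, description or exact tag, best-rated first."""
        term = term.lower()
        hits = [
            e for e in self.entries
            if (zone is None or e.zone == zone)
            and (term in e.name.lower()
                 or term in e.description.lower()
                 or term in {t.lower() for t in e.tags})
        ]
        return sorted(hits, key=lambda e: e.avg_rating, reverse=True)


if __name__ == "__main__":
    cat = Catalog()
    cat.add(CatalogEntry("equity_trades", "entity", "trusted", "Daily equity fills",
                         {"trading"}, [5, 4]))
    cat.add(CatalogEntry("deriv_trades", "entity", "raw", "Derivatives trades",
                         {"trading"}, [3]))
    for e in cat.search("trading"):
        print(e.name, e.zone, e.avg_rating)
```

Ranking by consumer ratings is one way to surface well-regarded datasets first; filtering by zone lets a consumer restrict results to, for example, trusted data only.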

AWS Data Lake Platform Components
Layers: Sources, Ingestion, Storage Layer, Processing Layer, Serving Layer, Consumption Layer, plus Data Mgmt and Governance
Components shown: Kinesis, S3, EFS, Glacier, EMR, Athena, Redshift, RDS, DynamoDB, ES, QuickSight, ML, Databases, Zaloni Data Platform, Zaloni …aS APIs
Security and Networking components are included but not shown in this architecture.

Solution Architecture for S3 Data … Data Platform
Ingest: Storage Gateway, Kinesis Firehose
Zones: Landing zone, Raw, Refined, Sandbox
Process; Query (Athena)

Visit Booth #1115
The Data Imperative
Ben Sharma, CEO Zaloni
Thursday Keynotes, 10:20am – 3E
Complimentary copy of “Architecting Data Lakes”, 2nd edition
