IMS Data Integration With Hadoop - KIESSLICH CONSULTING

Transcription

IMS Data Integration with Hadoop
Karen Durward, InfoSphere Product Manager
17/03/2015, IMS Technical Symposium 2015

z/OS Structured Data Integration for Big Data
- The Big Data Landscape
  - Introduction to Hadoop: what, why, how
  - The IMS community cares about Hadoop because of the exponential value at the intersection of Hadoop and structured data
- z/OS Data Integration with Hadoop: the requirements and the journey
  - Hadoop on z?
  - z/OS data integration
- InfoSphere System z Connector for Hadoop: fast, easy, low-investment z/OS data delivery to HDFS and Hive
- InfoSphere Data Replication: keeping your Big Data up to date

Hadoop: Created to Go Beyond Traditional Database Technologies
Foundation for Exploratory Analytics

Hadoop was pioneered by Google and Yahoo! to address issues they were having with then-available database technology:
- Data volumes could not be cost-effectively managed using database technologies
- Analyzing larger volumes of data can provide better results than sampling smaller amounts
- Insights needed to be mined from unstructured data types
- Data was being explored to understand its potential value to the business

Typical use cases:
- Analyze a variety of information: novel analytics on a broad set of mixed information that could not be analyzed before
- Analyze extreme volumes of information: cost-efficiently process and analyze petabytes of information
- Discovery and experimentation: a quick and easy sandbox to explore data and determine its value

What is Hadoop? Divide and Conquer!
A rapidly evolving open source software framework for creating and using hardware clusters to process vast amounts of data. The 2.x version of the framework consists of:
- Common Core: the basic modules (libraries and utilities) on which all components are built
- Hadoop Distributed File System (HDFS): manages data stored on multiple machines for very high aggregate bandwidth across the "cluster" of machines
- MapReduce: a programming model to support high-volume processing of data in the cluster
- "Yet Another Resource Negotiator" (YARN): a platform to manage the cluster's compute resources, de-coupling Hadoop workload and resource management

Typical Hadoop Data Flow
1. Load data into an HDFS cluster
- Optimize for parallelism and reliability
- Break the source into large blocks (typically 64 MB) so that each block can be written (and read) independently
- Each block is written to a node; the more nodes, the more dispersion of the data for future parallel processing
- Redundant copies of each block are maintained on separate nodes to protect against hardware failures
- Redundancy for reliability is configurable; three copies is the typical default. Simplistically, the more unreliable the environment, the higher the degree of redundancy required

(Diagram: a client writes a source file via the NameNode, which holds the metadata, to multiple DataNodes.)
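As a concrete illustration of this load step, here is a minimal Java sketch using the standard Hadoop FileSystem API. It is not from the slides; the NameNode URI, the file paths, and the explicit three-way replication setting are illustrative assumptions.

```java
// Minimal sketch: load a local file into HDFS with the Hadoop FileSystem API.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.net.URI;

public class HdfsLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Three-way redundancy, the typical default described above (assumed setting).
        conf.set("dfs.replication", "3");
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        // The client streams the file; HDFS splits it into blocks and the
        // NameNode places the replicas on separate DataNodes.
        fs.copyFromLocalFile(new Path("/tmp/source.dat"),
                             new Path("/data/source.dat"));
        fs.close();
    }
}
```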

Typical Hadoop Data Flow
1. Load data into an HDFS cluster
2. Analyze the data in the cluster using a MapReduce "program"
- Map: drive analysis of the data
- Reduce: construct a result, writing it back into the cluster
3. Use the result!

(Diagram: a client submits a job to the master, which maps input splits across the cluster and reduces them into output files.)
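To make the Map and Reduce roles concrete, here is the classic word-count job against the standard Hadoop MapReduce API. It is the canonical illustration of the model, not an example taken from the slides.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map: emit (word, 1) for every word in an input split.
    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }
    }

    // Reduce: sum the counts per word; the result is written back into the cluster.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```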

Hadoop Cluster Configuration Implications
- Redundancy drives network traffic: with three-way redundancy, each terabyte of data loaded results in three terabytes of network traffic
- Parallelism drives performance
  - Scale OUT (more nodes) and/or spread files OUT (more blocks): spreading blocks of data across more nodes lets more blocks be read in parallel, and network activity is spread across many nodes/files; a file can be spread to more nodes if you have enough nodes
  - Scale UP (more CPUs and/or memory per node rather than more nodes): increases the density of each node, concentrating more network activity on each node

The Landscape is Rapidly Evolving
More Apache frameworks and products you'll hear about:
- Hive: Apache data warehouse framework accessible using HiveQL
- Spark: in-memory framework providing an alternative to MapReduce
- HBase: the Apache Hadoop database
- Pig: high-level platform for creating programs that run on Hadoop
- ZooKeeper: infrastructure and services enabling synchronization across large clusters
- Flume: collects and integrates data into Hadoop, coordinating web/app services
- Oozie: workflow processing connecting multiple types of Hadoop jobs
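Hive matters most to the rest of this deck, since the connector described later lands mainframe data in Hive tables. A brief sketch of issuing HiveQL from Java over the HiveServer2 JDBC driver follows; the driver class and URL scheme are as documented for Apache Hive, while the host, port, credentials, table, and query are illustrative assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuery {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://hiveserver:10000/default", "user", "");
             Statement stmt = conn.createStatement();
             // HiveQL looks like SQL but is compiled into jobs that run on the cluster.
             ResultSet rs = stmt.executeQuery(
                 "SELECT account_id, COUNT(*) FROM transactions GROUP BY account_id")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```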

z/OS Structured Data Integration for Big Data
- The Big Data Landscape
  - Introduction to Hadoop: what, why, how
  - The IMS community cares about Hadoop because of the exponential value at the intersection of Hadoop and structured data
- Starting on the z/OS data integration with Hadoop journey
- InfoSphere System z Connector for Hadoop: fast, easy, low-investment z/OS data delivery to HDFS and Hive
- Keeping Big Data current: InfoSphere Data Replication, continuous incremental updates

Imagine the Possibility of Leveraging All of Your Information Assets
- Precise fraud and risk detection
- Understand and act on customer sentiment
- Accurate and timely threat detection
- Predict and act on intent to purchase
- Low-latency network analysis

Traditional approach: structured, analytical, logical. The data is rich, historical, private, and structured (customers, transactions, OLTP systems, ERP data, internal app data, mainframe data) from traditional sources. Repeatable and linear: "Here's a question. What's the answer?"

New approach: creative, holistic thought and intuition. The data is intimate and unstructured (social, mobile, GPS, web logs, multimedia, photos, video, email, sensor and RFID data; text full of ideas, questions, and answers) from new sources. Exploratory and dynamic: "Here's some data. What does it tell me?"

Transformational benefit comes from integrating "new" data and methods with traditional ones!

This is Driving the Shifting Sands of Enterprise IT
New ways of thinking for transformative economics

Traditional approach -> New approach
- Vertical infrastructure -> Distributed data grids
- Design schemas in advance -> Evolve schemas on-the-fly
- What data should I keep? -> Keep everything, just in case
- What reports do I need? -> Test theories, model on-the-fly
- ETL, down-sample, aggregate -> Knowledge from raw data
- On-premise -> On-premise, cloud, hybrid

Transaction and Log Data Dominate Big Data Deployments
A very high proportion of this data resides on System z.
N = 465 (multiple responses allowed)
Source: Gartner research note "Survey Analysis: Big Data Adoption in 2013 Shows Substance Behind the Hype", September 12, 2013. Analysts: Lisa Kart, Nick Heudecker, Frank Buytendijk.

Hadoop and System z: The Requirements and the Journey
IMS Technical Symposium 2015

Hadoop Topology Choices for System z Data

Processing done outside z (extract and move data over the network):
- Petabytes possible; near linear scale
- External data NOT routed through System z
- Rapidly provision new node(s)
- Data is outside System z control, with challenges around governance

Processing done on z (Hadoop cluster on z Linux; extract and consume, so z controls the process):
- Gigabytes to terabytes reasonable
- Additional infrastructure, with challenges around scale and ingestion
- System z governance for the result set
- System z is the control point

The most likely driver for Hadoop on System z: weighing the value of exploratory analytic models applied to System z data against the risk associated with numerous copies of sensitive data dispersed across commodity servers with broad access.

Hadoop on System z: What Makes Sense When?

Case 1: Hadoop on the mainframe
- Data originates mostly on the mainframe (log files, database extracts) and data security is important
- z governance and security models needed
- Network volume or security concerns
- Moderate volumes: 100 GB to 10s of TBs
- Hadoop value comes from rich exploratory analytics (hybrid transaction-analytic appliances remain for traditional analytics)
(Diagram: DB2, VSAM, QSAM, IMS, SMF, RMF, and log sources on z/OS CPs feed Linux guests under z/VM on IFLs.)

Case 2: Hadoop off the mainframe
- Most data originates off of the mainframe
- Security is less of a concern since the data is not "trusted" anyway
- Very large data sets: 100s of TBs to PBs
- Hadoop is valued for its ability to economically manage large datasets
- Desire to leverage lowest-cost processing and potentially cloud elasticity

IBM InfoSphere BigInsights Uniquely Offers Multiple Technology Options and Deployment Models
Intel servers, IBM Power, IBM System z, on cloud

IBM's InfoSphere z/OS Data Integration with Hadoop
From the Sandbox to the Enterprise

The simplicity, speed, and security of the System z Connector load deliver a sandbox and jump-start all of your Big Data projects.
(Diagram: the InfoSphere System z Connector for Hadoop moves VSAM, QSAM, SMF/RMF, and IMS data from z/OS either to Linux for System z on IFLs or to Linux on x-based or Power systems.)

IBM's InfoSphere z/OS Data Integration with Hadoop
From the Sandbox to the Enterprise

Optimized, change-only, real-time data replication keeps your Big Data current with lower operational cost.
(Diagram: InfoSphere Data Replication with InfoSphere Classic CDC captures DB2, IMS, CICS, and log changes, while the System z Connector for Hadoop moves VSAM, QSAM, SMF/RMF, and IMS data to Linux on System z, Power, or x.)

IBM's InfoSphere z/OS Data Integration with Hadoop
From the Sandbox to the Enterprise

Rich data integration, transformation, and governance bridge traditional data warehousing with Big Data for the enterprise.
(Diagram: InfoSphere Information Server for System z (DataStage) on z/OS and InfoSphere Information Server (DataStage) on Linux join InfoSphere Data Replication, InfoSphere Classic CDC, and the System z Connector for Hadoop across DB2, IMS, CICS, log, VSAM, QSAM, and SMF/RMF sources.)

z/OS Data Integration with Hadoop: The InfoSphere System z Connector for Hadoop
IMS Technical Symposium 2015

IBM InfoSphere System z Connector for Hadoop
Setup in hours, generated basic transforms, interactive or scheduled

EXTRACT -> MOVE & CONVERT -> ANALYZE
- Extract (no SQL, no COBOL): IMS databases, DB2 for z/OS, VSAM files, sequential files, Syslog, Operlog, SMF/RMF on z/OS
- Move and convert (automatic transforms): click and copy over HiperSockets or OSA, inserting into HDFS
- Analyze (leverage Hadoop): on Linux (System z, Power, or System x)

IBM InfoSphere System z Connector for Hadoop
Leverage z data with Hadoop on your platform of choice:
- IBM System z for security: z/OS sources (DB2, VSAM, SMF, IMS, logs) feed the connector running on Linux for System z under z/VM on IFLs
- Power Systems and Intel servers for lower-cost processing and storage
- Point-and-click or batch self-service data access
- MapReduce, HBase, and Hive over HDFS on the target

IBM InfoSphere System z Connector for Hadoop
Key Product Features
- Supports multiple Hadoop distributions, on and off the mainframe
- Multiple source formats: IMS, DB2, VSAM/QSAM, log files
- HiperSockets and 10 GbE data transfer
- Drag-and-drop interface; no programming required
- Multiple destinations: Hadoop, Linux file systems, streaming endpoints
- Define multiple data transfer profiles
- Streaming interface: filter and translate data and columns on the fly, on the target
- Secure pipe: RACF integration
- Preserves metadata, landing mainframe data in Hive tables

IBM InfoSphere System z Connector for Hadoop
Architectural View
(Diagram: a vHub servlet running under Tomcat on a Linux server (z/VM guest) or on distributed Linux (Intel or Power) connects z/OS sources such as IMS and log files to the data platform's analysis, plug-in, and visualization layers; users work through a client applet.)

IBM InfoSphere System z Connector for Hadoop
Architectural View

USS components on z/OS:
- Trigger a DB2 unload or high-performance unload, streaming the binary result to the target; no z/OS DASD is involved
- Read other sources (e.g., using IMS's JDBC interface), streaming binary results to the target
Note: log file transfers pick up from the date/time stamp of the previous transfer, automating target updates.
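As a rough illustration of what reading IMS through a JDBC interface looks like, here is a sketch against the IMS Universal JDBC driver. The driver class and URL scheme follow IBM's published driver; the host, port, database-view metadata class, table, and column names are hypothetical, and this is not the connector's internal code.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImsJdbcRead {
    public static void main(String[] args) throws Exception {
        Class.forName("com.ibm.ims.jdbc.IMSDriver");
        // The URL names the IMS Connect host/port and the generated
        // database-view metadata class for the PSB being accessed (assumed values).
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:ims://zhost:5555/class://com.example.CustomerDatabaseView",
                 "user", "password");
             Statement stmt = conn.createStatement();
             // IMS segments are exposed as relational tables, so the data
             // can be streamed off in a row format ready for conversion.
             ResultSet rs = stmt.executeQuery("SELECT * FROM CUSTOMER")) {
            while (rs.next()) {
                System.out.println(rs.getString("CUSTNAME"));
            }
        }
    }
}
```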

IBM InfoSphere System z Connector for Hadoop
Architectural View

Linux components:
- Apply row-level filtering
- Reformat the data
- Write to the target HDFS/Hive
- Write the metadata in Hive

InfoSphere System z Connector for Hadoop: Summary
- A secure pipe for data
  - RACF integration, standard credentials
  - Data streamed over a secure channel using hardware crypto
- Rapid deployment: integrating z/OS data in a few hours
- Easy-to-use ingestion engine
  - Lightweight; no programming required
  - Native data collectors accessed via a graphical user interface
  - Wide variety of data sources supported
  - Conversions handled automatically
  - Streaming technology neither loads z/OS engines nor requires DASD for staging

Best use cases:
- HDFS/Hive sandbox for initial deployments: explore your data; easy to set up in hours, not days!
- Operational analytics using z/OS log data (SMF, RMF, ...): exploring operational data using Hadoop on day one!
- Moderate volumes (100s of GBs to 10s of TBs) of transactional data, where the source of the data is z/OS and security may be a primary concern

z/OS Data Integration with Hadoop: Keeping Your Hadoop Data Current
IMS Technical Symposium 2015

IBM's InfoSphere Data Replication (IIDR) Coverage

Sources: DB2 (z/OS, i, LUW), Informix, Oracle, MS SQL Server, Sybase, IMS, VSAM
Targets: DB2 z/OS, DB2 (i, LUW), Informix, Oracle, MS SQL Server, Sybase, PureData for Analytics (Netezza), Teradata, Information Server, HDFS/Hive, message queues, files, ESB, Cognos Now, FlexRep (JDBC targets), customized apply (MySQL, GreenPlum, VSAM)

IMS to Hadoop
Read logs - send committed changes - apply changes

WHAT: IMS logging
ACTION: IMS replication logs, developed for IMS V10 and higher, exist specifically to support IMS-to-IMS and IMS-to-non-IMS data replication.

(Diagram: in the source server, IBM InfoSphere Classic CDC's Classic Server reads and merges the IMS logs and RECON, captures units of recovery (UORs), and sends them over TCP/IP to the target engine of IBM InfoSphere Data Replication, where an apply agent writes the changes; Classic Data Architect, a management console, an access server, and admin agents/services provide administration.)

IMS to Non-IMS Data Replication
Read logs - send committed changes - apply changes

WHAT: IMS log reader
ACTION: An IMS log reader capable of capturing changes from BOTH local and remote logs. It ensures proper sequencing of committed changes for a single local IMS instance or for multiple logs in an IMSplex.
IMPACT: One centralized log reader instance replaces the multiple instances required by Classic DEP v9.5, simplifying deployment and reducing overhead.

IMS to Non-IMS Data Replication
Read logs - send committed changes - apply changes

WHAT: Capture engine
ACTION:
- Maintain conversations with the target servers
- Manage transaction boundaries
- Buffer multiple changes for performance

InfoSphere Data Replication for IMS for z/OS
Details of IMS Source Capture

WHAT: A Classic routine associated with IMS's Partner Program Exit.
ACTION: Notify the Classic Server when an IMS system (including BMP and DBCTL) starts.

WHAT: A Classic routine associated with IMS's Log Exit.
ACTION: Notify the Classic Server when a batch DL/I job starts or stops.

(Diagram: the exit routines send TCP/IP notifications to the log reader service in the source server, which obtains log information through the DBRC API, reads the IMS logs, and feeds the capture services.)

IMS to Non-IMS Data Replication
Read logs - send committed changes - apply changes

WHAT: Classic Server
ACTION: The latest version of the Classic Server, used by all IBM InfoSphere Classic products, does the following:
- Reformats IMS data into a relational format
- Converses with IIDR target engines
- Reads source IMS data for full refresh
IMPACT: The same handling of IMS data as provided in Classic DEP.

IMS to Non-IMS Data Replication
Read logs - send committed changes - apply changes

WHAT: IIDR Target Server
ACTION:
- Applies changes to the target(s) while maintaining restart information in a local bookmark for recovery purposes
- Performs additional transformations
- Integrates with many other InfoSphere solutions
- Is available for z/OS, Linux on System z, Linux, UNIX, and Windows
IMPACT: One target server serves tens of targets regardless of the source.
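The slides show no code, but the apply pattern they describe (receive committed units of recovery in commit order, apply each to the target, persist a restart bookmark) can be sketched generically. Everything below is a hypothetical Java sketch of that pattern, not IIDR's actual API.

```java
import java.util.List;

interface ChangeSource {
    // Next committed unit of recovery, delivered in commit order.
    CommittedUor next() throws Exception;
}

record CommittedUor(long sequence, List<String> rows) {}

interface Target {
    void applyBatch(List<String> rows) throws Exception; // e.g. write to HDFS/Hive
    void saveBookmark(long sequence) throws Exception;   // local restart information
    long readBookmark() throws Exception;
}

public class ApplyAgent {
    public static void run(ChangeSource source, Target target) throws Exception {
        long bookmark = target.readBookmark();
        while (true) {
            CommittedUor uor = source.next();
            if (uor.sequence() <= bookmark) continue; // already applied before a restart
            target.applyBatch(uor.rows());            // apply the whole UOR, preserving
                                                      // transaction boundaries
            target.saveBookmark(uor.sequence());      // advance only after a durable apply
            bookmark = uor.sequence();
        }
    }
}
```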

Thank you!

IBM InfoSphere System z Connector for Hadoop
Point-and-click access to z/OS-resident data sources, with automated transfer to IBM BigInsights and third-party Hadoop clusters.

IBM InfoSphere System z Connector for Hadoop
Extensive browser-based help is built in.

IBM InfoSphere System z Connector for Hadoop
Multiple users can be provided with access to the web-based System z Connector data transfer tool.

IBM InfoSphere System z Connector for Hadoop
Directly access data on mainframe DASD.

IBM InfoSphere System z Connector for Hadoop
Directly access System z log files, including SMF, RMF, Syslog, and the operator logs.

IBM InfoSphere System z Connector for Hadoop
A step-by-step wizard guides users through the process of setting up connections to z/OS data sources (DB2 shown here).

IBM InfoSphere System z Connector for Hadoop
A similar wizard is used to configure Hadoop-based targets.

IBM InfoSphere System z Connector for Hadoop
JCL parameters are entered through the web-based GUI.

IBM InfoSphere System z Connector for Hadoop
Browse the contents of mainframe data sources from within the System z Connector for Hadoop interface.

IBM InfoSphere System z Connector for Hadoop
Copy source data to HDFS, the Linux file system, Hive, or stream data directly to a receiving port on a Linux system or virtual machine, in your chosen format.

When It Comes to Realizing Time to Value: Commercial Framework Differences Matter
- Capabilities: software tooling to build higher-quality, more maintainable applications quickly and cost-efficiently
- Infrastructure: deploy on reliable, cost-efficient infrastructure that matches quality-of-service requirements
