Reverse Engineering In Data Integration Software

Transcription

Database Systems Journal vol. IV, no. 1/201311Reverse Engineering in Data Integration SoftwareVlad DIACONITAThe Bucharest Academy of Economic Studiesdiaconita.vlad@ie.ase.roIntegrated applications are complex solutions that help build better consolidated andstandardized systems from existing (usually transactional) systems.Integrated applications are complex solutions, whose complexity are determined by theeconomic processes they implement, the amount of data employed (millions of recordsgrouped in hundreds of tables, databases, hundreds of GB) and the number of users [11].Oracle, once mainly known for his database and e-business solutions has been constantlyexpanding its product portfolio, providing solutions for SOA, BPA, Warehousing, Big Dataand Cloud Computing. In this article I will review the facilities and the power of using adedicated integration tool in an environment with multiple data sources and a target datamart.Keywords: ODI, reverse engineering, SOA, data mart1systems cannot be overload with theadditional queries required by the businessintelligence needs. Integrating data frommultiple, usually heterogeneous sources, isa problem addressed by ODI.Linking SOA with ODI and DataWarehousing (DW) can be beneficial.From a business perspective the key wordin dealing with SOA is definitelyflexibility. Companies must be able to keeppace with the rapid changing conditions ofthe business environment. In the same timethe trend in IT architectures leads towardan integrated model by building businessprocesses that span multiple operationalsystems and by enabling interoperabilitybetween legacy systems and newlydeveloped systems [3].As shown in [4] SOA has the merit tointroduce a new kind of technological“democracy” where the applicationsystems are considered a federation per se,thus opening the doors to a new kind oflogical distributed computing approachwhere the technological platforms aredowngraded to the implementation orphysical level.Most SOA efforts have centered ontransaction systems, but data warehousingcan benefit from SOA with the ability tojoin various actions (services) fromdifferent areas of the DW to createData integration softwareThere are many software packages thatcan aid to system integration. One ofthem is the Oracle Data Integrator (ODI)that has its roots in the acquisition ofSunopsis by Oracle in 2006. The primaryuse of ODI is to move and transform datafrom one place to another so there aremany other Oracle products that arebenefiting and thus creating the need ofusing this tool in many organizations.Like shown in [7] data Warehouses anddata marts (a subset of the data warehousethat is usually oriented to a specificdepartment) are data-intensive systems thatare used for analytical tasks in businessessuch as analyzing sales/profits statistics,cost/benefit relation statistics, customerpreferences statistics, etc. The idea of adata warehouse is to extract data fromoperational databases and to store themseparately. The justification for thisapproach is that OLAP largely deals withcondensed data, thus does not depend onthe latest updates by transactions.Furthermore, OLAP requires only readaccess to the data, so the separation of thedata for OLAP from OLTP allows timeconsuming transaction management to bedispensed with.The need for data warehouse usuallycomes from the fact that the operational11

12composite applications or commonservices. The services that are part of adata warehouse such data extraction,transformation, loading, querying orupdating should be part of the SOA fromthe start. This should make morecomprehensivebusinessintelligencepossible, and could assist in thedevelopment of fully integrated SOA [2].The usage of SOA makes data location oflittle importance to its users so using it inconjunction with ODI processes andtransformations makes lot of sense. SOAcan enable an abstract layer that makesdata available inside an enterprise usinghomogeneous services.ODI can also have a role in normalizingdata definition so different applicationsreference the same data. It can also have arole in processing Big Data by delegatingand distributing processing.2. ETL and ELTAs an alternative to ETL, ODI proposesELT which extracts data from the source,loads it into the target and processes thereby using SQL the needed transformations.This approach exploits database optimizersas opposed to transformation that isperformed in-flight or requiring a separateintermediary. The ELT approach directlyimpacts performance and has proven tomake data loading fast, efficient andincredibly reliable [9].An ELT has the ability to manage a stagingarea, generate code and executeinstructions on target systems but also onsource systems, such systems that arebeing managed by any DBMS.The components of ODI architecture arethe repository, the studio, the graphicalinterface of the software, the agents andthe console. In SOA, repositories are usedto manage services and support servicediscovery at runtime. Usually, there is aprocess needs a data access service (DAS)to execute a query usually against adatabase. DAS are variations of theordinary service concept; they are moredata-intensive and designed to expose dataReverse Engineering in Data Integration Softwareas a service [5]. In an ELT solution thedata is not stored in the repository; it ismoved directly to the target. The repositoryis usually stored in a schema of an existingdatabase and is composed of the masterrepository for the sensitive data and thework repository for the data needed by thedevelopers. In a production environmentan execution repository is also present. Itstores only the operational metadata. Theexchange of data can be done throughversioning or by importing or exportingXML files. There are two types of agents:the standalone agent that can be installedon any platform and the JEE agent thatruns on a Weblogic server. Usually theJEE agents has the role to distributeexecution requests and balance load acrossdifferent other agents which are usuallystandalone agents. A strategy using onlystandalone agents is also possible.The key elements of ODI are ExecutionContexts, Knowledge Modules, Models,Interfaces, Packages and Load Plans.For assuring independence from thephysical location of the data, logicalschemas can be used, at execution this aretranslated into physical ones so themaintenance of the connection parameters,location of the databases, and schemanames is entirely independent of the codeitself.Metadata can be imported usingknowledge modules from applications,where objects usually are representation ofthe data or from databases using models.After it’s imported, metadata can beenhanced in ODI (for example by addingconstraints). Also new metadata can becreated.Another key element in ODI is theinterfaces where the transformations arebuilt. An interface contains among othersdescription, mappings and flows.Packages put together elements such asinterfaces, variables and procedures. Theyare compiled into scenarios whichexecution can be organized with loadplans.

Database Systems Journal vol. IV, no. 1/201313ERP systems. Different other elements,such as structural integrity data can beadded to the captured metadata.Like shown in [6], the ReverseEngineering Knowledge Modules role isto perform customized reverse engineeringfor a model. It connects to the applicationor metadata provider then transforming andwriting the resulting metadata into OracleData Integrator's repository. The metadataiswrittentemporarilyintotheSNP REV xx tables. The RKM then callsthe Oracle Data Integrator API to readfrom these tables and write to Oracle DataIntegrator's metadata tables of the workrepository in incremental update mode(figure 1).3. Reverse-engineering the modelmetadataLike shown in [1], to use ODI, first wedeclare a new data server in the ODIPhysical Architecture and then a referenceto a Physical Schema located on that serverthat holds the business data. We can alsoconstruct a work schema for every physicalschema to store temporary data.A Physical Schema definition needs to beassociated with a Logical Schema namewhich will be exclusively used to buildODI models. The models are abstracted,independent of the data source but seemhomogeneous to the developer. Usuallythey are built by reverse-engineering thestructural data, that coming fromdatabases, flat files, XML files or differentFig. 1. RKMAs shown in [7], Oracle Data Integratorimplements five different types of KMs.Each of them covers one phase in thetransformation process from source totarget. The three most important types ofmodules are the integration knowledgemodule (IKM), the loading knowledgemodule (LKM), and the check knowledgemodule (CKM). As explained in [1] whena Knowledge Module has been previouslyimported into the parent project andapplied to the interface target and theinterface is subsequently executed, it is thesteps within the IKM that determine thewhat, how, and when data is moved intothe target data store. LKMs load data intothe staging area from other servers. If thesource data is in the same server as thestaging area then LKM is not needed.CKMs are used to check and enforce dataintegrity through testing conformance toconstraints and references, either staticallyon data tables on source or target systems,or dynamically during the process of a dataflow defined in an ODI interface.To reverse engineer, after defining thetopology, we create the model like shownin figure 2.13

Reverse Engineering in Data Integration Software14Fig. 2. Create the modelAfter configured the ODI representationsof our data objects an interface can be builtto move and transform the data into thedata mart, like shown in figure 3.We canuse the Automatic Mapping and also domanual mappings and transformationsusing SQL code and apply sometimestamps useful for audit purposes. Thetransformed data will be loaded from anOracle source into a target Oracle datamart.Fig. 3. Transform flowWe can add some additional complexity byintroducingjoinsandlookups,heterogeneous data sources and dataaggregation. And we can do this withmultiple source databases usually usingdifferent JDBC connectors (figure 4).

Database Systems Journal vol. IV, no. 1/201315Fig. 4. MySql JDBC DriverIn the example shown in figure 5 we jointhree sources (actually we have 2 instancesof the same data source table) and linkthem to the target data mart.Fig. 5. Join transform flowLike shown in figure 6, an interface can bebuilt to move data using JDBC from athird-party DBMS to an Oracle data mart.15

Reverse Engineering in Data Integration Software16Fig. 6. Lookup transform flowAs shown in [10] eXtensible MarkupLanguage (XML) is a platformindependent format for representing dataand was designed as a standard forinformation exchange over the Internet.XML enables easy exchange ofinformation, which allows interoperabilitybetween applications due to dataencapsulation with metadata. To use XMLinside ODI a JDBC driver is requiredwhich is available out-of-the-box (figure7).Fig. 7. XML JDBCLet’s look at the following XML file: CLIENT ID CLIENT 10 /ID CLIENT FIRST NAME Jan / FIRST NAME LAST NAME Roberts / LAST NAME CREDIT 600 / CREDIT EMAIL Ishwarya.Roberts@LAPWING.COM /EMAIL DATE OF BIRTH 21-MAR-44 /DATE OF BIRTH CIVIL STATUS single / CIVIL STATUS SEX F /SEX INCOME G: 130,000 - 149,999 /INCOME /CLIENT This would reverse-engineer to a tablecalled CLIENT that has the followingcolumns: ID CLIENT, FIRST NAME,LAST NAME etc. The mapping is shownin figure 8.

Database Systems Journal vol. IV, no. 1/201317Fig. 8. XML-Relational s/1065308-1.html[3] Alexandra Florea, Anca Andreescu,Vlad Diaconita, Adina Uta, “ApproachesRegarding Business Logic Modeling inServiceOrientedArchitecture”,Informatica Economică vol. 15, no. 3/2011[4] Cătălin Strîmbei, “Smart Data WebServices”, Informatica Economică vol. 16,no. 4/2012[5] M. Turner, D. Budgen, P. Brereton,“Turning software into a service”,Computer, 36 (2003), pp. 38–44[6]http://docs.oracle.com/cd/E15586 01/integrate.1111/e12645/intro.htm[7] Uli Bethke, Developing a KnowledgeModule in Oracle Data Integrator, 2009[8] Jane Zhao, Hui Ma, “ASM-baseddesign of data warehouses and on-lineanalytical processing systems”, Journal ofSystems and Software, Volume 79, Issue 5,May 2006, Pages 5852.pdf[10] Iuliana Botha, “Managing XML Datato optimize Performance into Object-ConclusionsData integration is very often a necessity inbigger projects. Using an integrated toolcan be much more powerful and useful inprojects that imply using data fromheterogeneous sources, targets, andapplications. Such products provide greataid in integrating databases, ERPs, CRMs,B2B systems, flat files, XML data, LDAP,JDBC or ODBC. It also can help in cuttinghardwarecoststhroughimprovedutilization and high-performance dataintegration. Using external services fordata integration and by deploying dataservices and transformation services thatcan be integrated within an SOAinfrastructureAlso,SOAbusinessprocesses can assign large data operationsto Oracle Data Integrator by using webservices.References[1] Peter C. Boyd-Bowman, ChristopheDupupet, Denis Gray, David Hecksel,Julien Testut, Bernard Wheeler, GettingStarted with Oracle Data Integrator 11g:A Hands-On Tutorial, May 201217

18Relational Databases”, Database SystemsJournal vol. II, no. 2/2011Reverse Engineering in Data Integration Software[11] Vlad Diaconita, “Hybrid Solution forIntegratedTrading”,InformaticaEconomică vol. 14, no. 2/2010Vlad DIACONIŢA is a lecturer at the Department of Economic Informaticsand Cybernetics, within the Faculty of Economic Cybernetics, Statistics fromthe Bucharest Academy of Economic Studies. He has graduated the faculty atwhich he is now teaching in 2005 and since 2010 holds a PhD in the field ofCybernetics and Statistics. He is the co-author of 3 books i

Keywords: ODI, reverse engineering, SOA, data mart Data integration software There are many software packages that can aid to system integration. One of them is the Oracle Data Integrator (ODI) that has its roots in the acquisition of Sunopsis by Oracle in 2006. The primary use of ODI is to move and transform data from one place to another so there are many other Oracle products that are .