Criteria For Evaluating General Database Migration Tools

Transcription

Research ReportOccasional PaperCriteria for Evaluating GeneralDatabase Migration ToolsBin Wei and Tennyson X. ChenOctober 2012RTI Press

About the AuthorsBin Wei, MS, is a senior researcher atthe Pacific Islands Fisheries ScienceCenter, a National Oceanic andAtmospheric Administration (NOAA)research branch within the JointInstitute for Marine and AtmosphericResearch, a NOAA CooperativeInstitute at the University of Hawaiiat Manoa.* His areas of expertiseinclude designing systems forremote data collection and datatransformation.Tennyson X. Chen, MS, is a seniorresearch analyst and softwaresystem architect in RTI International’sResearch Computing Division. Hismain focus is the National Surveyof Drug Use and Health (NSDUH)project, for which he is a key systemdesigner and database manager.RTI Press publication OP-0009-1210This PDF document was made available from www.rti.org as a public serviceof RTI International. More information about RTI Press can be found athttp://www.rti.org/rtipress.RTI International is an independent, nonprofit research organization dedicatedto improving the human condition by turning knowledge into practice. TheRTI Press mission is to disseminate information about RTI research, analytictools, and technical expertise to a national and international audience. RTI Presspublications are peer-reviewed by at least two independent substantive expertsand one or more Press editors.Suggested CitationWei, B., and Chen, T. X. (2012). Criteria for evaluating general database migrationtools. RTI Press publication No. OP-0009-1210. Research Triangle Park, NC: RTIPress. Retrieved from http://www.rti.org/rtipress.* Although Mr. Wei is employed by the NOAAJoint Institute for Marine and AtmosphericResearch, this paper was producedindependently and is not a work product ofNOAA.This publication is part of theRTI Research Report series.Occasional Papers are scholarlyessays on policy, methods, or othertopics relevant to RTI areas ofresearch or technical focus.RTI International3040 Cornwallis RoadPO Box 12194Research Triangle Park, NC27709-2194 USATel:Fax:E-mail:Web site: 1.919.541.6000 1.919.541.5985rtipress@rti.orgwww.rti.org 2012 Research Triangle Institute. RTI International is a trade name of Research TriangleInstitute.All rights reserved. This report is protected by copyright. Credit must be provided to the authorand source of the document when the content is quoted. Neither the document nor partial orentire reproductions may be sold without prior written permission from the op.0009.1210www.rti.org/rtipress

Criteria for Evaluating General DatabaseMigration ToolsBin Wei and Tennyson X. ChenAbstractSoftware updates often involve data migration, especially when convertinglegacy software implemented to interface with outdated relational databasemanagement systems or other nonrelational database electronic files. Moreover,many software applications rely on data migration to import data from avariety of platforms. Usually, database migrations are time consuming and errorprone. Many independent general data migration products are available, butwhat features does such a tool need to provide to be effective? Based on ourexperience designing and implementing custom utilities to convert a largenumber of legacy databases and files in different platforms, we developed fivecriteria that need to be considered when evaluating a data migration tool (DMT).These criteria can help users and software development project managers makeinformed decisions in data conversion tasks, help software developers assessdesign and implementation considerations for future DMT products, and provideguidelines for database administrators to evaluate a general DMT.ContentsIntroduction2Evaluating DatabaseMigration Tools3Types of Databases the DMTSupports3User Interface Configurability,Maintainability, andReusability3Support for Data IntegrityChecking4Customization Adaptability ofthe DMT5Data Correctness Verification6Comparing Data MigrationTools7Future Work8Conclusion8References9Acknowledgments Inside back cover

2Wei and Chen, 2012IntroductionOne of the largest problems organizations face isensuring quality database administration.1 As manylegacy software applications age, programmers areupgrading them to the most up-to-date technologies.Because many of these applications interface with olddatabase platforms, one of the most important issuesin a software upgrading effort is data conversion,that is, transferring the data from the old systemsto new relational database management system(RDBMS) platforms or converting data betweenRDBMS platforms. Converting data from onedatabase platform to another, whether it has the sameor a different data structure or format, is called datamigration.Data migration is a specific implementation of abroader concept called data extract, transform, andload (ETL). A properly designed ETL system extractsdata from the source systems, enforces data qualityand consistency standards, conforms data so thatseparate sources can be used together, and deliversthe data in a presentation-ready format.2 The criteriawe discuss in this paper contribute to the body ofknowledge and experience of ETL.According to an industry analyst’s estimate in 1999,3at any given time, roughly two-thirds of the Fortune1000/Global 2000 companies are engaged in someform of data conversion project, including migratingdata from legacy systems to packaged applications,consolidating data, improving data quality, orcreating data warehouses and data marts. Althoughthis estimate is somewhat outdated, we believethat data conversion is still a very common taskamong today’s database administrators. In addition,businesses may face situations in which their successrelies on data migration, such as transferring datain different platforms into a consolidated databaseduring a company merger or moving data betweencontractors’ databases when a federal contractchanges contractors.Unfortunately, the cost and difficulties of a dataconversion effort are almost always underestimated.Problems with data can result from missinginformation and mismatches between the data inthe original platform (which is often less clearlydocumented) and the new platform (which is usuallymore explicitly documented).4RTI PressMost RDBMS, such as Oracle, My SQL, and SQLServer, have their own database transferring utilitiesfor some database migration tasks. However, thesetransferring utilities are often based on two importantassumptions. First, users must assume the contentof the source data is correct, especially with respectto domain integrity, entity integrity, and referentialintegrity, which are discussed in the section Supportfor Data Integrity Checking. If the source data haveproblems, as most source data do, data transfersusing the built-in utilities will often fail. Second,these RDBMS utilities often assume that the data inthe destination database have a structure identical tothat of the source database. If database administratorswant to modify the structure in the destinationdatabase to accommodate requirement changes, theyhave to change the structure of the source data aswell. In reality, it is rare for a data migration project tomove data from one system to another with identicaldata structure. Consequently, database administratorsoften must fix the source data manually or by writinga customized program. Customized programmingincreases the cost of a data migration project andcould be a source of errors as well.These limitations mean that the RDBMS built-indata transfer utilities are difficult to use in large-scaledata migration projects. Quite a few third-partyproducts have been developed and are commerciallyavailable—such as Altova MapForce 2012,5 AsteraCenterprise Data Integrator 5.0,6 DBConvert datamigration products,7 and SwisSQL Data MigrationTool 6.58—each with its own strengths andlimitations.In a project for the US National Oceanic andAtmospheric Administration (NOAA), weimplemented a robust database migration tool (DMT)to transform about 500 FoxPro flat tables and otherexternal files into Oracle and MySQL. The data inthese tables and files dated as far back as 1948. ThisDMT is also currently used to convert about 100 flattables per month from different sources on a flowbasis. We designed this DMT to allow data to bemigrated from a large variety of platforms into themost current RDBMS platforms. For this project, weconsidered the possibility of purchasing an existingDMT product or developing a customized tool orutility. We thoroughly researched user manuals and

Criteria for Evaluating General Database Migration Toolsexperimented with trial versions of many commercialDMT products. For each product, we examinedseveral fundamental questions, such as whetherthe tool would be suitable for the task at hand or itwould be more cost-effective to build a custom tool.After in-depth evaluation, we concluded that noneof the products that we had investigated could meetour requirements without significant customizedprogramming. Therefore, we decided to build a newDMT tool to perform our task. As a result of thisexperience, we developed a set of five criteria that areuseful in evaluating a DMT: types of databases the DMT supports; user interface configurability, maintainability, andreusability; support for data integrity checking; customization adaptability of the DMT; and data correctness verification.In the following section, we discuss these criteria indetail to help users and software development projectmanagers evaluate general DMTs and make informeddecisions when facing data conversion tasks. Softwaredevelopers might also use these criteria as designand implementation considerations for future DMTproducts. We believe these criteria can benefit otherdata migration projects and can evolve into bestpractice guidelines in DMT product evaluation anddevelopment.Evaluating Database Migration ToolsThe complexity of database migration varies fromproject to project. Therefore, depending on the dataconversion task, database administrators may beinterested in different features of a DMT, and theymay not consider each criterion with equal weight.We offer the following criteria as a general guidelineto help users decide whether a DMT product is viablebased on their particular needs.Types of Databases the DMT SupportsThe first criterion relates to whether the DMTsupports the source and destination data platformsinvolved in the desired data conversion. Althoughmost of the current DMT products on the market,such as Astera and SwisSQL, support the most3popular databases, such as Oracle, SQL Server, DB2,Sybase, MySQL, and MS Access, users cannot assumethat all DMT products have the same functionality. Asa matter of fact, in our NOAA project, certain sourcedatabase types were not supported by any of the toolsmentioned above. Also, because today’s computerapplications are increasingly likely to interface withmultiple operating systems and database platforms,users may find it necessary to evaluate whether aDMT supports databases on mobile devices such assmartphones and tablets. In addition, they may needto consider other software packages such as SAS if thedata conversion task involves interfacing with thesemobile devices or software packages and the dataformats they use.If a DMT does not support the database platformsthe user is working on, the second question is howeasily the DMT can be adapted to expand its supportto the desired database platforms. The evaluationquestion is if a new database platform can be addedand how much effort this addition will be involved. Ifthis problem can be resolved without significant extracosts, the DMT passes this first test.In some special situations, like our recent project forNOAA that required converting data from DBase,FoxPro, and data files dated back to the 1940s,none of the DMT products that we evaluated wereable to successfully migrate all the data. This is oneof the main reasons behind the decision to builda customized DMT. In fact, in our DMT design,because of the need to read data from so many olddatabase platforms, we focused on the diversity ofthe databases that the tool supports. As a result, thetool we developed is very robust in migrating databetween a variety of electronic platforms. If we needto add a new platform as a source or destinationdatabase, we only need to add the correspondingdatabase driver to the application package and do notneed to change the migration application itself. Thisapproach is unique, compared with commercial DMTproducts currently available.User Interface Configurability,Maintainability, and ReusabilityData migration involves a certain level of risk. Ifnot executed carefully and correctly, the processmay erase part or all of the data in either the source

4RTI PressWei and Chen, 2012or the destination database. For example, if a datamigration interface allows user to mistakenlyswitch the specification of source and destinationdatabase, the source data may be overwritten, hencedestroyed, by the data in the destination database.This consideration brings us to the second evaluationcriterion: the user interface of the DMT product.A good DMT needs to provide an interface thatallows users to easily and unambiguously specify adata transfer configuration, such as the source anddestination databases, tables and columns, and rangesof the data to be included in the transfer.Moreover, although most of the popular products,such as Astera, DBConvert, and SwisSQL, providecommon capabilities in their interfaces for users toinput the source and destination databases or fileinformation, more advanced levels of customizationshould be considered as well.Does the DMT allow users to do the following? Select specific tables and columns to transfer. Add, change, or remove column name, type, orother properties when transferring. Add, change, or remove constraints like primarykey, foreign key reference, and other properties. Add, change, or remove viewers, functions, orother utilities in the destination database whentransferring data.Another significant factor to keep in mind is thatdata migration is usually not a one-time process. Fora large data transfer, users usually will need to set upexperimental environments to test the procedure andcorrect potential errors before officially running theprocess to transfer real data. Once a configurationworks correctly in experimental trials, the DMTshould be able to use that configuration to run thereal data transfer. In addition, data migration can bean ongoing process in which users are required to runand convert new data with that same configurationinto production databases on a continuous basis.Therefore, maintaining and reusing a configurationare important issues.Most DMT products, like the ones mentionedpreviously, use graphical user interface (GUI) wizardsto steer users through the data-transferring process.They provide the interfaces for users to enter sourceand destination database platform information andselect tables and columns, for example. Althoughthis method works well most of the time for onetime operations, our experience indicates thata configuration defined in Extensible MarkupLanguage (XML) specification will work better interms of maintaining and reusing a configuration.XML is common knowledge among computingprofessionals, is easily understood among differentusers, and is easy to modify. A user can modify itin a text editor without going through the entireGUI. Another benefit of using XML specificationis that the XML file itself documents how the datatransfers are performed. This document can helpusers trace problems if errors occur during or aftera transfer. The GUI method does not have thisprovision. Furthermore, a DMT that works withan XML configuration can be scheduled to run atcertain times of day without human intervention—another advantage of XML over GUI interfaces whenundertaking an ongoing and repetitive data migrationtask.One hybrid approach is to take advantage of thestrengths of both GUI and XML. Having the capacityto save a GUI configuration into XML format thatallows future modification provides flexibility. Userswho need to run the same or similar data transfermultiple times during the course of a project shouldcheck whether the DMT has the capability to handleXML configuration specifications. This feature canbe a real benefit in large, ongoing, and complex datamigration tasks.Support for Data Integrity CheckingThe third criterion relates to database integrity andits impact on the data migration process. Quite often,a database transfer attempt fails because the data failto comply with the database integrity constraints inthe destination database. Understanding databaseintegrity and how it can affect data migration iscritical as well as beneficial.In essence, three types of integrity constraints areimportant components of a relational data model:Entity integrity: Every table must have a primary key,and the column or columns chosen to be the primarykey should be unique and not null.9

Criteria for Evaluating General Database Migration ToolsReferential integrity: No record with a foreignkey referencing column is allowed to exist if thecorresponding referenced record does not exist.10Domain integrity: Domain integrity controls thevalidation, such as the type of data and the range ofthe data, of values for a column.11These database integrities play an important role inpreserving the accuracy and consistency of the datain a database. However, these constraints require acertain level of agreement between the source anddestination databases when attempting to transferdata between them. If the source data do not meet theintegrity requirements in the destination database, thedata transfer will not succeed.Many DMT products, especially those like SQLServer and Oracle utilities that are built into theRDBMS, proceed with the data transfer withoutchecking database integrity. Consequently, the datatransfer fails and the operation is halfway completed.The incomplete process causes additional hours ofclean-up work. Usually, undoing a failed databasetransferring operation on a large scale is risky andtime-consuming.5Database integrity should be checked at two levels.The first level is with respect to domain integrity. Foreach column, users need to verify that the source datacan be correctly situated in the destination database.Two simple examples of issues are as follows: For a numeric column in the destination database,do the source data contain only numeric values inthe corresponding column? For a character column in the destination database,does the length requirement of the correspondingcolumn in the source database match?The second level of database integrity checkingis more complex. Users need to check the entityintegrity and referential integrity among tables. Twoexamples are as follows: When the destination database is empty, the DMTshould obtain the primary key and foreign keydefinitions in the destination database and verifythat the data in the source database tables complywith the defined constraint requirements. When the destination database is not empty, theDMT first needs to perform the above check tovalidate the source data. Then the DMT needsto check whether the data in the source databasepresent primary key conflicts with the existing datain the destination database.A DMT tool that does not offer integrity checkingprior to a database transfer can introduce anotherpotential error, in which data may be lost after thetransfer, even when the migration appears to be“successful.” For example, suppose all character fieldsof a table in the destination database are specifiedwith a certain length limitation. If the sourcedata designated to be migrated into a characterfield contain strings that exceed this field’s lengthlimitation, a DMT tool may just truncate the stringsto make them suitable for the destinati

data transfer utilities are difficult to use in large-scale data migration projects. Quite a few third-party products have been developed and are commercially available—such as Altova MapForce 2012, 5. Astera Centerprise Data Integrator 5.0, 6. DBConvert data migration products, 7. and SwisSQL Data Migration Tool 6.5. 8 —each with its own .Cited by: 1Publish Year: 2012Author