Teradata To Snowflake Migration Guide - Apptad


TECHNICAL GUIDE
Teradata to Snowflake Migration Guide
DON’T LET YOUR PAST DETERMINE YOUR FUTURE

What’s inside:
- Why migrate?
- Strategy—Thinking about your migration
- Migrating your existing Teradata warehouse
- Need help migrating?
- Appendix A—Migration tools
- Appendix B—Data type conversion table
- Appendix C—SQL considerations

Why migrate?

Decades ago, Teradata identified the need to manage and analyze large volumes of data. But just as the volume, velocity and variety of data has since changed, the cloud has enabled what’s possible today with modern data analytics. For example, by separating compute from storage, Snowflake has developed a modern cloud data warehouse solution that automatically and instantly scales in a way not possible with Teradata, whether the current Teradata system is on-premises or hosted in the cloud. Snowflake accomplishes this with its multi-cluster, shared data architecture.

YOUR MOTIVATION TO MIGRATE

Some of the key reasons enterprises migrate off of Teradata:

1. Legacy platform
Traditional technology fails to meet the needs of today’s business users, such as the increasing requirement for unlimited concurrency and performance.

2. Cloud
A strategy to move from on-premises to cloud implies a move away from traditional IT delivery models to a more 'on-demand,' 'as-a-service' model with minimal management intervention.

3. New data sources and workloads
Key data sources for the modern enterprise are already in the cloud. The cloud also allows for new types of analytics to be assessed and refined without a long-term commitment to infrastructure or specific tools.

4. Cost
Snowflake allows true pay-as-you-go cloud scalability without the need for complex reconfiguration as your data or workloads grow.

WHY SNOWFLAKE?

Snowflake’s innovations break down the technology and architecture barriers that organizations still experience with other data warehouse vendors. Only Snowflake has achieved all six of the defining qualities of a data warehouse built for the cloud:

ZERO MANAGEMENT
Snowflake reduces complexity with built-in performance, so there’s no infrastructure to tweak, no knobs to turn and no tuning required.

ALL OF YOUR DATA
Create a single source of truth to easily store, integrate and extract critical insight from petabytes of structured and semi-structured data (JSON, XML, AVRO).

ALL OF YOUR USERS
Provide access to an architecturally unlimited number of concurrent users and applications without eroding performance.

PAY ONLY FOR WHAT YOU USE
Snowflake’s built-for-the-cloud solution scales storage separate from compute, up and down, transparently and automatically.

DATA SHARING
Snowflake extends the data warehouse to the Data Sharehouse, with direct, governed and secure data sharing within seconds, so enterprises can easily forge one-to-one, one-to-many and many-to-many data sharing relationships.

COMPLETE SQL DATABASE
Snowflake supports the tools millions of business users already know how to use today.

THE CORE OF SNOWFLAKE
Snowflake delivers the performance, concurrency and simplicity needed to store and analyze all data available to an organization in one location. Snowflake’s technology combines the power of data warehousing, the flexibility of big data platforms, the elasticity of the cloud and live data sharing at a fraction of the cost of traditional solutions.

Strategy—Thinking about your migration

WHAT SHOULD YOU CONSIDER?

There are several things to contemplate when choosing your migration path. It’s usually desirable to pilot the migration on a subset of the data and processes. Organizations often prefer to migrate in stages, reducing risk and showing value sooner. However, you must balance this against the need to keep the program momentum and minimize the period of dual-running. In addition, your approach may also be constrained by the interrelationships within the data, such as data marts that rely on references to data populated via a separate process in another schema.

Questions to ask about your workloads and data:
- What workloads and processes can be migrated with minimal effort?
- Which processes have issues today and would benefit from re-engineering?
- What workloads are outdated and require a complete overhaul?
- What new workloads would you like to add that would deploy easier in Snowflake?

‘Lift and shift’ vs. a staged approach
The decision whether to move data and processes in one bulk operation or deploy a staged approach depends on several factors. They include the nature of your current data analytics platform, the types and number of data sources and your future ambitions and timescales.

Factors that lead to a ‘lift and shift’ approach may focus on what you already have:
- You have highly integrated data across the existing warehouse.
- Or, you are migrating a single independent, standalone data mart.
- Or, your current system uses well-designed data and processes using standard ANSI SQL.
- Or, you have timescale pressures to move off legacy equipment.

Factors that may lead to a staged approach:
- Your warehouse platform consists of many independent data marts and other data applications, which can be moved independently over time.
- Or, you have critical data and processes within your data warehouse that no longer perform well and require re-engineering.
- Or, you’ve chosen to focus on new business requirements rather than reworking legacy processes.
- Or, you want to change your data ecosystem, such as adding a new ELT or other data ingestion tool, or move to new business intelligence (BI) and visualization tools.

WHAT YOU DON’T NEED TO WORRY ABOUT

When migrating to Snowflake from Teradata, there are a number of concerns you can forget about, as they are not relevant to your Snowflake environment.

Data distribution and primary indexes
In Snowflake, there is no need for primary indexes. Since compute is separate from storage in Snowflake’s architecture, the data is not pre-distributed to the MPP compute nodes; Snowflake’s MPP compute nodes do not rely on the data being distributed ahead of time. Because the data is not pre-distributed, Snowflake can scale to more parallel compute nodes instantly. With Teradata, you would have to run a reconfig, with new AMPs/nodes added and new hashmaps created, before the data from the physical tables could be redistributed to the new AMPs. This is a process that requires significant planning and resources, and it impacts performance, but it is not necessary with Snowflake.

Indexing and query optimization
Snowflake has an optimizer built from the ground up and architected for MPP and the cloud. Snowflake understands parallel execution plans and automatically optimizes them, relieving our customers of this task. This means there are no indexes in Snowflake, so you don’t have to migrate your USIs or NUSIs.

Workload management
Workload management is unnecessary in a Snowflake environment due to its multi-cluster architecture, which allows you to create separate virtual warehouses for your disparate workloads so as to avoid resource contention completely.

Statistics collection
Snowflake automatically captures statistics, relieving DBAs from having to set up jobs to collect statistics for performance tuning. Because it’s automatic in Snowflake, you no longer have to remember to add new tables to the process when your data warehouse grows.

Disaster recovery
Teradata has several disaster recovery scenarios. Many of them require the purchase of another system, plus software such as Unity, to implement. With Snowflake, none of this is necessary. Snowflake leverages many of the built-in features of the cloud, such as the automatic replication of data built into AWS. Snowflake is automatically synced across multiple AWS availability zones, with more zones and other cloud computing platforms to become available on a consistent basis. There is no work on your part to establish this.

Capacity planning
With Snowflake, you pay for only what you use. Snowflake is a SaaS product, further enhanced for efficiency with per-second, usage-based pricing. Under this model, Snowflake also offers the option of further cost reductions for customers who want to pre-purchase usage. On the flip side, with capacity planning for an on-premises Teradata system, you run a risk of over- or under-configuring your system. Even with Teradata IntelliCloud, you have a similar capacity planning risk, as compute and storage are fixed per instance. If you need more capacity, you must buy it in predefined increments. With Snowflake’s elastic storage and compute architecture, you never have this risk, so you can save money and avoid the time previously spent on extensive planning.

Separate dev/test environment
With Teradata, to perform development and testing, you need additional servers, which means an additional cash outlay just for the hardware, plus the time to configure it. Not so with Snowflake, as you can simply create another database in your account and set it up for any purpose you need, such as dev or test. In addition, with Snowflake’s zero-copy clone feature, you can instantly populate those databases with complete copies of production data for no additional cost. With Teradata, you have to endure the painstaking process of exporting your production data from one system to import it to your dev or test server.
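For example, standing up a dev environment in Snowflake can be a matter of two statements. A minimal sketch, with hypothetical database and warehouse names:

```sql
-- Zero-copy clone: DEV_DB shares PROD_DB's underlying storage,
-- so no data is physically duplicated at clone time.
CREATE DATABASE dev_db CLONE prod_db;

-- Separate compute for dev work, so it cannot contend with
-- production workloads (the workload management point above).
CREATE WAREHOUSE dev_wh
  WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;
```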

Migrating your existing Teradata warehouse

In order to successfully migrate your enterprise data warehouse to Snowflake, you need to develop and follow a logical plan that includes the items in this section.

MOVING YOUR DATA MODEL

As a starting point for your migration, you’ll need to move your database objects from Teradata to Snowflake. This includes the databases, tables, views and sequences in your existing data warehouse that you want to move over to your new Snowflake data warehouse. In addition, you may want to include all of your user account names, roles and object grants. At a minimum, the user who owns the Teradata database must be created on the target Snowflake system before migrating data.

Which objects you decide to move will be highly dependent on the scope of your initial migration. There are several options for making this happen. The following sections outline three possible approaches for moving your data model from Teradata to Snowflake.

Using a data modeling tool
If you have stored your data warehouse design in a data modeling tool, you can generate the DDL needed to rebuild these objects. Since Snowflake uses standard SQL, you simply need to pick ANSI SQL as the output scripting dialect rather than Teradata. Keep in mind, Snowflake is self-tuning and has a unique architecture. You won’t need to generate code for any indexes, partitions or storage clauses of any kind that you may have needed in Teradata. You only need basic DDL, such as CREATE TABLE, CREATE VIEW and CREATE SEQUENCE. Once you have these scripts, you can log into your Snowflake account to execute them.

If you have a data modeling tool, but the model is not current, we recommend you reverse engineer the current design into your tool, then follow the approach outlined above.

Using existing DDL scripts
You can begin with your existing DDL scripts if you don’t have a data modeling tool. But you’ll need the most recent version of the DDL scripts, ideally kept in a version control system. You’ll also want to edit these scripts to remove code for extraneous features and options not needed in Snowflake, such as primary indexes and other storage- or distribution-related clauses. Depending on the data types you used in Teradata, you may also need to do a search-and-replace in the scripts to change some of the data types to Snowflake-optimized types. For a list of these data types, please see Appendix B.
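To make the editing concrete, here is an illustrative sketch of what that cleanup looks like for a single table (the table and column names are hypothetical, and the type mapping shown is only a small sample of Appendix B):

```sql
-- Teradata original (illustrative):
--   CREATE MULTISET TABLE sales.orders (
--     order_id   INTEGER NOT NULL,
--     order_ts   TIMESTAMP(0),
--     amount     DECIMAL(18,2)
--   ) PRIMARY INDEX (order_id);

-- Snowflake equivalent: the PRIMARY INDEX clause is dropped entirely,
-- since Snowflake does not pre-distribute data, and the data types are
-- swapped for Snowflake-native ones.
CREATE TABLE sales.orders (
  order_id  INTEGER NOT NULL,
  order_ts  TIMESTAMP_NTZ,
  amount    NUMBER(18,2)
);
```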

Creating new DDL scripts
If you don’t have current DDL scripts for your data warehouse, or a data modeling tool, you will need to extract the metadata needed from the Teradata data dictionary in order to generate these scripts. But for Snowflake, this task is simpler since you won’t need to extract metadata for indexes and storage clauses.

As mentioned above, depending on the data types in your Teradata design, you may also need to change some of the data types to Snowflake-optimized types. You will likely need to write a SQL extract script of some sort to build the DDL scripts. Rather than do a search and replace after the script is generated, you can code these data type conversions directly into the metadata extract script. The benefit is that you have automated the extract process, so you can do the move iteratively. Plus, you will save time editing the script after the fact. Additionally, coding the conversions into the script is less error-prone than any manual cleanup process, especially if you are migrating hundreds or even thousands of tables.

MOVING YOUR EXISTING DATA SET

Once you have built your objects in Snowflake, you’ll want to move the historical data already loaded in your Teradata system over to Snowflake. To do this, you can use a third-party migration tool (see Appendix A), an ETL (extract, transform, load) tool or a manual process. When choosing among these options, you should consider how much data you have to move. For example, to move 10s or 100s of terabytes, or even a few petabytes, of data, a practical approach may be to extract the data to files and move it via a service such as AWS Snowball. If you have to move 100s of petabytes or even exabytes of data, AWS Snowmobile is likely the more appropriate option.

If you choose to move your data manually, you will need to extract the data for each table to one or more delimited flat files in text format using Teradata Parallel Transporter (TPT).
Then upload these files to a staging area: either a Snowflake internal stage, using the PUT command, or an external stage such as an Amazon S3 bucket. We recommend these files be between 100 MB and 1 GB to take advantage of Snowflake’s parallel bulk loading. Once the data is staged, you can begin loading it into your tables in Snowflake using the COPY command.
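Put together, the manual path might look like the following sketch, run from the SnowSQL client (PUT is a client-side command); the stage name, file paths and target table here are hypothetical, and the file format should match however TPT was configured to delimit the export:

```sql
-- Create an internal stage whose file format matches the TPT export.
CREATE STAGE migration_stage
  FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '|');

-- Upload the extracted flat files; PUT compresses them automatically.
PUT file:///data/orders_*.csv @migration_stage;

-- Bulk-load all staged order files in parallel into the target table.
COPY INTO sales.orders
  FROM @migration_stage
  PATTERN = '.*orders_.*';
```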
