How To Use PowerCenter PowerCenter With Teradata To Load .

Transcription

How to Use PowerCenter with Teradata toLoad and Unload Data 2009-2011 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means(electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation.

AbstractThis article explains how to move data between PowerCenter and Teradata databases. It explains when to useTeradata relational connections, Teradata load and unload utilities, or pushdown optimization to move data. It also listsissues you might encounter when loading data to or unloading data from Teradata and the workarounds.Table of ContentsOverview . 3Prerequisites . 4Teradata Relational Connections . 5Creating a Teradata Relational Connection . 6Standalone Load and Unload Utilities . 6Teradata FastLoad . 7Teradata MultiLoad. 7Teradata TPump . 7Teradata FastExport . 8Teradata Parallel Transporter . 8Pushdown Optimization . 9Achieving Full Pushdown without Affecting the Source System . 12Achieving Full Pushdown with Parallel Lookups . 13Achieving Pushdown with Sorted Aggregation . 14Achieving Pushdown for an Aggregator Transformation . 14Achieving Pushdown when a Transformation Contains a Variable Port . 14Improving Pushdown Performance in Mappings with Multiple Targets . 14Removing Temporary Views when a Pushdown Session Fails . 15Issues Affecting Pushdown Optimization . 16Issues Affecting Loading to and Unloading from Teradata . 17Making 32-bit Load and Unload Utilities Work with 64-bit PowerCenter . 17Increasing Lookup Performance. 17Performing Uncached Lookups with Date/Time Ports in the Lookup Condition . 18Restarting a Failed MultiLoad Job Manually . 19Configuring Sessions that Load to the Same Table . 19Setting the Checkpoint when Loading to Named Pipes . 20Loading from Partitioned Sessions . 20Loading to Targets with Date/Time Columns . 20Hiding Passwords . 21Using Error Tables to Identify Problems during Loading . 212

OverviewTeradata is a global technology leader in enterprise data warehousing, business analytics, and data warehousingservices. Teradata provides a powerful suite of software that includes the Teradata Database, data access andmanagement tools, and data mining applications. PowerCenter works with the Teradata Database and Teradata toolsto provide a data integration solution that allows you to integrate data from virtually any business system into Teradataas well as leverage Teradata data for use in other business systems.PowerCenter uses the following techniques when extracting data from and loading data to the Teradata database: ETL (extract, transform, and load). This technique extracts data from the source systems, transforms the datawithin PowerCenter, and loads it to target tables. The PowerCenter Integration Service transforms all data. If youuse the PowerCenter Partitioning option, the Integration Service also parallelizes the workload. ELT (extract, load, and then transform). This technique extracts data from the source systems, loads it to userdefined staging tables in the target database, and transforms the data within the target system using generatedSQL. The SQL queries include a final insert into the target tables. The database system transforms all data andparallelizes the workload, if necessary. ETL-T (ETL and ELT hybrid). This technique extracts data from the source systems, transforms the data withinPowerCenter, loads the data to user-defined staging tables in the target database, and further transforms the datawithin the target system using generated SQL. The SQL queries include a final insert into the target tables. TheELT-T technique is optimized within PowerCenter so that the transformations that better perform within thedatabase system can be performed there and the Integration Service performs the other transformations.To perform ETL operations, configure PowerCenter sessions to use a Teradata relational connection, a Teradatastandalone load or unload utility, or Teradata Parallel Transporter. To use ELT or ETL-T techniques, configurePowerCenter sessions to use pushdown optimization.Use a Teradata relational connection to communicate with Teradata when PowerCenter sessions load or extract smallamounts of data ( 1 GB per session). Teradata relational connections use ODBC to connect to Teradata. ODBC is anative interface for Teradata. Teradata provides 32- and 64-bit ODBC drivers for Windows and UNIX platforms. Thedriver bit mode must be compatible with the bit mode of the platform on which the PowerCenter Integration Serviceruns. For example, 32-bit PowerCenter only runs with 32-bit drivers.Use a standalone load or unload utility when PowerCenter sessions extract or load large amounts of data ( 1 GB persession). Standalone load and unload utilities can increase session performance by loading or extracting data directlyfrom a file or pipe rather than running the SQL commands to load or extract the same data. All Teradata standaloneload and unload utilities are fully parallel to provide optimal and scalable performance for loading data to or extractingdata from the Teradata Database. PowerCenter works with the Teradata FastLoad, MultiLoad, and TPump load utilitiesand the Teradata FastExport unload utility.Use Teradata Parallel Transporter for PowerCenter sessions that must quickly load or extract large amounts of data( 1 GB per session). Teradata Parallel Transporter provides all of the capabilities of the standalone load and unloadutilities, plus it provides more granular control over the load or unload process, enhanced monitoring capabilities, andthe ability to automatically drop log, error, and work tables when a session starts. Teradata Parallel Transporter is aparallel, multi-function extract and load environment that provides access to PowerCenter using an open API. It canload dozens of files using a single control file. It also allows you to distribute the workload among several CPUs,eliminating bottlenecks in the data loading and extraction processes.Use pushdown optimization to reduce the amount of data passed between Teradata and PowerCenter or when theTeradata database can process transformation logic faster than PowerCenter. Pushdown optimization improvessession performance by “pushing” as much transformation logic as possible to the Teradata source or target database.PowerCenter processes any transformation logic that cannot be pushed to the database. For example, pushing Filtertransformation logic to the source database can reduce the amount of data passed to PowerCenter, which decreasessession run time. When you run a session configured for pushdown optimization, PowerCenter translates the3

transformation logic into SQL queries and sends the queries to the Teradata database. The Teradata databaseexecutes the SQL queries to process the transformation logic.PrerequisitesBefore you run sessions that move data between PowerCenter and Teradata, you might want to install Teradata clienttools. You also need to locate the Teradata TDPID.Teradata Client ToolsTeradata client tools help you communicate with the Teradata database and debug problems that occur when asession loads data to or extracts data from the Teradata database.You can install the following Teradata client tools: BTEQ. A general-purpose, command-line utility (similar to Oracle SQL*Plus) that enables you to communicatewith one or more Teradata databases. Teradata SQL Assistant. A GUI-based tool that allows you to retrieve data from any ODBC-compliant databaseserver and manipulate and store the data in desktop applications. Teradata Queryman is the older version of thistool.Install BTEQ or Teradata SQL Assistant to help you debug problems that occur when loading to and extracting fromTeradata. Both tools are included in the Teradata Utility Pack, which is available from Teradata.TDPIDThe Teradata TPDID indicates the name of the Teradata instance and defines the name a client uses to connect to aserver. When you use a Teradata Parallel Transporter or a standalone load or unload utility with PowerCenter, youmust specify the TDPID in the connection properties.The Teradata TDPID appears in the hosts file on the machines on which the Integration Service and PowerCenterClient run. By default, the hosts file appears in the following location: UNIX: /etc/hosts Windows: %SystemRoot%\system32\drivers\etc\hosts** The actual location is defined in the Registry keyHKEY LOCAL rameters\DataBasePathThe hosts file contains client configuration information for Teradata. In a hosts file entry, the TDPID precedes the string“cop1.”For example, the hosts file contains the following entries:127.0.0.1 localhost192.168.80.113td 1192.168.80.114td 2192.168.80.115td 3192.168.80.116td 4demo1099cop1custcop1custcop2custcop3custcop4The first entry has the TDPID “demo1099.” This entry tells the Teradata database that when a client tool references theTeradata instance “demo1099,” it should direct requests to “localhost” (IP address 127.0.0.1).The following entries have the same TDPID, “cust.” Multiple hosts file entries with the same TDPID indicate theTeradata instance is configured for load balancing among nodes. When a client tool attempts to reference Teradatainstance “cust,” the Teradata database directs requests to the first node in the entry list, “td 1.” If it takes too long forthe node to respond, the database redirects the request to the second node, and so on. This process prevents the firstnode, “td 1” from becoming overloaded.4

Teradata Relational ConnectionsTeradata relational connections use ODBC to connect to Teradata. PowerCenter uses the ODBC Driver for Teradatato retrieve metadata and read and write to Teradata. To establish ODBC connectivity between Teradata andPowerCenter, install the ODBC Driver for Teradata on each PowerCenter machine that communicates with Teradata.The ODBC Driver for Teradata is included in the Teradata Tools and Utilities (TTU). You can download the driver fromthe Teradata web site.Use a Teradata relational connection when extracting or loading small data sets, usually 1 GB per session. Insessions that extract or load large amounts of data, a standalone load or unload utility or Teradata Parallel Transporteris usually faster than a Teradata relational connection.PowerCenter works with the ODBC Driver for Teradata available in the following TTU versions:PowerCenter VersionsTTU Versions7.0 - 8.1.18.18.5 and later8.2, 12.0For more information about the TTU versions that work with PowerCenter, see the TTU Suppor

PowerCenter uses the following techniques when extracting data from and loading data to the Teradata database: ETL (extract, transform, and load). This technique extracts data from the source systems, transforms the data within PowerCenter, and loads it to target tables. The PowerCenter Integration Service transforms all data. If you