Accessing DB2 Data With SAS 9

Transcription

Accessing DB2 Data with SAS 9A SAS White Paper

Table of ContentsIntroduction .1What is SAS/ACCESS Interface to DB2? .1What Software is Required for SAS to Access My DB2 Database? .2Data Processing Models.2How Long Does It Take to Generate the Result?.2What is the Size of the Result Set? .3How Many Times are the Results Accessed?.3How Many Users Will Be Using SAS Against the Result Set?.3How Fast is the Communication Link Between SAS and DB2?.3Usage Scenarios .4Scenario 1.4Scenario 2.4Scenario 3.4Scenario 4.5Accessing DB2 Data Using SAS.6Database Access from SAS .7When Would You Use SAS/ACCESS Translation to SQL?.7When Would You Use Explicit SQL Pass-Through?.8Using the SAS/ACCESS libname Engine.8Explicit Pass-Through Versus SQL Translation .9Loading and Adding Data .10Creating Tables .11Retrieving the Data into SAS .12What Are the Performance Impacts of These Operations? .12readbuff.13Threaded Read.14Putting Data into Your DB2 Database .16dbcommit .16insertbuff .17More on the SQL Procedure.18Joining Data Sets (Tables) .19Function Pushdown .19The Database Administrator’s Corner .21How Does SAS Use My Database? .21Connections.21Resource Consumption .22Debugging .23What is Ahead for SAS and DB2?.25The Primary Content Provider for Accessing DB2 Data with SAS 9 was Scott Fadden of the DB2 DataManagement Software group at IBM (sfadden@us.ibm.com).

Accessing DB2 Data with SAS 9IntroductionSince its founding in 1976, SAS has had a technological relationship with IBM. This relationshiphas expanded to include joint solution development and marketing. Partnering with IBM is a truewin-win situation for both SAS and IBM. SAS benefits from IBM’s experience in integratingtechnologies, while IBM aligns itself with a market leader in business intelligence. As a result, jointcustomers get the most complete set of intelligence and analytical solutions on the market.In the fall of 2002, IBM announced the latest version of DB2 — IBM DB2 Universal Database,Version 8.1 for UNIX, Windows, and Linux. DB2 V8.1 contains improvements in functionality andperformance that enhance SAS and DB2 integration. The alignment of the DB2 8.1 and SAS 9release schedules has provided a unique opportunity for better product integration andcoordinated development. With DB2 8.1 and SAS 9, customers can take advantage of the jointdevelopment efforts. Improvements have been made in many areas, including multi-row fetch,bulk load, threaded reads, and other functions.This paper explores the impact of SAS and DB2 configuration options by outlining differentmethods of accessing your DB2 database with a focus on performance. Examples are given tohighlight the performance trade-offs of choosing different access methods, SAS 9 applicationparameters, and DB2 8.1 configuration options. In addition, this paper highlights some of the newfeatures in SAS 9 and DB2 8.1. For complete product information for SAS 9, seesupport.sas.com. For information on DB2 8.1, see ibm.com/software/data/db2/udb/.This paper uses examples from a test system to demonstrate the performance impact of tuningoptions with SAS/ACCESS Interface to DB2. The test environment consists of SAS 9 and DB2 8.1running on a single, four-processor system running UNIX, with 4 GB of memory and 40 fibrechannel disks.The performance results of the test system demonstrate the impact of various configurationoptions. However, they are not intended as a method of comparing dissimilar configurations.Different configurations may yield different results.What is SAS/ACCESS Interface to DB2?SAS and DB2 communicate via SAS/ACCESS Interface to DB2 software. SAS/ACCESS Interfaceto DB2 software is a member of a large family of data access products offered by SAS. WithSAS/ACCESS you can use the power and flexibility of SAS software for to analyze and presentdata from a DB2 database. Your DB2 database tables appear native to SAS so that you can useSAS features and functionality to perform extracts of information, without having to learnStructured Query Language (SQL).SAS/ACCESS Interface to DB2 software translates read and write requests from SAS into theappropriate calls for DB2. The result of these calls is to retrieve data as logical views of the DB2database or as extracts of data into SAS data set form.1

Accessing DB2 Data with SAS 9SAS/ACCESS engine functionality ranges from automatic, behind-the-scenes operations thatrequire minimal database knowledge, to completely flexible operations that allow a databaseadministrator to finely tune the data access components of a SAS application. The method youuse depends on your infrastructure, database expertise, and operational goal. This paperexamines the translation process from a SAS application to the corresponding SQL that isrequired in order to exchange information with DB2.In the latest version of SAS/ACCESS Interface to DB2 software for SAS 9 and DB2 8.1, the jointprocessing capabilities have been greatly expanded. These new capabilities includeSAS/ACCESS threaded reads and expanded load support, DB2 Call Load Interface (CLI) load,and improved multi-row fetch performance.What Software is Required for SAS to Access My DB2 Database?Required Software:DB2, Version 7.1 or higherThe SAS System, includingBase SAS software, andSAS/ACCESS Interface to DB2softwareUpgrade Note:Since SAS Version 8, you nolonger need to bind anypackages to the database.There are many configuration options that provide flexibility in designing a unique solution thattakes advantage of the strengths of both SAS and DB2. SAS and DB2 may run on the samesystem or on different systems. Multiple SAS sessions can access a single database server, orone SAS session can access multiple database servers. SAS and DB2 run on many differentplatforms, and the platforms need not be the same for them to interact. For example, your DB2data warehouse might be running on AIX, and your SAS session on Windows. The testenvironment used for examples in this paper had SAS and DB2 running on the same system.No modifications to your DB2 database are necessary for the addition of SAS/ACCESS.SAS/ACCESS communicates with the database server using the DB2 CLI API included in theDB2 client software. Other SAS applications that run on top of Base SAS software can be addedas needed but are not required for running basic SAS applications. Regardless of how yourenvironment is designed, there are a few basic software components that are required: DB2,Version 7.1 or higher, Base SAS software, and SAS/ACCESS Interface to DB2 software.Data Processing ModelsSAS 9 supports many data processing models that allow you to be flexible when designing asolution. The performance of different solutions varies greatly and depends on variables. It isimpossible to evaluate every option because each environment is unique. Therefore, this paperoffers some questions you should consider when developing your data processing model.How Long Does It Take to Generate the Result?It is common for a process in a large data warehouse environment to take 4, 8, 10, or more hoursto generate a result that is used by multiple SAS procedures. If this is the case, you should savethe result set so that you do not need to query the database each time that you want to runanother analysis.2

Accessing DB2 Data with SAS 9What is the Size of the Result Set?If the result set is large, you should consider storing it in a separate database table or data mart.In many situations, it is a good idea to move the data to a system separate from the operationaldata store.How Many Times are the Results Accessed?If the result set is general and you need to run multiple SAS procedures against the same data,then you should maintain a copy of the result set. For example, if it takes one hour to generate theresult set, and 20 people need access to the data, you could save 19 hours of databaseprocessing and reduce the load on the DB2 production system by maintaining the result set.How Many Users Will Be Using SAS Against the Result Set?If your data needs to be accessed by multiple users, you should store the results in an offline datamart on the same server where SAS executes.How Fast is the Communication Link Between SAS and DB2?The answer determines whether you store the results locally or remotely. For example, if thedatabase server is in a different city than the SAS server, you should set up a local data mart sothat SAS requests do not need to be passed over the Wide Area Network (WAN).Figure 1: SAS server accessing the database server directly.3

Accessing DB2 Data with SAS 9Figure 2: SAS server accessing a data mart.Usage ScenariosThe following usage scenarios provide examples of applying the list of questions to developing adata processing model. These scenarios assume that the query generating the result takes manyhours to complete and is resource-intensive.Scenario 1You are the only user accessing the data. You need to access the database once, and you canreuse the result set to complete your analysis.In this case, you should create a local copy of the result set and use it for your analysis.Scenario 2Ten to twenty SAS users are accessing the data and querying it multiple times using differentprocedures.In this case, you should save the result set of the initial query in a data mart so that all SAS userscan access it.Scenario 3Your SAS users require input from multiple data sources.In many SAS environments, data is collected by one or more database administrators and sent tothe SAS user for analysis. You can streamline this process in two ways. You can retrieve the datadirectly from the source. SAS allows you to execute procedures against multiple data sources in asingle step, and SAS collects all the data for you. Alternatively, you can use IBM’s InformationIntegration technology to collect data. An Information Integration server provides a single view tomultiple data sources as if they all existed on a single DB2 server. This single view allows the4

Accessing DB2 Data with SAS 9database administrator to provide a secure, central data access point for SAS users. Bothmethods simplify the collection of data from multiple sources.Figure 3: Gateway accessing multiple data sources.Scenario 4You use SAS to analyze the status of your just-in-time inventory system.In this case, the data that you retrieve needs to be up-to-date. Accessing an operational DB2database is the answer. To enable this type of evaluation, you should consider using the DB2Materialized Query Table (MQT) or the DB2 8.1 Multidimensional Clustering (MDC) feature toimprove query performance.Using the DB2 MQT and/or MDC feature allow(s) you to organize your data for more efficientaccess by organizing or pre-computing the data required by your SAS application. MQT suppliesaggregations and joining of data. MDC allows you to organize the storage of a table along multipledimensions. In either case, SAS does not need special knowledge of the data to use thesefeatures. MDC is applied on the source table itself, defined when the table is created. MQT is aseparate table containing the aggregated results.For example, if you are analyzing data by region, you can use an MQT to pre-compute values byregion. In this case, when your SAS application requests the information, much of thecomputation is done. The following code example shows a MQT that contains informationaggregated by state.5

Accessing DB2 Data with SAS 9/* Pre-compute some information by state */CREATE TABLE StateMQTAS (SELECT avg(value) as value,avg(RENT1) as rent1,max(rhhinc) as MaxInc,min(rhhinc) as MinInc,avg(rhhinc) as AvgInc,count(*) as Population,stateFROM hrecs dbGROUP BY state)DATA INITIALLY DEFERRED REFRESH DEFERREDAccessing DB2 Data Using SASThe last section introduced a high-level overview of what SAS to DB2 communication entails. Thissection provides a more detailed look at the SAS/ACCESS Interface to DB2 software.To begin, you need to understand how SAS processes data.Most SAS analysis procedures require that the input be from one or more preprocessed SAS datasets. Other procedures and data steps are designed to prepare the data for processing.For example, the execution steps of a print procedure are:/* Step 1, Sort the SAS data set */proc sort data census.hrecs;by state;run;/* Step 2, Print the results */proc print data census.hrecs(keep State serialno);by state;run;In this example, you want to print the contents of the hrecs data set sorted by state. To do sousing a SAS data set, you use proc sort to order the data set and then proc print toproduce the report.Using SAS/ACCESS Interface to DB2 to read directly from the DB2 database can make executingthese procedures more efficient. In the above example, if your data source were DB2, you wouldnot need to sort the data for proc print. SAS/ACCESS automatically generates the SQLorder by clause and the database orders the result. This action is supported through theSAS/ACCESS translation to SQL engine.6

Accessing DB2 Data with SAS 9When migrating your SAS code so that you can use SAS/ACCESS Interface to DB2, make surethat you remove procedures that do not apply to the database (proc sort for example). In theexample above, if you run proc sort against the database, it will exit with an error indicatingthat the database table does not work in replace mode. This error does not impact performance.If, on the other hand, proc sort sends the data to a separate database table, SAS/ACCESSwill execute a select with order by statement and insert the results. Database tables are notordered, so this would be an unnecessary processing step and would impact performance.Performance Tip:PortingIf you are migrating code fromusing a SAS data set to usingDB2, make sure you removeunnecessary procedures likeproc sort. Even though they arenot necessary they will executeand take time to process.Database Access from SASThere are two ways to connect to your DB2 database from SAS. You can connect using thelibname engine, or you can connect directly to the database using the connect statement in theSQL procedure.When connecting to the database using the libname engine, SAS automatically translates SASdata access application code to SQL. Translation to SQL means that SAS/ACCESS processesthe SAS application code and then generates the appropriate SQL to access the database.When connecting directly to the database using the connect statement, you can use explicitSQL pass-through. Explicit SQL pass-through is a mechanism that allows you to pass unalteredSQL directly to the database server. Explicit SQL pass-through is useful for adding database-onlyoperations to your SAS application and is accessible using the SQL procedure (proc sql).Most SAS procedures and data steps use the SAS/ACCESS SQL translation engine. Followingis an example of SQL translation for the print procedure in the preceding example. When thesame proc print procedure is executed using SAS/ACCESS against a DB2 database, therequest is translated into SQL for processing by DB2./* SQL generated or proc print */select “state”,”serialno”from hrecsorder by state;This is the SQL generated for the proc print statement. As mentioned earlier, proc sortwould not be necessary in this case.When Would You Use SAS/ACCESS Translation to SQL?You should use the SAS/ACCESS translation to SQL when: You want to use SAS data access functionality (for example, threaded reads). You are joining data from multiple data sources. The application needs to be portable to different relational databases. The procedure or data step requires it (for example, proc freq, proc summary).7

Accessing DB2 Data with SAS 9When Would You Use Explicit SQL Pass-Through?Explicit SQL pass-through should be used when:* DB2 database processing steps are executed from a SAS application. You want to use DB2-specific SQL.Using the SAS/ACCESS libname EngineThe libname engine is used by SAS 9 to access data libraries (a data source). By way of thelibname engine, SAS uses the relational database library to access a DB2 database.The SAS libname engine allows you to easily modify applications to use different data sources.You can modify a SAS statement that uses a SAS data set to use a table in your DB2 databaseinstead by simply changing the libname definition. For example, the following is a script thataccesses a SAS data set named mylib.data1 to generate frequency statistics./* Define the directory /sas/mydata as a SAS library */Coding Tip:Accessing long or casesensitive table nameslibname mylib data “/sas/mydata”;/* Run frequency statistics against the data1 data set */If you need to access data fromtables that are not in alluppercase, set the followinglibname option:proc freq data mylib.data1;where state ’01’;table state tenure yrbuilt yrmoved msapmsa;run;preserv

Accessing DB2 Data with SAS 9 3 What is the Size of the Result Set? If the result set is large, you should consider stor ing it in a separate database table or data mart. In many situations, it is a good idea to move the data to a system separ