Teradata Can Do Sas Too - Stratia

Transcription

Teradata Can Do SAS Too !Harry Droogendyk, Stratia Consulting Inc.2014-05-29

Typical Architecture

Typical Architecturewhy isP11dying?!

Principlesprocessing datado it where it makes sensedon’t move data unnecessarilydetail vs summarys t r e t c h your brain – learn new functionalityuse the right tool / environmentstrengths / weaknessesdo it oncedo not store what you can easily computeUnix space emails ?

ComplicationsSQL is “set” orientedremember Relational Algebra ?data normalizationEmployee set joined to Department setyields “result set”requires abstractionwe ( and SAS ) are “row oriented”read a row, deal with it, read another row1st row ? Last row ?columns ? eg. array structures

Complicationsdata step is so friendlygranular controlexplicit controlcomfort zonebits and bytesSQL can be nebulous“sets” are uncomfortableuncontrollablebut, preferred by the DBwhy a cartesian product join ?!why are results sometimes different ?! eg. ORDER BY

AgendaTeradata can do SAS !!!don’t move the data !!start simple – all in Teradatasorting correctly for SASconditional processinguseful intermediate tablesOLAP / Analytical functionsfirst.lag & leadcumulative / specialized sumswhy isP11dying?!

Sortingcreate table tables asselect * from connection to teradata (select databasename, tablenamefrom dbc.tableswhere databasename in ( 'DDWV01','DDWV04I' )order by tablename );how many tables are in both views?could use SQL EXCEPTusing data step MERGE

Sortingdata idm;merge tables ( where ( databasename 'DDWV01' )in ddwv01 )tables ( where ( databasename 'DDWV04I' )in ddwv04i ) ;by tablename;flag ddwv01 ddwv04i * 2 ;run;ERROR: BY variables are not properly sorted on data setWORK.TABLES.ddwv01 1 ddwv04i 0 DatabaseName DDWV01 TableName tln loansFIRST.TableName 1 LAST.TableName 1 flag . ERROR 1 N 2199NOTE: The SAS System stopped processing this step because oferrors.

SortingASCII collation ?Teradata ignores case by defaultdata step MERGE / BY does notadd a PROC SORT step ?!

Sortingorder by tablename (casespecific)mode ANSI / Teradata – no effectMERGE / BY now -----------------11351135128912242342246

Conditional Logiccall dataqueue cd determines call centre category‘HA’, ‘H’‘CR’, ‘C’‘GN’,’GEN’Home / AutoCreditorGeneral linerecords call metricsinbound / outbound callsoffered, abandonedneed wide table of call data by datetranspose by queuedata step “IF” or “SELECT” ?noooooo

Conditional Logicselect c.call dt, c.operator id, n.operator nm,sum(case when queue cd in ( 'HA', 'H' )then ib cntelse 0end)as ha ib cnt, sum(case when queue cd in ('CR','C')then ib cnt else 0 end) as cr ib cnt, sum(case when queue cd in ('GN','GEN')then ib cnt else 0 end) as gen ib cnt, sum(case when queue cd in ('HA','H')then ob cnt else 0 end) as ha ob cnt, sum(case when queue cd in ('CR','C')then ob cnt else 0 end) as cr ob cnt, sum(case when queue cd in ('GN','GEN')then ob cnt else 0 end) as gen ob cnt fromdb.call data c groupby 1,2,3

Conditional LogicCASE statements can be used anywhereSELECTwithin functions, e.g. SUM ( )WHEREHAVINGetc functions like data step SELECT / WHENgranular control

WORK tablesintermediate tables can be helpfulreduce query complexityouter joins - left, right, full, innerverify intermediate resultsSAS WORK libno need to define, allocate space, cleanupTeradata temporary tablesdisconnect – gonzo

WORK tablesneed data for 2,000 accounts supplied in Excelimportcreate macro variable of IDs using SQL into:acct no in ( &mother of all macro vars ) 64K bytes ?IN () performance ?pull entire account table down to SASsubsetuse volatile tabledefine Teradata libnameproc appendpass-thru inner-join querywhy isP11dying?!

Teradata Temporary TablesCREATE GLOBAL TEMPORARY TABLEcannot be created WITH DATACREATE, subsequent INSERTtoooo lazy to get column definitionsCREATE VOLATILE TABLEcreate WITH DATAdefine a PRIMARY INDEXCOLLECT STATS

Teradata Temporary Tablesconnect to teradata ( &password database ddwv04imode teradata connection global);execute(create volatile table travel case active as (select distinct r.clm bnft cse idfrom ddwv04i.ins clm fncl evnte,ddwv04i.ins clm bnft cse evnt reltnwhere e.evnt sys src idr 75 snip group by r.clm bnft cse id)with data primary index ( clm bnft cse id )on commit preserve rows ) by teradata;

Teradata Temporary Tablestemporary table had a primary indexalign indexes as much as possible – think AMPSshow view teradata view.view name;show table teradata db.table name;get table nameget PIwith data primary index ( clm bnft cse id )assist optimizer in formulating query planexecute (collect statisticscolumn clm bnft cse idon travel case active) by teradata;

Teradata Temporary Tablesvolatile tables are . temporary !disconnect from teradata;say b’byequit;to persist across SQL Connectionslibname tdtemp teradata &passworddatabase ddwv04i mode teradataconnection global dbmstemp yes ;. multiple SQL connects / disconnects / quits .libname tdtemp clear;say b’bye

Teradata Temporary TablesTeradata v14execute ( create table claim master as (with claim status int as (select i ocactvty changes,max(i)as i xfrom fineos.vocstagechangegroup by 1)select a.*from fineos.voccase a,claim status int bwhere a.i b.i x) by netezza;* TD v14 ;

first.by varproc sort data sashelp.classout class;by sex age;run;data unique class;set class;by sex;if first.sex;run;19 observations read from WORK.CLASS.WORK.UNIQUE CLASS has 2 observations

first.by varrequires use of special database functionsWindowingOLAP or AnalyticalQUALIFYlimits result set, analogous to HAVINGOVERdefines grouping criteriaPARTITIONsimilar to GROUP BYORDER BYsorts result set before QUALIFY is applied

first.by varmove SAS data to Teradata volatile table- remember the 2,000 account Excel file ?no Teradata fastload options available for volatile tableslibname tdtemp teradata &password database ddwv04imode teradataconnection global dbmstemp yes ;proc appendbase tdtemp.sashelp classdata sashelp.class;run;%drop td table(lib tdtemp, table unique class);

first.by varexecute ( create volatile table unique class as (select * from sashelp classqualify row number() over( partition by sexorder by age desc, name) 1) with data) by teradata;on commit preserve rowsQUALIFYrow number 1OVERdefines grouping criteriaPARTITIONsexfirst.sex, specified row number 1ORDER BY descending age and nameSAS default is EQUALSTeradata parallel processingexplicitly define order

first.by var - resultsfirst. - 572.0112.5150.0first. - 1662.572.0112.5150.0could we use RANK instead of ROW NUMBER ?what if we want last.sex ?

%drop td table macro%macro drop td table(lib , table );%if %sysfunc(exist(&lib.&table)) %then %do;proc sql;drop table &lib.&table;quit;%end;%mend drop td table;

Lag / Lead FunctionalitySAS has LAG() functionfound in some databases as wellAnalyical / OLAP functionalityMIN / MAX / AVG with OVERrows between 1 following and 1 followingrows between 1 preceding and 1 precedingrows unbounded precedingetc PERIOD data typerange of date valuesuse EXPAND ON to generate rows- lead- lag- all before

Lag / Lead Functionalityperiod ( captr dt,coalesce (min (captr dt)over ( partition by ip rol id, alt noorder by captr dt/*return the next row, i.e. next highest value ofcaptr dt, if there isn't a next row, return theTeradata current date value*/rowsbetween 1 following and 1 following), current date)) as period dt

Lag / Lead FunctionalityPartition byCaptr Dt12014-03-0112014-03-12Teradata current date22014-05-0222014-05-1122014-05-29Teradata current date

PERIOD data type – EXPAND ON/*Now that we have the period() data value, create a rowfor each date between the beginning / ending date valuein the period dt field.We're only interested in dates that have a range fromthe MOR Start date - 6 months to the current IDM snap dt.*/expand on period dt as captr dt2by interval '1' dayfor period ( cast ('2011-04-30' as date),cast( %single(&idm snap dt) as date) ) select begin(captr dt2) as captr dt

Conditional Logic AnywherePERIOD requires begin date end datedata isn’t always prettybut you can do conditional logic in SQL to deal with itperiod( coalesce(req received dt,req created dt),case when app sts cd 'CLOSED'and app sts dt coalesce(req received dt,req created dt)and ( req closed dt is null or app sts dt req closed dt )then app sts dtwhen req closed dt is nullor req closed dt cast ( %single(&idm snap dt) as date )then cast ( %single(&idm snap dt) as date )else req closed dt end) as period dt

Summing DataSAS is easy goingcreate table class sum asselect name, sex, age, weight, height,sum(weight) as wgt sumfrom sashelp.classgroup by sex;NOTE: The query requires remerging summarystatistics back with the original data

Summing Datadatabases are not so tolerantERROR: Teradata prepare: Selected non-aggregatevalues must be part of the associated groupcumulative sumsusing OLAP / Analytical functionsusing CSUMother similar functions available, “moving”MAVGMDIFFMSUM

Summing Datacumulative claim reservesadjuster sets reserve at claim openreserve transactions occur as time goes onincrease if new costs come to lightdecrease as payments are made, or severity lessensoutstanding reserves are a liabilityneed to know outstanding reserves by day1.2.calculate cumulative reservesgenerate daily reserve totals

Summing DataClaimNoTrans DtReserveAmtPymt NoteAmtAccumReserve12014-04-10 500Open50012014-04-12-35012014-04-13 60022014-04-09 1,20022014-04-11-80022014-04-12-400outstanding reserves on Apr 11?350 PaymentAddingOpen800 PaymentClose1507501,2004000

Summing Data - ANSIselect clm bnft cse id, event dt,sum ( evnt amt )over ( partition by clm bnft cse idorder by event dtrows unbounded preceding )as os reserve amtfrom travel reserves;what’s missing ?partition byorder byrows – reset sum on claim case ID change– regulates order of rows into sum– include this row and all rows before it

Summing Data - Teradataselect clm bnft cse id, event dt, csum(evnt amt, event dt) as os reserve amtfrom travel reservesgroup by clm bnft cse idwhat’s missing ?CSUMevnt amtevent dtGROUP BY– Teradata only, not ANSI– summed column– sort column(s)– specifies “reset” column(s)GROUP BY is equivalent to PARTITION BY in previous query

Generate Daily Reserve Rowsselect clm bnft cse id, os reserve amt, period(event dt,coalesce( min(event dt)over ( partition by clm bnft cse idorder by event dtrows between 1 following and 1 following), current date )) as period dtfrom travel cum os reservescreate PERIOD with adjacent rows by EVENT DT

Generate Daily Reserve Rowsselect clm bnft cse id, begin(event dt2)as event dt, os reserve amtfrom ( select coalesce( )) as period dt)expand on period dt as event dt2by interval '1' dayfor period ( cast(%single(&MORStartDate) as date),cast(%single(&idm snap dt) as date) )voila, daily outstanding reserves

Conclusion do stuff where it makes senseo use Teradata's powero summarize, subset, sort in DBo don’t move data unnecessarilyo rarely, if ever, move detail data be concerned with efficiencyo coding, execution & storage be inquisitiveo new releases bring new functionalityo exploit the strengths of your tools

Conclusion web Dieter Noethhttp://stackoverflow.com/users/2527905/dnoeth

ContactHarry w.stratia.ca/papers

OLAP / Analytical functions first. lag & lead cumulative / specialized sums why is P11 dying?! Sorting create table tables as select * from connection to teradata ( select databasename, tablename from dbc.tables where databasename in ( 'DDWV01' ,'DDWV04I' ) order by tablename ); how many tables are in both views? could use SQL EXCEPT using data step MERGE. Sorting data idm; merge tables .