SSIS Best Practices

Transcription

Irish SQL Academy 2008. Level 300Bob Duffy

DTS 2000SSIS 20051.75 Developers*Figures are only approximations and should not be referenced or quoted

Optimize andStabilize the basicsMeasureTuneParallelize Minimize staging (else use RawFiles if possible) Hardware Infrastructure: Disks, RAM, CPU, Network SQL Infrastructure: File Groups, Indexing, Partitioning Replace destinations with RowCount Source- RowCount throughput Source- Destination throughput OVAL performance tuning strategy The Three S‟s Data Flow Bag of Tricks Lookup patterns Script vs custom transform

Sharpen Increase the efficiency ofevery aspectShare Parallelize, partition,pipelineSpend Buy faster, bigger, betterhardware

But be aware of limitations

s)Blocking(asynchronous)

spx

Source data Source serversEMC CX600run SSIS2 Gb Fiber ChannelDestinationserver runs SQLServerDatabaseEMC CX3-801 Gb Ethernetconnections4 Gb Fiber ChannelSource servers:Unisys ES3220L2 sockets each with 4 coreIntel 2 GHz CPUs4 GB RAMWindows Server 2008SQL Server 2008Destination server:Unisys ES7000/One32 sockets each with dual coreIntel 3.4 GHz CPUs256 GB RAMWindows Server 2008SQL Server 2008

sysES7000/one Enterprise ServerMicrosoft Windows Server 2008 x64 Datacenter Edition32 socket dual core Intel Xeon 3.4 GHz (7140M)256 GB8 dual port 4Gbit FCIntel PRO/1000 MT Server AdapterPre-release build of SQL Server 2008 Enterprise Edition (V10.0.1300.4)EMC Clariion CX3-80 (Qty 1)11 trays of 15 disks; 165 spindles x 146 GB 15Krpm; 4Gbit Storage:4UnisysES3220LWindows2008 x64 Enterprise Edition2 socket quad core Intel Xeon processors @ 2.0GHz4 GB1 dual port 4Gbit Emulex FCIntel PRO1000/PT dual portPre-release build of SQL Server 2008 Integration Services (V10.0.1300.4)2x EMC CLARiiON CX600 (ea: 45 spindles, 4 2Gbit FC)

C1C1C1C1

Orders TablePartition1Orders 1Partition2Orders 2Partition3Orders 3Partition4Orders 4Partition5Orders 5Partition6Orders l.5orders.tbl.6.Orders 55orders.tbl.55Partition56Orders on55orders.tbl.56

(Package detailsremoved to protectthe innocent)

Follow MicrosoftDevelopment Guidelines Iterative design, development & testingUnderstand the Business People & Processes Kimball‟s ETL and SSIS books are an excellent referenceGet the big picturePlatform considerations Resource contention, processing windows, SSIS does not forgive bad database design Old principles still apply – e.g. load with/without indexes? Will this run on IA64 / X64? No BIDS on IA64 – how will I debug? Is OLE-DB driver XXX available on IA64? Memory and resource usage on different platforms

ProcessModularity Break complex ETL into logically distinct packages (vsmonolithic design) Improves development & debug experiencePackageModularity Separate sub-processes within package into separate Containers More elegant, easier to develop Simple to disable whole Containers when debuggingComponentModularity Use Script Task/Transform for one-off problems Build custom components for maximum re-use

Concise naming conventionsConformed “blueprint” design patternsPresentable layoutAnnotationsError LoggingConfigurations

Get as close to the data as possible Limit number of columns Filter number of rowsDon‟t be afraid to leverage TSQL Type conversions, null coercing, coalescing, data type sharpening select nullif(name, „‟) from contacts order by 1 select convert(tinyint, code) from salesPerformance Testing & Tuning Connect Output to RowCount transform See Performance Best Practices„FastParse‟ for text files

BEFORE:AFTER:selectselect * from etl.uf FactStoreSales(@Date)dbo.Tbl Dim Store.SK Store ID, Tbl Dim Store.Store Num,isnull(dbo.Tbl Dim Merchant Division.SK Merch Div ID, 0) as SK Merch Div IDfrom dbo.Tbl Dim Storeleft outer join dbo.Tbl Dim Merchant Divisionon dbo.Tbl Dim Store.Merch Div Num dbo.Tbl Dim Merchant Division.Merch Div Numwhere Current Row 1

Use the power of TSQL to clean the data 'on the fly'

Avoid izeancillary data Too many moving parts is inelegant and likely slow But don‟t be afraid to experiment – there are many ways tosolve a problem Allocate enough threads EngineThreads property on DataFlow Task See Performance Talk Synchronous vs. Asynchronous components Memcopy is expensive For example, minimize data retrieved by LookupTx

Three Modes ofOperationTradeoffmemory vs.performanceCan use MergeJoin componentinstead Full Cache – for small lookup datasets No Cache – for volatile lookup datasets Partial Cache – for large lookup datasets Full Cache is optimal, but uses the most memory, also takes time toload Partial Cache can be expensive since it populates on the fly usingsingleton SELECTs No Cache uses no memory, but takes longer Catch is that it requires Sorted inputs See SSIS Performance white paper for more details

CustomcomponentsScripts Can written in any .Net language Must be signed, registered and installed – butcan be widely re-used Quite fiddly for single task Can be written in VisualBasic.Net or C# Are persisted within a package – and havelimited reuse Have template methods already created for you

rf/archive/2008/02/27/etl-world-record.aspx

Kimball‟s ETL and SSIS books are an excellent reference Understand the Business Resource contention, processing windows, SSIS does not forgive bad database design Old principles still apply -e.g. load with/without indexes? Get the big picture Will this run on IA64 / X64? No BIDS on IA64 -how will I debug?