Data Warehouse On A Budget How To Really Do More With Less

Transcription

Data Warehouse on a Budget
How to really do more with less
Discussion
October 11, 2009

Agenda
- Essentials (foundation products)
- Key components (design build products)
- Inexpensive, high quality substitutes
- A word about accelerators
  – How to deliver value quickly
- Other options
  – Alternatives to mainstream thought
Confidential and Proprietary

About me
- Led design/build of seven (7) large scale (over 5TB) data warehouses designed, built, and deployed in the Financial Services, Transportation, Supply Chain, Retail, Utility, and Professional Services industries
- Over twenty-five (25) data marts (special purpose subject areas) designed, built, and deployed across a wide variety of industries
- Eight (8) data warehouse executive assessments prepared and delivered for management review and action. In addition, five (5) detailed business cases prepared to support the investment in the analytic environment
- Five (5) commercial off-the-shelf products developed and marketed worldwide to the Software Engineering and Healthcare industries

Essentials
Foundation components

Essentials – Fundamental Pattern

Essentials – Full Deployment

Essentials - Federated Environments
[Diagram labels: Claims Processing, SAP Financials, CRM, 3rd Party, E-Commerce, Warranty Analysis, Federated Meta Data Repositories, Common Staging Area, Real Time ODS, Federated Claims Processing Data Warehouse, Federated Marketing Data Warehouse, Subset Data Marts, Real Time Data Mining and Analytics, Analytical Applications]

Essentials – Common Information Model

Essentials – EII

Design Build Components

A closer look

Key Processes

Test Automation Strategy
[Diagram: map of test automation patterns. Labels, grouped by heading:]
- Test Automation Strategy: Recorded Test, Scripted Test, Data Driven Test, Test Automation Framework
- Unit Basics
  – Test Definition: Test Method, Four Phase Test, Assertion Method, Assertion Message, Test Case Class
  – Test Execution: Test Runner, Test Case Object
  – Test Discovery: Test Enumeration, Test Selection
- Test Fixture Strategy: Fresh Fixture, Shared Fixture (Immutable, Pre-built), Standard Fixture
- Fresh Fixture Patterns
  – Construction: Inline Setup, Delegated Setup, Implicit Setup, Creation Method, Object Method
- Shared Fixture Patterns
  – Construction: Lazy Fixture Setup, Suite Fixture Setup, Chained Test, Decorated Setup
  – Access: Finder Method, Fixture Registry
- Result Verification Patterns: State Verification, Behavior Verification, Delta Assertion, Guard Assertion, Custom Assertion, Expected Object, Verification Method
- Fixture Tear Down Patterns: Inline Tear Down, Implicit Tear Down, Garbage Collected Tear Down, Automated Tear Down
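Several of the patterns named on this slide (Four Phase Test, Fresh Fixture, Implicit Setup, Implicit Tear Down, State Verification, Test Runner) can be illustrated with a small sketch. It is not taken from the deck: the `load_customer_dim` function, the `customer_dim` table, and the sample rows are hypothetical, and an in-memory SQLite database stands in for a warehouse target.

```python
import sqlite3
import unittest


def load_customer_dim(conn, staged_rows):
    """Hypothetical load step: upper-case country codes, skip rows with no id."""
    for customer_id, country in staged_rows:
        if customer_id is None:
            continue
        conn.execute(
            "INSERT INTO customer_dim (customer_id, country) VALUES (?, ?)",
            (customer_id, country.strip().upper()),
        )


class LoadCustomerDimTest(unittest.TestCase):
    def setUp(self):
        # Phase 1, setup (Implicit Setup): a Fresh Fixture is built for every test.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY, country TEXT)"
        )

    def tearDown(self):
        # Phase 4, teardown (Implicit Tear Down): discard the fixture.
        self.conn.close()

    def test_load_standardizes_country_and_skips_missing_ids(self):
        staged = [(1, " us "), (None, "de"), (2, "De")]
        # Phase 2, exercise the code under test.
        load_customer_dim(self.conn, staged)
        # Phase 3, verify (State Verification) against an Expected Object.
        expected = [(1, "US"), (2, "DE")]
        actual = self.conn.execute(
            "SELECT customer_id, country FROM customer_dim ORDER BY customer_id"
        ).fetchall()
        self.assertEqual(expected, actual, "customer_dim contents differ")


def run_suite():
    """Test Discovery plus Test Runner: collect the test case and execute it."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LoadCustomerDimTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_suite()` from a continuous integration job returns a result object whose `wasSuccessful()` method can gate the build. A Shared Fixture would trade this per-test isolation for faster runs against a larger database.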

Test Automation Strategy - Realized

Data Quality
Data Quality Process
- Data Profiling
  – Measure: Quantify the number and types of defects
  – Analyze: Assess the nature and cause of the defects
- Data Cleansing
  – Parse: Isolate and identify data elements in data structures
  – Standardize: Normalize data values and formats according to business rules and third-party references
  – Correct: Verify, scrub, and append data based upon algorithms and business rules provided from a secondary source
- Data Enhancement
  – Enhance: Append additional data enhancing the information value
- Match and Consolidate
  – Match: Identify duplicate records within multiple tables, databases
  – Consolidate: Combine unique data elements from matched records into a single source
- Management Reporting and Oversight
  – Report: Provide reporting within the data quality process
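The Standardize, Match, and Consolidate steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the process from the deck: the field names, the `STATE_CODES` reference table, and the exact-key matching rule are all hypothetical, and real tools usually apply fuzzy matching and survivorship rules.

```python
from collections import defaultdict

# Hypothetical reference data used for standardization.
STATE_CODES = {"new york": "NY", "n.y.": "NY", "ny": "NY", "ohio": "OH", "oh": "OH"}


def standardize(record):
    """Standardize: normalize values and formats against simple rules."""
    state = record["state"].strip().lower()
    return {
        "name": " ".join(record["name"].split()).title(),
        "state": STATE_CODES.get(state, state.upper()),
        "phone": "".join(ch for ch in record["phone"] if ch.isdigit()),
        "email": record.get("email", "").strip().lower(),
    }


def match(records):
    """Match: group records sharing a match key (exact name and state here)."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["name"], rec["state"])].append(rec)
    return list(groups.values())


def consolidate(group):
    """Consolidate: keep the first non-empty value seen for each field."""
    merged = {}
    for rec in group:
        for field, value in rec.items():
            if value and not merged.get(field):
                merged[field] = value
    return merged


def cleanse(raw_records):
    standardized = [standardize(r) for r in raw_records]
    return [consolidate(g) for g in match(standardized)]


raw = [
    {"name": "  ann   lee ", "state": "New York", "phone": "(212) 555-0100", "email": ""},
    {"name": "Ann Lee", "state": "ny", "phone": "", "email": "Ann.Lee@Example.com"},
    {"name": "bob ray", "state": "Ohio", "phone": "614.555.0199", "email": ""},
]
golden = cleanse(raw)
```

Here the two Ann Lee rows collapse into one record that carries the phone number from the first row and the email address from the second, which is the "single source" idea behind Consolidate.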

Data Quality – Why it is needed
SQL Server 2008 Data Profiling Task in Integration Services
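To give a feel for what column profiling measures, here is a rough sketch in plain Python. It does not call the SQL Server Data Profiling Task; it only computes the kind of statistics a profiling tool typically reports (null ratio, distinct count, length range, most frequent values). The `profile_column` helper and the sample rows are hypothetical.

```python
from collections import Counter


def profile_column(rows, column):
    """Return simple profile statistics for one column of a list of dict rows."""
    values = [row.get(column) for row in rows]
    # Treat None and the empty string as missing.
    non_null = [v for v in values if v not in (None, "")]
    lengths = [len(str(v)) for v in non_null]
    return {
        "rows": len(values),
        "null_ratio": (len(values) - len(non_null)) / len(values) if values else 0.0,
        "distinct": len(set(non_null)),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "top_values": Counter(non_null).most_common(3),
    }


rows = [
    {"customer_id": 1, "zip": "10001"},
    {"customer_id": 2, "zip": ""},
    {"customer_id": 3, "zip": "10001"},
    {"customer_id": 4, "zip": "1001"},
]
zip_profile = profile_column(rows, "zip")
```

In this sample a quarter of the `zip` values are missing and one value is only four characters long, the sort of defect counts the Measure step is meant to surface before any load logic is written.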

Choices
Enabling Technologies

The choices
- ORACLE, SAP, IBM, Informatica
  – Powerful
  – Expensive
  – Demands high skill levels to deploy successfully
- Microsoft
  – Good, well rounded general purpose platform
  – Missing key management and meta-data elements
- Open Source (Pentaho, Jaspersoft, and Infobright)
  – Validated the market for open source BI reporting and ETL tools
  – Good, special purpose tools in the right hands (Talend)
- Alternatives
  – Wherescape RED
  – Special Purpose Tools (SeeWhy, Pervasive)

Total Cost of Ownership
- Labor intensive
- Subject to Vendor Driven Architecture (VDA)
- Expensive (maintenance, hidden support costs)?
- Missing critical management components
- Customization and development costs
- Meet organizational capability and align with objectives
  – Expensive and time consuming if not
  – JAVA or .NET
  – UNIX or Microsoft
- Technical debt
  – Quick and dirty is expensive
  – Should invest more heavily in design

How to save 10 million dollars while staring into the abyss
- Replace
  – AIX with Linux
  – Websphere with JBOSS
  – Domino with Alfresco or Drupal (ECM)
  – Cognos with Pentaho
  – Tivoli Monitoring with Hyperic
  – Tivoli Netview with Zenoss
  – Tivoli (Netcool) with OpenNMS
  – Tivoli Configuration Manager with Puppet
  – Tivoli Provisioning Manager with OpenQRM
John Willis: IT Management and Cloud

Seriously
- Most of our costs are in our people (4-5x)
  – Development
  – Support
  – Maintenance
- Need for consistent, repeatable process controls
  – Enable cost efficiency
  – Deliver information products faster and less expensively
  – Reduced complexity
  – Component reuse
  – Improved communication
- Leverage standardization benefits
  – Less variance in work products
  – Solve problems once
  – Improved quality (defects caught earlier in cycle)
  – Adopt standardized reference models and templates

Seriously
- Open Source may not be so “Open”
  – Align with internal skills and core competencies
    UNIX vs. Windows
    Java vs. .NET
    Perl vs. Powershell or WSH
    PHP vs. ASP
- Windows DW Stack may not be complete
  – Management
  – Metadata
  – Flexibility
- Do not try to build a system whose complexity exceeds the organization's capabilities to deliver

What is the best solution on a budget?
- Probably something in between
  – Platform (don’t forget virtualization in development)
  – Database and Storage Architecture
  – Middleware
  – Data Profiling and Quality Tools
  – Configuration Management and ALM
  – Test Automation and Continuous Integration
    Cruise Control
    NANT
    MAVEN
  – Reporting and Information Delivery
    Reporting Services
    Excel (Server based – zero footprint)

Inexpensive, high quality substitutes
Alternatives to mainstream thought

Zenoss Core - monitoring and systems management

Puppet – Automated Systems Administration

Subversion – Version Control

Maven and Eclipse – Build and Manage Projects

Pentaho (BI-Suite)

Jaspersoft

Talend

INFOBright

Protégé and the Essential Architecture Project

DB Designer 4

A word about accelerators

Wherescape RED

Wherescape RED

Methodology
Along the way

MIKE2.0 (Methodology)

Comprehensive Process Models

Self documenting

Questions and reference links
- Wherescape RED: http://www.wherescape.com/home/home.aspx
- Talend: http://www.talend.com/index.php
- Essential Project: http://www.enterprise-architecture.org/
- Mike 2.0: http://mike2.openmethodology.org/
- Pentaho BI Enterprise Suite: http://www.pentaho.com/
- InfoBright: http://www.infobright.com/

Questions and reference links
- JasperSoft: http://www.jaspersoft.com/
- John Willis: IT Management and Cloud 0-million-dollars-while-staring-intothe-abyss/
- Cruise Control: http://cruisecontrol.sourceforge.net/
- Maven: http://maven.apache.org/
- NANT: http://nant.sourceforge.net/
- Subversion, Puppet: http://subversion.tigris.org/, http://reductivelabs.com/trac/puppet/

Data Warehouse on a Budget
How to really do more with less
Thank You

Data Warehouse on a Budget
How to really do more with less
Mr. Parnitzke is a hands-on technology executive, trusted partner, advisor, software publisher, and widely recognized database management and enterprise architecture thought leader. Over his career he has served in executive, technical, publisher (commercial software), and practice management roles across a wide range of industries. Now a highly sought-after technology management advisor and hands-on practitioner, his customers include many of the Fortune 500 as well as emerging businesses where he is known for taking complex challenges and solving for them across all levels of the customer’s organization delivering distinctive value and lasting :
- Applied Enterprise Architecture (pragmaticarchitect.wordpress.com)
- The Corner Office (cornerofficeguy.wordpress.com)
- Data management professional (jparnitzke.wordpress.com)
- Essential Analytics (essentialanalytics.wordpress.com)
- The program office (theprogramoffice.wordpress.com)
