Transcription
Data Warehouse on a Budget: How to really do more with less
  Discussion
  October 11, 2009
Agenda
  Essentials (foundation products)
  Key components (design build products)
  Inexpensive, high quality substitutes
  A word about accelerators
    – How to deliver value quickly
  Other options
    – Alternatives to mainstream thought
  Confidential and Proprietary
About me
  Led design/build of seven (7) large scale (over 5TB) data warehouses designed, built, and deployed in the Financial Services, Transportation, Supply Chain, Retail, Utility, and Professional Services industries
  Over twenty-five (25) data marts (special purpose subject areas) designed, built, and deployed across a wide variety of industries
  Eight (8) data warehouse executive assessments prepared and delivered for management review and action. In addition, five (5) detailed business cases prepared to support the investment in the analytic environment
  Five (5) commercial off-the-shelf products developed and marketed worldwide to the Software Engineering and Healthcare industries
Essentials: Foundation components
Essentials – Fundamental Pattern
Essentials – Full Deployment
Essentials - Federated Environments
  [Diagram. Recoverable labels: Claims Processing, SAP Financials, CRM, 3rd Party, E-Commerce, Warranty Analysis; Federated Meta Data Repositories; Common Staging Area; Real Time ODS; Federated Claims Processing Data Warehouse; Federated Marketing Data Warehouse; Subset Data Marts; Real Time Data Mining and Analytics; Analytical Applications]
Essentials – Common Information Model
Essentials – EII
Design Build Components
A closer look
Key Processes
Test Automation Strategy
  [Diagram: map of test automation patterns. Recoverable labels follow.]
  Groups: Test Automation Strategy, Unit Basics, Test Execution, Test Discovery, Test Definition, Test Fixture Strategy, Fresh Fixture Patterns, Shared Fixture Patterns, Construction, Access, Result Verification Patterns, Fixture Tear Down Patterns
  Patterns: Recorded Test, Scripted Test, Data Driven Test, Test Automation Framework, Test Runner, Test Case Object, Test Case Class, Test Enumeration, Test Selection, Test Method, Assertion Method, Assertion Message, Four Phase Test, Fresh Fixture, Shared Fixture, Immutable, Pre-built, Standard Fixture, Inline Setup, Delegated Setup, Implicit Setup, Lazy Fixture Setup, Creation Method, Object Method, Suite Fixture Setup, Chained Test, Decorated Setup, Finder Method, Fixture Registry, State Verification, Behavior Verification, Expected Object, Guard Assertion, Custom Assertion, Delta Assertion, Verification Method, Inline Tear Down, Implicit Tear Down, Garbage Collected Tear Down, Automated Tear Down
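Several of the patterns named on this slide (Four Phase Test, Fresh Fixture, Implicit Setup, Implicit Tear Down, State Verification, Assertion Message) can be illustrated with a small sketch using Python's standard unittest and sqlite3 modules. The table name staging_customer and the function load_staging are hypothetical, chosen only for illustration; they are not from the deck.

```python
import sqlite3
import unittest


def load_staging(conn, rows):
    """Hypothetical load step: insert (id, name) rows, skipping rows with a missing name."""
    clean = [(i, n.strip()) for i, n in rows if n and n.strip()]
    conn.executemany("INSERT INTO staging_customer (id, name) VALUES (?, ?)", clean)
    conn.commit()
    return len(clean)


class StagingLoadTest(unittest.TestCase):
    def setUp(self):
        # Phase 1, setup (Implicit Setup): every test method gets a Fresh Fixture,
        # here an in-memory database with an empty staging table.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE staging_customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def tearDown(self):
        # Phase 4, teardown (Implicit Tear Down): release the fixture.
        self.conn.close()

    def test_rows_with_missing_names_are_skipped(self):
        rows = [(1, "Acme"), (2, None), (3, "  "), (4, " Globex ")]

        # Phase 2, exercise the code under test.
        loaded = load_staging(self.conn, rows)

        # Phase 3, verify (State Verification), with an Assertion Message.
        self.assertEqual(loaded, 2, "expected two loadable rows")
        names = [r[0] for r in self.conn.execute(
            "SELECT name FROM staging_customer ORDER BY id")]
        self.assertEqual(names, ["Acme", "Globex"])
```

Saved as a module, this can be run with python -m unittest. Because the fixture is rebuilt for each test, tests stay independent, at the cost of repeating setup work that a Shared Fixture would avoid.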
Test Automation Strategy - Realized
Data Quality
  Data Quality Process
  Data Profiling
    – Measure: quantify the number and types of defects
    – Analyze: assess the nature and cause of the defects
  Data Cleansing
    – Parse: isolate and identify data elements in data structures
    – Standardize: normalize data values and formats according to business rules and third-party references
    – Correct: verify, scrub, and append data based upon algorithms and business rules provided from a secondary source
  Data Enhancement
    – Enhance: append additional data, enhancing the information value
  Match and Consolidate
    – Match: identify duplicate records within multiple tables and databases
    – Consolidate: combine unique data elements from matched records into a single source
  Management Reporting and Oversight
    – Report: provide reporting within the data quality process
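The Measure, Standardize, Match, and Consolidate steps of this process can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a replacement for a data quality tool: the field names (name, phone, city), the sample records, and the matching rule (exact match on the standardized name) are all hypothetical.

```python
import re
from collections import defaultdict


def measure(records, required=("name", "phone")):
    """Measure: count defects (missing required values) per field."""
    defects = defaultdict(int)
    for rec in records:
        for field in required:
            if not rec.get(field):
                defects[field] += 1
    return dict(defects)


def standardize(record):
    """Standardize: trim and collapse whitespace, title-case names, keep only phone digits."""
    name = re.sub(r"\s+", " ", record.get("name") or "").strip().title()
    phone = re.sub(r"\D", "", record.get("phone") or "")
    return {**record, "name": name, "phone": phone}


def match(records):
    """Match: group records sharing a key (assumed here: the standardized name)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["name"]].append(rec)
    return list(groups.values())


def consolidate(group):
    """Consolidate: merge matched records, keeping the first non-empty value per field."""
    merged = {}
    for rec in group:
        for field, value in rec.items():
            if value and not merged.get(field):
                merged[field] = value
    return merged


raw = [
    {"name": "  acme   corp ", "phone": "(555) 010-0001", "city": ""},
    {"name": "Acme Corp", "phone": "", "city": "Toledo"},
    {"name": "Globex", "phone": "555.010.0002", "city": "Akron"},
]
defect_counts = measure(raw)
clean = [standardize(r) for r in raw]
golden = [consolidate(g) for g in match(clean)]
```

Real matching usually needs fuzzy comparison and survivorship rules; exact matching on one standardized field is used here only to keep the sketch short.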
Data Quality – Why it is needed
  SQL Server 2008: Data Profiling Task in Integration Services
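To make concrete what column profiling produces, here is a minimal sketch in plain Python. It does not use or imitate the Integration Services API; the function profile_column and the sample values are hypothetical, and the statistics (null ratio, distinct count, value distribution, length range) are simply typical of what a profiling tool reports.

```python
from collections import Counter


def profile_column(values):
    """Profile one column: null ratio, distinct count, most common values, length range."""
    total = len(values)
    # Treat None and blank strings as missing.
    non_null = [v for v in values if v is not None and str(v).strip() != ""]
    lengths = [len(str(v)) for v in non_null]
    return {
        "rows": total,
        "null_ratio": (total - len(non_null)) / total if total else 0.0,
        "distinct": len(set(non_null)),
        "top_values": Counter(non_null).most_common(3),
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
    }


states = ["OH", "OH", "Ohio", None, "MI", "", "OH"]
report = profile_column(states)
```

Even on this tiny sample the profile surfaces two common defects: missing values, and the same state recorded in two formats ("OH" and "Ohio"), which shows up as an unexpected spread in value lengths.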
Choices: Enabling Technologies
The choices
  ORACLE, SAP, IBM, Informatica
    – Powerful
    – Expensive
    – Demands high skill levels to deploy successfully
  Microsoft
    – Good, well rounded general purpose platform
    – Missing key management and meta-data elements
  Open Source (Pentaho, Jaspersoft, and Infobright)
    – Validated the market for open source BI reporting and ETL tools
    – Good, special purpose tools in the right hands (Talend)
  Alternatives
    – Wherescape RED
    – Special Purpose Tools (SeeWhy, Pervasive)
Total Cost of Ownership
  Labor intensive
  Subject to Vendor Driven Architecture (VDA)
  Expensive (maintenance, hidden support costs)?
  Missing critical management components
  Customization and development costs
  Meet organizational capability and align with objectives
    – Expensive and time consuming if not
    – JAVA or .NET
    – UNIX or Microsoft
  Technical debt
    – Quick and dirty is expensive
    – Should invest more heavily in design
How to save 10 million dollars while staring into the abyss
  Replace
    – AIX with Linux
    – Websphere with JBOSS
    – Domino with Alfresco or Drupal (ECM)
    – Cognos with Pentaho
    – Tivoli Monitoring with Hyperic
    – Tivoli Netview with Zenoss
    – Tivoli (Netcool) with OpenNMS
    – Tivoli Configuration Manager with Puppet
    – Tivoli Provisioning Manager with OpenQRM
  John Willis: IT Management and Cloud
Seriously
  Most of our costs are in our people (4-5x)
    – Development
    – Support
    – Maintenance
  Need for consistent, repeatable process controls
    – Enable cost efficiency
    – Deliver information products faster and less expensively
    – Reduced complexity
    – Component reuse
    – Improved communication
  Leverage standardization benefits
    – Less variance in work products
    – Solve problems once
    – Improved quality (defects caught earlier in cycle)
    – Adopt standardized reference models and templates
Seriously
  Open Source may not be so “Open”
    – Align with internal skills and core competencies
        UNIX vs. Windows
        Java vs. .NET
        Perl vs. Powershell or WSH
        PHP vs. ASP
  Windows DW Stack may not be complete
    – Management
    – Metadata
    – Flexibility
  Do not try to build a system whose complexity exceeds the organization's capabilities to deliver
What is the best solution on a budget?
  Probably something in between
    – Platform (don’t forget virtualization in development)
    – Database and Storage Architecture
    – Middleware
    – Data Profiling and Quality Tools
    – Configuration Management and ALM
    – Test Automation and Continuous Integration
        Cruise Control
        NANT
        MAVEN
    – Reporting and Information Delivery
        Reporting Services
        Excel (Server based – zero footprint)
Inexpensive, high quality substitutes: Alternatives to mainstream thought
Zenoss Core - monitoring and systems management
Puppet – Automated Systems Administration
Subversion – Version Control
Maven and Eclipse – Build and Manage Projects
Pentaho (BI-Suite)
Jaspersoft
Talend
INFOBright
Protégé and the Essential Architecture Project
DB Designer 4
A word about accelerators
Wherescape RED
Wherescape RED
Methodology: Along the way
MIKE2.0 (Methodology)
Comprehensive Process Models
Self documenting
Questions and reference links
  Wherescape RED: http://www.wherescape.com/home/home.aspx
  Talend: http://www.talend.com/index.php
  Essential Project: http://www.enterprise-architecture.org/
  Mike 2.0: http://mike2.openmethodology.org/
  Pentaho BI Enterprise Suite: http://www.pentaho.com/
  InfoBright: http://www.infobright.com/
Questions and reference links
  JasperSoft: http://www.jaspersoft.com/
  John Willis: IT Management and Cloud 0-million-dollars-while-staring-intothe-abyss/
  Cruise Control: http://cruisecontrol.sourceforge.net/
  Maven: http://maven.apache.org/
  NANT: http://nant.sourceforge.net/
  Subversion, Puppet: http://subversion.tigris.org/, http://reductivelabs.com/trac/puppet/
Data Warehouse on a Budget: How to really do more with less
  Thank You
Data Warehouse on a Budget: How to really do more with less
  Mr. Parnitzke is a hands-on technology executive, trusted partner, advisor, software publisher, and widely recognized database management and enterprise architecture thought leader. Over his career he has served in executive, technical, publisher (commercial software), and practice management roles across a wide range of industries. Now a highly sought after technology management advisor and hands-on practitioner, his customers include many of the Fortune 500 as well as emerging businesses where he is known for taking complex challenges and solving for them across all levels of the customer’s organization delivering distinctive value and lasting ...
  Applied Enterprise Architecture (pragmaticarchitect.wordpress.com)
  The Corner Office (cornerofficeguy.wordpress.com)
  Data management professional (jparnitzke.wordpress.com)
  Essential Analytics (essentialanalytics.wordpress.com)
  The program office (theprogramoffice.wordpress.com)