InfoSphere Warehouse 10 - NDM

IBM Software
White Paper
IBM InfoSphere Warehouse
Deliver actionable, real-time operational business insights
Information Management

Access to timely, accurate information is critical for enterprises that are striving to better serve their customers, beat the competition and foster innovation. Limited by inflexible legacy data warehouse and incomplete business intelligence (BI) solutions, IT teams are often challenged to meet business users’ information needs. The result is a proliferation of isolated data marts and data warehouses, which often contain duplicate, incomplete or conflicting data.

Worse yet, most BI solutions built on legacy data warehouses are a patchwork of components from multiple vendors, creating many integration and enterprise standardization hurdles. Downsides can include frustrated end users, significant delays in getting new reports, high development and maintenance costs and poorly performing solutions that do not scale. IT teams must be able to support business requirements for a variety of end users while containing costs, accommodating ever-increasing data volumes and dealing with the constantly changing needs of different business constituents.

IBM InfoSphere Warehouse software provides a powerful range of capabilities that go beyond those of traditional warehouses. This comprehensive platform integrates the strength of the IBM DB2 database with a dynamic data warehousing infrastructure.

Comprehensive data warehouse solutions for enterprise and departmental needs

IBM InfoSphere Warehouse offers extended features from design to analytics and visualization.
Multiple editions are available to match the specific business needs of small, midsize and large organizations (see Table 1).

Table 1: Products and components included in InfoSphere Warehouse editions

Columns: InfoSphere Warehouse [...] Enterprise Edition, Developer Edition

Included components:
DB2 Enterprise Server Edition
DB2 Database Partitioning
Row and Column Access Control
Label-Based Access Control
DB2 Storage Optimization Feature
Continuous Data Ingest
Multi-temperature Storage
SQL Warehousing Tool (SQW)
Cubing Services
IBM Cognos 10 Business Intelligence
Intelligent Miner
Text Analytics
Time Travel Query

Table 1: Products and components included in InfoSphere Warehouse editions (continued)

Columns: InfoSphere Warehouse [...] Enterprise Edition, Developer Edition

Included components:
IBM WebSphere Application Server
WebSphere (MQ) Services Federation
High Availability Disaster Recovery (including multiple secondaries)
IBM Tivoli System Automation
Advanced Copy Services
InfoSphere Optim Performance Manager Extended Edition
InfoSphere Optim Design Studio
IBM Data Studio
InfoSphere Federation Server (DB2, Oracle, Informix)
SQL Replication
Workload Management
Homogeneous Q-Replication (3 DB2 for LUW databases)*
InfoSphere Data Replication*
InfoSphere Federation Server (use with DB2 for z/OS, IBM i, non-IBM data sources)***
InfoSphere Optim Query Workload Tuner
InfoSphere Optim Configuration Manager
DB2 Recovery Expert
DB2 Merge Backup
InfoSphere Optim High Performance Unload
InfoSphere Data Architect**

InfoSphere Warehouse Packs:
Customer Insight
Market and Campaign Insight
Supply Chain Insight

* InfoSphere Warehouse editions include queue-based replication only and allow up to three DB2 servers; connection to IBM System z requires a separate license; any other replication requires separately purchasing InfoSphere Data Replication.
** Limited to 10 users in Data Architect.
*** Requires DB2 Connect; may purchase DB2 Connect Enterprise Edition instead of purchasing IBM Federation Server.

With advanced capabilities such as online analytical processing (OLAP), data mining and text analytics, InfoSphere Warehouse Enterprise Edition is well-suited for real-time enterprise analytics. Performance optimization and compression features can make building and managing a large data warehouse affordable and help significantly reduce the cost of ownership. End-to-end tooling capabilities enable architects and administrators to efficiently design, deploy and maintain an enterprise data warehouse.

Departmental and mid-market users can benefit from the industry-leading capabilities of InfoSphere Warehouse while retaining the ability to choose pricing and packaging options that are appropriate for smaller implementations.

Enhanced functionality: InfoSphere Warehouse Advanced Editions

InfoSphere Warehouse Advanced Editions, which are available in Enterprise and Departmental Editions, deliver a comprehensive set of InfoSphere Warehouse Packs for simple deployment of leading prebuilt industry models and IBM Cognos Business Intelligence reports. This functionality enables organizations to easily understand and gain insight into their markets, campaigns, customers and supply chains.

Additional management and architecture tools included in the Advanced Enterprise Edition can help organizations deploy, manage, grow and architect increasingly business-critical and complex warehouse environments.

Simplified deployment

To speed and simplify deployment, InfoSphere Warehouse Departmental Edition is available on a virtual image that can be deployed on any VMware server platform. IT teams can bypass install and setup activities, reduce time-to-value and free database administrators (DBAs) to focus on higher-value projects. The virtualized InfoSphere Warehouse Departmental Edition provides the flexibility to manage fluctuation in database activity. IT teams have the freedom to adjust processors, data storage capacity and hardware quickly and easily to adapt to changing business needs.

InfoSphere Warehouse Packs help speed deployment

InfoSphere Warehouse Packs are recent additions to the IBM Industry Models portfolio, which provides structured and deployable business content for a growing number of industry initiatives. These optional offerings contain physical data models, data mining examples and sample reports based on industry-specific business issues that help organizations compress project timelines and reduce cost and risk compared with custom, in-house data warehouse projects. The following packs are currently available:

• InfoSphere Warehouse Pack for Customer Insight
• InfoSphere Warehouse Pack for Market and Campaign Insight
• InfoSphere Warehouse Pack for Supply Chain Insight

InfoSphere Warehouse Advanced Editions include all three InfoSphere Warehouse Packs; organizations can use as many packs as they wish to help accelerate the deployment of their data warehouses.

Streamlined terabyte pricing

InfoSphere Warehouse is available under a simple, terabyte-based pricing structure that allows organizations to pay only for the compressed user data stored in the warehouse. As warehouse data volume grows, organizations simply purchase additional warehousing capacity (see Table 2). This enables them to exploit new hardware innovations and processor technologies without affecting software costs. The deep compression capabilities of InfoSphere Warehouse further increase the value of this pricing option.

A powerful database foundation

InfoSphere Warehouse is powered by the IBM DB2 for Linux, UNIX and Windows (DB2 for LUW) data server. With its massively scalable, shared-nothing architecture, DB2 provides high performance for mixed-workload query processing against both relational and native XML data. Advanced features such as data partitioning, compression, multidimensional clustering (MDC), materialized query tables (MQT) and OLAP capabilities make DB2 a robust engine for dynamic warehousing.

Table 2: Warehouse capacity and processing information

Columns: InfoSphere Warehouse [...] Edition, Developer Edition (values below are listed in column order)

Processor limit: 16 cores | Limited to number of processors in 4 sockets* | Unlimited | Unlimited | N/A
Limited-use socket limit: 4 sockets | 4 sockets | N/A | N/A | N/A
Terabyte limit***: 15 TB | 15 TB | Unlimited** | Unlimited** | N/A
DB2 instance memory limit: 64 GB | Unlimited* | Unlimited | Unlimited | N/A
Platforms: Linux, AIX, HP-UX, Solaris, Windows

* User data must be spread across two or more active data partitions when using more than 16 processor cores and/or 64 GB of instance memory.
** When licensed using the terabyte pricing, user data must be spread across two or more active data partitions.
*** Terabyte limit is based on the actual user data, and licensing is per database.
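As a rough illustration of the shared-nothing data partitioning mentioned above, the following Python sketch places rows on partitions by hashing a key, computes a partial result on each partition in parallel and merges the partial results. It is not DB2 code. The `zlib.crc32` hash, the thread pool standing in for separate servers, and every name used here (`distribute`, `partial_totals`, `total_by_region`, `SALES`, the `customer` distribution key) are assumptions made only for this example.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 4  # hypothetical number of database partitions

# Hypothetical fact rows; "customer" plays the role of the distribution key.
SALES = [
    {"customer": "c1", "region": "east", "amount": 10},
    {"customer": "c2", "region": "west", "amount": 20},
    {"customer": "c3", "region": "east", "amount": 30},
    {"customer": "c4", "region": "west", "amount": 40},
    {"customer": "c5", "region": "east", "amount": 50},
]


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Pick a partition with a stable hash (a stand-in, not DB2's own hashing)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def distribute(rows, num_partitions=NUM_PARTITIONS):
    """Place every row on exactly one partition, chosen by its distribution key."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[partition_for(row["customer"], num_partitions)].append(row)
    return partitions


def partial_totals(partition_rows):
    """Sub-request: total amount per region, using only one partition's rows."""
    totals = defaultdict(int)
    for row in partition_rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def total_by_region(partitions):
    """Run the sub-request on all partitions in parallel, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(partial_totals, partitions))
    merged = defaultdict(int)
    for partial in partials:
        for region, amount in partial.items():
            merged[region] += amount
    return dict(merged)


if __name__ == "__main__":
    print(total_by_region(distribute(SALES)))
```

Because each row lives on exactly one partition, the merged totals match what a single scan of all rows would produce, while each worker only touches its own share of the data.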

Outstanding scalability and high performance

InfoSphere Warehouse provides advanced capabilities for data partitioning, giving IT users multiple ways to distribute data across servers for large-scale parallelism and linear scalability. The shared-nothing architecture of DB2 helps ensure that performance will not degrade as the warehouse grows. And because InfoSphere Warehouse can physically cluster data on multiple dimensions, order data by value range and limit I/O to relevant data partitions, it helps reduce the work needed to resolve many queries.

Database partitioning for massive parallel processing

InfoSphere Warehouse transparently splits the database across multiple partitions and uses the horsepower of multiple servers to satisfy requests for large amounts of information. SQL statements are automatically decomposed into sub-requests that are executed in parallel across the partitions. Results of the sub-requests are joined to provide final results.

Table partitioning for efficient, flexible administration

Table partitioning offers easy roll-in and roll-out of table data, flexible index placement and efficient query processing. Rolling in partitioned table data allows a new range to be easily incorporated into a partitioned table as an additional data partition. Rolling out partitioned table data enables data ranges to be efficiently separated from a partitioned table for subsequent purging or archiving.

Table partitioning enhances the flexibility of table-level administration by allowing administrative tasks to be performed on individual data partitions. These tasks include detaching and reattaching a data partition, backing up and restoring individual data partitions and reorganizing individual indexes. Time-consuming maintenance operations can be streamlined by breaking them down into a series of smaller operations.
For example, backup operations can work data partition by data partition when the data partitions are placed in separate table spaces.

MDC for enhanced query performance

Multidimensional clustering provides a flexible method to continuously and automatically cluster table data in multiple dimensions. This clustering reduces the amount of I/O required. In addition, it helps reduce the need for database maintenance activities such as reorganization.

DB2 Adaptive Compression for optimized storage efficiency and reduced costs

Using InfoSphere Warehouse, organizations can leverage storage optimization technology in DB2 that is designed to significantly reduce disk space requirements and improve query performance. The InfoSphere Warehouse features allow organizations to automatically compress additional types of data, further reduce storage requirements and provide quick access to data from disk. With DB2, indexes and temporary tables can be automatically compressed to reduce storage costs, an especially useful feature in large data warehouse environments.

Adaptive Compression can also help reduce storage costs and improve performance, especially for large I/O-bound warehouse applications and query workloads. Data row compression contributes to storage space savings and helps reduce I/O overhead. At the same time, the stored pages are compressed, which further enhances the compression on disk. Also, because data is compressed, more rows can be cached in the database’s buffer pool to improve query response time, and DBAs do not need to perform REORG operations as frequently.

Advanced workload management

InfoSphere Warehouse workload management capabilities are designed to enable real-time delivery of business insights without compromising performance. With traditional servers, the strain of mixed workloads can inhibit the delivery of information to a broad set of users and applications. With the advanced workload management provided by InfoSphere Warehouse, DBAs can establish and enforce service levels for end users by prioritizing queries from different users and applications and then controlling the number of underlying resources dedicated to those processes.

Continuous Data Ingest for data loading without downtime

DB2 enables organizations to transparently load data from external sources into InfoSphere Warehouse databases without downtime, even allowing real-time business analysis and decision making during the loading process. The Continuous Data Ingest feature allows IT departments to load data across multiple agents at the same time, while also dynamically switching between the various external load sources to help maximize resource utilization. Continuous Data Ingest eliminates the data ingest latency created by batch-loading data on infrequent schedules, and is a significant feature for business users that need current operational data in the warehouse.

Time Travel Query for historical analysis

The Time Travel Query feature in DB2 supports fast and easy time-based trend analysis and applications. Through the standard query language, users can query temporal tables to determine the changes occurring over a range of time.
As the data ages and becomes less relevant, it can be removed from the database.

Enhanced BI query performance

InfoSphere Warehouse incorporates significant enhancements for BI-style queries. The addition of a zigzag join feature means DBAs can reduce the time for complex multidimensional business queries by nearly three times.¹ Enhanced query joins and optimizer enhancements help to further increase performance of other analytic queries, reducing the need for additional indexes.

Exceptional security

The InfoSphere Warehouse Advanced Security feature is designed to help critical systems maintain security for their key data. It lets DBAs decide who has write access and who has read access to individual rows and columns in any given table based on either users or groups. Once the label-based access control (LBAC) rules are defined, data access control is managed by the DB2 data server and is completely transparent to end users. In addition, the Advanced Security feature enables DBAs to determine how that data will be presented to an application or user. This allows for different views of the same data, but with various output masks to secure sensitive data.
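To make the idea of row-level and column-level control with output masks more concrete, here is a minimal Python sketch. It only mimics the behavior described above in application code; it does not use DB2 syntax or any DB2 API, and the group names (`AUDIT`, `NORTH`, `SOUTH`), the `ACCOUNTS` rows and the functions `can_read_row`, `present_row` and `query_accounts` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset  # group memberships decide what the user may see


# Hypothetical table rows containing one sensitive column ("card").
ACCOUNTS = [
    {"id": 1, "branch": "NORTH", "card": "4111111111111111"},
    {"id": 2, "branch": "SOUTH", "card": "5500000000000004"},
]


def mask_card(number):
    """Output mask: hide everything except the last four characters."""
    return "*" * (len(number) - 4) + number[-4:]


def can_read_row(user, row):
    """Row rule (assumed): auditors see all rows, others only their branch."""
    return "AUDIT" in user.groups or row["branch"] in user.groups


def present_row(user, row):
    """Column rule (assumed): only auditors see the unmasked card number."""
    shown = dict(row)
    if "AUDIT" not in user.groups:
        shown["card"] = mask_card(row["card"])
    return shown


def query_accounts(user, rows=ACCOUNTS):
    """Apply the row rule first, then the column mask, for the given user."""
    return [present_row(user, row) for row in rows if can_read_row(user, row)]


if __name__ == "__main__":
    teller = User("teller", frozenset({"NORTH"}))
    auditor = User("auditor", frozenset({"AUDIT"}))
    print(query_accounts(teller))
    print(query_accounts(auditor))
```

In this sketch the same query returns a different view of the same data depending on who asks, which is the effect the passage attributes to rules enforced inside the data server rather than in each application.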

High availability

InfoSphere Replication Server technology is included in all editions of InfoSphere Warehouse. Organizations looking to provide active/active availability can use bidirectional Q-replication between a pair of source and target DB2 for LUW data servers.

End-to-end support for XML documents

For many organizations, the ability to effectively integrate XML into their information management environment has become a business necessity. With InfoSphere Warehouse, organizations can now manage and analyze large volumes of XML data that were previously locked away in transactional systems. IBM DB2 pureXML technology supports XML data in partitioned tables, MDC tables, declared temporary tables, user-defined functions and partitioned database environments.

Fast performance for complex queries

An MQT is a pre-summarized, pre-aggregated table that stores query results as data. The DB2 software optimizer transparently redirects queries from base tables to matching MQTs, helping to improve the performance of complex aggregate queries.

Analytics for discovering hidden insights

Embedded analytics capabilities deliver a set of sophisticated yet easy-to-use tools within the data warehouse. These tools provide valuable business intelligence to a wide pool of end users.

Multidimensional data analysis

The Cubing Services for OLAP feature enables multidimensional data analysis without extracting data from the warehouse. InfoSphere Warehouse includes native support for the Microsoft PivotTable Service, enabling ad hoc analyses or delivery of standard spreadsheet reporting, all while working within the Microsoft Excel application. In addition, Cubing Services cubes are first-class data providers to the IBM Cognos platform. The entire suite of Cognos clients and applications can leverage these powerful warehouse-based data cubes.

Embedded data mining capabilities

Unlike solutions that require end users to extract data from the warehouse, independently analyze it and then send the results back to the warehouse, InfoSphere Warehouse provides embedded data mining, modeling and scoring capabilities. These capabilities enable business users to work with current data and deliver analytics in real time, helping them quickly discover revenue opportunities. InfoSphere Warehouse supports standard data-mining model algorithms such as clustering, associations, classification and prediction; additional algorithms may be imported in industry-standard Predictive Model Markup Language (PMML) format.

Powerful text analytics for deep insight

Most BI solutions cannot access the majority of information captured across the organization, such as call center notes, customer feedback and free-form text fields, along with documents and web pages. InfoSphere Warehouse supports the analysis of previously untapped unstructured data, helping to provide additional insights into customer and product issues.
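The materialized query table (MQT) idea described earlier, a pre-aggregated table that queries can be answered from instead of the base table, can be pictured with a small Python sketch. This is an analogy only, not how the DB2 optimizer works internally. The `SALES` rows, the `build_summary` helper and the routing rule in `total_for_region` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical base table rows: (region, month, amount).
SALES = [
    ("east", "2012-01", 100),
    ("east", "2012-02", 150),
    ("west", "2012-01", 200),
    ("west", "2012-02", 50),
]


def build_summary(rows):
    """Pre-aggregate amount by (region, month), storing the result as data."""
    summary = defaultdict(int)
    for region, month, amount in rows:
        summary[(region, month)] += amount
    return dict(summary)


def total_for_region(region, base_rows, summary=None):
    """Answer 'total amount for a region'.

    If a matching summary exists, read it instead of scanning the base rows.
    The second item of the returned tuple reports which source was used.
    """
    if summary is not None:
        total = sum(v for (r, _month), v in summary.items() if r == region)
        return total, "summary"
    total = sum(amount for r, _month, amount in base_rows if r == region)
    return total, "base"


if __name__ == "__main__":
    summary = build_summary(SALES)
    print(total_for_region("east", SALES))           # scans the base rows
    print(total_for_region("east", SALES, summary))  # reads the summary
```

Both calls return the same total; the difference is how much data has to be read. With a large base table and a small summary, the second path does far less work, which is the benefit the passage attributes to transparent redirection to MQTs.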

End-to-end tools for enhanced flexibility and process efficiency

InfoSphere Warehouse provides a set of tools that help simplify data warehouse and analytics development and deployment. These interfaces enable users to design the warehouse and populate data structures, as well as perform analytics and manage data mining and multidimensional cubing through common interfaces.

Support for enterprise mashups

InfoSphere Warehouse includes an enterprise mashup platform based on IBM Mashup Center. The mashup platform provides a foundation that meets the analytical and visualization needs of small organizations, as well as departments and enterprise businesses. Mashup capabilities empower knowledge workers to uncover new business insights by easily assembling information from the data warehouses with multiple sources in eye-catching web applications.

Rich business intelligence capabilities

IBM Cognos Business Intelligence allows business users to evaluate a rich set of BI capabilities without incurring upfront costs. Business users can easily access data from their data warehouse; reporting and analysis features allow them to deliver relevant information how, when and where it is needed. The web-based user interface, enterprise-class service-oriented architecture (SOA) foundation and the ability to access any and all data sources enable business users to easily develop and deploy reports on the data assets within the warehouse. Combined with the InfoSphere Warehouse Packs (available in the Advanced Editions), Cognos Business Intelligence provides a quick way to deploy warehouse reporting and gain rapid value and insights from data.

Built-in tools for data modeling

Design Studio provides a graphical user interface that enables architects to design, model, reverse engineer and validate physical database schemas.
Design Studio is based on IBM InfoSphere Data Architect software and can import and export models from a variety of sources, including CA ERwin.

The SQL Warehousing Tool enables DBAs to prepare and populate the data warehouse structures required for data mining, multidimensional analytics and embedded analytics. Data flows, control flows and transformations can be built using Design Studio and deployed within the warehouse.

InfoSphere Optim tools help DBAs and de
