Capacity Planning For Internet Services - DESY - IT


Capacity Planning for Internet Services
Quick planning techniques for high growth rates
by Adrian Cockcroft and Bill Walker

Sun Microsystems, Inc.
901 San Antonio Road
Palo Alto, CA 94303-4900 USA
650 960-1300
Fax 650 969-9131

Part No. 806-3684-10
May 2000, Revision 01

Copyright 2000 Sun Microsystems, Inc., 901 San Antonio Road, Palo Alto, CA 94303-4900 USA. All rights reserved.

This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.

Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd. For Netscape Communicator, the following notice applies: Copyright 1995 Netscape Communications Corporation. All rights reserved.

Sun, Sun Microsystems, the Sun logo, SunReady, Solaris, SunPS, Sun BluePrints, Sun Express, Sun Enterprise, Solstice Enterprise Manager, Solaris Resource Manager, Solstice DiskSuite, Sun StorEdge, Sun Enterprise SyMON, Java, Starfire, SunPCi, JumpStart, Solaris JumpStart, and The Network Is The Computer are trademarks, registered trademarks, or service marks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.

The OPEN LOOK and Sun Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun’s written license agreements.

RESTRICTED RIGHTS: Use, duplication, or disclosure by the U.S. Government is subject to restrictions of FAR 52.227-14(g)(2)(6/87) and FAR 52.227-19(6/87), or DFAR 252.227-7015(b)(6/95) and DFAR 227.7202-3(a).

DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NONINFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.

Contents

1. Introduction

2. Theoretical Principles
    Performance Management
    Layers of Service Architecture
    Phases of Performance Management
    Baselining
    Load Planning
    Capacity Planning
    Resource Management
    Service Level Agreements
    Caveats and Problems with Service Level Agreements
    Production Environment Engineering
    Overview
    IT Frameworks
    The ISO FCAPS IT Framework
    IT Extended Frameworks
    FCAPS IT Extended Framework
    The IT Service Model
    Applying ITIL
    The SunReady Approach
    Monitoring and Management
    Measurement Principles
    Conclusion

3. Suggested Processes
    Service Level Management
    Service Level Agreements (SLAs)
    Identifying a Service
    The Service Definition
    Problem Reporting and Escalation
    Reporting and Review
    Costs, Chargebacks, and Consequences
    Arbitration and Conflict Resolution
    Reassessment and Updating
    Inventorying the Enterprise
    Baselining Business Services
    Capacity Estimation and Consolidation Processes
    Quantifying Capacity
    Consolidating Workloads
    Resource Management
    The Never-Ending Cycle
    Summary

4. Scenario Planning
    A Recipe for Successful Scenario Planning
    Modelling Capacity and Load
    Modelling Load
    Modelling Capacity
    Tweaking the Model
    Predicting Tomorrow
    Fitting the Model to the Data by Tweaking
    Summary

5. Capacity Estimation
    Overview
    System Measurement Frames
    Sun Constant Performance Metric (SCPM) Value Capacity Planning
    Amdahl’s Law
    Geometric Scalability
    Measuring Utilization
    SCPM Measurement
    Measuring Disk Storage Capacity
    SCPM Load Planning
    Workload Characterization
    Capacity Planning for Complex Disk Subsystems
    Capacity Measurements for Single Disks
    Measurements on a Single Disk
    Cached Disk Subsystems
    Cached Disk Subsystem Optimizations
    Host-Based Write Cache Model Interconnect Parameters
    Estimating Capacity for Complex Disk Subsystems
    Single Disks
    Mirrored Disks
    Concatenated and Fat Stripe Disks
    Striped Disk Accesses
    RAID5 for Small Requests
    RAID5 for Large Requests
    Cached RAID5
    Cached Stripe
    Capacity Model Measurements
    Disk and Controller Capacity
    Performance Factor P
    Cache Performance Impact Factors
    Service Time and Cache Hit ...

6. Observability
    Operations Viewpoint
    Management Viewpoint
    Engineering Viewpoint
    Example Scenarios
    Operations Viewpoint Implementation
    Implementing with Sun Management Center
    Alert Monitoring with SunMC
    Handling Alarms in SunMC
    Key Performance Indicator Plots
    Operations Viewpoint Implementations Summary
    Management Viewpoint Implementation
    Weekly Summary
    Scenario Planning
    Weekly Problem Summary
    External Monitoring Summary
    Engineering Viewpoint Implementation
    SE Orcollator Logs
    SAS Data Import and Analysis
    Summary

7. Tools Overview and Evaluations
    Tools and Products for Performance Management
    Server Consolidation
    Domains and Dynamic Reconfiguration
    Solaris Resource Manager
    Solaris Bandwidth Manager
    Load Sharing Facility and Codeine
    Sun Management Center (SunMC)
    SunMC Hardware Diagnostic Suite 1.0
    Sun Configuration & Service Tracker
    BMC Patrol and Best/1
    Foglight Software (RAPS)
    SAS IT Service Vision (CPE)
    Hyperformix/SES Workbench and Strategizer
    Aurora Software SarCheck
    Capacity Planning with TeamQuest Model
    Creating the Model
    Summary

A. Sun Constant Performance Metrics

B. References

Glossary


Figures

FIGURE 2-1   Four-Phase Approach to Performance Management
FIGURE 2-2   Load Planning
FIGURE 2-3   FCAPS High-Level IT Framework
FIGURE 2-4   ISO FCAPS IT Extended Framework
FIGURE 2-5   Drill-Down IT Document Hierarchy
FIGURE 2-6   IT Business Reference Model - I
FIGURE 2-7   IT Business Reference Model - II
FIGURE 4-1   Example Physical Architecture Sketch
FIGURE 4-2   Example Dataflow Architecture Sketch
FIGURE 4-3   Example Physical Architecture Sketch With Utilizations
FIGURE 4-4   Dataflow Architecture With Latencies and Intensities
FIGURE 4-5   Daily Workload Variations Example
FIGURE 4-6   Chart of Daily Workload Variation Factor
FIGURE 4-7   Chart of Seasonal Load Variations by Month
FIGURE 4-8   Chart of Geometric Exponential Growth Rate
FIGURE 4-9   Graph of Marketing Campaign Boost
FIGURE 4-10  Chart of Combined Load
FIGURE 4-11  Chart of Application CPU Usage per Transaction
FIGURE 4-12  Chart of Hardware Upgrade Capacity Increase Factors
FIGURE 4-13  Chart of Combined Load vs. Capability
FIGURE 5-1   Graph of CPU Utilization Over a 24-Hour Sample
FIGURE 5-2   Graph of CPU Utilization by Workload
FIGURE 5-3   Graph of CPU Utilization After Load Balancing
FIGURE 5-4   Eight-CPU Geometric Scalability
FIGURE 5-5   64-CPU Geometric Scalability
FIGURE 5-6   Sample SCPM Process
FIGURE 5-7   Sample Spreadsheet of SCPM Process
FIGURE 5-8   CPU Utilization Graph
FIGURE 5-9   System Quanta Consumed Graph
FIGURE 5-10  Upgraded CPU Utilization Graph
FIGURE 5-11  Simple Disk Model
FIGURE 5-12  Disk Head Movements for a Request Sequence
FIGURE 5-15  Two-Stage Disk Model Used by Solaris 2 OE
FIGURE 5-13  Example iostat -x Output
FIGURE 5-14  Example iostat -xn Output
FIGURE 5-16  SE-Based Rewrite of iostat to Show Service Time Correctly
FIGURE 5-17  Complex I/O Device Queue Model
FIGURE 5-18  Single Disk Schematic
FIGURE 5-19  Mirrored Disks Schematic
FIGURE 5-20  Concatenated and Fat Stripe Disks Schematic
FIGURE 5-21  Striped Disk Accesses Schematic
FIGURE 5-22  RAID5 for Small Requests Schematic
FIGURE 5-23  RAID5 for Large Requests Schematic
FIGURE 5-24  Cached RAID5 Schematic
FIGURE 5-25  Cached Stripe Schematic
FIGURE 6-1   SunMC Console
FIGURE 6-2   Load Health Monitor Module
FIGURE 6-3   Host Details Window
FIGURE 6-4   SunMC Console
FIGURE 6-5   Domain Status Details Window
FIGURE 6-6   Alarms Details Window
FIGURE 6-7   Acknowledged Alarms
FIGURE 6-8   CPU User and System Time for a Day
FIGURE 6-9   Five-Minute Load Average and Number of CPUs Online
FIGURE 6-10  Disk Utilization – Busiest Disk and Average Over All Disks
FIGURE 6-11  Disk Throughput Read and Write KB/s
FIGURE 6-12  Network Throughput Over a Day
FIGURE 6-13  Memory and Swap Usage
FIGURE 6-14  Memory Demand Viewed as Page Residence Time
FIGURE 6-15  Example Management Status Report
FIGURE 6-16  Management Report: Weekly Summary Section
FIGURE 6-17  Management Report: Scenario Planning Summary
FIGURE 6-18  Management Report: Weekly Problem Summary
FIGURE 6-19  Management Report: Site Availability and Performance Summary
FIGURE 7-1   TeamQuest Model
FIGURE 7-2   TeamQuest Model: Adjust and Solve
FIGURE 7-3   TeamQuest Model: Solved with Reports
FIGURE 7-4   TeamQuest Model: Steps of Compound Growth Model
FIGURE 7-5   TeamQuest Model: Workload Stretch Factor Graph
FIGURE 7-6   TeamQuest Model: Workload Throughput Graph
FIGURE 7-7   TeamQuest Model: Components of Response Time for Database App2
FIGURE 7-8   TeamQuest Model: System Active Resource Utilization Graph
FIGURE 7-9   TeamQuest Model: Change CPU Definition
FIGURE 7-10  TeamQuest Model: Upgraded System Stretch Factor Graph
FIGURE 7-11  TeamQuest Model: Upgraded Components of Response Time Graph
FIGURE 7-12  TeamQuest Model: Upgraded System Active Resource


Tables

TABLE 2-1  Layers of Service Architecture
TABLE 2-2  IT Extended Framework
TABLE 2-3  FCAPS IT Extended Framework Subdivision
TABLE 3-1  Key Performance Indicator Examples
TABLE 3-2  Tool Definitions for Key Performance Indicators
TABLE 3-3  Sample Severity Levels
TABLE 3-4  Example Support Costs
TABLE 4-1  CPU Peak Load Factor by Weekday
TABLE 4-2  Seasonal Load Variations by Month
TABLE 4-3  Geometric Exponential Growth in User Activity
TABLE 4-4  Marketing Campaign Load Boost Factors
TABLE 4-5  Combined Load Calculation
TABLE 4-6  Application CPU Usage per Transaction
TABLE 4-7  Hardware Upgrade Capacity Increase Factors
TABLE 4-8  Combined Capability Calculation
TABLE 5-1  sar Disk Output


Acknowledgments

Bill thanks Hattie Hall for her support and encouragement during the writing of this book.

During the training classes Adrian presented as part of the WICS program at Stanford University, he learned a great deal about capacity planning techniques from Dr. Neil Gunther. Members of the Computer Measurement Group, especially those at the meetings in the United States, Australia, and Italy, provided inspiration and ideas that span the UNIX and mainframe-oriented worlds. Dave Fisk provided detailed feedback on the disk capacity estimation section. Many thanks to all those who listened and provided feedback during presentations.

Dave Blankenhorn provided a detailed discussion of Service Level Agreements that was incorporated into the principles and processes chapters.

SunPS℠ talent and the SunReady team were instrumental in helping Bill put his thoughts into words and acting as a sounding board for ideas. Our express thanks to Jean-Marc Bernard, Brian Carter, Jefre Futch, and Jeffrey Lucove for their help.

Ken Pepple coauthored our technical seminar series on performance management, which was used as a framework for the principles chapter. Jason Fish provided the organizational talent for the technical seminars, and helped fill classrooms around the world. Bill Bishop and Tom Bankert gave us the latitude to pursue this area in depth, and to spend the time refining the seminar and writing this book.

Thanks to David Deeths for the detailed technical review and modifications that were instrumental in getting this book completed.

Many thanks to Barbara Jugo for project management and to Terry Williams for editing this book.


Preface

The Sun BluePrints series provides best practices information in book form for the users of Sun products. The series looks at the combination of techniques and methodologies of assembling Sun and third-party products that are needed to solve real-world problems and business needs.

Capacity Planning for Internet Services provides detailed yet concise recipes for performing capacity planning tasks for high-growth-rate Internet services. It assumes that there is very little time or expertise available to perform these tasks.

Who Should Use This Book

This Sun BluePrints book is primarily intended for Capacity Planners, Operations Managers, Systems and Database Administrators, Systems Integrators, and Systems Engineers. It can be used as a first introduction to the subject of capacity planning and performance management in an Internet-oriented environment. The techniques described are quite generic, and the only areas of this book that are Sun specific relate to understanding some of the measurements available on the Sun platform. It does not assume any background in capacity planning and avoids the detailed mathematics of queueing theory as much as possible. However, many references are provided for advanced reading.

How This Book Is Organized

The two authors of this book bring together a wide variety of experiences to provide a practical but innovative guide to the problems of capacity planning in high-growth-rate Internet environments. Adrian Cockcroft is a Distinguished Systems Engineer at Sun and is well-known for his expertise and many presentations on performance tuning and tools. Adrian initiated this writing project, structured and scoped the book, and is primarily responsible for the Scenario Planning and Observability chapters. Bill Walker is one of the most senior members of Sun’s Professional Services℠ (SunPS℠) organization. He has worked on many large Internet sites, currently designs processes and methodologies for the SunPS team to implement, and regularly presents training classes on performance tuning. Bill is primarily responsible for the Capacity Estimation, Suggested Processes, and Tools Overview and Evaluations chapters.

Chapter 1, “Introduction,” is an overview of the reasons why this Sun BluePrints series was written and a description of the problems it tries to address.

Chapter 2, “Theoretical Principles,” explains some of the underlying principles involved in managing high growth rates and in making trade-offs between conflicting requirements.

Chapter 3, “Suggested Processes,” describes how to implement processes and procedures for establishing service level agreements and performing capacity planning.

Chapter 4, “Scenario Planning,” introduces a simple way to identify your primary bottleneck, perform capacity estimation, define future scenarios, and perform spreadsheet-based capacity planning.

Chapter 5, “Capacity Estimation,” describes how to examine CPU, memory, disks, and networks to determine the capacity available and the utilization of that capacity. A closer examination of the problems in obtaining capacity and utilization for bottleneck estimation is also presented.

Chapter 6, “Observability,” looks at the reporting requirements for a site in terms of the needs of operations, engineering, and senior management.

Chapter 7, “Tools Overview and Evaluations,” looks at the many tools available for capacity planning, and contains a detailed look at the TeamQuest tools that the SunPS team uses to perform capacity planning studies.

Appendix A, “Sun Constant Performance Metrics,” provides Sun Constant Performance Metrics (SCPM) estimation tables for Sun servers.

Appendix B, “References,” provides a list of publications and Web sites for finding out about more advanced techniques, and contacting tool vendors.

Related Books

These books provide relevant background material; for full details and more suggestions, see the references in Appendix B.

    The Sun BluePrints OnLine Web site: http://www.sun.com/blueprints/online.html
    The Practical Performance Analyst by Dr. Neil Gunther
    Configuration and Capacity Planning for Solaris Servers by Brian Wong
    Sun Performance and Tuning – Java and the Internet by Adrian Cockcroft and Richard Pettit
    Resource Management, a Sun BluePrints book by Richard McDougall, Adrian Cockcroft, Evert Hoogendoorn, Tom Bialaski, and Enrique Vargas
    Solaris PC Netlink Performance, Sizing, and Deployment, a Sun BluePrints book by Don DeVitt
    Backup and Restore Practices for Sun Enterprise Servers, a Sun BluePrints book by Stan Stringfellow and Miroslav Klivansky, with Michael Barto

For a list of Sun documents and how to order them, see the catalog section of the SunExpress Internet site at http://www.sun.com/sunexpress

How to Access Sun Documentation Online

The docs.sun.com Web site enables you to access Sun technical documentation online. You can browse the docs.sun.com archive or search for a specific book title or subject. The URL is http://docs.sun.com/.

What Typographic Changes Mean

TABLE P-1 describes the typographic changes used in this book.

TABLE P-1  Typographic Conventions

Typeface or Symbol | Meaning | Example
AaBbCc123 | The names of commands, files, and directories; on-screen computer output | Edit your .login file. Use ls -a to list all files. machine_name% You have mail.
AaBbCc123 | What you type, contrasted with on-screen computer output | machine_name% su Password:
AaBbCc123 | Command-line placeholder; replace with a real name or value | To delete a file, type rm filename.
AaBbCc123 | Book titles, new words or terms, or words to be emphasized | Read Chapter 6 in User’s Guide. These are called class options. You must be root to do this.

Shell Prompts in Command Examples

TABLE P-2 shows the default system prompt and superuser prompt for the C shell, Bourne shell, and Korn shell.

TABLE P-2  UNIX Shell Prompts

Shell | Prompt
C shell prompt | machine_name%
C shell superuser prompt | machine_name#
Bourne shell and Korn shell prompt | $
Bourne shell and Korn shell superuser prompt | #

CHAPTER 1
Introduction

Almost every business, from a corner shop to a multinational corporation, is faced with competitive pressure to “go online” and provide services via an Internet site. In addition, a large number of new online businesses are being implemented in a mad dash to capture the attention and wallets of a huge and fast-growing number of Internet users. Success is measured by growth in the number of pages viewed, registered users, and in some cases, by the amount of business transacted.

Success comes at a cost. Rapid growth can overwhelm the ability of the site to provide services with acceptable performance. There have been many reports of Web sites that have suddenly attracted too many users and collapsed under the strain. Startup dot-com companies spend most of their investors’ funds on advertising as they attempt to establish their name in the collective consciousness of consumers and the media. Established companies are concerned about maintaining their preexisting brand image while gaining credibility as they add Internet services to their traditional businesses. Therefore, it is important to provide enough capacity to cope with sudden increases in load.

Traditional computer installations have a relatively static number of users, and a detailed understanding of their workload patterns can be obtained. Growth rates in this environment are relatively low, and costs can be optimized by careful capacity planning. Internet services are available to many millions of potential users of the service. The load on the service depends upon the whim of the users. If a high-profile advertisement or news item reaches a large number of people, there is a great opportunity to expand the user base as long as the site can cope with the load. Growth rates for successful sites in this environment are very high, and very hard to plan for. It is normal to lurch from one crisis to the next and to throw hardware quick fixes at the problem regardless of the cost.

Capacity planning is an optimization process. Service level requirements can be predicted and balanced against their costs. Even if there are few cost constraints, it is important to have good estimates of how much spare capacity the site has and whether it can survive the next load peak.

Capacity planning is a well-known discipline, particularly for sites that have a mainframe-oriented background. When very high growth rates occur, time constraints prevent normal techniques from being applied. This Sun BluePrints book charts a course through the available techniques and tools; examines time scales and return on investment for different methodologies; provides a framework for decomposing big problems into solvable subproblems; and gives simple, practical examples that provide results in hours thanks to spreadsheet-based techniques. If you wait until you have chosen and purchased an expensive tool, you will then need weeks or months to learn how to use it. These tools are useful and powerful, and their use is also described in detail.

The topics covered in this book can be divided into the following sections:

    Principles and processes
    Scenario planning techniques
    The effective use of tools

Compared to conventional capacity planning techniques, the Internet service capacity planning techniques described in this book must cope with high rates of change, work with limited system administrator experience, and steer a path through confusing choices and a lack of tools.

Because there are also increased availability requirements, it is important to give priority to simple, common-sense principles that can be followed consistently.

CHAPTER 2
Theoretical Principles

With the exponential growth of the Internet and consumer electronic commerce on the Internet, service quality has become a key component of success. Electronic commerce and commercial portals on the Internet expose the business front office and the related business back office systems to scrutiny by the direct consumer, as well as by the news media.

Sun servers have entered the traditional datacenter environment where system availability, manageability, and service availability are key components in providing a solution to business requirements. With an established and encompassing “production environment” life-cycle plan, robust solutions can be safely and reliably placed into production and evolved to meet the changing and growing needs of the business.

This chapter presents methods for managing performance and establishing service level agreements (SLAs). It also examines the IT frameworks designed to provide solutions for production environment business requirements. Additional IT frameworks from ISO FCAPS (fault, configuration, accounting, performance, and security) models are presented, and tips are offered for implementation.

Performance Management

Performance management is the measurement, analysis, and optimization of computer resources to provide an agreed-upon level of service to the end-user. By defining performance management and identifying the key components required to safely and accurately implement performance management in the datacenter, you can minimize the risks associated with high growth rates. These risks include:

    System downtime due to unexpected overload
    Negative customer feedback
    Loss of potential business due to poor response times
    Loss of customer loyalty due to perceived lack of service quality

The key components that we will concentrate on to define our scope of performance management will include:

    Throughput
    Latency
    Utilization

Throughput is defined as the number of defined actions performed in a given period of time. Latency is defined as the time that it takes to complete a well-defined action. Throughput and latency can be applied at a high level while measuring transactions at the end-user level. They can also be examined for discrete events such as network packets, disk activity, or system centerplane bandwidth consumption. Each of these levels of detail can be measured, reported, and analyzed for impact on overall system performance, provided that you understand the events being monitored and the capabilities of the resources involved in providing those actions.

Utilization is usually expressed as the percentage of the overall capability of a given resource consumed during a defined action or quantity of actions. Resource utilization and resource utilization planning are the cornerstones of capacity planning. Utilization is a measure of system resource impact, throughput defines the quantity of services, and latency defines the quality of the services being provided.
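The three definitions above can be made concrete with a small calculation. The following Python sketch is illustrative only and is not taken from the book: the `Action` record, its field names, and the `summarize` function are hypothetical, and the sketch assumes a single resource whose busy time per action is already known.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """One completed action (hypothetical record layout, for illustration)."""
    start: float  # seconds from the start of the measurement window
    end: float    # completion time on the same clock
    busy: float   # seconds the resource spent serving this action


def summarize(actions, window_seconds):
    """Return (throughput, mean_latency, utilization_percent) for one resource."""
    if not actions or window_seconds <= 0:
        raise ValueError("need at least one action and a positive window")
    # Throughput: number of defined actions per unit of time.
    throughput = len(actions) / window_seconds
    # Latency: time taken to complete each action, averaged here.
    mean_latency = sum(a.end - a.start for a in actions) / len(actions)
    # Utilization: share of the resource's capability consumed, as a percentage.
    utilization = 100.0 * sum(a.busy for a in actions) / window_seconds
    return throughput, mean_latency, utilization


# Assumed sample data: four actions observed in a 10-second window.
sample = [
    Action(0.0, 0.5, 0.2),
    Action(1.0, 1.5, 0.3),
    Action(2.0, 3.0, 0.5),
    Action(4.0, 4.5, 0.25),
]
throughput, mean_latency, utilization = summarize(sample, 10.0)
print(f"{throughput:.2f} actions/s, {mean_latency:.3f} s latency, {utilization:.1f}% busy")
```

The same calculation applies at any level of detail, from end-user transactions down to disk requests; only the definition of an action and the source of the busy time change.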

Layers of Service Architecture

Several layers of resources and resource consumption can be defined, tuned, measured, and reported within the service architecture. Categorizing these layers (see TABLE 2-1) and defining the expectations for each level provides the guidelines for the design and implementation of a system.

TABLE 2-1  Layers of Service Architecture

Layer | Components
Business | Number of users; batch job definitions; report schedules; business hours
Application | N-tiered architecture; database layout; software architecture; access methods
Operating System | Kernel tuning; OS revisions; disk volume layouts
Hardware | CPU; disk; memory
Infrastructure | Network architecture; enterprise management; backup strategies

Each of these layers of the overall service architecture affords opportunities for tuning, measurement, reporting, and management. Each layer will have its own particular scale of benefit and investment to introduce change.

The Business layer often provides the most significant opportunities for “tuning” and has the most significant contribution to the overall architecture. The Application layer and Hardware layer can also provide a significant and obvious impact on the overall performance of the architecture. The Operating System and Infrastructure layers are often where administrators look for some magic cure, but these layers often provide the least opportunity for impacting the performance of a system.

Phases of Performance Management

Performance management can be applied in an iterative, cyclic, four-phase approach (see FIGURE 2-1).

FIGURE 2-1  Four-Phase Approach to Performance Management

The output of each phase is used as input to the following phase. Each phase must reach a steady state in which the results can be published and transferred to the following phase. Once the four phases are locked in place and all results have been published, those results are fed into the next generation of phases. The next sections discuss each phase in detail.

A relatively simple change and configuration management process integrated into the performance management deliverables can greatly improve the efficiency and accuracy of performance management. This change management can be as simple as generating and maintaining revision numbers for the documentation and reports that are produced.

Historical revisions should be retained for future examination and change analysis. These revisions also provide a history of changes to the data analyzed and the resulting conclusions. This compilation of historic data and conclusions can help reduce repetition of effort and acts as a guide that displays the impacts (both positive and negative) of past load planning and capacity planning.
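As a minimal sketch of the revision numbering described above, the following Python class keeps every published revision of a report and shows what changed between two of them. The `ReportHistory` class, its method names, and the sample metrics are hypothetical and are not part of the book; a real site might equally use a version control system for the same purpose.

```python
from dataclasses import dataclass, field


@dataclass
class ReportHistory:
    """Keeps every published revision of one report (illustrative only)."""
    name: str
    revisions: list = field(default_factory=list)

    def publish(self, data: dict) -> int:
        """Store a copy of the report data and return its revision number."""
        self.revisions.append(dict(data))
        return len(self.revisions)  # revision numbers start at 1

    def changes(self, old_rev: int, new_rev: int) -> dict:
        """Map each changed key to its (old value, new value) pair."""
        old = self.revisions[old_rev - 1]
        new = self.revisions[new_rev - 1]
        keys = sorted(set(old) | set(new))
        return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


# Assumed example: two baselining reports, before and after a tuning event.
history = ReportHistory("baseline")
first = history.publish({"cpu_util_pct": 72, "peak_tps": 140})
second = history.publish({"cpu_util_pct": 58, "peak_tps": 140, "cpus": 8})
print(history.changes(first, second))
```

Because old revisions are never overwritten, the effect of each past load planning or capacity planning decision can be examined later by comparing any two revision numbers.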

Baselining

Baselining creates a snapshot of a system as it currently exists and generates reports that describe the system performance and the characterization of the workload being measured. To baseline a system, we first describe the goals of system performance in throughput, latency, and utilization within each level of the service architecture.

The business requirements of the service being provided by the established workload must first be defined. These end-user service level requirements can include:

    Transaction rates
    Transactional volumes
    Hours of operation
    Critical time frames for processing batch loads
    Concurrent user session requirements

These are the same business requirements defined in the SLA, which is explained in detail in Chapter 3, “Suggested Processes.”

A configuration inventory establishes a record of the current state of the five layers of the service architecture and provides a reference for modifying that architecture. An accurate representation including hardware, software, and operating system versions is critical to creating an accurate inventory. This configuration inventory is considered “locked down” for the life span of the baselining process. This causes all changes to the operating environment to be considered a tuning event that expires the current system state and triggers a new baselining cycle.

At this point, service performance is monitored and measured against the goals defined in the SLA. In addition, system performance is monitored and measured against the desired resource consumption thresholds defined in the key performance indicator (KPI) document, as addressed in Chapter 3.

If any tuning opportunities are identified in any of the five layers of system performance, the identified changes to the system or architecture are implemented and remeasured against the previous snapshot. This helps to determine the positive or negative effects of those changes on system performance. The old snaps
