AMD On AMD: Production Consolidation Using VMware And The AMD Opteron .


AMD on AMD: Production Consolidation using VMware and the AMD Opteron Processor
Michael Winslett
Manager of AMD IT Architecture

AMD: A Leader in Innovation
AMD designs and produces innovative microprocessors and low-power processor solutions for the computer, communications, and consumer electronics industries.
Founded: 1969
Headquarters: Sunnyvale, CA
Employees: 10,750 worldwide
Sales Mix: 78% international
2005 Revenue: $5.8 billion
2006 Revenue:
- Q1: $1.3 billion
- Q2: $1.2 billion

AMD: A Global Enterprise
[Map slide: site names and role labels are interleaved in the transcription; only pairings that are adjacent or self-identifying are attached below.]
Sunnyvale, California - AMD global HQ
Austin, Texas - Marketing, Ops, Design
Boston, Massachusetts - Boston Design Center
Dresden, Germany - Fab 30, Fab 36 & Fab 38 (microprocessors); Dresden Design Center
Beijing, China - Greater China HQ
Bangalore, India - AMD India HQ; Engineering Center
Suzhou, China - Microprocessor Assembly and Test
Singapore - Microprocessor Test, Mark and Pack
Other sites shown: Frimley, UK; Longmont, Colorado; Moscow, Russia; Seoul, Korea; Tokyo, Japan; Mexico City, Mexico; Penang, Malaysia; Sao Paulo, Brazil
Other role labels shown: European Service Center; Design Center; Microprocessor Assembly, Test and Packaging
...and many sales locations worldwide

AMD: Market Momentum
12 consecutive quarters in which year-over-year microprocessor sales growth exceeded 20%
26% of the worldwide x86 server processor market in Q2 2006*
37% of the worldwide and 47% of the USA 4-socket server business**
AMD-based computers offered by leading hardware providers, including Acer, Dell, Fujitsu, Fujitsu Siemens, HP, IBM, Lenovo, Sun, and NEC
[Chart: AMD Opteron Processor Momentum, 4Q03 to 2Q06; data labels 25% and 26%]
* Mercury Research
** Gartner

Why AMD Launched the AMD on AMD Project
Many older servers nearing end-of-life and ready for replacement
- Even though these servers used older technology, in many cases they were still significantly under-utilized
- Many servers not based on AMD Opteron processor technology
- Desire to modernize this environment while moving to an AMD-based platform
Power, space, and cooling challenges in the datacenter
- Our main corporate datacenter was essentially full
- Squeezing more servers into the datacenter through increased density would require substantial investment in our power and cooling infrastructure
Consolidation addressed both of these issues

About AMD IT
Approximately 150 AMD employees globally
Most operational services in the US provided through our co-sourcing partner, HCL Technologies, Ltd.
Centralization of global services
Regional teams supporting operations and region-specific solutions
Team locations
- Austin, TX
- Sunnyvale, CA
- Dresden, Germany
- Singapore
- Penang, Malaysia
- Suzhou, China
- Frimley, UK
- Markham, Canada

Why AMD Went Virtual
Although consolidation can be accomplished in the physical server world, this poses significant challenges and risks
- Re-architecting applications to share servers
- Lack of flexibility in balancing load
- Difficulty in recovering from server failures
- Difficulty managing resource consumption among multiple applications
Virtualization provided many benefits
- Allowed consolidation while preserving our current application architecture
- Transformed physical servers into commodity computing engines
  - No application-specific configuration exists on the physical boxes, hence they are very easy to replicate and replace
- Increased redundancy for most systems
- Quick and easy server provisioning, going from weeks to minutes

AMD on AMD Objectives
Showcase an enterprise-class virtual server infrastructure based on the AMD Opteron processor
Obtain efficiencies and benefits resulting from virtualization and consolidation
- Reduced power and space consumption
- Standard design and tools used globally
  - Initial design developed as part of the implementation in Austin, Texas
  - This design was replicated to AMD sites in Sunnyvale, Dresden, Singapore, Penang, and Suzhou
- Reduced per-server cost and labor requirements
- Improved hardware resource utilization
- Accelerated server provisioning from weeks to minutes
- Improved overall availability
- Improved flexibility for Disaster Recovery
Elimination of non-AMD-based servers
Complete the project worldwide by the end of 2006

Project Methodology
We identified a partner with a proven track record in large-scale consolidation using VMware: RapidApp
RapidApp follows a consistent 3-stage methodology for consolidation projects based on VMware: Design, Planning, and Implementation

Project Methodology: Design
Surveyed the server landscape in Austin
- AMD provided server hardware specifications, CPU utilization, memory utilization, and I/O rates for all systems
Factors in selecting candidate systems
- Type and age of physical server
- Resource utilization
- Benefits the server will realize from virtualization (e.g. automatic recovery from hardware failures)
RapidApp developed a paper design to support the identified candidate systems and to provide a VMware ESX infrastructure based on best practices
Working together, we identified the key management processes that needed to be developed to operate a large production ESX 3.0 environment successfully

Project Methodology: Planning
With the design as a roadmap, RapidApp worked with AMD to develop a plan for implementation
- The plan considered available resources, skills, acceptable downtimes, costs, and overall project objectives
- We established evening hours each week, Monday through Thursday, as timeslots for migrations, but we did not assign downtimes to specific systems at this stage
The output was a detailed project plan, identifying resources required, and an estimated cost for implementation
- Cost included the hardware and software licensing costs as well as RapidApp consulting resources to assist in the implementation

Project Methodology: Implementation
Implementation included three major phases
- Process development and testing
- Migration of non-production systems on ESX 3.0 release candidate code
- Migration of production systems on ESX 3.0 production code
Chose to engage RapidApp to provide resources and expertise for the migration to augment the AMD team
Utilized the period before production ESX 3.0 code was available to refine the management processes for a virtual environment
Advertised the available migration timeslots and allowed application owners to sign up for a timeslot on a first-come, first-served basis
Migrated approximately 25 non-production systems using ESX 3.0 RC code
Upgraded to ESX 3.0 production code and continued migrations of production systems

Why VMware ESX 3.0?
We wanted to be on a current VMware release when the project was done. We did not want to be facing a significant upgrade in the near term once the project was finished
Since we did not have an existing virtual environment to migrate to 3.0, the risk of using the newer release was minimal
We wanted the improved VMFS locking capabilities in 3.0 that support larger ESX farms
We wanted automatic recovery of VMs from a VMware ESX host failure
We wanted the larger memory size allowed for VMs

Infrastructure Sizing
Infrastructure sizing was based on detailed analysis of observed system performance and resource utilization on the original physical systems, together with the consolidation ratios and cost efficiency we wanted to achieve
- Analysis accounted for capability differences between different classes of servers
Experience during implementation was consistent with what the analysis predicted
- Very few performance problems once systems were migrated to VMs
- In the few cases where problems arose, the cause was always a change in resource utilization that occurred after our initial analysis
- We added a checkpoint in the migration process to avoid this issue
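
The slides do not describe how the migration checkpoint worked. As a rough sketch only, one way to catch utilization that has changed since the initial analysis is to compare a fresh measurement against the survey baseline just before a system's migration slot. The metric names follow the survey data mentioned earlier (CPU utilization, memory utilization, I/O rates); the `Utilization` type, the `drifted_metrics` helper, and the 25% tolerance are hypothetical, not part of AMD's or RapidApp's process.

```python
from dataclasses import dataclass


@dataclass
class Utilization:
    """One utilization sample for a physical server (hypothetical structure)."""
    cpu_pct: float    # average CPU utilization, percent
    memory_mb: float  # memory in use, megabytes
    io_rate: float    # I/O rate, in whatever unit the survey collected


def drifted_metrics(baseline, current, tolerance=0.25):
    """Return the metrics whose current value exceeds the baseline by more than
    `tolerance` (a fraction; 0.25 is an illustrative threshold, not a recommendation)."""
    drifted = []
    for name in ("cpu_pct", "memory_mb", "io_rate"):
        base = getattr(baseline, name)
        now = getattr(current, name)
        if now > base * (1 + tolerance):
            drifted.append(name)
    return drifted


# Illustrative values only: CPU has grown well past the surveyed baseline.
surveyed = Utilization(cpu_pct=10.0, memory_mb=1024.0, io_rate=200.0)
remeasured = Utilization(cpu_pct=35.0, memory_mb=1100.0, io_rate=210.0)
print(drifted_metrics(surveyed, remeasured))
```

A non-empty result would send the system back for re-sizing before it is placed on a host, instead of discovering the problem after migration.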

ESX Server Hardware Selection
The HP DL585 quad-processor, dual-core AMD Opteron processor-based server was a "sweet spot" for both consolidation ratio and cost efficiency
- Large enough to achieve a substantial reduction in servers (potentially greater than 30:1 consolidation ratio based on our average VM size)
- Not so large that it posed excessive risk from a server failure
- We had an excellent track record with this server in traditional physical server roles
The HP DL385 dual-processor, dual-core AMD Opteron processor-based server was chosen for a separate, smaller farm in our internet DMZ
- The smaller number of virtual servers in this environment was well supported by the smaller system
- The DL385 incorporates appropriate hardware redundancy for a server in this role
We chose not to use blade servers because they did not support the number of networking and SAN ports we required

Why AMD64 Technology?
64-bit and Multi-Core
- Enhances performance while offering the flexibility to support both 32- and 64-bit applications
Direct Connect Architecture
- Eliminates the 20-year-old front-side bus, increasing system efficiency and scalability
Performance per-watt
- Assists data centers in controlling power consumption and heat output
AMD Virtualization
- Increases utilization by enabling the running of separate, secure operating environments

ESX Server Configurations
Production/Internal ESX Servers
- HP DL585 chassis
- Quad-processor configuration using AMD Opteron 875s
- 48 GB of physical memory
- 2 x on-board 1000Mb network cards
- 3 x NC7170 PCI-X dual-port network cards
- 2 x single-port FC HBAs (QLogic QLA2340)
- Expected VM capacity between 24 and 38 VMs per host
DMZ ESX Servers
- HP DL385 chassis
- Dual-processor configuration using AMD Opteron 280s
- 12 GB of physical memory
- 2 x on-board 1000Mb network cards
- 1 x NC7170 PCI-X dual-port network card
- Redundant power supply and fan configuration
- Expected VM capacity between 10 and 16 VMs
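
For a feel of what these capacity ranges imply, the following sketch divides each host's physical memory evenly across the expected VM counts. The memory sizes and VM counts come from the slide; the even split is a simplifying assumption that ignores hypervisor overhead and any memory sharing, and the `memory_per_vm_gb` helper is hypothetical.

```python
# Figures taken from the ESX Server Configurations slide.
HOSTS = {
    "Production/Internal (HP DL585)": {"memory_gb": 48, "vms_min": 24, "vms_max": 38},
    "DMZ (HP DL385)": {"memory_gb": 12, "vms_min": 10, "vms_max": 16},
}


def memory_per_vm_gb(memory_gb, vm_count):
    """Naive average: physical memory split evenly across VMs (assumption:
    no hypervisor overhead and no page sharing are accounted for)."""
    return memory_gb / vm_count


for name, host in HOSTS.items():
    most = memory_per_vm_gb(host["memory_gb"], host["vms_min"])
    least = memory_per_vm_gb(host["memory_gb"], host["vms_max"])
    print(f"{name}: about {least:.2f} to {most:.2f} GB per VM")
```

Under this naive split, the production hosts work out to roughly 1.3 to 2 GB per VM and the DMZ hosts to roughly 0.75 to 1.2 GB per VM.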

Network and SAN Connectivity (Prod)
8 total physical NICs
- 2: Production VMs
- 2: Test/Dev VMs
- 2: ESX Console
- 1: VMotion
- 1: Available
Connections split amongst physical cards and PCI buses
2 QLogic HBAs
- Connections to two fiber switches
- 4 "paths" to each SAN LUN

Network and SAN Connectivity (DMZ)
4 total physical NICs
- 2: DMZ VMs
- 1: ESX Console
- 1: VMotion
Connections split amongst physical cards and PCI buses
2 QLogic HBAs
- Connections to two fiber switches
- 4 "paths" to each SAN LUN
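
The NIC role assignments on the two connectivity slides can be cross-checked against the port counts on the ESX Server Configurations slide. The short sketch below does that arithmetic; the port and role counts are from the slides, while the `total_ports` helper and the dictionary layout are only illustrative.

```python
def total_ports(onboard, dual_port_cards):
    """Physical NIC ports on a host: on-board ports plus two per dual-port card."""
    return onboard + 2 * dual_port_cards


# NIC roles as listed on the connectivity slides.
PROD_ROLES = {"Production VMs": 2, "Test/Dev VMs": 2, "ESX Console": 2, "VMotion": 1, "Available": 1}
DMZ_ROLES = {"DMZ VMs": 2, "ESX Console": 1, "VMotion": 1}

# Hardware as listed on the configuration slide.
prod_ports = total_ports(onboard=2, dual_port_cards=3)  # DL585
dmz_ports = total_ports(onboard=2, dual_port_cards=1)   # DL385

print(prod_ports, sum(PROD_ROLES.values()))  # prints "8 8"
print(dmz_ports, sum(DMZ_ROLES.values()))    # prints "4 4"
```

Both hosts come out fully accounted for: every physical port has a role, with one spare port on the production hosts.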

Processes and Procedures
More than a dozen documented processes/procedures created to manage the new environment, including:
- Automated ESX Server builds using Altiris
- VM request and deployment process
- Host and VM recovery procedure
- Patching and patch testing processes for hosts and VirtualCenter
- Weekly and monthly preventative maintenance tasks for ESX
- Snapshot process for VMs, including standard SLAs for snapshot delivery
- QA checklists for both host and VM builds
- Process/procedure to create and update VM templates in the environment
- Procedures for granting and denying access to the management tools
- Monitoring configuration standards for VMs and hosts
- VMFS/LUN provisioning process

Tools Used in Managing the Environment
Extensive use of VMotion
Distributed Resource Scheduler (in manual mode)
Distributed Availability Services
ESX hosts are monitored via VirtualCenter
Virtual machines are monitored via HP OpenView agent
Veritas NetBackup used for backing up VMs at the VM level
- Considering more advanced backup processes as tools are released for ESX 3.0
Tool for Physical to Virtual migration (P2V): UltimateP2V

Consolidation Achieved
In Austin, 117 servers consolidated to 7 active ESX 3.0 servers plus 2 swing servers
In Sunnyvale, approximately 33 servers consolidated to 2 active ESX 3.0 servers plus 1 swing server
New VMs have been added in Austin to bring the total number to 180
Overall virtual-to-physical ratio across the two sites
- 23:1 (not including swing servers)
- 17:1 (including swing servers)
Consolidation ratios will increase as new VMs are added to the environment
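
The slide does not show how the two overall ratios were derived. One reading that reproduces them, offered as an assumption and not as the presenter's stated method, is to count the 180 VMs in Austin plus about 33 in Sunnyvale (one VM per migrated server there) and round the ratio down:

```python
AUSTIN_VMS = 180     # from the slide, after new VMs were added
SUNNYVALE_VMS = 33   # assumption: one VM per migrated server ("approximately 33")
ACTIVE_HOSTS = 7 + 2  # active ESX servers in Austin + Sunnyvale
SWING_HOSTS = 2 + 1   # swing servers in Austin + Sunnyvale


def consolidation_ratio(vms, hosts):
    """VMs per physical host, rounded down (the rounding is an assumption)."""
    return vms // hosts


total_vms = AUSTIN_VMS + SUNNYVALE_VMS
print(f"{consolidation_ratio(total_vms, ACTIVE_HOSTS)}:1 excluding swing servers")
print(f"{consolidation_ratio(total_vms, ACTIVE_HOSTS + SWING_HOSTS)}:1 including swing servers")
```

With these inputs the two lines print 23:1 and 17:1, matching the slide.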

Power Savings (Including Cooling)
Projected power reduction in Austin
- 79% reduction in power consumption
- $69K/year in power savings in Austin
Estimated $100K/year in total power savings globally
[Chart: power consumption before vs. after, in kilowatts]

AMD on AMD – Financial Assumptions
Analysis covers B3 Data Center (Phase 1 of Project)
Consolidation (virtual machines to physical server): 22 to 1
Capacity to allow for 200 virtual machines (135 in initial scope)
Hardware refresh cycle every 3 years, one third per year
Moving to virtual environment will free up B3 power capacity (cost avoidance)
Cost opportunity to reduce B3 Data Center kilowatt consumption
Cost opportunity for reduced support costs (TBD)
Internal discount rate: 13%

AMD on AMD – Financial Summary
3 Year Net Present Value: $1.7M
Payback Period: 1 Year

Timeframe | Savings Area | Financial Implication
Year 1 - 2006 | Purchase of fewer servers to meet AMD on AMD goal | Reduced capital request in 2006
Year 2 - 2007 | Purchase of fewer physical servers for increased capacity requirements | Reduction in 2007 capital request for server capacity
Year 2 - 2007 | Cost Avoidance – B3 DC Capacity | No increase in B3 DC power costs
Year 4 - 2009 | Reduced cost for server refresh | Reduced capital request in 2009
Ongoing | Reduced B3 DC Support Costs | Reduction in DC support expense
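
The deck reports only the resulting net present value and payback period, not the underlying cash flows. For readers unfamiliar with the terms, the sketch below shows the standard calculations at the 13% internal discount rate from the assumptions slide. The cash flows are hypothetical placeholders in arbitrary units, not AMD's figures, and the function names are illustrative.

```python
def npv(rate, cash_flows):
    """Net present value. cash_flows[0] occurs today (year 0);
    cash_flows[t] occurs at the end of year t and is discounted by (1 + rate) ** t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def payback_year(cash_flows):
    """First year in which the cumulative (undiscounted) cash flow is no longer
    negative, or None if the investment is never recovered."""
    total = 0.0
    for year, cf in enumerate(cash_flows):
        total += cf
        if total >= 0:
            return year
    return None


DISCOUNT_RATE = 0.13  # internal discount rate from the assumptions slide

# HYPOTHETICAL cash flows (arbitrary units): an up-front investment followed by
# three years of net savings. These are placeholders, not AMD's numbers.
example_flows = [-100.0, 120.0, 60.0, 60.0]

print(f"3-year NPV: {npv(DISCOUNT_RATE, example_flows):.2f}")
print(f"Payback year: {payback_year(example_flows)}")
```

A payback period of one year, as reported on the slide, corresponds to `payback_year` returning 1: first-year savings already cover the initial outlay.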

Other Benefits
Improved redundancy for many systems
- Automatic recovery of VMs from ESX host hardware failures
Greater flexibility in Disaster Readiness options
- Assuming replicated data, ESX servers at a DR site could be quickly repurposed to run mission-critical systems
Improved standardization of management processes

Next Steps
We expect to complete rollout worldwide by the end of 2006
We will utilize this design in ATI datacenters as we work to integrate AMD and ATI
We will continue to track developments in the virtualization space and incorporate those into our standard design as appropriate

Conclusion
AMD on AMD has been very successful for us
Significant consolidation achieved
Relieved power and cooling stress in Austin
- Without this program, we would have had to do significant datacenter infrastructure upgrades
Achieved the expected benefits of virtualization
- Improved redundancy and DR flexibility
- Greatly reduced time to provision servers
- Improved efficiency of Operations staff
Financial ROI

© 2006 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD arrow logo, AMD Opteron, and combinations thereof, are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for identification purposes only and may be trademarks of their respective companies.

