Concepts Introduced In Chapter 6 Warehouse-Scale Computers


Concepts Introduced in Chapter 6

- introduction to warehouse-scale computing
- programming models
- infrastructure and costs
- cloud computing

Warehouse-Scale Computers

- A cluster is a collection of desktop computers or servers connected together by a local area network (LAN) to act as a single larger computer.
- A warehouse-scale computer (WSC) is a cluster composed of tens of thousands of servers.
- The cost may be on the order of $150M for the building, the electrical and cooling infrastructure, the servers, and the networking equipment of a facility that houses 50,000 to 100,000 servers.
- A WSC can be used to provide Internet services.
  - search - Google
  - social networking - Facebook
  - video sharing - YouTube
  - online sales - Amazon
  - cloud computing services - Rackspace
  - and many more applications

Important Design Factors for WSCs

- WSC goals and requirements in common with servers.
  - cost-performance - work done per dollar
  - energy efficiency - work done per joule
  - dependability via redundancy
  - network I/O
  - interactive and batch processing workloads
- WSC aspects that are distinct from servers.
  - Ample parallelism is always available in a WSC.
  - Operational costs represent a greater fraction of the cost of a WSC.
  - Customization is easier at the scale of a WSC.

List of Outages and Anomalies for a Cluster of 2400 Servers

Figure 6.1 List of outages and anomalies with the approximate frequencies of occurrences in the first [...]

Programming Models for WSCs

- MapReduce (or the open source Hadoop) is the most popular framework for batch processing in a WSC.
- Map applies a programmer-supplied function to each logical input record to produce a set of key-value pairs.
- Reduce collapses these values using another programmer-supplied function.
- Both tasks are highly parallel.

Monthly MapReduce Usage at Google

Figure 6.2 Monthly MapReduce usage at Google from 2004 to 2016. Over 12 years the number of MapReduce jobs increased by a factor of 3300. Figure 6.17 on page 461 estimates that running the September 2016 workload on Amazon's cloud computing service EC2 would cost $114 million. Updated from Dean, J., 2009. Designs, lessons, and advice from building large distributed systems [keynote address]. In: Proceedings of 3rd ACM SIGOPS International Workshop on Large-Scale Distributed Systems and Middleware, Co-located with the 22nd ACM Symposium on Operating Systems Principles, October 11–14, 2009, Big Sky, Mont.

© 2019 Elsevier Inc. All rights reserved.

Programming Models for WSCs (cont.)

- There is often high variability in performance between the different WSC servers, for a variety of reasons.
  - varying load on servers
  - a file may or may not be in a file cache
  - distance over the network can vary
  - hardware anomalies
- A WSC will start backup executions on other nodes when tasks have not yet completed and take the result that finishes first.
- A WSC relies on data (file) replication to help with read performance and availability.
- A WSC also has to cope with variability in load.
  - servers
  - entire WSC
- Often WSC services are performed with in-house software to reduce costs and optimize for performance.

Average CPU Utilization of 5000 Servers at Google
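The Map and Reduce steps described under Programming Models for WSCs can be illustrated with a single-process word count. This is only a minimal sketch of the programming model, not Google's MapReduce or Hadoop: the names map_fn, reduce_fn, and map_reduce are hypothetical, and a real framework would run the map and reduce calls in parallel across many servers.

```python
from collections import defaultdict


def map_fn(record):
    """Map: turn one logical input record (a line of text) into key-value pairs."""
    return [(word.lower(), 1) for word in record.split()]


def reduce_fn(key, values):
    """Reduce: collapse all values that share a key into one result."""
    return key, sum(values)


def map_reduce(records, mapper, reducer):
    # Group every emitted value under its key (the "shuffle" step).
    groups = defaultdict(list)
    for record in records:  # each mapper call is independent of the others
        for key, value in mapper(record):
            groups[key].append(value)
    # Each key is reduced independently, so this loop could also run in parallel.
    return dict(reducer(key, values) for key, values in groups.items())


counts = map_reduce(["the rack switch", "the array switch"], map_fn, reduce_fn)
print(counts)
```

Because no map call depends on another record and no reduce call depends on another key, both phases can be spread over as many servers as are available, which is what the slide means by "highly parallel."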

Storage for a WSC

- A WSC uses local disks inside the servers as opposed to network attached storage (NAS).
- The Google file system (GFS) uses local disks and maintains at least three replicas to improve dependability. Placing the replicas on different clusters covers not only disk failures, but also power failures to a rack or a cluster of racks.
- A read is serviced by one of the three replicas, but a write has to go to all three replicas.
- Google uses a relaxed consistency model in that all three replicas have to eventually match, but not all at the same time.

WSC Networking

- A WSC uses a hierarchy of networks for interconnection.
- The standard rack holds 48 servers connected by a 48-port Ethernet switch.
- A rack switch has 2 to 8 uplinks to a higher switch. So the bandwidth leaving the rack is 6 (48/8) to 24 (48/2) times less than the bandwidth within a rack.
- There are array switches that are more expensive to allow higher connectivity.
- There may also be Layer 3 routers to connect the arrays together and to the Internet.
- The goal of the software is to maximize locality of communication relative to the rack.

Hierarchy of Switches in a WSC

Layer 3 Network Used to Link Arrays Together

Figure 6.8 A Layer 3 network used to link arrays together and to the Internet (Greenberg et al., 2009). A load balancer monitors how busy a set of servers is and directs traffic to the less loaded ones to try to keep the servers [...]
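The rack uplink arithmetic from the WSC Networking slide (48/8 = 6 and 48/2 = 24) can be checked with a few lines of code. This is a sketch under one stated assumption: every server port and every uplink runs at the same link speed. The function name rack_oversubscription is hypothetical.

```python
def rack_oversubscription(servers_per_rack, uplinks):
    """Ratio of bandwidth inside the rack to bandwidth leaving the rack.

    Assumes every server port and every uplink has the same link speed,
    so the ratio reduces to a simple port count ratio.
    """
    if uplinks <= 0:
        raise ValueError("a rack switch needs at least one uplink")
    return servers_per_rack / uplinks


best_case = rack_oversubscription(48, 8)   # 8 uplinks
worst_case = rack_oversubscription(48, 2)  # 2 uplinks
print(best_case, worst_case)
```

The larger this ratio, the more it pays for software to keep communicating tasks inside the same rack.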

Latency, Bandwidth, Capacity of a WSC Memory Hierarchy

Latency, bandwidth, and capacity of the memory hierarchy of a WSC (Barroso et al., 2013). [...] this same information.

WSC Location

- proximity to Internet backbone optical fibers
- proximity to users of the service to reduce Internet access latency
- electricity availability and cost
- property tax rate
- low risk from environmental disasters
- stability of country
- low temperature to decrease cooling cost

WSC Power and Cooling

- power usage of just the WSC IT equipment
  - 33% for processors
  - 30% for DRAM
  - 10% for disks
  - 5% for networking
  - 22% for other components within the servers
- Air conditioning is used to cool the server room, requiring 10% to 20% of IT equipment power, mostly due to fans.
- Chilled water is often used to cool the air, requiring 30% to 50% of IT equipment power. Outside cooling towers can leverage lower outside temperature.

Measuring WSC Efficiency

power

- Power utilization effectiveness (PUE) is a widely used simple metric.
- PUE = (total facility power) / (IT equipment power)
- Median PUE reported in a 2006 study was 1.69.

performance

- Bandwidth is an important metric as there may be many simultaneous user requests or metadata generation batch jobs.
- Latency is also an important metric as it is seen by users when they make requests. Users will use a search engine less as the response time increases. Users are also more productive in responding to interactive information when the system response time is faster, as they are less distracted.
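The PUE definition can be applied to the cooling percentages quoted on the WSC Power and Cooling slide. The sketch below assumes that the air conditioning and chilled water overheads simply add and that there is no other overhead (such as power distribution losses), so it gives only a rough range. The 1000 kW IT load and the function name pue are hypothetical.

```python
def pue(it_power_kw, overhead_power_kw):
    """PUE = total facility power / IT equipment power."""
    if it_power_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return (it_power_kw + overhead_power_kw) / it_power_kw


it_kw = 1000  # hypothetical IT equipment load

# Low end: air conditioning at 10% plus chilled water at 30% of IT power.
pue_low = pue(it_kw, 100 + 300)
# High end: air conditioning at 20% plus chilled water at 50% of IT power.
pue_high = pue(it_kw, 200 + 500)
print(pue_low, pue_high)
```

A PUE of 1.0 would mean that every watt entering the facility reaches the IT equipment, so lower values are better.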

Power Utilization Efficiency of 19 Datacenters in 2006

Figure 6.10 Power utilization efficiency of 19 data centers in 2006 (Greenberg et al., 2009). The power for air conditioning (AC) and other uses (such as power distribution) is normalized to the power for the IT equipment in calculating the PUE. Thus, power for IT equipment must be 1.0, and AC varies from about 0.30 to 1.40 times the power of the IT equipment. Power for “other” varies from about 0.05 to 0.60 of the IT equipment.

Average PUE of 15 Google WSCs over Time

Figure 6.11 Average power utilization efficiency (PUE) of the 15 Google WSCs between 2008 and 2017. The spiking line is the quarterly average PUE, and the straighter line is the trailing 12-month average PUE. For Q4 [...] the averages were 1.11 and 1.12, respectively.

Negative User Impact of Delays at Bing Search Server

- As the response time (server delay) increases, the time to the next click increases even more, since users will get distracted.
- Revenues will decrease as users become less satisfied and use the search engine less.

Google WSC Innovations to Improve Energy Efficiency

- Modified server containers.
  - Separated hot and cold chambers to reduce variation in air temperature, which allows air to be delivered at higher temperatures due to less severe worst-case hot spots.
  - Operating servers at higher temperatures allowed use of cooling towers instead of the less efficient traditional chillers.
  - Shrunk the distance of the air circulation loop to reduce the energy required to move air.
- Located WSCs in more temperate climates to allow more use of evaporative cooling.
- Deployed extensive monitoring to measure actual PUE.
- Designed motherboards that only need a single 12-volt supply so that a UPS could be provided using standard batteries with each server.

Cost of a WSC

- capital expenditures (CAPEX)
  - CAPEX is the cost to build a WSC, which includes the building, power and cooling infrastructure, and initial IT equipment (servers and networking equipment).
- operational expenditures (OPEX)
  - OPEX is the cost to operate a WSC, which includes buying replacement equipment, electricity, and salaries.

Case Study for a WSC

Figure 6.13 Case study for a WSC, rounded to nearest $5000. Internet bandwidth costs vary by application, so they are not included here. The remaining 18% of the CAPEX for the facility includes buying the property and the cost of construction of the building. We added people costs for security and facilities management in Figure 6.14, which were not part of the case study. Note that Hamilton's estimates were done before he joined Amazon, and they are not based on the WSC of a particular company. Based on Hamilton, J., 2010. Cloud computing economies of scale. In: Paper Presented at the AWS Workshop on Genomics and Cloud Computing, June 8, 2010, Seattle, WA. [...]on GenomicsCloud20100608.pdf.

Monthly OPEX for Previous Case Study

Figure 6.14 Monthly OPEX for Figure 6.13, rounded to the nearest $5000. Note that the 3-year amortization of servers means purchasing new servers every 3 years, whereas the facility is amortized for 10 years. Thus, the amortized capital costs for servers are about three times more than for the facility. People costs include three security guard positions continuously for 24 h a day, 365 days a year, at $20 per hour per person, and one facilities person for 24 h a day, 365 days a year, at $30 per hour. Benefits are 30% of salaries. This calculation does not include the cost of network bandwidth to the Internet because it varies by application, nor vendor maintenance fees because they vary by equipment and by negotiations.

Advent of Cloud Computing

- Cloud computing can be thought of as providing computing as a utility, where a customer pays only for what they use, just as we do for electricity.
- Cloud computing relies on increasingly larger WSCs, which provide several benefits if properly set up and operated.
  - improvements in operational techniques
  - economies of scale
  - reduced customer risks of over-provisioning or under-provisioning
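The people-cost portion of the monthly OPEX can be reproduced from the staffing figures in the Figure 6.14 caption. This is a sketch of the arithmetic only: it treats the facilities position as staffed around the clock like the security positions, and the function name monthly_people_cost is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # positions are staffed continuously


def monthly_people_cost(positions, benefits_rate=0.30):
    """positions is a list of (head_count, hourly_rate_in_dollars) pairs."""
    yearly_salaries = sum(
        count * hourly_rate * HOURS_PER_YEAR for count, hourly_rate in positions
    )
    # Benefits are charged as a fraction of salaries, then spread over 12 months.
    return yearly_salaries * (1 + benefits_rate) / 12


# Three security positions at $20 per hour, one facilities position at $30 per hour.
cost = monthly_people_cost([(3, 20), (1, 30)])
rounded = round(cost / 5000) * 5000  # same rounding rule as the figure
print(cost, rounded)
```

Salaries come to $788,400 per year; adding 30% benefits and dividing by 12 gives about $85,410 per month, or $85,000 when rounded to the nearest $5000.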

Improvements in Operational Techniques

- WSCs have led to innovations in system software to provide high reliability.
  - failover - Automatically restarting an application that fails without requiring administrative intervention.
  - firewall - Examines each network packet to determine whether or not it should be forwarded to its destination.
  - virtual machine - A software layer that executes applications like a physical machine.
  - Protection against denial-of-service attacks.

Economies of Scale

- WSCs offer economies of scale that cannot be achieved with a data center.
- WSC economies of scale benefits
  - 5.7 times reduction in storage costs
  - 7.1 times reduction in administrative costs
  - 7.3 times reduction in networking costs
- WSC economies of scale advantages
  - volume discount price reductions
  - PUE of perhaps 1.2 versus PUE of 2.0 for a data center
  - better utilization of the WSC by making it available to the public

Reducing Customer Risks

- WSCs reduce risks of over-provisioning or under-provisioning, particularly for start-up companies.
- Providing too much equipment means overspending.
- Providing too little equipment means demand may not be met, which can give a bad impression to potential new customers.

Amazon Web Services

- Amazon offered Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2) in 2006.
- Relied on virtual machines.
  - Provides better protection for users.
  - Simplified software distribution within a WSC.
  - The ability to reliably kill a virtual machine made it easier to control resource usage.
  - Being able to limit use of resources simplified providing multiple price points for customers.
  - Improved flexibility in server configuration.
- Relied on open source software.
- Provided service at very low cost.
- No contract required.
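The PUE advantage listed under Economies of Scale (perhaps 1.2 for a WSC versus 2.0 for a data center) translates directly into facility power for the same IT load. The sketch below assumes a hypothetical 1000 kW IT load and uses only the relation total facility power = PUE × IT equipment power; the function name facility_power_kw is also hypothetical.

```python
def facility_power_kw(it_power_kw, pue):
    """Total facility power implied by a PUE value for a given IT load."""
    return it_power_kw * pue


it_load_kw = 1000  # hypothetical IT equipment load, identical in both cases

datacenter_kw = facility_power_kw(it_load_kw, 2.0)
wsc_kw = facility_power_kw(it_load_kw, 1.2)

# Fraction of total facility power saved by the lower PUE.
savings_fraction = 1 - wsc_kw / datacenter_kw
print(datacenter_kw, wsc_kw, savings_fraction)
```

Under these assumptions the WSC draws about 40% less total power for the same computing load, and its non-IT overhead shrinks from 1.0 to 0.2 times the IT power.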

Fallacies and Pitfalls

- Fallacy: Capital costs of a WSC facility are higher than those of the servers that it houses.
- Pitfall: Trying to save power with inactive low power modes versus active low power modes.
- Pitfall: Using too wimpy a processor when trying to improve WSC cost-performance.
- Fallacy: Replacing all disks with Flash memory will improve cost-performance of a WSC.
