Cloud Auto-Scaling With Deadline And Budget Constraints


Preliminary version. Final version appears in Proceedings of the 11th ACM/IEEE International Conference on Grid Computing (Grid 2010), Oct 25-28, 2010, Brussels, Belgium.

Cloud Auto-Scaling with Deadline and Budget Constraints

Ming Mao, Jie Li, Marty Humphrey
Department of Computer Science
University of Virginia
Charlottesville, VA, USA 22904
{ming, jl3yh, humphrey}@cs.virginia.edu

Abstract—Clouds have become an attractive computing platform which offers on-demand computing power and storage capacity. Their dynamic scalability enables users to quickly scale the underlying infrastructure up and down in response to business volume, performance desires and other dynamic behaviors. However, challenges arise when considering non-deterministic instance acquisition times, multiple VM instance types, unique cloud billing models and user budget constraints. Planning enough computing resources for the desired performance at lower cost, in a way that also adapts automatically to workload changes, is not a trivial problem. In this paper, we present a cloud auto-scaling mechanism that automatically scales computing instances based on workload information and performance desires. Our mechanism schedules VM instance startup and shutdown activities. It enables cloud applications to finish submitted jobs within the deadline by controlling the number of underlying instances, and reduces user cost by choosing appropriate instance types. We have implemented our mechanism on the Windows Azure platform and evaluated it using both simulations and a real scientific cloud application. Results show that our cloud auto-scaling mechanism can meet user-specified performance goals at lower cost.

Keywords—cloud computing; scalability; integer programming

I. INTRODUCTION

Clouds have become an attractive computing platform which offers on-demand computing power and storage capacity. Their dynamic scalability enables users to scale the underlying infrastructure up and down in response to business volume, performance desires and other dynamic behaviors.
To offload cloud administrators' burden and automate scaling activities, cloud computing platforms have also offered mechanisms to automatically scale VM capacity up and down based on user-defined policy, such as AWS auto-scaling [1]. Using auto-scaling, users define triggers by specifying performance metrics and thresholds. Whenever the observed performance metric is above or below the threshold, a predefined number of instances is added to or removed from the application. For example, a user can define a trigger like "Add 2 instances when CPU usage is above 60% for 5 minutes."

Such automation largely enhances the dynamic scalability benefits of the cloud. It transparently adds more resources to handle increasing workload and shuts down unnecessary machines to save cost. Users do not have to worry about capacity planning; the underlying resource capacity adapts to the application's real-time workload. However, challenges arise when one looks deeper into these mechanisms.

In cloud auto-scaling mechanisms, performance metrics normally include CPU utilization, disk operations, bandwidth usage, etc. Such infrastructure-level metrics are good indicators of system utilization, but they cannot clearly reflect the quality of service a cloud application is providing or tell whether performance meets the user's expectations. Choosing an appropriate performance metric and finding a precise threshold is not straightforward, and matters become more complicated if the workload pattern changes continuously. Moreover, considering utilization information alone may not be robust for scaling [9]. For example, a cluster going from 1 to 2 instances increases capacity by 100%, while going from 100 to 101 instances increases capacity by only 1%. Current simple auto-scaling mechanisms normally ignore such non-constant effects when adding a fixed number of resources.

Another factor such auto-scaling mechanisms overlook is the time lag to boot a VM instance.
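To make the non-constant effect concrete, the following sketch (not from the paper; the 60% threshold, 2-instance step and 10% factor are illustrative values) contrasts a fixed-step trigger with proportional thresholding in the spirit of [9]:

```python
def fixed_step_scale(instances, cpu_util, threshold=0.60, step=2):
    """AWS-style trigger: add a constant number of instances
    whenever the metric exceeds the threshold."""
    return instances + step if cpu_util > threshold else instances

def proportional_scale(instances, cpu_util, threshold=0.60, factor=0.10):
    """Proportional thresholding: the step grows with cluster size,
    keeping the relative capacity increase roughly constant."""
    if cpu_util > threshold:
        return instances + max(1, round(instances * factor))
    return instances

# A fixed step of 2 triples a 1-node cluster's capacity but barely moves
# a 100-node cluster (+2%); the proportional rule scales the step.
print(fixed_step_scale(1, 0.8), fixed_step_scale(100, 0.8))      # 3 102
print(proportional_scale(1, 0.8), proportional_scale(100, 0.8))  # 2 110
```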
Though instance acquisition requests can be made at any time, instances are not immediately available to users. The startup lag typically involves finding the right spot for the requested instances in the cloud data center, downloading the specified OS image, booting the virtual machine, finishing network setup, etc. Based on our experience and research [5], it can take as long as 10 minutes to start an instance in Windows Azure, and the startup lag can change over time. In other words, users are likely to request instances too late if they do not account for instance startup time.

Cost is also an issue worth careful consideration when using the cloud. Cloud computing instances are charged by the hour, and a fraction of an hour is counted as a whole hour. Therefore, it can be a waste of money to shut down a machine well before it completes a full hour of operation. In addition to the full-hour principle, clouds now usually offer various instance types, such as high-CPU and high-I/O instances. Choosing appropriate instance types based on the application workload can further save money and improve performance. We believe cloud scaling can be done better by considering different instance types rather than just manipulating instance numbers.

In this paper, we present a cloud dynamic scaling mechanism which automatically scales the underlying cloud infrastructure up and down to accommodate changing workload based on an application-level performance metric – the job deadline. During the scaling activities, the mechanism tries to form a cheap VM startup plan by choosing appropriate instance types, which can save cost compared to considering only one instance type.

The rest of this paper is organized as follows. Section II introduces related work. Section III identifies cloud scaling characteristics and describes the application performance model. Section IV formalizes the problem and details our implementation architecture on the Windows Azure platform. Section V evaluates our mechanism using both simulations and a real scientific application. Section VI concludes the paper and describes future work.

II. RELATED WORK

There have been a number of works on dynamic resource provisioning in virtualized computing environments [9][10][12][4]. Feedback control theory has been applied in these works to create autonomic resource management systems. In [9][10], a target range is proposed to solve the control stability issue. Further, [9] focuses on control system design. It points out that resizing instances is a coarse-grained actuator when applying control theory in a cloud environment and proposes proportional thresholding to fix the non-constant effect problem. These works use infrastructure-level performance metrics and mainly focus on applying control theory in the cloud environment. They do not consider various VM types or total running cost. In [8], dynamic scaling is explored for cloud web applications. The authors considered web-server-specific scaling indicators, such as the number of current users and the number of current connections. The work uses simple triggers and thresholds to determine instance numbers, and likewise does not consider VM type information or budget constraints. In [4], the authors considered extending computing capacity using cloud instances and compared the incurred cost of different policies. Particularly in cloud computing, dynamic scalability becomes more attractive and practical because of the unlimited resource pool.
Most cloud providers offer cloud management APIs that enable users to control their purchased computing infrastructure programmatically, but few directly offer a complete solution for automatic scaling in the cloud. Amazon Web Services auto-scaling is one of them. AWS auto-scaling automatically scales virtual machine instances up and down based on user-defined triggers [1]. Triggers describe the thresholds of an observed performance metric, which includes CPU utilization, network usage and disk operations. Whenever the monitored metric is above the upper limit, a predefined number of instances is started, and when it is below the lower limit, a predefined number of instances is shut down. Another work worth mentioning here is RightScale [3]. It works as a broker between users and cloud providers by providing unified interfaces: users can interact with multiple cloud providers on one screen. The nicely designed user interface, highly customized OS images and many predefined utility scripts enable users to deploy and manage their cloud applications quickly and conveniently. For dynamic scaling, RightScale borrows the idea of "triggers and thresholds" but broadly extends the choice of scaling indicators. In addition to system utilization metrics, it supports some popular middleware performance metrics, such as MySQL connections, Apache HTTP server requests and DNS queries. However, these scaling indicators may not be able to support all application types, and not all of them directly reflect quality-of-service requirements. Also, cost is not considered explicitly. To the best of our knowledge, our work is the first auto-scaling mechanism which addresses both performance and budget constraints in the cloud.

III. CLOUD SCALING

A. Cloud Scaling Characteristics and Analysis

As a computing platform, clouds have distinct characteristics compared to utility computing and grid computing.
We have identified the following characteristics which can largely affect the way people use cloud platforms, especially in cloud scaling activities.

Unlimited resources, limited budget. Clouds offer users unlimited computing power and storage capacity. Though by default the resource capacity is capped at some number, e.g., 20 computing units per account in Windows Azure, such a usage cap is not a hard constraint; cloud providers allow users to negotiate for more resources. Unlimited resources enable applications to scale to extremely large sizes. On the other hand, these resources are not free: every cycle used and byte transferred will appear on the bill. A budget cap is a necessary constraint for users to consider when they deploy applications in clouds. Therefore, a cloud auto-scaling mechanism should explicitly consider user budget constraints when acquiring resources.

Non-ignorable VM instance acquisition time. Though cloud instance acquisition requests can be made at any time and computing power can be scaled up to extremely large sizes, this does not mean the cloud scales fast. Based on our previous experience and research [5], it can take around 10 minutes or more from an instance acquisition request until the instance is ready to use. Moreover, the startup lag keeps changing over time. On the other side, VM shutdown time is quite stable, around 2-3 minutes in Windows Azure. This implies that users have to consider two issues in cloud dynamic scaling activities. First, count in the computing power of pending instances: an instance in pending status is going to be ready soon, and ignoring pending instances may result in booting more instances than necessary, wasting money. Second, track how long a pending instance has been in acquisition and how much longer it needs before it is ready to use. If the startup delay can be well observed and predicted, the application administrator can acquire machines in advance and prepare early for workload surges.

Full-hour billing model.
The pay-as-you-go billing model is attractive because it saves money when users shut down machines. However, VM instances are always billed by the hour, and fractional consumption of an instance-hour is counted as a full hour. In other words, 10-minute and 60-minute usage are both billed as 1 hour, and if an instance is started and shut down twice within an hour, the user is charged for two instance-hours. The shutdown time therefore can greatly affect cloud cost. If cloud auto-scaling mechanisms do not consider this factor, they can easily be tricked by fluctuating workloads. Therefore, a reasonable policy is that whenever an instance is started, it is better to shut it down only as it approaches a full hour of operation.

Multiple instance types. Instead of offering a single one-size-fits-all instance type, clouds now normally offer various instance types for users to choose from. Users can start different types of instances based on their applications and performance requirements. For example, EC2 instances are grouped into three families: standard, high-CPU and high-memory. Standard instances are suitable for general-purpose applications. High-CPU instances are well suited to computing-intensive applications, like image processing. High-memory instances are more suitable for I/O-intensive applications, like database systems and memory caching applications. Importantly, instance types are priced differently, and not necessarily in proportion to their computing power. For example, in EC2, c1.medium costs twice as much as m1.small but offers 5 times more compute power. Thus, for computing-heavy jobs it is cheaper to use c1.medium instead of the least expensive m1.small. Therefore, users need to choose instance types wisely: choosing cost-effective instance types can both improve performance and save cost.

B. Cloud Application Performance Model

In this paper, we consider the problem of automatically manipulating the running instance types and instance numbers. Instead of using infrastructure-level performance metrics, we target an application-level performance metric: the response time of a submitted job. We believe a direct performance metric better reflects users' performance requirements and can therefore better instruct cloud scaling mechanisms toward precise VM scheduling. At the same time, we introduce cost as the other goal in our cloud scaling mechanism.
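The two cost observations above, full-hour billing and cost per unit of work, can be checked in a few lines (a sketch; the 12 jobs/hour m1.small throughput is an assumed illustrative rate, not a figure from the paper):

```python
import math

def billed_hours(session_minutes):
    """Full-hour billing: any fraction of an instance-hour counts
    as a whole hour, per start/stop session."""
    return sum(math.ceil(m / 60) for m in session_minutes)

# 10-minute and 60-minute runs each bill as one hour, and starting and
# stopping twice within an hour bills two instance-hours:
assert billed_hours([10]) == billed_hours([60]) == 1
assert billed_hours([20, 25]) == 2

def cost_per_job(price_per_hour, jobs_per_hour):
    """Dollars per finished job decides the cost-effective type."""
    return price_per_hour / jobs_per_hour

# EC2 example from the text: c1.medium costs 2x m1.small but offers ~5x
# the compute, so for CPU-bound jobs the pricier type is cheaper per job.
assert cost_per_job(0.17, 5 * 12) < cost_per_job(0.085, 12)
```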
Our problem statement is: how can we enable cloud applications to finish all submitted jobs before a user-specified deadline with as little money as possible? To keep the cloud application performance model general and simple, we consider a single-queue model as shown in Fig. 1. We also make the following assumptions:

- The workload consists of non-dependent jobs submitted to the job queue. Users have no knowledge of the incoming workload in advance.
- Jobs are served in FCFS manner and are fairly distributed among the running instances. Each instance can process only a single job at a time.
- All jobs have the same performance goal, e.g., a 1-hour response time deadline (from submission to finish). The deadline can be changed dynamically.
- VM instance acquisition requests can be made at any time, but it may take a while for a newly requested pending instance to become ready to use. We call this time the VM startup delay.
- There can be different classes of jobs, such as computing-intensive jobs and I/O-intensive jobs. A job class may have different processing times on different instance types; for example, a computing-intensive job runs faster on high-CPU machines than on high-I/O machines.
- The job queue is large enough to hold all unprocessed jobs, and its performance scales well with an increasing number of instances.

Figure 1. Cloud application performance model

IV. SOLUTION & ARCHITECTURE

Based on the problem description in the previous section, we formalize the problem and present our implementation architecture in Windows Azure.

A. Solution

One of the key insights to this problem is that, to finish all submitted jobs before the deadline, the auto-scaling mechanism needs to ensure that the computing power of all acquired VM instances is large enough to handle the workload. We summarize the key variables in Table I.
TABLE I. KEY VARIABLES USED IN CLOUD PERFORMANCE MODEL

Variable    Meaning
J_j         the jth job class
n_j         the number of jobs of class J_j submitted in the queue
V           the VM type
I_i         the ith instance (running or pending)
c_v         the cost per hour of VM type V
d_v         the average startup delay of VM type V
s_i         the time already spent in pending status by I_i
t_{j,v}     the average processing time of job J_j running on V
D           the deadline (e.g. 1 hour or 100 seconds)
C           the budget constraint (dollars/hour)
W           the workload – jobs that need to be finished
P           the computing power – jobs that can be finished

Using the above notation, we define the system workload as a vector W. For each job class J_j, there are n_j submitted jobs:

  W = (J_j, n_j)

The computing power of instance I_i can be represented as a vector P_i. The idea is to calculate how many jobs of each job class can be finished before the deadline on instance I_i. We use the ratio of the deadline to the individual completion time (assuming all the jobs are finished by that instance) to approximate the number of jobs that can be finished:

  P_i = ( J_j , (D / (t_{j,type(I_i)} · n_j)) · n_j )

For an instance whose status is pending, the computing power can be represented as follows, where s_i is the time already spent starting the instance:

  P_i = ( J_j , ((D − (d_{type(I_i)} − s_i)) / (t_{j,type(I_i)} · n_j)) · n_j )

Therefore, the total computing power of the current instances can be represented as Σ_i P_i. Clearly, if W > Σ_i P_i, we need to start more instances P_i′ (′ marks new instances) to handle the increased workload. The problem becomes finding a VM instance combination plan P′ in which

  Σ_i P_i′ ≥ W − Σ_i P_i

At the same time, we also want to minimize the cost we spend on these newly added instances:

  Min ( Σ_i c_{type(I_i′)} )

In cases where the budget is insufficient, the idea is to generate as much computing power as possible within the budget constraint:

  Max ( Σ_i P_i′ )  subject to  Σ_i c_{type(I_i′)} ≤ C − Σ_i c_{type(I_i)}

When an instance I_s is approaching a full hour of operation, we need to decide whether to shut the machine down. We can calculate the computing power without instance I_s and compare it with the workload. If the remaining computing power is still large enough, we can remove the instance:

  Σ_i P_i − P_s ≥ W

To better explain the problem, consider a simple example. Assume three job classes (j_1, j_2, j_3) and three VM types (V_1, V_2, V_3). Currently the workload in the system is [60, 60, 60] and there are two running instances, I_1 and I_2. Our goal is to find a VM type combination [n_1′, n_2′, n_3′] whose computing power is greater than or equal to the target computing power and whose cost is minimal among all possible VM type combinations. From P′ ≥ W − I_1 − I_2:

  j_1:  x ≥ 60 − 10 − 10 = 40
  j_2:  y ≥ 60 − 5 − 20 = 35
  j_3:  z ≥ 60 − 20 − 5 = 35

Expressing P′ through the per-type contributions of V_1, V_2 and V_3:

  j_1:  10·n_1′ + 10·n_2′ + 10·n_3′ = x ≥ 40
  j_2:   5·n_1′ + 20·n_2′ + 10·n_3′ = y ≥ 35
  j_3:  20·n_1′ +  5·n_2′ + 10·n_3′ = z ≥ 35

  Min ( c_1·n_1′ + c_2·n_2′ + c_3·n_3′ )

where

  c_1·n_1′ + c_2·n_2′ + c_3·n_3′ + c_{type(I_1)} + c_{type(I_2)} ≤ C

From the above analysis, our cloud auto-scaling mechanism reduces to several integer programming problems.
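The formulas and the worked example can be sketched as follows (a Python stand-in, not the paper's Microsoft Solver Foundation code; the brute-force search only works because the example is tiny, and the hourly prices are borrowed from Table II for illustration):

```python
from itertools import product

D = 60.0  # deadline in minutes (illustrative value)

def power_running(t_jv):
    """Jobs of class j that a running instance can finish before the
    deadline: the deadline / per-job processing-time ratio."""
    return D / t_jv

def power_pending(t_jv, d_v, s_i):
    """Same for a pending instance: the remaining startup delay
    (d_v - s_i) is subtracted from the available time."""
    return (D - (d_v - s_i)) / t_jv

# A pending instance whose delay has fully elapsed equals a running one:
assert power_pending(5.0, 10.0, 10.0) == power_running(5.0)

# Worked example: deficit W - sum(P_i) = [40, 35, 35] jobs; per-instance
# rates per VM type are the columns of the example's coefficient matrix.
rates = ((10, 5, 20),   # V1's jobs per class (j1, j2, j3)
         (10, 20, 5),   # V2
         (10, 10, 10))  # V3
cost = (0.085, 0.17, 0.17)  # $/hour per type (Table II prices)
deficit = (40, 35, 35)

best = None
for n in product(range(9), repeat=3):  # brute-force the tiny integer program
    covers = all(sum(n[v] * rates[v][j] for v in range(3)) >= deficit[j]
                 for j in range(3))
    c = sum(n[v] * cost[v] for v in range(3))
    if covers and (best is None or c < best[0]):
        best = (c, n)
print(best)  # cheapest startup plan (n1', n2', n3') covering the deficit

def should_shut_down(p_s, total_p, w):
    """Full-hour check: release instance I_s only if the remaining
    capacity still covers the workload, i.e. sum_i P_i - P_s >= W."""
    return all(total_p[j] - p_s[j] >= w[j] for j in range(len(w)))
```

With these illustrative prices the search settles on (n_1′, n_2′, n_3′) = (3, 1, 0) at roughly $0.425/hour: three V_1 instances plus one V_2 meet the j_1 and j_2 deficits exactly.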
We try to minimize the cost or maximize the computing power under either computing power constraints or budget constraints. There are quite a few standard approaches to solving integer programming problems, such as cutting-plane and branch-and-bound methods [13][14]. We will not duplicate the details here.

In addition to determining the number and type of VM instances, there are other cases, such as admission control and deadline miss handling, which are also interesting to consider in cloud auto-scaling mechanisms. However, our work's intention is not to create a hard real-time cloud system in which all jobs' deadlines are guaranteed; we focus on automatic resource provisioning based on both performance goals and budget constraints. The deadline is simply the metric we choose, because it better reflects users' performance desires. Therefore, in real practice we believe these are more like policy questions, and users can choose their own policies based on their applications. For example, to maintain service availability and basic computing power, users can decide on a minimum number of running instances; in other words, even if there is no workload, a cloud application will always have at least 1 running instance. For admission control, when there is insufficient budget, the auto-scaling mechanism can either accept the job and run with the maximum computing power the user budget allows, or simply deny the job; in either case, users may want a notification from the mechanism. For deadline miss handling, users can either leave it alone or allow the auto-scaling mechanism to add as many instances as possible to speed up the remaining processing. In our implementation, we provide these policies, let users configure which policy is most appropriate for their cases, and allow users to implement their own policies as well.

B. Architecture

We have designed and implemented our cloud auto-scaling mechanism in Windows Azure [3].
Figure 2 shows the architecture of our implementation, which includes four components: the performance monitor, the history repository, the auto-scaling decider and the VM manager. The performance monitor observes the current workload in the system, collects actual job processing times and arrival pattern information, and updates the history repository. The VM manager works as the adapter between our auto-scaling mechanism and cloud providers. It monitors all pending and ready VM instances, and updates the history repository with the actual startup times of different VM types. Moreover, it executes the VM startup plan generated by the auto-scaling decider and directly invokes the cloud provider's resource provisioning APIs – in our case, the Windows Azure management API. Our intention is that the VM manager hides all cloud provider details and can easily be replaced with other cloud adapters. Such information hiding enhances the reusability and customizability of our implementation when working with different cloud providers.

The history repository contains two data structures. One is the configuration file, which includes the application deadline, budget constraint, monitor execution interval, etc. As shown in Fig. 2, application administrators can dynamically control the behavior of the cloud auto-scaling mechanism by changing the configuration file. The other data structure is the historical data table, which records the historical job processing times and arrival pattern information provided by the performance monitor, and the instance startup delay information provided by the VM manager. By maintaining historical data, the repository improves the precision of the input parameters and also helps the decider prepare early for possible workload surges.

The decider is the core of our cloud auto-scaling mechanism. Relying on real-time workload and VM status information from the performance monitor and VM manager, as well as configuration parameters and historical records from the history repository, it solves the integer programming problem formalized in the previous section and generates a VM startup plan for the VM manager to execute. The VM startup plan may be empty, because the workload may be well handled by existing instances, or it may contain instance type and number pairs telling the VM manager to acquire enough computing power. In our current implementation, we use Microsoft Solver Foundation [11] to solve the integer programming problem. Instance acquisition is initiated by the decider: after every sleep interval, it invokes the logic to determine the VM startup plan. On the other side, instance release is initiated by the VM manager, because it monitors which instances are approaching full-hour operation and are therefore potential shutdown targets; it then asks the decider whether the remaining computing power is large enough to handle the workload. We have published our current implementation as a library and plugged it into the MODIS application [7].
The evaluation of our mechanism in this real scientific application can be found in the next section.

Figure 2. Architecture of cloud auto-scaling in Azure

V. EVALUATION

In this section, we evaluate our mechanism using both simulations and a real scientific application (MODIS) running in Windows Azure. Through the simulation framework, we can easily control the input parameters, such as workload pattern and job processing time, which helps identify the key factors in our mechanism. Moreover, simulation extensively reduces evaluation time and cost. The scientific application tests our mechanism's performance in a real environment.

In our evaluation, we simulated three types of jobs: mix, computing intensive and I/O intensive. At the same time, we simulated three types of machines: General, High-CPU and High-I/O. We summarize the simulation parameters in Table II. The simulation data is derived from the pricing tables and instance descriptions of EC2. For example, in EC2, a c1.medium instance costs twice as much as m1.small but offers 5 times more compute power [1]. In our case, we assume mix jobs are half computation and half I/O. The speedup factor of the more powerful machines is 4-5.

TABLE II. AVERAGE PROCESSING TIME

Instance types: General ($0.085/hour, startup delay 600s), High-CPU ($0.17/hour, delay 720s), High-I/O ($0.17/hour, delay 720s). Each job class arrives at an average of 30 jobs/hour (STD 5 jobs/hour).

Average processing time (STD) per job class and instance type:
  Mix:                 General 300s (50s); High-CPU 210s (25s); High-I/O 210s (25s)
  Computing intensive: General 300s (50s); High-CPU 75s (15s);  High-I/O 300s (50s)
  I/O intensive:       General 300s (50s); High-CPU 300s (50s); High-I/O 75s (15s)

A. Deadline

For the deadline performance goal, we consider two cases.

1) Stable workload with changing deadline. We generate the workload using Table II and plot the job response times in Fig. 3.
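A workload generator for these simulations can draw processing times from the normal distributions in Table II, e.g. (a sketch; the paper does not give its generator, and the per-cell mean/STD assignment is our reading of the table):

```python
import random

# seconds: (mean, STD) of processing time per (job class, machine type),
# per our reading of Table II (the symmetric cells are an assumption)
PROC = {
    ("mix", "general"): (300, 50),
    ("mix", "high_cpu"): (210, 25),
    ("mix", "high_io"): (210, 25),
    ("compute", "general"): (300, 50),
    ("compute", "high_cpu"): (75, 15),
    ("compute", "high_io"): (300, 50),
    ("io", "general"): (300, 50),
    ("io", "high_cpu"): (300, 50),
    ("io", "high_io"): (75, 15),
}

def sample_time(job_class, machine, rng=random):
    """Draw one job's processing time; clamp at zero since a normal
    sample can (rarely) be negative."""
    mean, std = PROC[(job_class, machine)]
    return max(0.0, rng.gauss(mean, std))

random.seed(0)
t = sample_time("compute", "high_cpu")
# a CPU-bound job runs ~4x faster on High-CPU than on General
```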
Every data point in the graph reflects the job response times in a 5-minute interval; we record the average, minimum and maximum response time over all jobs finished in that interval. The deadline is first set to 3600s, then changed to 5400s, and finally switched back. The purpose is to evaluate our mechanism's reaction to dynamic changes in user performance requirements. Fig. 3 shows that more than 95% of jobs finish within the deadline, and most misses happen at the second deadline change. This is mainly because our auto-scaling mechanism runs every 5 minutes and VM instances only become ready 10-12 minutes after the acquisition requests. Besides, we also calculate the instantaneous instance utilization rate. Job processing counts as utilized, while all other states, such as pending and idling, count as unutilized. The high utilization rate (94% on average) shows that our mechanism does not aggressively acquire instances to guarantee the deadline; 6% of the time is spent on VM startups.

2) Changing workload with fixed deadline. In this test, we fix the deadline at 3600s and create three workload peaks. The base workload is 30 mix jobs per hour. The first workload peak adds another 300 mix jobs per hour, the second peak adds 300 computing intensive jobs per hour, and the third adds 300 I/O intensive jobs per hour. The purpose of this test is to evaluate our mechanism's reaction to suddenly increasing workload and job type changes. Such a workload pattern is commonly seen in large-volume data processing applications, in which data computation and analysis are performed in the daytime, and data backups and movements are performed at night and on holidays. From Fig. 4, we can see that the deadline goal is well met for all three workload peaks. When the workload goes back to normal, the instances over-acquired during peak moments quickly reduce job response time. As more and more unnecessary instances are shut down (as they approach full-hour operation), the response time returns to the average.

Figure 3. Stable workload with changing deadline

To evaluate the performance of our mechanism, in addition to the four choices, we also calculate the possible optimal cost for the same workload and compare our solution with it. The optimal solution can be obtained because we know the workload in advance and we assume we can always put a job on the most cost-effective machine, e.g., put computing intensive jobs on High-CPU instances. From Fig. 5, we can see that by considering all available instance types (Choice #4), our mechanism adapts to workload changes and chooses cost-effective instances. In this way, the real-time cost stays close to the optimal cost. On the other side, General instances always perform on average for all three workload peaks, while High-CPU and High-IO only save cost on their preferred workload surges. Fig. 6 shows the accumulated cost. Choice #4 incurs 14% more cost than the optimal solution, saves 20% compared to the General instance choice, and saves 45% compared to High-CPU and High-IO. Because of symmetry, High-CPU and High-IO instances end up with almost the same cost. General instances have a lower cost on average; therefore, in the long run, they outperform the High-CPU and High-IO cases.
By choosing appropriate instance types, Choice #4 incurs less cost in all three workload peaks, like the optimal solution; hence, it outperforms all the other cases. There are two reasons why our solution cannot make the optimal decision. First, the auto-scaling decider does not know the future workload and can only make decisions locally. Second, it cannot control which running instance processes a given job.

Figure 4. Changing workload with fixed deadline

B. Cost

Using the same evaluation as for the changing workload with fixed deadline, we compare the cost of using different types of VM instances. The VM type combinations are illustrated in Table III. Fig. 5 shows the comparison result.

Figure 5. Instantaneous cost of changing workload & fixed deadline

Figure 6. Accumulated cost

TABLE III. INSTANCE TYPE COMBINATIONS (Choice #1, Choice #2, Choice #3, Choice #4)
