Maximizing Profit In Cloud Computing System Via Resource Allocation

Transcription

Maximizing Profit in Cloud Computing System via Resource AllocationHadi Goudarzi and Massoud PedramUniversity of Southern California, Los Angeles, CA 90089{hgoudarz,pedram}@usc.eduAbstract—With increasing demand for high performancecomputing and data storage, distributed computing systemshave attracted a lot of attention. Resource allocation is one ofthe most important challenges in the distributed systemsspecially when the clients have some Service LevelAgreements (SLAs) and the total profit in the system dependson how the system can meet these SLAs. In this paper, anSLA-based resource allocation problem for cloud computingis considered and a distributed solution to this problem ispresented. The processing, data storage, and communicationresources are considered as three dimensions in whichoptimizations are performed. Simulation results demonstratethat the proposed heuristic algorithm is robust (produces highquality solutions independent of the initial solution provided)and produces solutions very close to the “optimum” (bestsolution found by Monte Carlo simulation).1I.INTRODUCTIONDemand for computing power has been increasing due to thepenetration of information technologies in our daily interactionswith the world both at personal and public levels, ing,andcommunication services. At personal level, the wide scalepresence of online banking, e-commerce, SaaS (Software as aService), social networking and so on produce workloads of greatdiversity and enormous scale. At the same time computing andinformation processing requirements of various publicorganizations and private corporations have also been increasingrapidly. Examples include digital services and functions requiredby the various industrial sectors, ranging from manufacturing tohousing, from transportation to banking. Such a dramatic increasein the computing demand requires a scalable and dependable ITinfrastructure comprising of servers, storage, network bandwidth,physical infrastructure, Electrical Grid, IT personnel and billionsof dollars in capital expenditure and operational cost to name afew.The IT infrastructure provided by the datacenterowners/operators must meet various service level agreements(SLAs) established with the clients. The SLAs include computepower, storage space, network bandwidth, availability andsecurity, etc. Infrastructure providers often end up overprovisioning their resources in order to meet the clients’ SLAs.Such over provisioning may increase the cost incurred on thedatacenters in terms of both the electrical energy cost and thecarbon emission. Therefore optimal provisioning of the resourcesis imperative in order to reduce the cost incurred on the datacenteroperators as well as the environmental impact. The problem ofoptimal resource provisioning is challenging due to the diversitypresent in the client applications being hosted and the SLAs. Forexample: some client applications may be compute-intensivewhile others may be memory intensive, some applications mayrun well together while others do not, etc.The IT infrastructure provided by the large datacenterowners/operators is often geographically distributed. This helpswith reducing the peak power demand of the datacenters on thelocal power grid, allow for more fault tolerance and reliable1This work is sponsored in part by grant from NationalScience Foundation (NSF).operation of the IT infrastructure, and even, reduced cost ofownership. A datacenter however comprises of thousands to tensof thousands of server machines, working in tandem to provideservices to the clients, see for example [1] and [2]. In such a largecomputing system, energy efficiency can be maximized throughsystem-wide resource allocation and server consolidation, this isin spite of non energy-proportional characteristics of currentserver machines [3].Resource management in distributed systems is one of themost important challenges. Due to more requests for resource inthese systems, input size of this problems are much larger thanordinary resource allocation problems in centralized computingsystems. There is number of papers discussing the resourceallocation in grid computing systems e.g., [4][5]. Due todifference between cloud computing system and grid computingsystem, it is necessary to address the resource allocation problemin cloud computing systems separately.In our paper a cloud computing system is composed of someclusters that have different number and possibly different types ofservers. The total profit in this system is the total price gainedfrom the clients subtracted by the cost of operating the activeservers in the system. Clients in this system are applicationsoftware that require processing, data storage and communicationresources in this “on-demand capacity provisioning” or “leasemodel of the IT infrastructure” [6]. Clusters and servers aremodeled based on these three capabilities: computational, storage,and networking bandwidth. Cost of operation of active servers isrelated to the degree of their use of these resources [7].We focus on the SLA based resource allocation in the cloudcomputing system. Each client in this system is related to a classof clients that have a pre-defined utility function based on theirresponse time requirements. To manage the resource allocationproblem between two decision times, clients are assigned to onlyone cluster in each decision epoch. This helps compensate fordynamic changes in the system that do not need another decisionat the cloud level and can be handled at a cluster or server level.Although a central resource manager is responsible for theresource management in these cloud computing systems, the localagents are used to parallelize the solution and decrease thedecision time. This helps with the scalability of the proposedsolution.The rest of this paper is organized as follows. Related worksare presented in the next section. In section III, the system modeland problem formulation are presented. The optimization problemand the optimization solution presented for this problem arepresented in section IV and V, respectively. The simulation resultsare presented in the section VI and the conclusions are on the lastsection.II. RELATED WORKThe distributed resource allocation problem is one of the mostchallenging problems in the resource management problems. TheSLA based distributed resource allocation has attracted attentionof the research community in the last years.Our paper considers the resource management problem in acloud computing system. Key features of our formulation andsubsequent proposed solution are that we: Consider heterogeneous clusters in number of servers andtype of servers deployed in each cluster

Use a three dimensional model of the resources in theclusters, i.e., computational, storage and networkingcapabilities, and Perform distributed decision making to reduce the decisiontime by parallelizing the solutionNo previous work considers all these aspects together whenaddressing the cloud level resource management problem. In thefollowing we provide a review of most relevant prior work.Srikantaiah et al. [7] presented energy aware consolidation todecrease the total energy consumption of the cloud computingsystem. The authors presented an experimental method to modelthe energy consumption of the servers based on the CPU and diskutilization. Based on this method, a simple heuristic to consolidatethe processing works in the cloud computing system is presented.In [9], the authors extend the work in [8] and present aproblem statement with clients that have discrete utility functions.The authors proposed a heuristic to solve the problem of assigningdifferent client classes to different servers to maximize the totalprofit. Ardagna et al. [10] extend their work to multi-tier profitoptimization for continuous utility functions and proposed aniterative approach to reach the solution.A mathematical formulation for the resource allocationproblem in clusters is presented in [11]. The authors describe amethod to find the best resource assignment in a cluster in the casethat the application has certain resource requirements. Chandra etal. [12] introduce a dynamic resource allocation method in sharedclusters to minimize the overall penalty resulting from notsatisfying the SLA requirements in the response time. For thisoptimization, online measurements of the most importantparameters in the system are used to predict the next system stateand to allocate resources on that basis. An economic approach tomanage shared resources and minimize the energy consumption inhosting centers is described in [13]. The authors present a solutionthat dynamically resize the active servers and respond to thethermal or power supply events by down grading the service basedon the SLA.The problem of multi class queues in clusters is addressed in[14] and the authors used the proposed model to solve the profitoptimization problem in the clusters.III. PROBLEM FORMULATION AND SYSTEM MODELIn this paper, a cloud computing facility (datacenter) which iscomposed of a number of clusters is considered. This cloudcomputing environment has a central manager that has someinformation about all clusters as well as the clients. Clusters arecharacterized by the number and type of computing, data storage,and communication resources that they control. All of theseresources are assumed to be allocated within each server. Eachcluster comprises of potentially heterogeneous servers chosenfrom a set of known server classes. Each server class is modeledby its processing capacity ( , normalized by a defined unitcapacity), local data storage capacity ( ) and communicationcapacity ( ) as well as its operation cost which is related to theenergy consumption for each server. The operation cost of a) plus a costserver is modeled as a constant operation cost () linearly related to the utilization of the server in the(processing domain. Figure 1 shows the structure of the targetcloud computing system with 5 clusters and a central managementnode.Each client is identified by a unique id, represented by index i. Each server inThe client class of the ith client is shown bythe datacenter is similarly identified by a unique id, captured byindex j. Servers are grouped into a set of clusters, indexed by k.Set of servers in each cluster is shown by.Figure 1. Structure of the cloud computing system in this problem.In this system, all requests of each client are assigned to a) is used tosingle cluster. A pseudo-Boolean integer (determine if the ith client is assigned to the kth cluster (1) or not(0). The presented heuristic in this paper is applicable with a littlemodification to the cases that this assumption is not made but thisrestriction is helpful in doing dynamic resource allocation bycluster-level resource managers (in case of a large and suddenchange in the service generation characteristics of a client and tosatisfy the requested SLA condition at all times.) In each cluster,service requests of the assigned clients are dispersed to servers(requests generated by a single client can be assigned to more thandenotes theone server). To capture this effect, parameterportion of the ith client’s requests assigned to the jth server. Figure2 shows the cluster dispatcher architecture.Figure 2. An example of the architecture of the cluster dispatcher.To model the multi class queues in the system, GeneralizedProcessor Sharing (GPS) is used [15]. It has been shown that theGPS can be implemented by weighted fair queuing if the servicetimes for packets are not too large. To be able to find theanalytical form of the response time, requests for each client areassumed to follow a Poisson distribution with mean of(predicted based on the behavior of the client.) It can be shown[18] from the properties of the Poisson distribution that if theserequests are distributed by a system that has one input and someoutputs and the input is connected to each output with probabilityof , the requests at each output have a Poisson distribution withmean of.Except the disk resource that is allocated based on the constant), it is assumed that all other kinds ofneed of the clients (resources in the servers and clusters are allocated using the GPS.,anddenote Portion of the processing, communicationand data storage resources of the jth server which is allocated tothe ith client.All multi class single server queues can be replaced by singleclass single server queues using GPS. To model the response timeof the requests in the system, by using the known formula for theaverage service time of the requests in M/M/1 queues, the averageresponse time of the clients’ requests in each resource can becomputed. To model the total response time in the system, weassume that the service times for different resources areindependent and additive. Pipelining in the system shows that thisassumption is reasonable because if the concatenated service timeand one queue per client were used, the pipelining between

processing and communication was ignored. By this assumptionabout the queues, communication resource queue has the inputfrom the processing resource queue. So, the overall averageresponse time for a client can be presented as (1)anddenote the Average processing andwherecommunication execution time of the ith client’s requests on jthserver with a single unit of resource.The terms of the summation capture the average service timeof the ith client in the jth server in processing and communicationqueues. To simplify the equations in rest of the paper, we useandto denote these terms.Although the agreed request arrival rates are used to determinethe profit, predicted average request arrival rates are used toallocate resources to clients. This can help us to use resourcesmore efficiently in cases that we know that the actual requestarrival rates are smaller than agreed request arrival rates.The goal of the resource management problem is to maximizethe total profit from serving clients. In this system, decisionmaking interval (called decision epoch from here on) can bedefined based on the behavior of the dynamic parameters in thesystem. For example, the frequency that request arrival ratechanges for different clients affects the acceptable decision time.This is because the solution found by the presented algorithm isacceptable only as long as the parameters used to find the solutionare approximately valid. Although some small changes in theparameters can be effectively tracked and responded to by properreaction of request dispatchers in the clusters, large changescannot be handled by the local managers. In the remainder of thispaper, the resource allocation problem for each decision epoch ispresented and a solution is presented but we do not discuss theestimation, prediction and dynamic changes in the system becausethese issues are outside the scope of the present paper.IV. OPTIMIZATION PROBLEMThe total profit maximization problem is formulated below.(2)subject to: ,,(3)(4)1,1,,(5)1,(6)1,(7) ,,,(8) ,, (9)0,1 ,,,(10)0,1 , 1,(11)0,1 ,,(12)0,0,0,01,,where denotes a pseudo-Boolean integer to determine if the jthserver is ON (1) or OFF (0) and is a small positive value.Moreover,denotes the non-increasing utility function ofthe class clientwith response time equal to and is the agreed arrival rate of the ith client in its contract with the cloudservice provider.In this problem,‘s,‘s, ‘s and‘s are the variableswhereas the other parameters are constant or functions of thesevariables. In the objective function, the first term is thesummation of the service prices which are the function of theresponse time and the second term is the summation of theoperation cost of the active servers in the clusters. In theconstraints, (3) determines the active servers based on theallocated resources. Constraint (4) and (5) are used to limit thesummation of the processing, communication and data storageresource utilities in each server in the clusters. Constraint (6)ensures that all requests generated by a client are served in oneserver cluster. Constraint (7) shows the lower limit on the resourceshares in each server. Constraint (8) shows the disk requirement ofa client assigned to a server. Assignment of a client to a server iswhich is equal to zero ifis zero anddetermined withotherwise is equal to one based on the constraint (9). Constraints(10) to (12) specify the variable domains.Problem (2) may be decomposed to the assignment of clientsto clusters and allocation of resources to the assigned clients ineach cluster. For the first problem, the initial client set ( ) shouldbe partitioned into smaller sets ( ), where,:(13)For the second problem, resources in the clusters should beallocated to the assigned set of clients to maximize the earnedprofit. This problem can be presented as follows.(14)subject to the constraints of the problem presented in (2) for eachcluster. This problem is a mixed integer non-linear program. Ascan be seen in the problem formulation, the number of constraintsthat should be considered to find the optimal solution is , in which is the number of clients in theand is the number of servers in the kth cluster.It can be shown that even if the number and types of the activeservers are determined in the problem and the discrete utilityfunctions are estimated with continuous decreasing utility functionbased on response time, the objective function is neither convexnor concave (the Hessian matrix is not positive or negativedefinite) and so the problem cannot be solved with the convexoptimization methods.V. OPTIMIZATION METHODThe problem presented in the previous section is a hardproblem at both cloud and cluster levels. The simple problemsolvers cannot solve this problem except in the case of very smallinput size by running exhaustive search or by using stochasticoptimization methods such as the Simulated Annealing or GeneticSearch. In this section, a heuristic solution is presented for thisproblem.The presented heuristic is focused on distributed decisionmaking instead of centralized management. This is becausecomplexity of the solution in case of centralized management ishigh whereas distributed decision making can handle the problemin parallel and reduce the time needed to reach a good solution bya big factor with limited amount of communication. Figure 3shows the pseudo code of the overall heuristic.A good initial solution is generated in the first step of thisheuristic. To find an initial solution, clients are processed

sequentially and assigned to the best cluster on that time. Thisgreedy algorithm is repeated for a number of times to generatedifferent initial solutions and the best initial solution is selected.To generate the final solution, a local search is used toimprove the quality of the initial solution. It is important to notethat this local search is not only used to change client assignmentto decrease the resource saturation in some of clusters but also tocombine the clients to decrease the number of active servers in theclusters. For example, if a server in a cluster serves a client andunassigned capacities in other servers is enough to serve thatclient with the same price, this local search will transfer the clientto the other servers so as to decrease the cost of operation.Algorithm Resource Alloc ()// Find an initial solutionFor iter 1 to num init solns {For k 1 to num clusters {curr statek state of the cluster at end of prev. epoch;}Randomize client processing order;For i 1 to num clients {For k 1 to num clusters {( ,) Assign Distribute (i,k); };kopt,i values;Assign client i to cluster kopt,i based on itsUpdate current state of cluster indexed by kopt,i ; }Find total profit;}Select the best initial solution;,, values// Improve the solution by optimizing ,While (Steady){For i 1 to num servers {(,) Adjust ResourceShares(;}For i 1 to num clients { Adjust DispersionRates (,);}For i 1 to num clusters {TurnON servers( ,,);TurnOFF servers ( ,,) ;}Find total profit; }Figure 3. Pseudo code for the overall resource allocation heuristic.In this section, first a method to find a good starting point forthe system is presented. Next, an iterative approach to improve thequality of solution with assigned clients is presented.A. Finding an Initial solutionTo start assigning clients to the clusters, each cluster isassumed to have an initial state. This initial state can be a result ofthe resources allocated to the previously assigned and runningclients on the servers or other applications that are not related tothe cloud computing system and are running on the cluster. Thisinitial state can be specified in terms of the used (alreadyallocated) capacity of the processing, data storage andcommunication resources in the clusters and different serverswithin the clusters.To generate an initial solution, a greedy approach tosequentially select a cluster and add a client to the selected clusteris used. For each client, the best possible cluster to execute theapplication is found by finding the highest approximated profit forthe client on each cluster with respect to the current state of theclusters. This process continues until all clients are assigned to theclusters. An approximated version of the profit is used to captureincompleteness of information in the system with respect to theassigned clients to the servers and clusters.Assignment of each client to a server is defined by finding the,and. To find these values, the utilityamount offunction is used by a linear form of.In each step, a client (say, the ith client) is assigned to one ofthe clusters. To find the best possible profit for the client in eachcluster (say the kth cluster), the following optimization problem issolved.(15)101,1, ,11,,, and denote thein addition to constraint (9). Notice that ,previously allocated portion of the processing, communication anddata storage resources in the jth server, respectively.This objective function is neither a convex nor a concavefunction. To solve this problem,can be fixed in order to makethe problem a concave optimization problem with respect to(which can be solved optimally and in polynomial time byandthe Lagrangian method and Karush-Kuhn-Tucker (KKT)conditions [17]) as follows:(16)Note thatis calculated using the same formula bychanging the superscripts from to . The parentheses with twolimits mean that the value in the parentheses is limited betweenthese upper and lower bounds. To respect the disk constraint, onlyservers that have enough remaining disk capacity for the client istaken to the account to generate the initial solution.is assumed to have discreteTo find the complete solution,values and for each server the solution for complete range of ’sis found with the closed form formula. For all inactive servers in0 and0), this problemeach server class in the cluster (needs to be solved only once. Now, the final solution for (15) canbe found by dynamic programming (DP) method to combine thesolutions found for each server and generate the best assignmentsolution (for different servers to satisfy constraint of 1). For this DP, at most one of the all possibleineach server is selected for the final solution. To manage thecomplexity and parallelize the solution, this DP computation canbe done for each server class in the cluster and the results shouldbe combined after finding all the data. This procedure is calledAssign Distribute (i,k) in the pseudo code.B. Improving the solution by changing resource allocationAfter assignment of clients to clusters and allocation ofresources, the total utility may be increased using local searchmethods. Total profit maximization problem without changing theassigned set of clients to clusters is neither a convex nor a concaveproblem and we decompose it to the following parts.1) Maximizing utilities with constantThis improvement can be done by optimizing resourceallocation in each server once the set of its assigned clients andare fixed. This problem is formulated as follows.(17)subject to:, 1, , 1,wheredenotes the set of assigned clients that hasgreaterthan zero. This optimization maximizes the total utility withoutchanging the assignment parameters.

By changing the problem to minimization form, the hessianmatrix of the objective function is positive definite, so theoptimization problem is convex and the solution can beisdetermined by the KKT conditions. Based on this method, ifin the following form:a fixed parameter, the ith client has(18)/Note thatcalculation uses the same formula by changingterm. Bythe superscripts from to and removing theapplying the constraints in problem (17), the amount ofandcan be determined by a numerical optimization method such asbinarysearch.Thisprocedureisdenotedasin the pseudo code.Adjust ResourceShares(2) Making servers active or inactiveServers can be turned ON or OFF to increase the utility priceor decrease the operation cost and improves the total profit in thesystem. If by making a server active, the total utility improvementis more than the operation cost of the server, this action willimprove the total profit. For each client, increasing the utility isbased on a change in its response time.For selecting servers to activate, the best possibleimprovement after activation should be found and compared to theoperational cost of the server. In each cluster, if a server classexists that has at least one inactive server, the problem to find thebest set of clients to be assigned to the new server to maximize thetotal profit should be solved to decide whether activate a server ornot. Finding the best improvement in the total profit by making aserver active is a nonlinear mixed integer programming problemand we used decomposition method and dynamic programming tofind a suboptimal solution with low complexity. Details of thisproblem are omitted for brevity.This optimization should be executed for each server class ineach cluster that has at least one inactive server in the final,)solution. This procedure is called TurnON servers( ,in the pseudo code.When portion of the ith client’s requests served by the newactive server is zero, solution of this optimization problem can beused to adjust the assignment parameters to optimize the totalprofit while the allocated resources to different portion of theclient’s requests ( ) are fixed. This is the dual problem of theproblem presented in (17). This procedure is denoted as,) in the pseudo code.Adjust DispersionRates (To find a server to make inactive, the servers are ranked basedon the total approximated utility function and server with lowestapproximated utility is the first candidate for being inactive. Tocheck that if making a server inactive can increase the total profitof the system, all clients that are assigned to the selected serverare removed from the server and reassigned to the active servers.This may help in finding better assignment of clients to servers. Ifthe total profit of the new system is more than the previous case,the selected server is turned OFF otherwise the selected server isremoved from the candidate set of servers to turn OFF to processother servers and explore more cases in next iterations.,) in theThis procedure is called TurnOFF servers( ,pseudo code.VI. EXPERIMENTAL RESULTSTo evaluate the proposed solution, we implemented theresource allocation method. There is no existing softwareframework for this, so we ended up implementing all componentsof the system, from clients to servers and clusters by ourselves.This implementation is done for resource allocation of the systemthat is modeled in section III.The number of clusters in the cloud computing, the number ofdifferent server classes, and the number of different utility classesare set to 5, 10 and 5, respectively. We used normalized amountsfor most of the components in the system instead of real values.For each utility class, and (mean of the execution time) forthe clients are generated with (uniformly distributed) randomvariables between 0.4 and 1. The value of for each client is setwith a random variable between 0.5 and 4.5. The utility class ofeach client is set with a random assignment form the existingandfor each server class are set withutility classes. ,random variables between 2 and 6 and values ofis set withfor eachrandom variable between 1 and 3. The amount ofserver class is set with another random variable between 2 and 6.for each client is set with a random variableThe amount ofbetween 0.2 and 2.For the implemented heuristic, number of iterations for theinitial solution was set to 3. The best initial solution from thesethree initial solutions was used to find the final solution after thelocal search. To compare the results of the proposed heuristic witha known resource allocation method, a modified version of theProportional Share (PS) scheduling [8] was used. The original PSdistributes the client’s requests between all active servers; thisstrategy increases the response time of the clients. Also the classof clients is not considered in the original version of thisscheduling scheme. We thus modified the original PS to decreasethe number of the servers that each client is assigned to andconsider the utility class of the clients. In this modified scheme,for the active servers are added and the problem ofvalues offinding the resource shares in the PS is solved by the assumptionof having only one server with the processing capacity equal tothe sum of all server processing capacities. Furthermore, theaverage of the processing service rate of the clients on the activeservers is used as the processing service rate on that server. Alsoto consider the class of clients, this average service rate ismultiplied with the slope of the utility function.After this calculation, the amounts of the processing capacityfor different clients are allocated based on a method inspired fromthe First Fit heuristic for the bin packin

system-wide resource allocation and server consolidation, this is in spite of non energy-proportional characteristics of current server machines [3]. Resource management in distributed systems is one of the most important challenges. Due to more requests for resource in these systems, input size of this problems are much larger than