E. CASALICCHIO, R. LANCELLOTTI, M.E. POLEGGI: SIMULATION FRAMEWORK
I.J. of SIMULATION Vol. 7 No 6, ISSN 1473-804x online, 1473-8031 print

A SIMULATION FRAMEWORK FOR CLUSTER-BASED WEB SERVICES

EMILIANO CASALICCHIO
Dipartimento di Informatica, Sistemi e Produzione
Università di Roma “Tor Vergata”
E-mail: casalicchio@uniroma2.it

RICCARDO LANCELLOTTI
Dipartimento di Ingegneria dell'Informazione
Università di Modena e Reggio Emilia
E-mail: riccardo.lancellotti@unimore.it

MARCO EMILIO POLEGGI
CERN-IT/INFN-CNAF
E-mail: Marco.Emilio.Poleggi@cern.ch

Abstract: We propose a simulation framework, namely CWebSim, specifically designed for the performance evaluation and capacity planning of cluster-based Web services. A broad variety of Web cluster configurations can be simulated through CWebSim. Its modularity permits the definition of different mechanisms, algorithms, network topologies and hardware resources. Also, two workload input alternatives are possible: a trace-driven mode and a distribution-driven mode that encompasses the most recent results on Web workload characterization. We present two case studies to show how CWebSim can be used to test cache cooperation protocols and Web switch dispatching algorithms.

Keywords: Simulation framework, Cluster-based Web systems, Performance evaluation, Caching, Cooperation algorithms

1. INTRODUCTION

Simulation is a common practice for the performance evaluation and capacity planning of Web-based systems. Indeed, the complexity of current Web architectures often makes analytical solutions of the related mathematical models infeasible. In this paper we present CWebSim: a simulation framework conceived for cluster-based Web services. These architectures are frequently used in practice: they are built on pools of server nodes, also known as Web farms/clusters, that are interconnected by a LAN with the goal of sharing the load of incoming requests. Many alternatives exist and CWebSim can be used to evaluate most of them, especially those acting at the higher levels, that is, application protocols, server-level caching and file systems. Nevertheless, CWebSim remains a detailed simulation model of a Web cluster, encompassing the main issues about the hardware, the operating system and the application layers, such as internal network and disk transfers, and overheads due to request dispatching and processing. Special attention has also been paid to the workload model that reproduces a Web environment: in the case of the synthetic workload, realistic distributions for document sizes and requests are adopted, whereas the trace-based method has the appreciable feature of preserving the time dependencies. The CWebSim simulation framework can be customized into several Web service architectures, thanks to its modular design, which allows the combination of a broad variety of technical features, such as the adoption of different request dispatching policies and internal network hardware. By redefining the node functions and interconnections, complex proxy-caching systems or multi-tier architectures for e-commerce services can be easily simulated.

Figure 1. A basic cluster of cooperating Web servers

To the best of our knowledge, no simulation tool in the literature is specifically oriented to Web clusters. General purpose frameworks exist for simulating computer networks, such as ns-2, OPNET Modeler and other tools that are considered in Section 5. CWebSim is written in C and uses the CSIM process-based simulation library (see Mesquite, 2001): this provides us with an adequate basis of classes and functions to be used as the building blocks for the implementation of complex simulation models. CWebSim can be ported to most operating systems thanks to the different CSIM distributions available: we tested it successfully on Linux and on various Unix-like platforms. Some effort may be necessary to implement CWebSim through other simulation languages/libraries, but there is no theoretical limit to its porting, because the design of CWebSim relies on CSIM features that are common to many other library-based simulation tools.

We present two case studies where we simulate through CWebSim a set of clustered HTTP servers. In the first case, the nodes cooperate for global caching purposes, as shown in Figure 1, with the aim of improving the performance of standard Web clusters, composed of stand-alone nodes. The second case study focuses on dispatching algorithm alternatives that can be adopted at the front-end component of the Web cluster, namely, the Web switch.

The rest of the paper is organized as follows. Section 2 gives a detailed description of the simulation framework we designed for the evaluation of generic Web-based services. In Sections 3 and 4, we discuss the use of CWebSim for simulating, respectively, global caching mechanisms and Web switch dispatching algorithms. Some related work is discussed in Section 5. We outline conclusions and future work in Section 6.

2. CWebSim: A WEB CLUSTER SIMULATION TOOL

In this paper we present the architecture details and some applications of a simulation tool conceived for the performance evaluation of cluster-based Web architectures. CWebSim (‘C’ stands for “cluster”) is a discrete-event simulator implemented through the CSIM package: a library of routines for process-oriented simulations. The simulation framework underlying CWebSim can be customized to represent many classes of Web architectures, but in this paper we focus on locally distributed HTTP servers, also known as Web clusters. Therefore, in the following discussion we consider only the main components of Web clusters, namely the Web switch, the servers and the internal network, disregarding some external issues such as DNS servers, gateways and routers.

Figure 2 shows a high-level view of the CWebSim software architecture. CWebSim has a modular software structure conceived to isolate the implementation of the target model from the auxiliary simulation routines. The target system’s behavior is defined by a set of four modules (Dispatching module, Client module, Server module, and the subset of Hardware definition modules) that implement the core Web component models; these modules are described in Section 2.1. The remaining modules (Input module, Output module and Gather module) are described in Section 2.2: they implement simulation services such as input/output and statistic gathering routines.

Figure 2. Software modules of CWebSim

2.1. Target system modules

The target system can be seen as a set of nodes that are interconnected through one or more network links. Each node is an abstraction of a physical computer unit, such as a PC or a workstation, and can be configured with different hardware capabilities, so that specialized nodes can be easily modeled. A process abstraction mechanism allows CSIM threads to be activated on the nodes. The main hardware components we consider are CPUs, hard disks, memory banks and network interface cards (NICs): the related models are implemented by appropriate CSIM facilities having, if needed, their own queueing system. These components are included in the Hardware definition modules:

CPUs are round-robin-scheduled service centers. CSIM threads engage the centers for a timeslice configurable to approximate the behavior of current operating system schedulers. The service time depends on the requested operation.

Hard disks are FCFS (First-Come-First-Served) centers with service time defined by a constant part (average values from off-the-shelf device datasheets are considered for the controller delay and the seek time) plus a part that is proportional to the requested data amount through a transfer rate parameter.

Memory banks are not modeled as independent service centers: they are accessed through the CPU, as in real systems. The service time is proportional to the amount of transferred data.

NICs can be defined according to various models available in the Network module, as described in Section 2.1.4.
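The hard-disk model of Section 2.1 (an FCFS center whose service time is a constant part plus a part proportional to the requested data) can be illustrated with a minimal Python sketch. It does not use CSIM, and it is not taken from the CWebSim sources: the function names and all numeric defaults are hypothetical placeholders chosen only for illustration.

```python
def disk_service_time(size_bytes, controller_delay_s=0.0005,
                      seek_time_s=0.009, transfer_rate_bytes_per_s=20e6):
    """Service time = constant part + part proportional to the data amount.

    The default values are hypothetical, not CWebSim parameters.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    constant_part = controller_delay_s + seek_time_s
    transfer_part = size_bytes / transfer_rate_bytes_per_s
    return constant_part + transfer_part


def fcfs_completion_times(requests, **disk_params):
    """Completion times of (arrival_time, size_bytes) requests on one FCFS disk.

    Requests are served in arrival order; a request starts when it has
    arrived and the disk has finished the previous one.
    """
    completions = []
    disk_free_at = 0.0
    for arrival, size in sorted(requests, key=lambda r: r[0]):
        start = max(arrival, disk_free_at)
        disk_free_at = start + disk_service_time(size, **disk_params)
        completions.append(disk_free_at)
    return completions
```

With such a model, two requests arriving at the same instant are served one after the other, so the second one also pays the queueing delay caused by the first.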

As the main focus of the simulation is on Web-based applications, we find it unnecessary to model low-level factors, such as operating system delays or MAC contention. Being typically two or three orders of magnitude lower than application service times, these overheads are negligible with respect to the costs of Web-based services hosted on a cluster.

A Web cluster model is obtained by defining a set of Web server nodes, a Web switch node and one or more Web client nodes. The “Web” qualifier in front of some non-ambiguous terms, such as “client” and “server”, will often be omitted. The server nodes and the Web switch are interconnected by an internal network, whereas the client nodes are connected to the Web switch through an external network simulating the Internet; the network models are managed by the Network module. Since CWebSim is cluster-oriented, the overall target system description is based on some global data structures that define the components available on each node: for example, number of CPUs in a node, type of NIC, and so on. When the simulator is initialized, for each node (or group of homogeneous nodes), a […] stating, for instance, how much memory it owns and which scheduling policy its CPU adopts.

In a typical Web interaction, a client sends a connection request to the Web switch, which selects a server node and forwards the request to it; the server, in turn, processes the request and sends a reply to the client. Our framework relies on process-oriented simulation. Hence, all active entities, such as clients, servers and dispatchers, are instances of CSIM threads, which communicate through internal message passing routines: models for high-level network protocols can be easily built upon this basic communication system.

2.1.1. Client module

This module is responsible for generating the input workload for the target system. The life cycle of a Web client is modeled according to the most recent results on Web load characterization (see Barford, 1999, Arlitt, 1997). In a real scenario, users visit a Web site for a time whose length depends on their personal profile and on the requested services; once the service request is completed, they leave the Web site. Hence, we consider a Web interaction model wherein clients enter the system and populate it for a Web session. During a Web session a client generates a random number of requests for Web pages, each of them being composed of an HTML file and a random number of embedded objects. Once it has received the requested document with its embedded objects, the client reads it and issues a new page request after a random user think time, mimicking human behavior.

Clients are implemented by CSIM processes which generate input requests through either a distribution-driven model or a trace-driven model.

Distribution-driven model. Client processes are generated concurrently. At any simulation instant, the system is populated by a random number of processes. In our implementation, each client process is activated at a simulation time $t_i$, which is a stochastic variable describing the client arrival time: the mean difference value $t_{i+1} - t_i$ (inter-arrival time) can be adjusted to obtain the desired incoming load pressure.

Once activated, a client process computes some session parameters: first, the number of HTML pages requested during the Web session, and then, for each page, the number of embedded objects; HTML pages and embedded objects are chosen according to a certain popularity distribution. After this set-up phase, the Web client enters the system, issues the first connection request to the Web switch, and waits for a reply. When a response message from one server of the cluster is received, the Web client process is resumed: it can either submit a request for an embedded object of the same HTML page or, if all the embedded objects have been received, it can wait for a user think time $T_{tt}$, during which the user is supposed to read the obtained page. These actions are repeated until all the HTML pages composing the Web session are received, then the client process leaves the system.

The Web interaction model also covers connection refusals, which occur when the cluster service capacity reaches a predefined saturation point; rejected connections are not reissued, to avoid driving the system into a thrashing state. Each random variable describing the client life cycle is characterized by a probability distribution function, whose shape and parameters can be defined by the CWebSim user. The statistical distributions supported by CWebSim for distribution-driven workload generation are discussed in Section 2.2.1.

Trace-driven model. In this model, all the characteristics of the workload model are determined by a real log of Web requests. The behavior of a client process is entirely driven by pre-loaded data. Our trace-driven model is fairly realistic because it preserves the time patterns of real logs of typical Web traffic. We are mainly interested in preserving the time dependencies of the request stream, since this significantly affects the server performance. A trace log must be pre-processed to be used as input by the simulator: this operation introduces some artifacts necessary to rebuild a session-structured trace. The log’s lines are scanned with a sliding time window that defines the maximum session time length: all requests coming from the same IP address within the time window are assigned to the same Web session. The first object requested in a session is considered an HTML page, whereas the following objects are treated as embedded objects. This introduces some approximations, but it is not possible to rebuild the exact page structure without considering the original structure of the Web site.
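The trace pre-processing step of the trace-driven model can be sketched as follows. This is only an illustration in Python under stated assumptions, not the CWebSim pre-processor: the `Request` record layout, the `build_sessions` name and the 1800-second default window are hypothetical, and the window is assumed to be measured from the first request of each session.

```python
from collections import namedtuple

# Hypothetical log record: timestamp in seconds, client IP, requested URL.
Request = namedtuple("Request", ["timestamp", "ip", "url"])


def build_sessions(requests, window_s=1800.0):
    """Group requests into per-IP sessions bounded by a time window.

    A request joins the current session of its IP if it falls within
    window_s seconds of that session's first request; otherwise it opens
    a new session. The first object of a session is tagged as the HTML
    page and the following ones as embedded objects.
    """
    current = {}    # ip -> index of its most recent session
    sessions = []
    for req in sorted(requests, key=lambda r: r.timestamp):
        idx = current.get(req.ip)
        if idx is not None and req.timestamp - sessions[idx]["start"] <= window_s:
            sessions[idx]["embedded"].append(req.url)
        else:
            sessions.append({"ip": req.ip, "start": req.timestamp,
                             "page": req.url, "embedded": []})
            current[req.ip] = len(sessions) - 1
    return sessions
```

Because only the first object of each session is tagged as an HTML page, every later object in the window is counted as embedded, which is exactly the approximation acknowledged in the text.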

2.1.2. Dispatching module

The Web switch, also known as dispatcher, is responsible for forwarding the incoming Web requests to a server node selected according to a certain policy. CWebSim can simulate either stateless dispatching algorithms, such as random and round-robin, or stateful algorithms, such as least loaded and dynamically-weighted round-robin. Content-aware dispatching policies are also supported. An overview of the main dispatching alternatives for Web clusters is given in Cardellini, 2002.

A Web switch can be a general purpose PC or a dedicated hardware device: in both cases the components relevant to our performance studies are CPU(s) and NICs. The basic Web switch node model consists of a queuing system with three service centers connected to work in a one-way mode, that is, only the incoming client requests go through it, whereas the server replies reach the clients directly. The service centers are a CPU used to run the dispatching algorithm, and two NICs: the first one connects the cluster to the Internet, through which the client requests come in, and the second one connects the Web switch to the internal network, which conveys the requests forwarded to the server nodes. A two-way Web switch could be modeled using two service centers for each NIC, according to the model of a Web server node proposed in Carrera, 2001.

Since some of the supported dispatching algorithms are based upon server state information, a special CSIM process runs on the Web switch node: every $T_{get}$ seconds it stores load state information, such as number of active processes on each server node, server response time, CPU and disk utilization. The same information is used to simulate an optional admission control mechanism that rejects connection requests when the system gets overloaded.

2.1.3. Server module

The Web server node model encompasses the main hardware components, as shown in Figure 3: a CPU, a hard disk, a memory bank used as a main memory cache and two NICs. One NIC is used to connect the node to the internal network, the other NIC connects the node to the external network.

Figure 3. Server node model

We suppose that when a Web request is assigned to a Web server the service centers are visited as follows.

1. The incoming HTTP request is queued at the internal NIC.
2. The CPU parses the incoming HTTP request, and runs a load management algorithm to decide whether to accept or discard the request.
3. The memory bank is accessed, engaging the CPU, trying to retrieve a cached copy of the requested document.
4. If the requested file is not found in the memory cache, the hard disk is accessed to load the file into the cache.
5. The CPU is used again to produce an HTTP response.
6. The external NIC is used to deliver the response back to the client.

The Web server application simulates a multi-threaded server, whereby a CSIM process started at the system boot acts as a master HTTP daemon, waiting for request connections; when a server node receives an HTTP request from the Web switch, the master daemon forks a new slave CSIM process which serves the request. Any admission control policy is performed by the master daemon: the slave process is not spawned if the system is overloaded, in which case an error reply is sent to the client by the master process.

The main memory is used as a stand-alone local cache to simulate the behavior of the caching mechanisms of current operating systems. Different classical replacement policies, such as Least Recently Used (LRU), Least Frequently Used (LFU) and their variants, are supported.

We model both HTTP/1.0 and HTTP/1.1 protocols. Through the HTTP/1.0 protocol, a new slave HTTP process is forked to serve each Web page component, that is, the HTML file and all the embedded objects. Through the HTTP/1.1 protocol, the same slave HTTP process serves the HTML file and all embedded objects.

2.1.4. Network module

Since we are interested in the main issues of high-level communication protocols, such as the HTTP handshaking or the TCP connection handoff, the communication among the diverse Web entities is modeled as a single message exchange mechanism, without simulating any packet fragmentation and routing mechanisms. This approach aims to obtain a server-side performance evaluation, and is based on the assumption that each application-level computation, like the HTTP request processing inside the Web server process, is much longer than any characteristic time of the underlying network layers (with the exclusion of transfer delays). Inside a Web cluster, such a system configuration is achievable with the adoption of light-weight messaging protocols, like UDP. As for the geographical network overheads, the connection setup times are, in most cases, negligible with respect to the average transfer latency. Although for the server-side comparison of diverse cluster-based architectures it is not necessary to simulate a wide-area network, CWebSim can be easily extended to accommodate this need.

Various network models are implemented inside the Network module of CWebSim. Each of them can be adopted according to the topology and features of the system under study.

Ideal network: neither delays nor contentions are considered. This model is suited to scenarios where all communication overheads are negligible, or where the network links should never become a performance bottleneck. NICs are dummy elements.

Delayed network: each transfer experiences a delay proportional to its size, but network contentions are not considered (NICs are delay elements). This model can be used when the network is supposed to never saturate, even though it is necessary to model transmission delays, as in Internet backbones.

Bus network: all the transfer requests share the same resource (a single FCFS queue service center, NICs are delay elements) and experience a delay proportional to the size of the transmitted data; network contentions are captured by the queuing discipline. This model is an approximation of Ethernet-like LANs, wherein each node is connected to the same physical medium.

Switched network: it is composed of $N$ independent links, each of them being modeled through a single FCFS queue service center (that is, the NIC), which simulates the network contentions; each transfer from one attached node to another engages two links with a delay proportional to the transmitted data size. Each link’s queue is used for bidirectional communication. No switching delay is considered, as, in most real cases, it is negligible with respect to the transfer latency. The maximum theoretical bandwidth of such a network model is $N/2$ times the bandwidth of a single link.
This model resembles switched LANs with a star topology, like Fast/Gigabit Ethernet.

2.2. Service modules

The service modules implement functionalities related to the simulator, such as providing the system with the input parameters, collecting simulation statistics and producing the simulation reports.

2.2.1. Input module

This module is responsible for processing all the input parameters, which are then dispatched to the other modules in order to configure and initialize their components. CWebSim’s input parameters can be divided into three main classes: workload parameters needed to configure the Client module, system parameters needed to configure the Web switch node, Web server nodes and the Network module, and dispatching parameters which select and configure the dispatching algorithm to be used. Table 1 summarizes the main input parameters of CWebSim with some hints about the experiments that can be done by varying one or more of them.

Table 1. Input parameters of CWebSim

The workload parameters can be adjusted for different statistical distributions. Web access patterns exhibit a high variability and a self-similar nature which are well approximated by “heavy-tailed” distributions, such as Pareto and Lognormal distributions (see Arlitt, 2000, Barford, 1999, Cherkasova, 2001). For instance, Pitkow, 1999 shows that the number of requests per Web session follows an inverse Gaussian distribution, whereas a Pareto function fits the distributions of the number of embedded objects per request and of the user think time (see Barford, 1999, Pitkow, 1999). Since CSIM offers only a set of standard distributions, such as exponential and normal, we implemented inside CWebSim the following heavy-tailed distributions of interest for the Web: Inverse Gaussian, Lognormal, Pareto, Weibull and Zipf. The case study presented in Section 3 shows some application examples.

2.2.2. Gather and Output module

Once the system and the workload model are defined, the next step in the testbed setup is to choose the most appropriate performance indexes to evaluate cluster-based Web systems. We classify performance indexes into three broad groups: service capacity indexes, service efficiency indexes and system load indexes. Any index can refer either to the whole system or to its components.

Service capacity indexes estimate quantitatively the system performance at the service source point. Commonly adopted indexes are the following:

throughput: it is defined as the number of quantities computed by a system in a time unit. Quantities of interest for Web systems are: HTML pages (including the embedded objects), objects (files), HTTP and TCP connections, bytes. Some throughput indexes can be further specified: for instance, we can distinguish between the static object throughput for pre-existent files, and the dynamic object throughput for those files generated on the fly, like CGI results;

cache hit ratio: measurable as the ratio of the number of either documents (document hit ratio,
or DHR) or bytes (byte hit ratio, or BHR) found in the cache to the total number of requested documents/bytes. The hit ratios give an estimate of the effectiveness of the caching subsystem, e.g., when simulating a proxy architecture.

The main service source points in a Web cluster are the Web switch (toward the server pool), each server node and the whole cluster (toward the client nodes).

Service efficiency indexes estimate qualitatively the system performance at the service destination point. The items of interest are:

response time: it is defined as the time experienced by a user to obtain a service from a system. In the case of Web systems, the service request can be an object/page request or a session; therefore, the response time is measured, respectively, as the time to receive an object, an HTML page with all its embedded objects or a certain number of HTML pages composing a session;

latency time: it is the time needed by a system to process a service request, excluding any communication delay. For instance, the object latency time of the entire Web cluster does not encompass the internal/external network delays.

The main service destination points in a Web cluster are the client nodes (the source being the whole cluster) and the server nodes (the source being the Web switch).

System load indexes estimate the stress of a system at the service processing point: they represent complementary performance indexes, that is, they give a measure of the service costs. We consider the following indexes:

utilization: it is the fraction of a base time interval during which a single-resource system is busy. The utilization of a multi-resource system can be measured in many ways, according to its architecture. For an $n$-resource pipelined system, which is a good approximation of a server node equipped with one CPU and one disk, we adopt an OR-based definition of utilization: the fraction of a base time interval during which at least one of the system resources is busy. On the other hand, for a parallel system, like a dual CPU/disk server node, we adopt an AND-based definition of utilization: the fraction of a base time interval during which all the system resources are busy;

network B/W consumption: it measures the traffic conveyed by the network in a time unit, that is, the bandwidth (B/W) consumed: this gives an estimate of the network stress.

Once the performance indexes are defined, we use some metrics to extract a single value or a set of values that is representative of each index. Common statistical measures of a set of sampled data are: sample mean, sample standard deviation, x-percentile and cumulative distribution function. Sample mean and standard deviation are representative measures of a set of sampled data following a standard distribution, such as normal and Poisson. When the probability distribution of the samples is characterized by an infinite standard deviation, the most representative statistical metrics are the x-percentiles and the cumulative distribution functions. All these metrics can be easily obtained via CSIM routines.

In order to generate the output statistics, the Gather module provides the simulator with a service agent that periodically collects some status values probed inside the system components, computes some performance metrics and stores the resulting values into CSIM tables. The simulation can be controlled, through internal CSIM routines, in order to stop the run when a given confidence interval is reached. At the end of the simulation, the collected data are processed by the Output module to produce a detailed simulation report.
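The OR-based and AND-based utilization definitions given among the system load indexes can be made concrete with a small Python sketch. It is an illustration only, not the Gather module code: the `utilization` function, its argument layout (one list of busy intervals per resource) and the `mode` flag are assumptions introduced here.

```python
def utilization(busy_intervals, base_time, mode="or"):
    """Fraction of [0, base_time] during which the system counts as busy.

    busy_intervals holds one list of (start, end) busy periods per resource;
    the periods of a single resource are assumed not to overlap each other.
    mode="or":  busy when at least one resource is busy (pipelined system).
    mode="and": busy when all resources are busy (parallel system).
    """
    if mode not in ("or", "and"):
        raise ValueError("mode must be 'or' or 'and'")
    needed = 1 if mode == "or" else len(busy_intervals)
    events = []
    for intervals in busy_intervals:
        for start, end in intervals:
            events.append((start, +1))   # a resource becomes busy
            events.append((end, -1))     # a resource becomes idle
    events.sort()
    busy_count, last_t, busy_time = 0, 0.0, 0.0
    for t, delta in events:
        if busy_count >= needed:
            busy_time += t - last_t      # span since the previous event
        busy_count += delta
        last_t = t
    return busy_time / base_time
```

For a CPU busy during [0, 4] and a disk busy during [2, 6] over a base interval of 10 time units, the OR-based value covers the union [0, 6] and the AND-based value covers only the overlap [2, 4].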
The main output statistics available in CWebSim are shown in Table 2. Furthermore, the simulator is complemented by a set of Perl scripts that analyze the simulator output for additional computation. The scripts can automatically build graphs from a set of simulations or can aggregate multiple simulation runs (e.g., by calculating average values and standard deviation) to provide results that are more significant from a statistical point of view.

Table 2. Output statistics of CWebSim

3. SIMULATION OF GLOBAL CACHING MECHANISMS

A classical Web cluster is a pool of stand-alone server nodes, each of them being unaware of the others. The main performance limitation of this architecture is the shortage of memory resources on each server node, which forces it to retrieve most of the requested documents from its hard disk: as this is often the bottleneck of a commodity-based server machine, the single node performance is bounded by the hard-disk performance. Yet, a Web cluster has a lot of aggregated RAM that can be used as a distributed cache to be accessed via a light-weight cooperation protocol, exploiting RAM-to-RAM transfers through the internal LAN, which can be two orders of magnitude faster than a local disk-to-RAM transfer: this is what we call a global caching architecture. To the best of our knowledge, commercial Web clusters do not adopt cache cooperation solutions, whereas some examples exist in the research community (see: [Li, 2001], [Liu, 2000], [Song, 2000]).

The logical topology of the cluster is considered to be a flat mesh. This choice enables us to design the cooperation protocol in a fully distributed and peer-to-peer fashion, as all the nodes are responsible for
the same Web domain. Distributed file systems are often used to share files within a Web cluster, because of their transparent interface toward the user-level applications. However, there are some performance trade-offs in this approach due to their architecture, designed to work in a read/write environment. Hence, unlike the distributed file systems, the caching cooperation protocol should run at the application level, because it needs to manage whole files, instead of disk blocks. Moreover, present disk storage dimensions allow us to replicate the whole Web content on the disk of each node, which simplifies file retrieval.

In the following sections, we discuss how the modules of CWebSim are instantiated to simulate some cooperation alternatives, and present some performance results.

update) approximating simple hash-based operations, and for object insertion/replacement. The hashing CPU cost depends on the number of objects hosted in the cache, whereas
