Grid Systems Deployment & Management Using Rocks

Transcription

Grid Systems Deployment & Management Using Rocks
Federico D. Sacerdoti, Sandeep Chandra and Karan Bhatia
San Diego Supercomputer Center

Abstract

Wide-area grid deployments are becoming a standard for shared cyberinfrastructure within scientific domain communities. These systems enable resource sharing, data management and publication, collaboration, and shared development of community resources. This paper describes the systems management solution developed for one such grid deployment, the GEON Grid (GEOsciences Network), a domain-specific grid of clusters for geological research. GEON provides a standardized base software stack across all sites to ensure interoperability while providing structures that allow local customization. This situation gives rise to a set of requirements that are difficult to satisfy with existing tools. Cluster management software is available that allows administrators to specify and install a common software stack on all nodes of a single cluster and enable centralized control and diagnostics of its components with minimal effort. While grid deployments have similar management requirements to computational clusters, they have faced a lack of available tools to address their needs. In this paper we describe extensions to the Rocks cluster distribution to satisfy several key goals of the GEON Grid, and show how these wide-area cluster integration extensions satisfy the most important of these goals.

1. Introduction

Computational clusters have become the dominant computational platform for a wide range of scientific disciplines. Due to this pressure, cluster management software has risen to the challenge: cluster tools exist that specify a common configuration base, install a software stack on all nodes, enable centralized control of components, and provide diagnostics for failure reporting, all with minimal effort. While cluster management toolkits have been successfully applied to large-scale clusters operating in tightly coupled LAN environments [1, 2], current grid deployments, popular for building shared cyberinfrastructure [3], have similar management requirements yet have faced a lack of available tools. These grid systems seek to offer a common operating environment for the scientific domain community and typically involve a diverse set of resources operating in a geographically dispersed environment. Examples of such grid deployments include GEON [4], BIRN [5] and GriPhyN [6], although many others exist [7-9].

Figure 1: The GEON grid. Grid resources are geographically distributed, and connected via both the commodity Internet and specialized wide-area networks. While individual sites may have unique components and abilities, GEON must ensure interoperability throughout the grid.

We present grid design from the perspective of GEON, although its similarity to other grid efforts makes our results applicable to other projects. Figure 1 shows the GEON grid architecture, which is composed of a set of physically distributed clusters located within the administrative domain of each of sixteen participating sites. These clusters have at least one point-of-presence (pop) node and may have zero or more additional compute or data nodes that may consist of different hardware architectures. The operation of this virtual organization [10] requires the machines to run a common software stack that enables interoperability with other GEON resources.
We restrict our attention to computational hardware resources.

In addition to being geographically distributed, the clusters at each partner site build upon the common software base to provide site-specific applications and services. The challenge, therefore, is to manage the distributed set of hardware resources, physically distributed at partner institutions and connected over the commodity Internet, in a way that minimizes system administration costs while achieving interoperability and local site autonomy. Although local cluster performance is important, the main objective of the GEON systems component is to efficiently manage the grid in the face of system upgrades, security patches, and new software components. A central tenet of the design is to achieve this level of management with minimum administration.

We use the Rocks cluster distribution as a starting point for the GEON effort. Although lacking the functionality to manage a grid of machines, Rocks has been successfully used to manage large clusters consisting of more than 500 nodes. With our direction, the Rocks toolkit has been modified to support wide-area deployment and management of systems in a fashion appropriate for GEON. In this paper we present our requirements for a grid management toolkit and show how the new wide-area cluster initialization capability in Rocks satisfies a majority of these requirements. We hope the solutions explored for the GEON project can be directly applied to other grid deployments in addition to our own.

Section 2 describes the overall Rocks architecture and summarizes its capabilities along with other popular cluster management software. Section 3 discusses the specific requirements for wide-area grid deployment and management, using GEON as an example. Section 4 describes the architecture and implementation changes made to Rocks to support wide-area grid management. Section 5 discusses some initial performance measurements and other feedback from using this system for real systems deployment and management in GEON and identifies open issues and future directions. Section 6 summarizes the paper.

2. Rocks Cluster Management

High-performance clusters have become the computing tool of choice for a wide range of scientific disciplines. Yet straightforward software installation, management, and monitoring for large-scale clusters have been consistent and nagging problems for non-cluster experts. The free Rocks cluster distribution takes a fresh perspective on cluster installation and management to dramatically simplify version tracking, cluster management, and integration.

The toolkit centers around a Linux distribution based on the Red Hat Enterprise line, and includes work from many popular cluster and grid specific projects. Additionally, Rocks allows end-users to add their own software via a mechanism called Rolls [11]. Rolls are a collection of packages and configuration details that modularly plug into the base Rocks distribution. In this paper, for example, we demonstrate injecting software into a domain-specific grid via a GEON roll. Strong adherence to widely used tools allows Rocks to move with the rapid pace of Linux development. The latest release of Rocks, version 3.2.0, supports several commodity architectures including x86, IA64 and x86_64 Opteron.

Figure 2: Rocks and Rolls. A node in the graph specifies a unit of packages and configuration; a graph traversal defines software for a cluster appliance. Yellow nodes are Rocks base, while colored nodes belong to various rolls.
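
To make the Roll graph idea of Figure 2 concrete, the following is a minimal sketch of how a graph traversal can assemble the software for a cluster appliance. It is an illustration in Python, not the Rocks implementation (which expresses nodes and edges as XML files); the node names and package lists are invented for the example.

    # Illustrative sketch only: Rocks expresses this graph as XML node and
    # graph files; a plain dict stands in for the same idea here.
    # Each node bundles packages; edges compose nodes into appliances.
    GRAPH = {
        "server":        {"packages": ["rocks-base", "sge"], "edges": ["condor-server"]},
        "condor-server": {"packages": ["condor"],            "edges": []},
        "client":        {"packages": ["rocks-base"],        "edges": ["condor-client"]},
        "condor-client": {"packages": ["condor"],            "edges": []},
    }

    def packages_for(appliance: str, graph: dict = GRAPH) -> list[str]:
        """Collect the package set for an appliance by graph traversal."""
        seen: set[str] = set()
        packages: set[str] = set()
        stack = [appliance]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            packages.update(graph[node]["packages"])
            stack.extend(graph[node]["edges"])
        return sorted(packages)

    if __name__ == "__main__":
        print(packages_for("server"))   # ['condor', 'rocks-base', 'sge']

An appliance such as "server" thus inherits every package reachable from its node, which is what lets a roll plug additional nodes and edges into the base graph without modifying it.
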
2.1. Architecture

Figure 3 shows a traditional architecture used for high-performance computing clusters. This design was pioneered by the Network of Workstations [12] and popularized by the Beowulf project [13]. In this method the cluster is composed of standard high-volume servers, an Ethernet network and an optional off-the-shelf performance interconnect (e.g., Gigabit Ethernet or Myrinet). The Rocks cluster architecture favors high-volume components that lend themselves to reliable systems by making failed hardware easy and inexpensive to replace.

Figure 3: Rocks cluster hardware architecture. The frontend node acts as a firewall and gateway between the private internal networks and the public Internet.

Rocks frontend nodes are installed with the base distribution and any desired rolls. Frontend nodes serve as login and compile hosts for users. Compute nodes typically comprise the rest of the cluster and function as execution nodes. Compute nodes and other cluster appliances receive their software footprint from the frontend. Installation is a strong suit of Rocks: a single frontend on modern hardware can install over 100 compute nodes in parallel, a process taking only several minutes. Typically cluster nodes install automatically, using PXE to obtain a boot kernel from the frontend.

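As Section 4.2 explains, the frontend generates a custom kickstart file for each registered node; conceptually, each booting compute node simply asks the frontend for its own file. The sketch below illustrates that idea only; it is not the Rocks installer, and the URL path, port, node registry, and kickstart contents are invented for the example.

    # Illustrative sketch of the frontend's role during LAN installs: each
    # registered compute node requests its own generated kickstart file.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Hypothetical registry built during the PXE/DHCP registration step.
    NODES = {"10.1.255.254": "compute-0-0", "10.1.255.253": "compute-0-1"}

    def generate_kickstart(node_name: str) -> str:
        """Render a trivial per-node kickstart; Rocks derives the real one
        from its configuration graph rather than a fixed template."""
        return f"# kickstart for {node_name}\n%packages\n@base\n%end\n"

    class KickstartHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            node = NODES.get(self.client_address[0])
            if self.path != "/install/kickstart" or node is None:
                self.send_error(404)
                return
            body = generate_kickstart(node).encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8080), KickstartHandler).serve_forever()
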
One of the key ingredients of Rocks is the modular Roll mechanism, which produces Linux distributions customized for a particular domain. When we build a cluster frontend with the GEON roll, all compute nodes will install a set of Geo-specific software. This mechanism allows us to easily inject domain-specific software into the Rocks integration system, enabling a Geo-specific grid of clusters. Specialized CDs can be generated from this custom distribution, which behave identically to those from RedHat. More importantly to this paper, the custom distribution may be transmitted to other cluster frontends over a standard wide-area network such as the Internet.

By leveraging the standard RedHat Anaconda installation technology, Rocks abstracts many hardware differences and automatically detects the correct configuration and hardware modules to load for each node (e.g., disk subsystem type: SCSI, IDE, integrated RAID adapter; Ethernet interfaces; etc.). Although its focus is flexible and rapid system configuration (and re-configuration), the steady-state behavior of Rocks has a look and feel much like any other commodity cluster containing de-facto cluster standard services.

2.2. Related Work

We chose the Rocks cluster distribution to power the GEON scientific grid based on its fitness to our requirements. However, several attractive clustering solutions exist, both in open source and commercial form.

2.2.1. Cluster Distribution Methods

SystemImager [14] performs many of the same tasks as Rocks. It supports each hardware platform by storing a unique image of the desired directory structure on a master node in the cluster. There is no mechanism for sharing portions between images, however, making functional or physical heterogeneity difficult to manage.

The LCFG project [1] has an installation philosophy similar to Rocks. It provides a configuration language and a central repository of configuration specifications. The specifications, analogous to Rocks kickstart nodes, can be combined to install and configure individual Unix machines over a network. Changes to the central specifications automatically trigger corresponding changes in the configuration of managed nodes. The LCFG system is used in diverse configurations where even the Unix flavor is heterogeneous.

Scyld Beowulf [15] is a commercial, single-system-image cluster operating system. In contrast to SystemImager, LCFG, and Rocks, processes on a Scyld cluster see a single process space for all running tasks. While this feature simplifies cluster operations, it relies on a heavily modified Linux kernel and GNU C library. Because of the deep changes required by the system, the Scyld company sits in the critical path of many bug and security fixes. These fundamental changes require Scyld to take on many (but not all) duties of the distribution provider.

Other clustering projects such as Warewulf [16], LinuxBIOS [17], and OpenMosix [18] offer interesting capabilities, but do not provide a complete solution. They require a Linux distribution to be installed and configured a priori by experienced system administrators, a critical shortfall for our choice.
The actual ingredients in these projects cover a small part of a full distribution: OpenMosix provides a kernel module (albeit with some elegant capabilities), and LinuxBIOS specifies a small assembly/C initialization pre-kernel. Warewulf is the most ambitious of the three and can configure a shared environment in the cluster via an image-like file system on the master node.

3. Grid Requirements

The GEON grid imposes a regular structure on its constituents. At minimum, each partner site runs a GEON pop node that acts as a point-of-presence in the grid and corresponds to a Rocks frontend cluster appliance type. The pop node optionally coordinates additional local machines at the site for providing data or computation capability.

A GEON pop node must maintain a core set of services. In order to ensure interoperability between nodes at different sites, SDSC provides a comprehensive standardized software stack definition that includes the core operating system, currently Linux, along with higher-level development environments such as Web and Grid Service libraries, and end-user environments (portals). The software stack definition includes the necessary security information to link nodes to the GEON grid security infrastructure.

3.1. Consistency

The first major challenge for systems management of the GEON grid is maintaining a consistent software stack on all machines, currently 40 in total. These hosts are physically distributed among various administrative domains at different sites and connected through the commodity Internet¹. Uniformity of the software stack is required to ensure interoperability between services running on different sites in the grid. The GEON grid uses the NMI grid software release [19] as the base of its grid software stack. In addition, we provide higher-level software such as the portal environments, various web services libraries, an OGSI grid service container and libraries, and GEON-specific services. In order to deal with the complexity of so many interoperating components, we have chosen to integrate and test the system as a whole unit, which when verified is pushed out to all nodes in the GEON grid.

¹ Few sites are connected using the higher-bandwidth Internet2 backbone.

3.2. Controlled Customization

The second major challenge of the project is to manage the constituent machines to allow controlled customization of the machines at each of the partner sites. There are three reasons for local customization. First, each site may have specific system requirements for its configuration. For example, software and security patches, firewall integrity, and other management tools specific to the site must be accommodated. Each partner site may also configure the GEON machines in such a way as to leverage additional resources it may have. SDSC, for example, will provide gateways to large compute clusters (TeraGrid [20] and Datastar) and data archives (HPSS).

Second, the partner sites may deploy additional grid or web services applications on the GEON machines beyond those built into the software stack. These services must be persistent across reinstallations and system updates. Finally, partner sites may deploy datasets into the GEON network by hosting them on GEON machines. These datasets must also be preserved across software stack updates.

Unconstrained customization, however, leads to software incompatibilities that require significant administration overhead. Therefore the needs of customization must be balanced with the need for efficient system management.

3.3. Requirements

The following are specific requirements we have determined for the GEON deployment, which we believe are similar to those of other grid systems:

1. Centralized software stack definition. The GEON central node will define the base software stack used to instantiate all other nodes.
2. The ability to push the software stack to GEON pop nodes over the Internet with little or no administration.
3. Enable local site autonomy by defining acceptable node types for compute, data, and site-specific customized nodes.
4. Ability to add site-specific constraints. Allow customized software with durable site-specific environment and settings. These software components should survive the process of upgrading or reinstalling the base distribution.
5. Update software and security patches. Use the INCA framework [21] to monitor software versions of important middleware components, and provide the ability to identify and incorporate changes to the base software stack.
6. Incremental and hierarchical base distribution updates. Updates to the pop frontend nodes must be automatically pushed to the compute and data nodes at the site.
7. Remotely update critical patches and the software stack. At the same time, sites with varying administrative requirements and policies should be able to add additional rules to the basic update mechanism.

8. Nodes or clusters that join the grid should integrate easily and be immediately consistent with the base grid software stack.

While keeping in mind that the sites own their local resources and have full control of them, the GEON system must provide a robust, basic level of systems management that can be extended.

4. Wide Area Kickstart

To address the primary requirement of the GEON grid, a low management cost for grid-wide software installation and updates, we extended the Rocks distribution to perform full cluster installations over wide-area networks. While compute nodes in Rocks always employed the LAN to install software from the frontend machine, the frontend itself had to be integrated with CD or DVD media. This strategy, while appropriate for cluster instantiations of basic Rocks software, is insufficiently flexible for the dynamic needs of the GEON grid. Specifically, we considered effecting grid-wide software changes via mailed disk media unacceptable.

4.1. Central

The wide-area cluster integration involves a central server that holds the software stack. Frontend pop nodes of the grid obtain a full Rocks cluster distribution from the central. This distribution is suitable to install local compute, data, and customized nodes. Since the software stack defines the whole of the base system, and because security patches and feature enhancements are common during updates, any part of the stack may be changed. The challenge is to retrieve all components necessary for integration from central over the network, including the installation environment itself. We require a small static bootstrap environment for the first pop initialization, which contains only network drivers and hardware-probing features and fits onto a business-card CD. This bootstrap environment is stored on the node to facilitate the upgrade process.

Figure 4 shows the wide-area kickstart architecture. A central server holds the base Rocks distribution, including the Linux kernel and installation environment, along with standard rolls such as HPC (high-performance computing) and the domain-specific GEON roll. We initialize GEON frontend pops over the wide-area Internet from this central.

The software delivery methodology is optimized for unreliable networks, and possesses greater awareness of and tolerance for failure than the local-area Rocks install process between frontend and compute nodes. The frontend pop locally integrates the Rocks base and the various rolls selected from the central server, and the combined software stack is rebound into a new Linux distribution. The pop then initializes cluster nodes with its combined distribution. If the pop is current, it is easy to see that any new compute, data, or specialized nodes joining the grid immediately possess a consistent version of the base grid stack.

4.2. WAN vs LAN

In a traditional Rocks cluster, compute nodes also install over a network: the LAN between themselves and the frontend. The frontend uses PXE or DHCP requests from compute nodes to perform a registration step, information from which it uses to generate custom kickstart files for each node in the cluster.

In the wide area, we do not have the luxury of PXE or DHCP. The only communication method over the WAN is HTTP. In essence a frontend is grown over several steps. The strategy gives a nascent frontend enough information and software to generate its own fully functional kickstart file, which it then uses to install itself.
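
Because HTTP is the only channel to central and wide-area links are less reliable than a cluster LAN, every retrieval has to tolerate transient failures. The sketch below illustrates that behaviour in Python; the central URL, retry policy, and function name are assumptions made for the example, not the actual installer code.

    # Illustrative sketch of WAN-tolerant retrieval from a central server.
    import time
    import urllib.error
    import urllib.request

    def fetch_from_central(url: str, retries: int = 5, backoff: float = 2.0) -> bytes:
        """Fetch one component of the stack from central, retrying on failure."""
        for attempt in range(retries):
            try:
                with urllib.request.urlopen(url, timeout=30) as resp:
                    return resp.read()
            except (urllib.error.URLError, TimeoutError) as err:
                if attempt < retries - 1:
                    wait = backoff ** attempt
                    print(f"fetch failed ({err}); retrying in {wait:.0f}s")
                    time.sleep(wait)
        raise RuntimeError(f"could not retrieve {url} from central")

    if __name__ == "__main__":
        # Hypothetical central URL for demonstration purposes.
        data = fetch_from_central("http://central.example.org/install/kickstart")
        print(f"retrieved {len(data)} bytes")

In this manner a pop would retrieve the installer, the selected rolls, and finally its own kickstart file from central before rebuilding its local distribution.
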
The frontend initially boots from a small environment (~10 MB on a CD) and specifies the name of a central it wishes to be cloned from. The central then sends a full installer over HTTP, which enables the frontend to request a kickstart file. The kickstart file returned is minimal, since central knows little about the
