README For Installing Red Hat HPC Solution - Beta

Table of Contents

    README for Installing Red Hat HPC Solution - Beta
    What is the Red Hat HPC Solution?
    Installation Prerequisites
    Installation Procedure
    Preparing To Install
    Starting the Install
    Installing Additional Red Hat HPC Kits
    Verifying the Red Hat HPC install
    Adding Nodes to the Cluster
    Managing Node Groups
    Adding RPM Packages in RHEL to Node Groups
    Adding RPM Packages not in RHEL to Node Groups
    Adding Fedora Repository to the Installer Node
    Associating a Repository to Node Groups
    Adding Kit Components to Node Groups
    Synchronizing Files in the Cluster
    Updating the Installer node and the Compute Node Repository

What is the Red Hat HPC Solution?

The Red Hat HPC Solution is a fully integrated software stack that enables the creation, management and usage of a high performance computing cluster running Red Hat Enterprise Linux. The cluster management tools provided with the Red Hat HPC Solution are based on Platform OCS from Platform Computing Corporation. For more information about Platform OCS, visit http://www.platform.com/Products/platform open cluster stack

Installation Prerequisites

Installing Red Hat HPC Solution - Beta (Red Hat HPC) requires one system to be designated as an installer node. This installer node is responsible for installing the rest of the machines. Prior to installing Red Hat HPC, confirm that the designated machine has Red Hat Enterprise Linux 5.1 installed and meets the following requirements:

- SELinux must be disabled.
- One or more network interfaces must use statically defined IP addresses rather than DHCP. These should be connected to the networks where the machines will be provisioned.
- Red Hat Enterprise Linux version 5 update 1 must be installed.
- A partition with at least 10 GB free.
- Red Hat Enterprise Linux version 5 installation media.
- A valid subscription to Red Hat Network, including an entitlement to the Red Hat HPC channel.
- The firewall (iptables) must be configured to permit the services needed for installation on all networks used to provision nodes (HTTP, HTTPS, TFTP, DNS, NTP, BOOTPS, etc.).
- Red Hat HPC will create a private DNS zone for all machines under its control. The name of this zone must NOT be the same as any other DNS zone within the organization where the cluster is installed.

Installation Procedure

Preparing To Install

Verify that the installer node meets the prerequisites above. Register on Red Hat Network and subscribe to the appropriate channels.

Starting the Install

Log into the machine as root and install the Red Hat HPC bootstrap RPM by running the following:

    # yum install ocs
    # source /etc/profile.d/kusuenv.sh

The Red Hat HPC bootstrap package (called ocs) will be downloaded and installed. The package provides a tool for completing the installation and setup of Red Hat HPC. Run the following:

    # /opt/kusu/sbin/ocs-setup

The script will detect your network settings and provide a summary per NIC like:

    NIC: eth0
    Device   eth0
    IP       172.25.243.44
    Network  172.25.243.0
    Subnet   255.255.255.0
    mac      00:0C:29:C4:61:06
    Gateway  172.25.243.2
    dhcp     False
    boot     1

Red Hat HPC cannot set up provisioning on DHCP-configured networks, only on statically configured networks. The setup script will ask if you want to provision on all networks and, if not, which ones to provision on.

Red Hat HPC creates a separate DNS zone for the nodes it installs. The tool will prompt for this zone.

Warning: Do not use the same DNS zone as any other in your organization. Using an existing zone will cause DNS name resolution problems.

Red Hat HPC needs to store a copy of the Operating System media and installation images. The setup script will prompt for the location of the directory to store the Operating System. The default is /depot. If another location is used, a symbolic link to /depot will be created.

The setup script needs to copy the Red Hat Enterprise Linux 5.1 media onto the local disk in order to build the repository for installing nodes. It will ask for the DVD/CDs, a directory containing the contents of the OS media, or an ISO file providing the media.

The setup script will copy the Operating System to /depot. This will take some time (5-10 minutes from a CD/DVD).
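Since the depot location needs the 10 GB of free space listed in the prerequisites, it may be worth checking the target partition before running the setup script. The following is a minimal sketch, not part of Red Hat HPC; the function name has_free_space is hypothetical, and it relies only on the POSIX output format of df.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Red Hat HPC): succeeds if the
# filesystem holding DIR has at least MIN_GB gigabytes available.
has_free_space() {
    dir=$1
    min_gb=$2
    # With -P, df prints one header line and one data line per filesystem;
    # column 4 of the data line is the available space in 1024-byte units.
    avail_kb=$(df -P -k "$dir" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}
```

For example, "has_free_space / 10 || echo 'less than 10 GB free'" checks the root filesystem. If /depot does not exist yet, check the directory that will contain it.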
Once completed you will see something like:

    Congratulations! You should be able to install compute nodes on:
    Network 1.2.3.4 on interface ethX

The installer node is now ready to begin installing other nodes in the cluster.

Installing Additional Red Hat HPC Kits

Additional software tools such as Nagios and Cacti are packaged as software kits. Software packaged as a kit is much easier to install onto a Red Hat HPC cluster. A kit contains rpms for the software as well as rpms for metadata and configuration files. To install Nagios and Cacti onto the Red Hat HPC cluster use the following commands:

    # yum install ocs-kit-cacti
    # /opt/kusu/sbin/install-kit-cacti
    # yum install ocs-kit-nagios
    # /opt/kusu/sbin/install-kit-nagios

The yum command above downloads the kit from Red Hat Network. Included in the kit is an installation script that adds the kit to the Red Hat HPC cluster repository and rebuilds the cluster repository. Every kit that is downloaded from Red Hat Network has a corresponding script used to install the kit into the cluster repository.

Verifying the Red Hat HPC install

Once the installer node is successfully configured, the next step is to verify that all software components are installed and working correctly. The following steps can be used to verify the Red Hat HPC install:

1. Start the web browser. The cluster homepage should display.
2. Check for any hardware issues by using the dmesg command.
3. Check all network interfaces to see if they are configured and up.
   a. # ifconfig | more
4. Verify the routing table is correct.
   a. # route
   b. Make sure the following system services are running:

      Service      Status command
      Web Server   service httpd status
      DHCP         service dhcpd status
      DNS          service named status
      Xinetd       service xinetd status
      MySQL        service mysqld status
      NFS          service nfs status
      AutoFS       service autofs status

5. Run some basic Red Hat HPC commands:

   a. List the installed repositories
      i. # repoman -l
   b. List the installed kits
      i. # kitops -l
   c. Run the Node Group Editor
      i. # ngedit
   d. Run the Add Host tool
      i. # addhost
   e. Check that Cacti is installed
      i. From the web browser enter the following URL: http://localhost/cacti
      ii. Log in to Cacti with username: admin, password: admin
   f. Check that Nagios is installed
      i. From the web browser enter the following URL: http://localhost/nagios
      ii. Log in to Nagios with username: admin, password: admin

Adding Nodes to the Cluster

Adding nodes to a Red Hat HPC cluster is accomplished by running the addhost tool. Addhost listens on a network interface for nodes that are PXE booting and adds them to a specified node group. Node groups are templates that define common characteristics such as network, partitioning, operating system and kits for all nodes in a node group. To add nodes, open a terminal window or log in to the installer node as root:

1. # addhost
2. Select the node group for the new nodes. Normally compute nodes are added to compute-rhel.

3. Select the network interface to listen on for new PXE-booted nodes.
4. Indicate the rack number where the nodes are located.
5. Addhost will now wait for the nodes to boot.
6. Boot the nodes you want to add to the cluster.

7. When a node is successfully detected by addhost, a line will appear in the 'installing node status' window.
8. Exit addhost when Red Hat HPC has detected all nodes.

Managing Node Groups

Red Hat HPC cluster management is built around the concept of node groups. Node groups are a powerful template mechanism that allows the cluster administrator to define common shared characteristics among a group of nodes. Red Hat HPC ships with a default set of node groups for installer nodes, package-installed compute nodes, diskless compute nodes and imaged compute nodes. The default node groups can be modified, or new node groups can be created from the default node groups. All of the nodes in a node group share the following:

- Node name format
- Operating System repository
- Kernel parameters
- Kits and components
- Network configuration and available networks
- Additional rpm packages
- Custom scripts (for automated configuration of tools)
- Partitioning

A typical HPC cluster is created from a single installer node and many compute nodes. Normally compute nodes are exactly the same as each other with just a few exceptions, like the node name or other host-specific configuration files. A node group for compute nodes makes it easy to configure and manage 1 or 100 nodes all from the same node group.

The ngedit command is a text user interface (TUI) run by the cluster administrator to create, delete and modify node groups. The ngedit tool modifies cluster information in the Red Hat HPC database and also automatically calls other tools and plugins to perform actions or update configuration files. For example, modifying the set of packages associated with a node group in ngedit automatically calls cfm (Configuration File Manager) to synchronize all of the nodes in the node group, using yum to install the new packages, while modifying the partitioning on the node group notifies the administrator that a re-install must be performed on all nodes in the node group in order to change their partitioning. The Red Hat HPC database keeps track of the node group state, so several changes can be made to a node group at once and the physical nodes in the group can be updated immediately or at a future time and date using the cfmsync command.

Adding RPM Packages in RHEL to Node Groups

Open a terminal and run the node group editor as root:

    # ngedit

Select the compute-rhel node group and move through the Text User Interface screens by pressing F8 or by choosing Next on the screen. Stop at the Optional Packages screen.

Additional rpm packages are added by selecting the package in the tree list. Pressing the space bar expands or contracts the list to display all of the available packages. By default packages are sorted alphabetically. To sort the list by Red Hat groups instead, choose Toggle View. Select the additional packages using the space bar; when a package is selected an asterisk will display beside the package name. Package dependencies are handled automatically by yum, so if a selected package requires other packages they will be included when the package is installed on the cluster nodes. Ngedit will automatically call cfm to synchronize the nodes and install new packages, but will not automatically remove packages from nodes in the cluster (this is by design). If required, pdsh and rpm can be used to completely remove packages from the rpm database on each node in the cluster.

Adding RPM Packages not in RHEL to Node Groups

Red Hat HPC maintains a repository containing all of the rpm packages that ship with Red Hat Enterprise Linux. For most customers this repository is sufficient. Rpm packages that are not in Red Hat Enterprise Linux can also be added to a Red Hat HPC repository by placing the rpms into the appropriate contrib directory under /depot. For example:

1. Start with the rpms that are not in Red Hat Enterprise Linux or in a Red Hat HPC kit.
2. Create the appropriate subdirectories in /depot/contrib and copy the rpm into place:

    # cd /depot
    # mkdir -p contrib/rhel/5/x86_64
    # cp foo.rpm /depot/contrib/rhel/5/x86_64/foo.rpm

3. Rebuild the Red Hat HPC repository with repoman:

    # repoman -u -r rhel5_x86_64

4. It will take some time to rebuild the repository and associated images.
5. Run ngedit and navigate to the Optional Packages screen.
6. Select the new package by navigating within the package tree and using the space bar to select.
7. Continue through the ngedit screens and either allow ngedit to synchronize the nodes immediately or perform the node synchronization manually with cfmsync -p at a later time.

[Figure: selecting an rpm that is not included in Red Hat Enterprise Linux]

The contrib directory may not exist in /depot. If it does not exist, create it. Contributions can be added to more than one Red Hat HPC repository. The directory structure is as follows:

    /depot/contrib/<os name>/<version>/<architecture>

For example, adding contributions to a Fedora Core 6 on x86 repository requires the following directory structure in /depot/contrib:

    /depot/contrib/fedora/6/i386
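The contrib layout described above can be sketched as a pair of small shell helpers. This is an illustration only: the function names contrib_dir and stage_rpm and the DEPOT_ROOT variable are hypothetical and not part of Red Hat HPC. DEPOT_ROOT defaults to /depot, as in the text, and can be overridden to try the helpers in a scratch directory.

```shell
#!/bin/sh
# Hypothetical helpers; DEPOT_ROOT defaults to /depot as described in the text.
DEPOT_ROOT="${DEPOT_ROOT:-/depot}"

# Print the contrib directory for an OS name, version and architecture.
contrib_dir() {
    printf '%s/contrib/%s/%s/%s\n' "$DEPOT_ROOT" "$1" "$2" "$3"
}

# Copy one rpm into the matching contrib directory, creating it if needed.
# Arguments: rpm file, OS name, version, architecture.
stage_rpm() {
    rpm_file=$1
    dest=$(contrib_dir "$2" "$3" "$4")
    mkdir -p "$dest" && cp "$rpm_file" "$dest/"
}
```

For example, "stage_rpm foo.rpm rhel 5 x86_64" places the file in /depot/contrib/rhel/5/x86_64. The repository still has to be rebuilt afterwards with repoman -u, as in step 3.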

Adding Fedora Repository to the Installer Node

Adding other Red Hat based Operating Systems such as Fedora to Red Hat HPC is straightforward, but does require a few steps. In order to add Fedora to the installer node you will need a copy of the Fedora media or a Fedora ISO. Once you have the media or ISO, add Fedora to Red Hat HPC using the kitops command. Type the following to add Fedora, mounted on /media/CDROM, to Red Hat HPC:

    # kitops -a -m /media/CDROM/ --kit fedora

Adding a kit to Red Hat HPC makes the software available for use in a repository, so the next step is to create a Fedora repository:

    # repoman -n -r Fedora-6-i386

Now add the required Operating System kit to the repository:

    # repoman -a -r Fedora-6-i386 --kit fedora

Add the Red Hat HPC base kit to the repository. The base kit contains all of the tools required by Red Hat HPC for managing the cluster.

    # repoman -a -r Fedora-6-i386 --kit base

The Operating System and base kits are always required in a repository. At this point you can add more kits to the repository if needed. One final step must be performed to rebuild the repository with the new Operating System and base kit before it can be used to install nodes:

    # repoman -u -r Fedora-6-i386

You should now have a new repository added to your cluster. You can view the available repositories with the following command:

    # repoman -l

Associating a Repository to Node Groups

A single Red Hat HPC installer node can contain more than one Red Hat Operating System repository. Adding a new Operating System such as Fedora to Red Hat HPC involves several steps:

1. Add the Fedora Operating System CDs/DVD/ISO as a Red Hat HPC kit using kitops.
2. Create a new repository for Fedora using repoman -n.
3. Add the Fedora kit to the new repository with repoman -a.
4. Add the Red Hat HPC base kit to the repository with repoman -a.
5. Update the repository with repoman -u. This assembles all of the kits into a complete repository.

Once steps 1-5 are completed, the new repository can be added to node groups with the ngedit tool. Run ngedit from a terminal and create a copy of an existing node group. In our example we will copy the compute-rhel node group.
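Steps 1-5 follow the same pattern for any Operating System kit, so they can be collected into one place. The sketch below only prints the commands rather than running them, so the sequence can be reviewed first. The function name print_repo_steps is hypothetical, and the --kit spelling for repoman is assumed to match the one kitops uses.

```shell
#!/bin/sh
# Hypothetical helper: print, without running, the command sequence for
# adding an OS kit as a new repository.
# Arguments: media mount point, kit name, repository name.
print_repo_steps() {
    media=$1
    kit=$2
    repo=$3
    echo "kitops -a -m $media --kit $kit"    # step 1: add the OS media as a kit
    echo "repoman -n -r $repo"               # step 2: create the repository
    echo "repoman -a -r $repo --kit $kit"    # step 3: add the OS kit
    echo "repoman -a -r $repo --kit base"    # step 4: add the base kit
    echo "repoman -u -r $repo"               # step 5: rebuild the repository
}
```

Calling "print_repo_steps /media/CDROM/ fedora Fedora-6-i386" prints the five commands used in the Fedora example, in order.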

Edit the newly created node group, then on the Repository screen change the repository to Fedora (or your snapshot repository).

Changing the repository on this screen associates the new node group with the new repository. Continue moving through the rest of the ngedit screens, selecting or modifying settings as needed. Upon exit, ngedit will automatically update the database.

Adding Kit Components to Node Groups

Adding kit components to nodes in a node group is very similar to adding additional rpm packages. Open a terminal and start the ngedit tool, choose the compute-rhel node group, press F8 or choose Next, and proceed to the Components screen.

Each Red Hat HPC kit installs an application or a set of applications. The kit also contains components, which are meta-rpm packages designed for installing and configuring applications onto a cluster. Choosing the appropriate components makes it easy to configure all nodes in a node group. For example, the cacti kit contains two components, component-cacti and component-cacti-monitored-node. The component-cacti component installs and configures Cacti and sets up the web pages and the connection to the database. It is normally installed on the cluster installer node or on any other node (or set of nodes) designated as the management node. The other component, component-cacti-monitored-node, contains the Cacti agent code that runs on compute nodes in the cluster.

Most Red Hat HPC kits come configured with automatic node group association and component selection, which makes the process of adding kits to node groups much easier than manually selecting them in ngedit. For example, the Platform Lava kit automatically associates the Lava master with the installer node group and the Lava compute nodes with the compute-rhel node group.

Synchronizing Files in the Cluster

HPC clusters are built from many individual compute nodes, and all of these nodes must have copies of common files such as /etc/passwd, /etc/shadow, /etc/group, and others. Red Hat HPC contains a file synchronization service called cfm (Configuration File Manager). Cfm runs on each compute node in the cluster. When new files are available on the installer node, a message is sent to all of the nodes notifying them that files are available. Each compute node then connects to the installer node and copies the new files using the httpd daemon on the installer node.

All of the files to be synchronized by cfm are located in the directory tree /etc/cfm/<nodegroup>. Cfm organizes file synchronization trees by node group. A directory exists for each node group under /etc/cfm, and below the node group name is a tree that replicates the file structure of the machines in the node group, for example:

[Screenshot: listing of the /etc/cfm directory tree]

In the screenshot above, the /etc/cfm directory contains several node group directories such as compute-diskless and compute-rhel. In each of those directories is a directory tree where the /etc/cfm/<nodegroup> directory represents the root of the tree. The /etc/cfm/compute-rhel/etc directory contains several files or symbolic links to system files. These system files will be synchronized across all of the nodes in the node group automatically by cfm. Creating symbolic links for the files in cfm allows the compute nodes to be automatically synchronized with system files on the installer node.

To add a file to cfm, create it in the appropriate directory under the node group's tree. You must create all of the directories and subdirectories for the file, then place the file in the correct location. Existing files can also have a <filename>.append file. The contents of a <filename>.append file are automatically appended to the existing <filename> file on all nodes in the node group.

To notify all of the nodes in all node groups, or the nodes in a single node group, use the cfmsync command. For example:

    # cfmsync -f -n compute-rhel

synchronizes all files in the compute-rhel node group, and

    # cfmsync -f

synchronizes all files in all node groups.
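The mapping from a file on the nodes to its place in the cfm tree can be sketched as follows. This is an illustration under stated assumptions: the function names cfm_path and cfm_link and the CFM_ROOT variable are hypothetical and not part of Red Hat HPC. CFM_ROOT defaults to /etc/cfm, as in the text, and can be pointed at a scratch directory for experimenting.

```shell
#!/bin/sh
# Hypothetical helpers; CFM_ROOT defaults to /etc/cfm as described in the text.
CFM_ROOT="${CFM_ROOT:-/etc/cfm}"

# Map a node group and an absolute path on the nodes to its location in
# the cfm tree, e.g. compute-rhel + /etc/hosts.
cfm_path() {
    printf '%s/%s%s\n' "$CFM_ROOT" "$1" "$2"
}

# Create a symbolic link to an installer-node file inside a node group's
# tree, creating the parent directories first.
cfm_link() {
    target=$(cfm_path "$1" "$2")
    mkdir -p "$(dirname "$target")" && ln -sf "$2" "$target"
}
```

For example, "cfm_link compute-rhel /etc/hosts" creates /etc/cfm/compute-rhel/etc/hosts as a link to the installer node's /etc/hosts. The nodes are only notified once cfmsync is run for that node group.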

For more information on cfmsync, view the man pages.

Updating the Installer Node and the Compute Node Repository

Red Hat HPC manages updates to the installer node differently from all other nodes in the cluster. The rpm packages and updates in the Operating System repository for all nodes provisioned by the installer (including compute nodes and diskless nodes) are managed independently from updates to the installer node. To update the installer node use the following command:

    # yum update

The yum tool will download all of the required updates for the operating system and install them on the installer node. Since the installer node and the compute nodes are updated separately, you can update the installer node alone and decide independently whether to update the compute nodes.

To update the compute nodes in a Red Hat HPC cluster the following command must be used:

    # repopatch -r rhel5_x86_64

The repopatch tool will download all of the required updates for the operating system and install them into the repository for the compute nodes. Repopatch may display an error if it is not properly configured, for example:

    # repopatch -r rhel5_x86_64
    Getting updates for rhel-5-x86_64. This may take awhile
    Unable to get updates. Reason: Please configure /opt/kusu/etc/updates.conf

Edit the /opt/kusu/etc/updates.conf file, adding your username and password for Red Hat Network to the [rhel] section of the file, for example:

    [fedora]
    url /
    [rhel]
    username
    password-
    url https://rhn.redhat.com/XMLRPC
    yumrhn https://rhn.redhat.com/rpc/api

After the /opt/kusu/etc/updates.conf file is configured, repopatch should download all of the updates from Red Hat Network and create an update kit, which is then associated with the rhel-5-x86_64 repository. Repopatch should automatically associate the update kit with the correct repository. You can view the list of update kit components from ngedit on the Components screen and list the available update kits with the kitops command, for example:

[Screenshot: kitops listing of the available update kits]

Note: Remember that yum is used to update the installer node directly from Red Hat Network or other yum repositories. The repopatch command updates the compute nodes or other nodes provisioned by the installer node.
