Cluster Management - University Of New Mexico

Cluster Management
James E. Prewett
October 8, 2008

Outline
- Common Management Tools
  - OSCAR
  - ROCKS
  - Other Popular Cluster Management tools
- Software Management/Change Control
  - Cfengine
  - Getting Started with Cfengine
- Parallel Shell Tools / Basic Cluster Scripting
  - PDSH
  - Dancer's DSH
  - Clusterit
  - C3 tools (cexec)
  - Basic Cluster Scripting
- Backup Management
- Logging / Automated Log Analysis
  - Regular Expression Review
  - Regular Expression Meta-characters
  - Regular Expression Meta-characters (cont.)
  - SEC
  - Logsurfer+
- Security plans/procedures, Risk Analysis
  - Network Topologies and Packet Filtering
- Linux Tricks
  - Cluster-specific issues
  - Checking Your Work
- Regression Testing
- System / Node / Software Change Management Logs
- How to know when to upgrade, trade-offs
- Monitoring tools

Common Management Tools: OSCAR

OSCAR Information
Vital Statistics:
  Version: 5.1
  Date: June 23, 2008
  Distribution Formats: tar.gz
  URL: http://oscar.openclustergroup.org/

OSCAR cluster distribution features:
- Supports x86, x86_64 processors
- Supports Ethernet networks
- Supports Infiniband networks
- Graphical Installation and Management tools... if you like that sort of thing

OSCAR (key) Cluster Packages
What's in the box?
- Torque Resource Manager
- Maui Scheduler
- c3
- LAM/MPI
- MPICH
- OpenMPI
- OPIUM (OSCAR User Management software)
- pFilter (Packet filtering)
- PVM
- System Imager Suite (SIS)
- Switcher Environment Switcher

OSCAR Supported Linux Distributions
- RedHat Enterprise Linux 4
- RedHat Enterprise Linux 5
- Fedora Core 7
- Fedora Core 8
- Yellow Dog Linux 5.0
- OpenSUSE Linux 10.2 (x86_64 Only!)
- "Clones of supported distributions, especially open source rebuilds of Red Hat Enterprise Linux such as CentOS and Scientific Linux, should work but are not officially tested."

OSCAR Installation
- Install a supported Linux on the Server Node
  Leave at least 4GB free in each of / and /var!
  The easy way is to make 1 big partition for / !
- Create repositories for SystemInstaller
      # mkdir /tftpboot
      # mkdir /tftpboot/oscar
      # mkdir /tftpboot/distro
      # mkdir /tftpboot/distro/OS-version-arch
- Unpack the oscar-repo-common-rpms and the oscar-repo-DISTRO-VER-ARCH tarballs into /tftpboot/oscar/
- Copy your RPMs into the /tftpboot/distro/OS-version-arch directory

OSCAR Installation (cont.)
- Install yum unless your OS already has it
- Install yume:
      # yum install …
- Install the oscar-base RPM:
      # yume --nogpgcheck [1] --repo /tftpboot/oscar/common-rpms install oscar-base

[1] This is not in the documentation, but I found that the packages were not signed, causing yume to barf unless you passed it the --nogpgcheck option. YMMV

OSCAR Server Node Network Configuration
- Give your host a hostname! The default of "localhost" or "localhost.localdomain" will *not* work.
- Configure the "Public" network interface as per the requirements of your local network. This is the network that will connect to the Internet (or the lab network), so configure it appropriately.
- Configure the "Private" network interface using a "Private" IP address.
  The IANA has reserved the following three blocks for private internets:
  - 10.0.0.0 – 10.255.255.255 (10/8 CIDR block)
  - 172.16.0.0 – 172.31.255.255 (172.16/12 CIDR block)
  - 192.168.0.0 – 192.168.255.255 (192.168/16 CIDR block)
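When scripting the server setup it can be handy to check that an address chosen for the "Private" interface really falls inside one of the three reserved blocks listed above. The following is a minimal sketch, not part of OSCAR; the function name is_private_ipv4 is made up for illustration, and it assumes its argument is already a well-formed dotted-quad IPv4 address.

```shell
#!/bin/sh
# is_private_ipv4 ADDRESS   (hypothetical helper, not an OSCAR tool)
# Succeeds if ADDRESS lies in 10/8, 172.16/12 or 192.168/16.
# Assumes ADDRESS is a well-formed dotted quad; it is not validated further.
is_private_ipv4() {
    case "$1" in
        10.*.*.*)    return 0 ;;
        192.168.*.*) return 0 ;;
        172.*.*.*)
            # For the 172.16/12 block the second octet must be 16..31.
            second=${1#172.}
            second=${second%%.*}
            case "$second" in
                ''|*[!0-9]*) return 1 ;;
            esac
            [ "$second" -ge 16 ] && [ "$second" -le 31 ]
            return $?
            ;;
    esac
    return 1
}

# Example use:
#   is_private_ipv4 192.168.1.1 && echo "ok for the private interface"
```

The 172.16/12 block is the one that is easy to get wrong by eye, because only second octets 16 through 31 belong to it.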

OSCAR Cluster Installation
Once the Server is installed and configured, start the installer!
    # cd /opt/oscar
    # ./install_cluster device
This will:
- Install all required RPMs
- update the /etc/hosts file with OSCAR aliases
- update the /etc/exports file
- update system initialization scripts (/etc/rc.d/init.d/)
- restart any affected services
Then the installer GUI will be launched.

The OSCAR Installation Wizard:
- Select your packages
- Configure the packages
- Install the Server packages
- Build an image for the compute nodes
- Define the compute nodes
- Configure networking
- Complete the setup
- Test the cluster!

Build Client Image
- Choose an image name
- Choose a package file
- Choose a Target Distribution
- Specify package repositories
- Specify Disk Partition file
- Pick IP assignment method
- Pick Post Install action

Define OSCAR Clients (Compute Nodes)
- Pick the image to install
- Specify the domain name
- Specify the base hostname
- Specify the number of hosts
- Specify first number to append to the base hostname
- Specify the "padding"
- Specify the starting IP
- Specify the subnet mask
- Specify the default gateway
NOTE: You may only define 254 clients at a time!

Setup OSCAR Networking
- Collect MAC Addresses
- Optionally tweak SI installation mode
- Build Boot CD
  OR
- Setup Network Boot
- Optionally choose to Use Your Own Kernel (UYOK)

Finishing Up!
- Go to "Monitor Cluster Deployment" to monitor the progress of the installation.
- Reboot the compute nodes.
- Go to "Complete Cluster Setup"
- Run the OSCAR Test suite (unless you're feeling brave!)
- Enjoy your new cluster!

Really, It's *that* simple!
- OSCAR comes with quite a few "standard" cluster packages.
- OSCAR uses SystemImager
- SystemImager is Good(TM)
- RPM packages may be added by placing them in the appropriate directory, rebuilding the image, and rebooting the nodes.

Common Management Tools: ROCKS

ROCKS Information
Vital Statistics:
  Version: 5.0
  Date: November 12, 2006
  New development: September 2008
  Distribution Formats: tar.gz
  URL: http://oscar.openclustergroup.org/

ROCKS cluster distribution features:
- Supports x86, x86_64 processors
- Supports Ethernet networks
- Supports Specialized networks and components (Myrinet, Infiniband, nVidia GPU)

Beginning the ROCKS Installation
For the Installation, you will need:
- Kernel/Boot Roll CD
- Base Roll CD
- Web Server Roll CD
- OS Roll CD - Disk 1
- OS Roll CD - Disk 2
  OR
  - ALL Red Hat Enterprise Linux 5 update CDs
  - ALL CentOS 5 update 1 CDs
  - ALL Scientific Linux 5 update 1 CDs
Then:
- Boot the "Kernel/Boot Roll CD" on the server
- You should see: [screenshot of the boot screen]
- Type "front-end" to begin the installation

Common Management Tools: Other Popular Cluster Management tools

Other Popular Cluster Management tools
- xCAT
- openMosix (RIP March 1, 2008)
- LinuxPMI: Continuation of the 2.6 branch of openMosix (*NOT* Single System Image)
- OpenSSI
- Scyld
- IBM's CSM
- Also notable: Sandia's CIT [2]

[2] It may not be the most popular, but it is well designed and pretty darn cool!

Software Management/Change Control

What is "Change Control"?
- Automatically manage configuration files
- Take care of maintenance tasks like running backups
- Manage things like "cron jobs" in a centralized place.
Automate and reduce the headache of administration!

Software Management/Change Control: Cfengine

Cfengine Information
Vital Statistics:
  Version: 2.2.8
  Date: August 5, 2008
  Distribution Formats: tar.gz
  URL: http://www.cfengine.org/

What is Cfengine good for?
- Ensure proper versions of software are installed
- Template-based creation of configuration files
- Verify permissions & ownership of files and directories
- Standardize properties (netmask, domain name, etc.) of hosts
- Ensure checksums of files
- Check disk capacity

Installing Cfengine
- tar zxf cfengine-2.2.8.tar.gz
- cd cfengine-2.2.8
- ./configure
- make
- make install
- test: /usr/local/sbin/cfagent -v

Software Management/Change Control: Getting Started with Cfengine

Getting Started with Cfengine
In order to get started with Cfengine, we will need 3 things:
- A crontab entry to run cfexecd periodically [3]
      30 * * * * /usr/local/sbin/cfexecd -F
- An update.conf file
- A cfagent.conf file

[3] Cfengine can also be run as a daemon.

update.conf: control section
    ########################################
    # Distribute the configuration
    ########################################
    control:
        # distribute the files, then clean up our mess
        workdir        = ( /var/cfengine )
        actionsequence = ( copy tidy )
        policyhost     = ( cfengine.hpc.unm.edu )   # master host
        domain         = ( hpc.unm.edu )
        master_cfinput = ( /cfengine/inputs )
        sysadmin       = ( root@hpc.unm.edu )

cfagent.conf: control section
    control:
        domain   = ( hpc.unm.edu )
        netmask  = ( 255.255.252.0 )
        sysadm   = ( root@hpc.unm.edu )
        timezone = ( MST )
        actionsequence = (
            mountall      # mount filesystems in /etc/fstab
            netconfig     # check the network interface
            resolve       # check the DNS resolver
            tidy          # "tidy" Cfengine logfiles
            files         # check file permissions
            directories   # ensure directories exist
            processes )   # check processes

cfagent.conf: files and directories section
    # check important files
    files:
        /etc/passwd                       mode=644 owner=root action=fixall
        /etc/shadow                       mode=600 owner=root action=fixall
        /var/spool/torque/pbs_environment mode=644 owner=root action=fixall
        /var/spool/torque/server_name     mode=644 owner=root action=fixall

    # check that TORQUE directories exist
    directories:
        /var/spool/torque/          owner=root mode=755 action=fixall
        /var/spool/torque/aux/      owner=root mode=755 action=fixall
        /var/spool/torque/mom_logs/ owner=root mode=755 action=fixall
        (etc.)

cfagent.conf: processes section
    # Here we define processes we want to ensure are running
    # We could also define ones we wanted to kill or restart
    # Strings are regular expressions used to match the name
    # of the process
    processes:
        "pbs_server" matches=1   # ensure PBS is running
        "maui"       matches=1   # ensure Maui is running

Parallel Shell Tools / Basic Cluster Scripting

Popular Parallel Shells
- PDSH
- Dancer's DSH
- Clusterit
- C3 tools

Parallel Shell Tools / Basic Cluster Scripting: PDSH

PDSH Information
Vital Statistics:
  Version: 2.16
  Date: April 3, 2008
  "Parallelism": "sliding window" parallel algorithm
  Language: C
  Distribution Formats: RPM, tar.gz
  URL: https://computing.llnl.gov/linux/pdsh.html

PDSH Remote command modules
These are ways of accessing the remote nodes. Tune as per your security/performance requirements!
- RSH
- SSH
- Kerberos
- MRSH, QSH, MQSH, XCPU (whatever those are ;)

PDSH Node Specification
- Specify a list of hosts:
      pdsh -w node01,node05,node17 -- command
- Specify a range of hosts:
      pdsh -w node[01-100] -- command
- Specify a range of hosts, excluding a set in the middle:
      pdsh -w node[01-100] -x node[20-30] -- command

PDSH Node Specification (cont.)
- Specify the nodes in a netgroup "netgroup":
      pdsh -g netgroup -- command
- Exclude nodes in the netgroup "netgroup":
      pdsh -X netgroup -- command
- Execute a command on all nodes in a file:
      export WCOLL=/path/to/node-file
      pdsh -- command

Parallel Shell Tools / Basic Cluster Scripting: Dancer's DSH

Dancer's DSH Information
Vital Statistics:
  Version: 0.25.9
  Date: August 15, 2007
  "Parallelism": "Hierarchical invocation technique", "4 nodes accessing 4 nodes"
  Language: C
  Distribution Formats: DEB, .tar.gz
  URL: http://www.netfort.gr.jp/~dancer/software/dsh.html.en

Dancer's DSH Node Specification
- Use the global nodes file, /etc/dsh/machines.list:
      dsh -a -c -- command
- Use the list of nodes for "Rack 1" stored in $HOME/.dsh/group/rack1:
      dsh -g rack1 -c -- command

Parallel Shell Tools / Basic Cluster Scripting: Clusterit

Clusterit Information
Vital Statistics:
  Version: 2.5
  Date: August 15, 2007
  "Parallelism": N-way Fanout
  Language: C
  Distribution Formats: .tar.gz
  URL: http://clusterit.sourceforge.net/

Clusterit Node Specification (Groups and Lumps)
- Groups are sets of nodes:
      GROUP:compute
      node01
      node02
- Lumps are sets of groups:
      LUMP:cluster
      compute
      storage
      admin

Clusterit Node Specification
- Specify a list of hosts:
      dsh -w node01,node04,node23 -- command
- Exclude a list of hosts:
      dsh -x node03,node09,node17 -- command
- Specify a group of hosts:
      export CLUSTER=/path/to/nodefile
      dsh -g compute -- command
- Specify a lump of hosts:
      export CLUSTER=/path/to/nodefile
      dsh -g cluster -- command

Parallel Shell Tools / Basic Cluster Scripting: C3 tools (cexec)

C3 Information
Vital Statistics:
  Version: 4.0.1
  Date: July 15, 2003
  "Parallelism": "Sub-Cluster Staging"
  Language: Python
  Distribution Formats: RPM, …
  URL: …age.shtml

C3 Cluster Node Specification file format: /etc/c3.conf
- Specify a cluster with a head node with an external interface named "external-name" and an internal interface named "node0" and 64 compute nodes named node01-node64.
- /etc/c3.conf contents:
      cluster my-cluster {
          external-name:node0   #head node
          node[1-64]            #compute nodes
      }

C3 Node Specification
- Specify the default cluster:
      cexec command
- Specify a subset of nodes in the default cluster:
      cexec :6-53 command
- Specify a list of clusters:
      cexec cluster1: cluster2: command

Parallel Shell Tools / Basic Cluster Scripting: Basic Cluster Scripting

Basic Cluster Scripting
grep is your (best) friend
- Find the CPU count on all of the nodes:
      pdsh "cat /proc/cpuinfo | grep processor | wc -l"
- Find nodes with the wrong image version:
      export VER="1.2.3"
      pdsh "cat /etc/image_version | grep \"^$VER\$\" || hostname"

More Basic Cluster Scripting
awk is a pretty good friend too!
- Find nodes where the load is greater than 2:
      pdsh uptime | awk '{if($11 > 2.0){print}}'
- Find bad GM counts on all nodes:
      pdsh "/opt/mx/bin/mx_counters | awk '/bad/ {if (\$2 > 0) {print;}}'"
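Picking the load out of uptime by field number depends on how many fields come before it, and that changes with the uptime format (days, hours, user count). The following is a rough sketch of an alternative that keys on the "load average:" label instead. The function name high_load is made up for illustration, and it assumes input lines in the "node: output" form that pdsh prints, with "." as the decimal separator.

```shell
#!/bin/sh
# high_load THRESHOLD   (hypothetical helper)
# Reads "node: <uptime output>" lines on stdin and prints "node load" for
# every node whose 1-minute load average is above THRESHOLD.
high_load() {
    awk -v max="$1" '
        /load average/ {
            node = $1
            sub(/:$/, "", node)                    # "node01:" -> "node01"
            line = $0
            sub(/.*load averages?: */, "", line)   # keep "0.15, 0.10, 0.05"
            split(line, loads, /, */)
            if (loads[1] + 0 > max + 0) print node, loads[1]
        }'
}

# Example use (assumes WCOLL or -w is set up as shown earlier):
#   pdsh uptime | high_load 2.0
```

Adding 0 to both sides forces a numeric comparison, so a load of 10.5 is not compared as a string against 2.0.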

Backup Management

Backup anything you can't recreate
Backup anything you can recreate but can't recreate quickly
- Use backup anytime it would take longer to rebuild and reconfigure than to restore.
- "Longer" may be in terms of staff time or elapsed time or both.
- Consider:
  - User directories (not scratch!)
  - Libraries and applications you've built on site
  - Tcl module files in /usr/share/modules/modulefiles/
  - System configuration files: DNS, DHCP, NIS, etc. (Should that be everything in /etc/?)
  - Node images
Thanks to Roy Heimbach for contributing this slide!

Logging / Automated Log Analysis

Logging/Automated Log Analysis Tools:
- SEC
- Logsurfer+
- splunk

What can we find in our logfiles?
What are we happily ignoring?
- Evidence of misconfigurations:
  e.g. "/var/log/lastlog does not exist"
- Security violations
  e.g. Illegal users
- Hardware/Software errors
  e.g. Disk failures

Logging / Automated Log Analysis: Regular Expression Review

Regular Expression Review
Is that line noise?
This is a quick review of Perl Regular Expressions.
- Simple 'as-is' text string matching:
  - "cat" or "dog"
- Meta-characters:
  - { } [ ] ( ) ^ $ . | * + ? \

Regular Expression Meta-characters
- .  matches any single character
- *  match the previous thing 0 or more times
- +  match the previous thing 1 or more times
- ?  match the previous thing 1 or 0 times
- ^  matches the beginning of the line
- $  matches the end of the line
- \  'escapes' the next character
- [] specifies a set or range of characters:
  e.g. [a-zA-Z0-9] would match all alphanumeric characters

Regular Expression Meta-characters (cont.)
- {n}   match the previous thing exactly "n" times
- {n,}  match the previous thing at least "n" times
- {n,m} match the previous thing at least "n" times, but not more than "m" times
- ()    specifies groups of things or things to "save"
  the first group will be saved in $1, the second in $2, etc.
- |     specifies "OR" inside of a group
  e.g. (cat|dog) would match either "cat" or "dog"
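A quick way to get a feel for these meta-characters is to try them from the shell. The sketch below uses grep -E, which speaks POSIX extended regular expressions rather than Perl's, so Perl-only shorthands such as \d and \S are deliberately avoided; the helper name matches is made up for illustration.

```shell
#!/bin/sh
# matches PATTERN STRING   (hypothetical helper)
# Succeeds if STRING matches the extended regular expression PATTERN.
matches() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

# A few experiments with the meta-characters reviewed above:
#   matches '^(cat|dog)$'       dog      # succeeds: "|" means OR inside a group
#   matches '^node[0-9]{2}$'    node07   # succeeds: exactly two digits
#   matches '^node[0-9]{2}$'    node7    # fails: only one digit
#   matches '^colou?r$'         color    # succeeds: "?" makes the "u" optional
#   matches '^no+de$'           nooode   # succeeds: "+" means one or more "o"
```

The anchors ^ and $ matter here: without them, grep reports a match if the pattern occurs anywhere in the line.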

Logging / Automated Log Analysis: SEC

SEC Information
Vital Statistics:
  Version: 2.4.2
  Date: February 1, 2008
  Language: Perl
  Distribution Formats: .tar.gz, DEB, RPM, FreeBSD and OpenBSD ports, Gentoo portage
  URL: http://www.estpak.ee/~risto/sec/

Quick intro to SEC: SEC Components
- Messages
  Single lines of text in a logfile
- Rules
  Do something in response to an incoming Message
- Contexts
  Passive structures to store Messages

Default SEC Rule
Match all messages and print them
    # Print all messages
    # note $0 is the entire message
    type=single
    ptype=regexp
    pattern=.
    desc=unmatched message: $0
    action=logonly
This, or something like it, should be the last rule in your ruleset

SEC Filtering Rule
Ignore messages we're expecting
    # This machine has 4 processors
    # Ignore messages reporting what we expect!
    type=single
    ptype=RegExp
    pattern=kernel: Total of 4 processors activated
    desc=correct processors initialized
    action=none

SEC Responding to messages
Sound the alert!
    # This machine has 4 processors
    # Report any number other than that!
    # report_problem.sh is a script we wrote to report this
    # to our admins
    type=single
    ptype=RegExp
    pattern=(\S+) kernel: Total of (\d+) processors activated
    desc=incorrect processor count: $2 on host: $1
    action=shellcmd report_problem.sh $1 $2

SEC Contexts and Correlation
Finding, Blocking, and Reporting on "SSH scanners"
    # Store "Invalid user" messages from this host unless we're blocking it
    type=single
    continue=TakeNext
    desc=invalid login from host $2
    ptype=regexp
    pattern=\S+\s+\S+\s+\S+\s+\S+\s+sshd\[\d+\]: Invalid user (\S+) from (\S+)
    context=(!(block_bad_ssh-$2))
    action=add bad_ssh-$2

    # Block the host once we've gotten 3 "Invalid user" messages within the window
    type=SingleWithThreshold
    desc=invalid login from host $2
    ptype=regexp
    pattern=\S+\s+\S+\s+\S+\s+\S+\s+sshd\[\d+\]: Invalid user (\S+) from (\S+)
    thresh=3
    action=create block_bad_ssh-$2; \
           shellcmd iptables -A INPUT --source $2 -j REJECT ; \
           report bad_ssh-$2 /usr/adm/bin/report-bad-host.pl $2 ; \
           delete bad_ssh-$2
    window=10000000
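To see what the threshold rule is counting, the same tally can be done after the fact on a saved logfile. This is only a rough offline sketch, not a replacement for SEC (it has no time window and takes no action); the function name ssh_scanners is made up for illustration, and it assumes the sshd message format matched by the rules above.

```shell
#!/bin/sh
# ssh_scanners THRESHOLD   (hypothetical helper)
# Reads syslog lines on stdin and prints "host count" for every source host
# with at least THRESHOLD "Invalid user" messages from sshd.
ssh_scanners() {
    awk -v thresh="$1" '
        /sshd\[[0-9]+\]: Invalid user / {
            # the source address is the word right after "from"
            for (i = 1; i < NF; i++)
                if ($i == "from") { count[$(i + 1)]++; break }
        }
        END {
            for (host in count)
                if (count[host] >= thresh) print host, count[host]
        }' | sort
}

# Example use (the log path is an assumption and varies by distribution):
#   ssh_scanners 3 < /var/log/secure
```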

Logging / Automated Log Analysis: Logsurfer+

Logsurfer+ Information
Vital Statistics:
  Version: 1.7
  Date: December 2006
  Language: C
  Distribution Formats: .tar.gz
  URL: http://www.crypt.gen.nz/logsurfer/

Security plans/procedures, Risk Analysis

System and Cluster Security!
Watch Out!
- Identify the Problem
- Security Strategies
- Dealing with Weaknesses
- Cluster Network Topologies
- Cluster Specific Issues
- Linux Tricks
- Checking Your Work

Define the Enemy
- Data thieves
- Resource thieves
- Hackers there for various reasons
- Curious script kiddies
- Malicious script kiddies

Attack Vectors
- Remote Attacks:
  Network Services allow access to the machine
- Local Attacks:
  Insecure Privileged Binaries allow Privilege escalation

Security Strategies
... besides cutting the wire
- Secure Communication
- Hunt and kill unneeded services
- Application configuration
- Protective Mechanisms

Identifying Weaknesses
The key here is to strike a balance between security and usability
- Identify and categorize running services
  Are they Really needed?
- Identify sensitive information
  Passwords, Data, etc.
- Identify protective mechanisms
  TCPwrappers, iptables, firewall, etc.

Limiting Weaknesses
- Local weaknesses:
  - Limit use of installed privileged binaries
  - Remove setuid/setgid bits
  - If you don't use it, get rid of it!
- Remote weaknesses:
  - Close unused ports
  - Limit access to ports
  - If you don't use it, get rid of it!

Finding services
They can't hide!
- inetd(8) and xinetd(8) configuration files
- chkconfig(8)
- init(8) scripts
- ps(1)
- lsof(8) -i
- nmap(1)

Killing Services
- kill(1)
- chkconfig(8)
- init(8) scripts
- inetd(8) and xinetd(8) configuration files
- chmod(1)

Common Cluster Services
- Login Service(s)
- File Transfer Service(s)
- File Service(s)
- Time Service
- Domain name service (DNS)
- Common Configuration Services
  - DHCP
  - NIS
    or
  - LDAP
  - etc.

Login Services
- rlogin, telnet, etc.
- SSH
  - Kerberized versions available
  - PKI (GSI) versions available

SSH Key Setup
    ssh-keygen -N "" -f /tmp/key   # if you want password-less access
    cp --force /tmp/key /root/.ssh/identity
    rm --force /tmp/key
    cat /tmp/key.pub >> /nfs/shared/authorized_keys
    pdsh "cp /nfs/shared/authorized_keys /root/.ssh/"

Secure File Transfer
- scp(1)
  - Encrypted connections
  - Kerberized versions available
  - Uses ssh(1)
- sftp(1)
  - "Similar" to ftp(1)
  - Encrypted connections
  - Kerberized versions available
  - Uses ssh(1)
  - Clumsy!

Secure X11 Connections
- Use ssh to "tunnel" X11 connections safely
- default ssh configuration files disable this
- To enable "X11 Forwarding":
  - In sshd_config add:
        X11Forwarding yes
  - In ssh_config add:
        ForwardAgent yes
        ForwardX11 yes

Using my admin tools from home...
SSH tunnels for the win!
- EVERYONE has used an X11 tunnel over SSH
- Have you ever forwarded something else?
- Run administration tools from "inside" the firewall, but still at home
- Forward arbitrary ports: Encrypted!
      ssh -v -L local-port:remote-machine:remote-port local-machine -l root
      ssh -v -L 1178:service1:1178 pq-admin.alliance.unm.edu -l root

Security plans/procedures, Risk Analysis: Network Topologies and Packet Filtering

Network Topologies and Packet Filtering
- Public Network Topology
  VS.
- Private Network Topology

Public Network Topology
The easy way...
- Simpler to set up
- Allows direct access to compute nodes
- Worse overall cluster security
- ALL nodes need packet filtering, security tweaks
- All nodes are potential targets
- Better network throughput

Private Network Topology
Might be worth the extra headache
- Better security for entire cluster
- Relaxed security on compute nodes
- Only login/admin nodes on public network
- Compute/storage nodes access outside network via NAT
- Difficult to allow outside access to compute nodes

Packet Filtering
- Stateless:
  Each packet is handled individually
  ipchains (OLD!!! NOBODY uses this anymore!)
- Stateful:
  Each packet is viewed as a part of a session
  iptables: Modern, *probably* in your kernel.
- You can filter based on:
  - Network interface
  - Protocol type
  - Source address and port
  - Destination address and port
  - Other parameters depending upon the protocol

Stateful Packet Filtering
- Keeps track of active connections
- Examines each packet based on its context
- Can provide a more usable system
- Controlled by iptables on Linux

Linux Tricks

Protecting a single machine with IPtables
We're not doing NAT
    iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    iptables -A INPUT -p tcp --destination-port ssh -j ACCEPT
    iptables -A INPUT -j REJECT

Protecting a network with IPtables
Hiding your cluster behind a NAT
    iptables -A INPUT -p tcp --destination-port ssh -j ACCEPT
    iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    iptables -A INPUT -i INTERNAL_INTERFACE -m state --state NEW -j ACCEPT
    iptables -A INPUT -j REJECT
    iptables -A FORWARD -j REJECT

/proc Protections
Turning on network stack security features
- Prevent address spoofing:
      echo 0 > /proc/sys/net/ipv4/conf/*/accept_source_route
      echo 1 > /proc/sys/net/ipv4/conf/*/rp_filter
      echo 1 > /proc/sys/net/ipv4/conf/*/log_martians
- Disable ICMP redirects
      echo 0 > /proc/sys/net/ipv4/conf/*/accept_redirects
- Turn off bootp packet relaying
      echo 0 > /proc/sys/net/ipv4/conf/*/bootp_relay
- Ignore ICMP bad error responses
      echo 1 > /proc/sys/net/ipv4/icmp_ignore_bogus_error_responses
- Enable syncookie protection
      echo 1 > /proc/sys/net/ipv4/tcp_syncookies
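In the per-interface settings the "*" stands for every interface directory under conf/. A single redirect whose target is a wildcard matching several files does not write to each of them (bash reports an ambiguous redirect), so in a script the usual approach is a loop. The following is a minimal sketch; the function name set_for_all_interfaces and its optional third argument (an alternate base directory, handy for dry runs) are made up for illustration.

```shell
#!/bin/sh
# set_for_all_interfaces VALUE SETTING [CONF_DIR]   (hypothetical helper)
# Writes VALUE into CONF_DIR/<interface>/SETTING for every interface.
# CONF_DIR defaults to /proc/sys/net/ipv4/conf; writing there requires root.
set_for_all_interfaces() {
    value=$1
    setting=$2
    conf_dir=${3:-/proc/sys/net/ipv4/conf}
    for f in "$conf_dir"/*/"$setting"; do
        [ -e "$f" ] || continue    # the wildcard matched nothing
        echo "$value" > "$f"
    done
}

# Example use, mirroring the anti-spoofing settings above:
#   set_for_all_interfaces 0 accept_source_route
#   set_for_all_interfaces 1 rp_filter
#   set_for_all_interfaces 1 log_martians
```

Settings written this way do not survive a reboot, so they are normally applied from a boot script.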

Linux Tricks: Cluster-specific issues

Cluster-specific issues
- System backdoors:
  - cron
  - at
- One user per node guarantee
- Passwordless authentication

One user per node
... or the right number of users per node
- Compute nodes should be wholly allocated to the user(s) that the scheduler has given them to
- Only the scheduler knows who owns the nodes
- Strategies:
  - Modify NIS maps
  - Modify /etc/passwd
  - PAM modules
We (UNM HPC) use pam_pbssimpleauth distributed with TORQUE for most of our systems.

Passwordless Authentication
- Job launch can't require passwords
- SSH can be used via RSA Authentication (Public Key)
- Issues:
  - Management of host keys
  - Management of user keys

RSA vs. DSA (the low-down)
"In DSA, signature generation is faster than signature verification, whereas with the RSA algorithm, signature verification is very much faster than signature generation. . . ." (…tml)
In a nutshell:
RSA can be used for both encryption and digital signatures.
DSA is strictly a digital signature algorithm.

Linux Tricks: Checking Your Work

Checking Your Work
- nmap: port scanner
- Nessus: vulnerability scanner
- Securityfocus.com
  - Search for your distribution & version
  - Compare vulnerabilities to services you run
  - Compare vulnerabilities to setuid/setgid binaries on your system
  - Bugtraq: for the seriously hardcore
    The up-and-coming info in the security world

Finding listening services with lsof:
lsof shows which network files are open:
    % lsof -i | awk '/LISTEN/ {print $1, $(NF-2), $(NF-1)}' | sort | uniq
    condor_ma TCP service0.nano.alliance.unm.edu:1026
    identd TCP *:auth
    inetd TCP *:ftp
    inetd TCP *:globus-gatekeeper
    inetd TCP *:gsiftp
    inetd TCP *:klogin
    inetd TCP *:kshell
    inetd TCP *:login
    inetd TCP *:netsaint_remote

Finding init.d started services:
To find the services that will be started by default at the current runlevel using /etc/rc.d/init.d scripts:
    # chkconfig --list | grep `grep :initdefault: /etc/inittab | awk -F: '{print $2}'`:on | awk '{print $1}' | sort | column
    atd      isdn      random
    autofs   keytable  reconfig
    condorg  pbs_mom   verifyd

Finding Network visible services
Nmap is your friend!
To find services visible from the network:
    other-host# nmap host-to-be-looked-at
    Port       State  Service
    …/tcp      open   shell
    1026/tcp   open   nterm
    4321/tcp   open   rwhois
    …

Regression Testing

Regression Testing
Making sure stuff still works
Your regression tests should:
- Check your basic system components and tools
- Check your network(s)
- Check your important applications
Jim's Rule [4]:
If the cluster doesn't work for your users, the cluster *doesn't work*!

[4] Jim learned this the hard way!

You're mostly on your own :P
... but it's just some shell scripts.
- You can use tools like Cfengine to automate some of your regression testing
- Your regression tests should be easy to run
- Your regression tests should produce a summary of successes and failures: a report at the end.
- Consider a suite of shell scripts
- Should the scripts attempt to repair any errors they find? (season to taste!)
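As one possible starting point for such a suite, the sketch below keeps a running tally and prints the summary report at the end. The function names check and report, and the commands in the usage comments, are made up for illustration; substitute whatever matters on your cluster.

```shell
#!/bin/sh
# Minimal regression-test harness (a sketch, not a finished tool).
passed=0
failed=0

# check DESCRIPTION COMMAND [ARGS...]
# Runs COMMAND quietly and records whether it succeeded.
check() {
    desc=$1
    shift
    if "$@" >/dev/null 2>&1; then
        passed=$((passed + 1))
        echo "PASS: $desc"
    else
        failed=$((failed + 1))
        echo "FAIL: $desc"
    fi
}

# report
# Prints the summary line; succeeds only if nothing failed.
report() {
    echo "Summary: $passed passed, $failed failed"
    [ "$failed" -eq 0 ]
}

# Example use (hypothetical checks):
#   check "scratch filesystem is mounted" test -d /scratch
#   check "head node answers ping"        ping -c 1 head-node
#   report
```

Because report returns non-zero when anything failed, the whole script can be run from cron or Cfengine and its exit status used to decide whether to send mail.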

System / Node / Software Change Management Logs

System/Node/Software Change Management Logs
- Change management logs will save your backside!
- System administrators can be sloppy! :P :)
  Where did I put that?!
- Choose a tool that works well for the administrator(s) for the system in question.

Where to keep Change Management Logs?
Somewhere that you will actually keep them!
- A Wiki of some kind
- Emacs outline mode is nice!
- Really, whatever works for you and your staff!
- I've seen sites alias editor commands in root's environment to require the admin to make a change management log when s/he edits a config file.
- I won't tell if you're using a plain ASCII text file :)
- ... but if you do, please consider …
