SX-Aurora TSUBASA Installation Guide


Proprietary Notice

The information disclosed in this document is the property of NEC Corporation (NEC) and/or its licensors. NEC and/or its licensors, as appropriate, reserve all patent, copyright, and other proprietary rights to this document, including all design, manufacturing, reproduction, use and sales rights thereto, except to the extent said rights are expressly granted to others.

The information in this document is subject to change at any time, without notice.

Trademarks and Copyrights

Linux is a registered trademark of Linus Torvalds in the United States and other countries.
Red Hat and Red Hat Enterprise Linux are registered trademarks of Red Hat, Inc. in the United States and other countries.
Apache is a registered trademark of Apache Software Foundation.
InfiniBand is a trademark or service mark of InfiniBand Trade Association.
Mellanox is a trademark or registered trademark of Mellanox Technologies in Israel and other countries.
Windows is a registered trademark of Microsoft Corporation in the United States and other countries.
All other product, brand, or trade names used in this publication are the trademarks or registered trademarks of their respective trademark owners.

NEC Corporation 2018-2022

Preface

This document explains how to install, configure, update, and uninstall the SX-Aurora TSUBASA software on the SX-Aurora TSUBASA system.

The latest version of this document is available lationGuide E.pdf

"SX-Aurora TSUBASA Setup Guide" is also available at the following URL, and explains how to set up the SX-Aurora TSUBASA system for first-time users, including hardware setup, installation of the OS and SX-Aurora TSUBASA software, basic environment settings, and execution of sample SetupGuide E.pdf

Please note that the setup guide explains the setup procedures mostly for SX-Aurora TSUBASA Model A100-1, and does not describe installation of ScaTeFS and NQSV.

Note
In this document, execute command lines starting with "#" as the superuser.

Definitions and Abbreviations

Vector Engine (VE)
    The core part of the SX-Aurora TSUBASA system, on which applications are executed. A VE is implemented as a PCI Express card and attached to a server called a vector host.

Vector Host (VH)
    A Linux (x86) server to which VEs are attached, in other words, a host computer equipped with VEs.

Vector Island (VI)
    A set of a VH and the VEs that are attached to the VH. A VI is the basic unit for the tower model and rack mount model described below.

Tower model
    One of the SX-Aurora TSUBASA product models. The tower model is a desk-side model that can be set up simply.

Rack mount model
    One of the SX-Aurora TSUBASA product models. The rack mount model is a 1U or 4U server model with a server rack. It covers small systems through large-scale systems.

Supercomputer model
    One of the SX-Aurora TSUBASA product models. The supercomputer model is positioned as the next-generation model of the SX series. It can mount up to eight 4U rack mount servers. All vector engines have water cooling devices.

VMC
    Abbreviation of VE Management Controller.

IB
    Abbreviation of InfiniBand.

HCA
    Abbreviation of Host Channel Adapter. A kind of PCIe card to connect a server to an IB network.

MPI
    Abbreviation of Message Passing Interface. MPI is a standard specification for a communication library. It can be used together with OpenMP or automatic parallelization.

NEC yum repository
    The yum repository for NEC SX-Aurora TSUBASA software. The yum repository for the free software can be accessed by any user. The yum repository subject to access restrictions, which holds the paid software, can be accessed only by users with a PP support contract.

License server
    A server that manages licenses for the software on the SX-Aurora TSUBASA. This is needed to use the paid software.

Frontend, Frontend machine
    In this document, frontend (or frontend machine) means a machine other than the SX-Aurora TSUBASA system on which programs for VEs are compiled (the compile machine).

PP support
    The support services that provide technical support for the purchased software products for a fee.

Contents

Chapter1 Introduction
  1.1 Scope
  1.2 System Requirement
    1.2.1 Hardware
    1.2.2 Supported Operating Systems
    1.2.3 Related URL
  1.3 Examples of System Configuration
  1.4 Software Installation
    1.4.1 With PP Support contract
    1.4.2 Without PP Support contract
    1.4.3 Software Installation procedure
Chapter2 Installation OS and the related Software
  2.1 OS Installation onto the VHs
    2.1.1 Linux OS installation
    2.1.2 Linux OS yum repository
  2.2 Kernel Update
  2.3 Installation of MLNX_OFED (Optional)
  2.4 Installation of Python 3 (Optional)
  2.5 Update of bash on the RHEL/CentOS 8.3-8.4
Chapter3 Installation the SX-Aurora TSUBASA software
  3.1 Setup of the Yum Repository
    3.1.1 The files to use yum repository
    3.1.2 The ID to access the restricted repository
    3.1.3 In the case of not accessing to NEC repository from VH
    3.1.4 Configuration of yum repository for old version kernel (Optional)
  3.2 Software Installation
    3.2.1 With PP Support and Direct access to NEC yum repository
    3.2.2 With PP Support and Local yum repository
    3.2.3 Without PP Support
    3.2.4 Package groups for TSUBASA Software
  3.3 Status Check of the VEs

  3.4 Update of the VMC Firmware
  3.5 Configuration of HugePages
  3.6 Start of the ScaTeFS Client (Optional)
Chapter4 Software Configuration
  4.1 Configuration of Operation Network
    4.1.1 InfiniBand (IP over IB)
    4.1.2 Ethernet
    4.1.3 Restart network
  4.2 Specification of the License Server
  4.3 Configuration of the License Server
  4.4 Configuration of InfiniBand HCA Relaxed Ordering (Optional)
  4.5 Configuration of InfiniBand HCA PCIe credit number (Optional)
  4.6 Configuration of ScaTeFS
  4.7 Configuration of NQSV
  4.8 Configuration of NEC MPI
    4.8.1 SELinux
    4.8.2 Firewall
    4.8.3 InfiniBand QoS
    4.8.4 NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)
    4.8.5 Binding HCAs to VEs (Optional)
    4.8.6 Name Resolution of Hostname
    4.8.7 InfiniBand Adaptive Routing (Optional)
    4.8.8 Partial Process Swapping (Optional)
  4.9 Setup of NEC Parallel Debugger
    4.9.1 Installation of Eclipse PTP
    4.9.2 Installation of the NEC Parallel Debugger Plugin
    4.9.3 Installation of the Necessary Software for Eclipse PTP
    4.9.4 Configuration of the Firewall
  4.10 The Confirmation of the Virtual Memory Setting
  4.11 The Confirmation of the Memlock Resource Setting
  4.12 Configuration of HugePages
  4.13 Configuration for Partial Process Swapping
    4.13.1 Configuration of veswap.option

    4.13.2 Restarting VEOS
  4.14 Configuration for Process accounting
    4.14.1 Start Process accounting service
    4.14.2 Stop Process accounting service
  4.15 How to Execute Programs on VEs
Chapter5 Update
  5.1 Removal of VHs from System Operation
    5.1.1 Disconnection of VHs from the Job Scheduler
    5.1.2 Removal of the VHs from Target of Monitoring
  5.2 Uninstallation of the ScaTeFS Client (Optional)
  5.3 Setup of the Yum Repository
  5.4 Stop of Update with the Yum Repository
  5.5 Uninstallation of MLNX_OFED (Optional)
  5.6 Update of the OS (Optional)
  5.7 Update of the Kernel (Optional)
  5.8 Installation of MLNX_OFED (Optional)
  5.9 Update of the Yum Repository
  5.10 Configuration of yum repository for old version kernel (Optional)
  5.11 Uninstallation of Unnecessary Software
  5.12 Update of the SX-Aurora TSUBASA Software
  5.13 Installation of the ScaTeFS Client (Optional)
  5.14 Status Check of the VEs
  5.15 Update of the VMC Firmware
  5.16 Start of the ScaTeFS Client (Optional)
  5.17 Update of Software Configuration
  5.18 Start of the System Operation
    5.18.1 Status Check of the VEs
    5.18.2 Start of Monitoring of the VHs
    5.18.3 Addition of the VHs to the Job Scheduler
Chapter6 Uninstallation
  6.1 Removal of VHs from System Operation
    6.1.1 Disconnection of VHs from the Job Scheduler
    6.1.2 Removal of the VHs from Target of Monitoring
  6.2 Uninstallation of the ScaTeFS Client (Optional)

  6.3 Uninstallation
Appendix A Installation of the Software Supporting Multiple Instances
  A.1 SDK(Compilers)
    A.1.1 Installation of a Specific Version of the Compilers
    mmand /opt/nec/ve/bin/[nfort|ncc|nc++]
    A.1.3 Update of the Compilers without Changing the Versions Invoked with the Command /opt/nec/ve/bin/[nfort|ncc|nc++]
  A.2 MPI
  A.3 Numeric Library Collection
  A.4 NLCPy
Appendix B Installed packages
  B.1 SX-Aurora Software Packages
  B.2 Packages for SX-Aurora Software Package Group
Appendix C Network Configuration
  C.1 Operation Network
  C.2 Management Network
Appendix D Migration to the Glibc Environment
Appendix E The manual setting of HugePages
Appendix F History
  History table
  Change notes

List of tables

Table 1 The SX-Aurora TSUBASA Software
Table 2 Models
Table 3 Correspondence between the OS Versions and MLNX_OFED
Table 4 Package Groups for SX-Aurora TSUBASA
Table 5 Package Group/NQSV
Table 6 Package Group/ScaTeFS Client
Table 7 Package Group/ScaTeFS Server
Table 8 Software Configuration
Table 9 Parameters for Specifying the License Server
Table 10 Environment Variables for Specifying the License Server
Table 11 The configuration in veswap.option file
Table 12 The List of the SX-Aurora TSUBASA Software
Table 13 Package Group/InfiniBand for SX-Aurora TSUBASA (for MLNX_OFED 4.9)
Table 14 Package Group/InfiniBand for SX-Aurora TSUBASA (for MLNX_OFED 5)
Table 15 Package Group/VE Application
Table 16 Package Group/NEC SDK
Table 17 Package Group/NEC MPI
Table 18 The Required Number of HugePages

List of figures

Figure 1 Configuration 1: Standalone (Single VI)
Figure 2 Configuration 2: Multiple VIs, a Management server, and a Frontend Machine
Figure 3 Configuration 3: Large Scale System
Figure 4 Installation from the yum repository
Figure 5 Installation from the local yum repository
Figure 6 Installation of packages downloaded in the Internet Delivery Product Download Service
Figure 7 Installation procedure
Figure 8 Serial Number Card
Figure 9 Software Update
Figure 10 Network Configuration

Chapter1 Introduction

1.1 Scope

This document explains installation, configuration, update, and uninstallation of the SX-Aurora TSUBASA software, which is listed in Table 1.

Table 1 The SX-Aurora TSUBASA Software

VEOS
    Description: VE management software
    Components: VEOS
    How to get (*1): A

MMM
    Description: Monitoring & Maintenance Manager
    Components: MMM
    How to get (*1): A

VMC Firmware
    Description: VMC Firmware
    Components: VMC Firmware
    How to get (*1): A

InfiniBand for SX-Aurora TSUBASA
    Description: InfiniBand control software
    Components: InfiniBand for SX-Aurora TSUBASA
    How to get (*1): A

License Server
    Description: License management software
    Components: License server
    How to get (*1): A

License Access Library
    Description: License check library
    Components: License access library
    How to get (*1): A

NEC Software Development Kit for Vector Engine (abbreviation: SDK)
    Description: Software development software
    Components: C/C++ Compiler, Fortran Compiler, binutils, Numeric Library Collection, NLCPy, NEC Parallel Debugger, Tuning Tool, NEC MPI (including NEC MPI/Scalar-Vector Hybrid (*2))
    How to get (*1): B

SDK Runtime
    Description: The runtime libraries and commands in SDK for executing VE programs and MPI programs.
    Components: The binutils, runtime libraries and MPI execution command in SDK
    How to get (*1): A

NEC Scalable Technology File System (abbreviation: ScaTeFS)
    Description: Scalable Technology File System
    Components: ScaTeFS/Client
    How to get (*1): B

NEC Network Queuing System V (abbreviation: NQSV)
    Description: Batch Execution System
    Components: NQSV/JobServer, NQSV/Client
    How to get (*1): B

(*1) A: Free software. You can install the software packages from the NEC yum repository with the yum command.
     B: Paid software. If you have the PP support contract, you can install the software packages from the NEC yum repository subject to access restrictions with the yum command. Otherwise, you can install the packages downloaded in the internet delivery product download service.
(*2) Installation of NEC MPI/Scalar-Vector Hybrid is not necessary because it is provided by the same package as NEC MPI.
(*3) ScaTeFS is only available in environments that use InfiniBand for the operation network. It is not available in environments that use only Ethernet for the operation network.

1.2 System Requirement

1.2.1 Hardware

The SX-Aurora TSUBASA is available in the following models.

Table 2 Models
    Model types: Tower, Rack Mount
    Model Name: A100-1, A101-1, A111-1, A300-2, A300-4, A311-4, B401-8, A500-64, A511-64
    Rows: Max. # of Vector Engines (VEs), # of Vector Hosts

Please refer to the SX-Aurora TSUBASA catalogue for details.

Note: Boot mode setting of VHs
The boot mode setting in the BIOS of VHs should be left in UEFI mode, which is the factory default setting. The SX-Aurora TSUBASA does not support other modes.

1.2.2 Supported Operating Systems

The SX-Aurora TSUBASA software runs on a Linux operating system compatible with Red Hat Linux. The NEC support portal below lists the operating systems and their kernel versions verified for the SX-Aurora TSUBASA.

[SX-Aurora TSUBASA] Supported OSes and kernel versions

en/View.aspx?id=4140100078

As listed on the above page, only updated kernels are supported, and they are not included in the ISO image of each distribution, so the kernel must be updated before use. Also, to avoid a kernel update to a version that is not verified, configure the yum command in the file /etc/yum.conf so that kernel packages are not updated. Please refer to 2.2 for the configuration.

1.2.3 Related URL

See also the following site about SX-Aurora TSUBASA.

NEC Aurora Forum (https://www.hpc.nec/)

1.3 Examples of System Configuration

This section illustrates system configuration examples of the SX-Aurora TSUBASA.

Configuration 1: Standalone (Single Vector Island (VI))

Figure 1 illustrates the SX-Aurora TSUBASA software to be installed on the Vector Host (VH).

Figure 1 Configuration 1: Standalone (Single VI)

Configuration 2: Multiple VIs, a Management Server, and a Frontend Machine

In this case, software license management is performed on the management server and programs can be compiled on the frontend machine.

Figure 2 illustrates the SX-Aurora TSUBASA software to be installed on the VHs, management server, and frontend machine.

Figure 2 Configuration 2: Multiple VIs, a Management server, and a Frontend Machine

Configuration 3: Large Scale System

Please contact our sales or SE.

Figure 3 Configuration 3: Large Scale System

1.4 Software Installation

1.4.1 With PP Support contract

When you have the PP support contract, you can use the yum command to install the free software packages from the NEC yum repository and the paid software packages from the NEC yum repository subject to access restrictions. The serial number of the support pack is required for access to the yum repository for the paid software. Please refer to 3.1.2 for the serial number.

Figure 4 Installation from the yum repository

If your SX-Aurora TSUBASA system does not have direct access to the Internet, you can install the packages by setting up the yum repository for the free and paid software in the local environment. Please refer to 3.1.3 for how to set up the local yum repository.

Figure 5 Installation from the local yum repository

1.4.2 Without PP Support contract

If you do not have the PP support contract, you can install the paid software packages that you downloaded in the internet delivery product download service.

Figure 6 Installation of packages downloaded in the Internet Delivery Product Download Service

As for the free software packages, you can install them from the NEC yum repository. If your SX-Aurora TSUBASA system does not have direct access to the Internet, you can install them by setting up the yum repository for the free software in the local environment. Please refer to 3.1.3 for how to set up the local yum repository.

1.4.3 Software Installation procedure

You can install the SX-Aurora TSUBASA software onto VHs according to Figure 7. Please refer to Chapter2 and Chapter3 for details. After the installation, configure the software by referring to Chapter4.
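The choice of installation source described in 1.4.1 and 1.4.2 can be summarized as a small decision helper. This is only a minimal sketch: the function name install_source and its yes/no arguments are illustrative assumptions and are not part of the SX-Aurora TSUBASA software.

```shell
#!/bin/sh
# Hypothetical helper (not an NEC tool): prints where the free and paid
# packages come from, following sections 1.4.1 and 1.4.2.
# $1: "yes" if you have the PP support contract, otherwise "no"
# $2: "yes" if the system has direct Internet access, otherwise "no"
install_source() {
    pp_support=$1
    internet=$2

    # Free software: NEC yum repository, or a local copy when offline.
    if [ "$internet" = "yes" ]; then
        free_src="NEC yum repository"
    else
        free_src="local yum repository"
    fi

    # Paid software: restricted repository with PP support,
    # downloaded packages without it.
    if [ "$pp_support" = "yes" ]; then
        if [ "$internet" = "yes" ]; then
            paid_src="NEC yum repository subject to access restrictions"
        else
            paid_src="local yum repository"
        fi
    else
        paid_src="packages from the internet delivery product download service"
    fi

    echo "free: $free_src"
    echo "paid: $paid_src"
}

# Example: PP support contract, no direct Internet access.
install_source yes no
```

The helper only restates the cases in the text; the actual repository setup is described in 3.1.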

Figure 7 Installation procedure

Chapter2 Installation OS and the related Software

This chapter explains the procedure for installing the OS and the related software.

2.1 OS Installation onto the VHs

2.1.1 Linux OS installation

Before installing the SX-Aurora TSUBASA software, install a supported operating system on the VHs with reference to the following information.

[SX-Aurora TSUBASA] Supported OSes and kernel ?id=4140100078

Installation of the operating system
‒ Red Hat Enterprise Linux
    The Red Hat Customer Portal
        Product Documentation for Red Hat Enterprise Linux 8 – Installing
        Product Documentation for Red Hat Enterprise Linux 7 – Deployment
‒ CentOS
    The documentation on the CentOS Project site

Note
When using CentOS, in order to obtain full performance of the SX-Aurora TSUBASA, set the tuning profile as follows:

    # tuned-adm profile throughput-performance
    # tuned-adm active
    Current active profile: throughput-performance

2.1.2 Linux OS yum repository

Set up the yum repository for the Linux OS so that additional packages for the OS can be installed with the yum command. This is required to install the SX-Aurora TSUBASA software. There are two ways to set up the yum repository: use the Linux OS installation media, or use the official repository on the internet.

This section explains the settings for a yum repository that uses the CentOS 8.3 OS installation media (DVD). For other OSes, change the following repository name, file name, and GPG key file according to the target OS version.

Mount OS installation media (DVD).

Put the OS installation DVD into the DVD drive and mount it at an appropriate directory. In the following example, /media/cdrom is used.

    # mkdir /media/cdrom
    # mount /dev/cdrom /media/cdrom

Configure repository settings

To enable installation of software from the OS installation media (DVD) with the yum command, save the original repository configuration files and create a new configuration file for the OS installation DVD.

First, save the original configuration files under /etc/yum.repos.d.

    # cd /etc/yum.repos.d
    # mkdir CentOS-repos.d
    # mv CentOS-* CentOS-repos.d
    # cp CentOS-repos.d/CentOS-Linux-Media.repo .

Edit /etc/yum.repos.d/CentOS-Linux-Media.repo as follows:

    [media-baseos]
    name=CentOS Linux $releasever - Media - BaseOS
    baseurl=file:///media/cdrom/BaseOS
    gpgcheck=1
    enabled=1
    gpgkey=

    [media-appstream]
    name=CentOS Linux $releasever - Media - AppStream
    baseurl=file:///media/cdrom/AppStream
    gpgcheck=1
    enabled=1
    gpgkey=

Please keep the DVD mounted until installation of the SX-Aurora TSUBASA software is completed.
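The media repository file described above could also be generated by a script instead of being edited by hand. The following is a minimal sketch assuming standard key=value .repo syntax; the function name write_media_repo, the temporary output directory, and the GPG key path passed in the example are hypothetical placeholders, so substitute the key file that matches your OS version.

```shell
#!/bin/sh
# Hypothetical helper (not an NEC tool): writes a yum repository file
# for the OS installation media.
# $1: directory where the OS installation DVD is mounted
# $2: path of the .repo file to create
# $3: value for gpgkey (depends on the target OS version)
write_media_repo() {
    mount_dir=$1
    repo_file=$2
    gpg_key=$3

    # \$releasever is escaped so that yum, not the shell, expands it.
    cat > "$repo_file" <<EOF
[media-baseos]
name=CentOS Linux \$releasever - Media - BaseOS
baseurl=file://$mount_dir/BaseOS
gpgcheck=1
enabled=1
gpgkey=$gpg_key

[media-appstream]
name=CentOS Linux \$releasever - Media - AppStream
baseurl=file://$mount_dir/AppStream
gpgcheck=1
enabled=1
gpgkey=$gpg_key
EOF
}

# Example: write into a scratch directory first and review the result
# before copying it to /etc/yum.repos.d. The key path is a placeholder.
demo_dir=$(mktemp -d)
write_media_repo /media/cdrom "$demo_dir/CentOS-Linux-Media.repo" \
    "file:///path/to/RPM-GPG-KEY"
cat "$demo_dir/CentOS-Linux-Media.repo"
```

Writing to a scratch directory keeps the sketch from touching /etc/yum.repos.d until the generated file has been checked.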

2.2 Kernel Update

Please update the kernel on the VHs to a version verified for the SX-Aurora TSUBASA, and reboot them.

The NEC support portal below lists the operating systems and their kernel versions verified for the SX-Aurora TSUBASA.

[SX-Aurora TSUBASA] Supported OSes and kernel ?id=4140100078

After the update, to avoid a kernel update to a version that is not verified, configure the yum command in the file /etc/yum.conf so that kernel packages are not updated. The following is an example of the description in the file /etc/yum.conf to avoid kernel updates, where 'exclude=kernel*' is specified.

    # vi /etc/yum.conf
    [main]
    exclude=kernel*

You can install the kernel verified for CentOS 8.3 (kernel-4.18.0-240.22.1.el8_3.x86_64) with the following procedure.

(1) Setting of CentOS 8.3 repository

Create /etc/yum.repos.d/CentOS-Linux-BaseOS.repo as below to connect to https://vault.centos.org/8.3.2011/. Edit the file, comment out
"mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=BaseOS&infra=$infra",
and add
"baseurl=https://vault.centos.org/8.3.2011/BaseOS/$basearch/os".

    # cd /etc/yum.repos.d
    # cp CentOS-repos.d/CentOS-Linux-BaseOS.repo .
    # vi CentOS-Linux-BaseOS.repo
    # diff CentOS-repos.d/CentOS-Linux-BaseOS.repo CentOS-Linux-BaseOS.repo
    13c13
    < mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=BaseOS&infra=$infra
    ---
    > #mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=BaseOS&infra=$infra
    14a15
    > baseurl=https://vault.centos.org/8.3.2011/BaseOS/$basearch/os

(2) Installing the kernel (kernel-4.18.0-240.22.1.el8_3.x86_64)

    # dnf --disableexcludes=all install kernel-4.18.0-240.22.1.el8_3 kernel-headers-4.18.0-240.22.1.el8_3
    Last metadata expiration check: 0:00:14 ago on Thu Oct 14 15:01:41 2021.
    Dependencies resolved.
     Package          Arch     Version                 Repository   Size
    Installing:
     kernel           x86_64   4.18.0-240.22.1.el8_3   baseos       4.4 M
     kernel-headers   x86_64   4.18.0-240.22.1.el8_3   baseos       5.6 M
    Installing dependencies:
     kernel-core      x86_64   4.18.0-240.22.1.el8_3   baseos        30 M
     kernel-modules   x86_64   4.18.0-240.22.1.el8_3   baseos        26 M

    Transaction Summary
    Install  4 Packages

    Total download size: 66 M
    Installed size: 88 M
    Is this ok [y/N]: y
     :
    Installed:
      kernel-4.18.0-240.22.1.el8_3.x86_64
      kernel-core-4.18.0-240.22.1.el8_3.x86_64
      kernel-headers-4.18.0-240.22.1.el8_3.x86_64
      kernel-modules-4.18.0-240.22.1.el8_3.x86_64
    Complete!
    # reboot
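After the reboot, it may be useful to confirm that the VH is running the verified kernel and that kernel packages are excluded from updates. The following is a minimal sketch, not an NEC-provided tool: the function names kernel_is_verified and kernel_updates_excluded are hypothetical, and the verified version string is the CentOS 8.3 example from this section, so replace it with the version listed for your OS.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the SX-Aurora TSUBASA software).

# Succeeds if the given kernel release string equals the verified one.
# $1: kernel release to test, e.g. the output of `uname -r`
# $2: verified kernel release
kernel_is_verified() {
    [ "$1" = "$2" ]
}

# Succeeds if the given yum configuration file has an exclude line
# that covers kernel* packages.
# $1: path to a yum.conf-style file
kernel_updates_excluded() {
    grep -Eq '^[[:space:]]*exclude[[:space:]]*=.*kernel\*' "$1"
}

# Example value taken from this section (CentOS 8.3).
verified="4.18.0-240.22.1.el8_3.x86_64"

if kernel_is_verified "$(uname -r)" "$verified"; then
    echo "running kernel matches $verified"
else
    echo "running kernel $(uname -r) differs from $verified"
fi

if [ -r /etc/yum.conf ] && kernel_updates_excluded /etc/yum.conf; then
    echo "kernel updates are excluded in /etc/yum.conf"
else
    echo "exclude=kernel* not found in /etc/yum.conf"
fi
```

Both checks are read-only, so the sketch can be run as a regular user.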

2.3 Installation of MLNX_OFED (Optional)

If you use InfiniBand with the SX-Aurora TSUBASA, install MLNX_OFED onto the VHs according to Table 3.

Table 3 Correspondence between the OS Versions and MLNX_OFED

    OS                 MLNX_OFED
    RHEL/CentOS 7.9    MLNX_OFED 4.9-2.2.4.0
    RHEL/CentOS 8.3    MLNX_OFED 4.9-3.1.5.0
    RHEL/CentOS 8.4    MLNX_OFED 4.9-4.0.8.0
                       MLNX_OFED 5.5-1.0.3.2
    RHEL 8.5           MLNX OF

If you use the monitoring software (Zabbix or Ganglia+Nagios) for the VHs, please bring the VHs back to the monitoring mode from the maintenance mode.

5.18.3 Addition of the VHs to the Job Scheduler

If you use the job scheduler, add the VHs targeted for the update to the job scheduler according to the following procedure.

1.