ChangeLog - NEC


ChangeLog

Edition 1
- New

Edition 2, March 2018
- 2.1.1. Preparation of the software packages to install for the SX-Aurora TSUBASA system (Change): The download URL was changed.
- 2.2.1. Installing Ansible (Change): The reference URL was deleted.
- 2.4.1. Files for Ansible (Change): vh-set-bmc.yml was deleted.
- 2.4.2. Tuning Ansible performance (Change): The reference URL was deleted.
- 2.5. Execution of Commands on VHs (Change): The reference URL was deleted.
- 2.6. File Manipulation on VHs (Change): The reference URL was deleted.
- 3.1. Zabbix (Change): The reference URL and the screen images of Zabbix were deleted.
- 3.2. Ganglia and Nagios (Change): The reference URL and the screen images of Ganglia were deleted.
- 3.1.1. Preparation, 3.2.1. Preparation (Change): The download URL was changed.
- 4.1. Controlling power with BMC (Change): The explanation of the BMC operation user was added. 4.1.1.1. Checking BMC installation and 4.1.1.2. BIOS Configuration on VHs were deleted.

Edition 3, May 2018
- C.4.1. VH power control using the powerctrl commands fails (Change): The cause and solution were changed.
- 2.1.1. Preparation of the software packages to install for the SX-Aurora TSUBASA system (Change): NEC MPI was removed from the targets updated with Ansible. NEC Parallel Debugger was added.
- 2.2.5. Creating a Yum repository for SX-Aurora TSUBASA (Change): "4. Update the group definition file." was deleted.
- 2.4.1. Files for Ansible (Change): roles/nec-mpi was deleted. roles/parallel-debugger was added.

- 2.6. File Manipulation on VHs (Change): The master playbook was changed from vh.yml to vh-set.yml.
- 2.7. Software Installation on VHs, 2.10.1. Editing Playbook (Change): nec-mpi was deleted. parallel-debugger was added.
- 3.1.2.1. Preparation, 3.1.3.1. Manual Setup (Change): An operation example of the firewalld service was added.
- 3.2.2.1. Preparation on the Management Server, 3.2.3.1. Manual Setup (Change): The operation example of the firewalld service was fixed.

Edition 4, May 2018
- A.2. The Item Keys Provided by the Loadable Modules (Change): A misspelling was fixed.
- C.2. Operational Status Monitoring (Zabbix) (Change): A misspelling was fixed.

Edition 5, January 2019
- 2.1.1. Preparation of the software packages to install for the SX-Aurora TSUBASA system (Change): NEC MPI is again supported as an update target with Ansible.
- 2.4.1. Files for Ansible (Change): roles/nec-mpi is supported again.
- 2.10.1. Editing Playbook (Change): nec-mpi is supported again.
- 1.1. Scope (Change): Support for the "System Configuration Management" functions using Ansible was suspended.

Trademarks
- Linux is a registered trademark of Linus Torvalds in the United States and other countries.
- Red Hat and Red Hat Enterprise Linux are registered trademarks of Red Hat, Inc. in the United States and other countries.
- Apache is a registered trademark of the Apache Software Foundation.
- InfiniBand is a trademark or service mark of the InfiniBand Trade Association.
- Ansible is a registered trademark of Red Hat, Inc. in the United States and other countries.
- Python is a registered trademark of the Python Software Foundation.
- Nagios is a registered trademark of Nagios Enterprises, LLC.
- Zabbix is a trademark of Zabbix LLC, based in the Republic of Latvia.
- All other product, brand, or trade names used in this publication are the trademarks or registered trademarks of their respective trademark owners.

Copyright
No part of this document may be reproduced, in any form or by any means, without permission from NEC Corporation.
The information in this document is subject to change at any time, without notice.

Contents
ChangeLog
Trademarks
Copyright
List of Figures
List of Tables
Chapter 1 Overview
1.1. Scope
1.2. Glossary
1.3. Configuration
1.4. Operating Environment
Chapter 2 System Configuration Management
2.1. Preparation
2.1.1. Preparation of the software packages to install for the SX-Aurora TSUBASA system
2.1.2. Preparing the template package
2.2. Initial Setup of the Management Server
2.2.1. Installing Ansible
2.2.2. Installing Python
2.2.3. Installing Apache HTTP Server
2.2.4. Installing createrepo
2.2.5. Creating a Yum repository for SX-Aurora TSUBASA
2.2.6. Creating an administrative user
2.2.7. Installing the template package
2.2.8. Setting administrative user's SSH public and private keys
2.2.9. Creating the VH information file
2.2.10. Creating host list file
2.3. Initial Setup of VHs
2.4. Preparing for using Ansible
2.4.1. Files for Ansible
2.4.2. Tuning Ansible performance
2.4.3. Starting ssh-agent
2.4.4. Registering VHs
2.4.5. Checking VH connectivity
2.4.6. Starting Apache HTTP Server
2.5. Execution of Commands on VHs
2.6. File Manipulation on VHs
2.7. Software Installation on VHs
2.8. Software Uninstallation from VHs
2.9. Update of Other Software
2.10. Software Update on VHs
2.10.1. Editing Playbook
2.10.2. Execution of the Playbook

Chapter 3 Operational Status Monitoring
3.1. Zabbix
3.1.1. Preparation
3.1.2. Initial Setup of Zabbix Server
3.1.2.1. Preparation
3.1.2.2. Installation of Zabbix Server
3.1.2.3. Start of Zabbix Server
3.1.2.4. Initial Setup of Zabbix Server
3.1.3. Initial Setup of Zabbix Agent
3.1.3.1. Manual Setup
3.1.3.2. Procedure with Ansible
3.1.4. Configuration of Host Information
3.1.4.1. Registration of Hosts
3.1.5. Configuration of Monitoring Items
3.1.5.1. Configuration of Monitoring Items Using Templates
3.1.5.2. Addition of Monitoring Items
3.1.5.3. Information Gathering with Loadable Modules
3.1.5.4. Information Gathering with User Parameters
3.1.5.5. Monitoring of VEOS Services
3.1.5.6. Performance Tuning
3.1.6. Configuration of Triggers
3.1.7. Configuration of Actions
3.1.8. Customization of the Web Interface
3.1.9. Creation of Loadable Modules from Source Files
3.1.10. Exclusion of monitoring hosts for updating SX-Aurora TSUBASA software
3.1.11. Inclusion of monitoring hosts updating SX-Aurora TSUBASA software
3.2. Ganglia and Nagios
3.2.1. Preparation
3.2.2. Configuration of the Management Server
3.2.2.1. Preparation on the Management Server
3.2.2.2. Installation of Ganglia Server
3.2.2.3. Configuration of Ganglia Server
3.2.2.4. Installation of Nagios
3.2.2.5. Configuration of Nagios
3.2.2.6. Configuration of Actions
3.2.2.7. Start of Ganglia Server and Nagios
3.2.3. Configuration of Ganglia Agent
3.2.3.1. Manual Setup
3.2.3.2. Setup Using Ansible
3.2.3.3. Exclude the VH from monitoring
3.2.3.4. Restart monitoring of the VH
Chapter 4 Power Control
4.1. Controlling power with BMC
4.1.1. Configuration for controlling power with BMC
4.1.1.1. Installing IPMITool on the management server
4.1.1.2. Configuration for Power control tool powerctrl
4.1.2. Powering on/off VHs with BMC
4.2. Powering on/off VHs with WOL
4.2.1. Configuration of controlling power with WOL
4.2.1.1. Check the network card

4.2.1.2. Configuration for the power control tool powerctrl_w
4.2.2. Powering on/off VHs with WOL
Chapter 5
create-hostlist.py
powerctrl
powerctrl_w
setup-hostlist
vh_host.conf
vh_host_w.conf
Appendix A Monitoring Items of Zabbix
A.1. The Item Keys Provided by the Loadable Modules
A.2. The Items and Triggers Provided by the Templates
Appendix B Monitoring Items of Ganglia
B.1. The List of the Monitoring Items (Ganglia)
Appendix C Trouble Shooting
C.1. Operation Management
C.1.1. Entry of the SSH private key's password is required
C.1.2. Playbook execution fails with "sudo: sorry, you must have a tty to run sudo"
C.1.3. Package download fails with "urlopen error [Errno 113] No route to host"
C.1.4. Yum repository update fails with "[Errno 14] HTTP Error 502 - Bad Gateway"
C.2. Operational Status Monitoring (Zabbix)
C.2.1. Cannot get VE information, and 'Not supported by Zabbix Agent' is shown to item information of monitoring host
C.2.2. Cannot get VE sensor information, and 'File can not access' is shown to item information of monitoring host
C.2.3. Cannot get a core temperature of VE, and 'Specified core is not available' is shown to item information of monitoring host
C.3. Operational State Monitoring (Ganglia and Nagios)
C.3.1. Metrics for the SX-Aurora TSUBASA system are not displayed on the Ganglia web interface
C.3.2. The values in a graph for the SX-Aurora TSUBASA system are displayed as “0” on the Ganglia web interface
C.3.3. The status of VH services is displayed as "UNKNOWN" on the Nagios web interface
C.4. Power Control
C.4.1. VH power control using the powerctrl commands fails
C.4.2. VH power control using the powerctrl_w commands fails
Appendix D Ansible Playbook
D.1. How to check Ansible playbooks
Bibliography

List of Figures
Figure 1: The configuration of the system configuration management
Figure 2: The management network and BMC network are identical
Figure 3: The management network and BMC network are different
Figure 4: The Configuration of Zabbix
Figure 5: VEs are registered as hosts
Figure 6: VEs are not Registered as Hosts
Figure 7: Example 1: Configuration of Host Groups
Figure 8: Example 2: Configuration of Host Groups
Figure 9: Configuration of Ganglia and Nagios

List of Tables
Table 1: Items in the VH information file
Table 2: Structure of Sample Playbooks
Table 3: Structure of roles-dir
Table 4: Settings of ansible.cfg
Table 5: List of sample files Installation Path
Table 6: List of sample files Installation Path
Table 7: The Item Keys Provided by ve_hw_item.so
Table 8: The List of Keys Provided by ve_os_item.so
Table 9: The List of the Items of the Application Name (VEHW)
Table 10: The List of the Application Name (VEOS)
Table 11: The List of the Items of the Application Name (VEOS-SERVICE)
Table 12: The List of the Metrics Provided by ve.py
Table 13: The List of the Keys Provided by veos.py

Chapter 1 Overview

1.1. Scope
This document explains how to manage system configuration, monitor operational status, and control the power supply in a large-scale SX-Aurora TSUBASA system using open source software (OSS).

Operational Status Monitoring
The following OSS is recommended for monitoring the SX-Aurora TSUBASA system, and this document explains how to use it:
- Zabbix
- Ganglia and Nagios
In addition to monitoring, this OSS can detect system failures and resource shortages and notify the administrator. Select either Zabbix or the combination of Ganglia and Nagios according to your system requirements.

System Configuration Management
How to execute commands, edit files, start or stop services, and install, uninstall, or update packages on VHs from the management server.

Power Supply Control
How to turn the power of VHs on or off.

Notice
Support for the "System Configuration Management" function using Ansible was suspended in January 2019. This document still contains the following sections and other Ansible-related descriptions, but the procedures cannot be executed as described. Use them for reference only.
- Chapter 2 System Configuration Management
- 3.1.3.2. Procedure with Ansible
- 3.2.3.2. Setup Using Ansible

1.2. Glossary
The table below lists the terms used in this document.
VE: Engine based on NEC's vector architecture and developed to run vector programs
VH: Linux server host on which VEs are mounted

1.3. Configuration
The configuration for operational status monitoring of the SX-Aurora TSUBASA system depends on the OSS you choose. Refer to Chapter 3 ("3.1. Zabbix" and "3.2. Ganglia and Nagios").

System configuration management of VHs requires Ansible, which is OSS. Therefore, the management server on which Ansible runs and all VHs in the system must be connected by a network.

Figure 1: The configuration of the system configuration management

To turn the power of VHs on and off from the management server, the server needs network access to the Baseboard Management Controllers (BMCs) of the VHs. Example configurations are shown in Figures 2 and 3.

Figure 2: The management network and BMC network are identical.
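Because Ansible reaches the VHs over this network, it may help to confirm from the management server that every VH accepts connections before continuing. The following is a minimal Python sketch, not part of the template package; it assumes Ansible's default SSH transport on TCP port 22, and the helper names is_port_open and unreachable_hosts are hypothetical. It only tests that a TCP connection can be opened; it does not check SSH authentication or BMC access.

```python
import socket


def is_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 22 is assumed here because it is the default SSH port used by
    Ansible; change it if sshd on the VHs listens elsewhere.
    """
    try:
        # create_connection resolves the name and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures
        return False


def unreachable_hosts(hosts, port=22, timeout=3.0):
    """Return the hosts that did not accept a TCP connection on port."""
    return [host for host in hosts if not is_port_open(host, port, timeout)]
```

For example, unreachable_hosts(["vh001", "vh002"]) (host names hypothetical) returns the VHs that could not be reached within the timeout, so an empty list means every listed VH accepted a connection.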

Figure 3: The management network and BMC network are different.

1.4. Operating Environment
The system management of VHs described in this document is supported in the following environment.

[Management Server]
H/W: x86_64 architecture machine
OS: Red Hat Enterprise Linux 7
OSS: Zabbix 3.0.11, Ganglia 3.7.2, Nagios 4.3.2, Ansible 2.3.0.0, IPMItool 1.8.15

[VH]
H/W: The models listed in the SX-Aurora TSUBASA product catalog
OS: Refer to the SX-Aurora TSUBASA Installation Guide.
OSS: Zabbix 3.0.11, Ganglia 3.7.2, Nagios 4.3.2
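When comparing an existing installation with the verified versions listed above, a small helper such as the following Python sketch may be useful. The function names are hypothetical, the dictionary copies only the management server column, and only purely numeric dotted versions (such as 2.3.0.0) are handled. A result other than "verified" means only that the version differs from the one listed here, not that it is known to fail.

```python
# Verified versions for the management server, copied from 1.4
VERIFIED_MANAGEMENT_SERVER = {
    "zabbix": "3.0.11",
    "ganglia": "3.7.2",
    "nagios": "4.3.2",
    "ansible": "2.3.0.0",
    "ipmitool": "1.8.15",
}


def version_tuple(version):
    """Convert a dotted numeric version such as '2.3.0.0' into (2, 3, 0, 0)."""
    return tuple(int(part) for part in version.split("."))


def check_against_verified(name, installed, verified=VERIFIED_MANAGEMENT_SERVER):
    """Classify an installed version relative to the verified one.

    Returns 'verified', 'older', 'newer', or 'not listed'.
    """
    key = name.lower()
    if key not in verified:
        return "not listed"
    have = version_tuple(installed)
    want = version_tuple(verified[key])
    # Pad with zeros so that '2.3' compares equal to '2.3.0.0'
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    if have == want:
        return "verified"
    return "older" if have < want else "newer"
```

For example, check_against_verified("Ansible", "2.3.0.0") returns "verified", while a Zabbix 3.0.10 installation is reported as "older".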

Chapter 2 System Configuration Management

Ansible enables configuration management of the SX-Aurora TSUBASA environment on VHs. The steps for setting up Ansible are as follows:
1. Preparation
2. Initial Setup of the Management Server
Set up the environment for using Ansible on the management server.
3. Initial Setup of VHs
Set up the environment in which you can manage the system configuration of VHs from the management server.
4. Initial Setup of Ansible
These steps allow the management user to execute commands and update software on VHs from the management server.
Perform the operations in "2.4. Preparing for using Ansible" and the subsequent sections of this chapter as the management user on the management server.

2.1. Preparation
This section describes the files to prepare before starting the tasks described in this document.

2.1.1. Preparation of the software packages to install for the SX-Aurora TSUBASA system
When updating the SX-Aurora TSUBASA system software installed on VHs, prepare the software packages of the SX-Aurora TSUBASA system and the yum repository's group definition file (TSUBASA-groups.xml). These files are used in "2.2.5. Creating a Yum repository for SX-Aurora TSUBASA" and "2.10. Software Update on VHs".
Some packages are released separately for each RHEL version, so download and prepare the packages that match the OS version of your system.
This document does not cover updating the C/C++ compiler and the Fortran compiler. To update them, refer to the SX-Aurora TSUBASA Installation Guide.

List of the Package files
The following packages are included in the SX-Aurora TSUBASA system software. Each product is followed by its license type (Free or Non-free) and its package files.

License access library (Free)
- aurlic-lib.x86_64

VEOS Application Runtime (Free)
- coreutils-ve.x86_64
- gdb-ve.x86_64
- libsysve-musl.x86_64
- libved.x86_64

- musl-libc-ve.x86_64
- procps-ng-ve.x86_64
- psacct-ve.x86_64
- psmisc-ve.x86_64
- strace-ve.x86_64
- sysstat-ve.x86_64
- time-ve.x86_64
- util-linux-ve.x86_64
- ve-memory-mapping.x86_64
- ve_drv-kmod.x86_64
- velayout.x86_64
- veos.x86_64
- veos-libveptrace.x86_64
- veosinfo.x86_64
- vesysinit.noarch
- vesysinit-udev.noarch
- vp-kmod.x86_64

VEOS Application Development (Free)
- gdb-ve.x86_64
- libsysve-musl.x86_64
- libsysve-musl-devel.x86_64
- libtool-ve.x86_64
- libved.x86_64
- musl-libc-ve.x86_64
- musl-libc-ve-devel.x86_64
- vedebuginfo.noarch
- velayout.x86_64
- veos-libveptrace.x86_64
- veos-musl-headers.x86_64

InfiniBand for SX-Aurora TSUBASA (Free)
- libibverbs-ve-musl.x86_64
- libvedma-ve-musl.x86_64
- libmlx5-ve-musl.x86_64
- libveib.x86_64

- ve_peermem.x86_64
- ve_peermem.src

MMM (Free)
- ftmon.x86_64
- libsignature.x86_64
- mmm.x86_64
- mmm-analysis.x86_64
- mmm-msl.x86_64
- rtmon.x86_64
- ve-firmware.noarch
- ve-power.x86_64

VMC Firmware (Free)
- vmcfw.noarch

ScaTeFS Client (Non-free)
[RHEL 7.3/CentOS 7.3]
- scatefs-client-libscatefsib.x86_64
- scatefs-client-libscatefsib_ve.x86_64
- scatefs-client-modules-mlnx_ofed.x86_64
- scatefs-client-mount-utils.x86_64
- scatefs-client-rcli-utils.x86_64
- scatefs-client-utils.x86_64
[RHEL 7.4 or later/CentOS 7.4 or later]
- kmod-scatefs-client-modules-mlnx_ofed.x86_64
- scatefs-client-libscatefsib.x86_64
- scatefs-client-libscatefsib_ve.x86_64
- scatefs-client-mount-utils.x86_64
- scatefs-client-rcli-utils.x86_64
- scatefs-client-utils.x86_64

NEC MPI (Non-free)
- nec-mpi-devel-1-0-0.x86_64
- nec-mpi-libs-1-0-0.x86_64
- nec-mpi-utils-1-0-0.x86_64
- nec-mpi-runtime.x86_64

Tuning Tool (Non-free)
- nec-veperf.x86_64
- nec-ftraceviewer.x86_64

NEC Parallel Debugger (Non-free)
- nec-paralleldebugger.x86_64

NQSV/JobServer (Non-free)
- NQSV-JobServer.x86_64

NQSV/Client (Non-free)
- NQSV-Client.x86_64

Numeric Library Collection (Non-free)
- nec-asl-ve-1.0.0.x86_64
- nec-aslfftw-ve-1.0.0.x86_64
- nec-blas-ve-1.0.0.x86_64
- nec-heterosolver-ve-1.0.0.x86_64
- nec-lapack-ve-1.0.0.x86_64
- nec-sblas-ve-1.0.0.x86_64
- nec-scalapack-ve-1.0.0.x86_64

binutils (Non-free)
- binutils-ve.x86_64

C/C++ compiler (Non-free)
- nec-nc++.x86_64
- nec-nc++-musl-inst.noarch
- nec-nc++-doc.noarch

Fortran compiler
- nec-nfort.x86 noarch

These files can be obtained from the following locations.
TSUBASA-groups.xml: "https://jpn.nec.com/hpc/aurora/ve-software/" or "oftware/"
Update package files: -

The contents of the group definition file (TSUBASA-groups.xml) differ for each OS version. Download the file for the corresponding version.

2.1.2. Preparing the template package
The template package, TSUBASA-sysmng-soft-X.X-Y.noarch.rpm, provides the sample files and tools described in the chapter on configuration management. Download the latest TSUBASA-sysmng-soft-X.X-Y.noarch.rpm and save it in any directory on the management server.

2.2. Initial Setup of the Management Server
This section explains how to set up the environment for using Ansible on the management server.

Caution
If you have already completed the installation following the SX-Aurora TSUBASA Installation Guide (with OSS), skip this section.
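As noted in 2.1.1, some packages depend on the RHEL version: in the package list, ScaTeFS Client has one package set for RHEL 7.3/CentOS 7.3 and another for RHEL 7.4 or later/CentOS 7.4 or later. The Python sketch below is an illustration, not part of the template package, that picks the set from a release string such as the contents of /etc/redhat-release. The function names are hypothetical, the ".x86_64" suffix is omitted from the package names, and releases not covered by the list (including major versions other than 7) are rejected rather than guessed.

```python
import re

# ScaTeFS Client package sets as listed in 2.1.1 (".x86_64" suffix omitted)
SCATEFS_RHEL_7_3 = [
    "scatefs-client-libscatefsib",
    "scatefs-client-libscatefsib_ve",
    "scatefs-client-modules-mlnx_ofed",
    "scatefs-client-mount-utils",
    "scatefs-client-rcli-utils",
    "scatefs-client-utils",
]
SCATEFS_RHEL_7_4_OR_LATER = [
    "kmod-scatefs-client-modules-mlnx_ofed",
    "scatefs-client-libscatefsib",
    "scatefs-client-libscatefsib_ve",
    "scatefs-client-mount-utils",
    "scatefs-client-rcli-utils",
    "scatefs-client-utils",
]


def rhel_release(release_text):
    """Return (major, minor) from text such as
    'Red Hat Enterprise Linux Server release 7.4 (Maipo)'."""
    match = re.search(r"release\s+(\d+)\.(\d+)", release_text)
    if match is None:
        raise ValueError("no 'release X.Y' found in %r" % release_text)
    return int(match.group(1)), int(match.group(2))


def scatefs_packages(release_text):
    """Return the ScaTeFS Client package set that matches the OS release."""
    major, minor = rhel_release(release_text)
    if major == 7 and minor == 3:
        return list(SCATEFS_RHEL_7_3)
    if major == 7 and minor >= 4:
        return list(SCATEFS_RHEL_7_4_OR_LATER)
    # The package list in 2.1.1 covers only RHEL/CentOS 7.3 and 7.4 or later
    raise ValueError("release %d.%d is not covered by the package list"
                     % (major, minor))
```

On a VH, the release string could be read with open("/etc/redhat-release").read() and passed to scatefs_packages; the CentOS form "CentOS Linux release 7.4.1708 (Core)" is parsed the same way.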

2.2.1. Installing Ansible
Install Ansible on the management server. For the verified versions, see "1.4. Operating Environment". For the installation procedure, refer to the official Ansible website.

2.2.2. Installing Python
The conversion tool for the VH information file, "5.1. create-hostlist.py", uses Python. Check Python 2.7 or later of the Pytho
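Whether an interpreter meets the requirement of Python 2.7 or later can be checked by running it with --version. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, the checker itself needs Python 3.5 or later (for subprocess.run), and it parses the "Python X.Y.Z" banner, which Python 2 prints to standard error and Python 3.4 and later print to standard output, so both streams are read.

```python
import re
import subprocess
import sys


def parse_python_version(text):
    """Parse output such as 'Python 2.7.5' into a tuple of ints, e.g. (2, 7, 5)."""
    match = re.search(r"Python\s+(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("not a Python version string: %r" % text)
    return tuple(int(group) for group in match.groups() if group is not None)


def interpreter_version(executable):
    """Run '<executable> --version' and return the parsed version tuple."""
    result = subprocess.run([executable, "--version"],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True, check=True)
    # Combine both streams because the banner location differs by version
    return parse_python_version(result.stdout + result.stderr)


def meets_minimum(version, minimum=(2, 7)):
    """True if version is at least minimum (2.7 by default, per 2.2.2)."""
    return tuple(version[:len(minimum)]) >= minimum


if __name__ == "__main__":
    # Check the interpreter named on the command line, or this one by default
    target = sys.argv[1] if len(sys.argv) > 1 else sys.executable
    version = interpreter_version(target)
    status = "meets" if meets_minimum(version) else "does not meet"
    print("%s reports %s; %s the 2.7 requirement"
          % (target, ".".join(map(str, version)), status))
```

For example, meets_minimum(interpreter_version("python")) checks the python command on the PATH, assuming such a command exists on the management server.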
