Troubleshooting Your SUSE Cloud - Susecon

Troubleshooting Your SUSE Cloud
TUT6113
Paul Thompson, SUSE Technical Consultant
Dirk Müller, SUSE OpenStack Engineer

SUSE Cloud

SUSE Cloud Troubleshooting

SUSE Cloud: 4653 parameters

SUSE Cloud: 14 components

SUSE Cloud: 2 hours

SUSE Cloud Troubleshooting: 1 hour

SUSE Cloud Building Blocks
[Diagram labels: Physical Infrastructure (x86-64, switches, storage); Operating System: SUSE Linux Enterprise Server 11 SP3; Hypervisor: Xen, KVM, VMware, Hyper-V; Adapters; Ceph (RBD, RADOS); OpenStack; APIs; tools and management added by SUSE Cloud]

Non-HA SUSE Cloud Installation

HA SUSE Cloud Installation

Just Enough HA for Troubleshooting
    crm resource list
    crm_mon
    crm resource restart X
    crm resource cleanup X

More About HA
https://www.suse.com/documentation/sle_ha/

SUSE Cloud Functional Blocks
[Diagram labels: Crowbar, Chef, SLES; OpenStack: Nova, Cinder, Glance, Keystone, Neutron, ...]

Crowbar and Chef

Generic SLES Troubleshooting
All nodes in SUSE Cloud are SLES 11 SP3
Watch out for typical issues:
  - dmesg for hardware-related errors, OOM, interesting kernel messages
  - usual syslog targets, e.g. /var/log/messages
Check general node health via:
  - top, vmstat, uptime, pstree, free
  - core files, zombies, etc.
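The OOM and zombie checks named above can be scripted. This is a minimal sketch and not part of the original talk: the function names oom_count and zombie_filter are hypothetical. It assumes that the kernel's OOM killer logs lines containing "Out of memory" and that the process listing comes from "ps -eo stat,pid,comm".

```shell
# Hypothetical helpers (names are illustrative, not from the talk).

# Count kernel log lines that mention the OOM killer.
# Reads log text on stdin, e.g.: dmesg | oom_count
oom_count() {
    grep -ci 'out of memory'
}

# Print only zombie processes (process state starting with Z).
# Expects "ps -eo stat,pid,comm" style input on stdin.
zombie_filter() {
    awk '$1 ~ /^Z/'
}
```

On a node this could be used as "dmesg | oom_count" and "ps -eo stat,pid,comm | zombie_filter". A non-zero count or any listed process is a hint to look closer, not a diagnosis.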

Supportconfig
supportconfig can be run on any cloud node
supportutils-plugin-susecloud.rpm
  - installed on all SUSE Cloud nodes automatically
  - collects precious cloud-specific information for further analysis

Typical Deployment Schema
[Diagram labels: Admin Node, Chef Server, Crowbar, Provisioner, Control Node, Network Node, Compute (x3), Cloud Node, Chef Client]

Cloud Install
    screen install-suse-cloud
    ...bar/barclamp_install/*.log

SUSE Cloud Admin Node
[Diagram labels: SLES 11 SP3, SUSE Cloud Add-on, Crowbar UI, Crowbar Services, Chef/Rabbit, Repo Mirror, Install, Crowbar repo]
    .../couchdb/couchdb.log
    ...r/log/crowbar/production.{out,err}

Chef
Cloud uses Chef for almost everything:
  - All Cloud and SLES non-core packages
  - All config files are overwritten
  - All daemons are started
  - Database tables are initialized
http://docs.getchef.com/chef_quick_overview.html

Admin Node: Using Chef
    knife node list
    knife node show <nodeid>
    export EDITOR=/usr/bin/vim; \
    knife node edit -a <nodeid>

SUSE Cloud Admin Node
Populate ~root/.ssh/authorized_keys prior to install
Barclamp install logs:
    /var/log/crowbar/barclamp_install
Node discovery logs:
    /var/log/crowbar/sledgehammer/d<macid>.<domain>.log
Syslog of crowbar installed nodes sent via rsyslog to:
    /var/log/nodes/d<macid>.log
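The per-node log names above are built from the node's MAC address. A minimal sketch of that mapping follows. It rests on an assumption: the MAC id is the address in lowercase with colons replaced by dashes, which matches the host name dde-ad-be-ff-00-01 that appears in a shell prompt later in the talk. The function names and the example domain are hypothetical.

```shell
# Hypothetical helpers; the MAC-to-name rule is an assumption, see above.

# de:ad:be:ff:00:01 -> dde-ad-be-ff-00-01
node_name() {
    printf 'd%s\n' "$(printf '%s' "$1" | tr 'A-F' 'a-f' | tr ':' '-')"
}

# Node discovery log on the admin node: args are MAC and domain.
sledgehammer_log() {
    printf '/var/log/crowbar/sledgehammer/%s.%s.log\n' "$(node_name "$1")" "$2"
}

# Remote syslog of an installed node: arg is the MAC.
node_syslog() {
    printf '/var/log/nodes/%s.log\n' "$(node_name "$1")"
}
```

For example, "tail -f $(node_syslog de:ad:be:ff:00:01)" on the admin node would follow the syslog of that node, provided the naming assumption holds for the installation.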

Useful Tricks
Root login to the Cloud installed nodes should be possible from the admin node (even in discovery stage)
If the admin network is reachable:
~/.ssh/config:
    host 192.168.124.*
    StrictHostKeyChecking no
    user root

SUSE Cloud Admin Node
If a proposal is applied, chef client logs are at:
    /var/log/crowbar/chef-client/<macid>.<domain>.log
Useful crowbar commands:
    crowbar machines help
    crowbar transition <node> <state>
    crowbar <barclamp> proposal list|show <name>
    crowbar <barclamp> proposal delete default

Admin Node: Crowbar Services
Nodes are deployed via PXE boot:
    /srv/tftpboot/discovery/pxelinux.cfg/*
Installed via AutoYaST; profile generated to:
    /srv/tftpboot/nodes/d<mac>.<domain>/autoyast.xml
Can delete & rerun chef-client on the admin node
Can add useful settings to autoyast.xml:
    <confirm config:type="boolean">true</confirm>
(don't forget to chattr +i the file)

Admin Node: Crowbar UI
A useful Export page is available in the Crowbar UI in order to export various log files from a customer installation

Admin Node: Crowbar UI
Raw settings in barclamp proposals allow access to "expert" (hidden) options
Most interesting are:
    debug: true
    verbose: true

Admin Node: Crowbar Gotchas
Be patient
  - Do not transition a node from one state to another several times in a row
  - Do not apply proposals while a proposal is applying
Cloud nodes should boot from:
  1. Network
  2. First disk

SUSE Cloud Nodes
[Diagram labels: SLES 11 SP3, SUSE Cloud Add-on, Chef Client, node-specific services]
Cloud Node: all managed via Chef:
    /var/log/chef/client.log
    rcchef-client status
chef-client can be invoked manually
Should lock each other if maintenance updates are installed

SUSE Cloud Control Node
[Diagram labels: SLES 11 SP3, SUSE Cloud Add-on, Chef Client, OpenStack API services]
Control Node: just like any other cloud node:
    /var/log/chef/client.log
    rcchef-client status
    chef-client
Chef overwrites all config files it touches
chattr +i is your friend

OpenStack Architecture Diagram

OpenStack Block Diagram
[Diagram annotations: "Accesses almost everything"; Keystone: SPOF]

OpenStack Architecture
Typically each OpenStack component provides:
  - an API daemon / service
  - one or many backend daemons that do the actual work
  - a command line client to access the API
  - a <proj>-manage client for admin-only functionality
  - a dashboard ("Horizon") plugin providing a graphical view on the service
  - an SQL database for storing state

OpenStack Packaging Basics
Packages are usually named openstack-<codename>
  - usually a subpackage for each service (-api, -scheduler, etc.)
  - log to /var/log/<codename>/<service>.log
  - each service has an init script:
    dde-ad-be-ff-00-01:~ # rcopenstack-glance-api status
    Checking for service glance-api ...running
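The naming convention above makes it easy to derive where to look for a given service. A minimal sketch follows; the function names are hypothetical, and individual packages may deviate from the "usual" pattern the slide describes.

```shell
# Hypothetical helpers built only on the naming pattern from the slide.

# Log file for a service: args are codename and service.
service_log() {
    printf '/var/log/%s/%s.log\n' "$1" "$2"
}

# Name of the rc shortcut for the init script: same args.
rc_command() {
    printf 'rcopenstack-%s-%s\n' "$1" "$2"
}
```

For example, "$(rc_command glance api) status" expands to the rcopenstack-glance-api call shown in the slide, and "tail $(service_log glance api)" would read /var/log/glance/api.log if the package follows the convention.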

OpenStack Debugging Basics
Log files often lack useful information without verbose enabled
TRACEs of processes are not logged without verbose
Many reasons for API error messages are not logged unless debug is turned on
Debug is very verbose (10GB per ...)
...ck.org/icehouse/

OpenStack Architecture
[Diagram annotations: "Accesses almost everything"; Keystone: SPOF]

OpenStack Dashboard: Horizon
    /var/log/apache2/openstack-dashboard-error_log
Get the exact URL it tries to access!
Enable "debug" in the Horizon barclamp
Test components individually

OpenStack Identity: Keystone
Needed to access all services
Needed by all services for checking authorisation
Use keystone token-get to validate credentials and test service availability

OpenStack Object Store: Swift
    swift stat
    swift dispersion in Crowbar
Uses regular syslog for many messages:
    /var/log/messages
    console
Easiest to debug using curl

OpenStack Imaging: Glance
To validate liveness:
    glance image-list
    glance image-download <id> > /dev/null
    glance image-show <id>

OpenStack Networking: Neutron
Swiss Army knife for SDN
    neutron agent-list
    neutron net-list
    neutron port-list
    neutron router-list
There's no neutron-manage

Basic Network Layout

Networking with OVS: Compute Node
...nce/content/under_the_hood_openvswitch.html

Networking with LB: Compute Node

Neutron Troubleshooting
Neutron uses IP networking namespaces on the Network node for routing overlapping networks
    neutron net-list
    ip netns list
    ip netns exec qrouter-<id> bash
    ping ..., arping ..., ip ro ..., curl ...

OpenStack Compute: Nova
    nova-manage service list
    nova-manage logs errors
nova show <id> with admin privileges shows the compute node
virsh list or virsh dumpxml can be used to analyze the state of the VM

Nova
"Launches" go to Scheduler; rest to Conductor

Nova Booting VM Workflow

Nova: Scheduling a VM
Nova scheduler tries to select a matching compute node for the VM

Nova Scheduler
Typical errors:
No suitable compute node can be found
All suitable compute nodes failed to launch the VM with the required settings
  - nova-manage logs errors
    INFO nova.filters [req-299bb909-49bc-4124-8b88732797250cf5 47a52e6b02a2ef] Filter RamFilter returned 0 hosts
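When several filters are configured, it helps to list which ones eliminated all hosts. This is a minimal sketch that extracts the filter names from log lines shaped like the example above. The function name failed_filters is hypothetical, and the log location /var/log/nova/scheduler.log is an assumption derived from the /var/log/<codename>/<service>.log convention.

```shell
# Hypothetical helper: print each filter that returned 0 hosts, once.
# Reads scheduler log text on stdin.
failed_filters() {
    sed -n 's/.*Filter \([A-Za-z]*\) returned 0 hosts.*/\1/p' | sort -u
}
```

A possible invocation is "failed_filters < /var/log/nova/scheduler.log". A RamFilter hit, as in the slide, suggests that no compute node had enough free memory for the requested flavor.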

OpenStack Volumes: Cinder
[Diagram labels: API, Scheduler, Volume (x4)]

OpenStack Cinder: Volumes
Similar syntax to Nova:
    cinder-manage service list
    cinder-manage logs errors
    cinder-manage host list
cinder list|show (with admin privileges) shows the volume host

Troubleshooting Cloud-Init
OpenStack services like Heat or Nova depend on cloud-init
  - sets host name, ssh keys, resizes disks, launches custom scripts on boot
Heat uses scripts to launch cfntools
Use curl on the metadata server inside the VM
    /var/lib/cloud/
    /var/log/cloud-init.log

Q&A
http://ask.openstack.org/
http://docs.openstack.org/
Thank you

Bonus Material

OpenStack Orchestration: Heat

OpenStack Orchestration: Heat
Uses Nova, Cinder, Neutron to assemble complete stacks of resources
    heat stack-list
    heat resource-list|show <stack>
    heat event-list|show <stack>
Usually necessary to query the actual OpenStack service for further information

OpenStack Imaging: Glance
Usually issues are in the configured glance backend itself (e.g. RBD, swift, filesystem), so debugging concentrates on those
Filesystem:
    /var/lib/glance/images
RBD:
    ceph -w
    rbd -p <pool> ls

SUSE Cloud

Unpublished Work of SUSE. All Rights Reserved.
This work is an unpublished work and contains confidential, proprietary, and trade secret information of SUSE. Access to this work is restricted to SUSE employees who have a need to know to perform tasks within the scope of their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated, abridged, condensed, expanded, collected, or adapted without the prior written consent of SUSE. Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability.

General Disclaimer
This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for SUSE products remains at the sole discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.
