Troubleshooting Your SUSE Cloud (TUT6113)
Paul Thompson, SUSE Technical Consultant
Dirk Müller, SUSE OpenStack Engineer

SUSE Cloud
SUSE Cloud Troubleshooting
SUSE Cloud: 4653 Parameters
SUSE Cloud: 14 Components
SUSE Cloud: 2 Hours
SUSE Cloud Troubleshooting: 1 Hour

SUSE Cloud Building Blocks
Diagram labels: OpenStack; SUSE Cloud adds: tools, management, solutions; APIs; hypervisors: Xen, KVM, VMware, Hyper-V; adapters; Ceph (RBD, RADOS); operating system: SUSE Linux Enterprise Server 11 SP3; physical infrastructure: x86-64, switches, storage

Non-HA SUSE Cloud Installation

HA SUSE Cloud Installation

Just Enough HA for Troubleshooting
    crm resource list
    crm_mon
    crm resource restart X
    crm resource cleanup X

More About HA
https://www.suse.com/documentation/sle_ha/

SUSE Cloud Functional Blocks
Diagram labels: Crowbar, Chef, SLES, OpenStack (Nova, Cinder, Glance, Keystone, Neutron)

Crowbar and Chef

Generic SLES Troubleshooting
- All nodes in SUSE Cloud are SLES 11 SP3
- Watch out for typical issues:
  - dmesg for hardware-related errors, OOM, interesting kernel messages
  - usual syslog targets, e.g. /var/log/messages
- Check general node health via:
  - top, vmstat, uptime, pstree, free
  - core files, zombies, etc.

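The dmesg check can be partly automated. The following is a minimal Python sketch, not part of the original slides: the pattern list and the sample lines are illustrative assumptions, and real kernel messages vary with kernel version and hardware.

```python
import re

# Illustrative patterns only (an assumption, not an exhaustive list).
SUSPICIOUS_PATTERNS = [
    re.compile(r"out of memory", re.IGNORECASE),
    re.compile(r"oom-killer", re.IGNORECASE),
    re.compile(r"i/o error", re.IGNORECASE),
    re.compile(r"hardware error", re.IGNORECASE),
]


def suspicious_kernel_lines(dmesg_text):
    """Return the lines of dmesg output that match any suspicious pattern."""
    return [
        line
        for line in dmesg_text.splitlines()
        if any(p.search(line) for p in SUSPICIOUS_PATTERNS)
    ]


# Invented sample input; on a node you would pass the output of `dmesg`.
SAMPLE = """\
[   12.345678] eth0: link up
[ 9876.543210] Out of memory: Kill process 4242 (ruby) score 901 or sacrifice child
[ 9876.543300] end_request: I/O error, dev sda, sector 123456
"""

if __name__ == "__main__":
    for line in suspicious_kernel_lines(SAMPLE):
        print(line)
```

On a real node the same function could be fed live output, for example `suspicious_kernel_lines(subprocess.check_output(["dmesg"]).decode())`.
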
Supportconfig
- supportconfig can be run on any cloud node
- supportutils-plugin-susecloud.rpm
  - installed on all SUSE Cloud nodes automatically
  - collects precious cloud-specific information for further analysis

Typical Deployment Schema
Diagram labels: Admin Node (Chef Server, Crowbar, Provisioner); Cloud Node, Control Node, Network Node, Compute (x3); Chef Client

Cloud Install
    screen install-suse-cloud
    [...]bar/barclamp_install/*.log

SUSE Cloud Admin Node
Diagram labels: SLES 11 SP3, SUSE Cloud Addon, Crowbar UI, Crowbar Services, Chef/Rabbit, Repo Mirror, Install, Crowbar repo
Log files (paths truncated in the source):
    [...]/couchdb/couchdb.log
    [...]r/log/crowbar/production.{out,err}

Chef
- Cloud uses Chef for almost everything:
  - All Cloud and SLES non-core packages
  - All config files are overwritten
  - All daemons are started
  - Database tables are initialized
- http://docs.getchef.com/chef_quick_overview.html

Admin Node: Using Chef
    knife node list
    knife node show <nodeid>
    export EDITOR=/usr/bin/vim; \
    knife node edit -a <nodeid>

SUSE Cloud Admin Node
- Populate /root/.ssh/authorized_keys prior to install
- Barclamp install logs: /var/log/crowbar/barclamp_install
- Node discovery logs: /var/log/crowbar/sledgehammer/d<macid>.<domain>.log
- Syslog of crowbar-installed nodes is sent via rsyslog to: /var/log/nodes/d<macid>.log

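To find the right log file for a node quickly, the path patterns on this slide can be filled in from a MAC address. This is a small sketch under one assumption: that the macid placeholder is the MAC address in lower case with colons replaced by dashes, which matches the host name dde-ad-be-ff-00-01 shown on the packaging slide. The function name and the example domain are hypothetical.

```python
def node_log_paths(mac, domain):
    """Build admin-node log paths for one cloud node.

    Assumption: macid is the MAC address, lower-cased, with ':' replaced by '-'.
    """
    macid = mac.lower().replace(":", "-")
    return {
        # Node discovery (sledgehammer) log
        "discovery": "/var/log/crowbar/sledgehammer/d%s.%s.log" % (macid, domain),
        # Syslog forwarded from the installed node via rsyslog
        "syslog": "/var/log/nodes/d%s.log" % macid,
    }


if __name__ == "__main__":
    # "cloud.example.com" is a made-up domain for illustration.
    for name, path in sorted(node_log_paths("DE:AD:BE:FF:00:01", "cloud.example.com").items()):
        print(name, path)
```
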
Useful Tricks
- Root login to the Cloud-installed nodes should be possible from the admin node (even in discovery stage)
- If the admin network is reachable, add to ~/.ssh/config:
    host 192.168.124.*
    StrictHostKeyChecking no
    user root

SUSE Cloud Admin Node
- If a proposal is applied, chef-client logs are at: /var/log/crowbar/chef-client/<macid>.<domain>.log
- Useful crowbar commands:
    crowbar machines help
    crowbar transition <node> <state>
    crowbar <barclamp> proposal list|show <name>
    crowbar <barclamp> proposal delete default

Admin Node: Crowbar Services
- Nodes are deployed via PXE boot: /srv/tftpboot/discovery/pxelinux.cfg/*
- Installed via AutoYaST; profile generated to: /srv/tftpboot/nodes/d<mac>.<domain>/autoyast.xml
- Can delete & rerun chef-client on the admin node
- Can add useful settings to autoyast.xml:
    <confirm config:type="boolean">true</confirm>
  (don't forget to chattr +i the file)

Admin Node: Crowbar UI
A useful Export page is available in the Crowbar UI to export various log files from a customer installation.

Admin Node: Crowbar UI
Raw settings in barclamp proposals allow access to "expert" (hidden) options. Most interesting are:
    debug: true
    verbose: true

Admin Node: Crowbar Gotchas
- Be patient
  - Do not transition multiple nodes from one state to another at once
  - Do not apply proposals while a proposal is applying
- Cloud nodes should boot from:
  1. Network
  2. First disk

SUSE Cloud Nodes
Diagram labels: SLES 11 SP3, SUSE Cloud Addon, Cloud Node, Chef Client, node-specific services
- All managed via Chef:
    /var/log/chef/client.log
    rcchef-client status
- chef-client can be invoked manually
- Should lock each other if maintenance updates are installed

SUSE Cloud Control Node
Diagram labels: SLES 11 SP3, SUSE Cloud Addon, Control Node, Chef Client, OpenStack API services
- Just like any other cloud node:
    /var/log/chef/client.log
    rcchef-client status
    chef-client
- Chef overwrites all config files it touches
- chattr +i is your friend

OpenStack Architecture Diagram

OpenStack Block Diagram
Diagram annotations: "Accesses almost everything"; "Keystone: SPOF"

OpenStack Architecture
Typically each OpenStack component provides:
- an API daemon / service
- one or many backend daemons that do the actual work
- a command line client to access the API
- a <proj>-manage client for admin-only functionality
- a dashboard ("Horizon") plugin providing a graphical view on the service
- an SQL database for storing state

OpenStack Packaging Basics
- Packages are usually named: openstack-<codename>
  - usually a subpackage for each service (-api, -scheduler, etc.)
  - log to /var/log/<codename>/<service>.log
  - each service has an init script:
      dde-ad-be-ff-00-01:~ # rcopenstack-glance-api status
      Checking for service glance-api                    running

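The naming rules on this slide are regular enough to compute. The sketch below derives the usual locations for one service; the function name is hypothetical, and the word "usually" on the slide applies, so individual packages may deviate from the pattern.

```python
def service_locations(codename, service):
    """Return the conventional package, log file and init script for a service.

    Follows the slide's pattern; real packages may differ in places.
    """
    return {
        "package": "openstack-%s" % codename,
        "log": "/var/log/%s/%s.log" % (codename, service),
        "status_command": "rcopenstack-%s-%s status" % (codename, service),
    }


if __name__ == "__main__":
    for key, value in sorted(service_locations("glance", "api").items()):
        print("%-15s %s" % (key, value))
```
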
OpenStack Debugging Basics
- Log files often lack useful information without verbose enabled
- TRACEs of processes are not logged without verbose
- Many reasons for API error messages are not logged unless debug is turned on
- Debug is very verbose ([...] 10GB per [...])
- [...]ck.org/icehouse/

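Before digging through logs it helps to confirm whether debug and verbose are actually switched on for a service. The following sketch reads both flags from the [DEFAULT] section of an OpenStack-style INI file. It is an illustration, not a SUSE tool: the sample text is invented, and because Chef rewrites these files, a permanent change belongs in the barclamp proposal rather than in the file.

```python
import configparser


def logging_flags(config_text):
    """Return the debug/verbose settings from an OpenStack-style INI text.

    Missing options count as False.
    """
    # interpolation=None: values may contain '%'; strict=False: tolerate
    # duplicate options, which do occur in hand-edited files.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(config_text)
    return {
        flag: parser.getboolean("DEFAULT", flag, fallback=False)
        for flag in ("debug", "verbose")
    }


# Invented sample; on a node, read e.g. /etc/nova/nova.conf instead.
SAMPLE_CONF = """\
[DEFAULT]
verbose = True
debug = False
"""

if __name__ == "__main__":
    print(logging_flags(SAMPLE_CONF))
```
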
OpenStack Architecture
Diagram annotations: "Accesses almost everything"; "Keystone: SPOF"

OpenStack Dashboard: Horizon
    /var/log/apache2/openstack-dashboard-error_log
- Get the exact URL it tries to access!
- Enable "debug" in the Horizon barclamp
- Test components individually

OpenStack Identity: Keystone
- Needed to access all services
- Needed by all services for checking authorisation
- Use keystone token-get to validate credentials and test service availability

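When the keystone client itself is suspect, the same check can be made with a raw HTTP request. The sketch below only builds the request, assuming the Identity API v2.0 password-credentials format in use at the time of this deck; the endpoint address, user, password and tenant are placeholders, and the function name is hypothetical.

```python
import json
import urllib.request


def build_token_request(auth_url, username, password, tenant):
    """Build (but do not send) an Identity v2.0 token request."""
    body = {
        "auth": {
            "tenantName": tenant,
            "passwordCredentials": {"username": username, "password": password},
        }
    }
    return urllib.request.Request(
        auth_url.rstrip("/") + "/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder endpoint and credentials, for illustration only.
    req = build_token_request("http://192.168.124.81:5000/v2.0/", "admin", "secret", "openstack")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending it with `urllib.request.urlopen(req, timeout=10)` should return a JSON document containing a token if the credentials are valid and Keystone is reachable.
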
OpenStack Object Store: Swift
    swift stat
    swift dispersion
- In Crowbar, Swift uses regular syslog for many messages: /var/log/messages
- console [...]
- Easiest to debug using curl

OpenStack Imaging: Glance
- To validate liveness:
    glance image-list
    glance image-download <id> > /dev/null
    glance image-show <id>

OpenStack Networking: Neutron
- Swiss Army knife for SDN
    neutron agent-list
    neutron net-list
    neutron port-list
    neutron router-list
- There's no neutron-manage

Basic Network Layout

Networking with OVS: Compute Node
[...]nce/content/under_the_hood_openvswitch.html

Networking with LB: Compute Node

Neutron Troubleshooting
Neutron uses IP networking namespaces on the Network node for routing overlapping networks.
    neutron net-list
    ip netns list
    ip netns exec qrouter-<id> bash
    ping ..., arping ..., ip ro ..., curl ...

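The namespace steps can be scripted when several routers need checking. This sketch picks the qrouter namespaces out of `ip netns list` output and builds the matching `ip netns exec` command line. The sample output and its UUIDs are invented, and both function names are hypothetical.

```python
def router_namespaces(netns_output):
    """Return the qrouter-* namespace names found in `ip netns list` output."""
    names = []
    for line in netns_output.splitlines():
        fields = line.split()
        # The namespace name is the first field; newer iproute2 may append "(id N)".
        if fields and fields[0].startswith("qrouter-"):
            names.append(fields[0])
    return names


def netns_command(namespace, *argv):
    """Build the argument list that runs argv inside the given namespace."""
    return ["ip", "netns", "exec", namespace] + list(argv)


# Invented sample output with made-up UUIDs.
SAMPLE_NETNS = """\
qdhcp-11111111-2222-3333-4444-555555555555
qrouter-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
"""

if __name__ == "__main__":
    for ns in router_namespaces(SAMPLE_NETNS):
        print(" ".join(netns_command(ns, "ip", "ro")))
```

The resulting list can be passed to `subprocess.check_output` as root on the Network node.
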
OpenStack Compute: Nova
    nova-manage service list
    nova-manage logs errors
- nova show <id> with admin privileges shows the compute node
- virsh list or virsh dumpxml can be used to analyze the state of a VM

Nova: "Launches" go to Scheduler; rest to Conductor

Nova Booting VM Workflow

Nova: Scheduling a VM
Nova scheduler tries to select a matching compute node for the VM.

Nova Scheduler
Typical errors:
- No suitable compute node can be found
- All suitable compute nodes failed to launch the VM with the required settings
  - nova-manage logs errors
      INFO nova.filters [req-299bb909-49bc-4124-8b88732797250cf5 47a52e6b02a2ef] Filter RamFilter returned 0 hosts

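A "returned 0 hosts" line names the filter that eliminated the last candidate, which is usually the quickest pointer to the cause (RamFilter: not enough free RAM on any node). The sketch below extracts those filter names from scheduler log text. It assumes only the message shape shown on this slide; the function name is hypothetical.

```python
import re

# Matches "Filter RamFilter returned 0 hosts"; \s* also tolerates the
# space being lost, as in the slide extraction.
ZERO_HOSTS = re.compile(r"Filter\s*(\w+) returned 0 hosts")


def filters_rejecting_all_hosts(log_text):
    """Return the names of scheduler filters that left no candidate hosts."""
    return [match.group(1) for match in ZERO_HOSTS.finditer(log_text)]


# The log line from the slide, request ID as it appears there.
SAMPLE_LOG = (
    "INFO nova.filters [req-299bb909-49bc-4124-8b88732797250cf5 47a52e6b02a2ef] "
    "Filter RamFilter returned 0 hosts\n"
)

if __name__ == "__main__":
    print(filters_rejecting_all_hosts(SAMPLE_LOG))
```
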
OpenStack Volumes: Cinder
Diagram labels: API, Scheduler, Volume (x4)

OpenStack Cinder: Volumes
Similar syntax to Nova:
    cinder-manage service list
    cinder-manage logs errors
    cinder-manage host list
- cinder list|show (with admin privileges) shows the volume host

Troubleshooting Cloud-Init
- OpenStack services like Heat or Nova depend on cloud-init
  - sets host name, ssh keys, resizes disks, launches custom scripts on boot
- Heat uses scripts to launch cfntools
- Use curl on the metadata server inside the VM
    /var/lib/cloud/
    /var/log/cloud-init.log

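Where curl is not available in the guest image, the metadata check can be done from Python instead. The sketch assumes the EC2-compatible metadata path at the link-local address 169.254.169.254, which is what cloud-init queries by default; the function names are hypothetical and the fetch only works from inside a VM.

```python
import urllib.request

# EC2-compatible metadata endpoint as seen from inside a guest (assumption:
# the deployment uses the default link-local address).
METADATA_BASE = "http://169.254.169.254/latest/meta-data/"


def metadata_url(key=""):
    """Return the metadata URL for a key such as 'hostname'."""
    return METADATA_BASE + key.lstrip("/")


def fetch_metadata(key="", timeout=5):
    """Fetch one metadata value; raises URLError if the server is unreachable."""
    with urllib.request.urlopen(metadata_url(key), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(metadata_url("hostname"))
```

A timeout or connection error from `fetch_metadata()` inside the VM points at the network path to the metadata service rather than at cloud-init itself.
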
Q&A
- http://ask.openstack.org/
- http://docs.openstack.org/
Thank you

Bonus Material
OpenStack Orchestration: Heat

OpenStack Orchestration: Heat
- Uses Nova, Cinder, Neutron to assemble complete stacks of resources
    heat stack-list
    heat resource-list|show <stack>
    heat event-list|show <stack>
- Usually necessary to query the actual OpenStack service for further information

OpenStack Imaging: Glance
- Usually issues are in the configured glance backend itself (e.g. RBD, swift, filesystem), so debugging concentrates on those
- Filesystem: /var/lib/glance/images
- RBD:
    ceph -w
    rbd -p <pool> ls

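For the filesystem backend, a common failure is an image that Glance still lists but whose file is gone. The sketch below compares a list of image IDs (for example taken from glance image-list) with the store directory, relying on the filesystem store naming each file after its image ID. The function name is hypothetical.

```python
import os


def missing_image_files(image_ids, store_dir="/var/lib/glance/images"):
    """Return the image IDs that have no matching file in the store directory."""
    return [
        image_id
        for image_id in image_ids
        if not os.path.isfile(os.path.join(store_dir, image_id))
    ]


if __name__ == "__main__":
    # Made-up image ID for illustration.
    print(missing_image_files(["aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"]))
```
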
SUSE Cloud