VSS Automated Disaster Recovery - 3.vmware

Transcription

Automated Disaster Recovery
Simon Caruso
Senior Systems Engineer, VMware

What is a Disaster?
  Complete loss of a data center for an extended period of time
  Declaration of a disaster usually requires consensus from multiple parts of the organization (at the C*O level)
What is not a disaster?
  Failure of an individual host
  A temporary service interruption

The Current State of Physical Disaster Recovery
  DR services tiered according to business needs
    Tier   RPO         RTO
    I      Immediate   Immediate
    II     4 hrs.      8 hrs.
    III    24 hours    24 - 48 hours
    [Chart label: Cost]
  Physical DR is challenging
    Maintain identical hardware at both locations
    Apply upgrades and patches in parallel
    Little automation
    Error-prone and difficult to test

Advantages of Virtual Disaster Recovery
  Virtual machines are portable
  Virtual hardware can be automatically configured
  Test and failover can be automated (minimizes human error)
  The need for idle hardware is reduced
  Costs are lowered, and the quality of service is raised

Introducing VMware Site Recovery Manager
  Site Recovery Manager leverages VMware Infrastructure to deliver advanced disaster recovery management and automation
  Simplifies and automates disaster recovery workflows:
    Setup, testing, failover
  Turns manual recovery runbooks into automated recovery plans
  Provides central management of recovery plans from VirtualCenter
  Works with VMware Infrastructure to make disaster recovery rapid, reliable, manageable, affordable

Site Recovery Manager at a Glance
  [Diagram: Site A and Site B, each with VirtualCenter and Site Recovery Manager, acting as protected site and recovery site; datastore groups are copied between the sites by array replication]
  Supports bidirectional site protection
  Protected VMs are online in the protected site and offline at the recovery site
  When the protected VMs become unavailable, they are powered on at the recovery site

Server Side Components *
  Site 1: VC Server 1, VCMS 1 DB, SRM Server 1, SRM 1 DB, Array 1 with block replication software
  Site 2: VC Server 2, VCMS 2 DB, SRM Server 2, SRM 2 DB, Array 2 with block replication software
  * Note: Conceptual drawing only. The Site Recovery Manager Server may run on a different system than the VCMS.

Key Concepts And Their Relationships
  [Diagram: LUN 1 to LUN 5 and VMFS 1 to VMFS 4 are organized into Datastore Groups 1 to 3; Protection Groups 1 to 3 sit on top of the datastore groups; recovery plans reference the protection groups]
  Protected Site
    Datastore Group 1, Datastore Group 2, Datastore Group 3
    Protection Group 1, Protection Group 2, Protection Group 3
  Recovery Site
    Recovery Plan 1 (Whole Site): Protection Group 1, Protection Group 2, Protection Group 3
    Recovery Plan 2 (Subset): Protection Group 1
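The relationships on this slide can be sketched as a small data model. This is only an illustration, not SRM's object model: the class names, the VM names, and the one-to-one pairing of protection groups with datastore groups are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatastoreGroup:
    """A set of replicated datastores that must fail over together."""
    name: str


@dataclass
class ProtectionGroup:
    """VMs that fail over together (assumed here: one datastore group each)."""
    name: str
    datastore_group: DatastoreGroup
    vms: list = field(default_factory=list)


@dataclass
class RecoveryPlan:
    """A plan at the recovery site covering one or more protection groups."""
    name: str
    protection_groups: list

    def vms_to_recover(self):
        # Flatten the VMs of every protection group the plan references.
        return [vm for pg in self.protection_groups for vm in pg.vms]


# Hypothetical inventory; the VM names are made up for the example.
pg1 = ProtectionGroup("Protection Group 1", DatastoreGroup("Datastore Group 1"), ["vm1", "vm2"])
pg2 = ProtectionGroup("Protection Group 2", DatastoreGroup("Datastore Group 2"), ["vm3"])
pg3 = ProtectionGroup("Protection Group 3", DatastoreGroup("Datastore Group 3"), ["vm4"])

whole_site = RecoveryPlan("Recovery Plan 1 (Whole Site)", [pg1, pg2, pg3])
subset = RecoveryPlan("Recovery Plan 2 (Subset)", [pg1])

print(whole_site.vms_to_recover())
print(subset.vms_to_recover())
```

The same protection group can appear in more than one recovery plan, which is how a whole-site plan and a subset plan coexist.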

Array Integration with Site Recovery Manager
  Vendor-specific scripts support:
    Array discovery
    Replicated LUN discovery
    Test initiation (simulated failover in an isolated environment)
    Failover initiation (actual failover of services to the recovery site)
  In cooperation with VMware, and with VMware's full support, the storage vendors create the storage replication adapters for their respective storage arrays

VMware Site Recovery Manager Licensing
  [Diagram: Site 1 (protected site) and Site 2 (recovery site), each with VirtualCenter and Site Recovery Manager; the protected site hosts both SRM-protected VMs and VMs not protected by Site Recovery Manager]
  SRM is licensed per CPU socket on the ESX server that hosts the protected virtual machines in the protected site
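As a rough illustration of the per-socket rule, the sketch below counts sockets only on protected-site hosts that run at least one protected VM. The host names and counts are hypothetical, and the function reflects one reading of the slide rather than an official licensing calculator.

```python
def licensed_sockets(hosts):
    """Sum CPU sockets over hosts that run at least one SRM-protected VM."""
    return sum(h["sockets"] for h in hosts if h["protected_vms"] > 0)


# Hypothetical protected-site inventory.
protected_site_hosts = [
    {"name": "esx01", "sockets": 2, "protected_vms": 6},
    {"name": "esx02", "sockets": 4, "protected_vms": 0},  # hosts no protected VMs
    {"name": "esx03", "sockets": 2, "protected_vms": 3},
]

print(licensed_sockets(protected_site_hosts))  # 4
```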

Site Recovery Manager 1.0 Prerequisites
  ESX 3.0.2, ESX 3.5 or ESXi
  VirtualCenter (VC) server version 2.5 installed at the protected site and at the recovery site
  Site Recovery Manager server installed at the protected site and at the recovery site
  Site Recovery Manager plug-in installed on the VMware Infrastructure Clients that will access the protected and recovery sites
  Network configuration that allows TCP connectivity between VC servers and SRM servers
  An Oracle or SQL Server database that uses ODBC for connectivity in the protected site and in the recovery site
  A Site Recovery Manager license file installed on the VC license server at the protected site and at the recovery site
  Pre-configured array-based replication between the protected site and the recovery site
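The TCP connectivity prerequisite can be spot-checked before installation with a few lines of Python. This is a generic reachability probe, not an SRM tool; the host names and port numbers to test are not given in this presentation and must come from your own environment.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

A typical use would be to call `can_connect` from each SRM server toward each VC server (and the reverse) with the ports your installation actually uses, and to treat any `False` result as something to resolve with the network team first.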

Site Recovery Manager Installation Workflow
  At the protected site the following activities are completed:
    Installation of the SRM server
    Installation of the SRM Plugin into the VI Client
    Installation of the Storage Replication Adapter (SRA)
  At the recovery site the following activities are completed:
    Installation of the SRM server
    Installation of the SRM Plugin into the VI Client *
    Installation of the Storage Replication Adapter (SRA)
  It is important to complete the workflows in the order detailed in this presentation
  * Note: Optional step, only required if a different instance of the VI Client is used to access the recovery site

Site Recovery Manager User Interface
  [Screenshot callouts: SRM UI Access; Local and Paired Site; Protection Setup; Recovery Setup]

Setup Workflow – Protection Site
  At the protection site the following setup activities are completed:
    The user pairs the SRM servers at the protected and recovery sites
    Security certificates are established between the SRM servers and the VC servers
  Certificates that are not properly signed will result in yellow warning signs. Reciprocity will still be established, allowing you to continue to the next step in the workflow.

Setup Workflow – Protection Site (continued)
  Array Managers Configuration
    Select the correct manager type from the Manager type drop-down box
  Storage Partner Participation
    VMware provides the SRA specification
    Storage partners create the SRA
    Storage partners test the SRA
    VMware reviews the SRA test results
    SRA support with SRM is granted if all tests are passed

Setup Workflow – Protection Site (continued)
  SRM identifies the available arrays on the protection and recovery sides and the replicated datastores, and determines the datastore groups
  [Screenshot callouts: Protection Side Array Discovery; Recovery Side Array Discovery; Replicated Datastores and Datastore Groups]

Setup Workflow – Protection Site (continued)
  Using the Inventory Preferences Mapper, the user maps resources in the protected site to their counterparts in the recovery site.

Setup Workflow – Protection Site (continued)
  A protection group is a group of VMs that will be failed over together to the recovery site
  Working through the Protection Group wizard, you will need to select a temporary location for placeholder VM configuration files for the protected VMs at the recovery site.

Setup Workflow – Protection Site (continued)
  Working through the Protection Group wizard, a user selects which VMs need to be protected and assigns them to a protection group
  The creation of a protection group results in VC inventory updates in the recovery site

Setup Workflow – Recovery Site
  At the recovery site the following setup activity is completed:
    The user creates a recovery plan, which is associated with one or more protection groups

Site Recovery Manager Recovery Plan
  [Screenshot callouts, in order: VM Shutdown; High Priority VM Shutdown; Prepare Storage; High Priority VM Recovery; Normal Priority VM Recovery]

Site Recovery Manager Recovery Plan (continued)
  [Screenshot callouts, in order: Low Priority VM Recovery; Post Test Cleanup; Storage Reset]
  Site Recovery Manager Recovery Plan Benefits:
    Turn manual BC/DR run books into an automated process
    Specify the steps of the recovery process in VirtualCenter
    Provide a way to test your BC/DR plan in an isolated environment at the recovery site without impacting the protected VMs in the protected site
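The recovery plan brings VMs back in priority order: high, then normal, then low. The sketch below shows only that ordering idea; the function, the priority labels as strings, and the VM names are hypothetical and are not part of any SRM interface.

```python
PRIORITY_ORDER = ["high", "normal", "low"]


def recovery_order(vms):
    """Return VM names in power-on order.

    vms is a list of (name, priority) tuples. Python's sort is stable,
    so VMs with the same priority keep their original relative order.
    """
    rank = {priority: i for i, priority in enumerate(PRIORITY_ORDER)}
    return [name for name, _ in sorted(vms, key=lambda vm: rank[vm[1]])]


# Hypothetical VMs assigned to priorities.
plan_vms = [
    ("web01", "low"),
    ("db01", "high"),
    ("app01", "normal"),
    ("db02", "high"),
]

print(recovery_order(plan_vms))  # ['db01', 'db02', 'app01', 'web01']
```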

Testing a Recovery Plan
  SRM enables you to ‘Test’ a recovery plan by simulating a failover with zero downtime to the protected VMs in the protected site
  Storage configuration during an SRM Test failover from Site A to Site B for datastore ‘shared-san-2’
  Site A - Protected Site
    Source LUN (shared-san-2): Read Write Enabled
    Protected VMs (app vm7 to app vm12): protected VMs that will be recovered to Site B
  Site B - Recovery Site
    Target LUN (shared-san-2): Write Disabled (read only)
    Clone LUN (shared-san-2): Read Write Enabled
    Protected VMs (app vm7 to app vm12): protected VMs powered on in Site B during the SRM Test failover
  Data replication continues between the Source LUN and Target LUN
  The data synchronization between the Target LUN and the Clone LUN is suspended
  Note: Datastore ‘shared-san-1’ will be in the same configuration state as ‘shared-san-2’

Testing a Recovery Plan (continued)
  [Screenshot callouts: Recovery Only; Status; Success; Errors; Success; Waiting for Input; Test Only]

Executing an Actual Failover
  WARNING - Executing an actual failover will permanently alter virtual machines and infrastructure of both the protected and recovery sites
  Storage configuration after running a Recovery in SRM (Actual Failover) from Site A to Site B
  Site A - Protected Site
    Source LUN (shared-san-2): Write Disabled (read only)
    Protected VMs (app vm7 to app vm12): all powered off by SRM at the start of the SRM Recovery
  Site B - Recovery Site
    Target LUN (shared-san-2): Read Write Enabled
    Protected VMs (app vm7 to app vm12): all powered on by SRM during the SRM Recovery
  Data replication is suspended
  Note: A Clone LUN is not used during an actual failover in SRM.

Executing an Actual Failover (continued)
  WARNING - Executing an actual failover will permanently alter virtual machines and infrastructure of both the protected and recovery sites
  WARNING - Failback to the protected site is not an automated process in SRM 1.0
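The storage states described for a test failover and for an actual failover can be put side by side in a small lookup table. This is a summary sketch only: the dictionary keys and the helper function are made up for the example, and the conclusion that test VMs run from the Clone LUN is inferred from the slide text (the Target LUN is read only during a test) rather than stated outright.

```python
# LUN access and replication state per mode, as described on the slides.
LUN_STATE = {
    "test": {
        "source": "read-write",
        "target": "read-only",
        "clone": "read-write",
        "replication": "continues",
    },
    "failover": {
        "source": "read-only",
        "target": "read-write",
        "clone": None,  # a Clone LUN is not used during an actual failover
        "replication": "suspended",
    },
}


def recovery_site_lun(mode):
    """Return the writable LUN the recovered VMs would use at the recovery site."""
    return "clone" if LUN_STATE[mode]["clone"] else "target"


print(recovery_site_lun("test"))      # clone
print(recovery_site_lun("failover"))  # target
```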

Datastore Re-signature in Site Recovery Manager
  SRM will automatically perform a re-signature on the datastores in the recovery site that were replicated from the SRM protected site (LVM.EnableResignature set to 1)
  With a typical re-signature, datastore names will change to snap-xxxxxxxx-datastorename, for example:
    snap-00000002-shared-san-1
    snap-00000002-shared-san-2
  With an SRM-initiated re-signature, datastores will maintain the original datastore name:
    shared-san-1
    shared-san-2
  WARNING - The re-signature of the target datastore has implications during a failback (resync) of data back to the SRM protected site
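The naming difference can be illustrated with a small helper that recovers the original datastore name from a snap-prefixed one. The pattern assumes the prefix is "snap-", eight hexadecimal digits, and a hyphen, which matches the examples on this slide but is an assumption beyond them; the function name is hypothetical.

```python
import re

# Assumed pattern, based on the slide's examples (snap-00000002-shared-san-1).
SNAP_NAME = re.compile(r"^snap-[0-9a-fA-F]{8}-(?P<original>.+)$")


def original_datastore_name(name):
    """Strip a snap-xxxxxxxx- prefix if present; otherwise return name unchanged."""
    match = SNAP_NAME.match(name)
    return match.group("original") if match else name


print(original_datastore_name("snap-00000002-shared-san-1"))  # shared-san-1
print(original_datastore_name("shared-san-2"))                # shared-san-2
```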

Failback Options with Site Recovery Manager 1.0
  SRM 1.0 does not provide a push-button automated failback process
  Failback Options
    Without SRM (no Recovery Plan, no Testing capabilities, no audit trail)
      Unregister the protected virtual machines in the Protected Site VC
      Work with your storage team, reverse data replication
      VM re-inventory in Protected Site VC, restart and re-ip (manual or scripted)
    With SRM (Recovery Plan, Test before Recovery, built-in audit trail)
      Delete the protection groups in the Protected Site VC
      Unregister the protected virtual machines in the Protected Site VC
      Work with your storage team, reverse data replication
      Leverage SRM, complete SRM workflows in the reverse direction from Recovery Site back to the Protected Site
      Repeat the above steps from the Protected Site back to the Recovery Site to complete the re-protection of the virtual machines in the Protected Site

Default Roles and Privileges in Site Recovery Manager

Alarms and Site Status Monitoring
  SRM will support the following alarm notification actions:
    Send e-mail to specified address
    Send SNMP trap to VC trap receivers
    Execute specified command on VC host
  We recommend you complete setup of alarm notifications for:
    Remote Site Down
    Remote Site Ping Failed
    Replication Group Removed
    Recovery Plan Destroyed
    License Server Unreachable
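One way to keep track of this recommendation is a simple checklist that flags recommended alarms with no notification action yet. The sketch below is bookkeeping only and does not talk to VirtualCenter; the action labels, the e-mail address, and the configuration shown are hypothetical.

```python
RECOMMENDED_ALARMS = [
    "Remote Site Down",
    "Remote Site Ping Failed",
    "Replication Group Removed",
    "Recovery Plan Destroyed",
    "License Server Unreachable",
]

# Hypothetical record of what has been configured so far.
alarm_config = {
    "Remote Site Down": [("email", "dr-team@example.com"), ("snmp_trap", None)],
    "Remote Site Ping Failed": [("email", "dr-team@example.com")],
    "Replication Group Removed": [("email", "dr-team@example.com")],
}


def missing_alarms(config):
    """Return recommended alarms that have no notification action configured."""
    return [alarm for alarm in RECOMMENDED_ALARMS if not config.get(alarm)]


print(missing_alarms(alarm_config))
# ['Recovery Plan Destroyed', 'License Server Unreachable']
```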

Site Recovery Manager Server Monitoring
  SRM will raise VC events for the following conditions:
    Disk Space Low
    CPU use exceeded limit
    Memory low
    Remote Site not responding
    Remote Site heartbeat failed
    Recovery Plan Test started, ended, succeeded, failed, or cancelled
    Virtual Machine Recovery started, ended, succeeded, failed, or reports a warning

Site Recovery Manager Core Benefits
  Expand disaster recovery protection
    Now any workload in a VM can be protected with minimal incremental effort and cost
  Reduce time to recovery
    As soon as a disaster is declared, a single button kicks off the recovery sequence for hundreds of VMs
  Increase reliability of recovery
    Replication of system state ensures a VM has all it needs to start up
    Hardware independence eliminates failures due to different hardware
    Easier testing based on the actual failover sequence allows more frequent and more realistic tests

Summary
  Site Recovery Manager Leverages VMware Infrastructure to Make Disaster Recovery
  Rapid
    Automate disaster recovery process
    Eliminate complexities of traditional recovery
  Reliable
    Ensure proper execution of recovery plan
    Enable easier, more frequent tests
  Manageable
    Centrally manage recovery plans
    Make plans dynamic to match environment
  Affordable
    Utilize recovery site infrastructure
    Reduce management costs
