AvailabilityGuard William Weber, Market Experts


AvailabilityGuard
Preventing Outages on Your Critical IT Infrastructure
William Weber, Market Experts
William.weber@markedist.com

About us
» Founded in 2005, serving leading enterprises worldwide
» We help our customers to
  » Prevent outages on their critical IT infrastructure
  » Secure their data storage environment
Selected Partners
ADVANCED TECHNOLOGY PARTNER

The Challenge: Outage Prevention
On-Premise or Hybrid IT, engineered for always-on operation
» On-premise & Private cloud: App Server, Database Software, Clustering, OS, Hypervisor & Private-Cloud Services, Compute Hardware, SAN, Storage & Storage Services
» Public cloud
Outage:
» Single-Point-of-Failure
» Complexity
» Thousands of vendor best-practices
» Constant configuration changes

The Solution: AvailabilityGuard
» Automatic daily verification of Production, HA & DR systems
» Validates compliance with vendor best practices
» Validates that HA systems are always fail-over ready
» Validates that Production and DR are always in sync
» Clear visibility into RPO and other key resilience metrics
» Supports both on-prem and public cloud environments
AvailabilityGuard helps make IT work – ALL THE TIME

How AvailabilityGuard works
Collect
» Daily collection of configuration from all infrastructure layers
» Non-intrusive
» Agentless
Detect
» Correlates config across layers to build a visual topology
» Analyzes config using a built-in risk detection engine (7,000 issues)
» Detects single-points-of-failure and other misconfigurations
Visualize & track
» Single-pane-of-glass for configuration quality & operational stability
» Presents issues by application or business service
» Automatically reports on successful resolution of issues
Prescribe
» Sends actionable alerts to appropriate teams
» Suggests remedial steps to prevent future outages
» Integrates with existing incident management systems

AvailabilityGuard knowledgebase (7,000 issues)
Coverage areas: Data protection; Availability management
Replication
› Data completeness
› Data consistency
› Process failures
Data protection SLA
› RPO management
› Data retention
› Performance
› Protection, right location
SAN best practices
› I/O multi-pathing best practices
› SAN security / mable storage
Diagram labels: I/O, replication replication; Server performance; SAN best practices
Data access
› Access to shared storage (HA) and replicas (DR)
› Redundancy and performance
Virtualization
› Storage allocation
› Dependency mapping
Database
› Data protection validation, detect corruption
› Performance
› Vendor recommendations
Host configuration
› OS version / SPs / patches
› Installed products / versions
› Kernel parameters
› Network services
Virtualization
› HA & DR
› Vendor best practices
Clustering
› Consistent configuration across cluster nodes
› Vendor best practices
› Local / geo clustering
Application Server
› Load balancing
› Deployment best practices
Redundancy
› Multi-pathing, Network, NIC / teaming
› DNS, LDAP, AD
› DB file configuration

Operational stability dashboard

Drill-down on issues, with automatic visualization

Single-point-of-failure at blade chassis level
Diagram: Anti-Affinity Rule; VMs associated with the rule (VM1, VM2); chassis-1; single point of failure
(1) Active-Active Windows VMs separated to different hardware by VMware Anti-Affinity rules in order to ensure service availability and prevent single point of failure
(2) The VMs are running on different ESXi hosts but all of them are running on the same BLADE CHASSIS
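The check described on this slide can be sketched in a few lines. This is only an illustration, not AvailabilityGuard's actual detection logic: the function name, the host names, and the three input maps are all hypothetical.

```python
from collections import defaultdict

def chassis_spof(rule_vms, vm_host, host_chassis):
    """Return every chassis that carries all VMs of an anti-affinity rule.

    Hypothetical inputs: the VM names in the rule, a VM -> ESXi host map,
    and an ESXi host -> blade chassis map.
    """
    chassis_to_vms = defaultdict(set)
    for vm in rule_vms:
        chassis_to_vms[host_chassis[vm_host[vm]]].add(vm)
    # A chassis holding every VM of the rule defeats the rule's purpose.
    return sorted(
        chassis
        for chassis, vms in chassis_to_vms.items()
        if len(vms) > 1 and vms == set(rule_vms)
    )

rule_vms = ["VM1", "VM2"]
vm_host = {"VM1": "esxi-a", "VM2": "esxi-b"}                    # different hosts
host_chassis = {"esxi-a": "chassis-1", "esxi-b": "chassis-1"}   # same chassis
print(chassis_spof(rule_vms, vm_host, host_chassis))
```

The rule is satisfied at host level, yet the function still reports chassis-1, which is the situation the slide illustrates.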

Examples of Issues Detected

Storage access issue in cluster
Production site: cluster running ClusterServiceX, with failover / switch-over between nodes
» ClusterServiceX shared LUN not mapped to all nodes.
» Impact: cluster not ready for recovery. Downtime on both automated failover and manual switch-over.
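A minimal sketch of this kind of check, assuming the collector has already produced a per-node list of visible LUNs. The function name, node names, and LUN identifiers are hypothetical.

```python
def unmapped_nodes(shared_luns, node_luns):
    """For each shared LUN, list the cluster nodes that cannot see it.

    shared_luns: LUN ids the clustered service depends on (hypothetical ids).
    node_luns:   node name -> set of LUN ids mapped to that node.
    """
    gaps = {}
    for lun in shared_luns:
        missing = sorted(node for node, luns in node_luns.items() if lun not in luns)
        if missing:
            gaps[lun] = missing
    return gaps

node_luns = {
    "node-1": {"lun-10", "lun-11"},
    "node-2": {"lun-10"},            # lun-11 was never mapped here
}
print(unmapped_nodes(["lun-10", "lun-11"], node_luns))
```

An empty result means every node can take over the service's storage. Any entry names the nodes on which a failover would leave the service without its data.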

Cluster configuration drift
HA blueprint (clustered, LB, ...): configuration drift between servers
Server 1
» Hardware: 2 x HBA
» Software: Microsoft .NET 2.0 SP 2; Windows x64 SP 1; Oracle MTS Recovery Service
» OS configuration
  › DNS 0
  › Page Files: 1 x 1 GB (c:\), 1 x 4 GB (d:\)
  › Kernel Parameters: Number of open files: 32767
Server 2
» Hardware: 1 x HBA
» Software: Microsoft .NET 2.0 SP 1; Windows x64 SP 1; Oracle MTS Recovery Service
» OS configuration
  › DNS Configuration: 192.168.68.51
  › Page Files: 1 x 1 GB (c:\), 1 x 4 GB (d:\)
  › Kernel Parameters: Number of open files: 8192
Impact: Failover/HA broken. Unexpected downtime when least desired.
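Drift between two cluster nodes can be illustrated as a key-by-key comparison of their collected settings. The sketch below uses values from the slide, but the dictionary keys and the function name are assumptions made for illustration, not the product's data model.

```python
def config_drift(node_a, node_b):
    """Return {setting: (value_on_a, value_on_b)} for every differing setting."""
    keys = sorted(set(node_a) | set(node_b))
    return {
        key: (node_a.get(key), node_b.get(key))
        for key in keys
        if node_a.get(key) != node_b.get(key)
    }

# Hypothetical key names; values taken from the two servers on the slide.
server_1 = {
    "hba_count": 2,
    "dotnet": "2.0 SP 2",
    "os": "Windows x64 SP 1",
    "page_files": ("1 GB", "4 GB"),
    "max_open_files": 32767,
}
server_2 = {
    "hba_count": 1,
    "dotnet": "2.0 SP 1",
    "os": "Windows x64 SP 1",
    "page_files": ("1 GB", "4 GB"),
    "max_open_files": 8192,
}

for setting, (a, b) in config_drift(server_1, server_2).items():
    print(f"{setting}: {a} != {b}")
```

Settings that match on both servers (OS level, page files) are omitted, so the output lists only the items that need to be reconciled.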

SAN I/O path – single-point-of-failure
Production site: DB / Filesystem hosts connected to a storage array
» 1 Array Port Mapping & single I/O path: single-point-of-failure & degraded performance
» 4 Array Port Mappings & multiple I/O paths
» 4 Array Port Mappings & multiple I/O paths
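One way to picture this check is to count the distinct I/O paths each host has to the array and flag any host below a minimum. The host names, path identifiers, and the default threshold of two paths are assumptions made for this sketch.

```python
def single_path_hosts(host_paths, minimum=2):
    """Return hosts whose number of distinct I/O paths is below `minimum`."""
    return sorted(
        host for host, paths in host_paths.items() if len(set(paths)) < minimum
    )

# Hypothetical inventory mirroring the slide: one host with a single array
# port mapping, two hosts with four mappings each.
host_paths = {
    "db-host-1": ["port-1"],
    "db-host-2": ["port-1", "port-2", "port-3", "port-4"],
    "db-host-3": ["port-1", "port-2", "port-3", "port-4"],
}
print(single_path_hosts(host_paths))
```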

Partial replication
Site A (Symmetrix VMAX) replicating to Site B (Symmetrix VMAX)
» Existing DB/Filesystem volumes: SRDF/S (synchronized)
» More capacity required. New storage volume allocated: no replication
» Impact: data loss upon fail-over / workload shift
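A partial-replication check reduces to a set difference between the volumes a database or filesystem uses and the volumes that have a replica. The volume names and the function name below are hypothetical.

```python
def unreplicated_volumes(data_volumes, replicated_volumes):
    """Return volumes used by the DB/filesystem that have no replica."""
    return sorted(set(data_volumes) - set(replicated_volumes))

# Hypothetical example: vol-3 was added for extra capacity but never
# placed under replication.
data_volumes = ["vol-1", "vol-2", "vol-3"]
replicated_volumes = ["vol-1", "vol-2"]
print(unreplicated_volumes(data_volumes, replicated_volumes))
```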

Deadly misconfigurations in virtual infrastructure
Production site cluster: port group label and VLAN ID per host
» SAP 01 / 6
» SAP 01 / 6
» SAP 01 / 6
» SAP-01 / 6: incorrect label (typo?)
» SAP 01 / 5: inconsistent VLAN ID (typo?)
Impact: VMs can’t communicate with peers, leading to application failures
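A simple way to surface such typos is to treat the most common (label, VLAN ID) pair in the cluster as the expected one and report every host that deviates. This majority-vote approach is an assumption made for illustration, and the host names are hypothetical.

```python
from collections import Counter

def port_group_outliers(host_port_groups):
    """Return hosts whose (label, vlan_id) differs from the cluster majority."""
    majority, _count = Counter(host_port_groups.values()).most_common(1)[0]
    return {
        host: port_group
        for host, port_group in host_port_groups.items()
        if port_group != majority
    }

# Values taken from the slide; host names are made up.
host_port_groups = {
    "esxi-1": ("SAP 01", 6),
    "esxi-2": ("SAP 01", 6),
    "esxi-3": ("SAP 01", 6),
    "esxi-4": ("SAP-01", 6),   # label typo
    "esxi-5": ("SAP 01", 5),   # VLAN ID typo
}
print(port_group_outliers(host_port_groups))
```

A majority vote needs no separately maintained reference value, but it only works while most hosts are configured correctly.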

Support matrix
OS, Hypervisors & Blades
» Linux RH 3, SuSE 8, Amazon Linux
» Windows Server (all releases)
» Solaris 8, HP-UX 11.0, AIX 4
» VMware vSphere, Microsoft Hyper-V
» IBM PowerVM, Oracle VM, Zones
» Cisco UCS, HP BL/Synergy
Storage & SAN
» EMC Symmetrix: DMX, VMAX, PowerMAX
» EMC XtremIO, Data Domain, Isilon
» EMC VNX SAN, Unity, VPLEX
» NetApp FAS/AFF: cDot, 7-mode
» Hitachi VSP, USP, AMS, G-Series, HCP
» IBM DS, XIV, SVC, Storwize, A/V9000/R
» Infinidat InfiniBox
» HP XP, 3PAR
» SAN: Brocade, Cisco, HP VirtualConnect
Databases
» Oracle 8.1.7, Exadata
» MS SQL Server 2000 SP3
» Sybase 12.5, DB2 UDB 8.1
» AWS RDS, Azure Database*
Cloud Vendors
» Amazon Web Services
» Microsoft Azure*
LVM & Multi-Pathing
» All supported OS LVMs, VxVM, LVM 2, ASM, ZFS, more
» EMC PowerPath, Veritas DMP, Hitachi HDLM, IBM SDD, NetApp DSM
» Native: Linux, Windows, AIX, HPUX PVLinks, Solaris MPxIO, ESXi
Replication
» EMC TimeFinder, SRDF, RecoverPoint
» EMC MirrorView, SnapView, Active-Active
» NetApp SnapMirror, SnapShots, SnapVault
» Hitachi TrueCopy, ShadowImage, GAD
» Hitachi Universal Replicator, TrueShadow
» HP Snapshot, RemoteCopy
» IBM Flash/Global Copy, Metro/Global Mirror
» Oracle Data Guard, GoldenGate
» Microsoft SQL Server Always On
» Veritas Volume Replicator
» Infinidat Snapshot, Clone, RemoteCopy
» Zerto, vSphere replication
» AWS snapshots, S3 replication
» Azure snapshots, storage replication*
Converged & HCI
» EMC vxRail, vxRack SDDC, Vblock/VxBlock
» NetApp FlexPod, HPE ConvergedSystem
» IBM Pure Systems, Cisco HyperFlex
» VMware VSAN, EMC ScaleIO
Clustering
» VMware HA / FT / SRM / vMSC
» IBM PowerHA (HA/CMP)
» Microsoft Cluster
» Oracle RAC & CRS
» HP MC/SG, PolyServe
» VCS, Sun Cluster, Linux cluster
Application Servers
» IBM WebSphere
» Oracle WebLogic
» Apache Tomcat
Containers & Orchestration
» Amazon EC2 Container Service (ECS)
» Azure Service Fabric (ASF)*
» Kubernetes (Unmanaged / managed)
» Docker
Load balancers & DNS
» F5, AWS ELB/ALB
» Amazon Route 53, Azure Load Balancer, Application Gateway*
» Azure Traffic Manager*
Cloud Storage
» Amazon Elastic Block Storage, S3, Glacier
» Azure Blob / Disk Storage*
(*) Public Cloud roadmap items

Architecture: On-premise
Master: Win Server 2K8/12/16, AG software; 11i/12c
Scale-out collectors (optional)
Storage arrays: SSH (EMC/IBM), HTTP (HDS/HP/NETAPP)
» SSH to CLI proxy (Symmetrix / CLAR / VNX / DS / XIV / 3PAR)
» SSH (V7000 / SVC / DataDomain / Isilon / RecoverPoint)
» HTTP (HDS / HP XP / VPLEX)
» ZAPI (NetApp Filer)
Private cloud: SOAP (vCenter), SSH (Unix), WRM/WMI (VMM)
» AIX VIO: HMC CLI / SSH
» VMware: vCenter API
» Hyper-V: SCVMM CLI
» UNIX: OS commands
SAN switches: SSH / HTTP / Rest
» Cisco MDS CLI
» HP vConnect CLI
» Brocade CLI
» BNA Rest API
Databases: JDBC
» Query metadata tables / console
Servers (physical & virtual): SSH (Unix), WMI/WinRM (Windows) / blade manager
» OS and vendor commands / queries
» UCS Manager, HP VC
All executed commands are strictly read-only
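The storage-array part of this slide amounts to a lookup from array family to collection protocol. The sketch below encodes only the pairs listed on the slide. The table name and function are hypothetical and say nothing about how the product stores this mapping.

```python
# Protocol -> storage array families, as listed on the slide.
STORAGE_PROTOCOLS = {
    "SSH to CLI proxy": ["Symmetrix", "CLAR", "VNX", "DS", "XIV", "3PAR"],
    "SSH": ["V7000", "SVC", "DataDomain", "Isilon", "RecoverPoint"],
    "HTTP": ["HDS", "HP XP", "VPLEX"],
    "ZAPI": ["NetApp Filer"],
}

def protocol_for(array_family):
    """Return the collection protocol the slide lists for a storage array family."""
    for protocol, families in STORAGE_PROTOCOLS.items():
        if array_family in families:
            return protocol
    raise KeyError(f"no protocol listed for {array_family!r}")

print(protocol_for("NetApp Filer"))
print(protocol_for("VPLEX"))
```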

Next Step: AvailabilityGuard HealthCheck
» Detects single points of failure and misconfigurations that cause downtime or data loss in production
» Performed by a Continuity Software engineer using AvailabilityGuard
» Includes a one-time scan of up to 100 physical servers and all their associated infrastructure (VMs, storage, clustering, databases, ...)
» Initial results viewable during the HealthCheck
» A complete HealthCheck report delivered following the HealthCheck
  › See Sample HealthCheck report
» Minimal customer effort required

Thank You!
William Weber, Director
william.weber@markedist.com
34 679 250 046
Market Experts Distribution, SL
http://markedist.com/
Copyright 2020 Continuity Software
