Exadata Database Machine: Maximum Availability Architecture (MAA) - Oracle

Transcription

Exadata Database Machine: Maximum Availability Architecture (MAA)
Platinum Tier Focused
April 2020

Safe harbor statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.

Copyright 2020 Oracle and/or its affiliates.

Oracle Maximum Availability Architecture (MAA) Solution Options

BRONZE
Dev, Test, Prod
- Single Instance or Multitenant Database with Backups
- Single Instance with Clusterware Restart
- Advanced backup/restore with RMAN
- Optional ZDLRA with incremental forever and near zero RPO
- Storage redundancy and validation with ASM
- Multitenant Database/Resource Management with PDB features
- Online Maintenance
- Some corruption protection
- Flashback technologies

[Diagram: Primary Availability Domain and Secondary Availability Domain, with backups]

Outage Matrix: RTO / RPO Service Level Objectives (f1)
Unplanned Outage
- Recoverable node or instance failure: Minutes (f2)
- Disasters (corruptions and site failures): Hours to days. RPO since last backup or near zero with ZDLRA
Planned Maintenance
- Software/hardware updates: Minutes (f2)
- Major database upgrade: Minutes to hour

f1: RPO 0 unless explicitly specified
f2: Exadata systems have RAC, but a Bronze Exadata configuration with a Single Instance database running with Oracle Clusterware has the highest consolidation density to reduce costs

SILVER
Prod/Departmental
Bronze +
- Real Application Clustering (RAC)
- Application Continuity

[Diagram: Primary Availability Domain with RAC Database and Local Backup; Secondary Availability Domain with Replicated Backups]

Checklist found in MAA savailability-5169724.pdf

Outage Matrix: RTO/RPO Service Level Objectives (f1)
Unplanned Outage
- Recoverable node or instance failure: Single digit seconds (f2)
- Disasters (corruptions and site failures): Hours to days. RPO since last backup or near zero with ZDLRA
Planned Maintenance
- Software/Hardware updates: Zero (f2)
- Major database upgrade: Minutes to hour

f1: RPO 0 unless explicitly specified
f2: To achieve zero downtime or lowest impact, apply application checklist best practices

Transparent Application Continuity (TAC)
- Uses Application Continuity and Oracle Real Application Clusters
- Transparently tracks and records session information in case there is a failure
- Built inside of the database, so it works without any application changes
- Rebuilds session state and replays in-flight transactions upon unplanned failure
- Planned maintenance can be handled by TAC to drain sessions from one or more nodes
- Adapts as applications change: protected for the future

[Diagram labels: "Application does not see errors during ...", "...rs/Timeouts hidden"]

Planned Maintenance
Planned Maintenance (without Outages!):
1. Database Service is relocated or stopped
2. Service starts on another RAC instance
3. Sessions connected to the service are drained
4. New sessions connect to Service on another instance
5. Results from Database Request returned to user
6. Maintenance activities can start on first node (rolling)

[Diagram: RAC Cluster, with callouts numbered 1 to 6]
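The drain-and-relocate sequence above can be sketched as a small simulation. This is only an illustration of the ordering of the steps, not how Oracle Clusterware or RAC implements draining; the `Instance` class, `drain_for_maintenance`, and `rolling_maintenance` are hypothetical names introduced here, and a session hand-off is modeled as instantaneous.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical stand-in for one RAC instance offering a service."""
    name: str
    sessions: list = field(default_factory=list)
    accepting: bool = True  # whether the service is currently offered here


def drain_for_maintenance(instances, target):
    """Stop the service on `target` and move its sessions elsewhere."""
    survivors = [i for i in instances if i is not target and i.accepting]
    if not survivors:
        # Without a second instance there is nowhere to drain to.
        raise RuntimeError("no other instance offers the service")
    target.accepting = False  # steps 1-2: service runs only on the survivors
    while target.sessions:    # steps 3-4: drain, reconnect on least-loaded survivor
        session = target.sessions.pop()
        min(survivors, key=lambda i: len(i.sessions)).sessions.append(session)
    return target             # step 6: target is idle, maintenance can start


def rolling_maintenance(instances):
    """Drain and 'patch' one instance at a time; return the order used."""
    patched = []
    for inst in instances:
        drain_for_maintenance(instances, inst)
        patched.append(inst.name)  # maintenance work would happen here
        inst.accepting = True      # rejoin before moving to the next node
    return patched
```

For example, with `rac1` holding three sessions and `rac2` holding one, `rolling_maintenance([rac1, rac2])` patches `rac1` first while all four sessions sit on `rac2`, then the reverse. No session is dropped at any point.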

Unplanned Outages, without Impact
Outage or Interruption at Database:
1. Database Request interrupted by an Outage or timeout
2. Session reconnects to the RAC Cluster (or Standby) and
3. Database Request replays automatically
4. Result from Database Request returned to user

[Diagram: Primary and Active Data Guard Standby, with callouts numbered 1 to 4]
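The reconnect-and-replay flow above can be illustrated with a generic retry loop. Application Continuity performs this inside the driver and database, so the application does not write such code itself; the sketch below only shows the control flow. `TransientOutage` and `run_with_replay` are hypothetical names, and the request is assumed to be safe to replay.

```python
class TransientOutage(Exception):
    """Hypothetical marker for a recoverable error (instance loss, timeout)."""


def run_with_replay(request, endpoints, max_attempts=3):
    """Run `request(endpoint)`; on a transient outage, reconnect and replay.

    `request` is assumed to leave no committed side effects when it
    raises TransientOutage, so replaying it is safe.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(max_attempts):
        # Step 2: pick the next endpoint (another RAC instance or the standby).
        endpoint = endpoints[attempt % len(endpoints)]
        try:
            # Steps 3-4: replay the request and hand the result to the caller.
            return request(endpoint)
        except TransientOutage as err:
            # Step 1: the request was interrupted; remember why and retry.
            last_error = err
    raise last_error
```

The caller sees either the result or, after all attempts fail, the last error. A single failed instance is hidden from it entirely.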

Checklist for Achieving Zero Application Downtime
1. Use Oracle Clusterware Service (never use default service)
2. Use Recommended Connection String
3. Configure FAN for Connection Pool
4. Drain your service
5. Use Application Continuity or Transparent Application Continuity

1) MAA Whitepaper: Application Checklist for Continuous Service for MAA Solutions
2) Using RHPhelper to Minimize Downtime During Planned Maintenance on Exadata (MOS 2385790.1)
3) Fleet Patch and Provisioning incorporates MAA practices
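To make checklist item 2 concrete, the sketch below assembles an Oracle Net connect descriptor with retry and timeout settings and one address list per site. The parameter names (`CONNECT_TIMEOUT`, `RETRY_COUNT`, `RETRY_DELAY`, `TRANSPORT_CONNECT_TIMEOUT`, `LOAD_BALANCE`) are standard Oracle Net parameters, but the default values, host names, and service name here are placeholders chosen for illustration. The recommended values should be taken from the MAA whitepaper listed above. `build_connect_descriptor` is a hypothetical helper.

```python
def build_connect_descriptor(service_name, scan_hosts, port=1521,
                             connect_timeout=90, retry_count=50,
                             retry_delay=3, transport_connect_timeout=3):
    """Return a connect descriptor string for a named service.

    scan_hosts: one SCAN host per site (for example primary and standby).
    The numeric defaults are illustrative placeholders, not authoritative.
    """
    if not service_name:
        # Checklist item 1: connect through an explicit service.
        raise ValueError("a service name is required")
    if not scan_hosts:
        raise ValueError("at least one host is required")
    address_lists = "".join(
        f"(ADDRESS_LIST=(LOAD_BALANCE=on)"
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port})))"
        for host in scan_hosts
    )
    return (
        f"(DESCRIPTION=(CONNECT_TIMEOUT={connect_timeout})"
        f"(RETRY_COUNT={retry_count})(RETRY_DELAY={retry_delay})"
        f"(TRANSPORT_CONNECT_TIMEOUT={transport_connect_timeout})"
        f"{address_lists}"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})))"
    )
```

Listing both sites in one descriptor lets a client retry against the standby site after a role change without any change to its configuration.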

GOLD
Mission Critical
Silver +
- Active Data Guard
- Comprehensive Data Protection

MAA Architecture:
- At least one standby required across AD or region
- Primary in one data center (or AD) replicated to a Standby in another data center
- Active Data Guard Fast-Start Failover (FSFO)
- Local backups on both primary and standby

[Diagram: Primary Region and Secondary Region, AD1 and AD2, DG FSFO, Standby, Local backup]

Outage Matrix: RTO/RPO Service Level Objectives (f1)
Unplanned Outage
- Recoverable node or instance failure: Single digit seconds (f2)
- Disasters (corruptions and site failures): Seconds to 2 minutes. RPO zero or seconds
Planned Maintenance
- Software/Hardware updates: Zero (f2)
- Major database upgrade: Less than 30 seconds

f1: RPO 0 unless explicitly specified
f2: To achieve zero downtime or lowest impact, apply application checklist best practices

Active Data Guard Overview
- Offload read only or read mostly workloads to the standby database
- DML Redirection
- Zero Data Loss at any Distance
- Automatic Block Repair
- Synchronous zero data loss replication
- Database rolling upgrade to reduce downtime for planned maintenance
- Multi-instance Redo Apply for RAC (In Memory supported)
- Automatic failover for High Availability

[Diagram: Primary (Open Read-Write) and Standby (Open Read-Only)]

Active Data Guard Far Sync
Zero Data Loss Protection at Any Distance

Primary Database
- Production copy

Far Sync Instance (SYNC from the primary, limited distance)
- Oracle control file and log files
- No database files
- No media recovery
- Offload transport compression and/or encryption

Active Standby Database (ASYNC from the Far Sync instance, any distance, redo compressed over WAN)
- Zero data loss failover target
- Database open read-only
- Continuous Oracle validation
- Manual or automatic failover

PLATINUM
Extreme Critical
Gold +
- GoldenGate Active/Active Replication
- Optional Sharding & Editions Based Redefinition

MAA Architecture:
- Each GoldenGate “primary” replica protected by Exadata, RAC and Active Data Guard
- Primary in one data center (or AD) replicated to another Primary in remote data center (or AD)
- Oracle GG & Editions Based Redefinition for zero downtime application upgrade
- Sharding for scalability and fault isolation
- Local backups on both sites
- Achieve zero downtime through custom failover to GG replica

[Diagram: Primary Region and Secondary Region, each with a Primary, a Standby, and a Local backup]

Outage Matrix: RTO/RPO Service Level Objectives (f1)
Unplanned Outage
- Recoverable node or instance failure: Zero or single digit seconds (f2/f3)
- Disasters including corruptions and site failures: Zero (f3)
Planned Maintenance
- Most common software/hardware updates: Zero (f2)
- Major database upgrade, application upgrade: Zero (f3)

f1: RPO 0 unless explicitly specified
f2: To achieve zero downtime or lowest impact, apply application checklist best practices
f3: Application failover is custom or with Global Data Services
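The four outage matrices (Bronze, Silver, Gold, Platinum) are easier to compare side by side. The sketch below records the objectives stated on the tier slides as plain strings and reports which ones change between two tiers. The dictionary keys and the `improvements` function are hypothetical names; footnote markers are omitted and the values are copied from the slides rather than interpreted.

```python
# Service level objectives as stated on the tier slides (footnotes omitted).
OUTAGE_MATRIX = {
    "Bronze": {
        "node_or_instance_failure": "Minutes",
        "disaster": "Hours to days",
        "software_hardware_updates": "Minutes",
        "major_database_upgrade": "Minutes to hour",
    },
    "Silver": {
        "node_or_instance_failure": "Single digit seconds",
        "disaster": "Hours to days",
        "software_hardware_updates": "Zero",
        "major_database_upgrade": "Minutes to hour",
    },
    "Gold": {
        "node_or_instance_failure": "Single digit seconds",
        "disaster": "Seconds to 2 minutes",
        "software_hardware_updates": "Zero",
        "major_database_upgrade": "Less than 30 seconds",
    },
    "Platinum": {
        "node_or_instance_failure": "Zero or single digit seconds",
        "disaster": "Zero",
        "software_hardware_updates": "Zero",
        "major_database_upgrade": "Zero",
    },
}


def improvements(from_tier, to_tier):
    """Return {event: (old, new)} for objectives that differ between tiers."""
    old, new = OUTAGE_MATRIX[from_tier], OUTAGE_MATRIX[to_tier]
    return {event: (old[event], new[event])
            for event in old if old[event] != new[event]}
```

Calling `improvements("Silver", "Gold")` shows that the move to Gold changes only the disaster and major-upgrade objectives, which matches the addition of Active Data Guard at that tier.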

Data Center Architecture & Requirements
- A minimum of 2 Regions for Disaster Recovery Failover
- A Region is a localized geographic area
  - West Coast NAS – Primary example
  - East Coast NAS – Secondary example
- Each Region should have a minimum of 2 Availability Domains (AD)
- Availability Domain Characteristics
  - ADs are isolated from each other & fault tolerant
  - ADs do not share infrastructure such as power, cooling or AD Network
  - A failure of one AD does not affect other ADs
  - ADs within a Region are connected via a high speed network within the same geographical area

[Diagram: Primary Region – West NAS (AD1, AD2) and Secondary Region – East NAS (AD1, AD2); High Speed with 1ms Latency]
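The two numeric requirements above (at least 2 Regions, at least 2 Availability Domains per Region) can be checked mechanically against a planned layout. A minimal sketch, assuming the layout is given as a mapping from region name to AD names; `check_topology` and the region names in the example are hypothetical.

```python
def check_topology(regions, min_regions=2, min_ads=2):
    """Return a list of problems; an empty list means the layout qualifies.

    regions: mapping of region name -> list of availability domain names.
    """
    problems = []
    if len(regions) < min_regions:
        problems.append(
            f"need at least {min_regions} regions, found {len(regions)}")
    for name, ads in regions.items():
        distinct = len(set(ads))  # duplicate AD names count once
        if distinct < min_ads:
            problems.append(
                f"region {name!r} needs at least {min_ads} "
                f"availability domains, found {distinct}")
    return problems
```

For instance, `check_topology({"west": ["AD1", "AD2"], "east": ["AD1", "AD2"]})` returns an empty list, while a single region with one AD yields two problems. The check covers counts only; isolation of power, cooling, and network between ADs has to be verified separately.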

Platinum Reference Architecture
[Diagram]
Primary Region 1 – West US: Application Tier; AD1: ProdA (Read/Write); AD2: STBYA (Read); Sync Transport with Zero Data Loss
Secondary Region – East US: Application Tier; AD1: ProdB (Read/Write); AD2: STBYB (Read); Sync Transport with Zero Data Loss
Active Data Guard Fast-Start Failover, Oracle GoldenGate Replication

Reference Architecture – Zero App Downtime and Zero Data Loss
(Disaster Scenario: Loss of Entire Data Center)
- Automatic Data Guard Failover
- Optional Client failover to ProdB
- Achieve eventual Zero Data Loss by synchronizing replicas
- Zero App and DB Downtime With ProdB Replica

[Diagram]
Primary Region 1 – West US: Application Tier; AD1: ProdA (Read/Write); AD2: ProdA (Read/Write)
Secondary Region – East US: Application Tier; AD1: ProdB (Read/Write); AD2: STBYB (Read)
Active Data Guard Fast-Start Failover, Oracle GoldenGate Replication

Reference Architecture – Switching Back
ProdA returns to Primary and STBYA to Standby
1. Reinstate Failed System
2. Data Guard Switchover to return to original state

[Diagram]
Primary Region 1 – West US: Application Tier; AD1: ProdA (Read/Write); AD2: STBYA (Read)
Secondary Region – East US: Application Tier; AD1: ProdB (Read/Write); AD2: STBYB (Read)
Active Data Guard Fast-Start Failover, Oracle GoldenGate Replication

Reference Architecture – Upgrade Scenario
[Diagram]
Primary Region 1 – West US: Application Tier; AD1: ProdA (Read/Write, V1); AD2: STBYA (Read, V1); Sync Transport with Zero Data Loss
Secondary Region – East US: Application Tier; AD1: ProdB (Read/Write, V1); AD2: STBYB (Read, V1); Sync Transport with Zero Data Loss
Active Data Guard Fast-Start Failover, Oracle GoldenGate Replication

Upgrade Scenario Step 1: Upgrade Prod B and Standby
1. Upgrade Prod B
2. Validate
3. Restart Standby on V2 OH
4. Upgrade V2 with redo apply
- Async Transport during Upgrade
- Optionally redirect to Region 1 if application allows

[Diagram]
Primary Region 1 – West US: Application Tier; AD1: ProdA (Read/Write, V1); AD2: STBYA (Read, V1); Sync Transport with Zero Data Loss
Secondary Region – East US: Application Tier; AD1: ProdB (Read/Write, V2); AD2: STBYB
Active Data Guard Fast-Start Failover, Oracle GoldenGate Replication

Upgrade Scenario Step 2: Synchronize GG Replicas
- GG Catch Up
- Optionally redirect to Region 1 if application allows

[Diagram]
Primary Region 1 – West US: Application Tier; AD1: ProdA (Read/Write, V1); AD2: STBYA (Read, V1); Sync Transport with Zero Data Loss
Secondary Region – East US: Application Tier; AD1: ProdB (Read/Write, V2); AD2: STBYB (V2); Sync Transport with Zero Data Loss
Active Data Guard Fast-Start Failover, Oracle GoldenGate Replication

Upgrade Scenario Step 3: Co-Exist with V1 and V2
[Diagram]
Primary Region 1 – West US: Application Tier; AD1: ProdA (Read/Write, V1); AD2: STBYA (Read, V1); Sync Transport with Zero Data Loss
Secondary Region – East US: Application Tier; AD1: ProdB (Read/Write, V2); AD2: STBYB (Read, V2); Sync Transport with Zero Data Loss
Active Data Guard Fast-Start Failover, Oracle GoldenGate Replication

Platinum Advantages for Upgrade
Benefits
1. Zero Downtime and Zero Data Loss
2. Evaluate V1 and V2 at the same time
3. GoldenGate replication between V1 and V2 provides simple switchover and fallback

Final Decision Point
Once V2 has been validated and deemed acceptable, then:
- Repeat process and upgrade both V1 primary and standby at the same time

Upgrade Scenario Step 4: Upgrade Prod A and Standby A to V2
1. Upgrade Prod A
2. Validate
3. Restart Standby on V2 OH
4. Upgrade V2 with redo apply
- Async Transport During Upgrade

[Diagram]
Primary Region 1 – West US: Application Tier; AD1: ProdA (Read/Write, V2); AD2: STBYA (Read)
Secondary Region – East US: Application Tier; AD1: ProdB (Read/Write, V2); AD2: STBYB (Read, V2); Sync Transport with Zero Data Loss
Active Data Guard Fast-Start Failover, Oracle GoldenGate Replication

Upgrade Scenario Steps 5/6: Synchronize and Back to Normal
5. Synchronize GG

[Diagram]
Primary Region 1 – West US: Application Tier; AD1: ProdA (Read/Write, V2); AD2: STBYA (Read, V2); Sync Transport During Upgrade
Secondary Region – East US: Application Tier; AD1: ProdB (Read/Write, V2); AD2: STBYB (Read, V2); Sync Transport with Zero Data Loss
Active Data Guard Fast-Start Failover, Oracle GoldenGate Replication
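The upgrade walkthrough (Steps 1 through 6) amounts to upgrading one GoldenGate replica, together with its standby, at a time while the other replica keeps serving. The sketch below tracks only the version labels through that sequence; it does not model redo apply, transport modes, or GoldenGate itself. `rolling_replica_upgrade`, `mixed_versions`, and the replica keys "A" and "B" are hypothetical names.

```python
import copy


def rolling_replica_upgrade(versions, new_version, order):
    """Return a timeline of (description, versions snapshot) tuples.

    versions: {replica: {"primary": version, "standby": version}}
    order: replicas to upgrade, one at a time.
    """
    if len(versions) < 2:
        # With a single replica nothing is left to serve the application.
        raise ValueError("need at least two replicas for a zero-downtime upgrade")
    state = copy.deepcopy(versions)  # leave the caller's data untouched
    timeline = []
    for replica in order:
        # Primary and its standby move to the new version together.
        state[replica] = {"primary": new_version, "standby": new_version}
        timeline.append((
            f"upgrade and validate replica {replica}, then GoldenGate catch-up",
            copy.deepcopy(state),
        ))
    return timeline


def mixed_versions(state):
    """True while replicas run different versions (the co-existence phase)."""
    return len({v for pair in state.values() for v in pair.values()}) > 1
```

Starting from both replicas on V1 and upgrading B before A, the first snapshot is the co-existence phase (`mixed_versions` is true), and the second has every database on V2.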

Unplanned Outages for Platinum MAA with Exadata

Exadata Cluster Network Fabric or Storage Failures
  Database Downtime (RTO): Zero
  Application Impact: Zero or Near Zero
  Data Loss (RPO): Zero
  Key Enablers: Exadata; ASM Disk Groups in High Redundancy

RAC Instance or Node Failures
  Database Downtime (RTO): Zero
  Application Impact: Single Digit Seconds
  Data Loss (RPO): Zero
  Key Enablers: Exadata, RAC; Application Continuity with MAA Checklist

Data Corruptions
  Database Downtime (RTO): Zero
  Application Impact: Zero or Isolated Failure
  Data Loss (RPO): Zero or Isolated Logical Impact
  Key Enablers: Active Data Guard; MOS 1302539.1; Flashback Technologies; ZDLRA

Disasters including database, cluster or site failures
  Database Downtime (RTO): Zero since GG replica is available
  Application Impact: Zero or Near Zero; Single Digit Seconds with GDS
  Data Loss (RPO): Eventual Zero
  Key Enablers: Oracle GoldenGate; Data Guard Fast-Start Failover; Custom App Failover or Global Data Services or Site Guard

Planned Maintenance for Platinum MAA with Exadata

Exadata Infrastructure SW or HW Updates
  Database Downtime (RTO): Zero
  Application Impact: Zero or Near Zero
  Key Enablers: Exadata Platform; ASM Disk Groups in High Redundancy

Database and Grid Infrastructure Software Updates
  Database Downtime (RTO): Zero
  Application Impact: Zero
  Key Enablers: RAC; Application Continuity; Continuous Availability - Application Checklist for Continuous Service for MAA Solutions

Database Upgrades or non-Rolling Updates
  Database Downtime (RTO): Zero
  Application Impact: Zero or Near Zero
  Key Enablers: GoldenGate; Custom Application failover or Global Data Services

Our mission is to help people see data in new ways, discover insights, unlock endless possibilities.
