Real World Mission Critical Database Monitoring at AT&T

Transcription

Real World Mission Critical Database Monitoring at AT&T with Oracle Enterprise Manager
Oracle Open World 2010
Presented by Venkat Tekkalur, Principal Technical Architect, and Prem Venkatasamy, Director IT
© 2010 AT&T Intellectual Property. All rights reserved. AT&T and the AT&T logo are trademarks of AT&T Intellectual Property.

Agenda
- Company Profile
- Challenges
- Requirements
- Approach
- Infrastructure
- EM Implementation Details
- Benefits
- Common Commands
- Q/A

AT&T
- AT&T is a leading provider of wireless, Wi-Fi, high speed Internet, and voice services
- 90.1 million wireless subscribers
- More than 129,000 Wi-Fi hotspots around the globe
- The nation's fastest mobile broadband network
- AT&T's global network handles nearly 19 petabytes of traffic on an average business day
- 2.5 million AT&T U-verse TV subscribers
- 100 percent of Fortune 1000 companies are AT&T customers
- In 2010, again ranked among Fortune's 50 Most Admired Companies
- Global headquarters located in Dallas, Texas

AT&T DBA Team
One of the DBA support teams in AT&T managing databases.
- 2000 Oracle DBs
- Multiple versions
- Features: RAC, Data Guard, GoldenGate, Streams, Flashback
- 60 DBAs
- Multiple sub teams

Challenges
- Database Management and Diagnostics: Wide range of ad hoc tools in use. Management complexity.
- Database Monitoring: Multiple home grown custom monitoring solutions developed over the years.
- Database Scripts: Complexity with script rollout, updates and version changes.
- Database Version Complexity: Hard to keep up with changing data dictionary views in newer Oracle versions.
- Database Diagnostics: Performance and availability requirements for our databases keep growing, and existing tools cannot keep up with them.
- New DB Features Support Complexity: Supporting new DB features involved creating scripts, custom monitoring solutions and building tools.

Requirements We Set for EM
- Provide Ease of Database Management: Perform all database management duties using the tool. Manage new database features with ease.
- Database Troubleshooting and Performance Tuning: One common tool for the enterprise to troubleshoot database performance issues.
- Monitor All Databases: Provide a monitoring solution that is easy to manage and will scale well to meet all of our requirements.
- Database Build Automation: Ability to provision Oracle database software and automate the database build process.

Approach: Road to EM 10.2.0.4
- POC: Grid stability. Agent scalability. EM monitoring capabilities.
- Design & Development: Design EM solution. Develop custom solutions for EM agent deployment, availability and additional metrics for monitoring.
- Production Implementation: EM with DR implementation. Agent and monitoring deployment.

Proof of Concept Findings
EM POC results showed that EM 10g can meet our requirements, but custom work was still needed in the following areas:
- Agent mass deployment
- Agent availability (automatic start/stop)
- User defined metrics to plug monitoring gaps
- Automate target configuration to the appropriate DBA teams

POC Key Decisions
- Deploy the latest EM version available at that time, which was 10.2.0.4. Work with Oracle to identify all the patches required for a stable environment.
- Automate agent deployment using the cloning technique.
- Since agent availability is critical for monitoring, develop scripts to auto start/stop the agent during server reboots and database failovers, and to restart agents when they are down for other reasons.
- Add additional monitoring using user defined metrics to meet our requirements. Define and deploy monitoring metrics through templates.
- Use EM groups for managing target ownership. Develop a custom process using EMCLI to manage groups based on our internal database inventory data.

EM Implementation: Time Capsule
Three major phases:
- Proof of concept in 2007
- Production deployment in 2008
- Monitoring implementation in 2009

Key EM Features Used
- Oracle Software Cloning: Deploy standard and fully patched EM agent software across all grid targets.
- EM Groups: Target ownership, pushing out monitoring templates, notification, dashboards, ease of management.
- Monitoring Templates: Target monitoring metrics and policies management. Used in conjunction with groups.
- Notification: OS & SNMP notification methods to page/email alerts out to the appropriate recipients within the DBA team. Email repeat notification feature.
- UDMs and UDPs: User Defined Metrics and User Defined Policies are used to meet monitoring needs related to database administration, performance, backups, GoldenGate, and compliance programs like SOX and PCI.
- EMCLI Commands: Extensive use of EMCLI commands for target configuration, EM group and template management, agent management, password changes.

Agent Installation and Configuration: Implementation
- Agent Install: Copy the agent clone software. Run the runInstaller command to clone the agent home.
- Target Configuration: agentca -f command to discover targets. emcli modify target command to set the password.
- Setup Monitoring: Push appropriate metrics using monitoring templates based on the target type. emcli apply template command to push templates.
- Configure EM Groups: Add the newly added targets to the appropriate EM groups and EM roles. emcli modify group and emcli grant privs commands to configure groups and roles.
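
The monitoring and group steps above lend themselves to scripting. The following is a minimal Python sketch that only builds the command lines and does not run them. The verb names (apply_template, modify_group, grant_privs), their flags and the VIEW_TARGET privilege are written from memory of the EMCLI verb reference and are assumptions to verify with `emcli help <verb>` on your release. The template, group, user and target names are hypothetical.

```python
from typing import List


def target_spec(name: str, target_type: str) -> str:
    """Format a target as name:type (assumed EMCLI target syntax)."""
    return f"{name}:{target_type}"


def onboarding_commands(name: str, target_type: str, template: str,
                        group: str, em_user: str) -> List[List[str]]:
    """Build (but do not execute) the EMCLI calls for one new target.

    All flags below are assumptions; check them against `emcli help`.
    """
    spec = target_spec(name, target_type)
    return [
        # Push the monitoring template that matches the target type.
        ["emcli", "apply_template", f"-name={template}", f"-targets={spec}"],
        # Add the target to the owning team's EM group.
        ["emcli", "modify_group", f"-name={group}", f"-add_targets={spec}"],
        # Give the team's EM user access to the target.
        ["emcli", "grant_privs", f"-name={em_user}",
         f"-privilege=VIEW_TARGET;{spec}"],
    ]


# Hypothetical example values.
commands = onboarding_commands("ORCL01", "oracle_database",
                               "DB_OLTP_TEMPLATE", "DBA_TEAM_A", "DBA_USER1")
for cmd in commands:
    print(" ".join(cmd))
```

Building the argument lists separately from executing them makes it easy to review or log the commands before handing them to a runner such as `subprocess.run`.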

EM Agents: Key for a Successful Implementation
EM agents are set up to meet the following requirements:
- Performance: Monitor agent operations (trace files, log files) from time to time. Review metric collection errors and "metrics extending beyond interval" errors. Clean up agent log files on a periodic basis.
- Availability: Auto start/stop script integrated with VCS cluster software where applicable. Tracking agent non-availability through EM repository views and starting agents on demand.
- Stability: Standardization of agent software. Only 10.2.0.4 and above versions with Oracle recommended patches are deployed in our environment.

Monitoring Solution Through EM: Key Components
- Metrics & Policies: Standard Metrics, User Defined Metrics, Metric Thresholds, Policies, User Defined Policies
- Monitor: Monitoring Templates, Agents, Targets (Database, Host, Listener)
- Notify: OMS, Notification Rule, Notification Methods
We used standard metrics, UDMs, custom metric thresholds, UDPs, monitoring templates, notification rules, and the OS and SNMP notification methods for our monitoring solution.

Database Monitoring: Challenges with Out of the Box Metrics
Issues:
- Metrics for conditions that were not appropriate for some of our databases
- Metrics that produced too many alerts
- Metrics that didn't exist for conditions that are deemed as required for our environment
- Bugs with some metrics that are based on the database server generated alerts in 10g
Solution:
- Disable the metrics where possible. If there is a dependency on other metrics, then nullify the thresholds
- Adjust thresholds and number of occurrences to reduce the quantity of alerts
- Develop User Defined Metrics for missing monitoring conditions
- Work with Oracle to resolve bugs. If not possible, work around the issue with UDMs
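
To illustrate why raising the number of occurrences reduces alert volume, here is a small Python sketch. It assumes that an alert is raised only once a value has exceeded its threshold for a given number of consecutive collections. This is an illustrative model with hypothetical names and sample values, not EM's actual implementation.

```python
from typing import Iterable, List


def alerts_after_occurrences(values: Iterable[float], threshold: float,
                             occurrences: int) -> List[int]:
    """Return the sample indexes at which an alert would be raised.

    An alert fires once when the value has been above the threshold for
    `occurrences` consecutive samples. It can fire again only after the
    value has dropped back to or below the threshold.
    """
    if occurrences < 1:
        raise ValueError("occurrences must be at least 1")
    fired: List[int] = []
    streak = 0
    for index, value in enumerate(values):
        if value > threshold:
            streak += 1
            if streak == occurrences:
                fired.append(index)
        else:
            streak = 0  # violation ended, start counting again
    return fired


# Hypothetical CPU utilisation samples with two short spikes and one
# sustained violation.
samples = [91, 70, 92, 93, 94, 60, 95]
print(alerts_after_occurrences(samples, threshold=90, occurrences=1))
print(alerts_after_occurrences(samples, threshold=90, occurrences=3))
```

With one occurrence every spike raises an alert; with three, only the sustained violation does.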

Database Monitoring: Standard and UDM Metrics Usage
Out of the Box Metrics:
- Used for database, listener and host targets
- Only required metrics are used, after thorough testing for reliability
- Thresholds and number of occurrences are used. Frequency is never adjusted, as per Oracle recommendation
- Metrics used include: availability, performance, alert log with special filters, space, RAC, Data Guard
User Defined Metrics:
- UDMs are only used for database targets
- UDMs are used for conditions that cannot be met with standard metrics. Most UDMs are against data dictionary views
- UDM thresholds and frequency are carefully determined to make sure we don't impact agent and database target performance
- Metrics usage includes: performance, custom lock monitoring, RMAN backups, scheduler jobs, tablespace and GoldenGate monitoring

Database Monitoring: User Defined Metrics, Key Usage Requirements
User Defined Metrics is a powerful EM feature that facilitates adding additional metrics to meet monitoring requirements.
Key points about UDMs:
- A UDM can return at most two columns (key and value)
- In a two-column UDM the first column is the key
- A change of key triggers a clear notification for the previous key record and a new notification for the new key record
- A UDM can only be of one type (number or string), and the type is based on the value column
- UDMs require login credentials to the database
- UDMs can be pushed through templates
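
The key-change behaviour matters when choosing what a UDM query returns as its key. The Python sketch below models it under a simple assumption: each collection is a mapping from key to numeric value, and a key is in alert when its value reaches the threshold. The function name and the sample data are hypothetical, and the logic is a model of the behaviour described above, not EM code.

```python
from typing import Dict, List, Tuple


def diff_alerts(previous: Dict[str, float], current: Dict[str, float],
                threshold: float) -> Tuple[List[str], List[str]]:
    """Compare two collections of a two-column UDM.

    Returns (cleared_keys, new_keys): keys whose alert would clear and
    keys that would raise a new alert between the two collections.
    """
    prev_alerting = {k for k, v in previous.items() if v >= threshold}
    curr_alerting = {k for k, v in current.items() if v >= threshold}
    cleared = sorted(prev_alerting - curr_alerting)
    new = sorted(curr_alerting - prev_alerting)
    return cleared, new


# Same condition, but the key text changed between collections
# (for example a key that embeds a session id or a timestamp).
before = {"lock held by SID 101": 600.0}
after = {"lock held by SID 202": 650.0}
print(diff_alerts(before, after, threshold=300))
```

Because the key changed, the model reports one cleared record and one new record even though the underlying condition persisted, which is why a stable key (such as a tablespace or job name) tends to produce fewer notifications.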

Compliance: Standard and User Defined Policies Implementation
Compliance for security is becoming more and more important for database administrators. EM provides standard policies for security and an option to create custom ones using User Defined Policies.
- Enable Policies Related to Security: We reviewed the available metrics and chose the ones that met our requirements. Use monitoring templates to enable and disable policies.
- Create User Defined Policies: Built User Defined Policies to meet our internal security, SOX and PCI controls related to databases.
- Reports for Policy Violations: Created custom reports for policy violations based on repository views to meet the requirements.
- About User Defined Policies: 10.2.0.4 allows UDPs to be created using EM packages. 10.2.0.5 provides a GUI screen in EM for UDP creation. Creation follows a two step process: create the UDM first and then associate the UDM metadata to create the policy.

Monitoring: Lessons Learned from Our Implementation
Plan Your Metrics:
- Evaluate and use metrics that are applicable and meet your requirements
- Test your metrics, create baseline metric thresholds based on DB profile (batch, OLTP, mixed), and build monitoring templates based on them
Monitoring Templates:
- Use monitoring templates to push out metrics to targets
- If a few metrics require change, consider creating a temporary template
- Build monitoring templates for each target type
User Defined Metrics:
- Use User Defined Metrics for cases where standard metrics are not available
- Plan and develop your UDMs carefully, knowing all the restrictions on using them

Monitoring: Lessons Learned from Our Implementation
- Remember to track and get notified for metric collection errors. They happen for various reasons (password issues, collection running a long time, bug), and failing to rectify them could result in monitoring failures
- Query the repository if required to identify these metric collection errors, if tracking them through EM screens is not an option
Notification:
- Monitoring and notification are independent in EM. You can monitor all, but notify only a few. Leverage this feature effectively
Repository Views:
- There are repository views that can help provide all the metric information. This feature comes in handy when there is a requirement to compare and validate metrics across a large number of targets
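
As an illustration of comparing metric settings across many targets, the Python sketch below takes rows of threshold settings and reports the targets that deviate from the most common setting for each metric. The rows are hypothetical in-memory data standing in for the output of a repository view query; the slides do not name the views or columns used, so none are assumed here.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Tuple


class ThresholdRow(NamedTuple):
    target: str
    metric: str
    warning: float
    critical: float


def find_outliers(rows: List[ThresholdRow]) -> List[ThresholdRow]:
    """Return rows whose thresholds differ from the per-metric baseline.

    The baseline for a metric is the most common (warning, critical)
    pair among all targets reporting that metric.
    """
    counts: Dict[str, Counter] = {}
    for row in rows:
        counts.setdefault(row.metric, Counter())[(row.warning, row.critical)] += 1
    baseline: Dict[str, Tuple[float, float]] = {
        metric: counter.most_common(1)[0][0]
        for metric, counter in counts.items()
    }
    return [r for r in rows if (r.warning, r.critical) != baseline[r.metric]]


# Hypothetical data: three databases, one with a hand-edited threshold.
rows = [
    ThresholdRow("ORCL01", "Tablespace Used (%)", 85, 95),
    ThresholdRow("ORCL02", "Tablespace Used (%)", 85, 95),
    ThresholdRow("ORCL03", "Tablespace Used (%)", 90, 97),
]
for outlier in find_outliers(rows):
    print(outlier)
```

Using the most common setting as the baseline avoids hard-coding expected values, at the cost of being unreliable when targets are evenly split between settings.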

EM Notification Implementation
Notification/alerting requires careful planning in the overall database monitoring strategy. Notification challenges we faced and how we solved them:
Problem: Alert customization limitations with standard email notification
- 10.2.0.4 had very little customization available for the email notification method
- Our teams required more information about our databases from our inventory records, and this required customization
- Requirement to send alerts to different addresses based on warning and critical thresholds
- Cannot utilize the default schedule
Solution: Used the EM OS Notification method as the primary alerting mechanism.
1. The EM OS Notification method calls a script on our OMS servers, which in turn performs the alerting. EM passes alert information as OS variables, and we deliver the alert with formatting and additional information to the appropriate recipients based on the target name
2. In addition to OS Notification we also use SNMP traps and custom alerting from repository views to meet additional alerting requirements
3. EM 10.2.0.5 and above provides a notification customization feature for the email method
4. EM 10.2.0.5 and above provides repeat notification capability for all methods
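
A script called by the OS Notification method could be structured along the following lines. This Python sketch is not the script described in the slides. The environment variable names (TARGET_NAME, TARGET_TYPE, METRIC, METRIC_VALUE, SEVERITY, MESSAGE, TIMESTAMP) are recalled from the EM 10g documentation and should be treated as assumptions to confirm for your release. The inventory, addresses and routing rule are hypothetical.

```python
import os
from typing import Dict, Mapping, Tuple

# Hypothetical stand-in for an internal database inventory lookup.
INVENTORY: Dict[str, Dict[str, str]] = {
    "ORCL01": {"team": "billing-dba",
               "email": "billing-dba@example.com",
               "pager": "billing-oncall@example.com"},
}
DEFAULT_OWNER = {"team": "dba-default",
                 "email": "dba@example.com",
                 "pager": "dba-oncall@example.com"}


def build_alert(env: Mapping[str, str],
                inventory: Mapping[str, Dict[str, str]] = INVENTORY
                ) -> Tuple[str, str]:
    """Return (recipient, message body) for one alert.

    Critical alerts go to the owning team's pager address, everything
    else to its email address. Unknown targets fall back to a default.
    """
    target = env.get("TARGET_NAME", "unknown")
    severity = env.get("SEVERITY", "Unknown")
    owner = inventory.get(target, DEFAULT_OWNER)
    recipient = owner["pager"] if severity.lower() == "critical" else owner["email"]
    body = "\n".join([
        f"Severity: {severity}",
        f"Target:   {target} ({env.get('TARGET_TYPE', 'n/a')})",
        f"Team:     {owner['team']}",
        f"Metric:   {env.get('METRIC', 'n/a')} = {env.get('METRIC_VALUE', 'n/a')}",
        f"Time:     {env.get('TIMESTAMP', 'n/a')}",
        f"Message:  {env.get('MESSAGE', '')}",
    ])
    return recipient, body


if __name__ == "__main__":
    # When invoked as a notification script, read the alert from the
    # process environment. Delivery (mail, paging) is left out here.
    to_address, text = build_alert(os.environ)
    print(to_address)
    print(text)
```

Keeping the routing in a pure function that takes the environment as a mapping makes it possible to test the script without an OMS.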

Other Useful Customization: Alert Log Filtering
We came up with a custom alert log filter expression that will only alert for ORA errors that require a DBA's immediate action.
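
The slides do not show the filter expression itself, and EM's own filter syntax is not reproduced here. As a rough illustration of the idea only, the Python sketch below keeps alert log lines that match a short list of ORA errors. The list of error numbers is a hypothetical example, not the list used in this implementation.

```python
import re
from typing import Iterable, List

# Hypothetical set of errors treated as needing immediate attention:
# ORA-00600, ORA-07445, ORA-01578 and ORA-04031.
URGENT_ORA = re.compile(r"\bORA-0*(600|7445|1578|4031)\b")


def urgent_lines(lines: Iterable[str]) -> List[str]:
    """Return only the alert log lines that match an urgent ORA error."""
    return [line for line in lines if URGENT_ORA.search(line)]


sample_log = [
    "ORA-00600: internal error code, arguments: [...]",
    "ORA-00060: deadlock detected while waiting for resource",
    "Thread 1 advanced to log sequence 42",
    "ORA-04031: unable to allocate shared memory",
]
for line in urgent_lines(sample_log):
    print(line)
```

An allow-list like this stays quiet for new, unclassified errors, so it is usually paired with a periodic review of what the filter discarded.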

Post Go Live: Issues
Some of the key issues we addressed after go live:
OMS Performance Tuning:
- Apache HTTP parameter tuning to handle more connections
- Loader backlog: Increase OC4J processes to handle concurrent loader files to avoid backlog
Repository Tuning:
- Increase the job queue processes parameter to support parallel EM task processing
- Increase redo log size to avoid frequent log switches
- Run the repvfy utility on a regular basis and take actions to clean out stuck notifications

Post Go Live: Issues
Some of the key issues we addressed after go live:
Agent Issues:
- Agent crashing due to a patch conflict with the 10.2.0.4 database version patch: resolved by applying the right patch combination
- Agent leaving orphan database connections: resolved by a combination of patching and a housekeeping task to bounce the agent prior to hitting that condition
- Missing host performance information on the HP-UX platform: resolved by patching
- Metric collection errors on standby databases: fixed in the 10.2.0.5 agent

Post Go Live: Issues
Some of the key issues we addressed after go live:
Database Monitoring Issues:
- EM dictionary queries running longer: resolved by collecting periodic dictionary statistics
- Tablespace monitoring inconsistencies: worked around by creating UDMs
Other Issues:
- Load balancer connectivity issue: resolved by an LB setting

Supporting EM Infrastructure: Ongoing Support
Some of the key tasks we perform on a regular basis to support this infrastructure include:
- Daily health check: Make sure all targets are running without any issues. Investigate collection errors and pending status states, and review performance alerts related to OMS, OMR and agents
- Target discovery: We run into target discovery issues from time to time, which require manual intervention
- Agent upload problems due to connectivity issues
- Running repvfy and taking care of any issues reported
- User privilege management by the super administrator

Q&A

Thank You
