ITIL V3 Problem Management Process - ITSM

Transcription

Problem Management
ITIL v3 Problem Management Process. Root cause analysis

Content
- Key definitions
- Purpose and Objectives
- Opening problem - scenarios
- Scope
- Value to business
- Problem models
- Reactive and proactive problem management
- Process Workflow
- Triggers
- Process Interfaces
- Information Management
- Challenges
- Risks
- Critical success factors (CSF)
- Key Performance Indicators (KPIs)
- Roles and Responsibilities

Key definitions
- Problem: The unknown cause of one or more incidents.
- Problem Record: A record containing the details of the lifecycle of a single Problem.
- Known Error: A Problem that has a documented Root Cause and a Workaround. Known Errors are created and managed throughout their Lifecycle by Problem Management.
- Known Error Database (KEDB): A database containing all Known Error Records. It is created by Problem Management. The KEDB is a part of the Service Knowledge Management System (SKMS).
- Problem Management: The process responsible for managing the lifecycle of all problems. It includes the activities required to diagnose the root cause of incidents, to determine the resolution to those problems, and to ensure that the resolution is implemented through the appropriate control procedures, especially Change and Release Management.

Problem Management is forensics!

Purpose and objectives
Purpose:
- Identifies once and for all the root causes of problems
- Helps minimise the effects of problems and prevents potential problems from occurring in the future
- Thereby attempts to minimise incidents and their causes
Objectives:
- Prevent problems and resulting incidents from happening
- Eliminate recurring incidents
- Minimise the impact of incidents that cannot be prevented
Problem Management also maintains information about problems, workarounds and resolutions, which gives it a strong interface with Knowledge Management.

Opening a Problem - scenarios
Incidents are assessed against criteria to decide which kind of problem, if any, is opened:
- Critical / High Priority incidents with extensive impact and potential to re-occur: Critical Problem
- Independent incidents with significant impact and possibly the same root cause: High Priority Problem
- Multiple moderate incidents, possibly with the same cause or with multiple causes: Medium Priority, separated Problems
- Data analysis of incident records to determine process, system, or network issues: determine Problem Management inclusion
- Trend analysis to determine issues: determine Problem Management inclusion
While the incident record is open, or after it has been resolved, it is reviewed for Problem Management inclusion. If the incident (or group of related incidents) meets the criteria, a problem record is created to identify the cause, the solution, and the control needed to prevent re-occurrence.

Scope
- Diagnosing the root cause of incidents and determining the resolution to problems.
- Ensuring that the resolution is implemented through the appropriate control procedures, especially Change Management and Release Management.
- Maintaining information about problems and the appropriate workarounds and resolutions. There is a strong interface with Knowledge Management, and tools such as the Known Error Database will be used for both.
- Close integration with Incident Management. The two processes use the same tools and may use similar categorization, impact and priority coding systems. This ensures effective communication when dealing with related incidents and problems.

Value to business
Problem Management, working with Incident and Change Management, delivers:
- Increased IT service availability and quality
- Less downtime and less disruption to business-critical systems
- Known Error information that reduces incident resolution times
Additional value:
- Higher availability of IT services
- Higher productivity of business and IT staff
- Reduced expenditure on workarounds or fixes that do not work
- Reduced cost of effort in fire-fighting or resolving repeat incidents

Problem models
Problem models are similar in concept to standard Incident Models: pre-defined steps for handling problems in an agreed way.
- Problem Models may deal with the problem and any associated recurring incidents
- Creating a Problem Model is an additional step that builds on creating a Known Error Record in the Known Error Database
A Problem Model should include the following:
- Steps required to handle the problem and their order
- Responsibilities
- Timescales and thresholds for completion
- Any escalation procedure
- Any evidence preservation activities

Reactive and Proactive Problem Management
Reactive PM:
- Identifying problems and finding an immediate workaround to allow the smooth continuation of business until the permanent resolution is implemented by Change Management
- Goal: to resolve problems quickly, effectively and permanently
Proactive PM:
- Forward-looking approach
- Ongoing and methodical process using analysis of problem trends and statistics
- Preventive actions
- Goal: to prevent or minimise issues occurring
Prioritising in pain factor order: prioritise resources for the problem according to the seriousness of the impact on the business.

Process flow

Process flow - Problem detection and Logging
Detection & Logging: once a problem is detected, all relevant details must be logged:
- Problem unique ID
- Date/time stamp
- Reference to related Incidents, Known Errors, CIs
- Reference to RFCs
* Frequent and regular analysis of incident and problem records must be performed to identify trends.
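
The logging details listed for this step could be captured in a small record type. The following is only a minimal sketch: the class name `ProblemRecord`, the helper `log_problem`, and the `PRB-` ID format are hypothetical and are not taken from ITIL or from any particular ITSM tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from itertools import count
from typing import Iterable, List

# Hypothetical in-memory ID sequence; a real tool would assign its own IDs.
_next_id = count(1)


@dataclass
class ProblemRecord:
    problem_id: str                      # problem unique ID
    logged_at: datetime                  # date/time stamp
    related_incidents: List[str] = field(default_factory=list)
    related_known_errors: List[str] = field(default_factory=list)
    related_cis: List[str] = field(default_factory=list)
    related_rfcs: List[str] = field(default_factory=list)


def log_problem(
    incidents: Iterable[str],
    known_errors: Iterable[str] = (),
    cis: Iterable[str] = (),
    rfcs: Iterable[str] = (),
) -> ProblemRecord:
    """Create a problem record with a fresh ID and a UTC timestamp."""
    return ProblemRecord(
        problem_id=f"PRB-{next(_next_id):05d}",
        logged_at=datetime.now(timezone.utc),
        related_incidents=list(incidents),
        related_known_errors=list(known_errors),
        related_cis=list(cis),
        related_rfcs=list(rfcs),
    )
```

For example, `log_problem(["INC-1", "INC-2"], cis=["srv-01"])` returns a record that references two incidents and one CI, with empty lists for Known Errors and RFCs.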

Process flow - Categorization and Prioritization
Categorization: used mainly to determine an appropriate allocation of resources (often an extension of the Incident categorization).
Prioritization: priority consists of impact and urgency factors. Priority also takes severity into consideration:
- Can the system be recovered, or does it need to be replaced?
- How much will it cost?
- How much effort is needed to fix the problem?
- How long will it take?
- How many CIs are impacted?
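
One way to combine impact and urgency into a priority is a simple lookup over two three-level scales. The sketch below is an illustration under stated assumptions: the level names, the 1 to 5 priority codes, and the function name `priority` are all hypothetical choices, and the severity questions (cost, effort, time, number of CIs) are deliberately left out.

```python
# Assumed three-level scales, ordered from most to least serious.
LEVELS = ("high", "medium", "low")


def priority(impact: str, urgency: str) -> int:
    """Map impact and urgency to a priority code, 1 (highest) to 5 (lowest).

    Raises ValueError if either argument is not one of LEVELS.
    """
    impact_rank = LEVELS.index(impact.lower())
    urgency_rank = LEVELS.index(urgency.lower())
    # Each step down on either scale lowers the priority by one code.
    return impact_rank + urgency_rank + 1
```

With these assumptions, high impact with high urgency gives priority 1, high impact with low urgency gives 3, and low impact with low urgency gives 5. An organization would normally agree its own scales and codes.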

Process flow - Investigation & Diagnosis
Investigation & Diagnosis: finding the root cause of the problem. The CMS will be used to determine the level of impact and the related incidents.
* These two stages are complex and require good technical knowledge, supported by problem-solving and diagnostic skills.

Process flow - Workaround, Known Error
Applying Workarounds: a workaround is a temporary resolution to an incident. If there is a workaround:
- Work on finding a permanent solution should continue (where justified)
- The problem record should remain open until a permanent solution is found, or until finding one is judged not to be justified
Raising a Known Error Record:
- Documented in the Known Error Database
* A Known Error Record will be raised as soon as a workaround or a permanent fix has been found (possibly even sooner).

Process flow - Problem Resolution
Problem Resolution:
- Ideally, the permanent resolution should be applied as soon as possible
- In many cases, applying the resolution requires an RFC to be raised and transferred to Change Management for approval
* Until the permanent solution is in place, the Known Error Database will help in resolving any new occurrences.
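
To illustrate how a Known Error Database can help with new occurrences while the permanent fix is pending, here is a minimal sketch of a keyword lookup. Everything in it is an assumption made for illustration: the `KnownError` type, the keyword-subset matching rule, and the sample entries do not come from ITIL or from any real KEDB product.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class KnownError:
    error_id: str
    keywords: FrozenSet[str]   # lowercase symptom keywords (assumed scheme)
    workaround: str


def matching_known_errors(
    kedb: List[KnownError], incident_description: str
) -> List[KnownError]:
    """Return known errors whose keywords all appear in the description."""
    words = set(re.findall(r"[a-z0-9]+", incident_description.lower()))
    return [ke for ke in kedb if ke.keywords <= words]


# Hypothetical sample entries.
kedb = [
    KnownError("KE-1", frozenset({"vpn", "timeout"}), "Restart the VPN client"),
    KnownError("KE-2", frozenset({"printer", "offline"}), "Power-cycle the printer"),
]
hits = matching_known_errors(kedb, "VPN timeout when connecting from home.")
```

Here `hits` contains only the `KE-1` entry, so its workaround can be offered for the new incident. Real KEDB tools typically use richer search than exact keyword matching.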

Process flow - Closure
Problem Closure: a Problem Record can be closed when:
1. The resolution has been applied
2. The quality of information in the Problem Record has been checked
3. The status of any associated Known Errors has been updated
4. All related incidents have been closed
* User satisfaction must be checked.
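
The four closure conditions can be expressed as a simple check that reports what still blocks closure. This is a sketch only: `ClosureState` and `closure_blockers` are hypothetical names, and the user satisfaction check mentioned on the slide is left out because it is not one of the four numbered conditions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClosureState:
    resolution_applied: bool
    record_quality_checked: bool
    known_errors_updated: bool
    # IDs of related incidents that are still open.
    open_incident_ids: List[str] = field(default_factory=list)


def closure_blockers(state: ClosureState) -> List[str]:
    """Return the reasons a Problem Record cannot be closed yet."""
    blockers = []
    if not state.resolution_applied:
        blockers.append("resolution has not been applied")
    if not state.record_quality_checked:
        blockers.append("record quality has not been checked")
    if not state.known_errors_updated:
        blockers.append("associated Known Errors have not been updated")
    if state.open_incident_ids:
        blockers.append(
            "related incidents still open: " + ", ".join(state.open_incident_ids)
        )
    return blockers
```

An empty list means all four conditions are met and the record can be closed.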

Process flow - Major Problem Review
A Major Problem Review will examine:
- Things that were done correctly
- Things that were done wrong
- Things that could be done better in the future
- How to prevent recurrence
- Whether follow-up action is required after third-party involvement
Knowledge learned will be shared with customers at Service Review Meetings.
* Follow-up actions are a normal part of Continual Service Improvement.

Triggers
Problem Records can be triggered:
- In reaction to one or more incidents; many will be raised or initiated via Service Desk staff
- In testing, particularly the later stages such as User Acceptance Testing/Trials (UAT), if a decision is made to go ahead with a release even though some faults are known
- By suppliers, through the notification of potential faults or known deficiencies in their products or services

Process Interfaces
Incident Management:
- Repeated incidents often point to problems
- Solving the problems should reduce the number of incidents
Change Management:
- PM ensures that all resolutions or workarounds that require a change to a CI are submitted to Change Management through an RFC
- Change Management will monitor the progress of changes and keep PM advised
- PM is involved in rectifying the situation caused by failed changes
Asset & Configuration Mgmt:
- PM uses the CMS to identify faulty CIs and also to determine the impact of problems and resolutions
- The CMS can also be used to form the basis for the KEDB and to hold or integrate with the Problem Records
Release and Deployment Mgmt (RADM):
- RADM is responsible for rolling problem fixes out into the live environment
- RADM assists in ensuring that known errors are transferred from the development KEDB into the live KEDB
- PM will assist in resolving problems caused by faults during the release process

Process Interfaces - cont.
Availability Management:
- AM has a close relationship with Problem Management, especially the proactive areas
Capacity Management:
- Some problems will require investigation by Capacity Mgmt teams and techniques, e.g. performance issues
- Capacity Mgmt will assist in assessing proactive measures
IT Service Continuity Management:
- Where a significant problem is not resolved before it starts to have a major impact on the business, PM acts as an entry point into ITSCM
Service Level Management:
- Problem Management contributes to improvements in service levels
- SLM also provides the parameters within which Problem Management works
Financial Management for IT Services:
- FM assists in assessing the impact of proposed resolutions or workarounds, as well as Pain Value Analysis
- PM provides management information about the cost of resolving and preventing problems

Involvement in Information Management
Diagram labels: CMS, CMDB, Diagnostic Script, KEDB.
Problem Record:
- Reference Number
- CI Impacted
- Dates and Times
- Originator
- Symptoms
- Category, Priority
- Actions Taken
- Relationships
- Closure details

Challenges & Risks
A major dependency for Problem Management is the establishment of an effective Incident Management process and tools. This implies the following:
- Linking Incident and Problem Management tools
- Ability to relate Incident and Problem Records
- Good working relationship between the different levels of support
- All staff working on problem resolution fully understanding business impact
- Ability to use all Knowledge and Configuration Management resources available
- Ongoing training of technical staff

Critical Success Factors (CSF) & Key Performance Indicators (KPI)
Examples:
CSF: Minimize the impact to the business of incidents that cannot be prevented
- KPI: The number of known errors added to the KEDB
- KPI: The percentage accuracy of the KEDB (from audits of the database)
- KPI: Average incident resolution time for those incidents linked to problem records
CSF: Maintain quality of IT services through elimination of recurring incidents
- KPI: Total number of problems (as a control measure)
- KPI: Size of current problem backlog for each IT service
- KPI: Number of repeat incidents for each IT service
CSF: Provide overall quality and professionalism of problem handling activities to maintain business confidence in IT capabilities
- KPI: The number of major problems (opened, closed, and backlog)
- KPI: The percentage of major problem reviews successfully performed
- KPI: The percentage of major problem reviews completed successfully and on time
- KPI: Number and percentage of problems incorrectly assigned
- KPI: Number and percentage of problems incorrectly categorized
- KPI: The backlog of outstanding problems and the trend (static, reducing or increasing?)
- KPI: Number and percentage of problems that exceeded their target resolution times
- KPI: Percentage of problems resolved within SLA targets (and the percentage that are not!)
- KPI: Average cost per problem
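
Two of the KPIs above, the percentage of problems that exceeded their target resolution times and the backlog per IT service, can be computed from basic problem data. The sketch below assumes a hypothetical `ProblemStats` record with elapsed and target hours; the field names and sample data are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProblemStats:
    service: str          # IT service the problem belongs to
    elapsed_hours: float  # time to resolve, or time open so far
    target_hours: float   # target resolution time
    is_open: bool


def pct_exceeding_target(problems: List[ProblemStats]) -> float:
    """Percentage of problems whose elapsed time exceeds the target."""
    if not problems:
        return 0.0
    late = sum(1 for p in problems if p.elapsed_hours > p.target_hours)
    return 100.0 * late / len(problems)


def backlog_by_service(problems: List[ProblemStats]) -> Dict[str, int]:
    """Number of still-open problems for each IT service."""
    return dict(Counter(p.service for p in problems if p.is_open))


# Hypothetical sample data.
problems = [
    ProblemStats("email", 10, 8, False),
    ProblemStats("email", 2, 8, True),
    ProblemStats("vpn", 5, 8, True),
    ProblemStats("vpn", 1, 8, False),
]
```

With this sample, one of the four problems exceeded its target, and each service has one open problem in its backlog.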

Roles
Problem Management Process Owner:
- Accountable for the process
Problem Manager:
- Liaison with all problem resolution groups to resolve problems within SLA targets
- Ownership and protection of the KEDB
- Gatekeeper for the inclusion of all Known Errors and management of search algorithms
- Formal closure of all Problem Records
- Liaison with third parties to ensure they fulfil their contractual obligations, especially with regard to resolving problems and providing problem-related information and data
- Arranging, running, and documenting Major Problem Reviews, and all related follow-up activities
Problem solving groups:
- Technical support groups and/or suppliers or support contractors, working on problems under the coordination of the Problem Manager

THE END
