
The Best of Service Operation, Part 3

The Value of Known Errors and Workarounds
Originally Published on December 7, 2010

The goal of Problem Management is to prevent problems and related incidents, eliminate recurring incidents and minimize the impact of incidents that cannot be prevented. Working with Incident Management and Change Management, Problem Management helps to ensure that service availability and quality are increased.

One of the responsibilities of Problem Management is to record and maintain information about problems and their related workarounds and resolutions. Over time, this information is continually used to expedite resolution times, identify permanent solutions and reduce the number of recurring incidents. The resulting benefits are greater availability and less disruption to critical business systems.

Although Incident and Problem Management are separate processes, they typically use the same or similar tools. This allows for similar categorization and impact coding systems. Each of these tools is an important element of the Configuration Management System (CMS). One of the most powerful Problem Management tools is the Known Error Database (KEDB). The KEDB enhances our ability to quickly diagnose incidents, apply the proper workaround to restore service and get the customer back to normal operations. The workaround, a temporary way of overcoming the impact of a problem or a recurring incident, can be applied numerous times until a permanent solution is available.

Ideally, as soon as a solution is identified, it should be applied to resolve the outstanding problem or related incidents. However, until the resolution is tested and assessed for any unforeseen additional impact, the known error record should be raised and remain open. Of course, if any functionality is changed, a Request for Change (RFC) must be raised through the Change Management process.
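
The lifecycle described above can be pictured as a small state machine. What follows is only a sketch under assumptions: the status names (`OPEN`, `AWAITING_CHANGE`, `RESOLVED`) and the functions `apply_tested_fix` and `approve_change` are hypothetical illustrations, not ITIL terminology or the interface of any particular tool.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Hypothetical status names chosen for this sketch.
    OPEN = "open"                        # workaround in use, fix not yet tested
    AWAITING_CHANGE = "awaiting_change"  # fix tested, RFC raised and pending
    RESOLVED = "resolved"                # fix tested and, where needed, approved


@dataclass
class KnownError:
    summary: str
    workaround: str
    status: Status = Status.OPEN


def apply_tested_fix(record: KnownError, changes_functionality: bool) -> KnownError:
    """Advance a known error once its proposed resolution has passed testing."""
    if record.status is not Status.OPEN:
        raise ValueError("only open known errors can receive a tested fix")
    # A fix that changes functionality must go through Change Management first.
    if changes_functionality:
        record.status = Status.AWAITING_CHANGE
    else:
        record.status = Status.RESOLVED
    return record


def approve_change(record: KnownError) -> KnownError:
    """Mark the known error resolved after the RFC has been approved."""
    if record.status is not Status.AWAITING_CHANGE:
        raise ValueError("no RFC is pending for this known error")
    record.status = Status.RESOLVED
    return record
```

In this sketch the record stays open, with its workaround available to Incident Management, until testing is complete, and a functional change cannot be marked resolved without passing through the approval step.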

Once these new solutions are approved, they should be added as permanent records to the KEDB. These records should detail the faults and related symptoms, with precise details of any action that needs to be taken to restore the service or resolve the underlying problem. These records must be retrievable quickly and accurately, so an agreed methodology should be used when recording this data. All Problem and Incident Management staff should be fully trained in the use of the KEDB so that they understand the value of the knowledge it contains and how that knowledge can be applied to the benefit of their customer and business.
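
To make the record contents and the retrieval requirement concrete, here is a minimal sketch of what a KEDB entry and a symptom lookup might look like. The field names, the `KnownErrorRecord` class and the `find_matches` function are assumptions made for illustration; a real ITSM tool will have its own schema and search facilities.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KnownErrorRecord:
    # Hypothetical fields mirroring the details the article asks for.
    error_id: str
    fault: str            # the underlying fault
    symptoms: List[str]   # short phrases, recorded in an agreed style
    workaround: str       # action that restores service
    resolution: str = ""  # permanent fix, once approved


def find_matches(kedb: List[KnownErrorRecord], incident_text: str) -> List[KnownErrorRecord]:
    """Return records whose symptoms appear in the incident text, best match first."""
    words = set(incident_text.lower().split())
    scored = []
    for record in kedb:
        # A symptom counts as a hit when all of its words occur in the incident text.
        hits = sum(1 for s in record.symptoms if set(s.lower().split()) <= words)
        if hits:
            scored.append((hits, record))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored]
```

Even this simple word match only works if symptoms are recorded consistently, which is why an agreed recording methodology matters as much as the tool itself.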

Comments

Anonymous said…
While I agree with the set of behaviors described, as a Problem manager who works in the enterprise I find that the terminology (e.g., KEDB) becomes more of an obstacle than a facilitator of the value described.

Few people in IT appear to get their minds around the KEDB being a repository of knowledge about identified causes of incidents and their workarounds. They tend to gravitate to calling it their Knowledge System (and omitting the need to track that knowledge through a lifecycle like a ticket), or keeping it as a bug tracking tool (and failing to fulfill the function of informing Incident Management). The key breakdown I see in both scenarios is the immaturity around knowledge management and the lifecycle of knowledge. Problem Management is often placed in the rut of either ignoring the value of a KEDB (the gist I took away from the post), or overfunctioning in trying to build a system to reflect the problem management process without actually resolving problems. I do not have experience in other enterprises to know how implementation of ITIL worked elsewhere, but I find it humorous that the best example of ITIL framework behaviors around incident and problem management I have seen in my career was in a company that did not identify with (or possibly even know about) ITIL. The practices they followed were very close, but the employees were fluidly a part of the process instead of overly self-conscious of it.
