The Best of Service Operation, Part 3
The Value of Known Errors and Workarounds
Originally Published on December 7, 2010
The goal of Problem Management is to prevent problems and related incidents, eliminate recurring incidents and minimize the impact of incidents that cannot be prevented. Working with Incident Management and Change Management, Problem Management helps to ensure that service availability and quality are increased.
One of the responsibilities of Problem Management is to record and maintain information about problems and their related workarounds and resolutions. Over time, this information is continually used to expedite resolution times, identify permanent solutions and reduce the number of recurring incidents. The resulting benefits are greater availability and less disruption to critical business systems.
Although Incident and Problem Management are separate processes, they typically use the same or similar tools. This allows for similar categorization and impact coding systems. Each of these tools is an important element of the Configuration Management System (CMS). One of the most powerful Problem Management tools is the Known Error Database (KEDB). The KEDB enhances our ability to quickly diagnose incidents, apply the proper workaround to restore service and get the customer back to normal operations. The workaround, a temporary way of overcoming the impact of a problem or a recurring incident, can be applied numerous times until a permanent solution is available.
Ideally, as soon as a solution is identified, it should be applied to resolve the outstanding problem or related incidents. However, until the resolution is tested and assessed for any unforeseen additional impact, the known error record should be raised and remain open. Of course, if any functionality is changed, a Request for Change (RFC) must be raised through the Change Management process.
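The rule described above (keep the known error open until the resolution is tested, and require an RFC for functional changes) can be sketched as a simple guard. This is only an illustration: the class, field and status names are hypothetical and do not come from ITIL or from any particular service desk tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    OPEN = "open"
    CLOSED = "closed"


@dataclass
class KnownError:
    # All field names here are illustrative assumptions, not a standard schema.
    error_id: str
    workaround: str
    resolution: Optional[str] = None      # proposed permanent fix, if any
    resolution_tested: bool = False       # tested and assessed for side effects
    changes_functionality: bool = False   # does the fix alter functionality?
    rfc_id: Optional[str] = None          # Request for Change reference, if raised
    status: Status = Status.OPEN

    def close(self) -> None:
        """Close the record only when the conditions in the text are met."""
        if self.resolution is None:
            raise ValueError("no resolution has been identified")
        if not self.resolution_tested:
            raise ValueError("resolution has not been tested and assessed")
        if self.changes_functionality and self.rfc_id is None:
            raise ValueError("a functional change requires an RFC")
        self.status = Status.CLOSED
```

With this sketch, a record created with only a workaround stays open, and calling `close()` raises an error until a tested resolution is recorded and, where functionality changes, an RFC reference is attached.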
Once these new solutions are approved, they should be added as permanent records to the KEDB. These records should detail the faults and related symptoms, with precise details of any action that needs to be taken to restore the service or resolve the underlying problem. It is important that these records can be retrieved quickly and accurately, so an agreed methodology should be used when recording this data. All Problem and Incident Management staff should be fully trained in the use of the KEDB so that they understand the value of the knowledge it contains and how that knowledge can be applied to the benefit of their customers and the business.
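To make the idea of a KEDB record and its retrieval more concrete, here is a minimal sketch. It assumes a record holds a fault, its symptoms and the actions to restore service or resolve the problem, and that retrieval is a simple word overlap between an incident description and the recorded symptoms. The field names, the `search` function and the sample records are all hypothetical, and real tools use far richer matching.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KedbRecord:
    # Illustrative fields only; an agreed recording methodology would define these.
    record_id: str
    fault: str
    symptoms: List[str]
    restore_action: str           # workaround steps to restore service
    resolution_action: str = ""   # steps to resolve the underlying problem
    category: str = ""            # shared categorization with Incident Management


def _words(text: str) -> set:
    """Lower-case a string and split it into a set of words."""
    return set(text.lower().split())


def search(records: List[KedbRecord], incident_description: str,
           category: Optional[str] = None) -> List[KedbRecord]:
    """Return records ranked by how many words their symptoms share
    with the incident description, best match first."""
    incident_words = _words(incident_description)
    scored = []
    for rec in records:
        if category and rec.category != category:
            continue
        symptom_words = set()
        for symptom in rec.symptoms:
            symptom_words |= _words(symptom)
        score = len(incident_words & symptom_words)
        if score > 0:
            scored.append((score, rec))
    # Sort on the score only; the sort is stable, so ties keep insertion order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in scored]


# Hypothetical sample data for illustration.
records = [
    KedbRecord("KE-001", "Mail relay queue stalls",
               ["email not sending", "outbox stuck"],
               "Restart the mail relay service", category="email"),
    KedbRecord("KE-002", "Print spooler crashes",
               ["printer offline", "print jobs stuck"],
               "Restart the print spooler", category="printing"),
]
matches = search(records, "User reports email stuck in outbox")
```

Consistent categorization and symptom wording matter here: the more uniformly records are written, the more reliably even a simple lookup like this one surfaces the right workaround.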