Getting Started with Problem Management
To understand the Problem Management process, one must first understand
that a problem is distinctly different from an incident. A problem is
tracked and recorded separately, and working it requires a very different
skill set and has a different objective than Incident Management. Problem
records are unique entities and are reported on separately. A repeatable,
lean Problem Management process could very well be the glue that helps IT
service providers integrate and automate much of the work and effort
required to “prevent”, “eliminate”, and “minimize” the impact of incidents
on your business and end-user customers.
While an incident is an unplanned interruption that impacts one or more
business services, a problem is the cause of one or more incidents.
“I can’t access the ERP system”, “The web portal will not come up!” and
“I can’t log in” are all examples of incidents. The cause, or the
“problem”, might be that a router on the network is down. Therefore, you
could have many incidents related to a single problem record. It is
important to note that an incident never becomes a problem. The Service
Desk agent owns the incident, and the Problem Management team or
technician works the problem to identify the root cause of these incidents
and, most importantly, to provide a solution to the Service Desk so that
the incidents can be resolved. The root cause and solution are logged in
the Known Error Database (KEDB), which is owned by Problem Management and
shared with the Service Desk and other IT support staff.
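The relationship described above can be pictured as a small data model. The following is only a minimal sketch in Python: every class name, field name, and identifier (Incident, ProblemRecord, KnownErrorDatabase, "INC-1", "PRB-1") is hypothetical and not taken from any particular service management tool, and the recorded solution is made up for illustration. It shows many incidents linking to a single problem record, with the root cause and solution kept in a KEDB entry.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Incident:
    """An unplanned interruption reported by a user (hypothetical model)."""
    incident_id: str
    description: str
    problem_id: Optional[str] = None  # link to the underlying problem, if known


@dataclass
class ProblemRecord:
    """The underlying cause of one or more incidents (hypothetical model)."""
    problem_id: str
    summary: str
    incident_ids: List[str] = field(default_factory=list)

    def link(self, incident: Incident) -> None:
        # The incident is linked to the problem; it never becomes the problem.
        incident.problem_id = self.problem_id
        self.incident_ids.append(incident.incident_id)


@dataclass
class KnownError:
    """One KEDB entry: root cause plus solution for a problem."""
    problem_id: str
    root_cause: str
    solution: str


class KnownErrorDatabase:
    """Minimal in-memory KEDB, keyed by problem id."""

    def __init__(self) -> None:
        self._entries: Dict[str, KnownError] = {}

    def record(self, entry: KnownError) -> None:
        self._entries[entry.problem_id] = entry

    def lookup(self, problem_id: str) -> Optional[KnownError]:
        return self._entries.get(problem_id)


# Three separate incidents, as in the examples above.
incidents = [
    Incident("INC-1", "I can't access the ERP system"),
    Incident("INC-2", "The web portal will not come up!"),
    Incident("INC-3", "I can't log in"),
]

# One problem record explains all of them.
problem = ProblemRecord("PRB-1", "Router on the network is down")
for inc in incidents:
    problem.link(inc)

# Problem Management records the root cause and an illustrative solution.
kedb = KnownErrorDatabase()
kedb.record(KnownError("PRB-1", "Router failure", "Restart or replace the router"))
```

A service desk agent handling any of the three incidents could then call kedb.lookup(incident.problem_id) to find the documented solution.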
Reactive Problem Management
Reactive Problem Management is a process that is primarily performed in
support of Incident Management to ensure that the service desk has the
solution to resolve the incident. Problem Management will identify the
root cause of the incident. In cases where resolving the cause might
exceed the service level target for the Mean Time to Restore Service
(MTRS), Problem Management will attempt to provide a temporary solution to
the service desk. This enables the service desk agent to restore service
as quickly as possible, meet the agreed targets, and reduce the impact of
the outage. This temporary solution is referred to as a “workaround”. A
classic example is rebooting the system. The system is rebooted, and the
user is happy and running. Yay! We met our SLA! The permanent solution
might require a Request for Change (RFC). This RFC would be submitted by
Problem Management and might take two days, two weeks, or two months to
implement, depending on the scope and complexity of the change needed to
permanently fix the problem. Although the “Incident” is closed in this
example, the “Problem Record” will remain open until the resolution to
the problem is completed.
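The two separate lifecycles in this example can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any real tool: the class names, the four-hour restore target, the timestamps, and the two-week change duration are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    opened_at: datetime
    resolved_at: Optional[datetime] = None

    def resolve(self, when: datetime) -> None:
        self.resolved_at = when

    def met_target(self, restore_target: timedelta) -> bool:
        # True only if service was restored within the agreed target.
        return (
            self.resolved_at is not None
            and self.resolved_at - self.opened_at <= restore_target
        )


@dataclass
class Problem:
    opened_at: datetime
    workaround: Optional[str] = None
    rfc_submitted: bool = False
    closed_at: Optional[datetime] = None

    @property
    def is_open(self) -> bool:
        return self.closed_at is None

    def close(self, when: datetime) -> None:
        # Closed only once the permanent fix (via the RFC) is implemented.
        self.closed_at = when


start = datetime(2024, 1, 1, 9, 0)     # illustrative timestamp
restore_target = timedelta(hours=4)    # assumed target, not from the article

incident = Incident(opened_at=start)
problem = Problem(opened_at=start)

# Problem Management supplies a workaround; the service desk restores service.
problem.workaround = "Reboot the system"
incident.resolve(start + timedelta(minutes=30))

# The permanent fix goes through an RFC, so the problem record stays open.
problem.rfc_submitted = True
sla_met = incident.met_target(restore_target)
still_open = problem.is_open

# Two weeks later (assumed) the change is implemented and the problem closes.
problem.close(start + timedelta(weeks=2))
days_open = (problem.closed_at - problem.opened_at).days
```

Because the two records are tracked separately, the incident counts as restored within target after 30 minutes, while the problem record remains open for the full duration of the change.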
If we, the service provider, follow industry best practice and track the
incident record separately from the problem record, our reports will show
that we met the SLA, even though the problem could take days or weeks to
resolve. The problem record stays open until the permanent solution is
implemented and recorded in the Known Error Database, which is owned by
the Problem Management process. Internally, we can generate management
information on problems to determine the effort that was needed to resolve
each problem and the cost to the business of resolving it. This also
enables the service provider to demonstrate the value of IT to the
business while building the confidence of users and customers. It is a
beautiful thing! If you like the value that this reactive side of Problem
Management can bring to your business, stay tuned for Part Two of “Problem
Management for Newbies”, where the focus is on “Proactive Problem
Management”!