Problem Management for Newbies! Part 1 of 2


Getting Started with Problem Management
To understand the process of Problem Management, one must first understand that a problem is distinctly different from an incident. A problem is tracked and recorded separately, and working it requires a very different skill set and has a different objective than Incident Management. Problem records are unique entities and are reported on separately. A repeatable, lean Problem Management process could very well be the glue that helps IT service providers integrate and automate much of the work required to “prevent” and “eliminate” incidents and to “minimize” their impact on your business and end-user customers.
While an incident is an unplanned interruption that impacts one or more business services, the problem is the underlying cause of one or more incidents. For example, “I can’t access the ERP system”, “The web portal will not come up!” and “I can’t log in” are all incidents. The cause, or the “problem”, might be that a router on the network is down. You could therefore have many incidents related to a single problem record. It is important to note that an incident never becomes a problem. The Service Desk Agent owns the incident, while the Problem Management team or technician works the problem to identify the root cause of these incidents and, most importantly, to provide a solution to the Service Desk so that the incidents can be resolved. The root cause and the solution are logged in the Known Error Database (KEDB), which is owned by Problem Management and shared with the Service Desk and other IT support staff.
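The relationship described above, many incidents linked to a single problem record, can be sketched in a few lines of Python. This is only an illustrative model, not how any particular ITSM tool stores its records: the class names, fields, the `link` helper, and the `INC-`/`PRB-` identifiers are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProblemRecord:
    """The underlying cause of one or more incidents (hypothetical model)."""
    problem_id: str
    description: str
    incident_ids: List[str] = field(default_factory=list)


@dataclass
class Incident:
    """An unplanned interruption reported by a user (hypothetical model)."""
    incident_id: str
    summary: str
    problem_id: Optional[str] = None  # set once the cause is identified


def link(incident: Incident, problem: ProblemRecord) -> None:
    """Relate an incident to a problem. The incident remains an incident;
    it is never converted into a problem record."""
    incident.problem_id = problem.problem_id
    if incident.incident_id not in problem.incident_ids:
        problem.incident_ids.append(incident.incident_id)


# One problem (the router is down) behind three separate incidents.
router_down = ProblemRecord("PRB-1", "Network router is down")
incidents = [
    Incident("INC-1", "I can't access the ERP system"),
    Incident("INC-2", "The web portal will not come up"),
    Incident("INC-3", "I can't log in"),
]
for inc in incidents:
    link(inc, router_down)

print(f"{router_down.problem_id} is linked to {len(router_down.incident_ids)} incidents")
```

Keeping the two record types separate, with only a reference between them, is what allows each to be owned, tracked, and reported on independently.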
Reactive Problem Management
Reactive Problem Management is performed primarily in support of Incident Management, to ensure that the service desk has a solution to resolve the incident. Problem Management will identify the root cause of the incident. In cases where resolving the cause would take longer than the service level target for Mean Time To Restore Service (MTRS), Problem Management will attempt to provide a temporary solution to the service desk. This enables the service desk agent to restore service as quickly as possible, meet the agreed targets and reduce the impact of the outage. This temporary solution is referred to as a “workaround”. A classic example is rebooting the system. The system is rebooted, the user is happy and running. Yay! We met our SLA! The permanent solution, however, might require a Request for Change (RFC). This RFC would be submitted by Problem Management, and the change might take two days, two weeks, or two months to implement, depending on its scope and complexity. Although the incident is closed in this example, the problem record will remain open until the permanent resolution is completed.
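The reboot example can be walked through as a small state sketch: the workaround closes the incident, while the problem record stays open until the permanent fix is in place. Again, this is a minimal illustration under assumed names (`Status`, `restore_service`, `implement_fix`, and the record identifiers are hypothetical), not a description of a real tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    OPEN = "open"
    CLOSED = "closed"


@dataclass
class Incident:
    incident_id: str
    status: Status = Status.OPEN


@dataclass
class ProblemRecord:
    problem_id: str
    workaround: Optional[str] = None
    permanent_fix: Optional[str] = None
    status: Status = Status.OPEN

    def implement_fix(self, fix: str) -> None:
        """Record the permanent fix (for example, after an RFC is
        implemented) and only then close the problem record."""
        self.permanent_fix = fix
        self.status = Status.CLOSED


def restore_service(incident: Incident, problem: ProblemRecord) -> None:
    """Close the incident using the workaround. The problem is untouched."""
    if problem.workaround is None:
        raise ValueError("No workaround is available for this problem yet")
    incident.status = Status.CLOSED


problem = ProblemRecord("PRB-1", workaround="Reboot the system")
incident = Incident("INC-1")

# The service desk applies the workaround: incident closed, problem still open.
restore_service(incident, problem)
status_after_workaround = (incident.status, problem.status)

# Days or weeks later the change is implemented and the problem is closed.
problem.implement_fix("Permanent change delivered through an RFC")
status_after_fix = (incident.status, problem.status)

print(status_after_workaround)
print(status_after_fix)
```

Because the two statuses move independently, the incident can count toward the restore-time target even while the problem remains open.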
If we, the service provider, follow industry best practice and track and record the incident separately from the problem, our reports will show that we met the SLA! Even though the problem could take days or weeks to resolve, the problem record stays open until the permanent solution is implemented and recorded in the Known Error Database, which is owned by the Problem Management process. Internally, we can generate management information on the problems to determine the effort that was needed to resolve each one and what it cost the business. This also enables the service provider to demonstrate the value of IT to the business while building the confidence of users and customers. It is a beautiful thing! If you like the value that this reactive side of Problem Management can bring to your business, stay tuned for Part Two of “Problem Management for Newbies”, where the focus is on “Proactive Problem Management”!

 
