Problem Management for Newbies (Part 2 of 2)

In part one of “Problem Management for Newbies” we looked at reactive problem management and how problem management can serve as a pillar of support to incident management. Problem management prevents future incidents and problems where it can, and minimizes the impact of those it cannot prevent. There will always be a need for reactive problem management. IT support can never guarantee that there will be no outages, so it will always need clearly defined roles, skilled staff and governance for resolving incidents and problems when they occur. The added value to the business, however, comes from proactive problem management!

Proactive Problem Management

Proactive problem management gleans management information from the service desk and from other functions across the organization. By analyzing reports on the frequency and types of incidents, noting when incidents and problems occur and, most importantly, understanding the business impact, problem management teams can work to get to the root cause and prevent further incidents and problems from ever occurring. The incident management process has no control over how many incidents occur! Incident management can only give assurance that service will be restored as quickly as possible when there is an incident. Problem management, on the other hand, can actually reduce the volume of incidents and eliminate the negative business impact that would otherwise have resulted.

Proactive Problem Management Techniques

In addition to analyzing trends, proactive problem management uses many techniques for root cause analysis. Among these are Ishikawa diagrams (also known as fishbone diagrams), the 5 Whys, Fault Tree Analysis, and other Total Quality Management (TQM) methods such as brainstorming. While reactive problem management works with incident management to restore service fast, proactive problem management may take time to form focus groups, capture, report and analyze data, and ultimately submit a proposal or Request For Change (RFC) to resolve the problem and prevent future impact. The skill set involved in proactive problem management is technical and analytical, but it also requires strong management and facilitation experience.

Keys to Success

The key to maturing the proactive problem management process is not only to have clearly defined roles and responsibilities, ownership, integration and handoff points, but most importantly to define various problem models. A problem model is a predefined set of steps that spells out all of the roles, responsibilities and procedures for a specific type of problem. Not all types of problems are the same, so each type gets its own model.
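One way to picture a problem model is as a small data structure that ties a problem type to its steps and the roles responsible for them. The sketch below is only an illustration: the class names, role names and step descriptions are hypothetical and are not taken from ITIL or from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    description: str  # what has to be done
    role: str         # who is responsible for doing it (hypothetical role names)


@dataclass
class ProblemModel:
    problem_type: str
    owner: str
    steps: list = field(default_factory=list)

    def roles(self):
        """Return every role involved in this model, including the owner."""
        return sorted({self.owner} | {step.role for step in self.steps})


# Hypothetical catalogue of models, keyed by problem type.
MODELS = {
    "recurring incident": ProblemModel(
        problem_type="recurring incident",
        owner="problem manager",
        steps=[
            Step("Analyze incident trends", "problem analyst"),
            Step("Identify root cause with RCA techniques", "focus group"),
            Step("Submit Request For Change", "problem manager"),
        ],
    ),
}


def model_for(problem_type):
    """Look up the predefined model for a problem type, or None if undefined."""
    return MODELS.get(problem_type)
```

Calling `model_for("recurring incident").roles()` lists everyone who needs to be involved, while a `None` result signals a type of problem for which no model has been defined yet.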

Some examples of problems that problem management can resolve

Recurring Incidents – Example: ABC Company’s problem management team has analyzed trends and noticed that disk crashes have increased 75% compared to Q1! What could be the cause? If we don’t know what caused it, then “Houston, we have a problem”! Problem management might work with vendors and discover that one of them did have a bad batch of disks. After some research, it is discovered that ABC Company has several hundred of the bad disks installed in its organization. In this case problem management would submit an RFC and work with the vendor to replace all disks that are at risk, proactively preventing future incidents and negative business impact.
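The kind of trend check described in this example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the incident records, the category names and the 50% threshold are all made up, and real data would come from the service desk tool's reports.

```python
from collections import Counter

# Hypothetical incident log as (quarter, category) pairs.
incidents = (
    [("Q1", "disk crash")] * 4
    + [("Q2", "disk crash")] * 7
    + [("Q1", "password reset")] * 2
    + [("Q2", "password reset")] * 2
)


def flag_rising_categories(records, previous, current, threshold_pct=50.0):
    """Return {category: percent increase} for categories whose incident
    count rose by at least threshold_pct from `previous` to `current`."""
    counts = Counter(records)
    flagged = {}
    for category in sorted({cat for _, cat in records}):
        before = counts[(previous, category)]
        after = counts[(current, category)]
        if before == 0:
            continue  # no baseline to compare against
        change_pct = (after - before) / before * 100
        if change_pct >= threshold_pct:
            flagged[category] = change_pct
    return flagged


print(flag_rising_categories(incidents, "Q1", "Q2"))
```

With the sample data, disk crashes rise from 4 to 7, a 75% increase, so that category is flagged while password resets are not. A flagged category is only a prompt to start root cause analysis; it does not identify the cause.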

Major Problems (you know! The all-hands-on-deck, high-impact type) – Example: A problem was recently identified in which the cause of several hundred incidents was that a mirrored server did not fail over when required. Surprise! This followed a recent change and impacted vital business processes for this company. When the problem was investigated, the original cause was documented as: “Wrong firmware on secondary router prevented the mirrored server from failing over as it should have”. The firmware was updated and the problem resolved?! NO! That is reactive problem management. In the above example reactive problem management provided a temporary fix for the mirrored servers by updating the firmware, but the real question, the root cause, is WHY did the secondary router have the wrong firmware in the first place?!
After forming a focus group, building a timeline of events and using RCA techniques, it was determined that testing had been performed with only one router and that the RFC for that change contained no criteria for updating the secondary router. The real “problem” was in the Design and Transition processes and procedures. In addition to new procedures in the PMO and the Service Design lifecycle, two new processes were proposed for Service Transition: “Test and Validation” and “Change Evaluation”. If the company had not taken this root cause analysis further, it could have kept doing the same thing over and over, each time expecting a different result, only to experience chaos and business impact.

Getting to the root cause will prevent similar major outages that could otherwise have occurred after every major change. As we saw in this last example, the root cause will generally reach much wider than a hardware or software break-fix type of solution. Preventing the outage from ever happening again increases the confidence of staff, the business and customers, prevents cost overruns and sets a service provider up for success. So… what’s the problem?

