Problem Management for Newbies (Part 2 of 2)

In part one of “Problem Management for Newbies” we looked at reactive Problem Management and how it can serve as a pillar of support to Incident Management. Problem Management works to prevent, minimize and eliminate future incidents and problems. There will always be a need for reactive problem management: IT support can never guarantee that there will be no outages, so it will always need clearly defined roles, skilled staff and governance to resolve incidents and problems when they occur. The real added value to the business, though, comes from proactive problem management!

Proactive Problem Management

Proactive problem management gleans management information from the service desk function and from other teams across the organization. By viewing and analyzing reports on the frequency and types of incidents, noting the times when incidents and problems occur and, most importantly, understanding the business impact, problem management teams can get to the root of the root cause and prevent further incidents and problems from ever occurring. The Incident Management process has no control over how many incidents occur! Incident Management can only give assurance that service will be restored as quickly as possible when an incident happens. Problem Management, on the other hand, can actually reduce the volume of incidents and eliminate the negative business impact that would otherwise have resulted.
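To make this kind of trend analysis a little more concrete, here is a minimal Python sketch that summarizes a handful of incident records by frequency, by time of day and by business impact. The record layout (category, time logged, minutes of downtime) and the sample data are hypothetical assumptions, not the export format of any particular service desk tool, and downtime minutes are only a rough stand-in for business impact.

```python
from collections import Counter
from datetime import datetime

# Hypothetical incident records: (category, time logged, minutes of downtime).
# Downtime minutes are used here as a rough proxy for business impact.
incidents = [
    ("disk", datetime(2024, 4, 2, 9, 15), 30),
    ("disk", datetime(2024, 4, 9, 9, 40), 45),
    ("network", datetime(2024, 4, 10, 14, 5), 120),
    ("disk", datetime(2024, 4, 16, 9, 5), 20),
    ("email", datetime(2024, 4, 17, 16, 30), 10),
]


def frequency_by_category(records):
    """Count how many incidents were logged in each category."""
    return Counter(category for category, _, _ in records)


def frequency_by_hour(records):
    """Count incidents by hour of day to reveal when they tend to occur."""
    return Counter(logged_at.hour for _, logged_at, _ in records)


def impact_by_category(records):
    """Sum downtime minutes per category as a simple business-impact measure."""
    impact = Counter()
    for category, _, downtime_minutes in records:
        impact[category] += downtime_minutes
    return impact


if __name__ == "__main__":
    print("Most frequent:", frequency_by_category(incidents).most_common())
    print("Busiest hour:", frequency_by_hour(incidents).most_common(1))
    print("Highest impact:", impact_by_category(incidents).most_common())
```

Notice that the rankings differ: disk incidents are the most frequent in this sample, but the single network incident caused the most downtime, which is why understanding business impact matters more than simply counting tickets.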

Proactive Problem Management Techniques

In addition to analyzing trends, proactive problem management uses many techniques for root cause analysis. Among these are “Ishikawa diagrams” (aka fishbone diagrams), the 5 Whys, Fault Tree Analysis, and other Total Quality Management (TQM) methods such as brainstorming. While reactive problem management works with Incident Management to restore service fast, proactive problem management may take time to form focus groups, capture, report on and analyze data, and ultimately submit a proposal or Request for Change (RFC) to resolve the problem and prevent future impact. The skill set involved in proactive problem management is technical and analytical, but it also requires strong management and facilitation experience.

Keys to Success

The key to maturing the Proactive Problem Management process is not only to have clearly defined roles and responsibilities, ownership, integration and handoff points but, most importantly, to define various problem models. A problem model is a unique set of steps defining all of the roles, responsibilities and procedures for a specific type of problem. Not all types of problems are the same.
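One way to picture a problem model is as a small data structure: a problem type plus an ordered list of steps, each owned by a role, from which the handoff points follow wherever ownership changes. The Python sketch below is purely illustrative; the class names, step names and role names are hypothetical and are not taken from ITIL or from any tool.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One procedure in a problem model and the role responsible for it."""
    name: str
    responsible_role: str


@dataclass
class ProblemModel:
    """A predefined set of steps, roles and procedures for one type of problem."""
    problem_type: str
    steps: list[Step] = field(default_factory=list)

    def roles(self) -> set[str]:
        """Every role that takes part in this model."""
        return {step.responsible_role for step in self.steps}

    def handoff_points(self) -> list[tuple[str, str]]:
        """(from_step, to_step) pairs where ownership passes to a different role."""
        return [
            (current.name, following.name)
            for current, following in zip(self.steps, self.steps[1:])
            if current.responsible_role != following.responsible_role
        ]


# Hypothetical model for recurring hardware failures (all names are illustrative).
hardware_model = ProblemModel(
    problem_type="Recurring hardware failure",
    steps=[
        Step("Analyze incident trends", "Problem Analyst"),
        Step("Investigate root cause with vendor", "Problem Analyst"),
        Step("Submit RFC", "Problem Manager"),
        Step("Replace at-risk components", "Change Implementer"),
    ],
)

if __name__ == "__main__":
    print(sorted(hardware_model.roles()))
    for from_step, to_step in hardware_model.handoff_points():
        print(f"Handoff: {from_step} -> {to_step}")
```

Keeping the steps in order means the handoff points can be derived rather than maintained by hand; a different type of problem, such as a major problem, would get its own model with its own steps and roles.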

Some examples of problems that problem management can resolve

Recurring Incidents – Example: ABC Company’s problem management team has analyzed trends and noticed that disk crashes have increased 75% compared to Q1! What could be the cause? If we don’t know what caused it, then “Houston, we have a problem”! Problem Management might work with vendors and discover that one of them did have a bad batch of disks. After some research, it is discovered that ABC Company has several hundred of the bad disks installed across the organization. In this case, problem management would submit an RFC and work with the vendor to replace all at-risk disks, proactively preventing future incidents and negative business impact.
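A “75% increase” like the one in this example is simply a period-over-period percentage change: (current − previous) / previous × 100. The Python sketch below shows that arithmetic and a simple rule for flagging categories that are rising. The per-quarter counts (40 disk crashes in Q1, 70 in Q2) and the 50% alert threshold are hypothetical numbers chosen so the arithmetic reproduces the 75% figure; they are not data from the example.

```python
def percent_change(previous: int, current: int) -> float:
    """Percentage change from the previous period's count to the current one."""
    if previous == 0:
        raise ValueError("previous period count must be non-zero")
    return (current - previous) * 100 / previous


def flag_rising_categories(counts, previous_period, current_period, threshold_pct=50.0):
    """Return categories whose incident count rose by at least threshold_pct."""
    flagged = []
    for category, by_period in counts.items():
        change = percent_change(by_period[previous_period], by_period[current_period])
        if change >= threshold_pct:
            flagged.append(category)
    return flagged


# Hypothetical incident counts per category and quarter.
quarterly_counts = {
    "disk crash": {"Q1": 40, "Q2": 70},
    "password reset": {"Q1": 200, "Q2": 210},
}

if __name__ == "__main__":
    print(percent_change(40, 70))
    print(flag_rising_categories(quarterly_counts, "Q1", "Q2"))
```

A flag like this only tells problem management where to look; tracing the rise back to a bad batch of disks still takes investigation with the vendor.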

Major Problems (you know! The all-hands-on-deck, high-impact type) – Example: A problem was recently identified: the cause of several hundred incidents was that a mirrored server did not fail over when required. Surprise! This followed a recent change and impacted vital business processes for this company. When the problem was investigated, the original cause was documented as: “Wrong firmware on secondary router prevented the mirrored server from failing over as it should have”. The firmware was updated and the problem resolved?! NO! That is reactive problem management. In the above example, reactive problem management provided a temporary workaround for the mirrored servers by updating the firmware, but the real cause, the root cause, is WHY did the secondary router have the wrong firmware in the first place?!
After forming a focus group, building a timeline of events and applying RCA techniques, the team determined that testing had been performed with only one router and that the RFC for that change contained no criteria for updating the secondary router. The real “Problem” was in the Design and Transition processes and procedures. In addition to new procedures in the PMO and the Service Design lifecycle, two new processes were proposed for Service Transition: “Test and Validation” and “Change Evaluation”. Had they not taken this root cause analysis further, this company could have kept doing the same thing over and over, each time expecting different results, only to experience chaos and business impact.

Getting to the root of the cause will prevent similar major outages that could otherwise have occurred after every major change. As we saw in this last example, the root cause generally goes much wider than a hardware or software break-fix type of solution. Preventing the outage from ever happening again increases the confidence of staff, the business and customers, prevents cost overruns and positions a service provider for success. So… what’s the problem?

