Showing posts with the label incident management

ITIL 4 Guiding Principles – Collaborate and Promote Visibility

Communication has always been a key principle for service providers, and the ITIL 4 guiding principle “Collaborate and promote visibility” takes it to new heights. Encouraging staff and giving stakeholders the opportunity to develop this skill will bring teams together in ways we never thought possible. This guiding principle also reflects the influence of Agile, DevOps and Lean on ITSM and its best practices. A pillar of Agile is transparency, and Lean encourages making work visible in order to remove waste and increase flow. Collaboration and transparency are both a key focus of integrated DevOps teams, because together they help ensure a continuous delivery pipeline. To understand this further, let’s look at the two elements of this ITIL 4 guiding principle.
Collaborate
When we communicate, we are notifying or telling a person or a group something. Collaboration is quite different and occurs when a group of people work together. The key word here is “together”. The

How ITIL 4 and SRE align with DevOps

In the early days of DevOps, there was a lot of debate about the ongoing relevance of ITIL and IT service management (ITSM) in a faster-paced agile and DevOps world. Thankfully, that debate is coming to an end. ITSM processes are still essential, but, like all aspects of IT, they too must transform. Recent updates to ITIL (ITIL 4), as well as increased interest in site reliability engineering (SRE), are providing new insights into how to manage services in a digital world. Here's a look at ITIL 4 and SRE and how each underpins the "Three Ways of DevOps," as defined in The Phoenix Project by Gene Kim, Kevin Behr, and George Spafford.
What is ITIL 4?
ITIL 4 is the next evolution of the well-known service management framework from Axelos. It introduces a new Service Value System (SVS) that's supported by the guiding principles from the ITIL Practitioner Guidance publication. The framework eases into its alignment with DevOps and agile through a bi-mo

Service Continuity vs. Incident Management

According to ITIL 4 best practice, Service Continuity focuses on events that would impede business operations so drastically that they would be considered a disaster. Events that have a less significant impact on the business might instead be treated as incidents and managed through the Incident Management Practice or the Major Incident Management Practice. This means there are different levels of severity, and the distinction between a normal incident, a major incident and one that might require disaster recovery must be predefined and agreed upon. Documentation must then include clear thresholds and triggers that set the appropriate response and recovery in motion without delay or additional risk. There is no question that your organization is increasingly dependent on services that are tech-enabled. The need for resilient solutions is critical to success. A combination of business planning as well as being proactive with security, incident and proble
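The idea of predefined thresholds and triggers can be sketched in a few lines. This is a minimal illustration, assuming an organization has agreed on two hypothetical thresholds: an affected-user count that declares a major incident and an expected outage duration that invokes continuity plans. The function name, threshold names and numbers are illustrative assumptions, not values defined by ITIL.

```python
from enum import Enum


class Response(Enum):
    INCIDENT = "Incident Management"
    MAJOR_INCIDENT = "Major Incident Management"
    CONTINUITY = "Service Continuity (disaster recovery)"


# Hypothetical thresholds: each organization must define and agree its own.
MAJOR_INCIDENT_MIN_USERS = 500
CONTINUITY_MIN_OUTAGE_HOURS = 24.0


def select_response(users_affected: int, expected_outage_hours: float,
                    site_unavailable: bool = False) -> Response:
    """Return the response the predefined triggers call for, checking the most severe tier first."""
    if site_unavailable or expected_outage_hours >= CONTINUITY_MIN_OUTAGE_HOURS:
        return Response.CONTINUITY
    if users_affected >= MAJOR_INCIDENT_MIN_USERS:
        return Response.MAJOR_INCIDENT
    return Response.INCIDENT


print(select_response(20, 1.0).value)     # Incident Management
print(select_response(1200, 2.0).value)   # Major Incident Management
print(select_response(50, 36.0).value)    # Service Continuity (disaster recovery)
```

Checking the most severe tier first means an event that meets several triggers is always routed to the strongest agreed response, with no judgment call needed in the moment.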

Incident vs. Problem

You may have seen a similar blog from the Professor a few years back that discussed the distinction between an incident and a problem. Everything in that article is still relevant. As processes and methods for development and deployment have matured, so has the use of Incident and Problem Management. This is one of the most often confused points in Agile, Lean and ITIL adaptations. The ITIL definitions are the same:
Incident: Any unplanned event that causes, or may cause, a disruption or interruption to service delivery or quality
Problem: The cause of one or more incidents, events, alerts or situations
Where and how we apply Incident and Problem Management is evolving. A decade ago, Incident and Problem Management were treated as processes exclusive to Service Operation, and some organizations still treat them that way. ITIL is still very relevant, and today we find that, with the onset of DevOps and cultural shifts, many organizations are adopting little or zero tolerance

Service Operation and the Service Lifecycle – Yesterday and Today

ITSM best practice aligns five main processes with the “Service Operation” stage of the service lifecycle:
· Incident Management
· Problem Management
· Event Management
· Request Fulfillment
· Access Management
Not long ago, some of these processes were new to service providers; most will now find them common in today’s marketplace. An organization may not follow the service operation best practices to the letter, but it most likely has some close facsimile when executing Incident, Problem, Request Fulfillment and Event Management processes for provisioning IT services and support. To ensure identity management and authorization for access, some form of “Access Management” will also be needed to support an overall security policy in Service Operation. I would like to focus on some thoughts about “Event Management” and the early engagement of operational staff in the service lifecycle. As organizations mature they begin to realize the value

Visible Ops

Anyone who has worked in Information Technology knows that there are, and always will be, improvement opportunities available to our organizations. This is especially true in light of the pace of change taking place in all market spaces and the level of customer expectations that accompanies that change. If you have worked in IT for a number of years, you may remember when change was not welcomed. Well, the good old days weren’t always that good and tomorrow ain’t as bad as it seems (Billy Joel). The challenge is in getting started. If:
· the processes currently in use are not as efficient and effective as you would like
· you are finding that your environment isn’t as stable and reliable as it should be
· changes to your environment generally result in an outage and prolonged, repeated firefighting
then I recommend that you read The Visible Ops Handbook by Gene Kim, Kevin Behr and Geor

First Call Resolution (FCR) According to ITIL and General Best Practice

A reader recently asked me to comment on what First Call Resolution (FCR) means according to ITIL and general best practice. When collecting metrics, you want to be sure that the reporting brings good business value. From a reporting perspective, it might serve well to report incidents and requests separately. Each organization will need policies for how metrics are reported based on business value. One option is a policy that reports on “Service Requests” separately from “Incidents”. If we do not separate the logging and reporting for these very distinct processes, the combined metrics and reporting might be neither meaningful nor something that can be acted upon correctly. You could end up with a very high FCR rate while your Mean Time To Restore Service metric is breaching the SLA. Therefore, the question is not whether the call was resolved at first line, but whether it was an FCR for an Incident or for a Request/Standard Service. Report upon them
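To see how a combined FCR rate can hide incident performance, here is a minimal sketch that computes FCR per record type alongside mean time to restore service. The `Record` fields, the "incident"/"request" labels and the sample month are hypothetical; a real service management tool will store this data differently.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Record:
    record_type: str                 # "incident" or "request" (hypothetical field)
    resolved_at_first_contact: bool
    hours_to_restore: float          # time from logging to resolution


def fcr_rate(records: List[Record], record_type: Optional[str] = None) -> float:
    """Share of records resolved at first contact; None combines all record types."""
    subset = [r for r in records if record_type is None or r.record_type == record_type]
    if not subset:
        return 0.0
    return sum(r.resolved_at_first_contact for r in subset) / len(subset)


def mean_time_to_restore(records: List[Record]) -> float:
    """Mean hours to restore service, counting incidents only."""
    hours = [r.hours_to_restore for r in records if r.record_type == "incident"]
    return mean(hours) if hours else 0.0


# Illustrative month: 8 requests handled at first line, 2 incidents that needed escalation.
log = [Record("request", True, 0.5) for _ in range(8)]
log += [Record("incident", False, 12.0) for _ in range(2)]

print(fcr_rate(log))               # 0.8 (looks healthy)
print(fcr_rate(log, "incident"))   # 0.0
print(fcr_rate(log, "request"))    # 1.0
print(mean_time_to_restore(log))   # 12.0
```

The combined 80% FCR says nothing about the fact that no incident was resolved at first contact; only the split figures, read next to the restore time, show where action is needed.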

The Value of Incident Models

An “incident” is defined as an unplanned interruption to an IT service, a reduction in the quality of an IT service or the failure of a CI that has not yet impacted an IT service. The purpose of incident management is to restore a service to normal operations as quickly as possible and to minimize the impact of incidents on IT services. Incident Management is the process responsible for managing the lifecycle of all incidents by ensuring that standardized methods and procedures are used to record, respond to and report on all incidents. Additionally, this process should increase the visibility and communication of incidents to the business and the IT support staff, thereby allowing greater alignment of incident management with the overall IT and business strategies. In a normal IT environment, the IT organization may be dealing with a large number of incidents, and many of these are repeatable: something that has happened before and very well may happen again in the future. I
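In ITIL, an incident model predefines the steps for handling a particular type of repeatable incident in an agreed way. The sketch below shows one possible shape for such models, assuming each holds ordered steps, a responsible team, a timescale and an escalation target. The category keys, teams and timescales are hypothetical examples, not taken from the post.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class IncidentModel:
    name: str
    steps: Tuple[str, ...]   # actions to take, in chronological order
    responsible: str         # hypothetical owning team
    target_hours: float      # hypothetical time allowed before escalation
    escalate_to: str


# Hypothetical models for two repeatable incident types.
MODELS: Dict[str, IncidentModel] = {
    "locked-account": IncidentModel(
        name="Locked user account",
        steps=("Verify caller identity", "Unlock account", "Confirm user can log in"),
        responsible="Service Desk",
        target_hours=0.5,
        escalate_to="Identity and Access team",
    ),
    "printer-offline": IncidentModel(
        name="Network printer offline",
        steps=("Check printer status page", "Restart print spooler", "Power-cycle printer"),
        responsible="Desktop Support",
        target_hours=4.0,
        escalate_to="Network Operations",
    ),
}


def model_for(category: str) -> Optional[IncidentModel]:
    """Return the predefined model for a category, or None if the incident is not a known type."""
    return MODELS.get(category)


def needs_escalation(model: IncidentModel, hours_open: float) -> bool:
    """True once the incident has been open longer than the model's agreed timescale."""
    return hours_open > model.target_hours


model = model_for("locked-account")
print(model.steps[0])                 # Verify caller identity
print(needs_escalation(model, 1.0))   # True
```

Incidents that match no model still go through the normal Incident Management process; the models simply remove rework for the types that keep coming back.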

Incidents and Problems

An incident is an unplanned interruption to an IT service or reduction in the quality of an IT service, and Incident Management is strictly a reactive process. A problem, on the other hand, represents a different perspective on an incident: it is concerned with diagnosing the underlying root cause, which might also be the cause of multiple other incidents. Incidents, however, do not always grow up to become problems. While Incident Management activities focus on restoring services to normal operations as quickly as possible, Problem Management activities determine the root cause, find the most effective and efficient permanent resolution and ultimately prevent the incident from happening again. Problem Management can be both reactive and proactive. Proactive Problem Management identifies weaknesses in the environment before actual incidents occur; these can then be exploited as improvement opportunities. Reactive Problem Management addresses problems that were identified from one or more incidents. The pol
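The separate lifecycles described above can be sketched as two linked record types: one problem may be the cause of many incidents, each incident can be resolved while its problem stays open, and some incidents never link to a problem at all. The record classes, status values and helper names below are hypothetical, chosen only to illustrate the relationship.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Incident:
    incident_id: str
    status: str = "open"               # Incident Management lifecycle
    problem_id: Optional[str] = None   # set only if an underlying problem is identified


@dataclass
class Problem:
    problem_id: str
    status: str = "open"               # Problem Management lifecycle, independent of incidents
    root_cause: Optional[str] = None
    incident_ids: List[str] = field(default_factory=list)


def link(problem: Problem, incident: Incident) -> None:
    """Record that this problem is the (suspected) cause of the incident."""
    incident.problem_id = problem.problem_id
    if incident.incident_id not in problem.incident_ids:
        problem.incident_ids.append(incident.incident_id)


def restore_service(incident: Incident) -> None:
    """Incident Management resolves the incident once service is restored."""
    incident.status = "resolved"


problem = Problem("PRB-1")
first, second, unrelated = Incident("INC-1"), Incident("INC-2"), Incident("INC-3")
for inc in (first, second):
    link(problem, inc)
    restore_service(inc)

print(problem.status, problem.incident_ids)   # open ['INC-1', 'INC-2']
print(unrelated.problem_id)                   # None
```

Restoring service closes the incidents, but the problem stays open until its root cause is found and permanently resolved.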

Problem, Incident and Change Management Integration

“Problem Management seeks to minimize the adverse impact of incidents and problems on the business that are caused by underlying errors within the IT infrastructure and to proactively prevent the recurrence of incidents related to those errors. In order to achieve this, Problem Management seeks to get to the root cause of incidents, document and communicate known errors and to initiate actions to improve or correct the situation”. Given that this statement comes directly from the ITIL Best Management Practices text, it’s a wonder more organizations don’t have well-integrated Problem, Incident and Change processes. I never want to say that there is a single silver-bullet solution for a given problem, and I’m not suggesting that here. However, having a solid CMS (Configuration Management System) is a good step in the right direction. Of course, before we even think of tools, we must have rules. Thinking holistically, we can create an integrated set of best p

The Best of Service Operation, Part 3

The Value of Known Errors and Workarounds
Originally Published on December 7, 2010
The goal of Problem Management is to prevent problems and related incidents, eliminate recurring incidents and minimize the impact of incidents that cannot be prevented. Working with Incident Management and Change Management, Problem Management helps to ensure that service availability and quality are increased. One of the responsibilities of Problem Management is to record and maintain information about problems and their related workarounds and resolutions. Over time, this information is continually used to expedite resolution times, identify permanent solutions and reduce the number of recurring incidents. The resulting benefits are greater availability and less disruption to critical business systems. Although Incident and Problem Management are separate processes, they typically use the same or similar tools. This allows for similar categorization and impact coding systems. Each of thes
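Recording known errors with their workarounds is what lets the service desk reuse them. The sketch below is a deliberately simple known error lookup, assuming each record stores a lower-case symptom phrase and that plain substring matching against the incident description is good enough; the record IDs, symptoms and workarounds are hypothetical, and real tools offer far richer matching.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class KnownError:
    error_id: str
    symptom: str                          # hypothetical matching key, stored in lower case
    workaround: str
    permanent_fix: Optional[str] = None   # recorded once a change removes the root cause


# Hypothetical known error records.
KNOWN_ERRORS: List[KnownError] = [
    KnownError("KE-101", "vpn timeout", "Reconnect through the backup gateway"),
    KnownError("KE-102", "report export fails", "Export to CSV instead of PDF"),
]


def find_known_error(description: str) -> Optional[KnownError]:
    """Return the first known error whose symptom appears in the incident description."""
    text = description.lower()
    for known_error in KNOWN_ERRORS:
        if known_error.symptom in text:
            return known_error
    return None


match = find_known_error("User reports VPN timeout after login")
print(match.error_id, "->", match.workaround)   # KE-101 -> Reconnect through the backup gateway
print(find_known_error("Disk full on file server"))   # None
```

A hit gives first line an immediate workaround and links the incident to an existing record instead of opening a new investigation.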

The Best of Service Operation, Part 2

Tool Selection Criteria
Originally Published on February 1, 2011
Service management technology plays a major role in our support of the business. There are enterprise-wide tools that support service management systems and processes. There are also tools that support specific lifecycle phases. You should define your process before selecting a tool. Countless organizations have purchased a tool prematurely, only to find that it does not match the workflow of their newly reengineered process. Defining one or more processes first will help to narrow down the requirements and selection criteria and make it easier for the supplier to demonstrate how their product can complement your new process. Match tools to the process, not the other way around. Wading through all the options, vendors and suppliers can often be a daunting task. Let’s discuss a technique for evaluating tools and finding the product that will support our goals and objectives.
What Requirements?
Meet with th
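The post's own evaluation technique is cut off in this excerpt. One common approach, swapped in here as an assumption rather than the author's method, is a weighted scoring matrix: weight each requirement by importance, score each tool against it, and rank by the weighted total. The requirements, weights, tool names and scores below are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical requirements gathered after defining the process, weighted 1-5 by importance.
WEIGHTS: Dict[str, int] = {
    "fits incident workflow": 5,
    "reporting": 3,
    "licensing cost": 2,
}


def weighted_score(scores: Dict[str, int], weights: Dict[str, int]) -> int:
    """Sum of weight x score; a requirement the tool was not scored on counts as 0."""
    return sum(weight * scores.get(req, 0) for req, weight in weights.items())


def rank_tools(candidates: Dict[str, Dict[str, int]],
               weights: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (tool, total) pairs, best first."""
    totals = [(name, weighted_score(scores, weights)) for name, scores in candidates.items()]
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


# Scores (1-5) from hypothetical vendor demonstrations.
candidates = {
    "Tool A": {"fits incident workflow": 4, "reporting": 5, "licensing cost": 2},
    "Tool B": {"fits incident workflow": 5, "reporting": 3, "licensing cost": 3},
}
print(rank_tools(candidates, WEIGHTS))   # [('Tool B', 40), ('Tool A', 39)]
```

Giving workflow fit the heaviest weight reflects the advice to match tools to the process: Tool A has the better reports, but Tool B fits the defined process better and wins.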

The Best of Service Operation, Part 1

We continue our "Best of" blog series by moving into Service Operation.
Cost per Incident
Originally Published on August 17, 2010
A reader, upon downloading our ITIL ROI calculator, recently asked the following great question: “How do you determine cost per incident?” Cost per incident is a variation of cost per call or cost per contact, all of which are excellent ways to understand the impact of incidents, calls or contacts on the business. The calculation is fairly straightforward. Cost per incident is the total cost of operating your support organization divided by the total number of incidents for a given period (typically a month):
Cost per incident = total costs / total incidents
To accurately calculate cost per incident you must:
Log all incidents. You may also find it beneficial to distinguish between incidents (unplanned events) and service requests (planned events). Such a distinction will enable you to more accurately reflect business impact
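The formula translates directly into code. This minimal sketch implements cost per incident and, for comparison, the related cost per contact metric the post mentions; the monthly figures in the example are illustrative placeholders, not real data.

```python
def cost_per_incident(total_costs: float, total_incidents: int) -> float:
    """Cost per incident = total costs / total incidents for the same period."""
    if total_incidents <= 0:
        raise ValueError("total_incidents must be a positive count")
    return total_costs / total_incidents


def cost_per_contact(total_costs: float, total_contacts: int) -> float:
    """The related metric: the same costs spread over every call or contact."""
    if total_contacts <= 0:
        raise ValueError("total_contacts must be a positive count")
    return total_costs / total_contacts


# Illustrative month (not real data): $120,000 support cost, 2,400 contacts, 1,500 incidents.
print(cost_per_incident(120_000, 1_500))   # 80.0
print(cost_per_contact(120_000, 2_400))    # 50.0
```

Both metrics only mean something if every incident and contact for the period is logged; an unlogged incident silently inflates the cost attributed to the ones that were recorded.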

Examples of Major Incident Criteria

The Professor was recently asked for real-life examples or best practices for the criteria that organizations have used to define major incidents. ITIL defines a major incident as an incident that results in significant disruption to the business, so real-world examples are going to vary from one business to the next. For a financial services company, for example, a major incident could be an incident affecting live money transactions. For a retail company, it could be an incident affecting its point-of-sale service. For a manufacturing company, it could be an incident that affects the production line. Simply put, real dollars are being lost. A major incident could also be a service outage that affects a large number of users. Those users could be your company’s external customers or your internal employees. So for many organizations, outages affecting the company’s web site, or its email or customer relationship management (CRM) service
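The examples above combine two kinds of criteria: loss of a revenue-critical service, which differs by industry, and an outage affecting a large number of users. A minimal sketch of such a rule follows, assuming a hypothetical industry-to-service mapping and a hypothetical user-count threshold; both would have to be agreed with the business.

```python
from typing import Dict, Set

# Hypothetical mapping of industries to the services whose loss costs real dollars.
REVENUE_CRITICAL: Dict[str, Set[str]] = {
    "financial services": {"live money transactions"},
    "retail": {"point of sale"},
    "manufacturing": {"production line"},
}
LARGE_USER_IMPACT = 1_000   # hypothetical threshold agreed with the business


def is_major_incident(industry: str, service: str, users_affected: int) -> bool:
    """Major if a revenue-critical service is hit or a large number of users is affected."""
    if service in REVENUE_CRITICAL.get(industry, set()):
        return True
    return users_affected >= LARGE_USER_IMPACT


print(is_major_incident("retail", "point of sale", 5))      # True
print(is_major_incident("retail", "intranet", 50))          # False
print(is_major_incident("manufacturing", "email", 2_500))   # True
```

Writing the criteria down this explicitly, even if only in a policy document, removes the debate about whether to invoke the major incident procedure while the outage is still running.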

Dealing with Major Incidents

A close friend of mine has a saying that I always remember: “All roads lead through incident management.” We know that the primary goal of the incident management process is to restore normal service operations as quickly as possible and to minimize any adverse impact on business operations. This helps ensure that the highest levels of service quality and availability are delivered to the user community, so that the business receives value and can achieve the outcomes it wants. The value this process produces for the business is in the ability to:
· detect and resolve incidents quickly, resulting in higher availability of IT services
· align IT activities to real-time business priorities and dynamically allocate resources as necessary
· identify potential improvements to services through the analysis of incident trends
So it sounds like we have everything covered as long as we handle all incidents in the same consistent, proceduralized manner. Well, not so fast

Achieving ITSM Balance

In speaking with colleagues and practitioners, I have found that one of the greatest difficulties companies must overcome in a Service Management implementation is the desire to be more complex and unbalanced than is absolutely necessary. One of the most basic underlying elements of good Service Management is achieving balance in how we approach the delivery of value to customers and users through services. Balance helps us find an equitable point that brings value to customers and users without throwing out the efforts and actions needed to keep IT going. When I speak of balance, I am referring to finding the middle ground between extremes. These include the balance of time and effort spent between Incident Management and Problem Management, the balance between flexibility and stability, the challenge of being proactive versus reactive, and being customer/service-centric versus technology-centric. There are a multitude of these types

Incidents when a Defect is Involved

Question: We currently track defects in a system separate from our ticket management system. With that said, does anyone have suggestions and/or best practices on how to handle incidents when a defect is involved? Should the incident be closed, since the defect is being worked on in another defect tracking system and this is noted in the incident ticket? I am considering creating an incident status of 'closed-unresolved' so the incident can still be reported on in our ticket management system while we know it is being worked on/tracked in the defect system. With defects, it is possible that we may never work on them because they are very low priority and the impact to the user is low. However, in some cases a defect is being worked on. Should we create a problem ticket instead? Thanks, René W.
Answer: René, in ITIL, the activity you are describing is handled by the Problem Management process. ITIL does not use the term “defect” but it does use the term “known er

Defining Categories

I often hear from organizations that they are not reaping the expected benefits from their Incident Management systems or integrated Service Management suites. One of the biggest reasons is that they are struggling to determine how to categorize incidents, problems, service requests, changes and so forth. Coming up with the right categories for your organization is easier said than done. If you’ve had to do it multiple times, you’re not alone. Having said that, it is important to persist. Categories drive many process activities, such as:
· Incident matching
· Second- and third-level escalations
· Workflow management
· Self-service decision tree logic
· Priority definition
· Knowledge base searches
· Trend and root cause analysis
· Metrics production
· SLA reporting
Miscategorized records cause inefficiencies and ineffective reporting, and can even damage the relationships between lines of support. For example, are your second-line support teams regularly asking “why was this record assigned to me?” If so,
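Two of the activities listed above, escalation routing and trend analysis, show why a wrong category does damage. In this minimal sketch, assuming a hypothetical two-level category tree mapped to assignment groups, a miscategorized record lands with the wrong team and also distorts the category counts used for trending. All category and team names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Category = Tuple[str, str]   # (category, subcategory)

# Hypothetical category tree mapped to assignment groups.
ROUTING: Dict[Category, str] = {
    ("Hardware", "Laptop"): "Desktop Support",
    ("Software", "Email"): "Messaging Team",
    ("Network", "VPN"): "Network Operations",
}
DEFAULT_GROUP = "Service Desk"


def assignment_group(category: Category) -> str:
    """Route by category; unknown categories fall back to the Service Desk."""
    return ROUTING.get(category, DEFAULT_GROUP)


def top_categories(records: Iterable[Category], n: int) -> List[Tuple[Category, int]]:
    """Most frequent categories, the raw input to trend and root cause analysis."""
    return Counter(records).most_common(n)


print(assignment_group(("Network", "VPN")))     # Network Operations
# A VPN fault miscategorized as email lands with the wrong team:
print(assignment_group(("Software", "Email")))  # Messaging Team
print(top_categories([("Network", "VPN"), ("Network", "VPN"), ("Software", "Email")], 1))
# [(('Network', 'VPN'), 2)]
```

Because routing and trending both key off the same category, one miscategorized record produces two errors: a wasted hand-off and a misleading trend report.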