
How ITIL4 and SRE align with DevOps

Author: Jayne Groll, DevOps Institute - originally posted on Tech Beacon.

In the early days of DevOps, there was a lot of debate about the ongoing relevancy of ITIL and IT service management (ITSM) in a faster-paced agile and DevOps world. Thankfully, that debate is coming to an end.

ITSM processes are still essential, but, like all aspects of IT, they too must transform. Recent updates to ITIL (ITIL 4), as well as increased interest in site reliability engineering (SRE), are providing new insights into how to manage services in a digital world.

Here's a look at ITIL 4 and SRE and how each underpins the "Three Ways of DevOps," as defined in The Phoenix Project, by Gene Kim, Kevin Behr, and George Spafford.

What is ITIL 4?
ITIL 4 is the next evolution of the well-known service management framework from Axelos. It introduces a new Service Value System (SVS) that's supported by the guiding principles from the ITIL Practitioner Guidance publication. The framework eases into its alignment with DevOps and agile through a bi-modal approach that retains many of the activities from previous versions but acknowledges DevOps practices such as value streams and continuous delivery.

Site Reliability Engineering (SRE) is Google's approach to service management, introduced in a book of the same name. It is a post-production set of practices for operating large systems at scale, with an engineering focus on operations.
ITSM Academy will be offering DevOps Institute's SRE Foundation in 2020.  More on that coming soon.

SRE is "what happens when you ask a software engineer to design an operations function." It is both a role and a set of practices that have attracted the interest of large enterprises as an adjunct to agile teams and DevOps automation practices.

The three ways of DevOps and ITSM
Automation is essential to improving flow and service quality. Previously, ITSM automation was used primarily for record keeping and monitoring. In the digital age, most ITIL 4 processes will be underpinned by tools, particularly during transition and operation processes as part of continuous testing and delivery.

Automation is inherent in SRE because it is an engineering practice for operational service management. SREs can code, and therefore will make pervasive use of automation to manage reliability and reduce manual work known, in Google-speak, as toil.

In addition to automation, these other steps are crucial.

1. Increase flow and reliability through change management
Change management is the cornerstone where agile, DevOps, and ITSM meet. Simplify current change management practices, and you can increase flow many times over.
ITIL 4 and SRE take very different approaches to change management. In SRE, the team is given an error budget that represents the gap between perfect reliability and agreed service level objectives (SLOs). While the team is allowed to regulate its own workload, there are policies and consequences that govern what happens if an error budget is blown or service levels breached. Since error budgets are meant to be spent, the team can make autonomous decisions to increase flow.
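To make the error-budget arithmetic concrete, here is a minimal Python sketch, assuming the SLO is expressed as an availability fraction measured over a fixed time window. The budget is the gap between perfect reliability and the SLO, so a 99.9% SLO over 30 days leaves 0.1% of that window, or 43.2 minutes, to spend. The function names, the `BudgetStatus` fields, and the release-freeze rule are hypothetical illustrations of the kind of policy described above, not part of any particular SRE tooling.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class BudgetStatus:
    budget: timedelta       # total allowed unreliability in the window
    remaining: timedelta    # budget minus observed downtime (negative if blown)
    consumed: float         # fraction of the budget already spent
    releases_allowed: bool  # outcome of the hypothetical policy below


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Error budget = (1 - SLO) * window: the gap between perfect
    reliability and the agreed service level objective."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    return window * (1.0 - slo)


def budget_status(slo: float, window: timedelta, downtime: timedelta) -> BudgetStatus:
    budget = error_budget(slo, window)
    remaining = budget - downtime
    # Hypothetical policy: feature releases continue while budget remains
    # and pause in favor of reliability work once it is exhausted.
    return BudgetStatus(
        budget=budget,
        remaining=remaining,
        consumed=downtime / budget,
        releases_allowed=remaining > timedelta(0),
    )


if __name__ == "__main__":
    status = budget_status(0.999, timedelta(days=30), downtime=timedelta(minutes=30))
    print(status.budget)             # 0:43:12
    print(status.remaining)          # 0:13:12
    print(f"{status.consumed:.0%}")  # 69%
    print(status.releases_allowed)   # True
```

With 30 minutes of downtime recorded, about 69% of the budget is spent and 13 minutes 12 seconds remain, so the team keeps shipping. A 60-minute outage would blow the budget and, under the assumed policy, set `releases_allowed` to False until reliability is restored.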

ITIL 4 has a stronger emphasis on governance and change approvals. The newest guidance now provides different options for assessing changes based on the change category, ranging from a central decision authority to peer-to-peer reviews.

By definition, the purpose of the ITIL 4 change control practice is "to maximize the number of successful IT changes by ensuring that risks have been properly assessed, authorizing changes to proceed, and managing a change schedule."
ITIL 4 does support the use of automation and rapid decision making to expedite change decisions.
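Category-based assessment can be sketched in a few lines of Python, assuming ITIL 4's three change types: standard, normal, and emergency. Standard changes are low risk and pre-authorized; the rule that sends low-risk normal changes to peer review and high-risk ones to a central change authority, and the single `high_risk` flag, are hypothetical simplifications for illustration, not guidance quoted from ITIL 4.

```python
from enum import Enum


class ChangeType(Enum):
    STANDARD = "standard"    # low risk, pre-authorized
    NORMAL = "normal"        # assessed and authorized case by case
    EMERGENCY = "emergency"  # must be implemented as soon as possible


def change_authority(change_type: ChangeType, high_risk: bool) -> str:
    """Return who assesses a change.

    The routing below is a hypothetical policy for illustration,
    not text from ITIL 4.
    """
    if change_type is ChangeType.STANDARD:
        # Pre-authorized: no per-change approval step.
        return "pre-authorized"
    if change_type is ChangeType.EMERGENCY:
        # Expedited path with a dedicated (hypothetical) authority.
        return "emergency change authority"
    # Normal changes: assessment effort scales with risk.
    return "central change authority" if high_risk else "peer review"


if __name__ == "__main__":
    for ctype in ChangeType:
        for risky in (False, True):
            print(ctype.value, risky, change_authority(ctype, risky))
```

Encoding the routing this way lets tooling apply the assessment automatically, so only high-risk normal changes wait on a central decision.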

2. Shorten feedback loops by improving incident response
The best way to shorten feedback on the quality of a product or service is through incident management. Fewer incidents equal higher quality.

Both ITIL 4 and SRE refer to incident swarming—a model of networked collaboration—as a means to provide simultaneous and fast engagement to reduce the time and impact of a significant incident. Monitoring systems and dashboards visualize current state and can be shared with key stakeholders. ChatOps systems open engagement opportunities for collaboration, input, and feedback on past, present, and even predictive incidents.

Because SRE is an established role with direct access to developers, feedback in both directions can be fairly continuous. SREs also have the technical ability to diagnose and potentially fix incidents independently, which shortens the path to capturing knowledge at the source. For its part, ITIL 4 advocates breaking down silos (capturing knowledge at the source) and emphasizes the importance of recording incident activities.

3. Foster continuous learning and experimentation
DevOps encourages a culture of experimentation where "fail fast and learn fast" are the keys to practice, mastery, and improvement. This principle is supported by ITIL 4, SRE, and virtually every agile and ITSM framework. The spirit of continuous learning and improvement is embedded in every ITSM activity. In SRE, failure is an opportunity to improve. In ITIL 4, "improve" is called out as a value chain activity.

Process skills are critical to DevOps
The DevOps Institute’s recent Upskilling: Enterprise DevOps Skills Report found that process skills are statistically equal to technical and soft skills in the current talent landscape.

It is interesting to note that the upper half of "must-have" process skills do not map to a specific framework or method. These are higher-level, critical process skills that can be applied universally in the management of products and services. While still strongly "nice to have," experience with frameworks such as ITIL, Scrum, and project management was not considered essential by the 1,600-plus respondents to the survey.

Which service management framework is right for you?
Both ITIL 4 and SRE have their merits, and both claim to support the DevOps three ways.

Culturally, SRE is more aligned to DevOps values and agile principles in encouraging self-organization, error budgets, smaller and faster increments, and an engineering mindset. You can also hire SREs.

ITIL 4 promotes a more command-and-control-oriented structure than DevOps and agile do, although it hints at closer alignment with them. For traditional organizations that are not ready to take the leap from change control to self-organization, ITIL 4's bi-modal approach may be attractive, even if it is not sustainable for the long term.

Regardless of the framework you choose, it is imperative that you adopt an agile service management mindset to determine how much is "just enough" or "minimally viable" process for the business.

Either way, IT service management is here to stay.

