
SRE Is the Most Innovative Approach to ITSM Since ITIL

Originally published on, written by Jayne Groll, CEO of DevOps Institute

For over a decade, ITIL has been the leading ITSM framework adopted by enterprises across the globe. So, what is driving a rapidly increasing interest in Site Reliability Engineering (SRE) as a service management alternative?

In its own words, Google refers to SRE as its approach to service management: “The SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response and capacity planning.”

In traditional ITSM terms, the SRE role covers service level, change, availability, event, incident, problem, capacity, performance, infrastructure and platform management. While the operational practice areas may be similar, there are significant differences in how the practices are approached.

ITIL4 Framework Compared to SRE

Released in 2019, ITIL4 is the newest update to ITIL, and it remains a complex governance model with four dimensions, seven guiding principles, a Service Value System and 34 processes (now renamed as practices). While ITIL4 pays homage to Agile and DevOps, there is little depth in the publications released to date. To be fair, several other publications in the series are in the queue for gradual release. The framework spans just about every aspect of software delivery and operations and seemingly is trying to be the single source of truth in both principle and practice for IT management.

SRE Is More Closely Aligned to Agile and DevOps

By comparison, SRE focuses specifically on the reliability and resilience of complex production system operations. As an engineering discipline, SRE more closely aligns with the Agile and DevOps patterns that are being adopted by product development teams, including continuous integration, testing, delivery, and deployment. SREs bring the wisdom of production into those teams, thereby breaking down many of the silo walls that have impeded IT for so long.

As a service management alternative, SRE also updates traditional ITSM activities with innovative and self-organizing concepts: managing to service level objectives, error budgets, toil reduction, release engineering, monitoring/observability, and embracing risk as a neutral part of service management. The core SRE book provides practical and actionable guidance for Site Reliability Engineers on managing incidents, learning from failure, testing for reliability, load balancing, handling different types of emergencies, software engineering, and capacity planning. Related publications such as the Site Reliability Engineering Workbook and Seeking SRE provide additional insight.
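For readers coming from ITSM, the arithmetic behind an error budget may make the idea more concrete. The following is a minimal sketch in Python, assuming a simple time-based availability SLO; the function names and the 30-day window are illustrative choices, not something taken from the SRE book or from ITIL.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Return the downtime (in minutes) an availability SLO permits over the window.

    slo is a fraction such as 0.999 for "three nines" (assumed time-based SLO).
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Return the fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    budget = error_budget_minutes(0.999)
    left = budget_remaining(0.999, downtime_minutes=21.6)
    print(f"Budget: {budget:.1f} min, remaining: {left:.0%}")
```

Under these assumptions, a 99.9% SLO over 30 days leaves about 43.2 minutes of allowable downtime. A team that has already used 21.6 of those minutes has roughly half its budget left, which is the kind of signal SRE teams use to decide how much release risk to accept.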

Most importantly, Site Reliability Engineer is an actual hireable job with a defined role and a defined set of responsibilities and skills. SREs and SRE teams are encouraged to be creative and accountable, and are expected to spend at least 50% of their time on engineering work, such as automation that reduces toil, in order to make tomorrow better than today. Like Agile and DevOps, SRE supports self-regulation with policies and consequences. In fact, some consider SRE to be the third piece of the Develop (Agile), Deploy (DevOps), Operate (SRE) feedback loop.

What This Means for the “Humans of DevOps” in Operations and ITSM

SRE is breathing new life into operations and ITSM and opening new career paths for professionals who a few years ago were battling against the mantra of “NoOps.” According to LinkedIn’s 2020 Emerging Jobs Report, SRE is the fifth fastest-growing job role with 34% growth.

Services will always need to be managed. However, competing in a digital age requires new ways of working and thinking with speed and quality as key metrics. Agility must be instilled across the value stream spectrum in order to increase flow and deliver an exemplary customer experience. Your organization may not be as complex as Google’s, but the principles and practices of SRE are applicable to all environments.

You can read the Site Reliability Engineering and Site Reliability Engineering Workbook publications for free from Google. For those wanting to learn more about the practices and patterns associated with SRE, DevOps Institute recently released its Site Reliability Foundation certification with accredited training being offered globally by its Global Registered Education Partner network. Either way, if you are an operations or ITSM professional, I would highly recommend learning more about SRE. It is the future of ITSM as we cross the digital divide.

To learn more, check out this webinar and consider the following ITSM Academy certification course:

