Posts

Showing posts with the label Site Reliability Engineering

Why Am I Excited to Teach Site Reliability Engineering (SRE) Foundation?

I really like teaching the Site Reliability Engineering (SRE) Foundation course. I find it really effective to link SRE Foundation to the learners’ need to incorporate SRE core concepts into ITSM and DevOps (and any other framework!). This course allows me to explain how SRE improves operational excellence and quality, a key performance measure for ITSM. It also allows me to explain how SRE improves automation, not only within the DevOps pipeline, but also in how Ops uses the resulting data to improve the flow of work into operations and then automates repetitive tasks with tools (e.g., ChatOps). Most importantly, SRE improves collaboration with customers by defining Service Level Objectives (SLOs) so that IT consistently achieves (and exceeds) customers’ expectations AND delivers VALUE for the organization. Automated monitoring is NOT enough these days; we must also include observability, use automation to manage security, and ultimately deliver improved IT service quality to the business.
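To make the idea of an SLO a little more concrete, here is a minimal Python sketch of checking measured availability against a target. The 99.9% target, the request counts, and the function names are all assumptions made up for this illustration; they are not taken from the course material.

```python
def availability(good_events: int, total_events: int) -> float:
    """Return the fraction of events that met the service level indicator."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def slo_met(good_events: int, total_events: int, slo_target: float) -> bool:
    """Return True when measured availability is at or above the SLO target."""
    return availability(good_events, total_events) >= slo_target


if __name__ == "__main__":
    # Hypothetical month: 1,000,000 requests, 999,500 of them successful,
    # measured against an assumed SLO target of 99.9%.
    print(slo_met(999_500, 1_000_000, slo_target=0.999))
```

In practice the target itself is something IT agrees on with the customer; the code only checks a number that the conversation has already produced.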

How to Hire Site Reliability Engineers (SREs): 5 Top Qualities

Guest Host Post by Jayne Groll, previously posted on The Enterprisers Project, May 13, 2021. The Site Reliability Engineer (SRE) role continues to gain momentum in enterprise IT. Hiring managers, consider this advice on how to spot a strong candidate. Site Reliability Engineering (SRE) continues to gain momentum among IT organizations. According to the Upskilling 2021: Enterprise DevOps Skills Report, 47 percent of survey respondents (up from 28 percent in 2020) say SRE is a must-have process and framework skill. As the demand for strong SRE skills rises, so does SRE hiring. However, a challenge for business and hiring managers is determining which skills, traits, and competencies make a strong site reliability engineer. I asked several DevOps Institute Ambassadors and SRE subject matter experts to weigh in on what makes a great SRE. Here’s what they had to say: 1. "Great SREs have a passion for high-quality automation. They have a lot of ideas about automation of toilsome prod

Happy Retirement ITIL® v3 Foundation! Passing the Torch to ITIL 4!

Retirement is a time that marks a new beginning. It’s a major transition that isn’t always easy. This is true whether it relates to the retirement of a person, a technology, or, as is the case with ITIL v3 Foundation, a certification. Like other major transitions, the retirement of ITIL v3 Foundation has sparked a variety of emotions and concerns. On a positive note, we can look back fondly on ITIL v3 and celebrate the progress that it has enabled us to make in terms of promoting the value of service management. It helped us to understand what processes are and the importance of continually improving those processes. It also paved the way for us to understand the importance of aligning service management with business requirements. Concerns, however, have started to creep in. Is ITIL v3 enough in the digital age? Or perhaps more importantly, is ITIL v3 too much when viewed through the lens of adjacent ways of working such as Agile, Lean, and DevOps? Have our processes become unnecessaril

Upskilling Your Service Management Office (SMO)

By Donna Knapp and Jeff Jensen. Let’s answer the obvious question first. What is a service management office (SMO)? ITIL® describes an SMO as a “group or department that functions as a center of excellence for service management, ensuring continual development and the consistent application of management practices across an organization.” So given that service management is a “set of specialized organizational capabilities for enabling value for customers in the form of services”, it is the SMO that helps the organization to develop these capabilities. An SMO can be formalized and have significant authority to drive service management in the organization, or it can be a less formal team focused on continual development of the organization’s management practices. In some organizations, the SMO provides a management structure for the various practice/process owners and managers to report into. This also allows for a roll-up of enterprise metrics and reporting, and in some cases provides

Then and Now – Site Reliability & DevOps

In the past, the idea was to build in the non-functional requirements of a service to the best of our ability based on experience or best guess. Sometimes the general thought was, “We will worry about any residual work for availability, capacity, and reliability after the product or service is deployed.” This focus ensured that the product was fit for purpose, but did not ensure that the product was fit for use, that is, that it was reliable. This approach is very costly to the operations of the service, and the negative consumer impact impedes opportunities for market share. This type of focus also creates silos between Dev and Ops, and Ops become firefighters. The costs for operations are not sustainable! In addition to loss of revenue, staff morale begins to slip. So, reliability is really the key to success. Think about your cell phone. A heavy focus on functional requirements would mean that you can make phone calls. You can text, you can take photos, you can use your maps and a variety of

Effective and Efficient Incident Response – Rethinking the way YOU work!

Learn more about new ways to do work! Explore DevOps, ITIL, SRE, XLAs and more! Silos are not uncommon, but when you silo the service desk from second and third-tier support staff, you likely have a recipe for pain. An ineffective incident response system within the organization is painful and disrupts the entire organization, especially the customers. We must shift the way we think and work to stabilize and improve the situation. One organization felt that they had a grip on service desk and incident management, but they blamed the subject matter experts for breaches of Service Level Agreements. The blame game is always detrimental. Their process consisted of the service desk agents receiving the incident, performing the initial triage, and then forwarding it to the subject matter expert based on how they categorized the incident. Sound familiar? Sometimes we pass tickets to and fro, get everybody and their brother involved, wait on email responses, and create chaos that frustrat

ITIL® 4 and Site Reliability Engineering

Originally posted on owlpoint.com, August 11, 2020, and written by Mark Blanke, CEO of Owlpoint, and Chairman of The CIO Initiative. One of the aspects of ITIL 4 that has impressed me the most is the integration and reference to so many other best practices and frameworks. One such reference is to Site Reliability Engineering, aka SRE. SRE was originally developed by Google in the mid 2000s as a way of operating and administering production systems with a software development mindset. One of Google’s key drivers in building out SRE was to help bring developers and operations people together. Sounds like DevOps, right? In reality, they come from the same mindset, but there are key differences. Google only recently started sharing the SRE concepts. It was their secret sauce and a way to be far more effective in operating their systems and maintaining a highly reliable environment. However, over time, they realized that it would be better for them to share their methods, so the

The EVOLUTION of the ENGINEER – Site Reliability Engineers

ALL CALL SREs REQUIRED! Let’s take a walk down to the ocean and while you consider the opportunity, benefits, and $$$, think about dipping your toe in. Let’s explore Reliability, Site Reliability, and the Site Reliability Engineer. No doubt the world is evolving. People are evolving and tech is evolving. Business and customer requirements are evolving. The evolution of systems requires the evolution of engineers. Nature and pandemics put undue stress on our resources! In comes the Certified Site Reliability Engineer. "Urgent, Urgent, Urgent… All hands on deck!" is a call that practitioners, managers, and organizations do not want to hear, and one they recognize must stop! Reliability – At a minimum, we recognize that the delivery of service is not dependent solely on the quality of the product itself, and the goal is not that the product or service merely be deployed. A service must be operated and sustained over a period. How long? For the life of the service.

SRE Is the Most Innovative Approach to ITSM Since ITIL®

Originally published on DevOps.com, written by Jayne Groll, CEO of DevOps Institute. For over a decade, ITIL has been the leading ITSM framework adopted by enterprises across the globe. So, what is driving a rapidly increasing interest in Site Reliability Engineering (SRE) as a service management alternative? In its own words, Google refers to SRE as its approach to service management: “The SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response and capacity planning.” In traditional ITSM terms, the role of the SRE is responsible for service level, change, availability, event, incident, problem, capacity, performance, infrastructure and platform management. While the operational practice areas may be similar, there are significant differences in how the practices are approached. ITIL4 Framework Compared to SRE: Released in 2019, the newest update to ITIL4 remains a complex governance model with four dimensi

Up YOUR Game – Become a Certified Process Design Engineer!

I find that there are many people who do not understand WHAT a Certified Process Design Engineer (CPDE) really is (be sure to scroll down on the page and then download the free whitepaper for surprising details). The CPDE role is likely much broader and deeper than you might think! Time and Money?! Yes, but not at the expense of quality and stability! The Certified Process Design Engineer role brings a critical skill set to all IT service providers. There are many frameworks and standards that define practices and methods for achieving success; ITIL 4, Agile, Lean, DevOps, COBIT, ISO, and Site Reliability Engineering (SRE) are only a few. My point is that while each describes processes and controls (what to do), they don’t provide clear, step-by-step methods and techniques for designing, reengineering and improving processes (how to do it). A Certified Process Design Engineer equips managers and staff at all levels to lead the organization to do t

ITIL® 4 vs. 'The Source'

Part of ITIL 4’s value proposition is that it embraces newer ways of working, such as Agile, Lean and DevOps. I was recently asked whether there was a compelling argument for individuals to go to ITIL for information about these approaches, vs. going to ‘the source’. Here’s my answer and I’d love to hear yours. 3) What source? Yes. There is a massive amount of information available about these topics. There are many ‘definitive’ sources of knowledge. For lifelong learners such as myself, these sources are a joy. They can also be overwhelming and at times a challenge to apply. A search for information about Lean, for example, may take you down a manufacturing route which then requires translation. Looking to learn more about Agile? Which method? Scrum, SAFe, extreme programming … you get the point. 2) The source is evolving. As an example, DevOps practitioners often pride themselves in the fact that there is no definitive body of knowledge; rather, there is an evolving col

How ITIL 4 and SRE align with DevOps

In the early days of DevOps, there was a lot of debate about the ongoing relevancy of ITIL and IT service management (ITSM) in a faster-paced agile and DevOps world. Thankfully, that debate is coming to an end. ITSM processes are still essential, but, like all aspects of IT, they too must transform. Recent updates to ITIL (ITIL 4), as well as increased interest in site reliability engineering (SRE), are providing new insights into how to manage services in a digital world. Here's a look at ITIL 4 and SRE and how each underpins the "Three Ways of DevOps," as defined in The Phoenix Project, by Gene Kim, Kevin Behr, and George Spafford. What is ITIL 4? ITIL 4 is the next evolution of the well-known service management framework from Axelos. It introduces a new Service Value System (SVS) that's supported by the guiding principles from the ITIL Practitioner Guidance publication. The framework eases into its alignment with DevOps and agile through a bi-mo

Site Reliability Engineer – Explosion

The Practice: Site Reliability Engineering (SRE) is a discipline that incorporates aspects of software engineering and applies them to operations with the goal of creating ultra-scalable and highly reliable software systems. It is an Explosion! If you have taken any classes including ITIL4, DevOps, Agile, or Lean, you have probably heard how critical Site Reliability Engineering (SRE) is to the Value Streams and Pipelines that deliver products and services to this world. New concepts like understanding “Error Budgets” and the creation of anti-fragile environments are explored. You only need to visit one of the job sites and do a search on “Site Reliability Engineering” to see that there is a huge uplift in demand for Site Reliability Engineers. Try it! The Role: As a Site Reliability Engineer, you'll build solutions to enhance availability, performance, and stability for the resilience of services. You will also work towards a Continuous Delivery Pipeline by automati