Posts

Showing posts with the label SRE

Why Am I Excited to Teach Site Reliability Engineering (SRE) Foundation?

I really like teaching the Site Reliability Engineering (SRE) Foundation course. I find it really effective to link SRE Foundation to learners’ needs to incorporate SRE core concepts into ITSM and DevOps (and any other framework!). This course allows me to explain how SRE improves operational excellence and quality, a key performance measure for ITSM. It also allows me to explain how SRE improves automation, not only within the DevOps pipeline, but also in how Ops uses this data to improve the flow of work into operations and then automates repetitive tasks by utilizing tools (e.g., ChatOps). Most importantly, SRE improves collaboration with customers by defining Service Level Objectives (SLOs) so that IT consistently achieves (and exceeds) customers’ expectations AND delivers VALUE for the organization. Automated monitoring is NOT enough these days; we must include observability, use automation to manage security, and ultimately deliver improved IT service quality to the business.

How to Hire Site Reliability Engineers (SREs): 5 Top Qualities

Guest Host Post by Jayne Groll, previously posted on The Enterprisers Project, May 13, 2021. The Site Reliability Engineer (SRE) role continues to gain momentum in enterprise IT. Hiring managers, consider this advice on how to spot a strong candidate. Site Reliability Engineering (SRE) continues to gain momentum among IT organizations. According to the Upskilling 2021: Enterprise DevOps Skills Report, 47 percent of survey respondents (up from 28 percent in 2020) say SRE is a must-have process and framework skill. As the demand for strong SRE skills rises, so does SRE hiring. However, a challenge for business and hiring managers is determining which skills, traits, and competencies make a strong site reliability engineer. I asked several DevOps Institute Ambassadors and SRE subject matter experts to weigh in on what makes a great SRE. Here’s what they had to say: 1. "Great SREs have a passion for high-quality automation. They have a lot of ideas about automation of toilsome prod

Then and Now – Site Reliability & DevOps

In the past, the idea was to build in the non-functional requirements of a service to the best of our ability based on experience or best guess. Sometimes the general thought was, “We will worry about any residual work for availability, capacity, and reliability after the product or service is deployed.” This focus ensured that the product was fit for purpose, but did not ensure that the product was fit for use, that it was reliable. This approach is very costly to the operations of the service, and the negative consumer impact impedes opportunities for market share. This type of focus also creates silos between Dev and Ops, and Ops become firefighters. The costs for operations are not sustainable! In addition to loss of revenues, staff morale begins to slip. So, reliability is really the key to success. Think about your cell phone. A heavy focus on functional requirements would mean that you can make phone calls. You can text, you can take photos, you can use your maps and a variety of

Effective and Efficient Incident Response – Rethinking the way YOU work!

Learn more about new ways to do work! Explore DevOps, ITIL, SRE, XLAs and more! Silos are not uncommon, but when you silo the service desk from second and third-tier support staff, you likely have a recipe for pain. An ineffective incident response system within the organization is painful and disrupts the entire organization, especially the customers. We must shift the way we think and work to stabilize and improve the situation. One organization felt that they had a grip on service desk and incident management, but they blamed the subject matter experts for breaches of Service Level Agreements. The blame game is always detrimental. Their process consisted of the service desk agents receiving the incident, performing the initial triage, and then forwarding it to the subject matter expert based on how they categorized the incident. Sound familiar? Sometimes we pass tickets to and fro, get everybody and their brother involved, wait on email responses, and create chaos that frustrat

ITIL® 4 and Site Reliability Engineering

Originally posted on owlpoint.com, August 11, 2020, and written by Mark Blanke, CEO of Owlpoint, and Chairman of The CIO Initiative. One of the aspects of ITIL 4 that has impressed me the most is the integration and reference to so many other best practices and frameworks. One such reference is to Site Reliability Engineering, aka SRE. SRE was originally developed by Google in the mid-2000s as a way of operating and administering production systems with a software development mindset. One of Google’s key drivers in building out SRE was to help bring developers and operations people together. Sounds like DevOps, right? In reality, they come from the same mindset, but there are key differences. Google only recently started sharing the SRE concepts. It was their secret sauce and a way to be far more effective in operating their systems and maintaining a highly reliable environment. However, over time, they realized that it would be better for them to share their methods, so the

The EVOLUTION of the ENGINEER – Site Reliability Engineers

ALL CALL SREs REQUIRED! Let’s take a walk down to the ocean and while you consider the opportunity, benefits, and $$$, think about dipping your toe in. Let’s explore Reliability, Site Reliability, and the Site Reliability Engineer. No doubt the world is evolving. People are evolving and tech is evolving. Business and customer requirements are evolving. The evolution of systems requires the evolution of engineers. Nature and pandemics put undue stress on our resources! In comes the Certified Site Reliability Engineer. "Urgent, Urgent, Urgent… All hands on deck!" is a call that practitioners, managers, and organizations do not want to hear and recognize must stop! Reliability – At a minimum, we recognize that the delivery of service is not dependent solely on the quality of the product itself and the goal is not that the products or service merely be deployed. A service must be operated and sustained over a period. How long? For the life of the service.

SRE Is the Most Innovative Approach to ITSM Since ITIL®

Originally published on DevOps.com, written by Jayne Groll, CEO of DevOps Institute. For over a decade, ITIL has been the leading ITSM framework adopted by enterprises across the globe. So, what is driving a rapidly increasing interest in Site Reliability Engineering (SRE) as a service management alternative? In its own words, Google refers to SRE as its approach to service management: “The SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response and capacity planning.” In traditional ITSM terms, the role of the SRE is responsible for service level, change, availability, event, incident, problem, capacity, performance, infrastructure and platform management. While the operational practice areas may be similar, there are significant differences in how the practices are approached. ITIL4 Framework Compared to SRE Released in 2019, the newest update to ITIL4 remains a complex governance model with four dimensi

Up YOUR Game – Become a Certified Process Design Engineer!

I find that there are many people who do not understand WHAT a Certified Process Design Engineer (CPDE) really is (be sure to scroll down on the page and then download the free whitepaper for surprising details). The CPDE role is likely much broader and deeper than you might think! Time and Money?! Yes, but not at the expense of quality and stability! The role of a Certified Process Design Engineer is a critical skill set for all IT service providers. There are many frameworks and standards that define practices and methods for achieving success; ITIL 4, Agile, Lean, DevOps, COBIT, ISO, and Site Reliability Engineering (SRE) are only a few. My point is that while each describes processes and controls (what to do), they don’t provide clear, step-by-step methods and techniques for designing, reengineering and improving processes (how to do it). A Certified Process Design Engineer equips managers and staff at all levels to lead the organization to do t

How ITIL 4 and SRE align with DevOps

In the early days of DevOps, there was a lot of debate about the ongoing relevancy of ITIL and IT service management (ITSM) in a faster-paced agile and DevOps world. Thankfully, that debate is coming to an end. ITSM processes are still essential, but, like all aspects of IT, they too must transform. Recent updates to ITIL (ITIL 4), as well as increased interest in site reliability engineering (SRE), are providing new insights into how to manage services in a digital world. Here's a look at ITIL 4 and SRE and how each underpins the "Three Ways of DevOps," as defined in The Phoenix Project, by Gene Kim, Kevin Behr, and George Spafford. What is ITIL 4? ITIL 4 is the next evolution of the well-known service management framework from Axelos. It introduces a new Service Value System (SVS) that's supported by the guiding principles from the ITIL Practitioner Guidance publication. The framework eases into its alignment with DevOps and agile through a bi-mo

Site Reliability Engineer – Explosion

The Practice Site Reliability Engineering (SRE) is a discipline that incorporates aspects of software engineering and applies that to operations with the goal of creating ultra-scalable and highly reliable software systems. It is an Explosion! If you have taken any classes including ITIL4, DevOps, Agile, or Lean, you have probably heard how critical Site Reliability Engineering (SRE) is to the Value Streams and Pipelines that deliver products and services to this world. New concepts like understanding “Error Budgets” and the creation of anti-fragile environments are explored. You only need to visit one of the job sites and do a search on “Site Reliability Engineering” to see that there is a huge uplift in demand for Site Reliability Engineers. Try it! The Role As a Site Reliability Engineer, you'll build solutions to enhance availability, performance, and stability for the resilience of services. You will also work towards a Continuous Delivery Pipeline by automati

Site Reliability Engineering

Site Reliability Engineering (SRE) is a discipline that incorporates aspects of software engineering and applies that to operations with the goal of creating ultra-scalable and highly reliable software systems. Google’s mastermind behind SRE, Ben Treynor, describes site reliability as “what happens when a software engineer is tasked with what used to be called operations.” Historically, Dev teams want to release new features in a continuous manner (Change). Ops teams want to make sure that those features don’t break their stuff (Reliability). Of course, the business wants both, but these groups have been incentivized very differently, leading to what Lee Thompson (formerly of E*TRADE) coined the “wall of confusion.” This inherent conflict creates a downward spiral that leads to slower feature time to market, longer deployment cycles, increasing numbers of outages, and an ever-increasing amount of technical debt. The discipline of SRE can begin to reduce this dilemma by