
How to Hire Site Reliability Engineers (SREs): 5 Top Qualities

Guest Host Post by Jayne Groll, previously posted on The Enterprisers Project, May 13, 2021

The Site Reliability Engineer (SRE) role continues to gain momentum in enterprise IT. Hiring managers, consider this advice on how to spot a strong candidate.

Site Reliability Engineering (SRE) continues to gain momentum among IT organizations. According to the Upskilling 2021: Enterprise DevOps Skills Report, 47 percent of survey respondents (up from 28 percent in 2020) say SRE is a must-have process and framework skill. As the demand for strong SRE skills rises, so does SRE hiring.

However, a challenge for business and hiring managers is determining which skills, traits, and competencies make a strong site reliability engineer. I asked several DevOps Institute Ambassadors and SRE subject matter experts to weigh in on what makes a great SRE. Here’s what they had to say:

1. "Great SREs have a passion for high-quality automation. They have a lot of ideas about automation of toilsome production tasks that can improve reliability and save a lot of time for operations. They are good communicators and like to spend time with developers to understand how new products and services can be deployed and operated in high-scale, high-reliability environments." - Marc Hornbeek, CEO and principal consultant at Engineering DevOps Consulting and author of Engineering DevOps

2. "A great SRE ensures SLOs (Service Level Objectives) are set at correct boundaries of service; they define alerts to detect SLI (Service Level Indicator) thresholds. They enable developers on CI/CD automation, quality thresholds, and deployment automation using infrastructure as code. They enable developers to understand how their applications are performing in production by building observability. They thoroughly understand deployment and fail-safe strategies. They influence the building of fault-tolerant, autoscaling, cost-efficient, high-performing design and architecture.

"An SRE should ensure the consumption of platform standards and consistency of tooling. SREs handle on-call events and do post-mortems. They ensure error budgets are followed, they ensure self-regulation of velocity and stability, and they ensure excess Ops work overflows to the Dev team." - Shivagami Gugan, CTO at CX Tech Unicorn

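For readers less familiar with the terms in the quote above, the relationship between an SLO, an SLI, and an error budget can be sketched in a few lines of Python. This is only an illustrative sketch, not a method described by any of the contributors: the `ErrorBudget` class, its field names, the request-based SLI, and the numbers in the usage lines are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Hypothetical helper: tracks an error budget for a request-based SLI."""

    slo_target: float     # assumed SLO, e.g. 0.999 means 99.9% of requests succeed
    total_requests: int   # requests observed in the SLO window
    failed_requests: int  # requests that violated the SLI in that window

    @property
    def allowed_failures(self) -> float:
        # The error budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # 1.0 means the budget is untouched; 0.0 or less means it is spent.
        allowed = self.allowed_failures
        if allowed == 0:
            return 0.0 if self.failed_requests else 1.0
        return 1.0 - self.failed_requests / allowed

    def can_release(self) -> bool:
        # Self-regulation of velocity and stability: ship only while budget remains.
        return self.remaining_fraction > 0.0


# Illustrative numbers only: a 99.9% SLO over one million requests.
healthy = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
overspent = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=1_500)

print(round(healthy.remaining_fraction, 2), healthy.can_release())
print(round(overspent.remaining_fraction, 2), overspent.can_release())
```

In this sketch, a team that has spent its budget would pause feature releases and focus on reliability work until the budget recovers, which is one common way to read "self-regulation of velocity and stability."
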
3. Prize communication. "A great SRE must have a mix of developer and operations skills. Ideally, this person is not just an ops person and not just a development person. The person must transition between ops and dev very smoothly. A great SRE knows how to communicate well, either writing documentation or talking with their colleagues (especially when working remotely)." - Andre Almar, Co-founder and technical trainer at DevOps Bootcamp

4. Look for longer-term support experience. "When Google pioneered the SRE approach, they were adamant that all SREs be skilled developers. So, spotting a good SRE is very similar to how one would identify/screen for a good developer. In our company, we use HackerRank to test the proficiency of the devs we hire. Culturally though, the best SREs are developers who have spent time actually maintaining the products that they have built. Many organizations and service providers still adopt short-term project-oriented team structures, so developers end up being shuffled from one product to another instead of sticking with the same product and learning how to support/improve/stabilize it over time." - Lisa Chan, Head of software engineering & DevOps at PETRONAS

5. Look for a person who demonstrates empathy. "Typically, the greatest concentration is on the technical skills, and yes, these are important and to be considered when looking at the toolset to be employed. However, knowledge in the use of tools is something that can be easily trained. Furthermore, any enterprise implementing good SRE is also considering that tools can be easily swapped out, so the need to know and have experience in specific technologies is really not as fundamental as other areas that can’t be trained.

"To spot a great SRE, it is key to find someone who has empathy. The greatest barrier to the implementation of any way of working is culture, and for Agile, DevOps and SRE, it is about an open culture. The greatest enemy to having a flowing and open culture is a closed mind. A candidate who considers their own role as primary and all others as secondary is possibly not the best fit. Therefore, and this is good advice for candidates as well, have a holistic perspective on the role you are in and a balanced perspective on how you fit and impact the other roles around you. Beyond holistics, it is also about having respect for what others do and the challenges they may face. In all, empathy!" - Stephen Walters, Solution architect at xMatters, Inc.


