How to Hire Site Reliability Engineers (SREs): 5 Top Qualities

Guest host post by Jayne Groll, previously posted on The Enterprisers Project, May 13, 2021

The Site Reliability Engineer (SRE) role continues to gain momentum in enterprise IT. Hiring managers, consider this advice on how to spot a strong candidate.


Site Reliability Engineering (SRE) continues to gain momentum among IT organizations. According to the Upskilling 2021: Enterprise DevOps Skills Report, 47 percent of survey respondents (up from 28 percent in 2020) say SRE is a must-have process and framework skill. As the demand for strong SRE skills rises, so does SRE hiring.


However, a challenge for business and hiring managers is determining which skills, traits, and competencies make a strong site reliability engineer. I asked several DevOps Institute Ambassadors and SRE subject matter experts to weigh in on what makes a great SRE. Here’s what they had to say:

1. "Great SREs have a passion for high-quality automation. They have a lot of ideas about automation of toilsome production tasks that can improve reliability and save a lot of time for operations. They are good communicators and like to spend time with developers to understand how new products and services can be deployed and operated in high-scale, high-reliability environments." - Marc Hornbeek, CEO and principal consultant at Engineering DevOps Consulting and author of Engineering DevOps

2. "A great SRE ensures SLOs (Service Level Objectives) are set at correct boundaries of service; they define alerts to detect SLI (Service Level Indicator) thresholds. They enable developers on CI/CD automation, quality thresholds, and deployment automation using infrastructure as code. They enable developers to understand how their applications are performing in production by building observability. They thoroughly understand deployment and fail-safe strategies. They influence the building of fault-tolerant, autoscaling, cost-efficient, high-performing design and architecture.

"An SRE should ensure the consumption of platform standards and consistency of tooling. SREs handle on-call events and do post-mortems. They ensure error budgets are followed, they ensure self-regulation of velocity and stability, and they ensure excess Ops work overflows to the Dev team." - Shivagami Gugan, CTO at CX Tech Unicorn

3. Prize communication. "A great SRE must have a mix of developer and operations skills. Ideally, the person is not just an ops person and not just a development person. The person must transition between ops and dev very smoothly. A great SRE knows how to communicate well, either writing documentation or talking with their colleagues (especially when working remotely)." - Andre Almar, Co-founder and technical trainer at DevOps Bootcamp

4. Look for longer-term support experience. "When Google pioneered the SRE approach, they were adamant that all SREs be skilled developers. So, spotting a good SRE is very similar to how one would identify/screen for a good developer. In our company, we use HackerRank to test the proficiency of the devs we hire. Culturally though, the best SREs are developers who have spent time actually maintaining the products that they have built. Many organizations and service providers still adopt short-term project-oriented team structures, so developers end up being shuffled from one product to another instead of sticking with the same product and learning how to support/improve/stabilize it over time." - Lisa Chan, Head of software engineering & DevOps at PETRONAS

5. Look for a person that demonstrates empathy. "Typically, the greatest concentration is on the technical skills, and yes, these are important and to be considered when looking at the toolset to be employed. However, knowledge in the use of tools is something that can be easily trained. Furthermore, any enterprise implementing good SRE is also considering that tools can be easily swapped out, so the need to know and have experience in specific technologies is really not as fundamental as other areas that can’t be trained.

"To spot a great SRE, it is key to find someone who has empathy. The greatest barrier to the implementation of any way of working is culture, and for Agile, DevOps and SRE, it is about an open culture. The greatest enemy to having a flowing and open culture is a closed mind. A candidate who considers their own role as primary and all others as secondary is possibly not the best fit. Therefore, and this is good advice for candidates too, have a holistic perspective on the role you are in and a balanced perspective on how you fit with and impact the other roles around you. Beyond holistics, it is also about having respect for what others do and the challenges they may face. In all, empathy!" - Stephen Walters, Solution architect at xMatters, Inc.
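For hiring managers who want a concrete handle on the "error budgets" mentioned in the second quote, here is a minimal sketch of the arithmetic. It assumes a request-based SLO (for example, 99.9 percent of requests succeed over a reporting window). The function names, the sample numbers, and the "pause releases when the budget is spent" rule are illustrative assumptions, not something the quoted experts prescribe.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the unspent fraction of the error budget for one reporting window.

    slo_target is the assumed success-rate objective, e.g. 0.999 for 99.9 percent.
    A negative result means the window has more failures than the SLO allows.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # The error budget is the number of failures the SLO tolerates.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


def should_pause_releases(budget_remaining: float) -> bool:
    """Hypothetical policy: stop shipping features once the budget is used up."""
    return budget_remaining <= 0.0


if __name__ == "__main__":
    # Made-up numbers: 1,000,000 requests, 250 failures, 99.9 percent SLO.
    remaining = error_budget_remaining(0.999, 1_000_000, 250)
    print(f"Error budget remaining: {remaining:.0%}")
    print(f"Pause releases: {should_pause_releases(remaining)}")
```

A candidate who can walk through a calculation like this, and explain what their team did when the budget ran out, is showing the balance of velocity and stability that the quote describes.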
