
Posts

Showing posts with the label Service Level Objectives

5 Essentials You Must Be Doing to be an SRE

Site Reliability Engineering (SRE) is more than a job title; it’s a mindset, a philosophy, and a set of practices designed to bridge the gap between development and operations. However, not every team or professional using the SRE title truly embodies what it means to be an SRE. In this post, we’ll explore five key practices that define true SREs. If you’re not doing these, you might want to rethink calling yourself or your team an SRE.

1. Prioritizing Reliability Over Everything Else

SREs live and breathe reliability. If you’re not actively measuring and maintaining your systems' availability, performance, and durability, then you’re missing the core purpose of SRE.

What You Should Be Doing:
- Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
- Use error budgets to balance feature development and system stability.
- Implement incident response processes to minimize downtime.

2. Automating Toil Away

Toil, the repetitive, manual tasks that...
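To make the SLO, SLI, and error-budget practice from the first point concrete, here is a minimal sketch in Python. It assumes a request-based availability SLI (successful requests divided by total requests); the class name, field names, the 99.9% target, and the request counts are all hypothetical and chosen only for illustration, not taken from the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Request-based error budget for one measurement window (illustrative)."""

    slo_target: float      # assumed target, e.g. 0.999 = 99.9% of requests succeed
    total_requests: int    # requests observed in the window
    failed_requests: int   # requests that violated the SLI in the window

    @property
    def sli(self) -> float:
        """Measured availability: fraction of requests that succeeded."""
        if self.total_requests == 0:
            return 1.0  # no traffic, so nothing has failed
        return 1 - self.failed_requests / self.total_requests

    @property
    def allowed_failures(self) -> float:
        """Number of failures the SLO tolerates in this window."""
        return (1 - self.slo_target) * self.total_requests

    @property
    def budget_remaining(self) -> float:
        """Fraction of the error budget still unspent (negative if overspent)."""
        allowed = self.allowed_failures
        if allowed == 0:
            return 0.0 if self.failed_requests else 1.0
        return 1 - self.failed_requests / allowed


def can_ship_features(budget: ErrorBudget) -> bool:
    """Simple policy sketch: keep releasing only while budget remains."""
    return budget.budget_remaining > 0


if __name__ == "__main__":
    window = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
    print(f"SLI: {window.sli:.4%}")
    print(f"Budget remaining: {window.budget_remaining:.1%}")
    print(f"Ship features: {can_ship_features(window)}")
```

The policy in `can_ship_features` is deliberately simple: while budget remains, feature work continues; once it is spent, the team shifts toward stability work. Real policies are usually more graded than a single yes/no check.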

Why Am I Excited to Teach Site Reliability Engineering (SRE) Foundation?

I really like teaching the Site Reliability Engineering (SRE) Foundation course. I find it really effective to link SRE Foundation to the learners’ need to incorporate SRE core concepts into ITSM and DevOps (and any other framework!). This course allows me to explain how SRE improves operational excellence and quality, a key performance measure for ITSM. It also allows me to explain how SRE improves automation, not only within the DevOps pipeline, but also in how Ops uses this data to improve the flow of work into operations and then automates repetitive tasks with tools (e.g., ChatOps). Most importantly, SRE improves collaboration with customers by defining Service Level Objectives (SLOs), so that IT consistently achieves (and exceeds) customers’ expectations AND delivers VALUE for the organization. Automated monitoring is NOT enough these days; we must include observability, use automation to manage security, and ultimately deliver improved IT service quali...