The Evolution of the Site Reliability Engineer

Episode 488 · July 23rd, 2019 · 29 mins 2 secs

About this Episode

In this episode of The New Stack Makers podcast, we talk to two DigitalOcean alumni and co-chairs of SREcon 2020 Americas whose very different journeys led them to one of the most sought-after roles in tech: site reliability engineer. As the name suggests, an SRE is someone focused on the reliability of an organization's most important systems.

The term site reliability engineer dates back to 2003, when it was coined at Google, but the role has certainly existed for decades longer in different forms (disaster recovery and production testers, for example), as engineers have always tried to keep essential services like healthcare and finance online. The growing demand for SREs came as we went cloud-native and needed these engineers to work in production and on operations, with a heavy focus on automation and observability.

As systems have become increasingly distributed, the role has evolved from simply shoring up uptime for a monolith to that of a relationship broker with a view into organization-wide systems, a knack for problem-solving, and a love of metrics.