By Shaun O'Keefe

Just Enough Reliability

Platform Engineering Ballroom 1 Thursday at 3:10pm - 3:40pm

SRE is all about simplicity. So why do so many of us have a copy of The Google Book sitting on our desk? So much of what we build seeks to emulate companies with millions (if not billions) of customers, while we have neither the scale to justify such effort, nor the runway to still be around when we’re finished. This talk asks: does my medium-sized scale-up need Chaos Engineering, circuit breakers, and a 10-tier severity matrix… or does it need a spreadsheet?

SRE suffers from a rather serious problem. We are a group of people who love working on complicated problems. We want challenges and ways to show what we know about reliability. And when we can’t find them… we create them.

This hurts any business. Work that doesn’t help your customers hurts them. But for small start-ups and scale-ups, it can be fatal. The wrong project can leave you with a lot of burnt runway, fewer customers, and a business that is no longer willing to invest in reliability. Worst of all, none of this is done cynically. Most SREs genuinely want to do good, but need better tools to understand what matters.

SREs need to be practiced in the art of understanding what their business needs now, and what can wait. This talk proposes the humble CUJ (Critical User Journey) as the foundation upon which we can practice that art. It will show engineers how to find their product’s CUJs, how to use them to drive observability and measurement where it matters, how to break the age-old deadlock of prioritisation using language the business understands, and how to tell when reliability work can genuinely afford to wait.

Shaun O'Keefe

I’ve been working in and around automation, devex and reliability ever since we were calling it devops. What began as a way to hide the fact I really, really couldn’t write CSS has turned into a surprisingly successful career that I’m not entirely sure I’ve earned. I’ve worked as a staff IC, a line manager, a technical group lead, and one time even a CTO (although that was at a start-up, where you can call yourself a CTO and nobody will stop you). I’m a humanist, a minimalist, and a secret socialist. I’m also a member at Aomodori, a reliability consultancy I founded where I can hopefully live out all these lovely values, helping start-ups and scale-ups build just enough reliability to keep everyone thriving.

I still can’t write CSS. But now I’ve got Claude Code so that’s okay.