On-Call Nightmares With Jay Gordon

Posted on Tuesday, Jan 7, 2020
Jay Gordon is the host of the popular On-Call Nightmares podcast. Matt and Jay discuss some of the stories Jay has heard, as well as how on-call has changed over the years.

Show Notes

“All these conversations at the bar…why is nobody recording them?” - Jay Gordon, the host of the popular On-Call Nightmares podcast, talking about where the idea for the show came from.

One of the biggest myths is that on-call is just an extra part of an SRE’s or sysadmin’s job, and not really a big part of their duties. It gets treated as just a thing you do; it hasn’t always been taken seriously, especially the impact that being on-call has on the individual.

Remember - on-call isn’t just for ops or SRE. Andrew Clay Shafer used to describe himself as a “conscientious developer”, even prior to the ideas of DevOps. Thinking about things this way made him a better developer, and it heavily contributed to the foundation of the DevOps movement.

Software engineers are often resistant to being on-call because of what they think it means, based on the horror stories they hear from their coworkers and friends who work in Ops.

How has on-call changed?

Jay: “Automation has made so much of the difference”

Well-documented automation makes it easier to track down what might be contributing to issues. So does having tooling that watches what is going on throughout the deployment process. We have a greater ability to spin up replacement systems, too.

We are moving away from a model in which one team is on-call for everything inside the business; now it is more about selected domain experts being on call for the thing they know really well. As a developer on call, you know you are only being called about things you know about. Additionally, the more people who go on call, the smaller the actual impact on each of them, so the experience is a lot different. “We’ve reduced the individual blast radius by distributing it” - Jay.

“The beautiful thing about going on-call is you get to go off-call. If you aren’t on-call, I have news for you - you’re always on-call” - Matt. It’s a real relief to know you are not on call, so you don’t have to worry that someone will call you. “Trust me - your ops team knows how to find you, and they will” - Matt.

On-call requirements are different

Not every company or service requires 24-hour on-call support. When you are thinking about where you want to work, consider this. That said, if you do work for an organization that provides a service around the clock, on-call is likely a part of that job, and everyone should consider it part of their service ownership. But ultimately, choose the role that works for you. It’s less about the title or role than it is about the type of company or organization and what they need. As Jay points out, “in the end, we are all just people, and we have basic requirements - like eating, having water, getting enough sleep, and spending time with people we like. On-call should still let you do these things”.

A good question to ask when getting into a role that has an on-call component is “how are incident responders rotated off of an incident?” Responders stop being effective after a couple of hours. Questions like “what’s the size of the rotation?” and “what are the expectations of a responder during an incident?” are much more important than “how often will I get paged?”

How to avoid having an on-call nightmare

Jay: “It always comes down to tech debt. It’s amazing how much tech debt comes down to a lack of documentation. It becomes one of those scary parts that if it falls down, nobody will know what to do”


Jay Gordon


Jay Gordon is a Senior Cloud Ops Advocate with the Microsoft Azure Advocates. He and the rest of the Advocacy team are focused on helping Developers and Ops teams get the most out of their cloud experience with Microsoft Azure. Prior to Microsoft, Jay was part of teams at DigitalOcean, BuzzFeed and MongoDB. Jay lives in New York City with his wife and has a goofy pug named Rico.


Matt Stratton


Matt Stratton is a DevOps Advocate at PagerDuty, where he helps dev and ops teams advance the practice of their craft and become more operationally mature. He collaborates with PagerDuty customers and industry thought leaders in the broader DevOps community, and back in the day, his license plate actually said “DevOps”.

Matt has over 20 years of experience in IT operations, at organizations ranging from large financial institutions such as JPMorganChase to internet firms, including Apartments.com. He is a sought-after speaker internationally, presenting at Agile, DevOps, and ITSM-focused events, including ChefConf, DevOpsDays, Interop, PINK, and others worldwide. Matty is the founder and co-host of the popular Arrested DevOps podcast, as well as a global organizer of the DevOpsDays set of conferences.

He lives in Chicago and has three awesome kids, whom he loves just a little bit more than he loves Doctor Who. He is currently on a mission to discover the best phở in the world.