Austin uses the example of a platform team managing a large portfolio of applications that all need to share a common layer, like SSO. It sounds simple, but the complexity only gets shifted around depending on your approach.
“The role of this platform team is to impose some simplicity onto this complex design that has been foisted upon them. We started coming up with some very complicated solutions to how to do different tasks for different systems. Until we stepped back and instead focused on what these systems had in common. Once we reframed into looking at the similarities instead of the differences, we were able to make a lot of progress very quickly.”
Simplicity is being able to reason about what is happening, even if what is happening is a very difficult and involved process. Just because something is easy, that doesn’t mean it’s simple. Just because something is simple, that doesn’t mean it’s not difficult to do. Simplicity can be very hard to implement when you see it that way.
“Breathing sounds very simple, but it’s not very easy if you think about what is happening in your body to make you breathe. It’s an incredibly complex series of physical biological systems: I’m not a biologist or doctor, so I don’t even understand them all.”
The microservices vs. monolith debate keeps popping up. Austin argues that microservices were never about managing technical complexity; they were about managing organizational complexity. The abstractions for microservice platforms manage our organizational problems, but at a high technical cost. It’s how we try to overcome that cost that might be making our lives more difficult.
“On the technical side, that now means you have to be able to have a deeper understanding of your software: you have to be able to introspect the behavior of your application at every single point that it is used … At some point, we’ve covered it with thousands of probes … so much so that no one can even step back to see the application anymore, because we’ve lost sight of what it actually is.”
People are at the heart of these problems because they’re the ones who need to constantly re-evaluate system state. Austin makes the point that people usually fail to do that hard part of system design because it’s seen as a Day 2 problem.
“We make decisions about system normalcy at the tail end of design, using what we’ve known up to that point. Most people aren’t thinking about how they write logging statements to understand application performance at that point.”
We talk about “The Field Guide to Understanding Human Error” for the first time on this show! How did that take so long??? When it comes to understanding systems, it’s better to be clear than to be concise.
“The designers of an airplane cockpit changed the airspeed indicator from a round dial to a linear tape, figuring that it would be easier to quickly read airspeed if your target was in the middle of the tape… What they found was that it actually made it harder to understand and it took cycles away from pilots and added more response time whenever things went wrong.”
Experience is thinking about how humans interpret signals. It’s important to know when to be clever and complex and when it’s better to optimize for reasonability. Austin gives some practical tips for simplifying choices when designing systems.
“While simplicity and easy aren’t necessarily connected, simple and good kind of are.”
Austin wishes he had known how to write good logs sooner. Very rarely do people stop to consider how to make metrics useful to other people. It’s less about how than why.
Austin is also glad we didn’t ask him about running production software on Windows-based systems. The problem wasn’t Windows; it was dealing with heterogeneity.
Austin Parker has been solving - and creating - problems with computers and technology for most of his life. He is the Principal Developer Advocate at LightStep and a maintainer on the OpenTracing and OpenTelemetry projects. His professional dream is to build a world where we’re able to create and run more reliable software. In addition to his professional work, he’s taught college classes, spoken about all things DevOps and Distributed Tracing, and even found time to start a podcast. Austin is also the co-author of Distributed Tracing in Practice, published by O’Reilly Media.
Austin is an international speaker, having presented to audiences in Europe and North America on topics relating to Observability and DevOps. In addition, he has led or assisted with workshops on OSS projects such as OpenTelemetry and OpenTracing at events such as QCon SF 2019, QCon London 2020, and O’Reilly Infrastructure and Ops 2020. Finally, he has extensive experience speaking to diverse audiences in a variety of media formats through his podcast On-Call Me Maybe and his event livestreams such as OPS Live!
George Miranda is a Community Advocate at PagerDuty, where he helps people improve the ways they run software in production. He made a 20+ year career as a Web Operations engineer at a variety of small dotcoms and large enterprises by obsessively focusing on continuous improvement for people and systems. He now works with software vendors that create meaningful tools to solve prevalent IT industry problems.
George tackled distributed systems problems in the Finance and Entertainment industries before working with Buoyant, Chef Software, and PagerDuty. He’s a trained EMT and First Responder who geeks out on emergency response practices. He owns a home in the American Pacific Northwest, roams the world as a Nomad with his wife and dog, and loves writing speaker biographies that no one reads.