Service Mesh With Jason Morgan

Posted on Tuesday, Sep 20, 2022
The Linkerd service mesh for Kubernetes helps your applications reach the resources they need and secures the network connections between them. We talked with Jason Morgan of Buoyant about service meshes in general, and Linkerd in particular, to learn more.

Transcript

Mandi Walls: Welcome to Page it to the Limit, a podcast where we explore what it takes to run software in production successfully. We cover leading practices used in the software industry to improve both system reliability and the lives of the people supporting those systems. I’m your host, Mandi Walls. Find me at L-N-X-C-H-K on Twitter. All right. Welcome back. This week I have with me Jason Morgan. Jason, welcome to the show.

Jason Morgan: Hey, thanks for having me.

Mandi Walls: Tell us about yourself. What do you do? Where do you work? What are you interested in?

Jason Morgan: I am a technical evangelist, which is a title that I thought I made up, but now I see it all over the place. So not sure I made it up, but I’m technical evangelist for a company called Buoyant. And it’s my job to talk to folks about the Linkerd open source project, which is a service mesh built for Kubernetes, and try and convince them to use it.

Mandi Walls: Awesome. So let’s start out with just the basics. What’s a service mesh?

Jason Morgan: Yeah. Great, great question. A service mesh is a construct that came about, I’m not sure exactly when, but somewhere in the 2013 to 2015 time range. Effectively what happened is, as folks started building, whether you want to call them microservices or not, distributed applications, the initial approach was to take tooling for service discovery, tooling for setting up HTTPS or encryption between your connections, load balancing, gathering metrics, and shove them into a library that developers would import into their applications, that would do the kind of how-to-be-a-good-citizen-on-the-network stuff. That first pattern was okay until the libraries had to change. As you get 10, 20, 30 teams using the shared library, especially if you have multiple languages, it gets more complex, and it was really hard to change this, what is at its core infrastructure code, inside all these different apps at the same time. So this became a problem as people started to do microservices at scale, and folks came up with some different ways to solve it. The one that our company came up with was this idea of, hey, what if I took that library that has these network functions, stripped it from the application, ran it in its own separate process, and then routed all the application’s communication through this second process? If you do that, suddenly you take that how-to-be-a-good-citizen-on-the-network functionality out of your app. You still get the benefit of it in your app, but you don’t get the drag on development or developer velocity that you see when it’s inside your application. Does that make sense?

Mandi Walls: So is it acting as sort of a proxy for all of its other requests so that I don’t have to keep updating when microservices move around as well as everything else that’s going on?

Jason Morgan: In fact, that’s a way easier way to describe it. So really what it is, it’s taking all the network logic and shoving it into a proxy that you run alongside your application. You force all your traffic to go through the proxy. And now suddenly your app can just do its job, looking up users or checking the database or whatever it’s trying to do, and it can just assume that it’s going to get encryption. It can assume that it’s going to get standard metrics surfaced. It can assume that good load balancing choices are going to be made. The proxy will just do all that for it.

Mandi Walls: Oh, okay. So then when I have changes to update in the network, it just goes out to those little proxies and that takes care of letting everybody else know what’s going on?

Jason Morgan: Yeah, exactly. So yes, and even better, it shifts an infrastructure concern to your platform team and away from your app developers, which is the real innovation. If you can just have developers not do it, then they can focus on business logic, and the platform team, the people who really care about the TLS, the observability and metrics, the load balancing, all that stuff, they can handle configuring that side.

Mandi Walls: Okay. So, as a recovering sysadmin, I have definitely been in a place where I have received code to be installed in production that has non-production resources hard coded into its configuration, and had to unwind things, change the configs for prod, wind them back up and then deploy them. So this sounds much better.

Jason Morgan: The idea is just let app devs focus on the app. Let the platform team focus on the platform. And the service mesh has been a great tool for it. In general, we see, as folks are maturing in their Kubernetes journey, more and more people are running into the, “Holy cow, I need a service mesh,” situation.

Mandi Walls: Is it useful outside of Kubernetes, for folks who are maybe using something else?

Jason Morgan: So, some service mesh offerings absolutely work beyond Kubernetes. The Linkerd service mesh, the one that I promote and work on, does not. It is solely focused on the Kubernetes environment.

Mandi Walls: Okay. So tell us a bit about that one. Is it an open source project? How’d you guys get involved with it? What’s its life like?

Jason Morgan: Yeah. So the original founders of Buoyant are some folks that were working at Twitter and were using something… Either it was Finagle, or it was some earlier distribution that was very similar to the Finagle tool. They built the service mesh being like, “Hey, we could take something like Finagle, put it into a little proxy and ride with it.” So there was an initial iteration called Linkerd 1, which was basically this library packed into a proxy. It was nice, but it was built on the JVM, so it wasn’t that performant. It was also targeted originally at… I forget even the name of the orchestrator. Mesosphere. So it was originally targeted at that. Then Linkerd 2, the version they rewrote, is entirely Kubernetes focused. It doesn’t have any Java; it’s all Go and Rust. And it’s made to be really easy to use in Kubernetes. Our philosophy for our service mesh is: if you’re running an app in Kubernetes, you should be able to install Linkerd, add your app to the mesh, and it still works. In fact, it just works better, because now you have mutual TLS, now you have standard metrics, now you have request-level load balancing, things you didn’t have before, but without any change to your app. By writing Linkerd entirely for Kubernetes, we’re able to keep it really simple.

Mandi Walls: Interesting. Okay. And then it doesn’t matter what language your actual application is written in because it’s all making requests over the network anyway? Doesn’t have to care?

Jason Morgan: Yeah, exactly. So here’s how the proxy really works. Kubernetes is this big container orchestrator tool. You give it app definitions and it goes and schedules them across however many nodes you have in your cluster. And it has this construct called the sidecar, which is literally, you can take your application container and you can put a motorcycle sidecar beside every instance of your app, and all the traffic just gets routed through the sidecar. If you want to leave the application or enter the application over the network, you have to go through the sidecar. That allows us to take over all your network traffic in an entirely language agnostic way.
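
To make the sidecar idea concrete, here’s a rough sketch of what a meshed pod ends up looking like. In practice Linkerd’s injector adds the proxy container for you, so you never write this by hand; the names, image tags, and ports below are illustrative rather than the real injected output.

```yaml
# Sketch of the sidecar pattern: one pod, two containers. Linkerd's
# injector generates the real thing; everything here is illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: users-api              # hypothetical app
spec:
  containers:
    - name: app                # your application, unchanged
      image: example.com/users-api:v1
      ports:
        - containerPort: 8080
    - name: linkerd-proxy      # the sidecar; all traffic in and out
      image: cr.l5d.io/linkerd/proxy:stable-2.12.0  # illustrative tag
      ports:                   # traffic is transparently redirected here
        - containerPort: 4143  # default inbound proxy port
```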

Mandi Walls: Interesting. So does that then also help folks who are moving into Kubernetes? Does it reduce their lift and shift, because they don’t have to actually… Do you actually have to know that you’re in Kubernetes at that point? If you’re just going to hook up a little sidecar that everything’s coming through, does it actually look significantly different from when you had it in another container somewhere, or a VM or whatever?

Jason Morgan: Yeah. The hard thing about moving to Kubernetes, I guess there are a couple of things that can be hard about moving to Kubernetes. One is, is your app comfortable being started and stopped by the orchestrator in an entirely arbitrary fashion? You have to answer that. And then, do you have the Kubernetes constructs you require? Kubernetes simplifies things like service discovery, but it does it by saying, “Hey, you have to make a little object that represents your service, and we’ll go collect up all your containers that are part of that service and attach them to it so that it’s easily found.” But you’re still going to need to make the underlying Kubernetes constructs to get going. What the mesh does do for Kubernetes adopters is, as they move their apps to Kubernetes, it gives them a level of visibility that isn’t there in native Kubernetes. As you move your app in, you’ll be able to see, hey, for every transaction, what’s the success rate for a given application or a namespace or an individual instance of my application? What’s the latency? What’s the request volume that it’s seeing? Things like that.
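
The “little object that represents your service” is a standard Kubernetes Service; a minimal sketch, with hypothetical names, might look like this:

```yaml
# A minimal Kubernetes Service: the object that represents your app
# for service discovery. Names, labels, and ports are hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: users-api
  namespace: shop          # hypothetical namespace
spec:
  selector:
    app: users-api         # "collect up all your containers" by label
  ports:
    - port: 80             # the port clients call
      targetPort: 8080     # the port the container listens on
```

Once the app is meshed, the per-workload success rate, latency, and request volume Jason describes can be pulled up with the viz extension, for example `linkerd viz stat deployments -n shop` (command shape from the 2.x CLI; the namespace is hypothetical).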

Mandi Walls: So then are you hooking up say your metrics and monitoring collection directly into that component instead of the native application?

Jason Morgan: Yeah, absolutely. I remember I used to work at a satellite imagery company and I had to beg 30 different app development teams to build a /metrics endpoint.

Mandi Walls: Yes.

Jason Morgan: I’m like, please, everybody build an endpoint. I know y’all have metrics endpoints, just everybody do it at /metrics and try and surface some similar info; this is what we’re looking for, vaguely. And the process was ultimately totally futile. Who was that guy, Sisyphus, pushing the rock up the hill? I felt a lot like that. Now with a service mesh, you just don’t do it. You don’t tell the app teams to implement anything, because they don’t care. They don’t care whether you have the metrics that you want; they’re focused on hitting their numbers, getting their function or their feature out, whatever it is. And you can shift the infrastructure concern of, “Hey, I really care that we all have similar metrics,” back to the platform team, because I’ll just put a proxy beside your app and I’ll get the metrics from the proxy, which is already configured to surface them for me. That’s really nice. Same thing when you’ve got 30 or a hundred services in your microservice application: seeing how they talk to each other. I’ve had three customers in the last two weeks that have used our tool and been like, “Wow, I can see what’s talking to what in the environment. Is there any good way to export this?” I’m like, “Well, you don’t have to. You can just look at the site or take a screenshot or something.” But just showing what’s talking to what in this sort of environment can be really powerful. And so [inaudible 00:11:00], is just surfaced by a service mesh.
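
Because every Linkerd proxy exposes its own Prometheus metrics (on an admin port, 4191 by default), the platform team can scrape the proxies instead of begging each app team for an endpoint. A hedged sketch of what that scrape job might look like, assuming the default container and port names:

```yaml
# Prometheus scrape job that collects metrics from the Linkerd proxies
# rather than from each application. Assumes the proxy's admin port
# keeps its default name, "linkerd-admin".
scrape_configs:
  - job_name: linkerd-proxy
    kubernetes_sd_configs:
      - role: pod            # discover every pod in the cluster
    relabel_configs:
      # keep only the proxy container's admin/metrics port
      - source_labels:
          - __meta_kubernetes_pod_container_name
          - __meta_kubernetes_pod_container_port_name
        action: keep
        regex: linkerd-proxy;linkerd-admin
```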

Mandi Walls: Awesome. So for folks who are considering this as part of their solution, how do they get started with this?

Jason Morgan: Yeah, if you go to linkerd.io. L-I-N-K-E-R-D dot I-O.

Mandi Walls: We’ll put it in the show notes for folks.

Jason Morgan: We have a getting started guide that’s really handy. I tell people at conferences, “If you can’t get through it in 30 minutes, come up to me. I’ll say sorry, I’ll buy you a drink.” Even on conference wifi it’s a very straightforward process. Just try it. If you’re going to do something Kubernetes, if you have an app running in Kubernetes, try Linkerd in your staging environment, your dev environment, wherever’s safe, and add an application to it. See what results you get. A lot of folks, myself included, used to be really worried that adding the service mesh to Kubernetes would require a lot of work. When I was at VMware, we used to recommend to folks, “Hey, let’s get you going with Kubernetes. Then we’ll talk about service mesh once you’re really comfortable in that space.” That’s because our thinking was around the Istio service mesh, which is very powerful and a great tool, but is also really complicated and has a huge management burden. With Linkerd, it’s easier to run your app in Kubernetes with Linkerd than it was before, which is really the big selling point, in my view. So try our getting started guide. See if you can get your app working. We also have a Slack; it’s really helpful, come hit us up. And then there’s a ton of videos that we have, which go into what is Linkerd, how does it work, how do you do some things. The big design choice we made was: any complexity in Linkerd, you have to opt into.
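
For reference, the getting started flow is only a handful of commands. This is a sketch from memory, so check the guide at linkerd.io for the current steps and flags:

```sh
# Install the CLI and validate the cluster before touching anything.
curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/install | sh
export PATH=$HOME/.linkerd2/bin:$PATH
linkerd check --pre

# Install the control plane (newer releases split the CRDs into a
# separate step) and wait for everything to come up healthy.
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -
linkerd check
```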

Mandi Walls: Oh, okay.

Jason Morgan: So when you start, you can use Linkerd without changing anything about your Kubernetes environment beyond adding a one-line annotation to your deployment saying, “Add the proxy.” That’s all you have to do. When you want more features or functionality from the mesh, you can get them; you then have to opt into new bits of complexity. We just did a talk on how to use policy, how to restrict who can talk to what on the basis of identity, your Kubernetes identity. That’s something where you’re going to have to use custom resource definitions and things will get more complicated. It’s not brutal, but it’s more than the basics. You can [inaudible 00:13:22] mutual TLS, your metrics and your better load balancing just by installing it and trying it out.
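
The “one line annotation” lives on the deployment’s pod template; a sketch with a hypothetical app:

```yaml
# Opting a workload into the mesh: the linkerd.io/inject annotation
# tells the injector to add the proxy sidecar. App names are made up.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: users-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: users-api
  template:
    metadata:
      labels:
        app: users-api
      annotations:
        linkerd.io/inject: enabled   # the one line that opts you in
    spec:
      containers:
        - name: app
          image: example.com/users-api:v1  # hypothetical image
```

The opt-in policy he mentions uses Linkerd’s policy custom resources (introduced around 2.11); the field shapes below are a sketch and may differ between releases:

```yaml
# Restrict who can talk to users-api on the basis of mesh identity.
apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  name: users-api-http
  namespace: shop
spec:
  podSelector:
    matchLabels:
      app: users-api
  port: 8080
  proxyProtocol: HTTP/1
---
apiVersion: policy.linkerd.io/v1beta1
kind: ServerAuthorization
metadata:
  name: users-api-allow-web
  namespace: shop
spec:
  server:
    name: users-api-http
  client:
    meshTLS:
      serviceAccounts:
        - name: web-frontend   # only this identity may connect
```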

Mandi Walls: Do you recommend now, that instead of starting with plain vanilla Kubernetes and then adding the service mesh on, that you just start with the service mesh as part of your initial Kubernetes rollout?

Jason Morgan: I think it depends on what you’re doing. If you’re doing your own custom applications, yes. If I’m hosting an infrastructure cluster, when I build a cluster to host the registry and the CI tool, some things like that, no, I won’t put on a service mesh, because I don’t care. I’m going to start those apps with a Helm chart and they’ll either work or they won’t work; I’m not going to troubleshoot them. If there’s a problem, I’m going to turn it off and turn it back on, kind of thing. Well, I’ll delete it and recreate it. If I’m doing a custom web app? Yeah, absolutely. You will just get a tremendous amount of value for very little cost.

Mandi Walls: Are there security concerns or things that folks should be aware of?

Jason Morgan: Yeah. Well, that’s kind of the thing. There are security concerns in running in Kubernetes. By default, you’re going to send stuff plain text around your cluster, which, I get it, I get it, you just want it to work. But that’s the default: the default says that everything trusts everything inside a Kubernetes cluster, so let’s just go, send packets around. When you add a service mesh, you get mutual TLS. TLS is: I’m a client, I go find a server, I know that server is who it says it is because of its certificate, and I can encrypt my session with it. Mutual TLS is that, except the server now also checks the client. So now both sides know who they’re talking to. When you install Linkerd and you add your app to it, every communication runs over a mutual TLS connection. I’d say, because I don’t want to pretend that everything’s great about Linkerd, it’s pretty great, although I’m paid to say it’s pretty great, there are costs to running any service mesh. We talk about this a lot. For every mesh, and you should evaluate this for yourself as you get going, there are, I call them the three taxes of a service mesh. There is the CPU and memory tax: the way a service mesh works is you put a proxy beside your application instance, your individual container, and that proxy is going to cost you CPU and memory. There is a latency tax: the traffic, instead of going from app A to app B, is now going to go app A, proxy A, proxy B, app B. So I get two additional network hops, and that is going to cost me latency. And then last but not least, operating a service mesh is going to involve some sort of operational overhead for you as a platform team. I generally address platform teams, sorry, devs out there, I think you do great work on great apps and all that stuff, but the main folks that are using Linkerd are platform teams. You need to understand what the cost to you of running this thing is. Linkerd tries to be the lowest cost service mesh. That’s our basic goal. We wrote our proxy in Rust. We didn’t use the Envoy proxy, which is what a lot of folks do, which is this really big, powerful load balancer that you can do tons of great stuff with and is great-
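
One way to sanity-check that meshed traffic really is mutually authenticated is the viz extension’s edges view, which shows the identity on each end of every connection. Command shapes are from the 2.x CLI, with a hypothetical namespace:

```sh
# Install the viz extension, then inspect who is talking to whom
# and which identities back each connection.
linkerd viz install | kubectl apply -f -
linkerd viz edges deployment -n shop
```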

Mandi Walls: But it’s heavy.

Jason Morgan: We wrote ours in Rust and it’s really lightweight, and it’s purpose built for being in Linkerd. It seems to be the most performant. When we test it, we’re using an open benchmarking suite, and we always see Linkerd being the lowest latency proxy. And operationally, you can’t beat us in terms of simplicity. I’m dealing with a bank later today; I’m going to help them finish installing Linkerd all the way through to production. We started 11 days ago. They had an app in four environments and they’re going to have done all four environments in 11 days, through to production. This isn’t [inaudible 00:17:14] book or some super tech company. These are folks that paid another company to develop their app, that are new to Kubernetes, and they’re just working, and it’s just straightforward. People measure Linkerd adoption in days and weeks, not months.

Mandi Walls: It sounds amazing for everybody out there who’s running or looking at running Kubernetes. One of the recurring questions we have on the show is whether you want to debunk a myth. Do you have any favorite or recurring myths that pop up about service mesh or Kubernetes in general that you want to debunk for folks?

Jason Morgan: I don’t know about a myth, but before I started working at Buoyant, and again, I’m paid to say this, I believed that you needed a team to run your service mesh, one to three folks to run it, and that you needed to heavily integrate your service mesh with your pipeline and involve some amount of awareness from the developer side. I don’t see that being true, and I haven’t seen that being true for the last year and a half that I’ve been here. It’s not a requirement to get going. Yeah, that’s really the big one. I don’t know about myths in the service mesh space. You see lots of marketing stuff, and service mesh itself is a term I thought was kind of a weird one, although I now kind of get it. It’s a big web of services talking to each other.

Mandi Walls: That was my initial impression of it too, because coming off of some of the enterprise customers I used to work with at a prior job, they ran stuff explicitly on a service bus. So, okay, I understand: the bus is kind of more of a straight-line thing, and the next evolution of that, sure, looks like a mesh. So in my mind it kind of fit as the evolution of these cranky old service bus architectures that people had put together 10 years ago or so. But that was just my impression of it.

Jason Morgan: I had so many horror stories from working with enterprise service buses. So.

Mandi Walls: That’s a whole other show. Yeah, absolutely. Lots of crazy business happening in enterprise architectures.

Jason Morgan: I’d say my big one is, it’s not as high cost to try out as you think, including your evaluation time. I see folks doing service mesh evaluations that last longer than the time to adopt Linkerd. Because really, no matter who you are, even if your internal change processes are really slow, you don’t need to change your app, and you don’t need to change your environment in a significant way, to use Linkerd. So you can get going without locking yourself into our service mesh, because you’re not going to be writing custom resource definitions to make it work. It’s just going to work. For those that don’t know, Kubernetes is cool in a couple of ways. One of them is, it’s really good at doing declarative configuration management: you describe what you want in YAML and it makes it happen. The other thing it’s really good at is the Kubernetes API, which is just this big, dumb REST API, and you can extend it to manage whatever objects you want. So software vendors now put their own command and control stuff into Kubernetes through what’s called a custom resource definition. Custom resources are extensions to the Kubernetes API that let you control other software. For me, the way I do this, and y’all feel free to do it your own way, but the way I look at how complex something is, I think, how many custom resources do I have to use to make this thing work? And the thing I’m proud of with Linkerd is that to make it work, you need zero custom resources. Not one to get the core functionality of it. I think that’s really neat, which is why you can adopt it fast.
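
For anyone who hasn’t seen one, a custom resource definition is itself just a Kubernetes object; a generic sketch, with an entirely made-up group and kind:

```yaml
# A generic CustomResourceDefinition: how vendors extend the
# Kubernetes API. Group, kind, and schema here are all invented.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com    # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: integer
```

After applying it, `kubectl get widgets` works like any built-in resource, which is the extensibility Jason is describing.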

Mandi Walls: Awesome. So tell us a little bit then about… We’ll close up with a little bit about Buoyant. So Linkerd is an open source CNCF project. And then what kind of things does Buoyant add to that?

Jason Morgan: To be clear, Linkerd is a CNCF project. We are the only service mesh to hit graduated status with the CNCF, which means we’ve hit the top tier of maturity according to the CNCF. Alongside projects like Istio or Open Service Mesh, we’re the only one at graduated status. Now, a caveat here, I’m doing a little bit of marketing stuff, because Istio only joined earlier this year; they haven’t had time to hit graduated status. But for us it was a big deal, and we hit graduated status around this time last year. Buoyant, the company, makes a tool called Buoyant Cloud. Imagine you did want to do Linkerd and you want it to go fast, and you’re like, “All right, this is great, but I really don’t want to figure out how to monitor it, how to maintain it, how to do any of the core operational functionality in it.” You can install our agent from Buoyant Cloud into your cluster, and then we will handle things like installing and upgrading Linkerd. We’ll handle rotating your certificates. We’ll collect all your metrics for you, so you don’t need to worry about a metrics store. We do other things; we’ll alert on it. You don’t have to learn what the failure conditions are around this tool, because we know all the failure conditions, and they’re actually written as rules in our tool that will alert you if something goes wrong. We’ll do, I’d say, about 80% to 90% of the operations and maintenance of Linkerd for you.

Mandi Walls: Cool.

Jason Morgan: That’s our big value add there, plus we make Linkerd.

Mandi Walls: Awesome. All right. Is there anything else you’d like to share with folks today? We’ve covered a lot, from the basics of service mesh up through everything. Is there anything else you’d like to add or share?

Jason Morgan: No, I guess I’d just ask, if this was interesting, if you liked it, if you have any thoughts or anything you want to say, come join us on Slack, at slack.linkerd.io. I’d love to hear from you and just see where we can fit into what you’re doing.

Mandi Walls: Yeah. Sounds like an awesome tool for folks out there who are looking at Kubernetes. And we’ll add all the links that we talked about into the show notes for folks. So if you are listening in your favorite podcatcher, they’ll be on the About page, and they’ll be on the website for folks listening online. Well, this has been great, Jason. Thank you so much for all the info. I hope this helps folks out there who are considering Kubernetes and service mesh for their projects.

Jason Morgan: And it’s been great from my end. It’s been totally delightful talking to you, and I appreciate you having me on.

Mandi Walls: Awesome. Thank you. So we’ll wrap up there for folks and we’ll wish everyone an uneventful day. That does it for another installment of Page it to the Limit. We’d like to thank our sponsor, PagerDuty for making this podcast possible. Remember to subscribe to this podcast if you like what you’ve heard. You can find our show notes at Page it to the Limit dot com and you can reach us on Twitter at Page it to the Limit using the number two. Thank you so much for joining us and remember, uneventful days are beautiful days.

Show Notes

Guests

Jason Morgan

Jason Morgan is Technical Evangelist for Linkerd at Buoyant, maintainer of the CNCF Cloud Native Glossary, and co-author of the CNCF Landscape guide. Passionate about helping others on their cloud native journey, Jason educates engineers on Linkerd, the original service mesh. You might have encountered his articles in The New Stack, where he breaks complex technology concepts down for a broader audience. Before joining Buoyant, Jason worked at Pivotal and VMware Tanzu.

Hosts

Mandi Walls (she/her)

Mandi Walls is a DevOps Advocate at PagerDuty. For PagerDuty, she helps organizations along their IT Modernization journey. Prior to PagerDuty, she worked at Chef Software and AOL. She is an international speaker on DevOps topics and the author of the whitepaper “Building A DevOps Culture”, published by O’Reilly.