Internal Developer Platforms with Dave Bresci

Transcript

Mandi Walls: Welcome to Page It to the Limit, a podcast where we explore what it takes to run software in production successfully. We cover leading practices used in the software industry to improve system reliability and the lives of the people supporting those systems. I’m your host, Mandi Walls. Find me @lnxchk on Twitter.

Mandi Walls: Awesome. Thanks, everybody. Welcome back to Page It to the Limit. Today I’ve got two guests. One of them you’re already familiar with, so we’ll start with the other one. This is Dave Bresci’s first appearance on the podcast. Dave, tell the folks who you are and what you do here at PagerDuty.

Dave Bresci: Thanks for introducing me, Mandi. I’m Dave Bresci. I manage a couple of the SRE teams here in the infrastructure group. Most relevant to today’s topic, I manage the team that’s responsible for developer enablement here at PagerDuty. I’ve been at PagerDuty for about six years.

Mandi Walls: Awesome. Our other guest, Tiago, is on the other side of the table, and we’re going to be talking about internal developer portals today. So let’s start with the definition. What are these things? What do they do for people?

Tiago Barbosa: Yeah, sure. I can give my personal perspective on that, because if you go online, there are a lot of different perspectives on internal developer portals: some people really love them, others not as much. The idea of the internal developer portal is to simplify the way you interact with and get visibility into your services. It’s about improving the developer experience and how people interact with the services that developers are building. One thing we need to take into consideration is that this is not a tool used exclusively by developers. You might have multiple stakeholders from the product side, or people who are worried about how a service is behaving or who just want more information about the service itself. You don’t want them to have to reach out to every single developer to understand it. It needs to be self-service and simple to use, and that’s one of the main advantages I see in internal developer portals.

Mandi Walls: Dave, is that how we’re using them here at PagerDuty?

Dave Bresci: I might approach the question from a different angle than Tiago, which is what I find valuable about developer portals specifically. One of the reasons we wanted to implement one here at PagerDuty is that I want to reduce the context switching and the number of places developers have to go to get the information they need. To me, that’s the beauty and the allure of the developer portal: how much can we put in one place, so people don’t have to have 12 tabs open to do different things? Part of the challenge there is that you’re trying to bring sometimes disparate things together in one place, so cohesion can be a challenge. But it’s a very exciting time if you look at where we are today versus a few years ago: the products that are out there, the companies working on things like this, and the fact that developer portals are maybe not ubiquitous, but a lot of people now know what they are and why they’re valuable. We’re using Backstage internally, so for us the question is how many things we can put into Backstage so developers can be productive and get accurate (hopefully) source information from this one spot. That’s why I think they’re neat, and that’s the potential of something like this.

Mandi Walls: So these are meant to be more than the traditional configuration management database, which, if you’re of a certain age, you might remember from back in the day. We’re looking more for information about all the kinds of things that might impact your services. Is that where we’re headed with these? They seem more complex than what we might’ve had as a system of record in the past.

Tiago Barbosa: I think the information we’re trying to expose with tools like Backstage and others is slightly different from what we were trying to do in the past. Whether or not you’re building microservices, we are in the era of microservices, and our distributed systems are very complex. There are a lot of dependencies that might impact the availability of our services. And it’s not only availability; it’s also what Dave mentioned earlier, the context switching. Historically we’ve always had a lot of different data sources, and it’s very challenging for developers to know all of them and where the information lives. Looking in different locations to find that information takes a lot of time. Look at the time it takes to onboard new developers, or the time it takes to solve an issue when you need to search for information and you don’t know exactly where it is. In the end, that takes a toll on the team, and I think that’s really important. Documentation is another piece: with tools like Backstage, you can keep documentation as part of your code and easily expose it in the IDP as part of your service. That’s very powerful.

Dave Bresci: I want to add to that, because, Mandi, one thing you mentioned earlier was configuration management, which plays into it. That’s a push, but there’s also a pull, right? Think about a user journey: I can go into Backstage, where I have my developer docs that tell me how I should build a service. I have the scaffolder, which has a template with a list of things I can enter to actually create the service: create the repo in GitHub, or create the Terraform if necessary, all the things for the default service. And eventually, what you could have, which we don’t have yet today, are health checks that tell you how your services score against certain criteria. Are you on the latest version of this? Is it secure? Does it meet whatever definition of a well-architected service you want to have? You can cover the full lifecycle, from documentation to creation through health checks that make sure services are properly configured, so your developers only have to sit in one place. That’s really powerful stuff to keep in one place. Developer portals offer other things too, like running developer surveys out of them; that’s what I meant before about aggregating disparate things into one place. But one of the most important things is that particular journey: how do I create a new service, how do I make sure the service is healthy, and where’s my reference material, all in one place. And of course we have the PagerDuty plugin to tell us whether the service is healthy as well. There’s a lot of information you can see through the plugin system; you can pull information from all these different sources into this one place.
And again, if I want to simplify it, it’s about how much information you can have in one place so a service owner can properly and easily manage their services.
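Editor’s sketch: the scorecard idea Dave describes (checking each service against criteria such as “are you on the latest version?” or “is it secure?”) can be illustrated in a few lines of TypeScript. This is not Backstage’s Tech Insights API or PagerDuty’s implementation. The `ServiceFacts` shape, the check IDs, and the assumed latest version `3.2.0` are all hypothetical, chosen only to show how named pass/fail checks roll up into a per-service score.

```typescript
// Hypothetical facts collected about one service (not a Backstage type).
interface ServiceFacts {
  name: string;
  sharedLibVersion: string; // version of an assumed shared library
  hasCanary: boolean;
  hasSecurityScan: boolean;
}

// A check is a named predicate over a service's facts.
interface Check {
  id: string;
  description: string;
  passes: (facts: ServiceFacts) => boolean;
}

interface Scorecard {
  service: string;
  failing: string[]; // IDs of checks the service does not pass
  score: number; // fraction of checks passed, from 0 to 1
}

// Assumed "latest" version, for illustration only.
const LATEST_SHARED_LIB = "3.2.0";

const checks: Check[] = [
  {
    id: "latest-shared-lib",
    description: "Uses the latest shared library version",
    passes: (f) => f.sharedLibVersion === LATEST_SHARED_LIB,
  },
  {
    id: "canary",
    description: "Has a canary deployment configured",
    passes: (f) => f.hasCanary,
  },
  {
    id: "security-scan",
    description: "Has a security scan enabled",
    passes: (f) => f.hasSecurityScan,
  },
];

// Run every check against one service and summarize the results.
function evaluate(facts: ServiceFacts, rules: Check[]): Scorecard {
  const failing = rules.filter((c) => !c.passes(facts)).map((c) => c.id);
  const passed = rules.length - failing.length;
  return {
    service: facts.name,
    failing,
    score: rules.length === 0 ? 1 : passed / rules.length,
  };
}

// Example: a hypothetical service that is one version behind.
const card = evaluate(
  { name: "billing-api", sharedLibVersion: "3.1.0", hasCanary: true, hasSecurityScan: true },
  checks,
);
console.log(card.service, card.failing, card.score.toFixed(2));
```

The `failing` list is the kind of thing a portal card could surface to a service owner. Comparing against a single exact version string is a deliberate simplification.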

Mandi Walls: So Dave, what does that lifecycle look like? Your team owns the Backstage instance that we run internally for our developers. How do you go about deciding which plugins might be needed, or what things should look like for the other developers who are going to come in as consumers of the IDP itself?

Dave Bresci: When we started out, and I think if you go to the Backstage GitHub repo you can see the adopters in order, we were one of the first 50. I have to thank a former engineer, Mark Shaw, who brought it in during hack week. His thought was, “Hey, there’s this thing called Backstage, it’s really cool. Why don’t we use it to help onboard our backend service skeleton?” So the initial use case, and the one that was most attractive to us, was putting our service skeletons into a developer portal to make it easier for service owners to build new services. We started with the backend service during hack week, saw the value of it immediately, and decided we had to actually start owning this as a team. We brought Backstage in as a component owned by our team, and slowly but surely we started adding things: a frontend skeleton, a Python skeleton, et cetera. That covers the skeletons and scaffolding. Then we said, okay, we need the developer docs in there too, because our developer docs were in wiki pages all over the place. One of the cool things in Backstage is that there’s a plugin for Confluence. You can have developer docs that you write in GitHub displayed in Backstage, and when you search, you can search across those and Confluence. If your Confluence pages have certain tags, we can curate which pages come up through the Backstage search. So we have developer docs in there. Then we started building things using fact retrievers, which answer questions like: does this service have a canary enabled? Does this service have Fast Rollbacks enabled? We also use it for migration tracking right now. We’re in the middle of a Terraform upgrade, actually almost done.
We’re using some dashboards that tell us which versions of Terraform all of our production applications are using, so we can track how far along we are on the migration. So it’s grown organically over the last few years. This year I want to spend a little more time making things more intentional and working more on the scorecarding we were talking about earlier. I think that’s going to be really important for us. We’ve largely been focused on our internal customers and listened to what folks wanted. For example, we use the PagerDuty plugin for services. A lot of what service owners use within Backstage is self-service, and the PagerDuty plugin was actually a request: “Hey, can we get the PagerDuty plugin enabled?” Some teams use the Argo CD plugin, because we use Argo CD for Kubernetes deploys. Some things are centrally deployed by us, and other things are pulled in by the teams, and I like that flexibility too, so teams can still have some self-determination in what they use.
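Editor’s sketch: Dave’s Terraform migration dashboard boils down to comparing a per-service fact (the Terraform version each production application uses) against a target version. The code below is a hypothetical illustration, not the Backstage fact retriever API or PagerDuty’s dashboard. The `TerraformFact` shape, the service names, and the target version are assumptions, and it only handles plain numeric `major.minor.patch` version strings.

```typescript
// Hypothetical fact: which Terraform version a service's configuration uses.
interface TerraformFact {
  service: string;
  terraformVersion: string; // plain numeric version, e.g. "1.5.7"
}

interface MigrationProgress {
  migrated: string[];
  remaining: string[];
  percentComplete: number; // rounded to the nearest whole percent
}

// Compare dotted numeric versions: -1 if a < b, 0 if equal, 1 if a > b.
// Missing components count as 0, so "1.5" equals "1.5.0".
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const x = pa[i] ?? 0;
    const y = pb[i] ?? 0;
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}

// A service counts as migrated once it is at or above the target version.
function migrationProgress(facts: TerraformFact[], target: string): MigrationProgress {
  const migrated: string[] = [];
  const remaining: string[] = [];
  for (const f of facts) {
    (compareVersions(f.terraformVersion, target) >= 0 ? migrated : remaining).push(f.service);
  }
  const percentComplete =
    facts.length === 0 ? 100 : Math.round((migrated.length / facts.length) * 100);
  return { migrated, remaining, percentComplete };
}

// Example with made-up services and versions.
const progress = migrationProgress(
  [
    { service: "web", terraformVersion: "1.5.7" },
    { service: "billing", terraformVersion: "0.14.11" },
    { service: "search", terraformVersion: "1.6.0" },
    { service: "reports", terraformVersion: "0.13.5" },
  ],
  "1.5.0",
);
console.log(`${progress.percentComplete}% migrated; remaining: ${progress.remaining.join(", ")}`);
```

Comparing component by component as numbers matters here: a plain string comparison would wrongly rank "0.9.0" above "0.14.11".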

Mandi Walls: Yeah, absolutely. You’re mentioning the flexibility there, and the plugin structure leads me into my next question for Tiago, about the PagerDuty plugin that we’ve now adopted. The Spotify folks had owned it, and we took it over as of the fall. So Tiago, you’re primarily responsible for shepherding it through its lifecycle. What does that look like? What does a plugin present to users, and when you’re working on it, what does that do for the consumers of the IDP?

Tiago Barbosa: A plugin can be a frontend plugin, a backend plugin, or both. It’s the integration layer: it’s how you extend Backstage to use or leverage other services. Our PagerDuty plugin is exactly that. It has frontend components that give you something in the UI of your services. At this point in time, it lets you see who’s on call for that service and the incidents that have been triggered, see change events, and create incidents from there. So it’s how you integrate with external services from Backstage, and how you bring in the capability Dave was mentioning of augmenting the developer experience while using Backstage.

Mandi Walls: And what’s that written in, if there’s anybody out there who’s interested in helping out with the project?

Tiago Barbosa: The frontend side is React, and the backend is TypeScript. We have these two plugins: the frontend plugin exposes the information, and the backend plugin is responsible for making the calls to our REST APIs. It also lets you use the scaffolder that Dave mentioned earlier. So in a single click, you can create your own service from a template, register a service in PagerDuty, and set up the integration in Backstage automatically. You use the template to create your code and start writing from there, and you also get the PagerDuty card configured automatically for you.
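Editor’s sketch: to make the backend’s role concrete, here is a hedged illustration of the kind of REST call involved in registering a service in PagerDuty, based on PagerDuty’s public REST API v2 as we understand it (`POST https://api.pagerduty.com/services`, with a `service` object that references an escalation policy). It is not the plugin’s actual code. The function name, service name, and escalation policy ID are hypothetical, and the sketch only builds the request rather than sending it, so it runs without credentials. PagerDuty’s API reference is the authority on the exact request shape.

```typescript
// Minimal description of an HTTP request, so the sketch runs without a network.
interface HttpRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: build a "create service" request for PagerDuty's REST API v2.
// `authorization` is the full header value, e.g. "Token token=<api key>" for a
// REST API key or "Bearer <access token>" for an OAuth access token.
function buildCreateServiceRequest(
  name: string,
  escalationPolicyId: string,
  authorization: string,
): HttpRequest {
  return {
    method: "POST",
    url: "https://api.pagerduty.com/services",
    headers: {
      Authorization: authorization,
      Accept: "application/vnd.pagerduty+json;version=2",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      service: {
        type: "service",
        name,
        // A PagerDuty service references the escalation policy that routes its incidents.
        escalation_policy: { id: escalationPolicyId, type: "escalation_policy_reference" },
      },
    }),
  };
}

// Example with placeholder values; a real backend would pass these fields to fetch().
const req = buildCreateServiceRequest("checkout-api", "PXXXXXX", "Token token=REDACTED");
console.log(req.method, req.url);
```

Keeping request construction separate from sending makes it easy to test without touching a real PagerDuty account.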

Mandi Walls: For your application developers, then, is the goal to have them always in this single pane of glass, so they never need to go to the other services they might be using, whether that’s monitoring, observability, logging, or whatever else? Are they always going to start in the IDP for the things they need for their application lifecycle?

Tiago Barbosa: From my perspective, there’s only so much you can do on the Backstage side without breaking up the experience. Take PagerDuty as an example. I don’t think Backstage would be the right place to set up escalation policies and on-call schedules, but it’s the perfect place to see who’s on call, right? It’s the perfect place to see incidents, and to get some information about how the service is behaving, with some analytics capabilities. One thing we’re working on right now is bringing in more cards that give you visibility not only from a service perspective, which is the only view available today, but also from a team perspective. So I believe there’s a lot we can bring from the PagerDuty side into Backstage, but in my opinion, certain tasks will always need to be done in the external service. Dave, I don’t know if you have a different opinion on that one.

Dave Bresci: I totally agree. I actually think you have to be really selective about what you put in there, because it can very easily become overwhelming to have too much, and then it becomes noise. The balancing act of figuring out what’s helpful and what’s noise is tricky. Right now we’re on the low end; at least internally, I feel like we can put more in there that would be helpful for folks. But I can also see it getting to the point of, oh, here’s your monitoring tool plugin, here’s your logging tool plugin, here’s whatever. There’s no way to take all that in at once, and it becomes counterproductive. So you probably need to limit the use cases to whatever makes sense for your business, your culture, or your engineering environment. Different companies will have different things that are important to them or that they want to highlight within their own developer portal, and that might change over time, too. If you’re struggling operationally, maybe you want to focus on getting all your monitoring in one spot. If you’re fine operationally but you’re creating a lot of new services, maybe that’s what you want to focus on. But you’ve got to be selective, and Tiago is right: give people the ability to break out from there. Maybe you put a lot of stuff in one grid and say, it’s red, click here, and then go to another portal to actually see the details of what’s going on. But yeah, it’s tricky.
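Editor’s sketch: Dave’s suggestion of a compact grid, where a red cell links out to the tool that holds the details, can be expressed as a small data transformation. This is a hypothetical illustration, not a Backstage component. The `ToolSignal` shape, the three status levels, and the example URLs are assumptions. The design choice it shows is keeping the portal summary small (one status per tool, worst first) and attaching a drill-down link only where something needs attention.

```typescript
type Status = "ok" | "warning" | "critical";

// Hypothetical summary signal from one external tool.
interface ToolSignal {
  tool: string;
  status: Status;
  detailsUrl: string; // where to drill down in the external tool
}

interface GridCell {
  tool: string;
  status: Status;
  link?: string; // only present when the cell needs attention
}

const severity: Record<Status, number> = { ok: 0, warning: 1, critical: 2 };

// Order cells worst-first and attach drill-down links only to non-ok cells.
// Array.prototype.sort is stable (ES2019+), so equal statuses keep input order.
function buildGrid(signals: ToolSignal[]): GridCell[] {
  return [...signals]
    .sort((a, b) => severity[b.status] - severity[a.status])
    .map((s): GridCell =>
      s.status === "ok"
        ? { tool: s.tool, status: s.status }
        : { tool: s.tool, status: s.status, link: s.detailsUrl },
    );
}

// Example with made-up tools and URLs.
const grid = buildGrid([
  { tool: "monitoring", status: "ok", detailsUrl: "https://monitoring.example.com/svc/checkout" },
  { tool: "incidents", status: "critical", detailsUrl: "https://incidents.example.com/svc/checkout" },
  { tool: "logging", status: "warning", detailsUrl: "https://logs.example.com/svc/checkout" },
]);
for (const cell of grid) {
  console.log(`${cell.tool}: ${cell.status}${cell.link ? ` -> ${cell.link}` : ""}`);
}
```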

Mandi Walls: Yeah, it does seem like there could be a lot here for decision makers, like you say: focusing on the health and reliability of certain aspects of the services and drilling down into those components, versus just spray-and-pray, hoping everybody follows the directions the way they’re supposed to for every plugin that’s there. It’s interesting. Tiago, you talk to some of our customers and users. What do you see them add to their Backstage instances?

Tiago Barbosa: Of course, there are a lot of integrations with GitHub, Azure DevOps, and so on, because you need to host your projects somewhere. The scaffolder is one of the things people invest the most time in when they start. Beyond that, in terms of integrations, it depends a lot on the stack you’re using. My conversations are, of course, in the context of our plugin, so those customers are using PagerDuty, but it really depends on their current stack. Most customers I’ve been interacting with use Kubernetes, so they leverage the Kubernetes integration as well. They also have something for authentication, which also varies a lot. But no two Backstage configurations I’ve seen use the same set of plugins, which is good to see.

Mandi Walls: Yeah, absolutely. Every stack is going to be pretty individual, so that follows. PagerDuty has been using Backstage for a while, and now there are more commercial offerings in this space, I guess you could say. Dave, what do you get from Backstage? If you can share, what are the key components you really like there?

Dave Bresci: Well, first off, I’m happy that there are more commercial offerings, because it’s a kind of proof that there’s value here and that there’s a need. I’ve looked at things like Cortex, Port, and Compass. I wouldn’t necessarily call OpsLevel a developer portal, but OpsLevel does some cool stuff, and they actually have a Backstage plugin as well. Why are we on Backstage? It’s a tricky question. I like that Backstage is open source. Also, migration is a pain in the butt, so you have to figure out what value you’d get from migrating to another solution. I like Backstage’s extensibility: it has a lot of plugins, a lot of people are developing new things for it, and its adoption is going crazy. I was at BackstageCon, on the first day of KubeCon, and talked to some folks there. Of course, because we use Backstage, I know it better than these other solutions, but its adoption chart is going up and to the right, and it’s steep. So there’s definitely momentum in terms of market leadership, I guess we would say, and there are advantages there. We were also a launch partner for the marketplace and have a good relationship with them, so there are good reasons to be there. That said, the other commercial offerings are pretty nice if you actually go through and look at what they have to offer. I’m not going to say one is necessarily better than the other. It just so happens that Backstage was the open source one that was there when we started, and we said, hey, let’s adopt this thing. I like where it’s gone and the things we can do with it. And I hope I’ve given you a sufficiently neutral answer.

Mandi Walls: And now I’m going to poke that bear a little bit. One of the things folks might have heard about Backstage is that yes, it’s this cool thing that’s going to help developers, but it’s also kind of fiddly to run. You have to put resources behind creating your own instance, dealing with it, and maintaining it. It’s a production-level dependency at that point, with all that stuff in there, and you’re making that investment. Some folks I’ve talked to, just on a surface level, say they’re not sure yet that it’s worth that investment.

Dave Bresci: This is one of the things that actually comes up on my team as a challenge. Not only do we have to make our own investment in it, but Backstage is written in TypeScript, and most infrastructure engineers are not good at TypeScript. To Spotify’s credit, one of the things they mentioned at KubeCon is that they’re coming out with a new Quickstart, a declarative integration that’s going to be mostly YAML-based. That’s good, because YAML is more in line with what infrastructure engineers are familiar with. It’s still running your own thing, though. Roadie is a company that offers managed Backstage, so that’s an option for folks who like Backstage but don’t want to run their own. I’m hoping Quickstart makes it easier to manage it yourself. But yes, it’s very true that you’re doing a calculation about the cost of maintenance and the cost of migration. We haven’t invested as much in the user experience or the plugin architecture of Backstage precisely because, to do that, I have to weigh it against the other items on my team’s roadmap, since we have to do the work ourselves. With a managed solution, it’s: use all the things, we’ll do it for you. The build-versus-buy question is eternal. This year I want to clear some roadmap time for the team to actually invest in Backstage, with dedicated time to improve the experience and add more capabilities, and then I feel like we’ll have a better idea of its value compared with other solutions. The other thing is that other solutions cost money, so you have to balance that out as well. There’s no such thing as a free ride, I guess.
But in terms of capabilities here at PagerDuty, the key critical user journeys we wanted to cover are scaffolding, developer documentation, and some of the fact checking with Tech Insights. That’s covered now; there’s just a lot more we can do. For me, the scorecard stuff is especially important. If we can get that out there and start showing service owners the health of their service (maybe you’ve got old dependencies, maybe you’re running an old version of something, maybe your canaries aren’t configured properly) and show them what they need to take a look at, that is super powerful. So whatever the solution is, that to me is the most powerful thing in a developer portal, and I’m hoping we can get there this year.

Mandi Walls: Yeah. We’ve talked on the show in the past, I think maybe with Rich, about the journey of breaking up the monolith into more flexible microservices, and all the additional complexity that comes with that. Do you feel like Backstage has played a role in that part of the journey: creating many more services and helping all these folks go off in their own way to put their stuff together?

Dave Bresci: Having things like a scaffolder in there, where you’re able to create new services, define your golden path, and have new services created using the latest and greatest standards you have, is really powerful for new service creation. I think that’s really made a difference for us. The part it hasn’t addressed is existing services, which is why the health status of things is so key, because the majority of your services are existing services. Even now, we’ve had Backstage for three years, and I don’t know exactly what percentage of services were created in the three years we’ve had it running, but it’s less than 50%. To be fair, we’ve also undergone a migration from our container scheduler onto Kubernetes, and we had the Kubernetes lifecycle docs within Backstage, which was really powerful and made that migration a lot easier for service owners. So maybe things are a bit closer to the golden path than they were before, but it’s still tough. If you focus on new services, you’re not going to reach the majority of services, which already exist, and that’s the trickiest bit. The more microservices you have, the more complicated things get. I don’t think there’s any way around it. So I’ll take any tools at my disposal that help maintain some sort of standardization or overall health. There’s no perfect tool out there. If I can get incrementally better, that’s better than standing still.

Mandi Walls: Awesome. Yeah, totally get that. As we finish up, one of the questions I like to ask: is there a myth you’d like to bust about these, especially Tiago, since you’ve talked to some of our folks in the community? Or is there a pet peeve about IDPs that we can share with folks?

Tiago Barbosa: One thing I’ve seen in my conversations with different customers is that there are usually two types of approaches to IDPs. On one side are the customers and teams who really feel like IDPs are the solution to every problem they have. On the other are those who believe IDPs aren’t solving anything. They’re not totally against them, because they’re using them, but they don’t feel like they bring a lot of value. And if we’re talking about Backstage specifically, they need to maintain it and upgrade to new versions. It’s constantly evolving, because it’s open source and both the core team and the community are very active, so things change very often. IDPs are not going to solve all problems, but they have their purpose, and they solve a lot of challenges that many teams currently have. We’ve mentioned a bunch of them already.

Mandi Walls: Awesome. Dave, anything to add there?

Dave Bresci: Our main pet peeve is TypeScript, so…

Mandi Walls: Of course, yes, I could have guessed that. That was…

Dave Bresci: The worst. The team was like, I don’t want to work on Backstage because I’ve got to learn another language. But hopefully, if this gets addressed in some way, that could make things a lot better for us: making it easier for the people who are actually running and operating the service. Hopefully that makes a difference, but we’ll see.

Mandi Walls: Yeah. Definitely. Or they could just rewrite it in Go or Rust or something.

Dave Bresci: Let’s start with Rust.

Mandi Walls: Yeah. Awesome. Before we leave, is there anything we haven’t talked about that we should mention for folks, any bugbears or anything else out there that we haven’t covered?

Tiago Barbosa: We just announced a new version of the plugin that supports scoped OAuth, which, from a security perspective, is the way we recommend people access our REST APIs. We’ve been releasing new versions every other week, and there’s a lot coming, including new cards. We’re also looking into a problem Dave actually mentioned today on the show, which is bringing existing services into Backstage and enabling the PagerDuty integration for them. So there’s a lot coming in the next few months.
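Editor’s sketch: scoped OAuth builds on the standard OAuth 2.0 client credentials grant (RFC 6749, section 4.4). A backend exchanges a client ID and secret, plus the scopes it needs, for a short-lived access token, then sends that token as a `Bearer` credential instead of a long-lived API key. The code below only builds the form-encoded token request body and the resulting header; it is not the plugin’s code. The client ID, secret, and scope names are placeholders, and the token endpoint URL and exact scope strings should come from PagerDuty’s documentation.

```typescript
// Placeholder credentials for an OAuth client; never hard-code real secrets.
interface ClientCredentials {
  clientId: string;
  clientSecret: string;
  scopes: string[]; // e.g. read-only scopes for the data a card displays
}

// application/x-www-form-urlencoded encoding: percent-encode, then use "+" for spaces.
function formEncode(params: Record<string, string>): string {
  return Object.entries(params)
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`.replace(/%20/g, "+"))
    .join("&");
}

// Body for a client credentials token request (RFC 6749, section 4.4).
// RFC 6749, section 2.3.1 permits client_id/client_secret in the body but prefers
// HTTP Basic authentication; follow whatever the provider documents.
function buildTokenRequestBody(creds: ClientCredentials): string {
  return formEncode({
    grant_type: "client_credentials",
    client_id: creds.clientId,
    client_secret: creds.clientSecret,
    scope: creds.scopes.join(" "), // scopes are space-delimited (RFC 6749, section 3.3)
  });
}

// Once the token endpoint returns an access token, API calls send it as a bearer token.
function bearerHeader(accessToken: string): string {
  return `Bearer ${accessToken}`;
}

// Example with placeholder values and hypothetical scope names.
const tokenBody = buildTokenRequestBody({
  clientId: "example-client",
  clientSecret: "example-secret",
  scopes: ["services.read", "incidents.read"],
});
console.log(tokenBody);
console.log(bearerHeader("ACCESS_TOKEN"));
```

Requesting only the scopes a card actually needs is what makes this safer than a single account-wide API key.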

Dave Bresci: I’m just going to plug Tiago; he’s been doing an awesome job. He’s only had ownership of the plugin since the fall, and how many version upgrades, how many releases, how many UIs have you done? On the flight back from KubeCon, he sent me a picture of a drawing of how the UI was going to look, and I was like, this is going to be awesome. So thank you for doing that and for working on it. It’s been great.

Tiago Barbosa: Yeah, thank you.

Mandi Walls: Yeah, it’s pretty awesome. All the notifications from npm come into our community team email, and some mornings (I’m on the East Coast and Tiago’s in Lisbon) there’ll be two dozen notifications in there. What are you doing, man? He’s going wild. I’ll link in the show notes where folks can find all of this. If you’re running Backstage and you want to plug it into your PagerDuty instance, we’ll have all that in there. If you know TypeScript and you want to help out, open source is always looking for friends, so we’ll add that as well. Thanks very much, guys. This has been a great discussion. I learned a bunch; I haven’t seen a lot of these things yet, so it’s super exciting. And out there, if you’re using the PagerDuty plugin for Backstage and you want to get in touch with us, we’re always interested in what you’re up to, and Tiago would love to talk to you about what you’re doing and any ideas you might have. You can always reach us at community-team@pagerduty.com. So that’s our show for this week. Thank you guys for coming on. We wish everyone else out there an uneventful day, and we’ll talk to you again in a couple of weeks.

Mandi Walls: That does it for another installment of Page It to the Limit. We’d like to thank our sponsor, PagerDuty, for making this podcast possible. Remember to subscribe to this podcast if you like what you’ve heard. You can find our show notes at pageittothelimit.com, and you can reach us on Twitter at Page It to the Limit, using the number 2 (@pageit2thelimit). Thank you so much for joining us, and remember, uneventful days are beautiful days.

Guests

Senior SRE Manager (he/him)

Hosts

Mandi Walls (she/her)

Tiago Barbosa (he/him)