Dependency Security with Liran Tal

Transcript

Quintessence Anx: Welcome to Page It to the Limit, a podcast where we explore what it takes to run software in production successfully. We cover leading practices used in the software industry to improve both system reliability and the lives of the people supporting those systems. I’m your host, Quintessence, or @QuintessenceAnx on Twitter.

Quintessence Anx: Today, we’re going to talk about dependency security. Many of us rely on open- and closed-source dependencies to make our applications, systems, and services function. But how do you avoid accidentally introducing security issues with these resources, and what are the potential risks if a vulnerability is introduced? Today we’re joined by Liran Tal, developer advocate at Snyk. He is a JSHeroes ambassador and a member of the Node.js Security Working Group. He is also the author of Essential Node.js Security and a core contributor to the OWASP NodeGoat project. Liran, welcome to the show.

Liran Tal: Thank you for having me. Happy to be here.

Quintessence Anx: Awesome. We’re so excited. To get us started, can you tell us a little bit about your path to learning about dependency security?

Liran Tal: Yeah, sure. I think the starting point is that today it’s hard to imagine anyone building an open-source project, or any project at all, without depending on open source. That’s a shift I can attest to myself. Maybe 10 or 15 years ago, when we needed to build something for the web, we’d just roll our own with jQuery and some plugins around it, and so on. It was a lot of vanilla JavaScript to build your things.

Liran Tal: But today it’s very hard to keep that mindset of “Let’s build it ourselves,” because we have such an abundance of open-source projects and packages that help us build our products. You’re always just an npm install away from a package that would help you build something. That’s the process now. People start building a React project, maybe, then add React Router, then Redux, then redux-thunk, and so on. They pile up dependency after dependency just to get common functionality going. So I think the path into dependency security starts with understanding that today we are very much dependent on open-source maintainers and developers to help us build our products.

Quintessence Anx: All right. Awesome. Thank you for all of that. Jumping off from there, what’s the most common myth or misconception that you find yourself answering about dependency security? Common mistakes, that sort of thing.

Liran Tal: Probably one of the most common ones is that people don’t fully understand the concerns of using a popular package. The fact that it’s very popular is what they rely on. But sometimes, perhaps not always, there are issues there, and we can point to some of them.

Liran Tal: We’ll probably talk a little later about some of those packages. Maybe Lodash. Maybe event-stream, the security incident. Even React, and some other packages you might be using without being aware that some of them, to a certain degree, had security issues or production issues related to how those dependencies were managed. So this is a popular [inaudible 00:03:31] around just using a popular package.

Quintessence Anx: Oh, that makes sense. That people forget that they … Or maybe it’s better to say that they commonly think that if something’s popular, that someone must be checking it somewhere. But that might not be a valid assumption to be making.

Liran Tal: Right. Right. We can give some examples, just to understand the concept we’re talking about. Lodash, for example, is a very good, stable project in a sense. It’s a key ecosystem project in terms of how many packages in the JavaScript and web ecosystem depend on it. But there’s just a single maintainer. John-David Dalton maintains that project. He’s a very good person, but he has a life of his own. So when security issues happen and people knock on that door in the GitHub repository, that’s asking a lot from a person who is volunteering his time to build this.

Liran Tal: So we’ve had some of those security incidents happen, where a security vulnerability was impacting Lodash but there was no security fix in time. From the Snyk side, we worked with John-David on landing a security fix and provided the patches for it, but it took time until that release was rolled out. By then, the vulnerability was already public. So essentially there was a public vulnerability report, but no actual security fix for people to upgrade to.

Liran Tal: This is the classic problem with open source: support is limited, in a sense. You rely on all of those people, strangers really, to help you out, but it’s not always a done deal. So this is a prime example of us relying on open source.

Quintessence Anx: Oh, wow. That makes sense. To kind of circle back to something you mentioned before about deadlines and the active security report, can you talk a little bit more in-depth about what kind of deadline, or what deadline means in this context for when you want to get a security fix out, aside from as soon as you can?

Liran Tal: Yeah. Yeah. For sure. It’s interesting that you point to that specific question, because managing security, both as a person consuming software who needs to track which security fixes to upgrade to, and as a maintainer, is not a very straightforward process. I’ll give both perspectives. First, say you maintain maybe several packages that other people use, and assume there was an actual disclosure process. A very responsible one, where everything was laid out to you and the entire vulnerability was handled privately. So this is now a private issue, and only you and the triagers know about it.

Liran Tal: At that point, you want to release a security fix. But this is the tricky part, right? If you release it in a way that isn’t SemVer-friendly, that security fix is not easy to consume. We’ve seen examples of that. For example, there was a vulnerability impacting the 2.x branch of a package, and the security fix was only released in 3.x. That is a SemVer major upgrade. Now take the other side of the spectrum, the developers. When they want to consume a package upgrade, they want the least possible overhead, risk, and concern. A jump from version 2 to version 3 semantically communicates a possible breaking change, maybe API changes. So they will be very wary of just upgrading, and they will need to add more testing and sanity checks to make sure it’s okay.
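
A minimal Python sketch of why a fix shipped only in 3.x is hard to consume: by default npm saves dependencies with a caret range such as ^2.4.0, and a caret range never crosses a major version. The function below is a simplified, hypothetical model (versions 1.0.0 and above, no prerelease tags, no 0.x rules), not npm’s own semver implementation, and the version numbers are made up.

```python
# Simplified model of npm caret ranges for versions >= 1.0.0:
# "^2.4.0" accepts any 2.x.y that is >= 2.4.0, but never 3.0.0 or later.
# Prerelease tags and the special 0.x rules are deliberately ignored.


def parse(version: str) -> tuple:
    """Split "MAJOR.MINOR.PATCH" into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def satisfies_caret(version: str, range_base: str) -> bool:
    """Return True if `version` falls inside the caret range ^range_base."""
    v = parse(version)
    base = parse(range_base)
    # Same major version, and not older than the base of the range.
    return v[0] == base[0] and v >= base


# Hypothetical scenario: the app declares "^2.4.0", but the fix shipped only in 3.0.0.
declared = "2.4.0"
fixed_in_2x = "2.4.1"  # what a SemVer-friendly patch release would look like
fixed_in_3x = "3.0.0"  # what was actually published in the example

print(satisfies_caret(fixed_in_2x, declared))  # True: picked up by a routine update
print(satisfies_caret(fixed_in_3x, declared))  # False: needs a deliberate major upgrade
```

A backported 2.4.1 would reach every consumer on the next routine install or update, while 3.0.0 waits until someone edits the range and retests.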

Liran Tal: Except that with a security fix, both as a maintainer and as a user, you want to roll it out as fast as you can to reduce whatever security risk is there. So that’s one example of a problem where we really need to understand that there isn’t enough standardization, and not enough knowledge across the ecosystem, about how to manage security releases or how to consume them properly.

Quintessence Anx: Before we get into our main question, when you’re talking about consuming the releases properly, can you just give us a quick primer, I guess, on what that would look like, really, for someone who needs to do it effectively but maybe isn’t as familiar?

Liran Tal: I think consuming properly is a big-picture thing: understanding all of the dependencies that you have, and understanding that when you consume a package, what you’re pulling in depends on the ecosystem that package comes from. So it’s not necessarily about consuming differently. You’d still end up taking in whatever version you need, if you need a specific version. It’s more about understanding and being aware of what you’re pulling into your project.

Liran Tal: For example, if you install a Java framework, say Spring as a web server or application framework, you’d be pulling in a few, maybe a dozen, dependencies. But if you install Express, you’d be pulling in something like 48 dependencies all in all. That’s a lot. I don’t want to call it the attack surface, but you need to understand the software surface of everything you’re pulling into your project, with all of your dependencies mapped out. That’s 48 packages you’re pulling in with an Express install, so maybe 48 different people maintaining them that you now depend on. Do you know what kind of security practices they follow, both to keep the package in a good and healthy state security-wise and to protect their own accounts, so that they don’t fall victim to an account takeover, and so on?
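
One rough way to see that software surface is to count what a lockfile actually records. A minimal Python sketch, assuming an npm lockfile in the lockfileVersion 2 or 3 format, where a top-level "packages" object maps install paths such as node_modules/ms to package metadata. The trimmed lockfile below is illustrative, not a real Express install.

```python
import json


def summarize_lockfile(lock: dict) -> dict:
    """Count the packages recorded in an npm v2/v3 lockfile's "packages" map."""
    packages = lock.get("packages", {})
    names = set()
    entries = 0
    for path in packages:
        if path == "":
            continue  # the "" key describes the root project itself
        entries += 1
        # "node_modules/send/node_modules/ms" -> "ms"; "node_modules/@scope/pkg" -> "@scope/pkg"
        names.add(path.rsplit("node_modules/", 1)[-1])
    return {"installed_entries": entries, "unique_names": len(names)}


# Trimmed, illustrative content; a real file could be read with json.load(open("package-lock.json")).
example_lock = json.loads("""
{
  "name": "my-app",
  "lockfileVersion": 3,
  "packages": {
    "": {"name": "my-app", "dependencies": {"express": "^4.18.2"}},
    "node_modules/express": {"version": "4.18.2"},
    "node_modules/debug": {"version": "2.6.9"},
    "node_modules/ms": {"version": "2.0.0"},
    "node_modules/send": {"version": "0.18.0"},
    "node_modules/send/node_modules/ms": {"version": "2.1.3"}
  }
}
""")

print(summarize_lockfile(example_lock))  # {'installed_entries': 5, 'unique_names': 4}
```

Each unique name is, roughly, another maintainer or team whose practices you now rely on; a name that shows up at several paths (like ms here) means more than one version of it is installed side by side.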

Liran Tal: So there’s a lot more risk, an added surface to this problem, and sometimes it’s even more than that. The Express example, by the way, actually speaks well of the Express project, because most of those dependencies are managed by the Express team. And that’s a good thing. But what you end up with is a dependency on some of those packages, or maybe others. Then how do you know when to upgrade, and to which versions?

Liran Tal: There are a lot of concerns we’re seeing around how to manage dependencies in general. We’re seeing things like bots in the ecosystem that help you do that. So the goal is nailing how to do it properly, so that you don’t end up spending the whole day merging pull requests to upgrade your dependencies, but instead get a better signal-to-noise ratio. That’s what we want to get to: giving you smart updates in that sense.

Quintessence Anx: That makes a whole lot of sense. You mentioned pulling in a bunch of upgrades and mapping out the dependencies. Are there any other challenges you wanted to highlight for managing dependencies?

Liran Tal: I think SemVer is a good one: understanding that you probably want to pull in the smallest SemVer upgrades. But there’s a lot more to dependency management, if we want to go into that area.

Quintessence Anx: Fair enough. So how about instead of that, or in addition to that, can you tell us a little more about what you think about attacker personas, specifically from the mindset of people who are trying to learn about who they’re protecting against? And it’s not always a malicious attacker, correct?

Liran Tal: True. Yeah. It’s not always, but let’s think about some of the problems that do come from that perspective, to understand the challenges in managing dependencies for our projects. If we’re talking about how you manage them, lockfiles are a very common way to pin dependencies to specific versions. That gives you reproducible builds that you can share across your teammates, and so on. That’s not really a new concept, but there’s an interesting attack vector there, which I wrote about on our blog a few months back. It was pretty interesting to see how the community engaged with it.

Liran Tal: A lot like spaces versus tabs, or semicolons versus no semicolons in JavaScript, you have two camps here. Some people believe lockfiles are helpful for a project, and others want to dismiss lockfiles and install without one, which means you get the same experience a user gets. That’s how they argue lockfiles shouldn’t be there. But there are some cases where the two camps agree, and that’s usually an application rather than a library you maintain.

Liran Tal: So for most users, when they manage an application, they use a lockfile to reproduce builds. But there’s an interesting use case here. Lockfiles are a way for us, like we said before, to pin dependencies. They’re a machine-generated file, in YAML or JSON format, whatever it is, that helps the npm client, the package manager, understand where to pull packages from. But it’s machine-generated. So when you create a pull request that says, “I want to upgrade a specific version of a package,” or maybe you added a new package, or removed one, you’re probably pushing another file in that pull request. There’s the package manifest, and then the lockfile that says, “This is now the actual state of the package dependencies.”

Liran Tal: Now try to picture that pull request in your mind. You see two files. Except lockfile diffs are often pretty big, because there are so many changes and the file is machine-generated. So GitHub, for example, will by default collapse the diff. You just see the lockfile’s name, not its contents, and only if you expand it do you see everything.

Liran Tal: So my experiment was to open a pull request against a project. Since no one actually inspects the machine-generated lockfile, I changed one dependency, ms, a package that is downloaded probably millions of times a week from the npm registry. I just updated its entry so that the ms package is downloaded from my own GitHub fork of the project. At that point, I control the entire ms library. If you just merge the pull request without looking at what’s going on in the lockfile, and you don’t have any security policies in place specifically for the lockfile, what happens on the next npm install, by you, a teammate, or anyone else? You pull in that malicious, or potentially malicious, version of ms that I now control. You may not even notice until you actually go and inspect the sources of all of those dependency tarballs.

Liran Tal: So that’s a pretty legitimate attack vector if someone wants to push something onto your machine without you knowing it, without needing any control over your machine, or even over the actual package. Just hiding it in a lockfile.
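
A lockfile policy of the kind discussed here can be as simple as rejecting any entry whose tarball does not come from the registry you expect. A minimal Python sketch, assuming the npm v2/v3 lockfile layout in which entries under "packages" carry a "resolved" URL; the allowed host set and the tampered entry are hypothetical, and this is not the API of any particular lockfile-linting tool.

```python
from urllib.parse import urlparse

# Assumption: tarballs should only ever come from the public npm registry over HTTPS.
ALLOWED_HOSTS = {"registry.npmjs.org"}


def find_untrusted_sources(lock: dict, allowed_hosts: set = ALLOWED_HOSTS) -> list:
    """List lockfile entries whose "resolved" URL is not HTTPS on an allowed host."""
    problems = []
    for path, meta in lock.get("packages", {}).items():
        if path == "" or meta.get("link"):
            continue  # the root project and workspace links point at local paths
        resolved = meta.get("resolved")
        if resolved is None:
            continue  # no tarball URL recorded, so nothing to check
        url = urlparse(resolved)
        if url.scheme != "https" or url.hostname not in allowed_hosts:
            problems.append(f"{path}: {resolved}")
    return problems


tampered_lock = {
    "lockfileVersion": 3,
    "packages": {
        "": {"name": "my-app"},
        "node_modules/debug": {
            "version": "2.6.9",
            "resolved": "https://registry.npmjs.org/debug/-/debug-2.6.9.tgz",
        },
        # Hypothetical tampering: same name and version, tarball now served from elsewhere.
        "node_modules/ms": {
            "version": "2.0.0",
            "resolved": "https://attacker.example/ms-2.0.0.tgz",
        },
    },
}

for problem in find_untrusted_sources(tampered_lock):
    print(problem)  # node_modules/ms: https://attacker.example/ms-2.0.0.tgz
```

Run in CI against every pull request, a non-empty result could fail the build, which surfaces exactly the change that a collapsed diff hides.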

Quintessence Anx: That’s both impressive and distressing, and I’m sure at least some of our listeners would agree. That actually brings me to another point. So people wouldn’t necessarily know or think to check that the source of the package hasn’t been altered in a particular way. Are there any other tips that people can use to get started? Again, common mistakes, or things that people don’t think to check for that they really need to be checking for.

Liran Tal: Yeah. So I think it’s definitely good to know there are tools to lint lockfiles and detect those kinds of issues. But there are probably more things you should be aware of in terms of how you consume packages. I want to take you further down the rabbit hole of understanding those security concerns, and even production concerns, when you’re using a dependency.

Liran Tal: Before we even roll into those tips, let’s assume you depend on a package. There’s an academic research paper, released a year back, and I’m going to take maybe the security perspective of it. It compared PyPI from the Python ecosystem with npm, looking at how many packages could be considered abandoned in each. And when you install a package from one of those ecosystems, how many other packages are you also pulling in?

Liran Tal: So take that academic paper, which shows that when you install a package from npm, you get on average, I think, 4.17 levels of dependencies. So an average npm install is about four levels deep. But think about what that means for production. There’s Package A, which depends on Package B, then C, then D. In most cases, they will not come from the same person. It’s just people gluing stuff on top of each other’s code. And that’s okay. That’s the world of open source.
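
The Package A -> B -> C -> D chain can be made concrete with a small graph walk. A minimal Python sketch over a hypothetical, hand-written dependency graph (not data from the paper): one function measures how many hops the longest chain below a package has, the other lists everything an install of that package pulls in.

```python
# Hypothetical dependency graph mirroring the Package A -> B -> C -> D example.
GRAPH = {
    "A": ["B"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
}


def depth(graph: dict, pkg: str, seen: frozenset = frozenset()) -> int:
    """Number of hops in the longest dependency chain below `pkg` (a leaf has depth 0)."""
    if pkg in seen:
        return 0  # stop at a cycle instead of recursing forever
    children = graph.get(pkg, [])
    return max((1 + depth(graph, child, seen | {pkg}) for child in children), default=0)


def transitive_deps(graph: dict, pkg: str) -> set:
    """Every package reachable from `pkg`, i.e. everything installing `pkg` pulls in."""
    found, stack = set(), list(graph.get(pkg, []))
    while stack:
        dep = stack.pop()
        if dep not in found:
            found.add(dep)
            stack.extend(graph.get(dep, []))
    return found


print(depth(GRAPH, "A"))                    # 3
print(sorted(transitive_deps(GRAPH, "A")))  # ['B', 'C', 'D']
```

If Package D can no longer be fetched, installing A fails too, even though D never appears in A’s own package.json.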

Liran Tal: Except, what happens when Package D breaks? It’s not a direct dependency for anyone, so it’s a bit under the radar. But when it breaks, it hurts, because then the entire build process breaks. And it can break for a lot of reasons. This is where I want to take it: these tips and events are much more about how dependencies affect our production readiness.

Liran Tal: So let’s think about it. First of all, there could be a breaking change in the API. So even without the dependency being broken as such, maybe SemVer isn’t followed, and there’s an actual change that isn’t reflected in the version number. And you’re pulling that in. Another thing that could happen is that the package has been compromised, which is not something anyone wants happening.

Quintessence Anx: Right.

Liran Tal: Right?

Quintessence Anx: Right.

Liran Tal: The other thing is that npm, as a registry, may at the end of the day be unavailable. Right? There can be networking disruptions, or npm itself going down, or something like that. All of those things could break a package somewhere down the tree that you may not even be aware of, and all of them have probably happened at one time or another. But let me ask you, going to the real stories of what has happened: what are the chances that a maintainer will pull a package? Just yank it off the registry, so that anything depending on it can no longer install it? The chances of that are probably not very high. Right?

Quintessence Anx: No, but it sounds like one of those things that is rare but severe. Yes?

Liran Tal: Indeed. Indeed. And this is exactly what happened. That’s where we were leading with this. It’s open source; to an extent, everyone can do as they please. What’s the worst that could happen? So, to close off that story: a package was actually pulled a few years back, somewhere around 2016. There was a package called left-pad, maintained by a person who was maintaining something like 300 other npm modules, so he had a pretty wide footprint across the ecosystem. But there were some legal issues with some of his packages.

Liran Tal: To cut the story short, that person refused to give up ownership of a package. As an act of protest, he removed all of his modules from npm, all of those 300, including that specific package called left-pad. And left-pad, which was later kind of frowned upon since it was just 17 or 18 lines of code for left-padding a string, broke large parts of the build systems for projects like webpack, and Babel, and [JSES 00:19:58]. And a lot of big companies, enterprises like the FAANGs and others that depended on it, saw their builds break. That really shook the npm ecosystem. The resolution also showed how immature we were, at least at that point in time, in not expecting this to happen. We were very naive in our usage of, and dependence on, open source.

Liran Tal: So what happened is, left-pad as a package was now unpublished from the registry. And there were millions of dependents and projects using it in their builds, in their [inaudible 00:20:36], on their dev machines. People were just installing it. So someone else published a new package, a new version of left-pad, 1.0.0, to fix everything. But there was no process to validate that that person was okay, because at that point the module namespace was free.

Liran Tal: So you have to ask yourself, “What would have happened if the person who published that package had been a malicious user?” Now everyone’s builds would be unbroken, sure, but there’d be a malicious package in the build. Right? Not exactly what you’d want. So there were a lot of learning experiences from this, and it definitely emphasizes how prepared we need to be, in terms of production readiness, when consuming open-source dependencies.
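
One safeguard npm lockfiles already provide is an "integrity" field per entry: a Subresource Integrity string such as sha512-<base64 digest> of the exact tarball, which the client checks on install. A minimal Python sketch of that comparison, with placeholder bytes standing in for tarballs. Note the limit: it only protects versions already recorded in your lockfile, and would not stop you from adopting a freshly published version, like the new left-pad 1.0.0, the next time the lockfile is regenerated.

```python
import base64
import hashlib


def sri_sha512(data: bytes) -> str:
    """Build a sha512 Subresource Integrity string, a form npm records in lockfiles."""
    digest = hashlib.sha512(data).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def matches_integrity(data: bytes, integrity: str) -> bool:
    """True if `data` matches one of the sha512 hashes in an SRI string."""
    # An SRI value can hold several space-separated hashes; only sha512 ones are checked here.
    expected = {token for token in integrity.split() if token.startswith("sha512-")}
    return sri_sha512(data) in expected


original = b"placeholder for the tarball recorded when the lockfile was written"
pinned = sri_sha512(original)  # what the lockfile's "integrity" field would hold
swapped = b"placeholder for different bytes served under the same name and version"

print(matches_integrity(original, pinned))  # True
print(matches_integrity(swapped, pinned))   # False: the install should be rejected
```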

Quintessence Anx: That was amazing. Thank you so much for all of that, Liran. There are two questions that we like to ask every guest. But before we do, I just want to remind everyone, if you want more amazing learnings, please head over to Snyk’s blog. You’ll find posts from all of their knowledgeable people about dependency security and everything we’ve talked about here. There’s also the MyDevSecOps community at mydevsecops.io. It’s a vendor-neutral community for developers who care about security. So if you’re getting started, or you want more knowledgeable people to ask, I would definitely recommend taking a look. There’s also [inaudible 00:22:12], an open-source project that acts as a local npm proxy for enterprises. We’ll have links to everything in the show notes, so make sure you check those out. And back to you, Liran. We have the two questions. Are you ready?

Liran Tal: Yeah, sure. Thank you, Quintessence.

Quintessence Anx: Yeah. What is one thing you wish you had known sooner when it comes to running software in production?

Liran Tal: I don’t think I learned it too late, but I definitely went down a path of not understanding how crucial testing, security, performance, accessibility, all of those cross-cutting concerns are. I would say all of these are part of your ongoing journey of software development. It’s either that, or you’ll face them later on, unprepared and ill-equipped to maintain the project with respect to those concerns.

Quintessence Anx: Awesome. And of course, the opposite. Is there anything about running software in production you’re glad we did not ask you about?

Liran Tal: Yes. There is a security incident that I’m really happy we didn’t talk about.

Quintessence Anx: You’re going to leave us hanging on that one, aren’t you?

Liran Tal: Definitely.

Quintessence Anx: Well, best of luck with it. Thank you so much for joining us.

Liran Tal: Thank you so much for having me.

Quintessence Anx: Absolutely. This is Quintessence wishing you an uneventful day.

Quintessence Anx: That does it for another installment of Page It to the Limit. We’d like to thank our sponsor, PagerDuty, for making this podcast possible. Remember to subscribe to this podcast if you like what you’ve heard. You can find our show notes at pageittothelimit.com, and you can reach us on Twitter @pageit2thelimit using the number two. That’s @pageit2thelimit with the number two. Let us know what you think of the show. Thank you so much for joining us, and remember, uneventful days are beautiful days.


Guests

Liran Tal

Hosts

Quintessence Anx