Video Summary and Transcription
Today's talk focuses on securing the software supply chain, particularly in the JavaScript ecosystem. The large number of transitive dependencies in JavaScript projects can contribute to vulnerabilities. Attacks on the open-source supply chain have increased significantly, which has led to initiatives to improve supply chain security. Package managers disagree about which dependencies to install, so accuracy is crucial, and caching and bundling dependencies can help achieve reproducible installations. Mitigating threats involves active scanning, creating profiles, and sharing information. Tools like npm audit signatures can verify package integrity. Future developments include reproducible installations and powerful dependency queries.
1. Introduction to Securing Software Supply Chain
Today, I'll be talking about securing your software supply chain, specifically the JavaScript or Node.js supply chain. I have over 20 years of experience in engineering, development, and design work, including consulting and open source contributions. I co-founded Themify and was the engineering manager for the npm CLI and GitHub CLI teams. Let's take a quick look at the state of the ecosystem, focusing on package managers and how they interact with dependencies. The npm ecosystem has over 2.3 million packages and billions of downloads per month, and transitive dependencies are a major factor.
Hi, everyone. My name is Darcy Clarke, and today I'll be talking to you about securing your software supply chain, specifically how you can secure your JavaScript or Node supply chain. If you'd like to follow along, the link to the slides is bit.ly. Or you can scan the QR code that you'll see here alongside the talk. Hopefully the links and information you find are useful; feel free to share them with friends. Let's dive in.
So a little bit about me. My name is Darcy Clarke, again. I've been doing engineering, development, and design work for over 20 years. I've done consulting with a number of different brands, agencies, startups, and large and small organizations. I've also been active in the open source community for over 15 years, so you might know some of my work. I co-founded a company called Themify back in 2010 or 2011. Most recently, I was the engineering manager for the npm CLI and GitHub CLI teams, and I was part of the npm acquisition by GitHub back in 2020. I'm based here in Toronto, Canada, as my hat will show you. If you feel like following me, I'm @darcy on Twitter, or you can find more information about me on my website.
So, a little bit about what I've been up to for the last three or four years. As I said, I was managing the npm CLI team, and that team supported roughly 100 different projects or npm packages, which accounted for roughly 2% of all registry traffic we saw. Put another way, there were about 3 billion downloads a month for the projects in the portfolio my team supported.

So let's take a quick look at the state of the ecosystem as it is today. In the ecosystem we have runtimes, package managers, and languages and transpilers, and pretty much everything else falls into a last bucket of build tools, bundlers, frameworks, and more. When we talk about the supply chain within package management specifically, we're talking about the packages that are available on npm, and that really comes down to the bottom three areas: package managers, transpilers, and everything else. The area we're going to focus on today is the package managers themselves: how they interact with those other dependencies, the nuances you may see with them, and some of the cool new tools and features coming to package managers that will hopefully help secure your dependencies. And of course, JavaScript is known for having a ton of dependencies. The npm ecosystem today has over 2.3 million packages and sees almost 220 billion downloads a month. Why is that? Well, it's not the direct dependencies. According to GitHub's State of the Octoverse reports from 2020 and 2021, JavaScript projects don't have that many direct dependencies, roughly 10 on average. It's actually the transitive dependencies that make up the majority of the bloat in the npm ecosystem.
2. Transitive Dependencies in JavaScript Ecosystem
In the JavaScript ecosystem, projects have an average of 683 transitive dependencies. Transitive dependencies are dependencies that are pulled in because of your direct dependencies. These dependencies can account for 5% of vulnerabilities. (Source: Snyk's 2020 State of Open Source Security report.)
In the JavaScript ecosystem, you see roughly 683 transitive dependencies in those projects on average. So what are we talking about when we say transitive dependencies? This graph should give you an outline of what we mean. Package A depends on packages B and D, and package B in turn relies on package C. Package C is what we would call a transitive dependency: a dependency that your root node or root project hasn't defined itself and that has been pulled in because of one of your direct dependencies. It's actually estimated that 5% of vulnerabilities reside in those transitive dependencies we're including, which is kind of crazy. This stat comes from Snyk's State of Open Source Security report from 2020.
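To make the definition concrete, here is a small sketch that models the A, B, C, D graph from the slide as an adjacency list and collects every package reachable from the root that the root did not declare itself. The DepGraph type and transitiveDeps function are illustrative names, not part of any package manager's API.

```typescript
// A dependency graph as an adjacency list: package name -> its direct dependencies.
type DepGraph = Record<string, string[]>;

// Returns every package reachable from `root` that `root` does not list directly.
function transitiveDeps(graph: DepGraph, root: string): Set<string> {
  const direct = new Set(graph[root] ?? []);
  const seen = new Set<string>();
  const stack = [...direct];
  // Depth-first walk starting from the direct dependencies.
  while (stack.length > 0) {
    const pkg = stack.pop()!;
    if (seen.has(pkg)) continue;
    seen.add(pkg);
    for (const dep of graph[pkg] ?? []) stack.push(dep);
  }
  // Everything reachable, minus the direct dependencies and the root itself.
  return new Set([...seen].filter((p) => !direct.has(p) && p !== root));
}

const graph: DepGraph = { A: ["B", "D"], B: ["C"], C: [], D: [] };
console.log([...transitiveDeps(graph, "A")]); // [ 'C' ]
```

A package that is both a direct and a transitive dependency is reported as direct here, since the root has declared it.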
3. Increasing Attacks on Open-Source Supply Chain
There has been a significant increase in attacks targeting the open-source supply chain: a 742% year-over-year increase. According to GitHub, there is a 59% chance of receiving a security alert in the next year. Many companies are working on initiatives and products to address supply chain security and increase trust in the npm ecosystem.
So of course we see plenty of jokes about this problem. The average developer can't seem to get away from vulnerabilities and the reporting of them any time they run npm audit. And this is a valid feeling: there's been a 742% year-over-year increase in attacks targeting the open-source supply chain. Here's what this looks like when it's graphed. It's an insane increase, and it doesn't look like it's ever going to stop. According to GitHub's own State of the Octoverse report from 2020, there's a 59% chance of you getting a security alert in the next year, which is pretty big. Many companies are trying to spearhead initiatives and introduce new products to address supply chain security, to look at advisories, and to increase our trust in the npm ecosystem.
4. Package Manager Dependencies and Accuracy
Let's take a look at an example of starting a new project and using a package manager to install dependencies. Different package managers interpret dependencies differently, leading to variations in the number of dependencies installed. Accuracy is crucial, as package managers have different interpretations of dependencies and may enforce different rules. False positives are acceptable, but false negatives are dangerous. Caching and bundling dependencies frequently can help achieve accurate and reproducible installations.
So let's take a look at an example of me starting a new project and what happens once I begin to use a package manager to install dependencies. Here I've created a Create React App project, just running the normal installer, and it created a package.json with a manifest that looks like this. It has seven direct dependencies: Jest DOM, React, user-event, web-vitals, et cetera.
If we jump over and look at what each package manager actually installs with this manifest, we see a wide variety in the number of dependencies installed: roughly an 850-dependency difference between the smallest and the largest install. Again, this is with no specific or added configuration to the Create React App project. This is purely what the package managers decide to install and how they interpret the package manifest for that project.
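One way to reproduce this kind of comparison for npm is to count the installed entries in the lockfile it writes. The sketch below assumes an npm lockfile with lockfileVersion 2 or 3, whose "packages" map is keyed by install path ("" for the root, "node_modules/..." for installed packages). Other package managers use different lockfile formats, so this only measures npm's interpretation of the manifest.

```typescript
// Minimal shape of an npm lockfile (lockfileVersion 2 or 3); only the fields used here.
interface Lockfile {
  lockfileVersion: number;
  packages?: Record<string, { version?: string; dev?: boolean }>;
}

// Counts installed package entries (one per install path under node_modules/),
// and how many of those are marked as dev-only.
function countInstalled(lock: Lockfile): { total: number; dev: number } {
  if (!lock.packages) {
    throw new Error("expected lockfileVersion >= 2 with a `packages` map");
  }
  let total = 0;
  let dev = 0;
  for (const [path, entry] of Object.entries(lock.packages)) {
    // Skip the root entry ("") and any workspace folders outside node_modules.
    if (!path.includes("node_modules/")) continue;
    total++;
    if (entry.dev) dev++;
  }
  return { total, dev };
}
```

In practice you would pass it JSON.parse(readFileSync("package-lock.json", "utf8")). Nested copies of the same package at different paths are counted separately, because each one is a separate install on disk.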
You might be wondering: wait a second, what's happening here? Accuracy is very important. Package managers all have different interpretations of dependencies and the manifest file. Some may or may not install development or optional dependencies. They may or may not interpret bundled or peer dependencies, and they may enforce special overrides or resolution algorithms that differ from each other, meaning that the accuracy of the graph is entirely up to the package manager. There's a famous quote from Heraclitus of Ephesus, the Greek philosopher: "No man ever steps in the same river twice." The modern-day equivalent might be that no package.json ever installs the same way twice. We can joke that pretty much no npm install is ever going to be the same, and in some cases this comes down to a mutability problem. If you take anything away from today's talk, be wary of the accuracy of the audit tools and package manager tooling you're using today. False positives are fine; they may cause churn and waste some extra time. False negatives, however, are very dangerous. So what can you do? The best way to handle this is to cache and bundle all your dependencies as often as you can to get the most accurate and most reproducible installation.
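A quick way to see whether two installs of the same manifest really produced the same tree is to diff their lockfiles. This sketch assumes npm lockfile v2/v3 "packages" maps keyed by install path; diffTrees is an illustrative name. A non-empty result is the "no package.json ever installs the same way twice" problem made visible.

```typescript
// The `packages` map of an npm v2/v3 lockfile, reduced to what we compare.
type LockPackages = Record<string, { version?: string }>;

interface TreeDiff {
  added: string[];   // install paths present only in the second tree
  removed: string[]; // install paths present only in the first tree
  changed: string[]; // same install path, different version
}

const has = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Compares two installed trees by install path and version.
function diffTrees(first: LockPackages, second: LockPackages): TreeDiff {
  const diff: TreeDiff = { added: [], removed: [], changed: [] };
  for (const path of Object.keys(second)) {
    if (!has(first, path)) diff.added.push(path);
    else if (first[path].version !== second[path].version) diff.changed.push(path);
  }
  for (const path of Object.keys(first)) {
    if (!has(second, path)) diff.removed.push(path);
  }
  return diff;
}
```

Comparing by install path rather than by package name also catches changes in tree shape, such as a package being hoisted or nested differently.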
5. Supply Chain Threats and Mitigation
Some of the biggest threats to the software supply chain are malware, typosquatting, dependency confusion, registry compromise, and account takeovers. Mitigating these threats involves active scanning, creating profiles, automation, and sharing information. Key heuristics such as names, download counts, versions, and publish dates help mitigate typosquatting. Policies and enforcement can also be used to protect against these threats.
So what are some of the supply chain threats that we see beyond accuracy and mutability? Some of the biggest threats to the software supply chain are things like malware, typosquatting, dependency confusion, registry compromise, and account takeovers.
So how can we mitigate malware? Taking an active approach to scanning on a regular basis and creating profiles of what malware does and doesn't look like is important. So is automating those tools, both on the registry side and in your private instances. And letting others know what you've found is key to keeping us all safe.
How can typosquatting be mitigated? There are some key heuristics we can use. Names, download counts, versions, and publish dates are all indicators of whether a package is trying to hide and pretend to be a popular or safe package within the ecosystem. We can also define policies based on the personas or heuristics we've created and enforce them through tooling.
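As a sketch of how those heuristics might combine, the function below flags a package whose name is within a small edit distance of a popular package while also being far less downloaded and either very new or having very few versions. The PackageInfo shape and every threshold (distance of 2, a 1000x download gap, 30 days, 2 versions) are assumptions for illustration; real values would come from registry data and tuning.

```typescript
// Metadata assumed to be available for each package (e.g. from registry APIs).
interface PackageInfo {
  name: string;
  weeklyDownloads: number;
  versionCount: number;
  firstPublished: Date;
}

// Edit distance using the classic dynamic-programming recurrence, one row at a time.
function levenshtein(a: string, b: string): number {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // distance for (i - 1, j - 1)
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      row[j] = Math.min(
        above + 1, // deletion
        row[j - 1] + 1, // insertion
        diag + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
      diag = above;
    }
  }
  return row[b.length];
}

// Returns the name of the popular package `candidate` may be imitating, if any.
// All thresholds below are hypothetical.
function looksLikeTyposquat(
  candidate: PackageInfo,
  popular: PackageInfo[],
  now: Date = new Date(),
): string | undefined {
  const ageDays = (now.getTime() - candidate.firstPublished.getTime()) / 86_400_000;
  for (const target of popular) {
    if (target.name === candidate.name) return undefined; // it is the popular package
    const distance = levenshtein(candidate.name, target.name);
    const nameIsClose = distance > 0 && distance <= 2;
    const muchLessPopular = candidate.weeklyDownloads * 1000 < target.weeklyDownloads;
    const youngOrThin = ageDays < 30 || candidate.versionCount <= 2;
    if (nameIsClose && muchLessPopular && youngOrThin) return target.name;
  }
  return undefined;
}
```

Requiring several signals at once keeps legitimate similarly named packages (for example, long-established forks) from being flagged on name alone.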
6. Mitigating Dependency Confusion and Mutability
To mitigate dependency confusion, use scopes for internal packages and set the registry configuration in your .npmrc files. Respond quickly to build failures. Mitigate registry compromise with lock file support and integrity checks. Mitigate account takeovers with login verification and 2FA enforcement. Eliminate mutability by removing or avoiding mutable package references. Use lock files and the --before flag to enforce reproducible installations. Consider the accuracy of advisory tools and the npm CLI's npm audit signatures command.
So how can we mitigate dependency confusion? Using scopes for internal packages is a great tool when you are hosting a third-party registry or, in general, trying to keep private code available to teams. Ensuring that the registry configuration is set in the .npmrc files of all your projects is also key to ensuring that you don't reach out to a public registry and download software you didn't intend to. And respond quickly to build failures, because they may point to a misconfiguration within your projects.
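A simple CI check along these lines: every dependency in one of your internal scopes should have a matching "@scope:registry=<url>" line in the project's .npmrc, otherwise npm will resolve it against the default registry. The scope names and the internal registry URL in the usage below are hypothetical, and the sketch ignores a custom default "registry=" setting, which would also change where unmapped names resolve.

```typescript
// Parses `@scope:registry=<url>` lines from the text of an .npmrc file.
function scopedRegistries(npmrc: string): Map<string, string> {
  const map = new Map<string, string>();
  for (const raw of npmrc.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#") || line.startsWith(";")) continue; // comments
    const m = /^(@[^:\s]+):registry\s*=\s*(\S+)$/.exec(line);
    if (m) map.set(m[1], m[2]);
  }
  return map;
}

// Returns dependencies in an internal scope that has no registry mapping,
// i.e. names npm would try to fetch from the default (public) registry.
function unmappedInternalDeps(
  deps: Record<string, string>,
  internalScopes: string[],
  npmrc: string,
): string[] {
  const mapped = scopedRegistries(npmrc);
  return Object.keys(deps).filter((name) => {
    const scope = name.startsWith("@") ? name.split("/")[0] : undefined;
    return scope !== undefined && internalScopes.includes(scope) && !mapped.has(scope);
  });
}
```

Unscoped internal names cannot be pinned to a registry this way, which is exactly why scopes are the recommended tool.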
How can we mitigate registry compromise? The npm package manager, like most package managers, already has lock file support, which is one of the keys to checking the integrity of packages you've installed and seen before, and to ensuring that you've cached things like integrity checks and Subresource Integrity (SSRI) information.
How can we mitigate account takeovers? The npm registry and github.com have been slowly rolling out login verification and 2FA enforcement. This includes an improved 2FA login experience through WebAuthn, and they've also made heavy investments in the support team and authentication workflows.
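The integrity values in package-lock.json are Subresource Integrity strings such as "sha512-<base64 digest>". npm itself uses its ssri package for this; the sketch below uses only node:crypto to show the underlying check, and handles only the sha512 form (real SRI strings can list several space-separated hashes).

```typescript
import { createHash } from "node:crypto";

// Computes an SRI-style "sha512-<base64>" string, the format npm records
// in the `integrity` field of lockfile entries.
function sriSha512(data: Buffer): string {
  return "sha512-" + createHash("sha512").update(data).digest("base64");
}

// Checks downloaded tarball bytes against the integrity recorded in the lockfile.
function matchesIntegrity(tarball: Buffer, lockfileIntegrity: string): boolean {
  const expected = lockfileIntegrity
    .split(/\s+/)
    .find((hash) => hash.startsWith("sha512-"));
  if (!expected) return false; // no sha512 entry to compare against
  return sriSha512(tarball) === expected;
}
```

Because the expected value comes from your own lockfile rather than the registry, a tampered tarball served later will fail this check even if the registry's metadata was altered too.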
So let's move on to mutability. We've talked about it a bit already, but it's one of the biggest areas of concern when we talk about supply chain security. Things like remote third-party packages, install scripts, and more all cause mutable installations.
So how can we eliminate mutability in our projects? It starts with removing or avoiding mutable package references. Within a package manifest you can find references to distribution tags, remote tarballs, and remote git repositories. All of these are mutable references to packages, which will cause issues if you're trying to achieve reproducible installations.
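Install scripts are easy to inventory. npm runs a dependency's preinstall, install, and postinstall lifecycle scripts when it is installed, so this sketch lists any of those found in a set of manifests (for example, the package.json files under node_modules). The Manifest shape is a minimal assumption; reading the files from disk is left out.

```typescript
// Minimal package.json shape; only the fields used here.
interface Manifest {
  name: string;
  version: string;
  scripts?: Record<string, string>;
}

// Lifecycle scripts npm runs when a package is installed as a dependency.
const INSTALL_HOOKS = ["preinstall", "install", "postinstall"] as const;

// Returns "name@version: hook -> command" for every install-time script found.
function installScripts(manifests: Manifest[]): string[] {
  const found: string[] = [];
  for (const m of manifests) {
    for (const hook of INSTALL_HOOKS) {
      const cmd = m.scripts?.[hook];
      if (cmd) found.push(`${m.name}@${m.version}: ${hook} -> ${cmd}`);
    }
  }
  return found;
}
```

If the list is empty or fully reviewed, installing with npm install --ignore-scripts prevents new install-time code from running unnoticed.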
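To find these references, a rough classifier over the dependency specifiers in package.json is enough for a first pass. This is a simplified heuristic written for illustration; npm's own parser is the npm-package-arg package, which handles many more cases (aliases, hosted git URLs over https, local paths, and so on).

```typescript
type SpecKind = "git" | "remote-tarball" | "dist-tag" | "other";

// Rough classification of a dependency specifier (the value side of package.json deps).
function classifySpec(spec: string): SpecKind {
  const s = spec.trim();
  if (/^(git(\+[a-z]+)?:|github:|gitlab:|bitbucket:|gist:)/.test(s)) return "git";
  if (/^[\w.-]+\/[\w.-]+(#.*)?$/.test(s)) return "git"; // GitHub shorthand "user/repo#ref"
  if (/^https?:\/\//.test(s)) return "remote-tarball";
  // A bare word like "latest" or "next" is a dist-tag; "v1.2.3" is a version
  // and "x"/"X" are wildcard ranges, so exclude those.
  if (/^[A-Za-z][\w.-]*$/.test(s) && !/^v\d/.test(s) && s.toLowerCase() !== "x") {
    return "dist-tag";
  }
  return "other"; // semver versions and ranges, file:, npm: aliases, ...
}

// Mutable kinds are the ones that can resolve to different bytes over time.
const isMutable = (spec: string): boolean => classifySpec(spec) !== "other";
```

Treating anything unrecognized as "other" keeps false positives low; a stricter policy might flag everything that is not an exact version instead.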
A big concern here is that the npm registry hosts both mutable and immutable package metadata, and that metadata is not validated against the tarball. This is a huge concern and something you should keep in mind. As mentioned before, lock files help lock in the integrity values and the tree shape of your projects. Using them alongside commands like npm ci enforces the reification of the same installed tree time and time again.
Another tool that's often overlooked, but is very useful when we're talking about the mutable and immutable states of projects, is the --before flag. Passing a date to npm install with the --before flag locks you in to registry dependencies as they existed at a specific point in time. This only works for registry dependencies; it does not work with third-party git dependencies or remote tarball references, because those are mutable.
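Before relying on npm ci for reproducibility, it can help to list lockfile entries that have no integrity hash at all, since there is nothing content-based to check them against. This sketch assumes an npm v2/v3 lockfile; registry packages normally carry both "resolved" and "integrity", while git or other non-registry sources often lack "integrity", and workspace links are marked with "link": true.

```typescript
// One entry of the `packages` map in an npm v2/v3 lockfile (fields used here).
interface LockEntry {
  version?: string;
  resolved?: string;
  integrity?: string;
  link?: boolean;
}

// Lists installed entries that carry no `integrity` hash, with their source.
function entriesWithoutIntegrity(packages: Record<string, LockEntry>): string[] {
  return Object.entries(packages)
    .filter(([path, e]) => path.includes("node_modules/") && !e.link && !e.integrity)
    .map(([path, e]) => `${path} (${e.resolved ?? "no resolved URL"})`);
}
```

Each path this reports is a place where the installed bytes are not pinned by a hash in the lockfile, which usually lines up with the mutable references discussed above.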
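The idea behind npm install --before=<date> can be sketched against a registry packument, whose "time" field maps each published version (plus the special "created" and "modified" keys) to an ISO-8601 timestamp. This only shows which versions existed at the cutoff; npm's real resolution additionally applies the semver range from package.json and picks the best match among them.

```typescript
// Subset of a registry packument: version -> ISO publish time, plus
// the "created" and "modified" bookkeeping keys.
interface Packument {
  time: Record<string, string>;
}

// Versions that had been published at or before `cutoff`, oldest first.
function versionsBefore(packument: Packument, cutoff: Date): string[] {
  return Object.entries(packument.time)
    .filter(([key]) => key !== "created" && key !== "modified")
    .filter(([, iso]) => new Date(iso).getTime() <= cutoff.getTime())
    .sort(([, a], [, b]) => new Date(a).getTime() - new Date(b).getTime())
    .map(([version]) => version);
}
```

Because git and tarball URLs have no publish timeline on the registry, there is nothing equivalent to filter for them, which is why --before cannot pin those references.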
Let's take a look at the current state of the solutions and tooling in the ecosystem. There are a lot of advisory tools, such as Dependabot, Renovate, and various other CI integrations, that provide audits or insights about your projects and packages. Be mindful that this information is only as accurate as the package manager, or the "brain", in which the analysis is being done. And as I showed you before, that can be wildly different depending on the tools you use.
7. NPM Audit Signatures and Package Integrity
The npm CLI has a command called npm audit signatures that lets you check and verify the integrity of installed packages. Artifact signatures can be misleading, so it's important to focus on the contents of the package. Don't rely solely on standards, compliance, or certifications, as they may not reflect real-world usage. Be cautious of panaceas that claim to solve all problems.
The npm CLI has a command called npm audit signatures. Today, you can use it to check that the signatures for the packages you've installed are valid and that their integrity hasn't been compromised in some way. Artifact signatures are also a bit of a red herring, though: you can sign anything you want, but ultimately the contents of the package are what matter most. Be mindful of standards and compliance, too. It's important to look at what the industry considers best practice, but don't always take it verbatim. Standards and badging can also be somewhat of a red herring; don't take any kind of certificate or badge at face value. Often the standards being used are decoupled from the real-world usage of dependencies and can give a false sense of security. And always be wary of panaceas: if you can, avoid anything that claims to be one solution to fix all problems.
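The kind of check npm audit signatures performs can be sketched with node:crypto. Our understanding of npm's documented scheme is that each version manifest lists "dist.signatures" entries (a "keyid" and a base64 "sig"), that the signed message is "<name>@<version>:<integrity>", and that the registry's ECDSA P-256 public keys are published as base64 SPKI at https://registry.npmjs.org/-/npm/v1/keys; treat those details as assumptions to confirm against the npm documentation. As the talk points out, a valid signature only shows the registry signed that hash, not that the package contents are safe.

```typescript
import { createPublicKey, verify, KeyObject } from "node:crypto";

// One entry of `dist.signatures` in a registry version manifest (assumed shape).
interface RegistrySignature {
  keyid: string;
  sig: string; // base64, DER-encoded ECDSA signature
}

// Loads a registry public key published as base64-encoded SPKI DER.
function loadRegistryKey(base64Spki: string): KeyObject {
  return createPublicKey({ key: Buffer.from(base64Spki, "base64"), format: "der", type: "spki" });
}

// Verifies an ECDSA / SHA-256 signature over "<name>@<version>:<integrity>".
function verifyRegistrySignature(
  name: string,
  version: string,
  integrity: string,
  signature: RegistrySignature,
  publicKey: KeyObject,
): boolean {
  const message = Buffer.from(`${name}@${version}:${integrity}`, "utf8");
  return verify("sha256", message, publicKey, Buffer.from(signature.sig, "base64"));
}
```

In a real check, the key would be chosen by matching signature.keyid against the keyid of the published keys.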
8. Future State and Dependency Queries
In the future, reproducible installations within package ecosystems are being developed. Package distributions allow you to define multiple variants of the same package for different platforms, eliminating the need for post-install scripts. Dota introduces new policies and permissions, while npm has made audit improvements and launched the Dependency Selector Syntax. This language enables powerful and expressive queries for dependencies, with metadata and attribute filtering. Arborist's querySelectorAll method exposes this syntax in a way front-end developers will find familiar. Selectors like :semver and :outdated exist, and there are open RFCs for audit queries and policies. Key takeaways: security is a team sport, npm is recommended for package management, curiosity keeps you safe, and sharing discoveries benefits everyone.
So let's take a look at what the future holds. There is hope. People are working on reproducible installations within package ecosystems. One solution I've been working on is something called package distributions; I wrote the spec while at npm. The idea is that you can define multiple variants of the same package, each specific to a different platform, and swap one package for another based on the conditions the system meets at install time. This eliminates the need for post-install scripts and provides first-class support for package distributions.
Dota has also introduced new policies and permissions, which help you lock down at runtime what programs do and don't have access to. Within npm specifically, there have also been a ton of audit improvements. In npm v8.16.0, we launched something called the Dependency Selector Syntax. This language borrows heavily from CSS. It consolidates a number of similar, redundant filtering mechanisms we've had in the past into a language that easily answers multifaceted questions about your dependencies and their relationships.
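To illustrate the idea of swapping in a platform-specific variant instead of running a post-install script, here is a sketch of the selection step. The Variant shape, its "when" conditions, and the package names are hypothetical and are not taken from the package distributions spec; they only show the general mechanism of matching install-time conditions.

```typescript
// Hypothetical variant declaration: a replacement package plus the platform
// conditions under which it should be installed instead of the base package.
interface Variant {
  package: string;
  when: { os?: string[]; cpu?: string[] };
}

interface Platform {
  os: string;
  cpu: string;
}

// Picks the first variant whose conditions all match the installing machine,
// falling back to the base package when none match.
function selectDistribution(base: string, variants: Variant[], platform: Platform): string {
  const match = variants.find(
    (v) =>
      (v.when.os === undefined || v.when.os.includes(platform.os)) &&
      (v.when.cpu === undefined || v.when.cpu.includes(platform.cpu)),
  );
  return match ? match.package : base;
}
```

On a real machine the platform would come from Node itself, for example { os: process.platform, cpu: process.arch }. Because the choice happens during resolution, the installed tree is fully described up front and no arbitrary script has to run afterwards.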
Here are some examples of just how powerful and expressive this language is. You can write various CSS-like queries for dependencies using the npm query command. Specifically, you can dive into your packages' metadata and ask for things like packages with specific licenses that you do or don't want, or specific versions that may or may not comply with your heuristics. Attributes are also available through the selector syntax, as is metadata about the type of dependency. In the last example, we look for any package or dependency that has install scripts, essentially a query for potential mutable references within our dependency graph.
Just like in the browser, we've made this syntax available through a method that hangs off an Arborist dependency tree. The method is called querySelectorAll, which should be familiar to front-end developers. If you'd like to try out the selector syntax, run npm query and provide a selector. In this example, I'm looking for any dependency that has a version less than 1.0.0, and then I pipe the output to jq and do some fun stuff, mapping over each package's name and version.
We hope this tooling will help you find bugs more easily, isolate them within the graph, and understand exactly what is being installed. Notable selectors that exist today include :semver and :outdated. Not yet implemented, but hopefully coming soon, are a vulnerability status selector as well as CVE and CWE pseudo-selectors. There's currently an open RFC for adding audit queries to the npm audit command, which would let you filter down to what you care about and see whether those dependencies are vulnerable. There's also an open RFC called audit policies. This feature would let you define policies with ESLint-like syntax, choosing a Dependency Selector Syntax (DSS) selector along with a type for each selector, improving today's audit capabilities and generalizing heuristic enforcement.
So what are the key takeaways from this talk? Security is a team sport. If you need a package manager, you should probably be using npm. Staying curious will keep you safe, and sharing the discoveries you make helps us all.
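npm query prints its matches as a JSON array whose elements include each package's manifest fields (such as name, version, and license) along with its install location. A selector like ":not([license=MIT])" answers a single-license question directly; for a multi-license allowlist it can be simpler to post-process the output of npm query "*". The QueryResult shape below covers only the fields this sketch reads, and SPDX expressions are compared as plain strings.

```typescript
// Minimal shape of one element of `npm query` JSON output (fields used here).
interface QueryResult {
  name: string;
  version: string;
  location: string;
  license?: string;
}

// Flags packages whose `license` is missing or not in the allowlist.
function licenseViolations(results: QueryResult[], allowed: Set<string>): string[] {
  return results
    .filter((r) => r.license === undefined || !allowed.has(r.license))
    .map((r) => `${r.name}@${r.version} (${r.license ?? "no license"}) at ${r.location}`);
}
```

Compound expressions like "(MIT OR Apache-2.0)" would need a real SPDX parser to evaluate correctly; here they simply show up as violations for manual review.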
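The jq step from the example can also be done in TypeScript. This sketch assumes you have saved the JSON array printed by npm query "*" and filters it to versions that sort below 1.0.0 under semver (any 0.x.y release, or a 1.0.0 prerelease), then formats each match as "name@version". The version check is a simplified stand-in for a real semver library.

```typescript
// The two fields of an `npm query` result that this sketch uses.
interface QueryNode {
  name: string;
  version: string;
}

// True when `version` sorts below 1.0.0: any 0.x.y release, or a 1.0.0
// prerelease such as "1.0.0-beta.1". Build metadata ("+...") is ignored.
function isBelowOne(version: string): boolean {
  const m = /^(\d+)\.(\d+)\.(\d+)(-[^+]*)?/.exec(version);
  if (!m) return false;
  const major = Number(m[1]);
  const minor = Number(m[2]);
  const patch = Number(m[3]);
  const prerelease = m[4];
  return major === 0 || (major === 1 && minor === 0 && patch === 0 && prerelease !== undefined);
}

// Equivalent of mapping the query output through jq to "name@version" strings.
function belowOne(nodes: QueryNode[]): string[] {
  return nodes.filter((n) => isBelowOne(n.version)).map((n) => `${n.name}@${n.version}`);
}
```

Pre-1.0 packages are not insecure by definition, but semver gives them no stability guarantees, which makes them a reasonable first list to review.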