Video Summary and Transcription
The talk explores the dark side of open source, focusing on supply chain attacks and the need for improved security measures. It highlights the dangers of loading external code and the importance of mitigating supply chain risks. The talk also discusses the use of AI and LLMs in code analysis to enhance security. It emphasizes the challenges of sustaining IC maintained open source projects and the future of supply chain security. Lastly, it touches on the variations in open source definitions and the empowerment of the open source community.
1. The Dark Side of Open Source
Welcome to the talk on the dark side of open source. We'll explore malicious code and threats in the NPM and JavaScript ecosystems. I have experience in open source and cybersecurity, and now work on open source security at Socket. Socket helps developers and security teams find, audit, and manage open source software. Today, over 90% of applications rely on open source dependencies, making supply chain security crucial. The open source ecosystem is under attack, with software supply chain attacks affecting companies of all types.
Hey, everybody. It's Feross, and welcome to this talk on the dark side of open source. I'm really excited to share with you some of the lesser-explored parts of open source. We're going to dig into some examples of malicious code and give you a sense for some of the threats out there in the NPM and JavaScript ecosystems. So let's get started.
So first off, a little bit about me. I started out in open source. I worked on some pretty popular packages, including WebTorrent and StandardJS, and I'm also a former member of the Node.js Foundation board. And so I really got to see a massive increase in the usage of open source within companies and in the community. Then I moved into more of a security focus. I taught the web security course at Stanford, and now I'm working on open source security at Socket.
So real quick, just a couple words on Socket. So Socket's a tool that helps protect your code from everyone else's, and we help developers and security teams to ship faster and spend less time on security busy work by helping them safely find, audit, and manage open source software. We have a ton of companies using us. A lot of these are actually open source projects, and we're protecting over a quarter million repositories today. So I'm really happy that we've been able to help protect the community to this degree.
Okay, so let's talk a little bit about our applications. So in the last five years, the way that we write software has really changed. It's undergone a really massive shift. Today it's really common to see applications where over 90% of the code comes from open source dependencies. So that means code that you and your teammates didn't write. The average open source dependency actually has 79 transitive dependencies. So in this world where your application is built on, you know, thousands of dependencies, software security is not just about your code. It's about every piece of code that you depend on. And so in this talk we're going to be talking about open source dependencies because we're JavaScript developers, but, you know, this is actually a broader issue. There's this term, software supply chain, and that really includes all the third-party code that you rely on, whether it's APIs, cloud services, or even dependencies like your operating system. Really all the parts and pieces that make up our software are what we talk about when we talk about supply chain security. And unfortunately, you know, the open source ecosystem is under attack. We've seen software supply chain attacks surge in the past couple of years. There are headlines pretty regularly about different breaches and attacks, and these attacks impact all types of companies. You know, really anyone who depends on open source, and I know you do depend on open source, will at some point be affected by one of these attacks just because of the scale of NPM.
2. Supply Chain Attacks and the Problem of Trust
The SolarWinds hack was a sophisticated supply chain attack that compromised SolarWinds and impacted thousands of networks. NPM packages, often maintained by individuals or small teams, can also be vulnerable to supply chain attacks. I'll present a real example of a recent attack that exfiltrated environment variables. As developers, we rely on trust in open source, but it takes too long to detect malicious packages and they're often not catalogued for future reference.
And so how many of you have heard of the SolarWinds hack? This was pretty big news a few years ago. It was a sophisticated supply chain attack that compromised a supplier called SolarWinds, and the way that it worked was that an attacker added malicious code into one of SolarWinds' software products. They did this by basically getting into the network of SolarWinds and adding their attack code into the SolarWinds product. And then downstream of that, they were able to get into thousands of networks of SolarWinds customers, including U.S. government agencies and large corporations. And while it's pretty hard to figure out the exact monetary damages of this attack, the costs associated with the investigation, the remediation, and the increased cybersecurity measures as a result of this were probably in the billions of dollars.
That's SolarWinds. That's a company that has a security team and a lot of effort to defend their products. Now let's talk about NPM packages, right, which are often maintained by individuals or small teams of volunteers. So sometimes software supply chain security can be kind of abstract, right? It's kind of like, what are we talking about here? So I wanted to make this really concrete for everyone and just really show you, what does a supply chain attack look like? So here's a real example. This is an attack that we detected a few days ago. And we're going to discuss, like, what's going on here. So let me help you out a bit. I'll highlight a few parts of the code here. Does that help? So now if you look at this, you can see, were a developer to install this package, this malicious code would immediately run in an install script, and it would exfiltrate or steal their environment variables, which can include, obviously, secrets, tokens, keys, and then it would send them to an attacker-controlled server. So you can see those three parts there. It's requiring the network package, it's accessing the environment variables, and then it has this sort of obfuscated or hidden network request down there on that third line. And this is really how a software supply chain attack can lead to a breach at a company.
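The malicious code described above runs from an install script, so one cheap first check is to see whether a package declares any install-time hooks at all. The following is a minimal sketch, not the tooling used in the talk: `findInstallScripts` and `exampleManifest` are hypothetical names, and the manifest is made up for illustration.

```javascript
// Lifecycle hooks that npm runs automatically during `npm install`.
const INSTALL_HOOKS = ['preinstall', 'install', 'postinstall'];

// Returns the install-time scripts declared in a parsed package.json object.
function findInstallScripts(manifest) {
  const scripts = manifest.scripts || {};
  return INSTALL_HOOKS
    .filter((hook) => typeof scripts[hook] === 'string')
    .map((hook) => ({ hook, command: scripts[hook] }));
}

// Hypothetical manifest for illustration; not the package from the talk.
const exampleManifest = {
  name: 'example-package',
  version: '1.0.0',
  scripts: { test: 'node test.js', postinstall: 'node setup.js' },
};

console.log(findInstallScripts(exampleManifest));
```

An install hook is not malicious by itself, since many legitimate packages use one to build native code. It only tells you which file deserves a read before you install.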
Now, this package was targeting probably Airbnb, given the name of the package, but honestly we don't really have all the details on what the goal of this package was, but the name is very suspicious, I'll just say that. And so fundamentally the problem here is that, you know, as developers we're using so many packages, but we just don't have the time to read every line of code in our dependencies. And so we're always trusting other people fundamentally, and open source is built on trust. And for the most part this trust is well placed. Most people are good. But there are a few bad apples. And unfortunately it does take us as a community a little bit too long to find these types of bad packages today. Right now we're looking at over 200 days to detect a malicious package as a community. And so, you know, this is pretty bad. This is from a research paper published in 2021. And the other big problem is when we find these malicious packages as a community, we report them and they get taken down, but they're often not catalogued and saved in any way. So they don't go into the typical vulnerability tracking systems like the National Vulnerability Database. They just get taken down and then no one knows whether or not they may have installed that package in the past.
3. Detecting and Preventing Supply Chain Attacks
Socket is actively detecting and preventing supply chain attacks in various open-source ecosystems, including NPM, Go, Python, and Java. Let's discuss a recent supply chain attack involving Ledger, a company that produces hardware security wallets for cryptocurrency. The attack compromised a JavaScript library published on NPM, highlighting the need for improved security measures. The library allowed websites to interact with the hardware device, and it required the installation of a connect kit loader package. The README mentioned loading the connect kit library from a CDN, raising concerns about the code's dependencies.
So we're trying to change this. At Socket we are detecting and preventing over 100 of these types of attacks every single week. And we're scanning all the most popular open source ecosystems, including NPM obviously, but also Go, Python, and Java. And then when we find a malicious package, we report it to the NPM registry and get it taken down so that we can protect the whole community. And obviously we're doing this to keep everyone safe and because it's the right thing to do. But also Socket is a commercial product as well where we sell some services.
Now I wanted to just maybe spend some time going over a recent supply chain attack that I think was particularly interesting. So this one comes from December 2023. So this was only a few months ago. And you might have even heard about this or seen the headlines. But let's get into it. So this is a real world example. And in this example, so there's a company called Ledger which makes a hardware security wallet that helps people store their cryptocurrency. And they have a JavaScript library that they published to NPM that was compromised back in December.
Let's talk a little bit about how this worked. I think it has a few lessons in here for us as JavaScript developers so that we can improve the security of our applications. So let's just go through it. So the first thing you'll notice about using this package is if you read the README, developers who are using this library, I should probably mention really quickly what the library does actually. So the library was used, if you're building a website and you want to talk to this hardware device, that you would install this NPM package to be able to interface with the hardware device in a more seamless way. So there are a lot of websites out there that are trying to interact with this hardware device that were using the NPM package to do so. And so the way you'd use the package is, the README had this line in it where it says to add this connect kit loader package as a dependency and then use it as below. So it looks pretty straightforward, right? You just import the package and then you call this load connect kit function. And then if you notice, there's another part of the README that says that this will allow your decentralized app, which is basically your website, to load the connect kit library at runtime from a CDN so that they can improve the logic and improve the product without waiting for people to release new builds. So right away that sets off my alarm bells because we see here that the code that we're depending on is actually itself depending on more code, but this additional code is on a CDN.
4. The Dangers of Loading External Code
If you're building a website and want to interface with a hardware device, you would install the NPM package. The package requires adding a connect kit loader package as a dependency and using it to import and call the load connect kit function. However, the connect kit library is loaded at runtime from a CDN, allowing changes to the code. This bypasses the lock file and can lead to malicious code being served to websites using the package. Authors should avoid hot linking to CDNs in NPM packages and using HTTP or Git dependencies that bypass the lock file.
So right away that sets off my alarm bells because we see here that the code that we're depending on is actually itself depending on more code, but this additional code is on a CDN. So what does that mean? That means that that's code that can change. It's loading that code from a remote server. So let's look at what this actually looks like. So if you actually go into that function, you'll see that this function doesn't contain the logic of the actual functionality that I'm trying to use here. What it's actually doing is going out to this HTTP URL with a script tag and then running that code. And that code is whatever code happens to be served by that server at that point in time.
And so what this really means is this is a package that effectively bypasses our lock file. Like our lock file in NPM is designed to lock down the code that our app is depending on so that it doesn't change from build to build, right? So that it's deterministic. But when you have a package that's just going to go out and load whatever happens to be on this HTTP URL, you're going to have your app change out from underneath you. And that's exactly what happened here. So the attacker was able to get their code into ConnectKit, which this CDN here started serving. And then all the websites out there, which were actually, I believe, in the thousands, started instantly serving the malicious code without an update, without a new build, without a new NPM install. It just instantly updated all the sites. That's how script tags work. And that's, by the way, why script tags are so dangerous. You're just giving kind of remote code execution capabilities to whatever site you use there.
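To make the lock file point concrete: npm's `package-lock.json` records an `integrity` string (an algorithm name, a dash, and a base64 digest) for each installed package, so changed contents can be detected. Code pulled in at runtime through a script tag never goes through that check. The sketch below only illustrates the idea; `matchesIntegrity` is a hypothetical helper, not npm's implementation, and it assumes a single-hash integrity string.

```javascript
const nodeCrypto = require('node:crypto');

// Compares file contents against an integrity string such as "sha512-<base64>".
function matchesIntegrity(contents, integrity) {
  const dash = integrity.indexOf('-');
  const algorithm = integrity.slice(0, dash);
  const expected = integrity.slice(dash + 1);
  const actual = nodeCrypto.createHash(algorithm).update(contents).digest('base64');
  return actual === expected;
}

// Illustration: pin the contents we reviewed, then compare later downloads.
const reviewed = Buffer.from('module.exports = () => 42;');
const pinned =
  'sha512-' + nodeCrypto.createHash('sha512').update(reviewed).digest('base64');

console.log(matchesIntegrity(reviewed, pinned));
console.log(matchesIntegrity(Buffer.from('module.exports = () => steal();'), pinned));
```

The first check passes and the second fails, because the contents changed after they were pinned. A remotely loaded script has no equivalent pin unless the page author adds one.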
So what did we learn from this? I would say there's a bunch of things we learned. Let's go through a few lessons. So for authors, I mean, the main lesson is don't hot link to CDNs from within your NPM packages because you're going to be bypassing your user's lock files, which is not cool. Also, just related to that, sometimes you'll see packages that are using HTTP dependencies or Git dependencies or even, like, GitHub URL as a temporary workaround to fix a bug. You'll see, like, a fork. People will depend on a forked version. This is really dangerous because you're bypassing your lock file and you're giving the author of that package the ability to change the code out from underneath you.
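The dependency forms called out above (HTTP URLs, Git URLs, GitHub references) can be spotted mechanically in a `package.json`. This is a rough sketch under stated assumptions: `isMutableReference` and `findMutableDependencies` are hypothetical names, and the patterns cover only the common spellings, not every form npm accepts.

```javascript
// True when a dependency spec points at a URL or Git host instead of a registry version.
function isMutableReference(spec) {
  const urlLike = /^(https?:|git\+|git:|github:|gitlab:|bitbucket:)/;
  const githubShorthand = /^[\w.-]+\/[\w.-]+(#.*)?$/; // e.g. "someone/repo#branch"
  return urlLike.test(spec) || githubShorthand.test(spec);
}

// Lists dependency names in a parsed package.json that use such references.
function findMutableDependencies(manifest) {
  const all = { ...manifest.dependencies, ...manifest.devDependencies };
  return Object.entries(all)
    .filter(([, spec]) => isMutableReference(spec))
    .map(([name]) => name);
}

// Hypothetical manifest for illustration.
console.log(
  findMutableDependencies({
    dependencies: {
      'regular-lib': '^1.0.0',
      'forked-lib': 'github:someone/forked-lib#bugfix',
      'tarball-lib': 'https://example.com/tarball-lib.tgz',
    },
  })
);
```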
5. Mitigating Supply Chain Risks
Bypassing the lock file and not revoking NPM access of employees are serious security risks. Auditing NPM access regularly and using GitHub Action workflows instead of giving developers NPM access can mitigate these risks. When using packages, avoid those that remotely load code and bypass the lock file. Spend time understanding the code in your dependencies and consider using security tools that detect supply chain risks and code quality issues. Additionally, use fewer dependencies, pin them, and avoid mutable references. Vulnerability scanning tools may not protect from all types of attacks, causing alert fatigue. The NPM audit tool has been criticized for not detecting certain vulnerabilities and has several issues.
The other thing was, obviously, they didn't revoke the NPM access of the employee here, which is a serious problem. So I would kind of advise everyone to audit their NPM access pretty regularly and have a checklist for when you're offboarding an employee so that you can make sure to not forget a step like removing their NPM access.
And finally, I would consider using a GitHub Action workflow so that you don't have to actually give developers access to NPM. Now, on the user side, you know, the most important lesson here is don't use packages that remotely load code and bypass your lock file. The only way you're going to find out that a package is doing something like this is if you actually look at the code of it. Obviously, that's a lot of work. But as much as you can, spend time looking at the code in your dependencies. Understand what your dependencies are doing. This is not just for security. This will make you a better programmer. You'll learn about, you know, different coding styles and techniques. And it's actually just a good idea in general to learn. But you'll also discover when you're using a low-quality dependency. And then the other option is you can just use a security tool that detects these types of risks. So not one that detects vulnerabilities, but one that actually detects supply chain risks and code quality issues like this one. And so, general advice: also use fewer dependencies, pin your dependencies, and avoid mutable references to dependencies that can change out from underneath you.
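The advice to pin dependencies can be checked with a few lines of code. This is a minimal sketch: `findUnpinned` is a hypothetical helper, and it treats only an exact `major.minor.patch` version (with optional prerelease or build suffix) as pinned, which is stricter than some teams may want.

```javascript
// Matches exact versions like "1.2.3" or "1.0.0-beta.1", but not ranges or tags.
const EXACT_VERSION = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/;

// Returns the dependencies whose version spec is a range, a tag, or anything else not exact.
function findUnpinned(dependencies) {
  return Object.entries(dependencies)
    .filter(([, spec]) => !EXACT_VERSION.test(spec))
    .map(([name, spec]) => ({ name, spec }));
}

// Hypothetical dependency map for illustration.
console.log(
  findUnpinned({
    'pinned-lib': '1.2.3',
    'caret-lib': '^1.2.3',
    'tag-lib': 'latest',
  })
);
```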
So there were, you know, some folks using a vulnerability scanning tool such as Snyk that weren't protected from this attack. Well over 12 hours after this package was compromised, these tools were still reporting to all their customers that the package had no known security issues. So if you rely on a vulnerability tool, you have to be aware that they're also not going to protect you from this type of attack. And unfortunately, to me at least, I find this so frustrating because, on the one hand, our tools are literally not detecting attacks like this that we actually care about, these attacks that are affecting the ecosystem. And yet, despite that, they also are drowning us in alerts. I mean, they're flooding us with meaningless alerts all the time. You know, 60% of people have said that the number of alerts, alert fatigue is what it's called, has created friction between their developer teams and their security teams. And I'm sure we've all seen the NPM audit output where you install a package and you immediately see that it has, you know, a hundred-plus vulnerabilities, and then you just shrug and move on with your day. So I just personally find it really frustrating that our tools are constantly yelling at us about security vulnerabilities, and yet they can't even detect the type of vulnerability we just discovered. And, you know, Dan Abramov wrote this famous viral post a few years ago about how NPM audit was broken by design. I mean, he had some pretty harsh words about the problems with NPM audit. He called it a stain on the entire NPM ecosystem and said it was completely broken. And, you know, while I might have chosen different words myself, I do think there are several problems with NPM audit.
6. Enhancing Security with LLMs
Security vulnerability reporting tools often send too many alerts about insignificant issues while failing to alert about dangerous packages and other security risks. Socket uses AI to identify threats by analyzing source code and providing plain English explanations of the risks. By utilizing LLMs, which are not easily fooled and can analyze more source code, we can greatly enhance security. Examples of obfuscated code and clever evasion techniques demonstrate the value of LLMs in detecting suspicious behavior and significant security risks.
I mean, it's both sending too many alerts and it's sending not enough alerts. And I don't mean to pick on NPM audit here. This is true of all of our security vulnerability reporting tools. So on the one hand, it sends too many alerts about stuff we don't care about, but it also doesn't send alerts about dangerous packages, malicious dependencies, typosquatting attacks, and so on and so forth. So that's what we're trying to change with Socket, and we're trying to give this protection to the whole ecosystem.
And one of the things we found that really, really helps, and I think this is a really, really cool use case for AI, by the way, is that we can identify threats by passing source code into an LLM, and then that LLM can output a plain English explanation of the risks of the code. So, for example, that Ledger attack we just discussed would get this output, you know, describing the fact that there's obfuscated code and there's likely malicious behavior and so on and so forth.
So in the last couple minutes here, I wanted to just show you a couple more attacks to give you a taste for just some other kind of fun security risks of different NPM packages that we've discovered. So let's look at this one here. So this is some code published to an NPM package. It's been obfuscated, so designed to hide the behavior of the code. You can see here, it's clearly – you know, there's a bunch of functions that have been renamed to be hard to understand. But let's highlight a few of the kind of interesting parts. So you can tell here it's downloading some type of an executable file, which is kind of sketchy. And then here, if you scroll down a bit, you'll see that it's loading something from the Discord CDN. It's using child process, probably, to execute that executable file. And you can see that there's also references to other executables and some HTTPS stuff. And so it turns out if you take this code and you put it into an LLM, and Socket does this, you'll get this amazing explanation that just tells you what the problems are with the code. So it'll tell you that it's highly suspicious and it has arbitrary code execution, downloading code from untrusted sources, and it's a significant security risk.
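For contrast with the LLM explanation, here is what a much simpler pattern-based check for the same three signals might look like. It is a sketch only and does not represent how Socket or an LLM analyzes code: `SIGNALS` and `describeSignals` are hypothetical names, the sample string is invented, and renamed or obfuscated identifiers would slip past patterns like these.

```javascript
// Hand-written signals loosely matching what is highlighted in the example above.
const SIGNALS = [
  { label: 'uses child_process', pattern: /child_process/ },
  { label: 'references an executable file', pattern: /\.exe\b/i },
  { label: 'fetches from the Discord CDN', pattern: /cdn\.discordapp\.com/i },
];

// Returns the label of every signal found in a source string.
function describeSignals(source) {
  return SIGNALS.filter((signal) => signal.pattern.test(source)).map(
    (signal) => signal.label
  );
}

// Invented, inert sample text for illustration.
const sampleSource = [
  "const cp = require('child_process');",
  "const url = 'https://cdn.discordapp.com/attachments/0/0/tool.exe';",
].join('\n');

console.log(describeSignals(sampleSource));
```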
I just think this is pretty cool. If you haven't had a chance to play with LLMs, I actually do think they're going to make a really big difference in the way that we do security because they aren't fooled as easily as a human, and then they also can be run on more source code. They're tireless, unlike a human who'll get tired of reviewing code at some point. And then I wanted to show one other example of even more interesting obfuscation than the last example. So this is a package that we found where it's collecting a bunch of environment variables that it's attempting to steal, but I just thought it was interesting to see down here how they were trying to evade detection from certain tools, and you can see here that they're calling this type function, and then that's pulling some method off of the HTTP object there that it then executes. But let's take a deeper look at the type function and see what exactly is it doing. And I just think this is so cool, what this attacker did here. So let me draw your attention to down here where they're calling the prop getter function and passing in a series of 0, 1, and 2. Let's just focus on 0. So it calls the prop value function, which then calls this kind of filter operation, and what it's doing is it's actually treating the prop getter function as a string and then scanning the body of the string of this function to pull out the lines that start with slash slash, which are the comments. And then what it gets is it gets an array with three words in it, west, question, and Ireland.
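The trick of reading a function's own comments can be reproduced in a few lines. This is a hypothetical reconstruction from the description above, not the attacker's code: the function names are made up, and it relies on `Function.prototype.toString()` returning the function's source text, comments included, as current JavaScript engines do.

```javascript
// Hypothetical stand-in for the attacker's function: the data lives in its comments.
function propGetter() {
  // west
  // question
  // Ireland
  return undefined;
}

// Reads a function's own source text and returns the text of each comment line.
function commentWords(fn) {
  return fn
    .toString()
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.startsWith('//'))
    .map((line) => line.slice(2).trim());
}

console.log(commentWords(propGetter));
```

Because the words never appear as string literals, a scanner that only inspects strings and identifiers has nothing to match on.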
7. Obfuscating an HTTP Request
The code slices different strings, reverses them, and spells out the word 'request.' This obfuscation was used to hide an HTTP request.
So then once it has that, then it uses these indexes down here to slice out pieces of these strings, and those are byte offsets into the strings. So let me just show you what I mean here. So it's actually pulling out between the range 2 and 4 on the first string, which is the letters ST. Then for the next string, it's taking 0 to 3, which is QUE. And then for the third string, Ireland, it's taking out R and E. So what could this be for? Why would it be slicing these different strings and pulling out these different letters? Well, I'll give you a hint. It then reverses the order of the pieces. And so now take a look at the order there. What does that spell? Well, you have R-E-Q-U-E-S-T. It spells the word request. So all this code was literally just designed to return the string request. And if we go down here back to this original code, you'll see that it's basically just putting the word request there into the file. And so they were just trying to hide the fact that they were making an HTTP request.
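The slicing walk-through can be replayed directly. This is a sketch of the arithmetic only, with hypothetical variable names. The ranges for the first two words come from the talk, while the range `[1, 3]` for "Ireland" is an assumption inferred from the letters R and E.

```javascript
// The three words recovered from the comments.
const words = ['west', 'question', 'Ireland'];

// Start and end offsets for each word; [1, 3] for "Ireland" is inferred, not quoted.
const ranges = [[2, 4], [0, 3], [1, 3]];

// Slice each word: "st", "que", "re".
const pieces = words.map((word, i) => word.slice(ranges[i][0], ranges[i][1]));

// Reverse the order of the pieces and join them.
const hidden = [...pieces].reverse().join('');

console.log(pieces); // [ 'st', 'que', 're' ]
console.log(hidden); // request
```

Note that it is the order of the pieces that gets reversed, not the characters. Reversing the characters of "stquere" would not spell anything.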
8. The Power of AI in Code Analysis
The LLM and AI were able to figure out the code, identifying its purpose as malicious data exfiltration. Open source security goes beyond just vulnerabilities, and it's important to consider maintenance risks, unmaintained packages, and low-quality packages when building secure applications. Stay secure and keep evaluating your dependencies.
And so let's see what the LLM, what the AI says when it looks at this code. It turns out it actually was able to figure it out. So all this work that I had to do as a human to kind of understand this code was figured out instantly with AI. So it said here that this code sends environment variables as Base64 encoded data, and it even figured out that that prop value function that we just looked at is obfuscated and that this whole thing is used for malicious data exfiltration. So I don't know. Very cool, in my opinion. I think that's just really cool.
And so, you know, the attackers are very clever, but so are the defenders. And we have powerful tools now at our disposal to try to protect our applications from these kinds of attacks. So, yeah, I'm going to just leave you with one final thought. You know, vulnerabilities have a – the vulnerability system has a problem. It's really noisy. It doesn't really detect the types of attacks we've been talking about in this talk. And so I would just kind of encourage the community, encourage us all to take a broader picture of what does it mean for – you know, what does open source security mean? And there's really so many different sources of risk and sources of problems and packages. It's not just security, even. It's maintenance risks. It's unmaintained packages. It's low-quality packages. There's just so many things to think about when we're trying to build secure applications. And so, you know, I think expanding your horizon and thinking about these things will make you – make your applications not only more secure, but just more robust and overall better for users.
So, with that, thanks for the time. And I hope that you stay secure out there. Thanks again.
Fantastic talk. I was waiting for that one. It was just as good as I was expecting. So, fantastic to have Feross with us today. And he's going to answer a few questions. But before we get to your questions, let's see what the poll has to say. So, he asked if you evaluate your dependencies before you install them, before you use them.
9. Developers and Dependency Evaluation
Developers often prioritize getting features out quickly over security. The prevalence of dependency checkers and SCA tools is amusing, but it's not surprising that 58% still choose to wing it and not evaluate their dependencies thoroughly.
And, honestly, when I voted, I did the first one because it was very funny to me. But I do actually use, like, Dependabot every now and again. But, yeah, I mean, it's funny because they are so prevalent, actually, like the SCA tools and, like, the dependency checkers. But does that surprise you, that number, 58%, still say, like, let's wing it? YOLO. You only live once. Look, I'm a developer. I know what it's like. You just want to get the feature out. You just want to use the code. Even if you care about security, it's just so enticing to just install and kind of move on. I'm not surprised by this number. But I am disappointed. Come on, I am disappointed in all of you. Do better. But, yeah, so, I've sinned with that as well. Sometimes I'm moving fast. I'm just like, oh, just check it out. Kind of hopefully there won't be too much fallout. But you can only do that, I feel like, locally. When you're in a company environment, it's less easy to do that.
Supply Chain Attacks and Vulnerability Tools
Traditional vulnerability tools are less effective in detecting supply chain attacks. The National Vulnerability Database, run by the U.S. federal government, only tracks vulnerabilities, not supply chain attacks. To prevent supply chain attacks, code analysis must be done proactively before use.
But let's ask some questions from the audience. And, folks, don't forget that if you want to ask Feross questions, this is the time. Go to slido.co and 0404 and drop your questions because you will only have this opportunity to tap into his wisdom.
So, someone asked, what are the differences with other competitors like Snyk or, I guess, like I said, Dependabot and other security tools of that kind? Yeah, yeah, so the market has a ton of these types of vulnerability tools out there. Basically, what they do is they look through your dependencies and they figure out whether you're using any packages that are known to be vulnerable. The problem with that approach is that they're less helpful at actually detecting a supply chain attack of the kind that we just went over in the talk. And the kind that we saw happen on Friday, by the way, which I know we're going to talk about. But if you haven't seen the news, there's a package called XZ Utils that was used by SSH. This is one of the scariest things I've ever seen.
But that type of attack where someone sneaks code into a package, that's not something that like the traditional kind of vulnerability scanner tools can find. And the reason is that those tools really just look up which packages you're using and compare them to data in a public database that is called the National Vulnerability Database. It's actually run by the U.S. federal government, believe it or not. And when they find a match, those tools will basically tell you, hey, you're using one of the packages on this list. And that's basically what they do. And unfortunately, it's both too reactive. You're kind of waiting around for that database to get updated. And the other big problem is that database isn't intended to track supply chain attacks and malicious packages. It's intended for vulnerabilities. And there's a distinction between those. Vulnerabilities are like accidental mistakes made by the maintainer of the package or by the contributor to the package. They're accidents that can lead to security issues if an attacker were to find and exploit the vulnerability.
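The lookup approach described above amounts to a join between your installed packages and a list of known-bad versions. The sketch below shows why a freshly published malicious package passes such a check. All names and data are hypothetical, and real scanners and advisory feeds are more involved than this.

```javascript
// Returns the installed packages that appear in the advisory list.
function findKnownVulnerable(installed, advisories) {
  return installed.filter((pkg) =>
    advisories.some(
      (advisory) =>
        advisory.name === pkg.name && advisory.versions.includes(pkg.version)
    )
  );
}

// Hypothetical data for illustration.
const advisories = [{ name: 'old-lib', versions: ['1.0.0', '1.0.1'] }];
const installed = [
  { name: 'old-lib', version: '1.0.0' },
  { name: 'brand-new-malicious-lib', version: '0.0.1' }, // not in any database yet
];

console.log(findKnownVulnerable(installed, advisories));
```

Only `old-lib` is reported. The newly published package is invisible to this kind of check until someone analyzes it and a database entry exists, which is the reactive gap described above.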
But it's not a guarantee that you're going to get exploited if you are using vulnerable code because someone has to find it. Someone has to attack it. Whereas a supply chain attack or malware, this is something that's intentionally added code that is going to just trigger right away and attack you. And so in order to stop that, you actually have to analyze the code before you use it. If you wait until you use it and then you do a scanning tool like 24 hours later, it's too late. You know, you've already been attacked. So that's kind of the key difference is we're kind of the first tool that's proactive. That's cool.
Sustaining IC Maintained Open Source Projects
Maintaining sustainability for IC maintained open source projects is a challenging problem. While some fortunate individuals have corporate backers, most open source projects are maintained by random individuals who may no longer have the resources or motivation to continue. The landscape of open source has shifted, with smaller maintainers responsible for maintaining numerous packages. This shift has been facilitated by improved tooling and platforms like GitHub, making it easier for new projects to emerge. However, this reliance on individual maintainers poses its own challenges and burdens.
And also the fact that, like you mentioned, those types of vulnerabilities are exactly that, zero day. They're not in that database yet. So you can't really know ahead of time that they're going to happen. But on that same note, what are your thoughts on the sustainability of IC-maintained open source projects, after the story of XZ Utils? How can we make it less thankless? How can we make sure that maintainers actually want to continue maintaining their projects, given burnout and everything else that's associated with open source maintenance that isn't, I guess, community backed or foundation backed?
Yeah, it's a great question. Look, I spent like five years of my life trying to figure out how to do open source funding and make it sustainable. It's a very hard problem. You know, there are those who are lucky enough to work on open source with a backer, like a big corporate entity that pays you to work on open source. That's the dream. That's a great gig. I feel kind of jealous of those people. But if you look at most open source, there's a lot of open source that we use that's written by just a random individual. It could have been that they worked on it while they were at a company. They left the company. They're still maintaining it because they feel an obligation to do that.
You know, it's actually really interesting. If you look 10 years ago, it used to be that you had these big open source projects with hundreds of contributors working on them. And that was the primary model. And now it's kind of inverted, where in NPM and newer ecosystems like Rust you have the opposite: a single person with hundreds of packages that they maintain. It's kind of a complete flip. Right. And so that's kind of cool. Part of the reason that happened is that the tooling, like GitHub and things like that, got so good that it was easy to spin up new open source projects. We're making it easier for people to come in and contribute and all this good stuff. But the flip side is that you're depending on all this code from all these individuals, and they're definitely under a burden. So, yeah, it's a problem.
Future of Socket and Supply Chain Security
Supply chain attacks on open source projects, like XZ Utils and Event Stream, highlight the need for better security measures. Integration with major platforms like GitHub and collaboration with package managers can enhance supply chain security. The team at Socket actively reports and helps take down malicious packages to protect users.
I mean, that's what happened with XZ. And this happens all the time in NPM, too. I mean, as early as 2017, there was an attack on a package called Event Stream where someone was maintaining it, and someone came along and in a very similar fashion said, hey, I'd love to help you out. Can you add me as a maintainer? And then they turned out to be, you know, a bad actor.
Yeah. Yeah. That's pretty scary. It's a scary thing to think about. What do you see as the future for Socket? Would it be integrated with the big clouds or the big version control systems, GitHub and others, to enable better supply chain security automatically? I mean, I would be happy to partner with any of the package managers that want to do this. What we do as a team today is, anytime we find a package that is malicious, we report it to the package registry so we can get it taken down.
Open Source Perspectives and the Path Ahead
The identification and takedown of malicious packages is an ongoing effort. The question of why this is not part of the registry is important, and Socket is willing to partner to address this issue. The future of open source is complex, with different interpretations and evolving definitions. Open source can mean code visibility, community collaboration, or specific licensing. It is crucial to understand the context when discussing open source, as it encompasses various perspectives and practices.
And we do that with everything we find. And so those hundred packages a week that we're identifying today, as much as possible we're getting those taken down by reporting them and working with the registries, working with GitHub, to get that code removed to protect people. But it's a totally good question. Like, why isn't this just part of the registry? It's a good question. And, you know, we'd be happy to partner to help with that.
And on the same note as the previous discussion, a little bit more on the future of open source. There are a lot of companies now that are changing their licenses and doing a lot of things that are a little bit scary in the open source space. What are your thoughts on the future of open source and where everything is going? Yeah, that's a really interesting question. I think open source means different things to different people. Right. I mean, there are the purists that think, you know, open source is this definition, right? OSI. Like, this is exactly what open source means. And we shouldn't use the word to mean anything else. Unfortunately, language doesn't really work that way.
People can use words and, you know, language evolves. And I don't believe OSI has a trademark on the term open source. And so what we often see is people use it in different ways. To a lot of people, open source just means I can see the code. It's on GitHub and I can read it. To other people, it means there's a community where I can open up a pull request and somebody will look at it and maybe they'll accept my contribution. And there's kind of a collaborative environment to work on this code together.
You know, and then in other cases, it means a very specific type of license that gives you certain rights. And so you've got to really dig in when people talk about open source. Like, what are they actually defining it as? Because you see all kinds of different things. You see companies like Apple that technically are open source because they put the code out there, but they don't host it on GitHub. They put it in a zip file on a random page on their Web site that nobody can even find. And like, technically, that's what it's like. They're sent to us, they compiled it. Yeah.
Exploring Open Source Variations and Forking
The definition and interpretation of open source can vary. Some companies change licenses, while still providing the core benefits that users care about. Understanding the context and meaning behind open source is crucial. Forking projects and changing licenses is a valid choice, but it can lead to feelings of betrayal in some cases.
Yeah. So is that open source? Like, I don't know. It doesn't feel like open source to me. But I guess it technically is by the legal definition. And then you have these other companies that are like, we're going to change the license. But for ninety-nine percent of people that use it, we're going to give them all the rights that they actually care about, which is to see the code, to collaborate on the code, to use the code. And yes, technically, by the legal definition it's not an OSI-approved license. But most people probably don't care. So I don't know. I don't have a strong opinion either way. I just think you always have to dig in and figure out what people actually mean by open source. And once you understand their meaning, then you'll understand their position on different things like that.
Yeah, absolutely. I guess, just because you're here and you're the expert, thoughts on forks as well? Like, there's Valkey recently, the Redis fork, and OpenTofu after Terraform. And I'm just wondering about your thoughts on the big mega forks of known projects that choose to go down the licensing change route. I think it's fine. I mean, those projects are doing what's within their rights to do. They're forking the code. You know, it's great. I think that's awesome. The company's doing what they want. The community is doing what they want. It's great that we have the freedom to do these things. And, you know, I don't know all the specifics of the Redis community or that type of thing. But I know that some folks in those situations can often feel like it was a bait and switch or a betrayal of some kind. And that's probably valid. You know, so, yeah.
Empowering the Open Source Community
The community's ability to exercise their open source powers and make decisions is crucial. Let the market determine the value of different versions, whether community or company-led. Gratitude for the speaker's vision and participation in the discussion.
Yeah. I don't know all the feelings and all the different sides involved there. But in general, I would say it's awesome that the community is taking matters into their own hands. That's awesome. Yeah, that's how open source is supposed to work. Yeah. Like, you know, if you don't like the direction it's going, well, you use your open source powers, the rights you have from the licenses, to go do that. And if the company doesn't want to do that for the future contributions, well, then we'll just let the market decide, you know, what do they want to use? Do they want to use the community version, or is the company providing enough value on top to justify the non-open-source license? Like, let's just see, you know.
Yeah. Yeah, absolutely. I guess last.
OK, that was a really fantastic discussion, as always, Feross. You know, a great person to tap into. And I think your vision in the open source world is fantastic. Thank you so much for being with us, and your talk was really wonderful. Cool. Yeah. Thanks for having me. It was really fun.