Video Summary and Transcription
Welcome to DevOpsJS 2024! We'll be discussing semantics and versioning schemas, particularly semantic versioning (Semver). There are concerns about the flaws in Semver and the need to embrace change in software development. Dependency hell in the JavaScript ecosystem has been addressed through semantic versioning and new capabilities. However, there are still issues with the Semver spec, including absent definitions and problems with build metadata. To improve versioning, we need to address missing definitions and consider a new spec for the future.
1. Introduction to Semantics and Versioning
Welcome to DevOpsJS 2024! We'll be discussing semantics and versioning schemas, particularly semantic versioning (Semver). I'm Darcy Clarke, a software engineer with over 20 years of experience. Inspired by Rich Hickey's talk, I have some concerns about his views on Semver. Let's dive in!
Welcome, thanks for joining DevOpsJS 2024 and taking an interest in my talk today. We'll be diving into one of my favorite topics, which is semantics, and more specifically versioning schemas, the most popular of which is semantic versioning, otherwise known as Semver.
First, a little bit about me. My name is Darcy Clarke. I've been a software engineer for over 20 years, developing both open and closed source software. I had a long career as a consultant, working with amazing brands, agencies, and large enterprises. I also co-founded a company called Themify about 10 years ago, which serves commercial WordPress themes, and it's still around today. I spent the last four years working on NPM at GitHub, on the open source CLI teams, both the GitHub and NPM CLI teams. I'm now building a new JavaScript package registry and client at a company I founded last year called vlt, and you can check out more if you'd like at vlt.sh.
So this talk was actually inspired by a talk by Rich Hickey. Back in 2016, he did a keynote called Spec-ulation, in which he dives into software versioning and semantic versioning itself. If you haven't watched one of his talks before, I highly recommend going to YouTube and taking a look at some of his work. This talk specifically is amazing, and I think he's a great speaker with some awesome insights. That said, I do have a few issues with some of the key takeaways from Rich's Spec-ulation talk. As far as I can tell, no one has brought up any issues in the last seven years, so I hope I'm not alone.
2. Challenges with Semver and Embracing Change
Rich believes Semver is flawed and only accepts backwards-compatible changes. I believe software should reflect real life, embracing mistakes and change. Stagnation and excessive permissiveness can lead to bloated interfaces. Software versioning should anticipate and communicate necessary changes.
The first of which is that Rich quite broadly believes Semver is a bad spec. I'm not completely against him here. I think there's a lot of room for improvement, and we'll definitely dive into that a bit later. The second major statement and takeaway is that we should never be releasing breaking changes, or if we need to release breaking changes, we should do that under a new name. In other words, he believes the only acceptable changes to software should be backwards-compatible ones. And, of course, lastly, he's okay with the idea of software stagnating, which is in line with that second bullet.
For me, I believe the creation and versioning of software should mimic real life. Sometimes things change, and changes aren't perfect. We break things, and that's a part of life. We shouldn't be afraid to make mistakes, and we should be compelled to create environments where it's easy to learn with minimal external impact when we get things wrong. Stagnation is a natural phenomenon, but it's not something that we should promote or think of as positive. Software stagnation is the same. Refusing to foster and maintain software means it will likely meet a similar end as in the real world: death, or worse, irrelevancy. When we talk about the easing of constraints or creating a larger API surface, we again find ourselves in unnatural, uncomfortable territory. Being more permissive with software means that over time you will end up with a bloated public interface that you need to support. This is a self-imposed burden which can only be rectified through breaking changes. This is similar to how you might need to break bad habits in the physical world, but breaking those ends up extending your lifespan.
Lastly, I find it wholly unnatural to pollute our ecosystems with spurious namespaces. Our version schemas should be there to free us of the restrictive contracts we have with historical interfaces, so long as the project's underlying purpose has not changed. This point of view comes from my understanding that software and software versioning is messy, just like life. Software changes over time, and this reflects how we all organically learn and grow. Software changes can break things, just like in the real world. Not all changes are expected, and sometimes they break. But we must respect and accept that breaking changes are part of life and are part of growing. Having a schema in place which anticipates that as a necessity is critical to creating a thriving ecosystem of versioned software. Sometimes changes may even take things away, which is another type of breaking change. But again, this reflects real life. And ultimately, a software versioning specification is meant to codify the signals for communication of change.
3. Dependency Hell and the JavaScript Ecosystem
Semantic versioning is a solution to dependency hell, which has nine circles of problems. Semver didn't solve the first circle, but Node and NPM did. The JavaScript ecosystem has embraced semantic versioning and added new grammars and capabilities to avoid dependency hell.
A versioning specification is a language in and of itself, and will only ever be as good as the definitions it encompasses. So let's take a look at semantic versioning. Reading a direct quote from the original Semver spec: "In the world of software management there exists a dreaded place called 'dependency hell.' The bigger your system grows and the more packages you integrate into your software, the more likely you are to find yourself, one day, in this pit of despair."
As you can tell, semantic versioning is being touted as a solution to something called dependency hell. I actually posit that there are nine levels, or nine circles, which represent dependency hell. Just like in Dante's nine circles of hell, we have nine circles of dependency hell. If you know anything about Dante's nine circles, you know that the first circle is limbo. In this first level we can place any ecosystem or user that chooses not to consume dependencies. Moving down, we get to lust, where you don't care what dependencies you get. Then we get to gluttony, where you don't care how many dependencies you get, and so on and so forth.
In Rich's talk, he shows a diagram of a POM dependency graph from the Java world, where there are multiple dependencies having transitive dependency relationships which need to be satisfied. For most languages, this is a huge problem, because conflicts often arise when you can only have one version of any given thing in your project. This problem prevents many ecosystems from consuming dependencies, and they end up in a state of limbo. In terms of this dependency hell, I believe this was and is the first circle.
So, the story here is that Semver didn't actually solve the first circle of dependency hell for anyone at all. As I showed you, there are actually other circles of problems that we need to address and need to get to. But some smart people did, indeed, solve this problem. Back in 2010, Node introduced CommonJS modules, and NPM, which was the new package manager at the time. Because Node supports requiring nested modules, it means you can safely scope while still consuming conflicting transitive dependencies. There's no global context. You can both duplicate a dependency's references in the runtime while deduping in the cache. The focus here in JavaScript is heavy on functional programming with as few side effects as possible.
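As a rough illustration of why nested modules sidestep version conflicts, the lookup can be sketched as a walk up the directory tree that checks a `node_modules` folder at each level. This is a simplified sketch of Node's documented lookup order, not Node's actual implementation; the function name `candidateLookupPaths` and the POSIX-style paths are assumptions made for illustration.

```typescript
// Simplified sketch: list the folders that would be checked, nearest first,
// when a file inside `fromDir` requires the package `packageName`.
function candidateLookupPaths(fromDir: string, packageName: string): string[] {
  const segments = fromDir.split("/").filter((segment) => segment.length > 0);
  const candidates: string[] = [];
  for (let i = segments.length; i >= 0; i--) {
    // Never look for "node_modules/node_modules".
    if (i > 0 && segments[i - 1] === "node_modules") continue;
    const base = "/" + segments.slice(0, i).join("/");
    candidates.push((base === "/" ? "" : base) + "/node_modules/" + packageName);
  }
  return candidates;
}

// Hypothetical layout: package "a" is installed inside "/app".
const lookupOrder = candidateLookupPaths("/app/node_modules/a", "dep");
// lookupOrder is:
//   /app/node_modules/a/node_modules/dep
//   /app/node_modules/dep
//   /node_modules/dep
```

Because the nearest `node_modules` folder wins, two packages can each carry their own copy of a conflicting transitive dependency without a global context.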
So, what is the state of the JavaScript ecosystem today in regards to dependency hell? Well, we've done quite well. As you can see, the packages published to NPM far outmatch any other ecosystem. It seems like our community has truly embraced semantic versioning while also avoiding the first circle of dependency hell. The secret here, and spoiler, is that we don't actually use semantic versioning as it was spec'd. The library that is used in the NPM client and the registry adds net new grammars and capabilities that aren't actually in the semantic versioning 2.0.0 spec. This includes things like sets, ranges, comparators, and the corresponding operator definitions and logic to go with them. If we just used the semantic versioning spec as it's written today, I don't think we would have been successful as a community.
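To make the idea of a range concrete, here is a minimal sketch of checking a caret range such as `^1.2.3` against a normal version. It assumes the commonly documented caret behavior (changes are allowed to the right of the left-most non-zero component), handles only plain MAJOR.MINOR.PATCH values, and uses hypothetical helper names. It is not the code of the library NPM uses.

```typescript
type Triple = [number, number, number];

function parseTriple(version: string): Triple {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (match === null) throw new Error(`expected MAJOR.MINOR.PATCH, got: ${version}`);
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

function compareTriples(a: Triple, b: Triple): number {
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

// Exclusive upper bound implied by a caret range's lower bound.
function caretUpperBound(lower: Triple): Triple {
  const [major, minor, patch] = lower;
  if (major > 0) return [major + 1, 0, 0];
  if (minor > 0) return [0, minor + 1, 0];
  return [0, 0, patch + 1];
}

function satisfiesCaret(version: string, range: string): boolean {
  if (range.charAt(0) !== "^") throw new Error("this sketch only handles caret ranges");
  const lower = parseTriple(range.slice(1));
  const candidate = parseTriple(version);
  return (
    compareTriples(candidate, lower) >= 0 &&
    compareTriples(candidate, caretUpperBound(lower)) < 0
  );
}
```

None of this grammar appears in the 2.0.0 spec itself, which is the point being made above: the range semantics live in tooling.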
4. The Sixth Circle of Dependency Hell
We're facing the sixth circle of dependency hell, known as heresy. Our ecosystem lacks self-control in consuming dependencies, as shown by the excessive number of packages installed. Package managers interpret the spec differently, leading to inconsistencies. However, there's hope.
The new problem we happen to be facing is that we've landed in the sixth circle of dependency hell, which we call heresy. You might ask what happened to circles 2 through 5, and what you might not realize is that those are pretty much the status quo for any thriving ecosystem. As you can see here, circles 2 through 5 are considered to be the circles of incontinence, which just means a lack of self-control. I think this makes sense in the context of our current ecosystem's relationship with dependencies. We are lustful, gluttonous, greedy, and sometimes even angry.
Looking at the data again, this seems to align with how you'd expect our community to be growing and consuming dependencies. A great example of this lack of self-control is here on the right. You can see how many dependencies are installed by each package manager for a basic Create React App project. This is not the most modern way to create a React project, but it's just to showcase two different things. One, roughly 1,200 dependencies are getting installed just to initiate a small project, and two, there's a huge discrepancy in the number of packages that get downloaded across these tools. You might be wondering what is going on here. Why is this happening? How is that possible? Well, each of the package managers interprets the spec differently. This means your dependency graph will be different based on what tool you're using.
There's no standard or at least no standard that is being followed consistently across our packaging tools. At this point, you're probably thinking to yourself, these are issues that seem systemic, and we're definitely in dire straits. But I promise you, there's hope.
5. The Problems with the Sembr Spec
Semver has good intentions but is plagued with bad and absent definitions. The current spec is 11 years old and has issues with patch versions and builds. The official spec's definition of a patch requires backward compatibility, which is practically impossible. The definition of builds in the spec was changed, causing major consequences. Semver 2.0.0-rc.1 is the most accurate spec for versioning.
So let's take a look at the current state of the Semver spec itself. Notably, Semver and the people behind it have great intentions, but unfortunately, in my opinion, Semver is plagued with both bad and strangely absent definitions. If you haven't already, go read the spec at semver.org. The current spec, Semver 2.0.0, is roughly 11 years old and has 11 rules.
The basic normal version in Semver consists of a major, minor, and patch version. The major represents incompatible API changes, the minor represents additional functionality or capabilities which are backwards compatible, and the patch represents fixes intended to address errors in the existing functionality. There are nuances to this, where you might find normal version fragments in popular marketing materials, like 1.0 or 3.0. It's well understood that the values are read left to right, and the precedence of the versions is well defined.
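The left-to-right precedence of a normal version can be sketched in a few lines. This is a minimal illustration, not an official implementation; the names `NormalVersion`, `parseNormal`, and `compareNormal` are hypothetical.

```typescript
type NormalVersion = { major: number; minor: number; patch: number };

// Accepts only MAJOR.MINOR.PATCH with non-negative integers and no leading zeros.
function parseNormal(version: string): NormalVersion {
  const match = /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$/.exec(version);
  if (match === null) throw new Error(`not a normal version: ${version}`);
  return {
    major: Number(match[1]),
    minor: Number(match[2]),
    patch: Number(match[3]),
  };
}

// Negative when a precedes b, positive when a follows b, zero when equal.
// Components are compared numerically, left to right.
function compareNormal(a: NormalVersion, b: NormalVersion): number {
  return a.major - b.major || a.minor - b.minor || a.patch - b.patch;
}
```

Note that a fragment like 1.0 is rejected by `parseNormal`, since it is not a complete normal version.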
So let's take a look at the Semver extensions. Beyond the normal version, there are two extensions in the Semver spec. Pre-releases are denoted with a hyphen, separating them from the normal version. Build metadata can also be appended to the end of the version, separated with a plus sign. Those are all the components of a semantic version as it's defined today.
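The three components can be pulled apart with a single pattern. The sketch below is deliberately loose (it does not enforce every rule in the spec, such as the ban on leading zeros), and the names `ParsedVersion` and `parseVersion` are hypothetical.

```typescript
type ParsedVersion = {
  normal: string;        // MAJOR.MINOR.PATCH
  prerelease: string[];  // dot-separated identifiers after "-"
  build: string[];       // dot-separated identifiers after "+"
};

function parseVersion(version: string): ParsedVersion {
  const pattern =
    /^(\d+\.\d+\.\d+)(?:-([0-9A-Za-z.-]+))?(?:\+([0-9A-Za-z.-]+))?$/;
  const match = pattern.exec(version);
  if (match === null) throw new Error(`invalid version: ${version}`);
  return {
    normal: match[1] ?? "",
    prerelease: match[2] ? match[2].split(".") : [],
    build: match[3] ? match[3].split(".") : [],
  };
}

const parsedExample = parseVersion("1.0.0-beta.2+build.11");
// parsedExample.normal is "1.0.0"
// parsedExample.prerelease is ["beta", "2"]
// parsedExample.build is ["build", "11"]
```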
With that basic overview, let's dive into the definitions that the current spec gets wrong. The first issue you may come across with the spec is actually the definition of a patch. The official Semver spec notes that patch versions should be backward compatible, which is practically impossible. The idea behind a bug fix is to correct behavior that was not intentional, not to stay compatible with previous behavior. The language here must be amended, or else it codifies bugs as features and vice versa. Essentially negating the conceptual usefulness of a patch is obviously bad.
But there's actually an even worse and more important definition in the spec today that must be changed. That is builds. When semantic versioning 2.0.0 was first drafted by Tom Preston-Werner, he defined builds, not build metadata. Two years later, two folks from the NuGet ecosystem came and changed the purpose of builds, making them and their definition effectively useless. Semver 2.0.0-rc.1 is, as far as I can tell, the most accurate spec for versioning and encompasses the common practice of generating builds or build variants. These breaking changes to the spec should never have been introduced and have caused subtle, yet major, consequences. In the original Semver 2.0.0-rc.1 spec, builds functioned as the antithesis of prereleases. You could even call them post-releases if you wanted.
6. Issues with Semantic Versioning and Build Metadata
The problem is that versions with build metadata should be treated distinctly but aren't today. NPM's Semver implementation defaults to stripping out build metadata, causing confusion in the ecosystem. Introducing a nonfunctional grammar for comments would be a clearer solution.
As you can see here, builds are respected when considering the precedence or ordering of versions. When the meaning of builds was changed to build metadata, builds lost their precedence, effectively making any version with build metadata equivalent to any other of the same normal or prerelease version. This creates a situation where there's no definition or way to order a version which includes build metadata.
In this example, six different pieces of software with different Git hashes and different architectures are all semantically equivalent. There's no way in the spec today to do anything with them. You would be wrong for thinking there was any criteria for how to match one of these or choose one of these over another.
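The equivalence problem can be shown with a small sketch. The six version strings below are hypothetical stand-ins (the values on the slide are not part of this transcript), and the helper names are illustrative. The one rule the code relies on is that Semver 2.0.0 ignores build metadata when determining precedence.

```typescript
// Under Semver 2.0.0, everything after "+" is ignored for precedence.
function stripBuildMetadata(version: string): string {
  const plus = version.indexOf("+");
  return plus === -1 ? version : version.slice(0, plus);
}

function samePrecedence(a: string, b: string): boolean {
  return stripBuildMetadata(a) === stripBuildMetadata(b);
}

// Hypothetical builds: two Git hashes across three architectures.
const hypotheticalBuilds: string[] = [
  "1.2.3+a1b2c3d.linux-x64",
  "1.2.3+a1b2c3d.darwin-arm64",
  "1.2.3+a1b2c3d.win32-x64",
  "1.2.3+e4f5a6b.linux-x64",
  "1.2.3+e4f5a6b.darwin-arm64",
  "1.2.3+e4f5a6b.win32-x64",
];

// Six distinct artifacts collapse into a single precedence value.
const distinctPrecedences = new Set(hypotheticalBuilds.map(stripBuildMetadata)).size;
```

With nothing to distinguish them, a resolver following the spec has no basis for preferring one of these artifacts over another.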
This problem is made worse by the fact that npm's own Semver implementation, which, funnily enough, creates its own additional grammars and logic on top of the spec, defaults to stripping out build metadata from a version. This means that today, there are no packages in our ecosystem that have build metadata, which is supposed to still be a valid semantic version value. So to encapsulate this, the problem here is that a semantic version which includes build metadata should be treated distinctly but isn't today.
In fact, we have no way of delineating between two versions with it defined. But versions of software with build metadata are not metadata themselves. They're distinct pieces of software. We should instead, as we've previously defined, be treating versions with build metadata like builds, which are the antithesis of a prerelease. It seems silly to say it, but if we wanted to introduce a nonfunctional grammar into the spec, it should have been done explicitly for commenting. Co-opting the existing value and legitimate use case for builds has only confused the ecosystem and tooling authors. In this example, you can see what the introduction of a new value could look like, with comments preceded by a pound character and having no empirical value or effect on the precedence.
7. Missing Definitions in Semantic Versioning
Combining these changes, we address the missing definitions of semantic versioning, including context, sets, and subsets. NPM's nonstandard library provides solutions. Range operators and build sets help with version comparison and filtering. We also need to define ordered and unordered sets.
Combining all these potential changes together, you can see that a build now would have its relative precedence again and alphanumeric comments can still be defined but have no meaning. This brings us to the missing definitions of semantic versioning.
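A rough sketch of what that combined behavior could look like follows. It rests on two assumptions: that builds regain precedence in the style of the 2.0.0-rc.1 draft (a build ranks above the version it is attached to, with identifiers compared like pre-release identifiers), and that a comment is everything after a `#` and is discarded before comparison. The exact grammar proposed in the talk is not reproduced in this transcript, so the syntax and all function names here are hypothetical.

```typescript
type VersionParts = { normal: number[]; prerelease: string[]; build: string[] };

function parseParts(version: string): VersionParts {
  // Assumption: a comment is everything after "#" and carries no meaning.
  const withoutComment = version.split("#")[0] ?? "";
  const pattern =
    /^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+([0-9A-Za-z.-]+))?$/;
  const match = pattern.exec(withoutComment);
  if (match === null) throw new Error(`invalid version: ${version}`);
  return {
    normal: [Number(match[1]), Number(match[2]), Number(match[3])],
    prerelease: match[4] ? match[4].split(".") : [],
    build: match[5] ? match[5].split(".") : [],
  };
}

// Numeric identifiers compare numerically and rank below alphanumeric ones;
// when one list is a prefix of the other, the longer list ranks higher.
function compareIdentifiers(a: string[], b: string[]): number {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    const x = a[i] ?? "";
    const y = b[i] ?? "";
    if (x === y) continue;
    const xNumeric = /^\d+$/.test(x);
    const yNumeric = /^\d+$/.test(y);
    if (xNumeric && yNumeric) return Number(x) - Number(y);
    if (xNumeric !== yNumeric) return xNumeric ? -1 : 1;
    return x < y ? -1 : 1;
  }
  return a.length - b.length;
}

function compareWithBuilds(a: string, b: string): number {
  const pa = parseParts(a);
  const pb = parseParts(b);
  for (let i = 0; i < 3; i++) {
    const diff = (pa.normal[i] ?? 0) - (pb.normal[i] ?? 0);
    if (diff !== 0) return diff;
  }
  // A pre-release ranks below the same version without one.
  const aPre = pa.prerelease.length > 0;
  const bPre = pb.prerelease.length > 0;
  if (aPre !== bPre) return aPre ? -1 : 1;
  const pre = compareIdentifiers(pa.prerelease, pb.prerelease);
  if (pre !== 0) return pre;
  // A build ranks above the same version without one (post-release).
  return compareIdentifiers(pa.build, pb.build);
}
```

Under these assumptions, `1.0.0-rc.1 < 1.0.0-rc.1+build.1 < 1.0.0 < 1.0.0+build.1`, while `1.0.0+build.1#nightly` and `1.0.0+build.1#hotfix` compare as equal.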
Most notable is a complete lack of context for describing the software or project itself in the current Semver specification. Reading directly from the current spec, "version numbers and the way they change convey meaning about the underlying code and what has been modified from one version to the next." Inherently, there is a connection between the versions and the context in which they apply.
In this example, I show what a context schema may look like by prefacing a normal version with an alphanumeric context, separated by an at symbol, which is commonplace. This definition will definitely look familiar because we see it all the time in our existing package management ecosystems, notably NPM JavaScript.
Another key missing definition is the concept of sets. It may seem pedantic, but not defining sets or how sets can be created, ordered or unordered, leaves fundamental gaps in our collective understanding of how to manage more than one software version. As mentioned, the current spec only defines precedence. In this case, 1.0.0 is less than 2.1.6, which is less than 2.3.9. Precedence itself does not give us enough clarity or language to create and manage version sets.
This is where NPM's nonstandard library has historically come to the rescue and filled many of these gaps. This library handles everything but deriving context out of a version. If you dig in deeper, you'll actually find that NPM has solved that problem as well. The library npm-package-arg will parse context out of nonstandard, yet more fully conceptualized, version values, for example react@^1.2.3. This library itself uses the previously mentioned node-semver library under the hood.
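Separating context from a version value can be sketched as a split on the last `@`. This is an illustration of the idea only, not how npm-package-arg is implemented; the names `PackageSpec` and `splitSpec` are hypothetical, and defaulting to `*` when no range is given is an assumption made for this sketch.

```typescript
type PackageSpec = { context: string; range: string };

function splitSpec(spec: string): PackageSpec {
  // Use the last "@" so scoped names such as "@scope/pkg" keep their prefix.
  const at = spec.lastIndexOf("@");
  if (at <= 0) return { context: spec, range: "*" };
  return { context: spec.slice(0, at), range: spec.slice(at + 1) };
}

const reactSpec = splitSpec("react@^1.2.3");
// reactSpec.context is "react", reactSpec.range is "^1.2.3"

const scopedSpec = splitSpec("@scope/pkg@1.0.0");
// scopedSpec.context is "@scope/pkg", scopedSpec.range is "1.0.0"
```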
So, what would a missing set definition look like for Semver? Well, it would include software context and a definition for individual software version sets. What this could codify is a piece of software's inherent sets. In these two examples above, you see the versions of react cannot and should not ever be compared against the versions of express. This is the first type of set, the contextual set.
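A contextual set can be sketched as a grouping of versions by the context they belong to, so that comparisons only ever happen inside one group. The `context@version` shape follows the convention described above; the function names and the sample version numbers are illustrative assumptions.

```typescript
type ContextualVersion = { context: string; version: string };

function parseContextual(value: string): ContextualVersion {
  const at = value.lastIndexOf("@");
  if (at <= 0 || at === value.length - 1) {
    throw new Error(`expected context@version, got: ${value}`);
  }
  return { context: value.slice(0, at), version: value.slice(at + 1) };
}

// Each key is a context; each value is that context's own version set.
function groupByContext(values: string[]): Map<string, string[]> {
  const sets = new Map<string, string[]>();
  for (const value of values) {
    const { context, version } = parseContextual(value);
    const existing = sets.get(context) ?? [];
    existing.push(version);
    sets.set(context, existing);
  }
  return sets;
}

const contextualSets = groupByContext([
  "react@18.2.0",
  "express@4.18.2",
  "react@17.0.2",
]);
// contextualSets has two keys: "react" and "express".
```

Keeping the sets separate makes it structurally impossible to ask whether a react version is greater than an express version.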
Below the examples, you'll see definitions for the missing range operators that are often included in package managers, the most notable of which is the star or X grammar, which matches any version within a relative set. Here are the three missing sets for normal, pre-release, and post-release, also referred to as build sets. Defining these explicitly allows us to know when we are including or filtering out particular versions for comparison. These are helpful delineations for both a consumer and a tooling author. Second to last, we get to the missing concept of subsets. In this example, you can see all the different subsets that live within each of the particular levels of a semantic version. These represent all the potential versions that could exist within the project's context. And lastly, we must define ordered and unordered sets.
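The star or X grammar can be sketched as a component-wise match in which a wildcard accepts anything. The sketch below handles only normal versions and, as a simplification, treats everything to the right of the first wildcard (or of a missing component) as wild. The name `matchesXRange` is hypothetical.

```typescript
const WILDCARDS = ["x", "X", "*"];

function matchesXRange(version: string, pattern: string): boolean {
  const versionParts = version.split(".");
  const allNumeric = versionParts.every((part) => /^\d+$/.test(part));
  if (versionParts.length !== 3 || !allNumeric) {
    throw new Error(`expected MAJOR.MINOR.PATCH, got: ${version}`);
  }
  const patternParts = pattern === "" ? [] : pattern.split(".");
  for (let i = 0; i < 3; i++) {
    const wanted = patternParts[i];
    // A wildcard or a missing component matches the rest of the version.
    if (wanted === undefined || WILDCARDS.indexOf(wanted) !== -1) return true;
    if (wanted !== versionParts[i]) return false;
  }
  return true;
}
```

For example, `1.x` selects the subset of every version under major 1, and `1.2.*` narrows that to the subset under minor 1.2.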
8. The Future of Semantic Versioning
The current spec lacks the ability to handle unordered sets. Semantic versioning 2.0 is outdated with no chance of a 3.0 release. To avoid dependency hell, we need a new spec. Check out Sember.xyz for updates. New tools, infrastructure, and stewards are needed. Let's improve the language for continued growth and success in the ecosystem.
The current spec only implies that precedence may be used to order a set, but we must assume that we will be given unordered sets. What's next? We have all these new and potentially impactful insights, definitions, and amendments that add a ton of value and can bring further clarity to our shared spec.
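Turning an unordered set into an ordered one can be sketched as de-duplication followed by a sort on precedence. This minimal example handles normal versions only, and the name `orderSet` is hypothetical.

```typescript
function orderSet(versions: string[]): string[] {
  // A set has no duplicates, so remove them before ordering.
  const unique = Array.from(new Set(versions));
  const toNumbers = (version: string): number[] => version.split(".").map(Number);
  return unique.sort((a, b) => {
    const na = toNumbers(a);
    const nb = toNumbers(b);
    for (let i = 0; i < 3; i++) {
      const diff = (na[i] ?? 0) - (nb[i] ?? 0);
      if (diff !== 0) return diff;
    }
    return 0;
  });
}

const orderedVersions = orderSet(["2.3.9", "1.0.0", "2.1.6", "1.0.0"]);
// orderedVersions is ["1.0.0", "2.1.6", "2.3.9"]
```

The comparison is numeric rather than lexical, so 1.10.0 sorts after 1.9.0.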
What should we be doing? Well, unfortunately, semantic versioning 2.0 is more than 11 years old. Nothing has changed about it in that time. Amendments and improvements have effectively stalled out, and it seems like there's no realistic chance we'll ship a 3.0.
In order for us to take advantage of all the existing and new solutions we have for avoiding more of the circles of dependency hell, we need to codify these into a new spec so we don't become heretics. If you want to follow the work I'm doing in this space, check out the website, Sember.xyz. Of course, we'll also need new tools and infrastructure to operationalize this spec. I'm doing just that with my company, vlt. We also need to introduce new stewards for the spec. ECMA is a great place for standards, and we need to find a new home or body to take care of this spec going forward. I believe that if we do things right and improve the status quo for the language that we use to define our software, we will be set up for continued growth and success in this ecosystem long into the future.
Thank you so much for listening to me today. If you want to reach out or connect with me, my handle is @darcy on Twitter, or check out the new package registry I'm building at vlt.sh. Thanks.