This talk was presented at React Day Berlin 2023.
Rethinking Bundling Strategies
FAQ
Tobias Koppers is the creator of Webpack, a popular module bundler for JavaScript. He joined Vercel and contributed to improving Webpack for Next.js. Currently, he is working on TurboPack and integrating it with Next.js.
TurboPack is a new tool developed by Tobias Koppers that aims to improve upon Webpack's features and integrate seamlessly with Next.js. It focuses on efficient bundling strategies and better caching mechanisms to enhance web development workflows.
Tobias Koppers highlighted two main challenges in bundling: ensuring deterministic builds and ensuring that small input changes result in small output changes. These properties are crucial for effective long-term caching and for minimizing the impact of updates on bundled resources.
Long-term caching in web development involves storing web resources in a browser's cache to improve load times and reduce server requests. It utilizes techniques like immutable caching, where resources are cached without revalidation, and ETag caching, which allows browsers to check whether the content has changed before downloading it again.
Webpack addresses content hash dependencies by using a manifest file that lists all the chunk hashes. This prevents changes in one part of the application from affecting unrelated parts, thereby optimizing caching and minimizing the need to re-download unchanged assets.
TurboPack proposes improvements such as more efficient handling of module fragments and exports, reducing unnecessary code in bundles, and optimizing the generation of module graphs to focus only on used exports. This leads to faster builds and more efficient application performance.
Effective code splitting strategies involve isolating changes to specific entry points or pages, ensuring that changes in one part of the application do not impact others. This can be achieved through heuristic methods such as separating node module dependencies from application code to leverage long-term caching more effectively.
Video Summary and Transcription
1. Rethinking Bundling Strategies
I'm Tobias Koppers, the creator of Webpack. Today, I want to talk about rethinking bundling strategies, focusing on two challenges in writing bundlers. The first challenge is long-term caching, leveraging the browser cache to store resources between deployments. The second challenge involves improving the current state of Next.js and Webpack. Let's dive into these challenges and explore how we can do better.
Thank you. Yeah, I'm actually talking about rethinking bundling strategies today, and my name is Tobias Koppers. I created Webpack 11 or 12 years ago, and two or three years ago I joined Vercel and worked a little bit on Next.js, improving Webpack for Next.js.
Now I'm working on TurboPack and integrating Next.js with TurboPack. My talk is actually a little bit more general, so I want to talk about a few things. I want to look at two different challenges in writing bundlers; we're actually looking at the magic in bundlers. So I grabbed two topics for that, two challenges that I currently face, or in the future will face, building TurboPack. And I want to go a little bit deep into that, because I think learning this bundler magic can be important, even if you technically shouldn't face it in your day-to-day job. The bundler should make it transparent and shouldn't confront you with all these challenges; it should just solve them magically. But I think it's still useful to know, it gives you some deep insight, and it may help you in a few edge cases.
First, I want to present these two challenges, and then go into the current state of Next.js and Webpack for them. After that, I want to spend a bit of time rethinking them: how we can improve on that, what we can do better in the future, and what we actually want to do in TurboPack with these challenges. A little disclaimer first: I mostly work with Next.js, Webpack, and TurboPack, so everything is from the perspective of these tools. There are other tools out there that do similar things with different implementations, and most of these ideas are not really new; they're inspired by other tools.
The first topic is mostly about long-term caching, which is not very well known by many people. So what is long-term caching at all? Long-term caching means we want to leverage the browser cache to store our resources, especially between deployments. There are basically three practical levels of leveraging the browser cache. The first one is max-age caching, where you just specify that your resources are valid for, say, two hours; the browser doesn't have to check again and can just use the cache for two hours. But in practice, that's pretty much unsuitable for our kind of application, because we might have a critical bug fix, and we want to deploy something, and we don't want to wait two hours until the user actually gets the fix. So we don't want to use that at all. What we want to use instead is something like ETag caching. ETag caching basically means that when the server responds with the resource, it sends a special header, ETag, which usually contains a hash of the content, and the browser stores that along with the resource in its cache. The next time the browser wants to use the resource, it revalidates it: it does a new request, but includes a special If-None-Match header containing the ETag, so the hash of the content. If the resource didn't change in the meantime, the server can respond with a special status code, 304 Not Modified, meaning you can just use the cache and don't need to download it again. That basically always works, which is great. But it always revalidates the request: it sends a new request, so you have to pay the round-trip cost, just not the download cost. So it's good, but you can do better.
2. Handling Immutable Caching and Content Hashes
The best way to handle caching for static resources is through immutable caching, where the browser can cache the resources indefinitely. To ensure consistency, a unique URL with a content hash is used, allowing for easy updates without breaking the cache. To achieve deterministic builds, the bundler must generate the same output for the same application, while also ensuring that small changes result in small output changes. However, handling content hashes becomes more complex when there are references between different parts of the application. Webpack and Next.js have made progress in solving these challenges, but the issue of content hashes remains.
The best one, I think, at least for static resources and that kind of thing, is immutable caching, which means you send Cache-Control: immutable and a few other headers. That means the browser can cache it forever, never has to do a round trip, never has to request it again; it can just store it, usually for one year or so.
But it only works if you never change the content of the resource, because the browser stores it without revalidating. If you change it, things might become inconsistent: browsers might still have the old version cached, and it doesn't work.
So usually you tackle that by making the URL unique in a way that it never changes. Usually you just add a content hash to the URL; you might have seen file names with this hash attached. That makes the URL unique enough that it will never change, and if you deploy a new version, it just gets a new URL with a new hash.
Yeah, that would be the best one. So how do we approach that from the bundler side? The challenge can be solved with a few different techniques. One thing is we want to make the bundler generate deterministic builds: if you build the same application, it should generate the same output assets, so that the cache can actually be used. If it generated different output assets, you couldn't use the cache. But you also want another property: even if you make a small change to your application, which you usually do, like in every pull request, you want that small change to result in a small output change. If you only change one module, you expect only one or a few chunks to change in the output bundle. That's the way we can generally use the browser cache. Now we want to use this immutable caching thing, so we just put a content hash on every file name we emit from the bundler. It sounds pretty easy: you just hash the content and add it to the file name. But it gets a little complicated, because there are actually references between the different things in your application. For example, the HTML references your chunks, the chunks reference each other, maybe for async loading, and chunks also reference assets like images and fonts. That's where the problem comes in. So we basically solved the first few of these things with Webpack in the current state of Next.js. To make builds deterministic, we just have to be careful implementing that and avoid absolute paths, to make the build independent of things like cloning your repository to a different directory. That's pretty easy, actually.
The more difficult one is this property of small input change, small output change, where you have to review every algorithm to make sure it doesn't have this whole-application effect. Take module IDs: we can't really number the modules one by one, because inserting one module at the start would renumber all the modules, which is not the property we want. So we make use of hashes to generate module IDs. Also, when chunking your modules into chunks, you have to make it deterministic in a way that small input changes turn into small output changes. The same is relevant for optimizations, like mangling and that kind of thing. In general, we've solved a few things, but let's look into this content hashes problem.