Rethinking Bundling Strategies

We take a look at different challenges and decisions when bundling code for web applications. We look at how these are commonly solved and why we need to rethink them.

This talk was presented at React Day Berlin 2023.


FAQ

Tobias Koppers is the creator of Webpack, a popular module bundler for JavaScript. He joined Vercel and contributed to improving Webpack for Next.js. Currently, he is working on Turbopack and integrating it with Next.js.

Turbopack is a new tool developed by Tobias Koppers that aims to improve upon Webpack's features and integrate seamlessly with Next.js. It focuses on efficient bundling strategies and better caching mechanisms to enhance web development workflows.

Tobias Koppers highlighted two main challenges in bundling: ensuring deterministic builds and ensuring that small input changes result in small output changes. These challenges are crucial for effective long-term caching and minimizing the impact of updates on bundled resources.

Long-term caching in web development involves storing web resources in a browser's cache to improve load times and reduce server requests. It utilizes techniques like immutable caching, where resources are cached without revalidation, and ETag caching, which allows browsers to check if the content has changed before downloading it again.

Webpack addresses content hash dependencies by using a manifest file that lists all the chunk hashes. This prevents changes in one part of the application from affecting unrelated parts, thereby optimizing caching and minimizing the need to re-download unchanged assets.

Turbopack proposes improvements such as more efficient handling of module fragments and exports, reducing unnecessary code in bundles, and optimizing the generation of module graphs to focus only on used exports. This leads to faster builds and more efficient application performance.

Effective code splitting strategies involve isolating changes to specific entry points or pages, ensuring that changes in one part of the application do not impact others. This can be achieved through heuristic methods such as separating node module dependencies from application code to leverage long-term caching more effectively.

Tobias Koppers
32 min
08 Dec, 2023

Video Summary and Transcription
The talk discusses rethinking bundling strategies, focusing on challenges such as long-term caching and improving the state of Next.js and Webpack. It explores handling immutable caching and content hashes, optimizing asset references and page manifests, and addressing issues with client-side navigation and long-term caching. The talk also covers tree shaking and optimization, optimizing module fragments and code placement, and the usage and relationship of Turbopack with Webpack. Additionally, it touches on customizing configuration and hash risks, barrel imports and code splitting, and entry points and chunking heuristics.

1. Rethinking Bundling Strategies

Short description:

I'm Tobias Koppers, the creator of Webpack. Today, I want to talk about rethinking bundling strategies, focusing on two challenges in writing bundlers. The first challenge is long-term caching, leveraging the browser cache to store resources between deployments. The second challenge involves improving the current state of Next.js and Webpack. Let's dive into these challenges and explore how we can do better.

Thank you. Yeah, I'm actually talking about rethinking bundling strategies today, and my name is Tobias Koppers. I created Webpack 11 or 12 years ago, and two or three years ago I joined Vercel and worked a little bit on Next.js, improving Webpack for Next.js.

Now I'm working on Turbopack and integrating Next.js with Turbopack. My talk is actually a little bit more general-facing, so I want to talk about a few things. I want to look at two different challenges in writing bundlers. We're actually looking at the magic in bundlers. So I grabbed two topics for that, two challenges that I currently face, or will face in the future, with building Turbopack. And I want to go a little bit deep into that because I think learning this bundler magic can be important, even if you technically should not face it in your day-to-day job. The bundler should make it transparent and should not confront you with all these challenges. It should just solve them magically. But I think it's still useful to know, and you get some deep insight from that, and it may help you in a few edge cases.

First, I want to present these two challenges, and then go into the current state with Next.js and Webpack for that. And after that, I want to spend a bit of time rethinking that and how we can improve on that, what we can do better in the future, and what we actually want to do in Turbopack with these challenges. A little disclaimer first: I mostly work with Next.js, Webpack, and Turbopack, so everything is from the perspective of these tools. There are other tools out there, and they have similar things with different implementations. Also, most of the ideas are not really new; they are more inspired by other tools.

The first topic is mostly about long-term caching, which is really not known by many people. So what is long-term caching at all? Long-term caching means we want to leverage the browser cache, so the memory cache in the browser, to store our resources, and especially between deployments. There are basically three practical levels of leveraging the browser cache.

The first one is max-age caching, where you just specify that your resources are valid for two hours, so the browser doesn't have to check them again and can just use the cache for two hours. But in practice, it's pretty much unsuitable for our kind of application, because we might have a critical bug to fix, and we want to deploy something, and we don't want to wait two hours until the user actually gets the bug fix. So we don't want to use that at all.

What we want to use is something like ETag caching, for example. ETag caching basically means that when the server responds with the resource, it sends a special header, ETag, which usually contains a hash of the content, and then the browser stores that in its cache. You also want to specify that it must revalidate, so the next time the browser wants to use the resource, it does a new request for it, but it includes a special If-None-Match header, which includes the ETag, so the hash of the content. Then, if the resource didn't change in the meantime, the server might respond with a special status code (304 Not Modified) that says it hasn't changed, you can just use the cache, and you don't need to download it again.

And that basically always works, that's great. But it also always revalidates the request. So it basically sends a new request, and you have to pay the round-trip, but you don't have to pay the download cost. So it's good, but you can do better.
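The revalidation round-trip described above can be sketched roughly as follows. This is a minimal illustration, not how any particular server implements it: the `CachedResponse` shape and the `computeETag` and `respond` helpers are hypothetical names made up for this example, and `Cache-Control: no-cache` is assumed here as one way to tell the browser to revalidate before reusing its cached copy.

```typescript
import { createHash } from "node:crypto";

// Hypothetical minimal response shape for this sketch.
interface CachedResponse {
  status: 200 | 304;
  headers: Record<string, string>;
  body?: string;
}

// Derive an ETag from the content; a quoted hash prefix is a common choice.
function computeETag(content: string): string {
  const hash = createHash("sha256").update(content).digest("hex").slice(0, 16);
  return `"${hash}"`;
}

// Answer a request, honoring the If-None-Match value sent by the browser.
function respond(content: string, ifNoneMatch?: string): CachedResponse {
  const etag = computeETag(content);
  const headers = { ETag: etag, "Cache-Control": "no-cache" };
  if (ifNoneMatch === etag) {
    // Content unchanged: the round-trip is paid, the download is not.
    return { status: 304, headers };
  }
  return { status: 200, headers, body: content };
}

const first = respond("console.log('v1');");
const revalidated = respond("console.log('v1');", first.headers.ETag);
const afterDeploy = respond("console.log('v2');", first.headers.ETag);
console.log(first.status, revalidated.status, afterDeploy.status); // 200 304 200
```

The second request still costs a network round-trip even though no body is sent, which is the limitation the talk points out.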

2. Handling Immutable Caching and Content Hashes

Short description:

The best way to handle caching for static resources is through immutable caching, where the browser can cache the resources indefinitely. To ensure consistency, a unique URL with a content hash is used, allowing for easy updates without breaking the cache. To achieve deterministic builds, the bundler must generate the same output for the same application, while also ensuring that small changes result in small output changes. However, handling content hashes becomes more complex when there are references between different parts of the application. Webpack and Next.js have made progress in solving these challenges, but the issue of content hashes remains.

The best one, I think, at least for static resources and that kind of stuff, is immutable caching, which means you send Cache-Control: immutable and a few other headers, and that means that the browser can cache it forever. It never has to do a round-trip, never has to request it again, and can just store it forever, usually one year or something.

But there is a catch: if the browser stores it forever without revalidating, you basically can't change the content of the resource, because if you change it, then it might be inconsistent, browsers might still have the old version cached, and it doesn't work.

So usually you tackle that by making the URL unique in a way that the content behind it never changes. Usually you just add a content hash to the URL. You might have seen that with file names having a hash attached. That makes the URL so unique that its content will never change, and if you deploy a new version, it will just get a new URL with a new hash.
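As a rough sketch of the idea, a content hash can be derived from the file content and inserted into the file name, and the resulting URL can then be served with a long-lived immutable header. The `hashedFileName` helper and the 8-character hash length are assumptions for illustration, not what Webpack or Next.js actually emit.

```typescript
import { createHash } from "node:crypto";

// Insert a short content hash before the extension: main.js -> main.<hash>.js
function hashedFileName(fileName: string, content: string): string {
  const hash = createHash("sha256").update(content).digest("hex").slice(0, 8);
  const dot = fileName.lastIndexOf(".");
  return dot === -1
    ? `${fileName}.${hash}`
    : `${fileName.slice(0, dot)}.${hash}${fileName.slice(dot)}`;
}

// Headers for a resource whose URL changes whenever its content changes.
const immutableHeaders = {
  "Cache-Control": "public, max-age=31536000, immutable", // 31536000 s = one year
};

const v1 = hashedFileName("main.js", "console.log('v1');");
const v2 = hashedFileName("main.js", "console.log('v2');");
console.log(v1, v2, immutableHeaders["Cache-Control"]);
```

Because the same content always yields the same name, an unchanged file keeps its URL across deployments and stays cached, while a changed file gets a fresh URL.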

Yeah, that would be the best one. So how do we face that from a bundler level? The challenge can be solved with a few different techniques. One thing is that we want to build the bundler in a way that it generates deterministic builds. If you build the same application, it should just generate the same output assets so that the cache can actually be used. If you generated different output assets, you couldn't use the cache.

But you also want another property. Even if you make a small change to your application, which you usually do, like in every pull request or whatever, you want a small change to result in a small output change. If you only change one module, you would expect only one or a few chunks to change in the output bundle. That's the way we can generally make use of our browser cache.

Now we want to use this immutable caching thing, so we want to just put a content hash on every file name we emit from the bundler. It sounds pretty easy. You just hash the content and add it to the file name. But it gets a little bit complicated because there are actually references between the different things in your application. For example, HTML references your chunks, the chunks reference each other, maybe for async loading and that stuff, and chunks also reference assets, like images, fonts, that stuff. And that's where the problem comes in.

So we basically solved these first few things with Webpack in the current state with Next.js. To make builds deterministic, we are just careful when implementing that and try to avoid absolute paths, to make the output independent of changes like cloning your repository to a different directory and all that stuff. And that's pretty easy, actually.
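To illustrate why avoiding absolute paths matters, here is a small sketch in which a module is identified by its path relative to the project root, so the identifier is the same no matter where the repository was cloned. The `moduleIdentifier` helper and the example directories are hypothetical and do not reflect Webpack's internal naming.

```typescript
import * as path from "node:path";

// Turn an absolute module path into an identifier relative to the project root.
// path.posix keeps the separators consistent for POSIX-style inputs.
function moduleIdentifier(projectRoot: string, absoluteModulePath: string): string {
  const relative = path.posix.relative(projectRoot, absoluteModulePath);
  return `./${relative}`;
}

// The same project checked out in two different directories.
const onLaptop = moduleIdentifier("/home/alice/app", "/home/alice/app/src/index.js");
const onCi = moduleIdentifier("/builds/1234/app", "/builds/1234/app/src/index.js");
console.log(onLaptop, onCi); // both print "./src/index.js"
```

If the absolute path leaked into the output or into anything that gets hashed, the two checkouts would produce different assets for identical source code.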
And the more difficult one is this property of small input change, small output change, where you have to consider every algorithm so that a change doesn't have an effect on the whole application. Take module IDs: we can't really number them one by one, because if you number them one by one, inserting one module at the start would renumber all the modules, which is not the property we want. So you make use of hashes to generate module IDs. And when you split your modules into chunks, you also have to make that deterministic in a way that small changes are turned into small output changes. It's also relevant for optimizations, like mangling and that stuff. In general, we solved a few things, but let's look into this content hashes problem.
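The module ID problem can be shown with a small comparison. This is only a sketch under simplifying assumptions: the `sequentialIds`, `hashedIds`, and `changedIds` helpers are made up for this example, the hash is truncated to 6 characters, and collision handling, which a real bundler needs, is left out.

```typescript
import { createHash } from "node:crypto";

// Sequential IDs: each module gets its position in the list.
function sequentialIds(modules: string[]): Map<string, string> {
  return new Map(modules.map((m, i) => [m, String(i)] as [string, string]));
}

// Hash-based IDs: each ID depends only on the module's own path.
function hashedIds(modules: string[]): Map<string, string> {
  return new Map(
    modules.map(
      (m) =>
        [m, createHash("sha256").update(m).digest("hex").slice(0, 6)] as [string, string],
    ),
  );
}

// Count modules present in both builds whose ID changed.
function changedIds(before: Map<string, string>, after: Map<string, string>): number {
  let changed = 0;
  before.forEach((id, name) => {
    if (after.has(name) && after.get(name) !== id) changed++;
  });
  return changed;
}

const modulesBefore = ["./a.js", "./b.js", "./c.js"];
const modulesAfter = ["./_new.js", ...modulesBefore]; // one module inserted at the start

console.log(changedIds(sequentialIds(modulesBefore), sequentialIds(modulesAfter))); // 3
console.log(changedIds(hashedIds(modulesBefore), hashedIds(modulesAfter))); // 0
```

With sequential numbering, one inserted module renumbers every existing module, so every chunk that mentions those IDs changes. With hash-based IDs, the existing modules keep their IDs and only the affected chunk changes.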
