If you just go with a naive approach, and just put content hashes on everything and embed references in the normal way, then you get this property: you have an asset here, like an image, and its file name includes a content hash, so it's basically hashing the content and the file name is something like image, ABC, with a hash. Because AsyncChunk2 references it in this case, we have to embed the URL of this asset into AsyncChunk2. And that basically means the hash of the asset becomes part of AsyncChunk2. And AsyncChunk2 is also hashed and gets a name, and that basically happens for everything.
Now the important thing: if something changes, like I change this font file to include a bigger subset of the font, whatever, then this asset, of course, changes and gets a new URL. The problem is that the new URL needs to be embedded into AsyncChunk2, and that means this chunk also changes and gets a new URL, and that needs to be embedded into AsyncChunk1. So basically the problem is that the change bubbles up your dependencies, the reference graph of your application, and in the end the whole graph is invalidated just because you changed a leaf of the graph. And that's not the property we want, we want something else. And Webpack has a solution for that.
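To make the bubbling concrete, here is a minimal sketch of the naive scheme, not Webpack's actual code. The `toyHash` helper, the `FileNode` shape, and the file names are all made up for illustration; the only point is that a file's hash covers the hashed URLs of everything it references.

```typescript
// Toy string hash (djb2 variant); a stand-in for a real content hash.
function toyHash(text: string): string {
  let h = 5381;
  for (let i = 0; i < text.length; i++) {
    h = ((h * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return h.toString(16);
}

// Hypothetical model of an output file and the files it references.
interface FileNode {
  name: string;
  source: string;   // the file's own content
  refs: FileNode[]; // files whose hashed URLs get embedded into this one
}

// Naive scheme: the hash covers the file's own content plus the hashed
// URLs of everything it references, so a leaf change reaches every parent.
function hashedUrl(node: FileNode): string {
  const embeddedUrls = node.refs.map(hashedUrl).join(",");
  return node.name + "." + toyHash(node.source + "|" + embeddedUrls);
}

function urlsFor(imageBytes: string): string[] {
  const asset: FileNode = { name: "image", source: imageBytes, refs: [] };
  const asyncChunk2: FileNode = { name: "async-chunk-2", source: "chunk 2 code", refs: [asset] };
  const asyncChunk1: FileNode = { name: "async-chunk-1", source: "chunk 1 code", refs: [asyncChunk2] };
  return [asset, asyncChunk2, asyncChunk1].map(hashedUrl);
}

// Change only the image and compare the resulting URLs file by file.
const urlsBefore = urlsFor("image bytes v1");
const urlsAfter = urlsFor("image bytes v2");
const changedNaive = urlsBefore.map((url, i) => url !== urlsAfter[i]);
console.log(changedNaive);
```

Only the image content differs between the two builds, yet the asset, AsyncChunk2, and AsyncChunk1 should all end up with new URLs.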
So what we do in Webpack is, instead of putting references to chunks into other chunks, we put all the file names, all the URLs, all the content hashes of chunks into a manifest, which is embedded into the Webpack runtime, and chunks are looked up from there. In our graph it looks like this: basically all the chunk hashes are in the manifest, so the chunks don't reference each other directly, they reference each other indirectly instead. And that changes the property: if you change the asset, it still bubbles up to AsyncChunk2, but then it stops bubbling up to all the other chunks. It basically only bubbles up to the manifest, which changes, and the HTML changes, which will always change. But we can keep all the unrelated chunks cached, and that gives a lot of benefit for caching.
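As a rough sketch of that indirection (again not Webpack's real runtime; `toyHash`, `loadChunk`, `show`, and the file names are hypothetical), AsyncChunk1 refers to AsyncChunk2 by a stable id, and only the manifest knows the hashed file names:

```typescript
// Toy string hash (djb2 variant); a stand-in for a real content hash.
function toyHash(text: string): string {
  let h = 5381;
  for (let i = 0; i < text.length; i++) {
    h = ((h * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return h.toString(16);
}

interface BuildOutput {
  chunk1File: string;
  chunk2File: string;
  manifestFile: string;
}

function buildWithManifest(imageBytes: string): BuildOutput {
  // The asset URL is still embedded directly into AsyncChunk2.
  const assetUrl = "image." + toyHash(imageBytes) + ".png";
  const chunk2Code = 'show("' + assetUrl + '")';
  // AsyncChunk1 names AsyncChunk2 by a stable id, not by its hashed URL.
  const chunk1Code = 'loadChunk("async-chunk-2")';

  const chunk1File = "async-chunk-1." + toyHash(chunk1Code) + ".js";
  const chunk2File = "async-chunk-2." + toyHash(chunk2Code) + ".js";

  // The manifest maps stable ids to hashed file names and lives in the runtime.
  const manifest = JSON.stringify({
    "async-chunk-1": chunk1File,
    "async-chunk-2": chunk2File,
  });
  const manifestFile = "runtime." + toyHash(manifest) + ".js";
  return { chunk1File, chunk2File, manifestFile };
}

const buildV1 = buildWithManifest("image bytes v1");
const buildV2 = buildWithManifest("image bytes v2");
console.log("chunk 1 unchanged:", buildV1.chunk1File === buildV2.chunk1File);
console.log("chunk 2 changed:", buildV1.chunk2File !== buildV2.chunk2File);
console.log("manifest changed:", buildV1.manifestFile !== buildV2.manifestFile);
```

The change to the image still reaches AsyncChunk2 and the manifest, but AsyncChunk1 keeps its file name and stays cached.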
But there are still issues with that. Take a multi-page application, where you have many, maybe thousands of HTML files, and you want to client-side navigate between these pages, which is what you usually want in, for example, a Next.js application. You need one runtime, because you want to share modules between all the pages and you want to keep state between them. So you basically end up with a single runtime manifest file, and now you see the problem. All the chunks in your thousand pages are referenced from the manifest file, and that gives you the problem that if, on page B, you change your image, it bubbles up to AsyncChunk4 in this case, and then it bubbles up to the manifest, and that will invalidate all your HTML files. And that's okay, it works, and we did it that way for, like, years, but I think we can do better.
So I spent a bit of time on this very obvious change you can make here. Instead of one global manifest with all this stuff, you just make a per-page manifest. And that changes the property: now your pages are isolated from each other, and there are no hash dependencies between them. You can change one page, or change a module on one page, and it only invalidates one of the HTML files. That is a simple change, but it has a big impact. You still have the property that the initial page load is fast and cached and content-hashed, but now you have independent pages. That has benefits for long-term caching, but it also has benefits for build caching, where you can just keep HTML files cached between deployments if they don't change. But there's a trade-off. Now you have to do something for client-side navigation.
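The difference between the two layouts can be sketched like this. It is a toy model under made-up names (`Pages`, `chunkFiles`, `htmlWithGlobalManifest`, `htmlWithPerPageManifest`), not the actual implementation: each HTML file embeds a manifest, and the only thing that varies is which chunks that manifest lists.

```typescript
// Toy string hash (djb2 variant); a stand-in for a real content hash.
function toyHash(text: string): string {
  let h = 5381;
  for (let i = 0; i < text.length; i++) {
    h = ((h * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return h.toString(16);
}

// page name -> chunk id -> chunk content (hypothetical build input)
type Pages = Record<string, Record<string, string>>;

// Map each chunk id to its content-hashed file name.
function chunkFiles(chunks: Record<string, string>): Record<string, string> {
  const files: Record<string, string> = {};
  for (const id of Object.keys(chunks)) {
    files[id] = id + "." + toyHash(chunks[id]) + ".js";
  }
  return files;
}

// Global manifest: every page's HTML embeds the chunk list of all pages.
function htmlWithGlobalManifest(pages: Pages, page: string): string {
  const manifest: Record<string, string> = {};
  for (const name of Object.keys(pages)) {
    const files = chunkFiles(pages[name]);
    for (const id of Object.keys(files)) manifest[id] = files[id];
  }
  return "<html data-page=\"" + page + "\">" + JSON.stringify(manifest) + "</html>";
}

// Per-page manifest: each page's HTML embeds only its own chunks.
function htmlWithPerPageManifest(pages: Pages, page: string): string {
  return "<html data-page=\"" + page + "\">" + JSON.stringify(chunkFiles(pages[page])) + "</html>";
}

const pagesV1: Pages = { a: { "chunk-a": "page A code" }, b: { "chunk-b": "page B code v1" } };
const pagesV2: Pages = { a: { "chunk-a": "page A code" }, b: { "chunk-b": "page B code v2" } };

// Only page B changed. Does page A's HTML survive?
const pageAStableGlobal = htmlWithGlobalManifest(pagesV1, "a") === htmlWithGlobalManifest(pagesV2, "a");
const pageAStablePerPage = htmlWithPerPageManifest(pagesV1, "a") === htmlWithPerPageManifest(pagesV2, "a");
const pageBStablePerPage = htmlWithPerPageManifest(pagesV1, "b") === htmlWithPerPageManifest(pagesV2, "b");
console.log({ pageAStableGlobal, pageAStablePerPage, pageBStablePerPage });
```

With the global manifest, page A's HTML changes even though nothing on page A changed; with the per-page manifest, only page B's HTML is invalidated. The sketch leaves out the trade-off: a page no longer knows the chunk file names of other pages, so client-side navigation has to get them some other way.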