Video Summary and Transcription
This talk discusses scaling a React app without micro-frontends and the challenges of a growing codebase. Nx is introduced as a tool for smart rebuilds and computation caching. The importance of libraries in organizing code and promoting clean architecture is emphasized. The use of caching, Nx Cloud, and incremental builds for optimization is explored. Updating dependencies and utilizing profiling tools are suggested for further performance improvements. Splitting the app into libraries and the benefits of a build system like Nx are highlighted.
1. Scaling React App Without Micro-Frontend
Welcome to my talk on scaling your React app without micro-frontends. I'll discuss the problem of scaling up a codebase, my journey to fixing it, and what I've learned along the way. When a codebase grows, the CI becomes slower, causing a slower feedback loop for developers and unhappy users. In my case, the build time reached over 30 minutes, which was unacceptable. I tried over-engineering the CI, but it became difficult to manage. Then I discovered Nx and learned how to use it properly to only rerun things that have changed.
Hi everyone, welcome to my talk on scaling your React app without micro-frontends. Very quick disclaimer first: this talk is about the scalability of your codebase from a developer's perspective, not about user-facing performance. That's it for the disclaimer.
Hello again, I'm Jonathan Wagner. I'm an engineering manager at Theodo UK, and I've been working for the past four years on a bit more than ten projects in production. For the last eight months I've been working with Nx, a build system that helps you build and do things faster. That's going to be the core of the talk here. Let's have a look at what we'll talk about.
First of all, I'll discuss the problem a bit: what happens when you scale up a codebase. Then my journey to fixing it and what I've learned along the way, what I've tried, what didn't work, and what worked. So first of all, the problem. As I mentioned, when a codebase grows, your CI becomes slower. A slower CI means a slower feedback loop, which means your developers are unhappy, it takes longer to develop features, and in the end your users are unhappy, which we definitely don't want.
When your codebase grows, you have type checking, ESLint, maybe some dead-code checks, unit tests, end-to-end tests, a lot of testing, and then the build time. Everything takes a bit of time, and it adds up. In my case it added up to more than 30 minutes, and that's my trigger: ideally I want my CI to take 10 or 15 minutes, so when it reaches 30 minutes, something has gone terribly wrong and I want to address it. In the worst case, we cannot even see the build time; it just skyrockets to three or four hours, which means it's costing the company a lot of money and is frustrating for everyone.
Let's look at a more precise example. We have a growing codebase, which means we may have maybe 800 tests. Imagine you have a pull request with a one-line change: you push it, and the CI runs everything again, so you have to wait 20, maybe 30 minutes for the tests to pass. Does it sound normal to you that a one-line change triggers 20 minutes of CI? I don't think so, and Nx doesn't think so either. The Nx docs actually say that the duration of the invoked operation should be proportional to the size of the change, which is a strong statement. It seems simple, but it's quite tricky to put in place. It involves a bit of caching, a lot of caching actually, and then stitching all of that together properly. But I didn't know that at first.
So obviously I tried to just over-engineer my CI. That meant a lot of parallelisation, adding rules to skip frontend or backend tests depending on what was changed, which meant writing a lot of custom rules that were tricky and introduced a few regressions, and even switching the TypeScript compilation to something based on Rust, like SWC. Doing all of this, I got some improvements, down to about 20 minutes, but it left me with a tricky CI to manage. We had 27 different jobs, and understanding how they all played together, and which ones were conditional on which, became tricky to manage and maintain. So that's the starting point for where I wanted to go afterwards. We had spent hours and hours optimising the CI. Where do we go from there? How do we make it faster without over-engineering it more? Here comes the journey: discovering Nx, learning how to use Nx properly, what the secret trick is, and how the caching stitches everything together. As I said before, the main idea we want to aim for is only rerunning things that have changed.
2. Nx and the Concept of Libraries
Nx does smart rebuilds, only on the things that have changed and have been affected. It uses computation caching and helps you generate things so you don't have to do them by hand every time. The secret trick is to use libraries everywhere. A library is simply a folder where you can execute operations specific to the code it contains. Libraries help organize the code in the folder structure and can be easily generated. When a change occurs, Nx only tests and relints the affected projects, following the core principle of testing only what has changed. Libraries also promote clean architecture by forcing developers to consider where to put their code, and they prevent spaghetti code.
That's basically what Nx does. It does smart rebuilds, only on the things that have changed and have been affected. For that, it uses computation caching, and it helps you generate things so that you don't have to do them by hand every time. Nx was actually already set up on the project, but we weren't using it as a build system; we were just using it for the monorepo management. When I realized how much more it could do, that was the open door to so much more.
So, now that we know Nx can do all of this, how does it actually work? I've mentioned a secret trick. It's actually not so secret; it's advertised everywhere in the Nx documentation. The idea is to use libraries everywhere. You might wonder: what is a library in Nx? It's simply a folder where you can execute a few operations, like testing, linting, basically anything you want, scoped to the code it contains. If we look at our project, for example, at the top we had the frontend, which is the app that actually gets deployed and used by our users, and everything below it is libraries. The app uses the libraries, it gets deployed, and the libraries are a way of organizing the code in your folder structure. You can generate them easily, so creating a new one is just a single command and doesn't cost you much. What we can also see here, in pink, are the projects that have been affected by a change. In this case, I made a change in the design system, and it shows me that a few libraries have been affected, and the app as well. So when I run my tests or my lint, it's only going to retest and relint those pink projects, the affected libraries and the app. It's not going to touch anything else, because nothing else has changed, so nothing else needs testing. That's the beauty of it. That's the core principle.
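The "affected" principle described above can be sketched in a few lines of shell. This is a toy illustration, not Nx's actual implementation, and all project names are made up: a change marks the changed library plus everything that depends on it, and only those projects would get retested.

```shell
#!/bin/sh
# Toy "affected" detection: given a dependency graph and a changed library,
# mark the library and everything that (transitively) depends on it.
set -e
changed="design-system"

# dependency edges, one per line: "<project> <library it depends on>"
deps='frontend ui
frontend domain
ui design-system
domain utils'

affected="$changed"
# propagate upwards: anything depending on an affected project is affected too
# (three passes are plenty for this small graph)
for _ in 1 2 3; do
  for a in $affected; do
    dependents=$(printf '%s\n' "$deps" | awk -v d="$a" '$2 == d { print $1 }')
    for p in $dependents; do
      case " $affected " in
        *" $p "*) ;;                      # already marked
        *) affected="$affected $p" ;;     # newly affected
      esac
    done
  done
done

echo "affected: $affected"
```

Here `utils` and `domain` stay untouched, so their tests and lint would simply be skipped.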
The concept of libraries might still be a bit blurry, so let's look at what it means concretely. This is an example repository from Nx, where you have a couple of apps, cart and products, and then a few libraries. If we zoom in on a library, it's actually just an ESLint config, a Jest config, a TypeScript config, and then all your source code in the src folder. It's as simple as that; there's not much more, and it's everything you need to be able to run your operations in each of those folders. The side bonus you get from using libraries is that it forces you to have a clean architecture, because you have to ask yourself the question: where should I put my code? Should it go into an existing library, or should I create a new one? Simply asking that question triggers a discussion, makes you think about it and put the code in the right place, and it saves you time and prevents spaghetti code in the future.
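The on-disk shape of such a library can be sketched by hand. This is purely illustrative: the file names follow typical Nx output, but in a real workspace a generator (for example `nx g @nx/react:library products`) creates all of this for you.

```shell
#!/bin/sh
# Hand-made sketch of what an Nx library folder looks like on disk.
# In a real workspace a generator creates this; here we only create the
# files to show the shape.
set -e
mkdir -p libs/products/src/lib

cat > libs/products/project.json <<'EOF'
{ "name": "products", "sourceRoot": "libs/products/src" }
EOF
touch libs/products/.eslintrc.json     # lint config scoped to this library
touch libs/products/jest.config.ts    # test config scoped to this library
touch libs/products/tsconfig.json     # TypeScript config for this library

# the public entry point re-exports whatever the library wants to expose
printf "export * from './lib/products';\n" > libs/products/src/index.ts
printf "export const products: string[] = [];\n" > libs/products/src/lib/products.ts

find libs/products -type f | sort
```

Because each library carries its own lint and test configs, operations can run per folder, which is exactly what makes the affected-only reruns possible.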
3. Caching, Nx Cloud, and Incremental Builds
Nx provides advanced caching that looks at source files, operations, and runtime options. You can store the cache in your CI, but that isn't optimal and it cannot be shared with local developers. Nx Cloud, developed by the Nx team, sets up caching and task execution optimization easily. Incremental builds allow reusing library outputs, but I struggled to achieve the expected performance: building from source took about 60 seconds, and adding incremental builds added 20 more seconds. A custom webpack config may have affected the results.
So Nx is basically making you win time, but for this concept of libraries to work, as we saw, you need to know that something has already been built, and you need to store that information somewhere. That's where caching comes in. Nx provides you with some advanced caching that looks at all the source files, the operation you're running (if it's test, the key includes test; it could be build or anything else), the app or library you're targeting, and then any other runtime options or configuration you have for React, Jest, or something else. In effect, Nx has a big table mapping a hash key to the output it's supposed to give you, both in terms of files and in terms of console output in your terminal.
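That hash-table idea can be sketched in a few lines of shell. This is a toy, not Nx's real cache: the key hashes the task name plus the input file, and on a hit the stored output is replayed instead of rerunning the task.

```shell
#!/bin/sh
# Toy computation cache in the spirit of Nx (illustrative only).
set -e
mkdir -p demo .computation-cache
printf 'hello world\n' > demo/src.txt

run_task() {
  task="$1"
  # cache key = hash of (task name + input contents)
  key=$( { printf '%s\n' "$task"; cat demo/src.txt; } | sha256sum | cut -d' ' -f1 )
  if [ -f ".computation-cache/$key" ]; then
    echo "cache hit: $task"
  else
    echo "cache miss: $task (running it)"
    tr 'a-z' 'A-Z' < demo/src.txt > ".computation-cache/$key"  # stand-in "build"
  fi
  cat ".computation-cache/$key"
}

run_task build   # first run: miss, the task actually executes
run_task build   # same inputs, same task: hit, output replayed from the cache
```

Editing `demo/src.txt` would change the hash, so the next run would be a miss and the task would execute again, which is exactly the "proportional to the size of the change" behavior.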
Knowing all of this, I knew there was some caching. We had a cache folder, and I thought to myself: okay, can I put the cache in my CI and use it just like that? The answer is yes, you can do that. It's, I guess, the first step you can try, but it means you download the whole cache every time you open a pull request. In some cases it can grow and grow, up to a gigabyte, which means it takes one or two minutes to download and unzip, and sometimes you spend those two minutes downloading and unzipping only to find that you don't even need the cache, because it's not relevant to your pull request. So it's not that optimized, it cannot be shared with local developers either, and it doesn't give you distributed task execution. I'm mentioning all this because that's exactly what Nx Cloud does. Nx Cloud has been developed by the Nx team as well, and it sets all of this up very easily: you just have to run a command in your terminal, it prepares everything, you don't have to sign up, and it just works in the CI. It also works locally: whenever you build something and another developer tries to build the same thing, they get the same cache, because it's shared between developers, and it optimizes the order in which tasks run so that everything gets faster. So basically, you should use Nx Cloud. It has amazing features, it's easy to set up, and it brings so many benefits for so little effort.
So we've looked at the libraries, and we've looked at the caching that stitches everything together. Now let's look at what I tried and where I actually struggled. Here come the learnings. I've played a bit with incremental builds and with trying to split an existing large codebase, so let's have a look at what happened. The concept of an incremental build is that you want to reuse the outputs of your libraries. The default behavior is that when you build your frontend app, you use a tool like webpack with Babel to transform your TypeScript files into JavaScript, then that JavaScript into another version of JavaScript your browser understands, then perform some minification to make it smaller and some tree shaking to remove what you don't use. That's the bundling that happens inside webpack, so that you get a small package you can ship to your users. The new behavior suggested by Nx's incremental builds is that instead of building everything when you need it, you build each library incrementally. The first time, you build all your dependencies; then, when you make a change, say, to the design system, you rebuild the design system, the UI, the web testing and frontend domain libraries, reuse the previous builds of everything else, and bundle it all back together in the frontend app. It's supposed to go faster, and that's what Nx shows in its documentation with a graph comparing the build time for a normal build, a cold run of the incremental build, and a warm run of the incremental build. I tried to set this up, and I didn't quite get the same results.
Basically, building from source took about 60 seconds, cold or warm, no difference, and with the incremental build I got 20 more seconds. That seems somewhat normal for a cold run, because all the libraries have to be built first, but I expected the warm run to skip rebuilding the libraries and only rebuild the app, so I expected it to be much faster. I looked into it a bit more and discovered a few things. In our configuration we had a custom webpack config where we were setting up some extra Babel loaders, some CSS loaders, and others. The assumption I made is that I was building my libraries independently beforehand, and then, when building the app, it was taking the output of my libraries and building that again.
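The incremental idea itself, build each library once and reuse its output until its sources change, can be sketched with a simple timestamp check. The file names and the `tr` "compilation" step are stand-ins, not how Nx actually builds anything.

```shell
#!/bin/sh
# Toy incremental build (illustrative): a library is rebuilt only when its
# source is newer than its output; the app then bundles the prebuilt outputs
# instead of compiling every library from source each time.
set -e
mkdir -p libs/ui libs/utils dist
printf 'button\n' > libs/ui/src.txt
printf 'format\n' > libs/utils/src.txt

build_lib() {
  lib="$1"; src="libs/$lib/src.txt"; out="dist/$lib.out"
  if [ -f "$out" ] && [ ! "$src" -nt "$out" ]; then
    echo "reusing prebuilt $lib"
  else
    echo "building $lib"
    tr 'a-z' 'A-Z' < "$src" > "$out"   # stand-in for the real compilation
  fi
}

build_lib ui; build_lib utils    # cold run: both libraries get built
build_lib ui; build_lib utils    # warm run: both outputs are reused
cat dist/ui.out dist/utils.out > dist/app.bundle  # app consumes prebuilt outputs
```

The failure mode described above corresponds to the bundler ignoring `dist/` and recompiling `libs/` from source anyway, which is why the warm run showed no gain.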
4. Optimizing Build Time and Profiling
Doing everything twice didn't provide the expected benefits, so I explored other options. Updating our dependencies, specifically Browserslist, significantly reduced build time, from 12 minutes to six. Additionally, Nx's profiling tool helped identify areas for further optimization, such as splitting the design system into two separate design systems.
So it was basically doing everything twice, hence the extra time we saw here. I tried to do something smart and simply exclude the libraries' output folder from the Babel loaders so it wouldn't build them again, and I saw some improvement: around 15 seconds between the first incremental try and the second one. That's a good start, but given the time I invested in making all the libraries buildable, understanding the problems we had with webpack, and trying to optimize it, it doesn't seem like a good investment: I spent hours on it, had build errors everywhere, and barely saw any benefit.
I'm sure my webpack config could be optimized much better, and I could surely get it to look like what Nx has, but is it really worth hours and hours of further optimization? In my opinion, not yet. Maybe at some point, but for now we have other ways to make our build faster. One of them, which we discovered along the way, was simply updating our dependencies. There's a funny little story here about Browserslist, which uses the Can I Use data to decide which polyfills are needed for the browser versions you target. Updating Browserslist updated that data, which basically said: we don't need those polyfills, we don't need to build for ES5 anymore, we just need to build the ESM format. That took our build time down from 12 minutes to six. It took us a couple of hours and gave us far greater benefits than the incremental builds I mentioned just before.
Another thing I discovered along the way was a profiling tool from Nx, where you can zoom in on what's happening behind the scenes. When triggering a build of basically everything, you can see all the dependencies, the order in which they run, and what happens in parallel, and it can show you whether you should be running more tasks in parallel or which project you should be splitting. In this case, we can see the design system build is quite big, taking six seconds, so it might be the next candidate to be split in two: one design system specific to forms, and another for everything else. It gives you insight into what you could refactor next. So yeah, a very good tool.
5. Splitting Your App and Benefits of a Build System
To split your app effectively, aim for about 20% of the code in the app and 80% in libraries. Starting with a new project makes creating libraries easy, but it becomes challenging with an existing project, where moving components one by one and updating imports is crucial. Adapt the splitting process to your teams, starting with one library per team and letting it grow. Adding Nx to a large project takes time and effort, but it's worth it. Having a build system like Nx takes the pain out of refactoring and improves CI. Start splitting your app now for easier management. Contact me for any questions.
Next up is how to split your app. I mentioned having lots of libraries. It sounds very simple put like that, but what kind of proportion should we be aiming for? Nx actually recommends having about 20% of your code in the app and 80% in your libraries. What that means is basically: put as much code as possible into libraries, and let the app stitch them together. It's a bit like what you do when you use external libraries from npm, except in this case it's your own code, so it's a bit safer.
That sounds very nice put like that, but how does it look when you actually do it on your project? There are two scenarios. When you start a new project, you can create libraries very easily, and there's a command to automate it. It's very simple. You can create libraries for features, for your UI elements, for data access, for utilities, as many as you like. You could have a thousand libraries and it would still be fast, because you only need to rebuild a tiny proportion of the code when you make a change. But it becomes much harder when you have an existing project.
Let's say you have a few hundred thousand lines and you want to start splitting them up. I tried to do that. I had a flow of a few pages that I knew I wasn't touching, yet it was being tested and built again every time I made a change, so I wanted to take it out and not test it again, since I knew it wouldn't change. I tried just creating a new library and moving the code into it, and then I saw errors everywhere. It wouldn't build, it wouldn't compile; there were imports from the library back to the app, which meant cyclic dependencies, which meant problems everywhere. It made me realize that if you want to split your code, you have to split the leaves of your dependency tree first. So instead of moving whole pages, I had to move the components one by one, and every time you move one, you have to update the imports, fix everything, make a commit, and make sure everything still works. It's not a simple job of creating two libraries, splitting your code in two, and benefiting from Nx straight away; you have to go at it bit by bit. My recommendation is to adapt it to your teams. If you have domain teams, for example, think of the Spotify app, where you have podcasts, radios, and playlists, each containing a lot of different features, the first step would be for each team to work on its own library. They might have some overlap, they might later have more than one library, but the idea is that they start with just one, and then it improves and grows. So yeah, when you add Nx to an existing large project, you get some benefits straight out of the box, but to get the full power takes a lot of time. It doesn't come just by snapping your fingers; it's going to be a lot of refactoring, a lot of effort, and a lot of training for your teams. Just be aware of that.
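The "move the leaves first" process can be sketched mechanically. Everything here is illustrative: the paths, the `@myorg/shared-ui` alias, and the `sed` rewrite are stand-ins, and Nx also ships generators that can automate moves like this.

```shell
#!/bin/sh
# Toy version of extracting one leaf component from an app into a library:
# move the file, expose it through the library's entry point, then rewrite
# the app's imports to use the library alias.
set -e
mkdir -p apps/frontend/src libs/shared-ui/src
printf "export const Button = 'button';\n" > apps/frontend/src/button.ts
printf "import { Button } from './button';\n" > apps/frontend/src/page.ts

# 1. move the leaf component into the new library
mv apps/frontend/src/button.ts libs/shared-ui/src/button.ts

# 2. re-export it from the library's public entry point
printf "export * from './button';\n" > libs/shared-ui/src/index.ts

# 3. update every import in the app to point at the library alias
sed -i "s|from './button'|from '@myorg/shared-ui'|" apps/frontend/src/page.ts

cat apps/frontend/src/page.ts
```

After each such move you commit, run the affected tests, and only then move the next component, which is why splitting an existing codebase is slow but steady work.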
As a quick conclusion: having a build system is the next step after optimizing your CI. When your codebase grows, it takes the pain out of refactoring it and improving your CI. Nx is a very good candidate for that. It's not the only one, but in my experience it's been a delight to use. When you have a large app, splitting it is hard, but the sooner you start, the easier it gets. So my advice is: start now, and let me know if you have any issues. Thank you, everyone. Feel free to contact me if you have any questions, and see you in the Q&A.