It could be, for instance, a design system, which is a common one, but also some authentication logic library or things like that, where a lot of our projects depend on it. And so if we change that, we very often hit a worst-case scenario where the affected projects are almost the entire graph, because basically everything depends on that one project. And so what is usually done there is to distribute. Rather than just parallelizing things as much as possible within the same machine, what we're trying to do is distribute them across different machines.
And so here you can see that: distribute them across machine one, machine two, machine three. What you can see here, though, is that it is a very uniform kind of distribution, right? And that's usually what happens if you manually set it up, because distribution is hard; you need to code it somehow. And so a very straightforward, potentially naive way of doing it is: you just cut it up by the different tasks that you have and then run those on the different machines.
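To make that naive approach concrete, here is a minimal sketch of what such a manual split often looks like. The MACHINE_INDEX and MACHINE_COUNT environment variables are hypothetical, and the exact Nx command for listing affected projects depends on your Nx version, so treat this purely as an illustration:

```typescript
// Naive "uniform" split: every machine gets an equal slice of the affected
// projects, regardless of how long each project's tasks actually take.
// MACHINE_INDEX / MACHINE_COUNT are hypothetical env vars set per CI job.
import { execSync } from 'node:child_process';

const machineCount = Number(process.env.MACHINE_COUNT ?? 3);
const machineIndex = Number(process.env.MACHINE_INDEX ?? 0);

// Assumed: some way of listing affected projects via the Nx CLI
// (command and flags vary between Nx versions).
const affected: string[] = JSON.parse(
  execSync('npx nx show projects --affected --json', { encoding: 'utf8' })
);

// Round-robin slice for this particular machine.
const mine = affected.filter((_, i) => i % machineCount === machineIndex);

if (mine.length > 0) {
  // Every machine runs the same targets, just on its own slice of projects.
  execSync(`npx nx run-many -t lint test build -p ${mine.join(',')}`, {
    stdio: 'inherit',
  });
}
```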
And so here you can see that the different running times can lead to low efficiency: some tasks might take a long time, so the entire run takes longer, while other machines are already idle because their task, say the linting, was quicker and they're already done. The number of machines is also static. Usually you define the number of machines up front, that number is then fixed, and you need to keep tuning it over time as your monorepo structure grows. And there's also some complexity in spinning up those machines, depending on your project or the CI provider that you're using.
And so very often what I see are scripts that are being run on CI. For instance, Nx has a programmatic API, so you could compute those affected nodes programmatically and then try to shuffle them in some intelligent form, where you distribute them across different machines and even dynamically generate pipelines. This is an example for GitLab, which allows you to dynamically spin up nodes and therefore have some sort of distribution going on. But as you can see, it's a very static thing, because it's hard-coded into your code and cannot really adapt to a changing monorepo structure. So this needs a lot of maintenance.
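As a rough illustration of that kind of custom CI scripting (this is not Nx Agents; the chunking strategy, file names, and job template are assumptions on my side), a generator script for a GitLab child pipeline could look roughly like this:

```typescript
// Hypothetical generator script for a GitLab dynamic child pipeline.
// It asks Nx for the affected projects, splits them into fixed chunks,
// and writes one CI job per chunk into a generated YAML file.
import { execSync } from 'node:child_process';
import { writeFileSync } from 'node:fs';

const MACHINES = 3; // static machine count: the part that needs ongoing tuning

const affected: string[] = JSON.parse(
  execSync('npx nx show projects --affected --json', { encoding: 'utf8' })
);

// Evenly sized chunks, one per machine (naive: ignores task duration).
const chunks: string[][] = Array.from({ length: MACHINES }, () => []);
affected.forEach((project, i) => chunks[i % MACHINES].push(project));

// Build the child-pipeline YAML by hand (a YAML library would also work).
const jobs = chunks
  .filter((chunk) => chunk.length > 0)
  .map(
    (chunk, i) => `
affected-chunk-${i}:
  image: node:20
  script:
    - npm ci
    - npx nx run-many -t lint test build -p ${chunk.join(',')}
`
  )
  .join('\n');

// The parent pipeline would publish this file as an artifact and trigger it
// as a child pipeline (GitLab's generated-configuration mechanism).
writeFileSync(
  'child-pipeline.yml',
  jobs || 'no-op:\n  script: ["echo nothing affected"]'
);
```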
And so having seen some of these difficulties that teams face in creating these custom scripts, we've been looking into what we call Nx Agents, basically to help with that distribution: specifically with setting up the machines, but also with the dynamism of the distribution itself, so that you don't need to tune it continuously as your monorepo structure grows. Instead, the distribution happens dynamically based on the number of tasks being run, but also based on the running time of those tasks, using historical data basically.
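To give a feel for what distributing based on historical running times can mean, here is a small, purely illustrative greedy scheduler. This is not Nx Agents' actual algorithm, and the task names and timings are made up:

```typescript
// Illustrative duration-aware distribution: assign each task to the agent
// with the smallest accumulated run time so far, longest tasks first.
interface Task {
  id: string;
  estimatedMs: number; // e.g. taken from previous CI runs
}

function distribute(tasks: Task[], agentCount: number): Task[][] {
  const agents: { load: number; tasks: Task[] }[] = Array.from(
    { length: agentCount },
    () => ({ load: 0, tasks: [] })
  );

  // Longest tasks first, so big jobs don't end up stacked on one agent.
  for (const task of [...tasks].sort((a, b) => b.estimatedMs - a.estimatedMs)) {
    const leastLoaded = agents.reduce((min, a) => (a.load < min.load ? a : min));
    leastLoaded.tasks.push(task);
    leastLoaded.load += task.estimatedMs;
  }
  return agents.map((a) => a.tasks);
}

// Example: the long e2e run gets a machine to itself, while the shorter
// tasks share the other one, instead of a blind 50/50 split by task count.
const plan = distribute(
  [
    { id: 'app:e2e', estimatedMs: 600_000 },
    { id: 'app:build', estimatedMs: 120_000 },
    { id: 'ui:test', estimatedMs: 90_000 },
    { id: 'ui:lint', estimatedMs: 20_000 },
  ],
  2
);
console.log(plan.map((tasks) => tasks.map((t) => t.id)));
```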
And so Nx Agents obviously tries to leverage the project graph, because it has access to the Nx graph behind the scenes, and so it knows how the projects are structured and what dependencies there are, which is a major point when you distribute, so that tasks can be distributed in a way that is efficient. And you describe what you want to run, not how you want to run it: you just say, I want to run the affected build, lint, and end-to-end tasks in a certain way, but you don't define how they're being distributed.
And one big part that comes with the first version of Nx Agents is the whole dynamic scaling aspect, so spinning up more machines depending on the PR, but also fine-grained distribution and flaky task re-running. Right now we specifically focus on tests and end-to-end tests, but in theory we can detect flaky tasks and automatically rerun them on CI. So let's have a look: what does describing what to run look like? Well, all you need for the distribution to activate is basically this one line, specifically the --distribute-on flag, where you say: I want to distribute the tasks that are run right after this onto 15 machines, in this case of the type linux-medium-js. Those are just names for machine types with certain characteristics, which you can define based on the needs that you have; it's almost like a Docker setup. And then you run the actual commands. And so here you can see how you describe what to run and not how.
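Put together, the CI entry point then roughly boils down to a couple of commands. I'm showing them here as a small Node/TypeScript wrapper because every CI provider has its own YAML syntax; the exact nx-cloud command and flags should be checked against your Nx Cloud version, so this is only a sketch:

```typescript
// Sketch of a CI entry point with Nx Agents: you declare *what* to run and
// on how many machines of which type; the distribution itself is handled
// for you. Command and flag names may differ between Nx Cloud versions.
import { execSync } from 'node:child_process';

const run = (cmd: string) => execSync(cmd, { stdio: 'inherit' });

// Enable distribution onto 15 agents of the "linux-medium-js" machine type.
run('npx nx-cloud start-ci-run --distribute-on="15 linux-medium-js"');

// Then just describe the work; no manual chunking, no per-machine scripts.
run('npx nx affected -t lint test build');
run('npx nx affected -t e2e');
```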