
How to Simplify your Codebase

Large, legacy codebases often suffer from tangled dependencies, lack of modular boundaries, and monolithic "barrel" files that bundle together many modules. This makes the codebase difficult to understand, modify, and scale. In this talk, we'll explore strategies for "untangling the barrel" and simplifying a complex codebase to prepare it for migration to a monorepo architecture.

We'll cover techniques for:

- Analyzing your code for cyclic dependencies

- Tools to help refactor the code

- Establishing coding guidelines and automation to control codebase complexity going forward

This talk was presented at Node Congress 2025.

FAQ

'Crappy texture' refers to a problematic software architecture where there are no clear module boundaries, leading to code repetition, cluttered flows, and difficulties in making changes and refactoring.

Common signs include unclear module boundaries, code repetition, cluttered flows, slow developer velocity, difficulty in refactoring, and unexpected bugs.

Reasons include time pressure, changing requirements, incomplete understanding, technological changes, and human factors where developers lack perfect knowledge.

Dead code increases cognitive load, creates confusion, adds maintenance burden, and can negatively impact performance and application size.

A barrel file is an index file from which multiple functions are exported, simplifying imports but potentially increasing import surface and cyclic dependencies.

Tools like linters can help identify unused code, but for exported functions, programmatic approaches using TypeScript's AST through a library such as ts-morph can be more effective.

Madge is a tool that can find circular dependencies in a codebase and generate dependency graphs to help visualize them.

A good layered architecture ensures modularity, clarity, faster development, easier maintenance, and scalability.

Improving architecture can involve eliminating dead code, breaking cyclic dependencies, and reducing reliance on barrel files for clarity and efficiency.

Cyclic dependencies cause tightly coupled code, increase complexity, make refactoring difficult, and can lead to runtime errors.

Tally Barak

35 min

17 Apr, 2025


Video Summary and Transcription

This talk focuses on the challenges of working with large codebases, such as unclear module boundaries, code repetition, and cluttered flows. Dead code is identified as a major problem that adds clutter and cognitive load for developers. The abstract syntax tree (AST) is introduced as a way to access code systematically. ts-morph is recommended for finding and removing dead code by traversing and manipulating the AST. Cyclic dependencies are discussed as another issue, and Madge is suggested as a tool for identifying circular references. Barrel files are presented as index files that hide a module's internal structure but enlarge the import surface and raise the chance of cyclic references. The process of replacing imports and removing empty import declarations is explained. The key takeaways include the importance of a good layered architecture, eliminating dead code, breaking cyclic dependencies, and reducing barrel files.


1. Understanding the Problem with Large Codebases

Short description:

This talk is aimed at people working on large codebases. The signs of bad architecture include unclear module boundaries, code repetition, and cluttered flows. These issues make it difficult to make changes, slow down developer velocity, and lead to unknown bugs.

So this talk, guys, is aimed at people who are working on large codebases. Large codebases that have been around for a few years, and were probably worked on by multiple developers, might look something like this. And then you go to your large codebase and you make a minor change that hopefully shouldn't do anything. But this is what happens. Everything crashes. Your CI is red, your tests are failing, maybe the app doesn't build and so on.

And the technical term that describes this situation is called crappy texture. And this is a sign that there's a problem with the architecture of your application. What are the signs that we see for bad architecture? The first one, and I think the most critical one, is that there are no clear module boundaries. We are not sure what each module is doing, what it is responsible for, and so on. There is a lot of code repetition. Just because it's not clear what is doing what, you might find yourself writing the same code in different places. Hopefully, it works the same, but more likely it is doing different things. Cluttered flows.

If you're trying to understand a flow in the system, you might find yourself browsing through tens of files and trying to understand what is calling what, how it goes, where the ifs are and so on. And that makes it very hard to make changes. It slows down developer velocity. It is hard to refactor. If you want to make a change, even just to make something slightly better, you find yourself yak shaving and you can't really make the change. And eventually, you get a disease that we all suffer from: bugs that you have no idea where they came from.

2. Addressing the Problem of Dead Code

Short description:

Time pressure, changing requirements, incomplete understanding, technology maturity, and human factors contribute to the problem of large codebases. Instead of ripping everything off and starting from scratch, I suggest addressing specific issues. In this talk, I will focus on the problem of dead code, which adds clutter and cognitive load to developers.

Why is this happening? What are the reasons? The first one, and I think the most common one, is time pressure. You know the term "let's make it quick and dirty". I can tell you from a lot of years of experience in software: it is never quick and always dirty, and it usually brings tech debt that takes a long time to fix. Requirements change. This is normal. This is part of the regular flow of software development: you get a new requirement and then it changes. We no longer need the previous one, but maybe we don't have time to rethink the whole way that we implemented it. So we just patch the software with different things in different places, saying, okay, this is what we are going to do. Incomplete understanding.

This is also happening. When you start, there's a new feature. Maybe you are still a relatively young company. You don't fully understand what the feature is going to be and what abstractions we need in order to make it clear. Technology matures. You have seen that. You start with a certain version of, let's say, Node and then it grows and all of a sudden you have ESM and you have async/await and that sort of thing, but obviously you don't have the time to go back and refactor the whole codebase. And the last thing is, let's call it the human factors. Developers are probably not born with all the knowledge to do things the correct way, to write the code perfectly, and over time we mature and we see more and more things that we know how to improve.

Okay, so yeah, we have a problematic codebase, Tally. What do you suggest we do? Should we just rip everything out and rewrite from scratch? Well, that could be nice, but it's very likely that you're not able to do that. So instead, let me suggest something else. My name is Tally Barak. I'm a software architect at Ubiquit and this talk is called Lock, Stock and Barrel. The name will reveal itself later on, and the talk is about restructuring large codebases. I'm going to tackle three problems that can help you with creating a better architecture for your application, and the first one is going to be about dead code. Dead code is code that is never executed. It exists in the system because someone put it there five years ago or so and no one has noticed that it is no longer being used. Why is that a problem? Because again, this is clutter. It creates a cognitive load on the developers.

3. Understanding Dead Code and the AST

Short description:

Dead code creates confusion, maintenance load, and impacts performance and size. An example of dead code is a function that is never used, even if it's exported. To systematically access our code, we use an abstract syntax tree (AST).

It creates a lot of confusion. You're trying to do a search for something and you find five functions that are answering it and then you start inquiring only to find out three hours later that this code is actually never used. That's obviously also a maintenance load for the same reason you're going and trying to fix something and then you realize, well, this is not doing anything. And also performance and size. If you're bundling or if you need to build and so on, the dead code is there. Your bundler or your application, your runtime is looking at it. It's trying to interpret it. But all of that is being done for nothing.

Let's look at an example of what dead code looks like. Okay? So let's assume I have this file, somefile.ts. It can also be JS. I have a function foo and this function is doing something, whatever. Somewhere down in the same file I'm calling foo, maybe in multiple places and so on, and everything looks great. The next thing I do, as part of the maintenance, is the code changes and I remove these instances. Now this is easy because if I have linting on my application, it will actually yell at me and say, look, this is unused. You should remove it. So that works great. That will eliminate some dead code. But what happens in this case? What happens if this function is being exported? Even if it's not used in the file, my linter will be totally silent and will not tell me whether it is used or not. Okay? Because it doesn't know. Maybe it's used in another file. How do we fix it?
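The situation described above can be sketched roughly as follows. The file name and function bodies are placeholders invented for illustration; only the shape of the file matters.

```typescript
// somefile.ts (hypothetical file; the bodies are placeholders)

// Exported, so an in-file lint rule such as ESLint's no-unused-vars stays
// silent: as far as the linter knows, another file might import foo.
export function foo(): string {
  return "doing something";
}

// The calls to foo() that used to live in this file were removed during
// maintenance. Nothing in this file references foo any more.
export function main(): string {
  return "main no longer calls foo";
}
```

If foo were not exported, the linter would report it as unused as soon as the last local call disappeared. The export keyword is what hides it.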

4. Using AST and ts-morph to Find Dead Code

Short description:

In order to access our code systematically, we use the Abstract Syntax Tree (AST), which is the basis for every compiler. It represents our code as a tree of different items, such as function declarations, export keywords, identifiers, blocks, and more. Using an AST viewer, we can explore the details of each item. However, working with the AST directly can be challenging due to its tree structure. To simplify this, we can use a library called ts-morph, which provides APIs for traversing and manipulating the AST. Let's import Project from ts-morph to write the function that will help us find dead code.

In order to access our code systematically, we are using something that is called AST. This is the basis for every compiler. It's an abstract syntax tree. And this is the way our compiler or language service or everything is understanding our code. So at the left, I have a very simple function, which basically doesn't do anything. And I use a TypeScript AST viewer. By the way, every time I'm talking about TypeScript, because it is a superset of JavaScript, you can also use it for JavaScript. It doesn't go the other way around. But if you're using JavaScript, it's perfectly fine. So this is my function. And this is the way the compiler sees it. It sees it as a tree of different items. So we have function declaration, export keyword, identifier, block, string literal, and so on. Everything is defined here. And if we focus on a certain item, in this case I'm focusing on log, which is an identifier, I can see a lot more details about exactly where it starts, where it ends, how many children it has, and so on. And you can use this abstract syntax tree. This is something that the TypeScript compiler, for example, is using for emitting, and we also have that for JS. But it is very hard to work with. It's a tree. And we know that when we need to traverse a tree, we need to visit all nodes and so on. And for that, we have some beautiful magic. This is a library that is called ts-morph. This is the URL where you find it. And it's actually an abstraction over the APIs that are used to traverse the AST: setup, navigation and manipulation of the TypeScript AST. As I said, also JavaScript. And this library wraps the compiler API. So cool. Let's use ts-morph in order to write the function that is going to find dead code. So the first thing we do, we need to import Project.
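To make the tree view concrete, here is a minimal sketch that prints the kind of every node in a one-line function, using ts-morph with an in-memory file system. The file name and the sample function are assumptions made for this illustration; they are not the speaker's slide.

```typescript
import { Project } from "ts-morph";

// In-memory project, so the sketch does not touch the disk.
const project = new Project({ useInMemoryFileSystem: true });
const sourceFile = project.createSourceFile(
  "example.ts", // hypothetical file name
  `export function log() { console.log("hello"); }`,
);

// Walk every descendant node and record its syntax kind.
// The callback uses a block body on purpose: returning a value from it
// would stop the traversal early.
const kinds: string[] = [];
sourceFile.forEachDescendant((node) => {
  kinds.push(node.getKindName());
});

console.log(kinds.join("\n"));
```

The printed list contains the same item types an AST viewer shows, such as FunctionDeclaration, ExportKeyword, Identifier, Block and StringLiteral.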

5. Finding and Removing Dead Code with ts-morph

Short description:

To find dead code in your project, you can use ts-morph and the AST. Start by adding all the source files you want to check, and ignore any files you don't want to include. Iterate through the source files and identify the functions in each file. Use the findReferencesAsNodes utility in ts-morph to find where each function is being used. If a function has no references, it is likely dead code and can be removed. This process can also be applied to variables. Additionally, AI can be used to write such code modifications, but be aware that by default it tends to use the TypeScript compiler API directly, so ask it to use ts-morph for better results.

We also import SourceFile; it is just a type, but it is imported. So you have SourceFile and Project, and we create a new project, a new ts-morph project.

The next step is to add all the source files that we want to check. So we are adding everything. In this case, again, it's every TS file under src, but obviously you can adjust that to your specific needs. You can have multiple locations; it can take an array. So you get the project with all the source files that you have added. By the way, you can also ignore things. You don't want to check your test files, for example, so you can ignore *.test.ts.
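A minimal sketch of this step, assuming an src folder layout and a *.test.ts naming convention (both hypothetical). It uses ts-morph's in-memory file system so it can run without a real project on disk; in a real codebase you would create the Project normally and point the globs at your own folders.

```typescript
import { Project } from "ts-morph";

const project = new Project({ useInMemoryFileSystem: true });

// Fake files standing in for a real codebase (names are invented).
const fs = project.getFileSystem();
fs.writeFileSync("/src/a.ts", "export const a = 1;");
fs.writeFileSync("/src/nested/b.ts", "export const b = 2;");
fs.writeFileSync("/src/a.test.ts", "export const t = 3;");

// An array of globs: include every .ts file under src, then exclude tests
// with a negated pattern.
const added = project.addSourceFilesAtPaths(["src/**/*.ts", "!src/**/*.test.ts"]);

const addedNames = added.map((file) => file.getBaseName()).sort();
console.log(addedNames);
```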

Next thing, we are going to iterate through all the source files that we have seen previously. And for each source file, we are going to get all the functions that exist in this specific file. Then we are iterating on the functions in the file. And what I'm doing here is I'm using a great utility. This is the kind of utility that ts-morph excels in, which is called findReferencesAsNodes. And that will give me all the references of where this function is being used. And then, if the length of the reference nodes is zero, meaning there are no references to this code, in this case I just console log and specify which file has the function and what the name of the function is. And later on, you can either automate the removal or you can do some additional checks to make sure that it is correct. It's basically just removing one function, so maybe you want to double check and then manually remove it. So that's an example of how we can get rid of dead code. I just want to mention here that we can do that also for variables, not only for functions. ts-morph has all these utilities. We can check only the exports and not all the functions that are defined in the file, because ESLint will find the local ones. We can enhance it if we are working in an OOP, Object Oriented Programming, kind of way. Then we can extend it also for classes, for methods, for members. And by the way, this is where tree shaking is not doing a great job. So if you have a class with, I don't know, 30 or 40 methods and half of them are not being used, this is a big pain for your performance. And the last thing: I said that AI is not very good at going through your whole code base, but AI is extremely good at writing this kind of function, what is called code modification. So you should use it, but you should be aware that by default it will use the TypeScript compiler API, so you should tell it to use ts-morph and then it will give you a great result.
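The loop described above can be sketched as follows. This is a reconstruction under assumptions, not the speaker's exact code: the function name findUnreferencedExports, the file names and the sample functions are invented, and the sketch only looks at exported function declarations.

```typescript
import { Project } from "ts-morph";

/** Returns "file: functionName" for every exported function with no references. */
export function findUnreferencedExports(project: Project): string[] {
  const report: string[] = [];
  for (const sourceFile of project.getSourceFiles()) {
    for (const fn of sourceFile.getFunctions()) {
      // Unused local functions are already reported by the linter.
      if (!fn.isExported()) continue;
      // findReferencesAsNodes returns the nodes that refer to this function,
      // across all files in the project, excluding its own declaration.
      if (fn.findReferencesAsNodes().length === 0) {
        report.push(`${sourceFile.getBaseName()}: ${fn.getName() ?? "<anonymous>"}`);
      }
    }
  }
  return report;
}

// Tiny in-memory project to try it on (hypothetical files).
const project = new Project({ useInMemoryFileSystem: true });
project.createSourceFile(
  "/src/utils.ts",
  `export function used() { return 1; }
export function unused() { return 2; }
`,
);
project.createSourceFile(
  "/src/main.ts",
  `import { used } from "./utils";
console.log(used());
`,
);

const deadCode = findUnreferencedExports(project);
console.log(deadCode);
```

A result of zero references is a candidate for removal rather than proof: functions reached only through dynamic access, or used by code outside the files added to the project, would also show up here, which is why a manual double check is worthwhile.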

6. Identifying and Resolving Cyclic Dependencies

Short description:

Cyclic dependencies in your code can cause tight coupling, increased complexity, and runtime errors. Madge is a tool that can help you find circular references and generate dependency graphs. Using Madge, you can identify the circular dependencies in your code. Another option is to use the Enforce Module Boundaries ESLint rule in an Nx workspace to report on circular references.

And obviously you can do a lot more. And we will see another example. Okay. So dead code is one problem.

Here is another one which is, I must admit, is even more painful. And this is cyclic dependencies. So this is your architecture, or this is actually a very small portion, because in a large code base you probably have hundreds of packages. And in a good architecture, the calls should be unidirectional. It means that you should call something that calls something, usually it's well organized in layers, so the top layer can see the layers below it, but bottom layers cannot look at the layers above. This also helps with abstractions.

And this looks very good, but then someone goes and adds a call from package 5 to package 3, and as you can see on the right side here, we actually created a circle, a circular dependency. And the circle of trouble is causing the code to be very tightly coupled. It's increasing the complexity inside the code. It's very hard to refactor, and it can also cause runtime errors. This is an example. It's quite simple. You see here that we have two modules: function A is calling function B, and function B is calling function A. Usually this is not how it's going to be in your code; it's probably going to be five different modules.

Okay. Here I have a very short suggestion. Use Madge. Madge is a great tool that can find these circular references. It can also generate graphs of the dependencies in your application, but I have to say that when you have a large code base, these graphs are usually not very usable. You might need to project them on a billboard in order to be able to see something. So here is an example. You run npx madge; -c means find circular. You give it an entry point, it will process the files, and it will point out the cycles. I know there are four, but I just wanted to show one example. So in common services, MongoDB is calling the MongoDB helper, which is calling the config helper, which in turn is calling back the MongoDB file. So that's one option. Another option I'll just quickly mention here: if you're using Nx, and we won't discuss Nx, but if you're using an Nx workspace, it also has a great ESLint rule that is called Enforce Module Boundaries, which will also report on circular references.
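To show what "finding a circular reference" means mechanically, here is a small depth-first search over a hand-written dependency graph. It is only an illustration of the idea and is not how Madge is implemented; the file names mirror the MongoDB example above but are made up.

```typescript
// Maps each file to the files it imports.
type DependencyGraph = Record<string, string[]>;

/** Depth-first search that returns the first import cycle found, or null. */
function findCycle(graph: DependencyGraph): string[] | null {
  const done = new Set<string>(); // fully explored, known to be cycle-free
  const stack: string[] = [];     // current import chain

  function visit(node: string): string[] | null {
    const index = stack.indexOf(node);
    // Seeing a file that is already on the current chain means a cycle.
    if (index !== -1) return [...stack.slice(index), node];
    if (done.has(node)) return null;

    stack.push(node);
    for (const dep of graph[node] ?? []) {
      const found = visit(dep);
      if (found) return found;
    }
    stack.pop();
    done.add(node);
    return null;
  }

  for (const node of Object.keys(graph)) {
    const found = visit(node);
    if (found) return found;
  }
  return null;
}

// Hypothetical graph shaped like the example in the talk.
const graph: DependencyGraph = {
  "app.ts": ["mongodb.ts"],
  "mongodb.ts": ["mongodb-helper.ts"],
  "mongodb-helper.ts": ["config-helper.ts"],
  "config-helper.ts": ["mongodb.ts"],
};

const cycle = findCycle(graph);
console.log(cycle ? cycle.join(" -> ") : "no cycles");
```

Removing the edge from config-helper.ts back to mongodb.ts, for example by extracting what both need into a separate file, makes findCycle return null for this graph.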

7. Resolving Cyclic References and Using Barrel Files

Short description:

To resolve cyclic references, extract the shared functionality and call it from both modules. Barrel files, which serve as index files, can hide internal structure and facilitate refactoring without impacting external consumers. However, they can increase the import surface and lead to performance issues. I used ts-morph to fix an example by extracting functions to a separate module and creating a domain-specific index file. When moving files, be aware of potential issues with IDE-generated imports. To update imports across multiple files, write a runReplaceImports function with ts-morph.

The solution here is, to be honest, a bit complex. So if I have this one, I will probably need to extract the shared functionality and then call this shared functionality from both modules. It's easier drawn than actually done. It can be quite complex to understand what the shared functionality is.

So that was cyclic references. Now let's switch to the barrel file, and this is where the name comes from. Lock, stock and barrel means everything, including everything, and the barrel file is basically an index file from which you export multiple functions. So it will look something like that. I have function A, B, C, whatever, I'm exporting them, and then in your file, you are going to import these functions just from the index file and use whatever you need. When we are talking about barrel files, they have a really good point. They hide the internal structure. So not every module that is importing from this module needs to know the internal structure, and it gives you some freedom to do some refactoring inside without impacting the external consumers. But it also has downsides. The bad part is that it increases the import surface, so you are trying to import a single function and you end up importing 500 files. This can have a huge impact on performance, and because barrel files import a lot of files, you have a better chance of getting cyclic references, just because you are importing so many files.
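A barrel file and its consumer can be sketched like this. Because the example needs several files, they are created inside an in-memory ts-morph project; the module and function names are hypothetical.

```typescript
import { Project } from "ts-morph";

const project = new Project({ useInMemoryFileSystem: true });
project.createSourceFile("/utils/moduleA.ts", "export function functionA() {}");
project.createSourceFile("/utils/moduleB.ts", "export function functionB() {}");
project.createSourceFile("/utils/moduleC.ts", "export function functionC() {}");

// The barrel: an index file that only re-exports its siblings.
const barrel = project.createSourceFile(
  "/utils/index.ts",
  `export * from "./moduleA";
export * from "./moduleB";
export * from "./moduleC";
`,
);

// A consumer that needs a single function still goes through the barrel.
project.createSourceFile(
  "/app.ts",
  `import { functionA } from "./utils";
functionA();
`,
);

// Everything the barrel exposes, and the files it pulls in to do so.
const exportedNames = Array.from(barrel.getExportedDeclarations().keys()).sort();
const filesBehindBarrel = barrel
  .getExportDeclarations()
  .map((decl) => decl.getModuleSpecifierSourceFile()?.getBaseName());

console.log(exportedNames, filesBehindBarrel);
```

The consumer only names functionA, yet the index file references all three modules. With hundreds of re-exports, that is the enlarged import surface described above.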

I want to show an example of how I use ts-morph again in code in order to fix an example. So again, I have here a slightly different version of my imports, going through a different export, and then in my file, I am importing function 1, 6, and 8 from this index file. And now let's say I looked at the code and I said, okay, function 7 and function 8 from module D, I want to extract them to a separate module. I want to take them out of this utils and have them in their own module. So I created a domain D index file with module D, and I export only these functions. I also moved the files. Now, people might come and say, look, when you move the files, your IDE can actually fix all your imports for you. True, but when you are working with large code bases, it's probably going to create very bad references, especially if you are using path aliases. It might use a relative path instead of the path alias. Okay, so now in my file, I want to change the import of function 1, 6, and 8 from utils and split it into two imports, one still from utils and another, with function 8, from domain D. If this is only in one file, you can do that manually, no problem. But the problem is that you probably have hundreds of imports of this utils all over the place, and going one by one can be really tedious. So what do we do? We create a function that is called runReplaceImports, okay, using ts-morph again. It will get four parameters.

8. Replacing Imports and Removing Empty Imports

Short description:

The function receives four parameters: project path, file to change, old specifier, and new specifier. It iterates through the source files, filters the import declarations, removes the specified import, checks if the new import already exists, and adds the desired import. If the old import becomes empty, it is removed.

It will get four parameters. The project path, where do I want to change the file, the name of the import, the old specifier and the new specifier, what function I want to move from where to where. Now I'm creating a new project and importing all the files from the project path like we've seen before. Then I iterate through all the source files, and for each source file, I'm replacing the imports. And later on, I save it and I can run this function. Obviously, this is just the wrapper.

The key here is what do I do in the replace import. So again, let's look at what we want to do. I want to take from some file the utils, and I want to split it to two different imports. So, and here at the bottom you can follow what I'm doing in the code. So the first thing I do is I get the import declarations from each file. I'm going through each file and get the import declarations. I'll probably get a lot of imports. I'm sorry about that. And then all I want to do is I want to filter them. I want to filter the import declarations to find just the one with the old specifier. Our old specifier is utils, and that means I will get only this line here.

And now for each import declaration, what I'm going to do is find the named import, the import that I want to move. In our case, this is function eight. If I found it, because the file might not have it, I'll get the name and I'll remove it from the import. So I will get something like that. Function eight was removed. Next, I want to create a new import. I want to check if the new import already exists. So I'm checking the new specifier to see if it already exists. If I find it, all I need to do is add the import that I want to move, function eight again. I just want to add it. So you can see here that I already had the sum function, and now I got function eight. If it doesn't exist, what I do is I actually add a new import declaration, function eight from domain D, and I have my new import. And the last thing I want to do here: if the old import declaration, the one that I got from utils, now has named imports of length zero, meaning it is an empty one, I want to remove it so I will not have empty imports in my file, and it will be gone.
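Put together, the steps above might look like the following sketch. It is a reconstruction under assumptions rather than the speaker's code: here runReplaceImports takes an already created Project instead of a project path so the example can run in memory, and the file names and module specifiers are invented.

```typescript
import { Project, SourceFile } from "ts-morph";

/** Moves the named import `importName` from `oldSpecifier` to `newSpecifier` in one file. */
function replaceImport(
  sourceFile: SourceFile,
  importName: string,
  oldSpecifier: string,
  newSpecifier: string,
): void {
  // Keep only the import declarations that point at the old module.
  const oldImports = sourceFile
    .getImportDeclarations()
    .filter((decl) => decl.getModuleSpecifierValue() === oldSpecifier);

  for (const oldImport of oldImports) {
    const namedImport = oldImport
      .getNamedImports()
      .find((spec) => spec.getName() === importName);
    if (!namedImport) continue; // this file does not import the moved function

    namedImport.remove();

    // Reuse an existing import from the new module if there is one.
    const existing = sourceFile.getImportDeclaration(
      (decl) => decl.getModuleSpecifierValue() === newSpecifier,
    );
    if (existing) {
      existing.addNamedImport(importName);
    } else {
      sourceFile.addImportDeclaration({
        moduleSpecifier: newSpecifier,
        namedImports: [importName],
      });
    }

    // Drop the old declaration if nothing is left in it.
    if (
      oldImport.getNamedImports().length === 0 &&
      !oldImport.getDefaultImport() &&
      !oldImport.getNamespaceImport()
    ) {
      oldImport.remove();
    }
  }
}

/** Wrapper: applies replaceImport to every source file, then saves. */
function runReplaceImports(
  project: Project,
  importName: string,
  oldSpecifier: string,
  newSpecifier: string,
): void {
  for (const sourceFile of project.getSourceFiles()) {
    replaceImport(sourceFile, importName, oldSpecifier, newSpecifier);
  }
  project.saveSync();
}

// Hypothetical files: one splits its import, one reuses an existing import.
const project = new Project({ useInMemoryFileSystem: true });
const fileA = project.createSourceFile(
  "/src/a.ts",
  `import { function1, function6, function8 } from "./utils";\n`,
);
const fileB = project.createSourceFile(
  "/src/b.ts",
  `import { function8 } from "./utils";\nimport { sum } from "./domainD";\n`,
);

runReplaceImports(project, "function8", "./utils", "./domainD");
console.log(fileA.getFullText());
console.log(fileB.getFullText());
```

In a real codebase the specifiers would more likely be path aliases than relative paths, and aliased imports (function8 as f8) would need an extra branch that this sketch leaves out.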

9. Key Takeaways for Building Large Software

Short description:

We need a good layered architecture for building large software. Clear layers promote modularity, code clarity, and development velocity. The three tips mentioned are eliminating dead code, breaking cyclic dependencies, and reducing barrel files. Thank you.

So, quick takeaways from this talk. We want to have a good layered architecture. This is the way to build large software. We want to make sure that we have clear layers. Each one sees the one beneath it, but is unaware of what is above it. Good architecture helps modularity, clarity of the code, and development velocity, since we can quickly add more items, and of course it is easier to maintain and to scale.

And the three tips that we mentioned: eliminate dead code and you'll get a leaner code base, break cyclic dependencies for modularity, and reduce barrel files for clarity and efficiency.

And that's it. Thank you so much, guys. Thank you for attending my talk. Bye.