Video Summary and Transcription
This talk explores complexity in code and its impact. It discusses different methods of measuring complexity, such as cyclomatic complexity and cognitive complexity. The importance of understanding and conquering complexity is emphasized, with a demo showcasing complexity in a codebase. The talk also delves into the need for change and the role of refactoring in dealing with complexity. Tips and techniques for refactoring are shared, including the use of language features and tools to simplify code. Overall, the talk provides insights into managing and reducing complexity in software development.
1. Introduction to Complexity
Today, I want to talk about complexity in our code. Our jobs are complex as we have to model real-life things in programming languages. Let's explore what complexity means and how it affects our code.
What's up, everybody? My name is Phil Nash, and I'm a developer relations engineer at DataStax. DataStax is the company behind Astra DB, a serverless vector database that you can use to build your generative AI applications. Again, my name's Phil Nash. If you need to find me anywhere on the internet, I'll be philnash: on Twitter, Mastodon, whatever social network you're using at the moment, even LinkedIn, my goodness. Anyway, what I want to talk to you about today is complexity: complexity in our code. So let's start by talking about what I mean by complexity. You see, our jobs are complex. We often have to model real-life, real-world things through the medium of JavaScript or TypeScript or any other kind of language. The world itself is inherently complex, and then we add to that by trying to turn it into code. And so code itself is inherently complex.
2. Measuring Complexity and Cyclomatic Complexity
Our job is managing complexity, and we need a method of measuring it in our code. Let's examine the complexity of functions through examples. Cyclomatic complexity, invented in 1976, gives a score to a function based on its flow breaks, such as loops and conditionals.
What we want to avoid doing is actually just adding any more complexity to our applications than the problem that we're trying to solve demands itself. And ultimately, this means that our job becomes managing complexity. As long as we can keep on top of this complexity, then our codebases stay sane, easy to understand, and easy to work on over time and over changes of team and all sorts of things like that.
So our job is managing complexity. However, last year at some other conferences, I gave a talk in which I looked into the top five issues in JavaScript projects that could be discovered by static analysis, and in at number two was the fact that the complexity of functions in our projects was just too high. That's a problem, and it's why I wanted to give this talk. So really the question is: what is too high? We need a method of measuring the complexity in our code. How might we go about that? How do we measure complexity?
Well, first of all, if I showed you a piece of code, a function like this, a sum of prime numbers, and asked you how complex it is, have a think about what you might answer and how useful that would be. There is obviously some complexity here. We've got some loops. We've got some conditionals. We're dealing with prime numbers; that's going to be complex. And then we've got this other function, a getWords function. We pass in a number and it returns words like one, a couple, a few, many, or lots. It's a big switch statement. How complex is this? Well, ways to measure complexity have been invented over the years. Back in 1976, cyclomatic complexity was invented. Cyclomatic complexity gives a score to a function based on a couple of things. Mainly it adds to the score when a function has a break in flow, that is, when there is a loop or a conditional, for the most part. And so if we look at our sum of primes function, cyclomatic complexity actually scores one just for being a function. Everything starts at one. Then we score one for the first for loop, one for the second for loop, one for the conditional, and one for the other conditional at the bottom.
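The talk doesn't show the exact code, so this reconstruction is an assumption, but a sum-of-primes function like the one described might look like this, with the cyclomatic increments marked:

```typescript
// Hypothetical reconstruction of the sum of primes function,
// annotated with its cyclomatic complexity increments.
function sumOfPrimes(max: number): number { // +1 for being a function
  let total = 0;
  for (let i = 2; i <= max; i++) {          // +1 first for loop
    let isPrime = true;
    for (let j = 2; j < i; j++) {           // +1 second for loop
      if (i % j === 0) {                    // +1 conditional
        isPrime = false;
      }
    }
    if (isPrime) {                          // +1 conditional
      total += i;
    }
  }
  return total;                             // cyclomatic complexity: 5
}
```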
3. Cognitive Complexity and Nested Scoring
Ultimately, cyclomatic complexity and cognitive complexity produce different outcomes. While cyclomatic complexity measures the number of paths through a function, cognitive complexity considers understandability and how we read code. It increments the score for each break in flow and also keeps track of nesting. The sum of primes function is used as an example to illustrate these concepts.
And so ultimately this function scores five. Complex? Fairly complex. We'll find out. The getWords function, however, also scores one for being a function. And then, because it's a switch statement, it scores one for each case in the switch, and there are four of those cases. So ultimately getWords scores five as well. I'm not sure I would agree that getWords and sum of primes are necessarily equivalent in complexity. So perhaps there's something we can do that's better than cyclomatic complexity.
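A hypothetical reconstruction of the getWords function described in the talk (the exact code is an assumption) shows where cyclomatic complexity adds up:

```typescript
// A switch that turns a count into a word.
// Cyclomatic complexity adds one per case.
function getWords(count: number): string { // +1 for being a function
  switch (count) {
    case 1:                                // +1
      return "one";
    case 2:                                // +1
      return "a couple";
    case 3:                                // +1
      return "a few";
    case 4:                                // +1
      return "many";
    default:
      return "lots";                       // cyclomatic complexity: 5
  }
}
```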
Cyclomatic complexity is useful. It measures the number of paths through a function, and that's really useful if you want to know how many tests you need in order to cover that function and all its potential branches. But it doesn't really cover understandability, how we as humans actually comprehend code as we read it. And so the fine people at Sonar, back in 2016, came up with a score called cognitive complexity. This is a score intended to target how we think about code and how we read it, and to score it that way. Cognitive complexity is scored in a similar way to cyclomatic complexity. It still comes up with a number at the end of the day, and it still increments the score every time you come to a break in flow. But it also keeps a nesting score in the back of its mind, and every time you increment, you increment by one plus the current nesting score. And that comes up with a different outcome.
And so if we look back at our sum of primes function, what we see is that we don't score for being a function anymore; the minimum score here is zero. But we score one for the first for loop, and then we increment the nesting score by one. So when we get to the second for loop, it scores two, and the nesting score increments again. When we get to the conditional, it scores three, because it's nested twice and is a break in flow. Then we'd add one more to the nesting, but there are no more breaks inside there. So we take one off the nesting for dropping out of the conditional, and one off for dropping out of the loop, and we hit the next conditional.
4. Conquering Complexity and Demo
The cognitive complexity of the sum of primes function is eight, while the getWords function scores one. Understanding where the complexity lies is crucial. Thrashing about and building a complex stack of considerations is something we want to avoid. To conquer this complexity, we need to identify where it lies. I will demonstrate this with a quick demo using the Astra DB client written in TypeScript. I added complexity to the insertMany function, exceeding the threshold set by SonarLint.
We hit the next conditional, which scores two because it's nested once. And then we eventually drop all the way out, until the final score is eight. So the cognitive complexity of the sum of primes function is eight, but our getWords function, which I think we agreed earlier is less complex, now only scores one. It's a big switch statement: we have to consider one value and a number of things to do with it, but we're only looking at one value at any one time. And so it scores one. And I think that makes more sense for the way we think about things, the way we understand code. For me, the model of this is like a stack of things in your brain that you have to consider while you're reading bits of code. And as you go through something like sum of primes, you end up adding things to the stack as you go.
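As a sketch (the exact code is an assumption, reconstructed from the talk's description), here is how the cognitive complexity increments land on the two functions:

```typescript
// Cognitive complexity: each break in flow scores 1 + current nesting level.
function sumOfPrimes(max: number): number {
  let total = 0;
  for (let i = 2; i <= max; i++) {          // +1 (nesting 0)
    let isPrime = true;
    for (let j = 2; j < i; j++) {           // +2 (nested once)
      if (i % j === 0) {                    // +3 (nested twice)
        isPrime = false;
      }
    }
    if (isPrime) {                          // +2 (nested once)
      total += i;
    }
  }
  return total;                             // cognitive complexity: 8
}

// The switch counts just once, however many cases it has.
function getWords(count: number): string {
  switch (count) {                          // +1
    case 1: return "one";
    case 2: return "a couple";
    case 3: return "a few";
    case 4: return "many";
    default: return "lots";
  }                                         // cognitive complexity: 1
}
```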
We end up having to know and understand at all times that we have two looping variables and that we're inside a conditional, and that's fairly complex. You can get worse than this, obviously. And then there's some thrashing about as well, as we drop out of the stack and then add things back to it, constantly working out what's on and off our brain stack, what we actually have to consider at this point while understanding the code. And then, once it finally all pops off, we breathe a sigh of relief because we got to the end of the function. But thrashing about and building things up on a brain stack like that is the kind of stuff we want to avoid. So how do we conquer this complexity? That's what the rest of this talk is about. And step one in conquering complexity is understanding where the complexity is. I've given you this score of cognitive complexity, but now we have to be able to actually apply it to our code. So here's a quick demo to show that. What I've done is jump into the open source Astra DB client, written in TypeScript, for dealing with Astra DB instances. It turned out there wasn't actually anything that complex in there, so I had to add a few things. I picked the most complex function and then added more stuff to make it worse, so please don't blame the people who wrote this wonderful library. What I did is look at the insertMany function. And in VS Code here I have SonarLint running, which is what I was using to look at cognitive complexity before. It tells me that this function has a complexity of 34, over the 15 allowed. That's the threshold that SonarLint picks for cognitive complexity. And it's got 13 locations at which complexity is added.
5. Analyzing Complexity and Need for Change
We add things to the stack and analyze the complexity. Understanding the need for change is crucial: complex code only needs attention if it actually has to be modified.
We add things to the stack. And you can see, actually, if we scroll all the way down and hit quick fix, we can hit "show all locations", and it brings up these numbers next to the code where it's adding complexity. So there's this big conditional around the outside: if there are documents that we're dealing with, then we'll carry on. That scores one, but then everything else becomes nested inside it. So this conditional is plus two. This conditional is plus three, because it's nested twice. This one is plus four, because it got nested inside this loop as well. Being able to see this complexity is super useful. Step two, then, is to do nothing. Which I like; I'm a lazy developer, and if I don't have to do anything, that's brilliant. But the point here is that there's no reason to change something complex if you don't actually need to change it.
6. Dealing with Complexity and Refactoring
Complexity is only an issue if you need to change the code. In an old code base, leave complex code that doesn't need to change. Refactoring makes it easier to change complex code. Split change into two stages: refactoring and then making the fix or adding the feature. Tests are crucial to ensure the code's functionality remains the same after refactoring.
That complexity is only difficult for us to deal with when we have to understand and change that code. Complexity is only an issue if you need to change the code. In any codebase, different parts change more often than others. Some of it is actively being worked on, and that is the stuff we need to pay attention to. But in an old codebase, there are probably bits of code that won't ever change. If they are complex but don't have to change, we can leave them be.
It's actually a risk to change code that doesn't need to be changed just for the idea that we're going to make it better somehow. If it already does the job that it's supposed to, then we can leave it until it does need to change. And if it does need to change, I recommend we clean up that code as we go. What do I mean by that? Well, when we find that we have to change some code and we find that it is overly complex, that is a good opportunity to refactor that code first and make it easier to make changes in the future.
So, we can split this up into two things. We have to change the code, to fix a bug or add a feature, but it's too complex. If we refactor it first, it becomes easier to fix that bug or build that feature once it's refactored. Refactoring, I'll remind you, is improving how a piece of code works without changing what that piece of code does. That means we get the same result after a refactor as beforehand, and that's important. It's also important that we are splitting this kind of change into two stages: refactoring first, where what the function does doesn't change, just how it does it; and then later making the fix or adding the feature, which is when its behavior actually changes.
So, how do we know we didn't change the result of a piece of code? Well, hopefully you're already screaming at the screen: tests. We absolutely need some test coverage here so we can have confidence that once we have refactored something, it keeps doing the same thing it did beforehand. It's also important to note that tests in this case must test what a function does and not how it does it. If a test relies on the internal structure of a function or a piece of code, and you change that internal structure, then those tests are going to break even if you didn't break what the function does. So tests must ensure they test only what the function does and not how it does it. And if you don't yet have tests for a piece of code that you need to refactor and change, then your step zero in this process is writing those tests, getting the test coverage.
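As a sketch of what "test what, not how" means in practice, these assertions only exercise inputs and outputs (using the hypothetical getWords function from earlier as a stand-in), so they survive any refactor that preserves behavior:

```typescript
// The function under test. Any pure function works the same way here.
function getWords(count: number): string {
  switch (count) {
    case 1: return "one";
    case 2: return "a couple";
    case 3: return "a few";
    case 4: return "many";
    default: return "lots";
  }
}

// Behavioral assertions: nothing here knows the function uses a switch,
// so rewriting it as, say, a lookup table would not break these tests.
const cases: Array<[number, string]> = [
  [1, "one"], [2, "a couple"], [3, "a few"], [4, "many"], [5, "lots"],
];
for (const [input, expected] of cases) {
  if (getWords(input) !== expected) {
    throw new Error(`getWords(${input}) should be "${expected}"`);
  }
}
```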
7. Refactoring Tips and Techniques
Tests must cover existing behavior. Tips for refactoring code: reduce nesting, invert conditions and exit early, collapse structure, extract helper functions. Examples of JavaScript features to help with refactoring.
And it's important to note again: those tests must cover the existing behavior. If we are fixing a bug in a piece of code but we decide to refactor it first, then those tests should cover all the existing behavior, even if it's wrong. Because if we change the tests and the code at the same time, then we can't know whether the refactor worked. We need tests that ensure the function does the same thing once we've refactored it. Only then do we change the tests and the code to fix the bug.
For the last part of the talk, I wanted to go over a few little tips that will help us refactor code with this idea of complexity and understandability in mind. Cognitive complexity, quite obviously, as a score, punishes nested code. The more things we push onto the brain stack, the more complex things get. And so reducing nesting is going to help you deal with that, pop some of those things off the brain stack, and leave you with less to consider as you read through a function.
The things we're going to cover for this are inverting conditions and exiting early, structural collapse, extracting helper functions, and a couple of more modern JavaScript features that will help us out as well. So, inverting conditions and exiting early. This goes back to the first thing I showed you in that insertMany function, where I said there was this big conditional around the outside checking that the length of the documents array is greater than zero; otherwise it just returns an empty result, really, an inserted count of zero. This big condition around the function, with everything else inside it, obviously increases the nesting. If we invert the condition, that is, if the documents length is zero (or less than zero, I guess, though that's not really going to happen with an array), then we can immediately return our zero object and discount that problem from then on. The function can then just carry on, safe in the knowledge that we are dealing with a list of documents. Inverting and exiting early pops things off the brain stack and means we don't have to consider that case later on in the function. Just pop it off the brain stack. That's what I'm going with.
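A minimal sketch of that guard-clause move (the insertMany signature and result shape here are simplified assumptions, not the real client's API):

```typescript
interface InsertManyResult {
  insertedCount: number;
}

// Before: one big conditional wraps the whole body, so everything
// inside is nested one level deeper than it needs to be.
function insertManyNested(documents: object[]): InsertManyResult {
  if (documents.length > 0) {
    // ...all the real insertion work lives here, nested...
    return { insertedCount: documents.length };
  }
  return { insertedCount: 0 };
}

// After: invert the condition and exit early. The rest of the function
// runs safe in the knowledge that there are documents to deal with.
function insertManyEarlyExit(documents: object[]): InsertManyResult {
  if (documents.length === 0) {
    return { insertedCount: 0 };
  }
  // ...the real insertion work, one nesting level shallower...
  return { insertedCount: documents.length };
}
```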
Structural collapse is sort of similar. We're trying to reduce the number of conditionals, especially nested conditionals, by squishing them together where we can. In this case, insertMany can deal with an options object that may have an array of vectors, and then it's going to try to zip those vectors and documents together. But if the length of the vectors and the length of the documents isn't the same, that's an error, because they're not going to go together. As written, though, these are nested conditionals. What we could do is squish them together: if we have vectors, and the vectors are not the same length as the documents, then throw an error. This actually lets us return early, via an error, rather than nesting further, and then the rest of the code can carry on. Squishing those things into one conditional lets us consider it dealt with a lot quicker and just pop it off the brain stack.
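A sketch of that collapse (the function name and error message are illustrative, not the real client's code):

```typescript
// Before: nested conditionals to validate the vectors option.
function checkVectorsNested(documents: object[], vectors?: number[][]): void {
  if (vectors) {
    if (vectors.length !== documents.length) {
      throw new Error("The number of vectors must match the number of documents");
    }
  }
}

// After: the two conditions collapse into one, and the throw acts as
// an early exit instead of a deeper level of nesting.
function checkVectorsCollapsed(documents: object[], vectors?: number[][]): void {
  if (vectors && vectors.length !== documents.length) {
    throw new Error("The number of vectors must match the number of documents");
  }
}
```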
Extracting helper methods, I think, is actually the most useful thing here.
8. Extracting Helper Methods and Naming Behavior
Extracting a helper method is great for reducing repetition and naming behavior in code. By creating a separate function to handle a specific task, we can simplify the code and improve readability. This also reduces nesting and allows us to understand the function without needing to dive into its implementation details.
Extracting a helper method is great if you have repetition in your code within a function: rather than repeating code, you can turn it into one function. But I also think it's really useful for naming behavior. You see in this little extract, we're looking through and saying, okay, if we have the vectors array in our options object, then we're going to loop over the documents, and for each document, if there is a vector for it, then we're going to turn that document into a document plus a vector. But we have to work all that out by reading the code. If you were to take the internal part, the bit that turns a document into a document and a vector, and extract it into a function that just says "add vector to document", then this helper function is quite simple. We're only dealing with a document and maybe a vector, and we either return the document with the vector or just the document. That's a really small amount of context to deal with. And then, back in the original function, we've named that behavior: the document becomes a vector and a document together, and we don't have to dig into that function to understand what it's doing. So we've reduced a bunch of nesting, but we've also named the behavior so we can understand it without reading it. And in the actual helper function, the context is much smaller, so it's easier to understand when it's standing alone, we can test it independently, and it's less complex itself. It allows us to pop a whole bunch of stuff off the brain stack.
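A sketch of that extraction; the helper name and the `$vector` field are illustrative assumptions based on the talk's description, not the real client's code:

```typescript
type Doc = Record<string, unknown>;

// Extracted helper: the behavior now has a name, and on its own it only
// has to think about one document and maybe one vector.
function addVectorToDocument(document: Doc, vector?: number[]): Doc {
  return vector ? { ...document, $vector: vector } : document;
}

// The caller now reads as named behavior rather than inline object surgery.
function attachVectors(documents: Doc[], vectors?: number[][]): Doc[] {
  if (!vectors) {
    return documents;
  }
  return documents.map((doc, i) => addVectorToDocument(doc, vectors[i]));
}
```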
9. Language Features and Tools
Nested objects and the optional chaining operator simplify code by avoiding undefined checks. The nullish coalescing operator reduces conditionals and ternary operators. We explored cognitive complexity, reducing nesting, refactoring, and using tools like SonarCloud, SonarQube, and SonarLint. To enhance ESLint, install the SonarJS plugin. Thank you for watching!
And then finally, some language features that I think are really good for this. If you've ever dealt with nested objects and had to dig down to find a deeply nested property, you've probably written one of those chains of object.first and object.first.second in order to not accidentally hit an undefined somewhere in the middle. The optional chaining operator is super useful for that: it simplifies the entire line of code and drops all those Boolean conditionals. Really useful. And secondly, say we're trying to assign to chunkSize either something that's in options.chunkSize or a default. We check that options.chunkSize is not null or undefined; if it is either of those two, we get the default, and if not, we set it to the value. That's a whole mess of conditional and ternary operator when really we could use the nullish coalescing operator, which is effectively the same as saying "not null and not undefined". The top line here will set chunkSize to the default only if options.chunkSize is null or undefined. And if you're doing this within the object itself, it's even easier with the nullish assignment operator, the question mark question mark equals, which will only assign the default if chunkSize in the original object is null or undefined. Super useful. It really tightens up pieces of code in this context, and it just helps you pop things off the brain stack.
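A small sketch of those three features together; the option shape and the default value are assumptions for illustration, not the real client's defaults:

```typescript
interface InsertOptions {
  chunkSize?: number;
}

// Hypothetical default; the real client's value may differ.
const DEFAULT_CHUNK_SIZE = 20;

// Optional chaining (?.) plus nullish coalescing (??): fall back to the
// default only when the value is null or undefined. Unlike ||, an
// explicit chunkSize of 0 would be kept.
function resolveChunkSize(options?: InsertOptions): number {
  return options?.chunkSize ?? DEFAULT_CHUNK_SIZE;
}

// Nullish assignment (??=): assign the default only if chunkSize is
// currently null or undefined.
function applyDefaults(options: InsertOptions): InsertOptions {
  options.chunkSize ??= DEFAULT_CHUNK_SIZE;
  return options;
}
```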
So, to recap: we had a look at what complexity is, and how our job is basically managing complexity in code. We looked at cognitive complexity as a way to measure it. And we then looked at conquering complexity by reducing nesting and by refactoring first: making sure we have test coverage, refactoring, and then making the change that we need to make. There are some tools that will help you with this. SonarCloud or SonarQube can scan your code as part of your CI/CD pipeline and will pick up things like this cognitive complexity issue. The same goes for SonarLint, which is free to install into your IDE and will show things as I was demonstrating earlier. And if you are using ESLint, that doesn't have cognitive complexity as a score out of the box, but you can install the SonarJS plugin for ESLint, which does, and that will add it to your ESLint setup as well. Really useful. And that's all I've got time for here.
Again, my name is Phil Nash. I'm a developer relations engineer at DataStax. And thank you so much for watching.