Road to Zero Lint Failures: Tackling Code Quality Challenges at Scale
From Author:
Lint rules enable us to uphold code quality and minimize errors. They can positively impact developer productivity and happiness, especially when multiple teams work together on a massive application. But what if your large-scale application contains thousands of lint failures accumulated over the many years it has been running in production? This talk explores actionable strategies for effectively addressing lint failures at scale so that we can once again rely on lint rules to ensure consistent code quality and streamline development processes, leading to a more robust and maintainable codebase.
This talk was presented at React Summit US 2023.
FAQ
Lint rules at LinkedIn ensure consistency, bug prevention, and maintenance across the codebase. They help scale guidance to all teams, covering the latest code standards, preventing common mistakes, and keeping code deterministic during migrations.
LinkedIn runs Lint rules in three stages: during development, pre-commit, and pre-merge. This approach alerts developers to Lint rule violations as early as possible, before other engineering team members get involved.
LinkedIn faced scaling issues due to their extensive codebase, which included over 24,000 files and 100k commits, and an initial count of over 7,000 Lint failures. The count kept growing as new code and new Lint rules were introduced without existing issues being addressed.
To reduce Lint failures, LinkedIn implemented measures such as blocking commits with Lint failures, setting Lint failure limits for teams, and enforcing standards through a two-step review process. Additionally, they introduced a rule requiring that all existing errors be fixed before a new Lint rule could be enabled at error severity.
LinkedIn used a combination of carrots over sticks, providing technical support, recognition, and a good developer experience. They offered immediate support for queries, shout-outs for cleaning up Lint failures, and tooling that simplified the identification and ownership of errors.
LinkedIn developed a tool that broke down ESLint failures by team and rule, updated nightly. They also used Checkup, a Node-based runner that could analyze the entire codebase nightly and provide structured output for easy tracking and visualization of Lint failures.
The initiative to reduce Lint failures led to significant improvements in perceived code quality, with a 30% increase reported in quarterly surveys. The effort involved 55 unique contributors and resulted in the cleanup of over 6,000 Lint failures.
Video Transcription
1. Introduction to Road to Zero Lint Failures
Today I'll be talking about our journey from thousands of Lint failures to zero. Lint rules ensure consistency, bug prevention, and maintenance. We run our Lint rules in three stages: during development, pre-commit, and pre-merge. For context, our codebase at LinkedIn is over 8 years old. We have over 24,000 files and over 100k commits. This is compounded by having a lot of Lint failures already existing in the codebase as we started this process.
Hi, and welcome to Road to Zero Lint Failures. I'm Chris Ng and I'm a Senior Staff Software Engineer at LinkedIn. Today I'll be talking about our journey from thousands of Lint failures to zero. So why Lint rules? Lint rules ensure consistency, bug prevention, and maintenance. They allow us to scale guidance to all the other teams within LinkedIn, to ensure that everyone is covered with the latest and greatest code standards and edge cases to prevent common mistakes, and to ensure that the codebase is consistent and deterministic when we're doing migrations.
We run our Lint rules in three stages: during development, pre-commit, and pre-merge. This way you are alerted when you are violating a Lint rule as early as possible, before involving other engineers. So what's the problem, right? Let's add all the Lint rules. I think this is a common misconception; unfortunately, in real life we face scaling issues.
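As a sketch of how those three stages might be wired up in a JavaScript repo (the script names and `lint-staged` setup here are assumptions for illustration, not LinkedIn's actual configuration): the editor runs ESLint during development, a pre-commit hook lints only staged files, and CI runs the full lint as the pre-merge gate.

```json
{
  "scripts": {
    "lint": "eslint . --max-warnings 0",
    "lint:staged": "lint-staged"
  },
  "lint-staged": {
    "*.{js,jsx}": "eslint --max-warnings 0"
  }
}
```

Here `npm run lint` would be the pre-merge CI check, while `lint-staged` keeps the pre-commit stage fast by linting only the files in the commit.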
So code quality and scale. For context, our codebase at LinkedIn is over 8 years old. React was at version 0.13 when this repo was created. We have over 24,000 files and over 100k commits. This is compounded by having a lot of Lint failures already existing in the codebase as we started this process. I believe there were over 7,000 Lint failures when we started the road to zero. This is also coupled with an ever-increasing number of Lint failures being introduced to the codebase, due to either introducing new Lint rules or introducing new code that does not fix existing Lint failures.
2. Road to Zero Lint Failures: Incentives and Tooling
We introduced rules to limit the introduction of Lint failures and implemented a two-step review process. However, these measures did not completely eliminate Lint failures. To address this, we implemented a new rule where every Lint rule must have all existing errors fixed before being enabled. We also focused on providing incentives for developers to fix Lint failures, such as technical support, shout-outs, and a good developer experience. We made it easy for engineers to fix Lint failures by providing tooling that identified errors and their owners.
So ideally, you come to a campsite and you leave it better than you found it. That's kind of like the campsite analogy. The problem is what if you come to the campsite when it already looks like this, pretty dirty, lots of Lint failures. Are you very incentivized to clean this up? So what we found out was that a lot of people are not very incentivized to clean this up.
So we started adding some rules to limit the number of Lint failures getting introduced to the codebase. We started blocking commits when there are Lint failures in the files being changed. We set limits for certain teams, say you can have 10 Lint failures for your particular team. We did a two-step review process where there was a group of people who were enforcing, for lack of a better term, the standards in the codebase. Some of these worked and some of them didn't, but they didn't quite get us to zero Lint failures.
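The per-team limit described above can be sketched as a simple budget check. This is a minimal illustration, assuming failure counts have already been aggregated per team; the team names and budget numbers are hypothetical.

```javascript
// Sketch: enforce a per-team lint-failure budget.
// `countsByTeam` would come from aggregated ESLint results;
// `budgets` is the per-team allowance (hypothetical numbers).
function checkBudgets(countsByTeam, budgets) {
  const violations = [];
  for (const [team, count] of Object.entries(countsByTeam)) {
    const budget = budgets[team] || 0; // teams without an entry get no allowance
    if (count > budget) {
      violations.push({ team, count, budget });
    }
  }
  return violations;
}

// Example: the "search" team is over its allowance of 10 failures.
const violations = checkBudgets(
  { search: 12, feed: 3 },
  { search: 10, feed: 10 }
);
console.log(violations); // [{ team: 'search', count: 12, budget: 10 }]
```

A check like this could run pre-merge and block a change that pushes a team past its limit.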
So let's visit the analogy again, right? How does this fail our analogy? If we block commits and we have this two-step review system, but your manager now tells you, hey, you need to land something really, really fast, what do you think is going to happen? What we saw happen is that people code around the problem, either by asking someone for an exemption or by actually coding around the problem, introducing a new class or something like that, and then shipping it, avoiding all the issues, getting it to production, getting it to members as fast as possible, and getting the impact that they want. This really ruins the campsite analogy.

That's why we introduced a new rule where every Lint rule that's introduced must have all existing errors fixed before we can enable it as an error-severity Lint rule. That means we stop the bleeding. There are no new Lint rules getting added that make existing files hard to maintain. But then, as a Lint author, you're like, what if there are 1,000 errors? Do I have to fix all 1,000 errors? I don't know how to fix all these existing errors. I don't have time to do this. What if I break something and cause a product issue? What we figured out was that these are exactly the issues that people who work in the codebase day to day have to deal with when they see a Lint failure, and why they're not incentivized to fix it. And so the Lint author needs to be on the same page as the people whose code is getting linted.

So the road to zero Lint failures, it's all about incentives. That's the story here. The way we ran it was carrot over stick. We provided people with lots of technical support. Every question would get an answer as soon as possible, most within the same hour, but we tried to keep it within the same day. We provided shout-outs when someone cleaned up a Lint failure. We provided a very good developer experience.
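The "stop the bleeding" policy maps naturally onto ESLint's severity levels: a new rule can ship at `"warn"` (or stay off) while its existing violations are cleaned up, and only flips to `"error"` once the count reaches zero. A minimal sketch of such a config, where `my-plugin/new-rule` is a hypothetical rule name:

```json
{
  "rules": {
    "no-unused-vars": "error",
    "eqeqeq": "error",
    "my-plugin/new-rule": "warn"
  }
}
```

Under the policy described in the talk, `my-plugin/new-rule` would be promoted to `"error"` only after all existing warnings for it are fixed, so enabling it never degrades the maintainability of existing files.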
We really targeted the actual developer fixing their Lint failures, giving them visibility once they've fixed some Lint failures in the codebase, as well as recognition. The way we made it easy for engineers to fix Lint failures was by providing tooling. We identified the errors, we identified the owners, we kept this list up to date, and we made it very simple to use: not a new tool for an engineer to learn, just a list for them to see what the failures are and who the owners are.
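The error-and-owner breakdown described above can be sketched from ESLint's machine-readable output (`eslint -f json` produces an array of `{ filePath, messages }` results). The ownership lookup here is an assumption; LinkedIn's actual tooling resolves owners differently.

```javascript
// Sketch: aggregate ESLint JSON results into a per-team, per-rule breakdown.
// `ownerOf` maps a file path to its owning team (assumption for illustration).
function breakdownByTeamAndRule(results, ownerOf) {
  const breakdown = {};
  for (const { filePath, messages } of results) {
    const team = ownerOf(filePath) || "unowned";
    for (const { ruleId } of messages) {
      if (!breakdown[team]) breakdown[team] = {};
      breakdown[team][ruleId] = (breakdown[team][ruleId] || 0) + 1;
    }
  }
  return breakdown;
}

// Example with a toy ownership rule: the first path segment is the team.
const results = [
  { filePath: "search/query.js", messages: [{ ruleId: "no-unused-vars" }, { ruleId: "eqeqeq" }] },
  { filePath: "search/ui.js", messages: [{ ruleId: "eqeqeq" }] },
  { filePath: "feed/post.js", messages: [] },
];
const breakdown = breakdownByTeamAndRule(results, (p) => p.split("/")[0]);
console.log(breakdown);
// { search: { 'no-unused-vars': 1, eqeqeq: 2 } }
```

Run nightly, a report like this gives each team a current list of its failures without requiring engineers to learn any new tool.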