Building software is all about hitting the right spot between features and quality. But we rarely talk about how to measure quality. Let’s look at how a gamification system (points & leaderboard) within a GitHub action helped developers on my team care about quality!
How to Use Gamification to Improve Quality on Your Project
FAQ
Jonathan Wagner is an engineering manager at Theodo UK with over four years of experience as a tech lead, having worked on more than 10 projects in production.
The main topic of Jonathan Wagner's talk is using gamification to improve code quality in software engineering.
According to Jonathan Wagner, the three hard things in software engineering are cache invalidation, naming things, and prioritizing code quality.
Jonathan Wagner refers to code quality as everything that includes technical debt, maintainability, refactoring, and finding the right balance between delivery and quality.
Jonathan Wagner suggests creating a new standard, training the team on this standard, and using tools like ESLint to automate the process and prevent bugs in new code.
Jonathan Wagner's team initially set a maximum number of warnings that could only go down, never up. They later replaced all warnings with errors in the ESLint config, suppressed the existing errors with eslint-disable comments, and implemented a gamified system to motivate developers to fix them.
Jonathan Wagner used an open-source tool called Klinter to automatically add ESLint disable comments and manage ESLint configuration.
Jonathan Wagner created a GitHub action that posts a comment on pull requests showing how many points a developer earned for fixing warnings. It also includes a leaderboard to track performance and incentivize participation.
After implementing gamification, Jonathan Wagner's team reduced the number of errors by 235 over three months, averaging 78 fixes per month. However, they also faced initial challenges with bugs in the system.
Jonathan Wagner believes that aiming for zero errors in legacy code may not be practical as it could introduce new bugs. He suggests that the goal should be to improve the new code to a standard where no further fixing is needed.
1. Introduction to Talk
Welcome to my talk on using gamification to improve quality. I'm Jonathan Wagner, an engineering manager at Theodo UK. I've always struggled with finding the right balance between fast delivery and code quality.
Hi, everyone. Welcome to my talk on using gamification to improve quality. A quick word about myself: I'm Jonathan Wagner. I'm an engineering manager at Theodo UK, and I've been working as a tech lead for the past four years on more than 10 projects in production. I can tell you I've always struggled with finding the right balance between pushing for fast delivery and having good code quality on my projects. I've seen both extremes, like projects with 100% code coverage and projects that were just going straight to production without testing anything. So it's always been a struggle, and I want to talk to you about all of this.
2. Understanding Code Quality
Code quality is a crucial aspect of software engineering. It involves managing technical debt, ensuring maintainability, and balancing delivery and quality. It's important to address both quick fixes and root causes, prioritizing the quick fix first and then investing time in preventing future issues.
Something classic that people say in software engineering is that there are three hard things. You have cache invalidation, naming things and prioritizing code quality. What do I mean by code quality? Let's dive a bit more into this. It's everything that covers technical debt, maintainability, refactoring and so on. So it's finding the right balance between delivery and quality. But it's also deciding when to fix the root cause versus doing the quick fix. Ideally you want to do both, but in the right order. So prioritize the quick fix, and then invest time in looking into the root cause and preventing the issue from happening again. But that's the first question of prioritizing.
3. Improving Code Quality
To improve code quality, start by creating a new standard for new code, automating its enforcement using tools like ESLint. Address legacy code by motivating teams to prioritize its improvement. I'll share a story about how we approached this, starting with a project with 1500 warnings. We used gamification and CI to incentivize reducing warnings, but encountered challenges with simultaneous pull requests.
How do you improve it? It can be quite complex. I started to develop a theory on this and I'm going to try to explain this to you. So, let's start by trying to split the problem into smaller parts. So, let's say you have a code base and you want to improve on it. The first thing you can try is to look at the new code you add. And then after that, focus on the legacy code.
So, first of all, the simple part, the new code. You can start by creating a new standard, training your team on this new standard and then making it hard for people to write bad code. That's an important step, and it's a step you can automate. To automate it, you can use tools like ESLint. It's not the only solution, it's definitely not perfect, it doesn't catch everything, but it helps prevent bugs. And often when analyzing the root cause of a bug, you can identify that an ESLint rule could have prevented it. So it's a good occasion to add a new rule, train your team on it, make sure they know how to fix it and bit by bit do something about it. And that means the new code you write has better standards and, hopefully, you improve the legacy code as well.
But with legacy code, it's hard to decide when to look at it or not. And it's even harder to motivate everyone in your team, or in multiple teams, to look at it. So that's when it gets tricky. What do you do there? Let me tell you a little story about how I approached the problem and explain some of the things I've learned along the way. I'll explain what state we started in, how we played with the CI and gamification, and then what happened and what kind of results we had.
Initially, we had a project with about 1500 warnings. So quite a lot of warnings, and this number was decreasing very slowly. Every time developers added features, it was understood that they shouldn't add new warnings, and pull requests would be blocked by the tech lead or the other developers if the number increased. That means there was a place in the code where you could say, okay, this is the max number of warnings. And if it changes, it has to go down, it cannot go up. But in some cases, we broke the deployment pipeline. That would happen when two amazing devs wanted to decrease the number at about the same time in different pull requests. So let's say the first developer fixes two warnings. The max is now 1498. And the other developer fixes three different warnings. That means they bring the max down to 1497.
4. Resolving Merge Conflicts
We encountered an issue where merging pull requests that each lowered the warning count caused unexpected breaks. To avoid this, we decided to remove warnings and only allow errors. By modifying the ESLint config and using the Klinter tool, we successfully eliminated all errors and prevented future merge conflicts.
First one merges. Everything is green, all good. Second one merges. They didn't rebase beforehand. Everything was green without any merge conflicts. And boom. It breaks. Why did it break? It's because we now have minus five warnings instead of the expected minus three or minus two that we had before. And that means everything is broken. Someone has to fix it. People are not sure why it's broken. You might not have the alerting. It might take forever to fix.
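The race can be reproduced with a small model. This is a sketch under one loud assumption: that the CI check required the linter's warning count to match the recorded maximum exactly. The talk does not show the real check, and the names `BranchState` and `ciPasses` are hypothetical.

```typescript
// Hypothetical model of the CI check (an assumption, not the team's code):
// the build is green only when the warning count equals the recorded maximum.
interface BranchState {
  recordedMax: number; // value committed in the repository
  actualWarnings: number; // what the linter reports on this branch
}

function ciPasses(state: BranchState): boolean {
  return state.actualWarnings === state.recordedMax;
}

const base: BranchState = { recordedMax: 1500, actualWarnings: 1500 };

// PR 1 fixes two warnings, PR 2 fixes three different ones.
// Each branch is consistent on its own, so each one is green.
const pr1: BranchState = { recordedMax: 1498, actualWarnings: 1498 };
const pr2: BranchState = { recordedMax: 1497, actualWarnings: 1497 };

// After both merges, all five fixes are present, but the recorded maximum
// is whichever value landed last (PR 2's value in this model).
const mainAfterBoth: BranchState = {
  recordedMax: pr2.recordedMax,
  actualWarnings: base.actualWarnings - 2 - 3,
};

console.log(ciPasses(pr1), ciPasses(pr2), ciPasses(mainAfterBoth));
// true true false
```

Under this model the main branch reports 1495 warnings against a recorded maximum of 1497, so the pipeline fails even though neither pull request did anything wrong.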
So we want to avoid that at all costs. And one way to do this is to basically remove warnings. Let's say we don't want warnings anymore. We just want errors. That's one way to look at it. That's what we tried. So basically I went to the ESLint config. We replaced all the warnings with errors and overrode the ones that were set by default by the plugins we had. And we went from 1500 warnings to zero. But then we had the same number of errors, and that meant our CI was broken.
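As an illustration of the config change, here is a minimal sketch that promotes every warning-level rule in an ESLint-style `rules` object to an error. The helper names `promote` and `warningsToErrors` are hypothetical; only the severity values ("off"/0, "warn"/1, "error"/2) come from ESLint itself. The talk does not show how the real config was edited.

```typescript
// ESLint accepts string or numeric severities, optionally followed by options.
type Severity = "off" | "warn" | "error" | 0 | 1 | 2;
type RuleEntry = Severity | [Severity, ...unknown[]];
type Rules = Record<string, RuleEntry>;

// Turn a warning severity into the matching error severity; leave others alone.
function promote(severity: Severity): Severity {
  if (severity === "warn") return "error";
  if (severity === 1) return 2;
  return severity;
}

// Return a copy of the rules where every warning has become an error.
function warningsToErrors(rules: Rules): Rules {
  const result: Rules = {};
  for (const [name, entry] of Object.entries(rules)) {
    if (Array.isArray(entry)) {
      const [severity, ...options] = entry;
      result[name] = [promote(severity), ...options];
    } else {
      result[name] = promote(entry);
    }
  }
  return result;
}

// Example input: rule names are real ESLint rules, the selection is arbitrary.
const promoted = warningsToErrors({
  "no-console": "warn",
  eqeqeq: ["warn", "always"],
  "no-debugger": "error",
  "no-alert": "off",
});
console.log(promoted);
```

Rules that plugins set to "warn" by default would need to be listed explicitly in the project's own config for an override like this to reach them.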
But thank god we had a little tool that already existed, which is an ESLint config generator called Klinter. It's open source. You can use it as well. And it helps you automatically add eslint-disable comments everywhere you have an error. And that means we didn't have any more errors. The CI was clean again and we never had any more merge conflicts like this one. So first step, quite simple, straightforward, fixes everything.
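To make the idea concrete, here is a rough sketch of inserting an `eslint-disable-next-line` comment above every reported error. It is not Klinter's implementation: the `LintError` shape and the `addDisableComments` function are hypothetical, and the sketch only handles plain `//` comments (JSX, for instance, would need a different comment syntax).

```typescript
// Hypothetical shape of one reported lint error (line numbers are 1-based).
interface LintError {
  line: number;
  ruleId: string;
}

function addDisableComments(source: string, errors: LintError[]): string {
  // Group rule ids by the line they were reported on, without duplicates.
  const rulesByLine = new Map<number, string[]>();
  for (const { line, ruleId } of errors) {
    const rules = rulesByLine.get(line) ?? [];
    if (!rules.includes(ruleId)) rules.push(ruleId);
    rulesByLine.set(line, rules);
  }

  const output: string[] = [];
  source.split("\n").forEach((text, index) => {
    const rules = rulesByLine.get(index + 1);
    if (rules !== undefined) {
      // Reuse the indentation of the offending line for the comment.
      const indent = /^\s*/.exec(text)?.[0] ?? "";
      output.push(`${indent}// eslint-disable-next-line ${rules.join(", ")}`);
    }
    output.push(text);
  });
  return output.join("\n");
}

const patched = addDisableComments(
  ["function check(a, b) {", "  return a == b;", "}"].join("\n"),
  [{ line: 2, ruleId: "eqeqeq" }],
);
console.log(patched);
```

Each inserted comment then becomes a visible, countable marker of a known problem, which is what the scoring described later relies on.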
5. Automating Warning Decrease and Error Reduction
I automated the process of decreasing warnings by creating a GitHub action that posts a comment on pull requests, providing points earned and rankings. It takes less than 10 seconds and does the job well. After some initial bugs, the system ran smoothly for the next three months. We started with around 1600 errors and, despite a few bad weeks, managed to decrease the number of errors by 235 in three months. At a rate of about 78 errors per month, it would take us 4 years to reach zero errors.
But then we have the problem of decreasing our warnings. So that's when I started thinking, OK, let's try to automate this. Let's try to put a little incentive in place. Let's try to make it like a game with a leaderboard. So I went for it, spent a little while coding and came up with this.
So it's a GitHub action. It could be adapted to CircleCI, GitLab or any other CI tool you use. It's quite simple. It basically posts a comment on your pull request and tells you how many points you've earned in the pull request, how many points you've earned since the beginning of the week, and your current rank for this week. And you can see the podium, the full leaderboard and the explanation of how to earn points. Basically what happens is, it takes the git diff of the pull request and counts how many lines you added that contain an eslint-disable comment and how many lines you removed that contained an eslint-disable comment. Then based on this, we get the score. We compute the score for everyone as well and print the leaderboard. No need to store anything anywhere, it's just computed every time you open a pull request. It takes less than 10 seconds and does the job well.
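The counting and ranking described above could look roughly like the sketch below. This is not the action's source code: the function names, the scoring rule (one point per removed eslint-disable line, minus one per added line) and the idea of passing diffs in as plain strings are all assumptions made for illustration.

```typescript
const MARKER = "eslint-disable";

interface DiffCounts {
  added: number;
  removed: number;
}

// Count added and removed lines that mention the marker in a unified diff.
function countDisableChanges(diff: string): DiffCounts {
  let added = 0;
  let removed = 0;
  for (const line of diff.split("\n")) {
    // Skip the "--- a/file" and "+++ b/file" headers (a simplification).
    if (line.startsWith("+++") || line.startsWith("---")) continue;
    if (!line.includes(MARKER)) continue;
    if (line.startsWith("+")) added += 1;
    else if (line.startsWith("-")) removed += 1;
  }
  return { added, removed };
}

// Assumed scoring rule: removing a comment earns a point, adding one costs a point.
function score(counts: DiffCounts): number {
  return counts.removed - counts.added;
}

interface Contribution {
  author: string;
  diff: string;
}

interface Ranking {
  author: string;
  points: number;
  rank: number;
}

// Sum the points per author and sort from highest to lowest.
function leaderboard(contributions: Contribution[]): Ranking[] {
  const totals = new Map<string, number>();
  for (const { author, diff } of contributions) {
    const points = score(countDisableChanges(diff));
    totals.set(author, (totals.get(author) ?? 0) + points);
  }
  return Array.from(totals.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([author, points], index) => ({ author, points, rank: index + 1 }));
}
```

Because everything is derived from diffs, a design like this needs no database: feeding in the diffs of the week's pull requests is enough to rebuild the leaderboard on every run. In this sketch, tied authors simply keep the order in which they first appear.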
So I put that in place, I was super proud of myself. We released it, first week, so many bugs. Lots of pull requests had zero points when people should have had points. People complained, people were unhappy, I worked a bit on it. And after that, it was a smooth ride for the next three months.
So here's the data for the three months on the project. We started out around 1600, and then you can see a few new ESLint rules added. And overall, it looks like the number has increased quite a bit. But let's zoom in a bit more and see what happens. So first, zooming in on the 3000s and then adding some baselines, we can actually see that apart from a couple of weeks in April that were a bit bad at the beginning and at the end, we had some good times. And there's a bit more computation on that. If we set aside every rule we've added, we actually decreased our number of errors by 235 in about three months. So that's a rate of about 78 a month. And assuming we now start at 3500, it would take us about 4 years to get down to zero.
6. Striving for Zero Errors and Ensuring Code Quality
We can aim for zero errors in code, but when dealing with legacy code, fixing everything may introduce new bugs. It's important to find a balance and prioritize improvements. To ensure code quality, we can enhance the tool, block pull requests with errors, show potential points, and track the weekly decrease in errors. Exploring alternatives like tests and incentivizing developers can also be beneficial. Ultimately, the goal is to make coding fun and encourage contributions to improve quality.
We can do a bit more math and say, okay, 78 per month. If we have a team of, say, 35 developers, that's about each dev fixing one error every two weeks. It's definitely not a lot. You can probably expect them to be fixing, like, two errors a week. That means dividing that number by four, which means in one year we could get down to zero. So is it good? Is it bad? What do you guys think? That's the kind of question we were asking ourselves, and I can explain what I've learned.
First of all, should we aim for no errors? Should we fix everything? Should we go down to zero? My opinion is that when you touch legacy code, you might introduce new bugs. The code is working, but it might not be well tested, and by touching it you just increase the chances of introducing new regressions. So aiming for no errors probably means inserting new bugs into your code base. It's the complete opposite of what we wanted to do in the first place. So maybe that's not actually the goal. Maybe it's just normal to have a slope that is quite good at the beginning and that stabilizes after a while, and that's normal because the new code is up to standard and you don't need to fix anything more.
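The back-of-the-envelope numbers can be checked with a few lines of arithmetic. The inputs (235 errors fixed in three months, 3500 remaining, 35 developers) are the figures quoted in the talk; the function and variable names are only illustrative.

```typescript
// How many months it takes to clear a backlog at a constant fixing rate.
function monthsToZero(remaining: number, fixedPerMonth: number): number {
  return remaining / fixedPerMonth;
}

const fixedPerMonth = 235 / 3; // about 78 errors per month
const months = monthsToZero(3500, fixedPerMonth); // about 45 months
const years = months / 12; // a little under 4 years

const developers = 35;
const perDevPerMonth = fixedPerMonth / developers; // about 2, so one every two weeks

// Fixing four times faster divides the duration by four: about one year.
const yearsAtFourTimesRate = monthsToZero(3500, fixedPerMonth * 4) / 12;

console.log({
  fixedPerMonth: fixedPerMonth.toFixed(1),
  years: years.toFixed(1),
  perDevPerMonth: perDevPerMonth.toFixed(1),
  yearsAtFourTimesRate: yearsAtFourTimesRate.toFixed(1),
});
```

The projection assumes a constant rate and no new rules being added, which the chart in the talk suggests is optimistic.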
Something else that you can think about then is how you make sure that you get into that situation as well as possible, and how you detect when you get there. So maybe you can improve the tool. Maybe we can have new features, like blocking the pull request to prevent people from adding new errors. It could be showing potential points by looking at the files that have been touched. So if we look at each file and how many warnings it has, then we can say, okay, had you fixed everything in there, you would have won 20 points instead of just three. Then we can also show the total weekly diff, so that every week you can actually make sure that the number is decreasing and not staying flat like we had in April.
And maybe there are other alternatives. Maybe we shouldn't be looking just at ESLint. Maybe we could be looking at tests and make sure that when we test files we earn points as well. Maybe we could also incentivize people a bit more, like people earning a prize when they get first place, or you could also put the leaderboard on the TV for everyone to see at all times and make sure that it's present in everyone's mind that it's a priority. But then that depends on whether it is actually a priority. But then most importantly, was it fun to code? Definitely. I had so much fun coding this, and I really hope that you guys start trying it, contributing and, yeah, setting it up on your project, playing with it, opening pull requests and improving quality everywhere. So that's it.
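The "potential points" idea mentioned in the talk could be sketched like this. It describes a feature that was only proposed, so everything here is hypothetical: the function name, the inputs, and the assumption that each remaining error in a touched file is worth one point.

```typescript
// Points a pull request could have earned if every remaining error in the
// files it touched had been fixed (assumes one point per error).
function potentialPoints(
  earned: number,
  touchedFiles: string[],
  remainingErrorsByFile: Map<string, number>,
): number {
  let remaining = 0;
  for (const file of touchedFiles) {
    remaining += remainingErrorsByFile.get(file) ?? 0;
  }
  return earned + remaining;
}

// Hypothetical data: the pull request earned 3 points and touched two files
// that still contain 17 errors between them.
const remainingErrors = new Map<string, number>([
  ["src/cart.ts", 12],
  ["src/checkout.ts", 5],
  ["src/unrelated.ts", 40],
]);
const potential = potentialPoints(3, ["src/cart.ts", "src/checkout.ts"], remainingErrors);
console.log(`You earned 3 points; fixing everything would have earned ${potential}.`);
```

With these numbers the message reports 20 potential points against 3 earned, matching the example given in the talk.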