The second example I will show is the distribution of bugs per service or per module. On a backend product I may have many different microservices, and if I plot the distribution, I might see that, say, 50% of the bugs are coming from one specific service. What does that mean? There is something wrong with that service, right? There are many services, but most of the bugs come from only one of them. Or, for a UI product, a web application, suppose most of the bugs come from the login screen, not from the other pages and screens. The login screen is supposed to be a tiny screen, yet a lot of bugs come from it. So something is wrong there, right? Go and check your tests: are they capable of finding these bugs or not? This kind of chart can give you some ideas.
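As a minimal sketch of this idea, the per-service distribution can be computed from a bug-tracker export with nothing more than a counter. The data here is entirely hypothetical, and the 40% "investigate" threshold is an assumption, not a standard value:

```python
from collections import Counter

# Hypothetical bug records: (bug_id, service) pairs exported from a tracker.
bugs = [
    (1, "auth"), (2, "auth"), (3, "auth"), (4, "auth"), (5, "auth"),
    (6, "billing"), (7, "billing"), (8, "search"), (9, "search"), (10, "gateway"),
]

counts = Counter(service for _, service in bugs)
total = sum(counts.values())

# Report each service's share of all bugs, flagging anything above
# an (assumed) 40% threshold as worth a closer look.
for service, n in counts.most_common():
    share = n / total
    flag = "  <-- investigate" if share >= 0.4 else ""
    print(f"{service:8s} {n:3d} bugs  {share:5.1%}{flag}")
```

With this sample data, the "auth" service holds half of all bugs and gets flagged, which is exactly the 50%-from-one-service situation described above.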
Similarly, we can show the distribution of bugs per type: functional, security, reliability, or understandability bugs. Based on that, we can maybe improve our documentation or improve our security tests, and take the related action items. What else? We can show the bug detection progress: this sprint we detected 5 bugs, next sprint we detected 20. Again, focusing on one very narrow metric alone, like "this sprint we found 5 bugs", doesn't have a proper meaning on its own, right? But if I combine it with the other monitoring activities, like knowing that on average I find 5 or 6 bugs per sprint, then if I find 30 bugs in one specific sprint, obviously something is wrong. Maybe I forgot to execute the test cases, so I can go and check my CI/CD jobs and pipelines to see whether the tests were executed or not. Similarly, I can show the resolution duration as well: not the detection time, but how long it takes to resolve a bug.
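The "30 bugs in one sprint is obviously wrong" check above can be automated with a simple baseline comparison. Everything here is illustrative: the sprint counts are made up, and the 3x-above-baseline rule is an assumed threshold, not a recommendation from the talk:

```python
# Hypothetical bugs-found-per-sprint series. A sprint far above the
# running average of earlier sprints may mean tests were skipped last
# time, or that something genuinely regressed -- either way, investigate.
found_per_sprint = [5, 6, 4, 5, 30, 6]

alerts = []
for i, found in enumerate(found_per_sprint):
    if i == 0:
        continue  # no baseline yet for the first sprint
    baseline = sum(found_per_sprint[:i]) / i   # average of earlier sprints
    if found > 3 * baseline:                   # 3x threshold is an assumption
        alerts.append((i, found, baseline))

for sprint, found, baseline in alerts:
    print(f"sprint {sprint}: {found} bugs vs baseline {baseline:.1f}")
```

With this sample data, only the sprint with 30 bugs trips the alert; the quiet sprint that follows does not, because the baseline has already absorbed the spike.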
And maybe I can combine these two: bugs created versus bugs resolved. Then I can see the gap in between, right? If the gap is increasing, it means the number of bugs I am finding is getting higher and higher, while the bugs are being resolved at a relatively lower pace. That means the technical debt will increase, so it might be time to take an action item: freeze the development activities and focus on fixing the bugs.

And not only charts, pie charts, or trends; I can also show some tables, like the ages of the bugs: for how long they have been waiting for a fix, or for how long they have been fixed and are waiting for validation. This time I can go and check with the QA engineers, right? The development team fixed it, so why is it still not verified or validated?

Last but not least, of course, we should talk about escaped bugs. I have used this term a few times, and they are very important, because a bug that is already in production is super expensive. If I find it during testing, it is relatively easy to deal with, because it is not a working function in production yet; I can easily change the architecture or the design. But if it is already in production, it will be too late, too expensive. So please consider monitoring or tracking these escaped bugs and try to understand the root causes: why did they escape to the production environment?

In the last section, of course, one of the most important things is trying to do all this analysis and improve our productivity by using machine learning. As I said in the beginning, it is very similar to our biological learning, right? How do we learn patterns? If I give you a sheet, a Google Sheet or an Excel sheet, or any type of document containing lots of bugs, what can you do? You can observe, right? For example, you can look at the author of the bug and the title of the bug, and you can make some observations.
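Going back to the created-versus-resolved idea above, the widening gap is just the cumulative difference between the two series. A minimal sketch, with made-up per-sprint numbers:

```python
import itertools

# Hypothetical per-sprint counts of bugs created and bugs resolved.
created  = [5, 6, 8, 10, 12]
resolved = [5, 5, 5, 5, 5]

cum_created  = list(itertools.accumulate(created))
cum_resolved = list(itertools.accumulate(resolved))
backlog = [c - r for c, r in zip(cum_created, cum_resolved)]

print("open backlog per sprint:", backlog)

# If the gap widens every sprint, technical debt is accumulating and it
# may be time to pause feature work and focus on fixing bugs.
widening = all(b2 > b1 for b1, b2 in zip(backlog, backlog[1:]))
if widening:
    print("backlog is growing every sprint -- consider a bug-fixing freeze")
```

Here the backlog grows from 0 to 16 open bugs over five sprints, which is the kind of trend that justifies the "freeze development" action item mentioned above.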
For example: if this person opens a bug, most of the time it is a very critical one.
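That per-author observation is easy to extract from the raw sheet, which is exactly the kind of pattern a human reviewer, or later a model, can learn. The records and severity labels below are hypothetical:

```python
from collections import defaultdict

# Hypothetical bug records: (author, severity) pairs from a tracker export.
bugs = [
    ("alice", "critical"), ("alice", "critical"), ("alice", "major"),
    ("bob", "minor"), ("bob", "minor"), ("bob", "major"),
]

per_author = defaultdict(lambda: {"total": 0, "critical": 0})
for author, severity in bugs:
    per_author[author]["total"] += 1
    if severity == "critical":
        per_author[author]["critical"] += 1

# The share of critical bugs per author is the observable pattern:
# "if this person opens a bug, most of the time it is a critical one".
for author, stats in per_author.items():
    share = stats["critical"] / stats["total"]
    print(f"{author}: {share:.0%} of reported bugs are critical")
```

With this sample data, alice's reports are critical two times out of three while bob's never are, so a new bug from alice deserves extra attention.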