But in my experience, visual regression is also an extremely flaky test category. You probably know the feeling: literally every second screenshot, every second commit, has some visual regression noise, and since we're all human, we get used to it and start ignoring it, auto-approving, and so on and so forth. And this is a problem, because once a test becomes flaky, it loses its value.
So today, I'd like to attack this problem by diving under the hood of visual regression and using that knowledge to build more reliable visual tests. Under the hood, visual regression always consists of four simple steps. First, you load a page; then you take a screenshot, compare it with the previously approved version, and inspect the difference. Looks pretty easy, but each of these steps has its own hidden problems, and I'd like to discuss them.
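The four steps can be sketched roughly like this. The helpers here (`loadPage`, `takeScreenshot`, `readBaseline`) are hypothetical stubs standing in for real browser and runner calls, not the API of any particular tool:

```javascript
// Hypothetical stubs so the sketch runs end-to-end; in a real suite these
// would drive a browser and read image files from disk.
const loadPage = async (url) => ({ url });
const takeScreenshot = async (page) => [0, 0, 0, 255]; // pretend pixel buffer
const readBaseline = async (path) => [0, 0, 0, 255];   // previously approved image
const compareImages = (baseline, actual) => ({
  mismatched: baseline.filter((px, i) => px !== actual[i]).length,
});

async function visualRegressionCheck(url, baselinePath) {
  const page = await loadPage(url);                       // 1. load the page
  const actual = await takeScreenshot(page);              // 2. make a screenshot
  const baseline = await readBaseline(baselinePath);
  const { mismatched } = compareImages(baseline, actual); // 3. compare with the baseline
  return { pass: mismatched === 0, mismatched };          // 4. report the difference
}
```

Each of the hidden problems below lives inside one of these four calls.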
So first of all, you need to load the page. But it's not enough to just load the page using your favorite browser-based test runner like Cypress, right? You need to make the page predictable, and this is a problem especially when you are not using visual regression services, because when your page is not in a stable state you can easily get a lot of noise. For example, here, most of the screenshots have sections that change from time to time, like inline videos or a carousel rotating on a timeout, and all of this can easily break the visual regression process. Animations, timestamps, and random values can break it just as easily, so we need to be careful about them. But that's not everything we should care about.
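One common way to stabilize a page before screenshotting is to inject a style tag that kills animations, transitions, and the blinking text caret. Here is a minimal framework-agnostic sketch; the helper name and the exact CSS list are my own, and in a Cypress suite you would call it from `cy.document().then(...)` after the page loads:

```javascript
// CSS that freezes the main sources of per-frame visual noise.
const FREEZE_CSS = `
  *, *::before, *::after {
    animation: none !important;
    transition: none !important;
    caret-color: transparent !important;
  }
`;

// Inject the freeze styles into a loaded document and return the style
// element, so a test can remove it again if needed.
function freezePage(doc) {
  const style = doc.createElement('style');
  style.textContent = FREEZE_CSS;
  doc.head.appendChild(style);
  return style;
}
```

Timers (`cy.clock()` in Cypress) and stubbed `Math.random` cover the remaining dynamic content, like carousels and random values.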
The UI can also differ even when you are running the same code, but on a different operating system or in a different browser, simply because different layout engines, or the operating system itself, can produce layout shifts or different default rendering, and this will break our tests. And this is a real problem, which is perfectly solved by visual regression services, but it causes a lot of trouble for people who try to do visual regression by themselves. Visual regression services solve this by uploading your HTML, not a screenshot but the HTML, rendering that HTML with all its styles in a specified browser, and only then taking the screenshot and comparing it. But you can get the same level of predictability by running your visual regression tests in Docker. It can even be reasonable to keep a separate set of tests only for visual regression, run them only in Docker, and even approve them in Docker. This will make you confident that your tests run in the same environment and won't produce noise and layout shifts between developers' local machines. But there's also an interesting middle ground between these two approaches. There's a project called Visual Regression Tracker that gives you the ability to run these tests inside Docker against a self-hosted service, gives you an interface for approving screenshots, and gives you the same level of predictability as visual regression services, but self-hosted. I'm sure this project is the future of visual regression.
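As a concrete illustration, running a Cypress suite in a pinned Docker image can look like this. The `cypress/included` image (which bundles Cypress plus browsers) is real, but the tag and the spec path are assumptions about your setup, so adjust both to your project:

```shell
# Run only the visual regression specs inside a pinned Docker image, so
# every machine and CI agent renders with the exact same browser, layout
# engine, and fonts.
docker run --rm \
  -v "$PWD":/e2e \
  -w /e2e \
  cypress/included:13.6.0 \
  --browser chrome \
  --spec "cypress/e2e/visual/**/*.cy.js"
```

Pinning the image tag matters: upgrading the browser inside the image is exactly the kind of event that should regenerate your baselines, and a floating `latest` tag would do it silently.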
But then you need to make a screenshot, right? But at which resolution? And here is a problem, because I constantly see, especially in the Cypress community, that people use the default Cypress resolution, or some small resolution that honestly nobody in the world actually uses. We need to ensure that we are testing our UI at the resolutions our users actually use, and you can easily get this information from any analytics tool. For example, here are the stats for my personal website: you can see that most of my users use this weird tablet resolution, and honestly, I was not testing my website at that resolution. You probably know how easy it is to miss a visual defect when a resolution is not widely popular, or is very big, so you need to ensure that you are testing at the resolutions used by your users. It's actually strange that, by default, visual regression tools and services do not use the most popular resolutions, like, for example, Full HD. And the reason we do screenshot testing on small images is that comparing screenshots is really slow. In order to compare two Full HD images, you need to iterate over two million pixels, calculate the difference for each one using a specialized formula, and only then save the diff. It's a pretty hard and not performance-friendly task for computers, especially when you're trying to do it in JavaScript. That's why I created, and am right now working on, a library called Odiff that does the image comparison not in JavaScript but in a native, more performant language, saving you a lot of time and allowing you to test the screenshots you want, fast.
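To get a feel for why this is slow, here is what the naive comparison loop looks like over raw RGBA buffers. Real comparators like pixelmatch or Odiff use a perceptual color-difference formula rather than the plain per-channel distance used below, which I've chosen only to keep the sketch short; for a 1920x1080 screenshot this loop body runs about 2,073,600 times:

```javascript
// Naive per-pixel diff over two RGBA byte buffers of the same dimensions.
// `threshold` is the per-pixel color distance above which a pixel counts
// as changed; 0 means any difference at all is a mismatch.
function diffImages(a, b, width, height, threshold = 0) {
  let mismatched = 0;
  for (let i = 0; i < width * height; i++) {
    const off = i * 4; // 4 bytes per pixel: R, G, B, A
    const delta =
      Math.abs(a[off] - b[off]) +         // red channel
      Math.abs(a[off + 1] - b[off + 1]) + // green channel
      Math.abs(a[off + 2] - b[off + 2]);  // blue channel
    if (delta > threshold) mismatched++;
  }
  return { mismatched, total: width * height };
}
```

Moving exactly this kind of tight numeric loop out of JavaScript and into native code is the speedup that makes Full HD screenshot comparison practical.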
So we're probably out of time, so let's wrap up with the conclusion and the key to painless visual regression. You need to ensure that your tests run in the same environment; you need to ensure that you don't have any unstable content on your page, even if you're using visual regression services; and you need to test your UI at the resolutions that are used by your users, not just the ones that are fast or performance-friendly for some service. And that's it. I'm happy to be here.