Video Summary and Transcription
This talk introduces Replay, a time-travel-enabled browser DevTools for debugging test suites. It emphasizes the importance of collaboration between product and QA teams to maintain and improve test suites. The talk demonstrates how Replay's DevTools can be used to debug test failures and analyze test suite health. It also highlights the benefits of using Replay for reproducibility and collaborative debugging. Additionally, it discusses integrating Replay into CI and the cost considerations associated with using the tool.
1. Introduction to Test Suite Triage
At the end of this talk, all of you can become JS time travelers as well. This talk is called Stop Triaging Your Test Suite. If you have code, there are going to be bugs. And if you have bugs, you're going to have flaky tests. I come from replay.io. We're building the first time-travel-enabled browser DevTools.
At the end of this talk, all of you can become JS time travelers as well. So before we begin, raise your hand if you've got a browser test suite. Good. Okay. Raise your hand if you've got more than 10 tests. Keep them up if you've got 20. All right. 50? Do we have 50? Okay. Do I hear 100? Right, right, right. Any thousands? Ah, shame on all of you.
What about Selenium? Who uses Selenium? Playwright? Damn! Puppeteer? Okay, okay. Cypress? Sweet. Who has flaky tests? I thought that might happen. Who's fixed their flaky test this year? Past six months? Yay! Three months? Who's fixed their flaky test in the past week? Yes! Yes!
So this talk is called Stop Triaging Your Test Suite. Triaging, for me, is when you set up rules like: if it's flaky more than this amount, as in failing for longer than this amount, we're going to start skipping it, and we have to get somebody somewhere to fix it. And it hurts me. Everybody at scale has set up some policy about their flaky tests and has disabled some of them. And that hurts me. But the reality is, if you have code, there are going to be bugs. Even this slide has a bug. Who sees the bug? What's the bug? Yes, there's a second I. Three I's. There should not be a second I. But there are bugs everywhere. And if you have bugs, you're going to have flaky tests. So, I come from replay.io. A bunch of us from Mozilla started Replay three years ago. And we're building the first time-travel-enabled browser DevTools.
2. Observations on Test Suites
We've been talking to hundreds of teams about their browser tests. Tests are more difficult at scale. Everybody blames their tests. The flakiness comes from the application. Small QA teams struggle to maintain the test suite. It takes a whole team to buy in and improve them reliably.
We're also the ones with the bucket hats. If you want a cool duck bucket hat, come by later. We've got them. We'd love to give them to you.
And over the past couple years, we've been talking to hundreds of teams about their browser tests. So, I thought I'd share a couple observations before we jump into some demos. Really, really simple talk. I want to do three demos.
So, first observation. Tests are more difficult at scale. You would think it would get easier at scale, but we've talked to the biggest companies in the world, and, yeah, Facebook and Apple and Google are struggling, and they've got all the tests. So, as it gets bigger, it gets harder. That is sad.
Second, everybody blames their tests. They're like, ugh, I can't write stable tests in Cypress. I'm going to switch to Playwright. It's going to be flaky there, too, because the flakiness comes from the application. Yeah, sure, there's some bad tests because of the code, too. But there's a lot more application code and backend code that's causing instability. We've got to understand that.
This one is personal to me. I see these small QA teams trying to maintain the test suite, and they can't. And then I see these other teams, sometimes at really big companies, where one dev is like, we've got to have a test suite. Okay, sure. They bring it in, they write a lot of tests. The devs look at the test, and they're like, it's kind of failing. They don't want to touch it. So, it's like, I'm that one person to try to keep this thing alive, and they're holding on and they bet their career on, like, hey, I think we should have tests. But people start to turn on the tests and blame the tests. It takes a whole team to buy in, but in order to buy in, you've got to have a way to actually improve them reliably.
3. Understanding Test Suite Health and Demos
Product and QA should work together to understand both the test suites and the application. The health of a test suite can be measured by the number of tests and contributors. End-to-end tests can be maintained easily and have a positive ROI. The best test suites contribute to delivering value and shipping with confidence. Now, let's move on to the demos.
Otherwise, flakiness is tough. Product and QA should be involved. I see all these teams where it's like the QA team or the product team. It's never like, hey, can we work on this test suite together? Because the QA team understands the test suites, and the product team understands the application. You kind of have to understand both in order to have a great test suite.
I believe you can measure the health of the test suite with two graphs that nobody looks at. The first is how many tests do you have, and is that going up? Or is that, like, plateaued? Or, in many times, going down because you added the tests? And they're slowly being skipped, disabled, deleted, et cetera. And the other graph is how many people are contributing to the test suite. Yeah, you had all these devs excited by the test suite, but now there's kind of a pain, and it's slowly going down. It's on that one person, like, keep it alive. If you can get a test suite that is growing as your application grows, and is getting more and more contributors, because more people care about the test and see the ROI, you are healthy.
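The two health metrics described above could be derived from version-control history. A minimal sketch of that idea follows; the commit shape (`testsAdded`, `testsRemoved`, `author`) is hypothetical, just to make the two graphs concrete:

```javascript
// Sketch: compute the two test-suite health metrics from a simplified
// list of commits touching the test directory. The commit shape here
// is hypothetical, not tied to any real tool.
function suiteHealth(commits) {
  // Metric 1: net number of tests over time (added minus removed/skipped).
  let testCount = 0;
  const countOverTime = [];
  // Metric 2: distinct contributors to the test suite.
  const contributors = new Set();

  for (const c of commits) {
    testCount += c.testsAdded - c.testsRemoved;
    contributors.add(c.author);
    countOverTime.push({ date: c.date, testCount });
  }
  return { countOverTime, contributorCount: contributors.size };
}

// Example: a suite that grew, then started shrinking.
const commits = [
  { date: "2023-01", author: "ana", testsAdded: 10, testsRemoved: 0 },
  { date: "2023-02", author: "ben", testsAdded: 5, testsRemoved: 0 },
  { date: "2023-03", author: "ana", testsAdded: 0, testsRemoved: 3 },
];
const health = suiteHealth(commits);
console.log(health.contributorCount); // 2 contributors
console.log(health.countOverTime.at(-1).testCount); // 12 tests remaining
```

A healthy suite is one where both series trend upward as the application grows.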
The state of end-to-end tests does not seem healthy to me. Yes, it is possible. We have been helping teams improve their test suites, and as a result they have been happy with their tests and are adding more. It is possible to do it. I feel like I have to put that in at times, because there's so much anxiety around the end-to-end test suite. Oh, we have to get off the end-to-end tests and use unit tests, because those are more maintainable. If you can create a great end-to-end test suite, you can maintain it even more easily than a unit test suite, because the tests are simple, and they should be simple: click, type, et cetera. And then, lastly, for all the VPs of engineering out there: of the hundreds of teams I've talked to, the teams that are delivering the most value for their users also have the best test suites. The test suite is what helps them ship with confidence. So there really is an ROI on the other side.
All right. On to the demos. So I've got three demos today. The first one is our little Hello World eCommerce app. It's a little hoverboard. And you go to buy something, and you can't buy the hoverboard. That is the entire app. So this test ran, let's see, it ran two days ago. But with Replay, when we're in Replay DevTools, it's as if the application is running locally on your computer right now and you've got the Cypress app open; the panel here lets you see what the test looked like at any point.
4. Debugging with Replay DevTools
You can see the elements that were selected. When I click the special 'jump to code' button, it takes me into Replay DevTools, where I can debug the application as if it's running live. Understanding the problem and fixing it is easy once you figure out what went wrong. That's replay in a nutshell.
You can see the elements that were selected. Everything should just work, but there is this special button here, jump to code. And when I click it, it takes me into Replay DevTools, and I'm paused in the component that's handling that click. And now that I'm paused here, I can see that the click kicked off a fetch. So I can jump forward and I can hover on, let's say, the form data and see the value. I can see what form.action was, which is where the API was kicked off. I can jump forward in time. And when I jump forward in time, I can hover on the response and see that it was not OK. Jump a little further forward, hover on the error message, see that too. I can debug the application as if it's running live and it just failed for the first time. Of course, you've got React DevTools, you can inspect components, and in this case, you can also find a fetch and see the original fetch that was kicked off, the request, and the response body. And once you understand the problem, actually fixing it is pretty easy; the key is being able to figure out what went wrong. So that's Replay in a nutshell.
5. Analyzing the Dropdown Behavior
This app is Metabase, which offers a tool for querying databases and writing SQL queries. The dropdown in the test shows different options depending on whether it passes or fails. By inspecting the component and source code, we can identify the function responsible for determining the dropdown items. Adding a console log helps debug the issue, revealing that the test failed and only one item is displayed.
This is our Hello World app. Let's get to the more interesting ones. So this app is Metabase. And Metabase offers a tool that lets you basically query your database. So it's good for writing SQL queries.
And when the test fails, this dropdown here is showing one element. But when the test passes, the dropdown, ooh, it's so fast, tests are so fast these days. The dropdown has two. See how it says 'Use original value' and 'Custom'? That's the good case. But in the failing case, it only says 'Use original'. So let's figure out why.
I'm going to jump into this test. And if I want to find the component that we're talking about, I could jump on this click, which is going to take me to the dropdown, which we're opening. You can even see there's a toggle here and everything. But a better thing to do would be to use React dev tools, find the component right here, scroll up because Metabase is massive. So all of these components are wrapping the thing that is this one field map settings component. You can see there's a field here, a table, all that good stuff. I'm going to jump to the source code. You can see that the three options for the dropdown are original, foreign, and custom. We don't care about foreign, but I'm going to grab this variable and search for it in the code.
So the first thing here is a little bit random, so I'm going to jump past that. And now we have this function called get available mapping types. And this is the function that the component uses to figure out which dropdown items to include in the component. And so what I'm going to do is just add a console log because console logging is the best debugging. Add the console log, rerun. In our case, you can add a console log by clicking this plus button and you're done. So I click it and there are console logs right there. But I'm going to change it to log the one variable that actually matters, mapping types. And this thing is the list that shows up in the dropdown. So here we can see that's just one item because the test failed and it's only showing one item.
6. Analyzing Test Failure
If it's a mappable numerical thingy, include it; otherwise, don't. The remapping variable is empty, so we don't have what we need. The field has no values in remapping. The data was empty, and the network request had no response.
But if I jump over to this other test where it passed and I find the same file and I find that same function, get available mapping types. There you are. And add the same console log. There are two. And it's original and custom. So then the question is, like, OK, cool. It's only doing one thing. Why is it only doing one thing? Well, there is this function here where it says, like, if it's a mappable numerical thingy majiggy, great, include it, otherwise don't. So let's find where this "has mappable" thing is defined. OK. It's right above. And this thing is looking at the remapping variable and saying, hey, does it have any keys that do these things? Yada, yada, yada. All I really care about is remapping. And I can see the answer is not really. The map is empty. And because the map is empty, we don't have what we need. And we can just quickly see that the field is this field and it has nothing in remapping. And if we want to see where the field is getting its values, well, they use Redux. So let's just check here. Sure. And in here, and in here, we have a field. We have a fetch. I believe there should be field values in here, too. This is kind of random to me. Field values. And sorry, the data was empty. No values. And then the network monitor will show the same thing. So we can go from, like, hey, the React component did this thing for this reason, to Redux got this data but it was incomplete, to the network request, where the response had nothing in it. Working backwards.
7. Analyzing the Test Suite Dashboard
This is an example of a test in Replay's test suite dashboard that was failing but showed as passed. The test asserts that there are five flaky tests, but they all appear as passed. By using React DevTools, we can investigate the test and find the issue. Adding a condition on test.source.path helps identify the problem.
All right. Example 3. So this one is going to seem really meta. Because what we're looking at here is Replay's test suite dashboard. And this is a test that we wrote ourselves that was failing all of last week, which I believe we fixed yesterday. But I'm here, not there. So I'm not 100% sure. I just think that's the case.
And the other thing that's meta about this test is the test is asserting that there are five flaky tests. And it says there are five flaky tests right here, and it says there are five flaky tests up there in the yellow badge. But all of these results show up as passed. So what we're saying here is the test for showing failed results is failing because it thinks the test passed, but it really failed. I'm sorry. There's, like, many levels of inception going on here.
But we can do the same thing. We use React DevTools. We find the test result list item right here. We look at the test in question. The test had a result of flaky, so it believes it was flaky, but we still showed it as passed. We can jump to the test right here. This part's a little bit weird, so bear with me. If I add a console log, and I log the test.source.path, you're going to see a lot of these things, like 180 of them. Way, way too many renders. But I want to find the right one, so I'm going to go to React, and I'm just going to grab this one, the authenticated comments02. I just don't want to type, so bear with me. Comments01. Great. Okay. And now I'm going to add a condition: test.source.path equals this thing.
8. Reproducibility and Collaborative Debugging
And now we're only going to see the ones we care about. The test was flaky, but this label that's being passed down says it passed. With replay dev tools, we get to solve the reproducibility problem. Tests that fail one percent of the time are difficult to debug because they fail one percent of the time, and often not on your computer, only in CI. If you can get a recording, you can debug it as if it reproduces every time. And with replay, because it's a time machine, you can work backwards. Flaky tests are difficult because timing issues are really difficult to understand. But when you have a time machine and you can work forward and backwards and narrow in on the problem, some of the hardest bugs become easy.
And now we're only going to see the ones we care about. Great. And I can pause here, and I'll see the test was, in fact, passed. It should have been failing. Thank you. Bugs everywhere. Much better.
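The conditional logpoint used here can be sketched as plain code: among many renders, only the ones matching a predicate produce output. The data shapes below are illustrative, not Replay's actual internals:

```javascript
// Sketch of a conditional logpoint: out of many renders, only "fire"
// (collect a log entry) when the condition holds. Shapes are made up
// for illustration.
function conditionalLog(renders, condition) {
  const hits = [];
  for (const r of renders) {
    if (condition(r)) hits.push(r); // the logpoint only fires when true
  }
  return hits;
}

const renders = [
  { test: { source: { path: "auth/comments-01" }, result: "flaky" } },
  { test: { source: { path: "auth/other" }, result: "passed" } },
  { test: { source: { path: "auth/comments-01" }, result: "flaky" } },
];

const hits = conditionalLog(
  renders,
  (r) => r.test.source.path === "auth/comments-01"
);
console.log(hits.length); // 2 matching renders out of 3
```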
Okay. And the test was flaky, but this label that's being passed down says it passed. It's lying. Why is it lying? Well, I'm running out of time, so if you want to know why it's lying, I can share the replay with you later. Or it's open source, so you can just check it out yourself. Going back to the talk. By the way, this talk, which I can share, just has a link right there.
So, what do we talk about? With replay dev tools, we get to solve the reproducibility problem. Tests that fail one percent of the time are difficult to debug because they fail one percent of the time, and often not on your computer, only in CI. If you can get a recording, you can debug it as if it reproduces every time because you have it right there. And with replay, because it's a time machine, you can work backwards. The car has crashed into the tree. You can rewind the clock to when the car was careening and swerving before it hits the tree. We can start off with the React component that has a drop down that looks bad and work backwards to the state that caused the component to render the way it did and then where the state came from. We can collaborate. We can work as a team. You can start debugging a flaky test and say I got this far. I'm going to drop some comments and at mention other people on the team who can look at it. QA can ping you. You can collaborate. And I can't stress this enough. Flaky tests are difficult because timing issues are really difficult to understand. But when you have a time machine and you can work forward and backwards and narrow in on the problem, some of the hardest bugs become easy.
9. Understanding Replay and Time Machine Replay
And that's replay. Is replay an individual testing framework or is it still reliant on something like Playwright and so on? Replay is a browser. We forked Chrome and taught it how to record and deterministically replay later. One actionable recommendation to improve end-to-end test suites is to add replay. Internally, the time machine replay works by capturing all the communication with the OS and giving it the value from before.
And that's replay. And I hope you can all stop triaging your tests and have good tests.
One question is just trying to understand the framing of what replay is and how we talk about it. Is replay an individual testing framework or is it still reliant on something like Playwright and so on? Oh, oops. Replay is a browser. There we go. We forked Chrome and taught it how to record and deterministically replay later. Great. Thank you. That helps. Glad we got that one out. That's all good.
Okay. What is one actionable recommendation that teams can adopt right now to improve their end to end test suites? Add replay. Wow. Okay. Well, we are quick firing these, aren't we? This one is actually a really interesting question if you can talk somewhat about it. Internally, how does the time machine replay work? The key is to be able to replay. So, if I gave you a function, Fibonacci, you've got the function. You've got the input. You don't need to record anything. You just run it again. If you change Fibonacci to instead of taking the input, read the input from a file, that's what you need to record. Next time you run, it thinks it's reading from the file, but you've just captured that one piece. It's like MSW, but for software. What we've done with Chrome is we've taught it how to capture all the communication with the OS so that later it thinks it's running on your computer. It thinks it's yesterday. It thinks it's making all those API calls. Every time it makes an OS system call, we capture it and then give it the value from before. Yeah.
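The record/replay idea described above (capture a nondeterministic read once, then feed the captured value back on replay) can be sketched in a few lines. The names here are illustrative, not Replay's actual internals:

```javascript
// Sketch of record/replay: wrap a nondeterministic input source so that
// in "record" mode we capture each value, and in "replay" mode we feed
// back the captured values instead of re-reading. Illustrative only.
function makeRecorder(readInput) {
  const log = [];
  return {
    log,
    read: () => {
      const value = readInput(); // real nondeterministic read (file, clock, syscall...)
      log.push(value); // capture it for later
      return value;
    },
  };
}

function makeReplayer(log) {
  let i = 0;
  // On replay, the program "thinks" it is reading from the file again,
  // but actually gets the recorded value from before.
  return { read: () => log[i++] };
}

const fib = (n) => (n <= 1 ? n : fib(n - 1) + fib(n - 2));

// Record: the program reads its input from an external source.
let externalValue = 21;
const recorder = makeRecorder(() => externalValue);
const recordedRun = fib(8) + recorder.read();

// Replay: the external source is gone, but the run is reproducible.
externalValue = undefined;
const replayer = makeReplayer(recorder.log);
const replayedRun = fib(8) + replayer.read();

console.log(recordedRun === replayedRun); // true: deterministic replay
```

Pure computation like `fib` needs no recording at all; only the boundary with the outside world (here, `read`) has to be captured.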
Using Replay for Test Debugging
You can always ask follow-up questions through Slido. Do you have to start the application locally to debug tests? Does replay need access to the source code? What is replay? To use replay in CI, tell your test framework to use the replay browser. Every test run creates a recording. Decide which recordings to upload and get a URL. Put the URL in a PR comment. Open replay dev tools in any browser to debug.
That helps. Thank you. And then, once again, you can always ask follow-up questions through Slido as well. So, yeah.
I think this one might have been answered in framing replay as a browser, but do you have to start the application locally? There's something about the workflow, a bunch of questions. Even the following one, it feels a little bit like this. Do you have to start the application locally to be able to debug tests? And the next one is, does replay have to have access to the source code? Do you have to publish the source code to use replay? I think these are all in the same realm of question.
What is replay? Someone's got it. Yeah. So, in order to use replay, let's say in CI, you have to tell your test framework to use the replay browser. Okay. I'm not going to use Chrome. I'm not going to use Firefox. I'm going to use replay Chrome. Here's the browser. Use it. Once you do that, every time that the test runs, it's going to create a recording. So, browser tab opens, browser tab closes, new recording file. When all the tests have run, you have all these recordings on disk. You decide which ones you want to upload. Maybe you just want to upload the failing ones. I don't really care. When you upload it, you get a URL. We put that URL in a PR comment. It's like, hey, your test failed. Want to click? Click. Then you get to replay dev tools. That's what I was showing you. But that means in any browser, Safari or Firefox or Chrome, you can open replay dev tools. It's going to be talking to the replay backend.
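The "decide which recordings to upload" step described above can be sketched as a small post-test filter. The recording shape and the idea of a separate upload step are illustrative placeholders, not Replay's real CLI or API:

```javascript
// Sketch of the post-test CI step: each browser tab left a recording on
// disk; pick which ones to upload (here, only failures). The recording
// shape is a hypothetical placeholder.
function selectRecordingsToUpload(recordings, { onlyFailures = true } = {}) {
  return onlyFailures
    ? recordings.filter((r) => r.testResult === "failed")
    : recordings;
}

const recordings = [
  { id: "rec-1", testResult: "passed" },
  { id: "rec-2", testResult: "failed" },
  { id: "rec-3", testResult: "passed" },
];

const toUpload = selectRecordingsToUpload(recordings);
console.log(toUpload.map((r) => r.id)); // [ 'rec-2' ]
```

Each uploaded recording then gets a URL that can be dropped into a PR comment.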
Replay Browser and Debugging Features
Replay is a browser that captures interactions with the underlying OS, allowing you to gather more information for debugging. It runs on your machine and can be downloaded from replay.io. By clicking record, saving, and obtaining a URL, you can view the replay with a cloud-based browser that enables debugging features.
And in the replay backend, we have thousands of Docker containers. And each Docker container has a browser that thinks it's running on the original device. It's just replaying. This is almost like injecting yourself in a point of this process where normally you'd go off and use a standard browser, let's say. And then it's able to spy on every activity that happens within that test.
Yeah. Yeah. Sure. Cool. I feel like, again, you covered this a little bit, but it has a lot of upvotes, so I do want to talk about it explicitly. If you just had to surmise, what are the advantages of using... You can add a console log. If you look at Cypress replay or Playwright trace viewer or any observability tool out there, they're never going to let you pause at a line of code and inspect the state. You'll never be able to add a console log on line 10 and see the logs because they're not actually replaying. They're capturing the DOM and showing you the fancy video, which is cool. There's value in that. But it's not actually spinning up a browser that can replay and pause at a line of code.
Can we just take a few steps back on what is replay once again? Right. So it's a browser. It basically will capture all of the interactions that are made with the underlying OS, so you can capture a bunch more information and use that as part of your debugging practice. Replay itself runs on your machine. You download it and you run it. There's something in there that isn't quite sitting. Replay is also a service and a company that needs to exist. There's something in there that for me, at least, isn't quite locking in knowledge-wise.
Sure. So how would one go about setting up replay for a project? So to simplify it, anybody can go to replay.io, click download, and get the replay browser. And when you use it, there's a little record button in the top right. You click record, you do some things, click save, you get a URL, and then you can view that replay. Now when you're viewing that replay, there is a browser in the cloud that is replaying that session and letting you do all the debugging features.
Replay Browser and Collaborative Debugging
Replay is a browser that can replay and offers retroactive debugging. It runs on your machine and in the cloud, allowing you to share and save debugging actions with others. This collaborative approach aims to make debugging more accessible and inclusive. While Replay is based on Chromium with minor modifications, it cannot capture other rendering engines.
What that means is you get the retroactive debugging as part of that service. But at the core, it's a browser that can replay. We have replaying browser OS internals, but you don't think about that. You think about it in terms of what's in the network monitor. Can I add a console log on 10? Those kinds of things.
Interesting. I think that makes sense. Cool. And so really, the service side of it is the fact that it then also runs in the cloud. That's the key. And in theory, following on to the next question, can replay record my debugging actions too so I can share and save them with others? That's where commenting is really nice. You're debugging, you're like, ooh, this line is interesting, this line is interesting, this line is interesting. Great. Drop some comments. And then later you can be like, what was I doing? Oh, that's what I was doing. And you can share that with others as well. And obviously that is part of the appeal of having a service that kind of... I mean, it's kind of weird it's hybrid. It's like it runs on your machine, but to really unlock the power of it in a team setting, you also want it available. I think of it like Figma. There are a lot of designers before Figma who are kind of working their own sketch file or Photoshop file. But when you bring Figma in, then the entire team can be part of the design process. For me, I think about all the developers who are doing all the debugging by themselves in a dark room at like 2AM and it's hard. And it's so nice to make debugging, which has been this historically walled garden, you can do it, but you can't do it with others. And only people who can debug for a long time by themselves can become developers, making that collaborative as well.
This is a really interesting question. So if Replay is its own browser, won't there also be an issue of not working in exactly the same space as users? I mean, it is Chromium. We've made minor modifications. Cool. But, again, it's not like I can capture other rendering engines.
Replay Browser and Testing Capabilities
It is a Chromium browser, and we are testing in the context of a Chromium browser. All runtimes should be recordable. We come from Mozilla and started with Firefox. We also have a Node recorder for the backend. Python, Ruby, Java, and Safari are recordable. Replay can do unit and component testing. Component testing would feel almost identical to what was shown. Replay DevTools can access app code even in black box testing. Think of Replay DevTools as Chrome DevTools. Source maps are needed, and they can be uploaded to Sentry and other tools for prod settings.
It is a Chromium browser, and we are testing in the context of a Chromium browser. I think all runtimes should be recordable. So we come from Mozilla. So we started with Firefox. Chrome is kind of important, so we're prioritizing Chrome. But I see a future where, I mean, we also have a Node recorder, too, for the backend. Python, Ruby, Java, obviously Safari, all are recordable.
There's a bunch of questions here that I think might be a little bit more quickfire. So can Replay also do unit and component testing, or is it mainly focused on end-to-end testing? Sure. You can do component testing. Yeah? How would that feel different to what you showed us today? It would be almost identical. Yeah? If the component test runs in the browser, you've got it. If you're focused on node and you're using a node recorder, sure, you can do that. Cool.
Just taking a very quick look at some of the other questions while we have just a couple more minutes, can Replay access app code, even if I'm doing black box testing, meaning you can't access the app code itself? Because I think there was something interesting there where you kept moving back into the code. But I suppose that's the shipped code, that's a bundle that is shipped. So think of Replay DevTools as Chrome DevTools. You don't have to do anything to set up Chrome DevTools. You just open up Chrome DevTools, like, hey, I can inspect my app. But you do need source maps. And most source maps are not shipped to prod. Like if you're in development, you have source maps. Same with replay. If you're using replay and there are source maps in CI, you're good. Nothing you need to do. If you're using replay in dev and there are source maps, nothing you need to do. If you ask support to record the bug in prod, or support asks a really important user who wants the bug fixed to record a replay, you don't have source maps. So you're kind of in trouble. But what people do to get around this is they upload their source maps to Sentry and other tools. So the source maps are available in that prod setting, but only for those services.
Integrating Replay into CI and Cost Considerations
If you update your build tool to also upload to replay, you'll have source maps as well. For Cypress, it's like npm install replay Cypress, add it to the Cypress config, you're done. For Selenium, download the browser ahead of time, update the WebdriverIO config, browser path, path to your replay Chromium, you're done. Could you use replay for testing an Electron-based app? Electron's like half Node, half Chrome. We don't have an Electron build, but you can imagine if you have a Node recorder and a Chrome recorder, like, we have an Electron build. Does replay work in a headless mode as well? Yeah, we do both. It's just a browser. Let's talk about the cost of the cloud part of the service and how it contributes to a sustainable business.
If you update your build tool to also upload to replay, you'll have source maps as well. Cool. So, yeah, I think that just the other part that isn't quite sitting with me, too, is, like, so how would you integrate replay into a CI pipeline? Yeah. For Cypress, it's like npm install replay Cypress, add it to the Cypress config, you're done. For Selenium, download the browser ahead of time, update the, I don't know, WebdriverIO config: browser path, path to your replay Chromium, you're done.
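As a concrete sketch of the Cypress setup mentioned above: the plugin export, options, and browser flag shown here are assumptions from memory, not verified against the current @replayio/cypress release, so check its docs before copying:

```javascript
// cypress.config.js: a hedged sketch of wiring in the Replay browser.
// The plugin API shown here is an assumption, not a verified example
// of the current @replayio/cypress release.
const { defineConfig } = require("cypress");
const { plugin: replayPlugin } = require("@replayio/cypress");

module.exports = defineConfig({
  e2e: {
    setupNodeEvents(on, config) {
      // Register Replay so each test run produces a recording.
      replayPlugin(on, config);
      return config;
    },
  },
});
```

You would then run Cypress with the Replay Chromium selected as the browser (the exact `--browser` value is likewise an assumption).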
A question that's just come in that's interesting, I want to just take a moment on it, could you use replay for testing an Electron-based app? So it's not strictly running in the browser, but it is. I mean, what is Electron? Electron's, like, half Node, half Chrome. We don't have an Electron build, but you can imagine if you have a Node recorder and a Chrome recorder, like, we have an Electron build. Yeah. Let's just say it's on the road map. Cool. I mean, yeah. Sure. Does replay work in a headless mode as well? Yeah. And of course, that's needed to, like, run it in CI environments. Yeah, we do both. It's just a browser. I'm going to just take just a couple of seconds. We have just a minute more. We have so many questions. I'm going to literally, give me 10 seconds, I'm going to find the ones that we spend this last few moments. Can I do the costing one? Yeah, there you go. So I could just go home. You could do this. Go on. It looks fun. Sure. So we've already spoken briefly about that cloud part of the service being the key for a sustainable business. It will cost money in some ways. Let's talk about that.
Replay Cost and Celebration
Replay is cheaper to record than a video, and it has lower overhead compared to other tools. You can choose what to upload, reducing costs. Join in celebrating the completion of the track.
How does it work? So two things on cost. The first is overhead. So the really weird thing about replay is it's cheaper to record a replay than to record a video. So when Cypress had Cypress video, that added more overhead than recording a replay. The video that's in our replay dev tools, we created after the fact while we were replaying. So replay is really cheap when you're recording in CI, just from an overhead perspective.
The second thing is on the cost side, most tools say upload everything. We say upload the things you want. And when you upload just like the five failures for this test and five failures for this test, et cetera, et cetera, it's fairly reasonable. We try to keep cost pretty good. That's never been a reason people haven't used replay.
Cool. Please join me in giving Jason a massive round of applause and make it even bigger because this track is done. We did it. We did it.