Beyond React Testing Library: Testing React Libraries (and library-like code)

When it comes to testing library code, the (usually amazing!) "Testing Library" approach quickly hits its limitations: We often need to test hot code paths to ensure additional guarantees, such as a specific order of DOM changes or a particular number of renders.
As soon as we start adding Suspense to the picture, it even gets almost philosophical:
How do we count a render that immediately suspended, and how do we distinguish it from a "committed" render?
How do we know which parts of the Component tree rerendered?
In the Apollo Client code base, we're using the React Profiler to create a stream of render events, enabling us to switch to a new stream-based testing method.
After testing this approach internally for a year, we have released it in a library that we want to present to the world.
I'll also briefly look into other "testing-related" problems far outside the norm that we've come across and share our solutions:
How to test libraries that bundle different code for React Server Components, streaming SSR runs, and the Browser: Testing your increasingly complex `exports` fields and ensuring all those environments export the package shape you expect.
We'll even briefly look into rendering React components in different environments to test them in isolation - be it in Server Components, Streaming SSR, or simulating stream hydration in the Browser.

This talk was presented at React Day Berlin 2024.

FAQ

The new testing library has been in use for over a year, resulting in 544 tests in Apollo Client and resolving many flaky tests.

You can find Lenz on GitHub as phryneas, on Twitter as phry, and on Bluesky as phry.dev. He is also available for Q&A on Discord.

The speaker is Lenz Weber-Tronic, and the talk is titled 'Beyond Testing Library, Testing React Libraries or Library-like Code.'

Lenz Weber-Tronic is a Senior Staff Software Engineer at Apollo GraphQL, where he maintains Apollo Client for the web. He also maintains Redux Toolkit and is the author of RTK Query.

Lenz addresses the challenge of testing React libraries and library-like code, particularly when the React Testing Library may not be suitable for optimizing hot code paths.

The key concerns are minimizing unnecessary re-renders, preventing data tearing, and ensuring extremely granular rendering.

Lenz suggests using a new testing library called Testing Library React Render Stream (published as `@testing-library/react-render-stream`), which leverages the React Profiler component for more reliable testing of hot code paths.

This library allows for step-by-step assertions on renders, reduces the complexity of writing tests, and addresses the limitations of React Testing Library for certain use cases.

Lenz warns that testing approaches may change between React versions, requiring updates to tests. There's also a caution about using `act`, as it batches renders, which may not be desired when counting renders.

react testing

Lenz Weber-Tronic

22 min

16 Dec, 2024

Video Summary and Transcription

My talk is called Beyond Testing Library, Testing React Libraries or Library-like Code. We want to optimize code by minimizing re-renders, avoiding tearing, and ensuring granular rendering. React Testing Library may not always be the right tool for libraries or library-like code. We test for synchronous results, but there are cases where unwanted re-renders and inconsistencies can occur. We need to avoid flaky tests and bug propagation. The new Testing Library React Render Stream library simplifies testing by replacing complex wrappers and assertions. We test for multiple independent components and ensure correct re-rendering. We introduce Suspense and DOM snapshotting to test granular rendering. The final test provides increased confidence and meets all special requirements.

1. Introduction to Beyond Testing Library

Short description:

My talk is called Beyond Testing Library, Testing React Libraries or Library-like Code. I want to optimize code by minimizing re-renders, avoiding tearing, and ensuring granular rendering. React Testing Library may not always be the right tool for libraries or library-like code. For such code, we need to consider special requirements. Let's take a look at testing a useQuery hook, as found in Apollo Client, RTK Query, or React Query, using React Testing Library.

Hi, there. My name is Lenz. My talk is called Beyond Testing Library, Testing React Libraries or Library-like Code. A short word about me. My name is Lenz Weber-Tronic. I work as a Senior Staff Software Engineer at Apollo GraphQL, and there I maintain Apollo Client for the web. But in my free time, I also maintain Redux Toolkit. I'm the author of RTK Query. And due to my ADHD, I actually maintain a bunch of smaller libraries, too. You can find me on GitHub as phryneas, on Twitter as phry, and, not on the slide, as phry.dev on Bluesky.

Generally, why are we here? It's a bit of a hard thing because I love Testing Library. I think it's amazing. But it's not always the right tool for me as a library author, because React Testing Library tests for eventual consistency. And when testing a library, I might not always want to look for that, because libraries are a hot code path and we need to optimize for that. So for the libraries that I write and maybe also libraries that you write, either open source or as an in-house library that's shared by multiple teams, or just library-like code, we might have some special requirements that are not something that you would usually test with React Testing Library. So what are these? Well, first, I want to ensure that my code doesn't cause any more re-renders than absolutely necessary, because this is code that sits in the middle of everything that runs. You want to have that optimized well before you start optimizing your own code. So we do our best here.

Beyond the re-render thing, another important thing is tearing. That means that we don't want to mix data from the present with data that might be on the screen in the future or in the past. So inside of a hook, that could mean we return inconsistent state. Inside of a component, it would mean that two hooks might return state that is inconsistent with each other. And in your whole application, it could mean that one component here shows state from the future while another component down here shows state from the past. And then there's the third thing that I want to optimize for, which is extremely granular rendering. I only want the component that absolutely has to re-render to re-render, and not its parents or its grandparents.

So that said, if I had a hook like useQuery in Apollo Client, RTK Query, or React Query, how would I test that? Let's look at an example with React Testing Library first, and this is a very common test, I believe. Here, we would start rendering our useQuery hook, and then we would start making assertions on that. So first we would test for this case on the left: loading should be true and data should be undefined. And then we test for this case on the right, where loading is false, but data is hello world.

2. Testing for Synchronous Results

Short description:

We test for synchronous things, but there are cases where other factors might cause green results that we want to avoid. For example, setState calls or useSyncExternalStore can cause unwanted re-renders. There are also tearing cases where loading is true but data already has the final result. These inconsistencies can lead to incorrect values being displayed.

The way we do that here is that we test for the things that can be tested synchronously, and then we wait until loading is false, and then we test for data to be equal to hello world. But of course, this is the happy path. This test will always be green, but other things might also be green and we might want to avoid them. So let's look at this case, and that's a very common case where we might have another setState call inside of our hook that doesn't really relate to the output of the hook, but it causes a re-render. Another example would be a useSyncExternalStore call that does the same. We want to avoid that, but with this type of test, we really have no way of determining whether that was the case. Something else would be a tearing case where loading would still be true, but data would already have reached the final result. So here we have an inconsistent return value, and the way we test that, it just stays green, because we just test for loading to be false, and in the meantime, data could take any value. That could go even further, and loading could also take a different value. So in this case, just a bunch of puppies. And data could have a completely unrelated value, and we would not be able to detect that with this test, or with most other tests either.
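To make the blind spot concrete, here is a small dependency-free sketch. It does not use React or React Testing Library: `QueryResult`, `eventualCheck`, and `exactCheck` are hypothetical names that only model the two styles of assertion, applied to hand-written lists of render results.

```typescript
// Hypothetical shape of what a useQuery-style hook returns on each render.
type QueryResult = { loading: boolean; data: string | undefined };

const loadingState: QueryResult = { loading: true, data: undefined };
const doneState: QueryResult = { loading: false, data: "hello world" };

// Models "assert the first render, wait until loading is false, then check data":
// only the first and the last render are ever inspected.
function eventualCheck(renders: QueryResult[]): boolean {
  const first = renders[0];
  const last = renders[renders.length - 1];
  return (
    first.loading === true &&
    first.data === undefined &&
    last.loading === false &&
    last.data === "hello world"
  );
}

// Models a render-by-render test: the recorded sequence must match exactly.
function exactCheck(renders: QueryResult[], expected: QueryResult[]): boolean {
  return (
    renders.length === expected.length &&
    renders.every(
      (render, i) =>
        render.loading === expected[i].loading && render.data === expected[i].data
    )
  );
}

const happyPath: QueryResult[] = [loadingState, doneState];
// An unrelated state update inside the hook causes a duplicate render.
const extraRender: QueryResult[] = [loadingState, loadingState, doneState];
// Tearing: data has already arrived while loading is still true.
const tornRender: QueryResult[] = [
  loadingState,
  { loading: true, data: "hello world" },
  doneState,
];

console.log(eventualCheck(extraRender), exactCheck(extraRender, happyPath)); // true false
console.log(eventualCheck(tornRender), exactCheck(tornRender, happyPath)); // true false
```

Both problem sequences pass the eventual-consistency style of check and only fail once every render is compared, which is the gap the rest of the talk is about closing.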

3. Avoiding Flaky Tests and Bug Propagation

Short description:

We can encounter a race condition where loading is false and data is hello world, but there is a small tick in between that could trigger a rerender. This can lead to flaky tests, which we want to avoid. While this behavior is acceptable in a normal app, it becomes problematic in libraries or heavily reused code. We have tried different testing approaches, but the Profiler component has proven to be the most effective. By wrapping our hook call in the Profiler component and recording the current result in its onRender callback, we can make assertions on every render and avoid bugs in our code.

This goes even one step further, and we have a race condition that's possible, and that would be that loading is false, which is this here, and then data is hello world, which is this here, but in between data and loading switch. So why can this happen? And we have to rewrite the whole test a little bit to see why this happens.

And if we assign a variable here with a promise, and we await it down here, we suddenly see that the await here happens after this check. So there is a small tick in between. During that tick, React could rerender. Of course, that's unlikely. That needs very specific timing. I've seen that quite reliably on localhost, but I wouldn't see it in CI. So this would end up being a test that works most of the time and sometimes doesn't, and it would just be flaky. And we don't really know if it's flaky for a good reason or flaky for a bad reason. So we want to avoid that.

All that said, let's take one step back and reassure ourselves that all of this is totally fine in a normal app. Things would not crash, and the UI would eventually get there. It would maybe show a wrong state for a split second, but probably for less time than it takes a person to blink. But if we are writing a library or just heavily reused code, this might not be okay anymore, because this one bug will propagate to 100 places or, in our case, to thousands of applications. And also, you add a lot of libraries to your app. So assume you add 20 libraries. Each library comes with one or two of those bugs. You haven't written a single line of your own code and you already have 30, 40 bugs in your code. So as a library author, I want to avoid that at all costs. But how do I test this now? We tried a lot of different things, like counting renders during the execution of the render function of a component or making assertions in there, all kinds of things. But at some point, everything broke down, be it with the change from React 16 to 17 or with the introduction of suspenseful code. It didn't really stick. So the only thing that stuck, which we arrived at about a year ago, was using the Profiler component.

So let's zoom in on this. First, we wrap our hook call here in the Profiler component. And then, inside of our hook, we assign to a current result. And then, during onRender, which is a function that will be executed exactly when a render finishes, we just take the latest current result and we push it onto an array. And that means that now we can essentially step through the array, and the array will be as long as the number of renders we had, and we can make assertions. So we wait until we have at least one element in here.

4. Introducing Testing Library React Render Stream

Short description:

To make tests more readable, the new Testing Library React Render Stream library replaces the complex wrappers and assertions with a createRenderStream function. By using the takeRender function, we can step through each render and make assertions on the snapshots. Additionally, the library eliminates the need for setTimeout and provides a more convenient way to test hooks. With that, the component render count can be tested, and consistency testing for both the hook and the component is achievable as well.

So that's the first render. We make our assertion. Then we wait until we have a second element in the array, and we can make assertions on it. That's the nice thing: if there is only one element, this will just throw, and waitFor will keep retrying. In the end, we even wait like 100 milliseconds, and we make sure that there hasn't been another rerender in that time by just asserting on the length of render snapshots. This essentially does what we need, but honestly, it looks horrible, and I don't want to write even two tests like that, and I need to write hundreds.
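The slide code is not part of this transcript, so the following is only a rough, dependency-free model of the pattern just described. The `onRender` function here is a plain stand-in for the Profiler callback, `waitFor` is a tiny polling helper written for this sketch (not the Testing Library one), and the renders are simulated with a timer instead of React.

```typescript
type QueryResult = { loading: boolean; data: string | undefined };

// Stand-in for the array that the Profiler's onRender callback fills.
const renderSnapshots: QueryResult[] = [];
function onRender(currentResult: QueryResult): void {
  renderSnapshots.push(currentResult);
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Minimal polling helper in the spirit of waitFor: retry until the check stops throwing.
async function waitFor<T>(check: () => T, timeoutMs = 1000): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    try {
      return check();
    } catch (error) {
      if (Date.now() >= deadline) throw error;
      await sleep(5);
    }
  }
}

// Throws while the requested render has not happened yet, so waitFor keeps retrying.
function snapshotAt(index: number): QueryResult {
  if (renderSnapshots.length <= index) {
    throw new Error("render " + index + " has not happened yet");
  }
  return renderSnapshots[index];
}

async function manualRenderTest(): Promise<QueryResult[]> {
  // Simulated renders: one immediately, one after the "network" responds.
  onRender({ loading: true, data: undefined });
  setTimeout(() => onRender({ loading: false, data: "hello world" }), 20);

  const first = await waitFor(() => snapshotAt(0));
  if (!first.loading || first.data !== undefined) throw new Error("unexpected first render");

  const second = await waitFor(() => snapshotAt(1));
  if (second.loading || second.data !== "hello world") throw new Error("unexpected second render");

  // Wait a little and make sure nothing else rendered in the meantime.
  await sleep(100);
  if (renderSnapshots.length !== 2) throw new Error("unexpected extra render");
  return renderSnapshots;
}
```

Even in this stripped-down form, the bookkeeping (an array, index checks, polling, a final sleep) takes up more room than the assertions themselves.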

So in the end, we are library authors, so we know how to build a library, right? This is where we come to a little problem with this talk, because back when I submitted this talk, I called it Beyond Testing Library, but the reality is that by now, I'm introducing a new testing library. It's called Testing Library React Render Stream, and it's a new testing library based on that Profiler component we saw before, but abstracting it away so it doesn't annoy us anymore and we can test hot code paths reliably.

Let's get back to that test that we had earlier with the Profiler and see how we can make that more readable. It starts by removing all of this weird wrapper, and the current results, and the render snapshots thing. We just move forward. We replace it with something easier, and we say createRenderStream, and we get back an object that has at least a replaceSnapshot function and a render function. We can call that replaceSnapshot function in our component with the return value of useQuery, and then we call the render function, which is essentially the same render function that we already know from Testing Library, with a few adjustments.

Now we have these assertions, and the waitFors are not nice. Working with an array here is not nice, so how can we change this around? We take a takeRender function from our render stream, and takeRender has the nice property that it returns a promise of the next render, whether that render will still happen or already has happened. We can just call takeRender step by step, and we will go through everything render by render by render. In this case, I'm using this notation with lone blocks, so I can reuse variable names and we don't have snapshot one, snapshot two, and snapshot three. We just say, we take the first snapshot, and we assert on it. We take the second snapshot, and we assert on it.

That leaves us with this setTimeout down here, which is also not nice, so let's replace that with something nicer. We can do expect(takeRender).not.toRerender(). This has a default of 100 milliseconds, but you can also add an option and configure it.

The last thing here is that React Testing Library also has renderHook, and we have this createRenderStream with a render call. We can make that simpler if we use renderHookToSnapshotStream with a hook directly here. Instead of having to take renders and take the snapshot out of the render, the snapshot is all we are ever interested in when we test hooks. So here we can directly do takeSnapshot and use that. And that's actually a pretty nice test.

So let's look back at the special requirements we had earlier. We had that component render count that we wanted to test for, and we can do that: we can test that no more renders happen. And for consistency testing, we can test the hook and we can test the component.
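As a way to picture what "a promise of the next render" means, here is a minimal model written without React. It is not the library's implementation: `RenderStreamModel`, `recordRender`, and `staysIdle` are names invented for this sketch, and only the `takeRender` idea and the lone-block notation are taken from the talk.

```typescript
type QueryResult = { loading: boolean; data: string | undefined };

// Dependency-free model of a render stream. The class and its internals are
// assumptions for illustration; only the takeRender idea comes from the talk.
class RenderStreamModel<Snapshot> {
  private renders: Snapshot[] = [];
  private cursor = 0;
  private waiters: Array<() => void> = [];

  // Called once per finished render (the real library hooks into the Profiler).
  recordRender(snapshot: Snapshot): void {
    this.renders.push(snapshot);
    const pending = this.waiters;
    this.waiters = [];
    pending.forEach((wake) => wake());
  }

  // Resolves with the next unconsumed render, whether it already happened
  // or is still to come; rejects if none arrives within timeoutMs.
  async takeRender(timeoutMs = 1000): Promise<Snapshot> {
    if (this.cursor >= this.renders.length) {
      await this.nextRender(timeoutMs);
    }
    return this.renders[this.cursor++];
  }

  // Resolves to true if no further render shows up within windowMs.
  staysIdle(windowMs = 100): Promise<boolean> {
    if (this.cursor < this.renders.length) return Promise.resolve(false);
    return this.nextRender(windowMs).then(() => false, () => true);
  }

  private nextRender(timeoutMs: number): Promise<void> {
    return new Promise<void>((resolve, reject) => {
      const timer = setTimeout(() => reject(new Error("no render in time")), timeoutMs);
      this.waiters.push(() => {
        clearTimeout(timer);
        resolve();
      });
    });
  }
}

async function streamDemo(): Promise<{ seen: QueryResult[]; idle: boolean }> {
  const stream = new RenderStreamModel<QueryResult>();
  stream.recordRender({ loading: true, data: undefined });
  setTimeout(() => stream.recordRender({ loading: false, data: "hello world" }), 20);

  const seen: QueryResult[] = [];
  // Lone blocks let every step reuse the same variable name.
  {
    const snapshot = await stream.takeRender();
    seen.push(snapshot);
  }
  {
    const snapshot = await stream.takeRender();
    seen.push(snapshot);
  }
  const idle = await stream.staysIdle(50);
  return { seen, idle };
}
```

Because the stream keeps a cursor, a test cannot skip a render by accident: every render has to be taken and asserted on in order, and the final idle check plays the role of the "no further re-render" assertion.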

5. Testing for Multiple Independent Components

Short description:

Hooks should return the same data over multiple components, but they might not always have the right timing. We test for useQuery and useFragment hooks with different limitations and rendering behavior. By using createRenderStream and mergeSnapshot, we can assert on the snapshots of loading being true and both hooks returning undefined, and loading being false with both hooks returning hello world. Finally, we ensure the correct re-rendering of components at the right time.

So those are not really a problem. That leaves us with the last one: consistency testing over multiple independent components. And let's take a step back and see why that's important. Hooks should return the same data over multiple components, but they might not always have the right timing. If you look at setState, that might cause different renders in older React versions. If you look at useSyncExternalStore and setState in the same component, in React 18, those would each batch on their own, but you would still have one render with all the setState calls and one render with all the useSyncExternalStore calls. So here we want to have a way to test if that really works. And that bug has been fixed in React 19, but we don't always know that, and we have to test for it.

So here we test useQuery and useFragment, two hooks with different limitations that in the past had slightly different rendering behavior. Again, we do a createRenderStream, and this time we use an initial snapshot to give the whole thing a little bit of a shape. And we say that our snapshot should contain a query result, and it should contain a fragment result. And instead of using the replaceSnapshot function, we use the mergeSnapshot function. Then we write two components. One of them calls mergeSnapshot with the result of useQuery, and one with the result of useFragment, and we render them next to each other. And then we can take our renders and do assertions on the snapshots. So loading is true, both hooks returned undefined. Loading is false, both hooks returned hello world. With this, we ensure that there hasn't been a third render where one hook might return one thing and the other hook might return the other thing. And of course, we test that there will not be another re-render in the end. So this gives us this third checkmark. And that only leaves us with the last thing: re-rendering the right component at the right time.
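The merging idea can be modeled in a few lines without React. In this sketch, `SnapshotRecorder` and `commitRender` are hypothetical; `mergeSnapshot` only mirrors the name from the talk, and the "components" are just two calls that each contribute their own part of the snapshot.

```typescript
// Shape given by the initial snapshot: one slot per hook under test.
type HookSnapshot = { query: string | undefined; fragment: string | undefined };

// Model only: mergeSnapshot mirrors the name used in the talk, commitRender is
// a hypothetical stand-in for "React finished a render".
class SnapshotRecorder {
  private current: HookSnapshot;
  readonly renders: HookSnapshot[] = [];

  constructor(initialSnapshot: HookSnapshot) {
    this.current = initialSnapshot;
  }

  // Each component contributes only its own part of the snapshot.
  mergeSnapshot(partial: Partial<HookSnapshot>): void {
    this.current = { ...this.current, ...partial };
  }

  commitRender(): void {
    this.renders.push(this.current);
  }
}

// Both hooks update in the same render: two snapshots, always consistent.
const consistent = new SnapshotRecorder({ query: undefined, fragment: undefined });
consistent.commitRender();
consistent.mergeSnapshot({ query: "hello world" });
consistent.mergeSnapshot({ fragment: "hello world" });
consistent.commitRender();

// The hooks update in separate renders: a third, torn snapshot appears.
const torn = new SnapshotRecorder({ query: undefined, fragment: undefined });
torn.commitRender();
torn.mergeSnapshot({ query: "hello world" });
torn.commitRender();
torn.mergeSnapshot({ fragment: "hello world" });
torn.commitRender();

console.log(consistent.renders.length, torn.renders.length); // 2 3
```

In the torn case the middle snapshot has a query value but no fragment value yet, so a render-by-render test that expects exactly two consistent snapshots would fail on it.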

6. Introducing Suspense and DOM Snapshotting

Short description:

We start with an application that introduces Suspense for granular re-rendering. We test for two renders using renderToRenderStream, replacing the snapshot and asserting on data. To ensure correct rendering, we look at the DOM by adding the snapshotDOM option to renderToRenderStream, creating full DOM snapshots for assertions. Taking withinDOM out of takeRender gives us the same queries that screen or the render utils provide in React Testing Library.

We start with this application here, and this introduces Suspense, which will make React re-render different things at different times without re-rendering the app component around everything. So we have a Suspense boundary with a loading component as its fallback. We have an error boundary, and we have a component. Because things will get complicated otherwise, let's remove that error boundary for now, and let's start with this simpler example.

We want to test for two renders, and we use renderToRenderStream. The first thing we do is snapshots, because we already know how that works. So we use replaceSnapshot here, and we always replace the result. And during the first render, we assume that the snapshot will be undefined because this component will not have rendered. Instead, the loading component will have rendered. And then during the second render, we assume that data is equal to greeting with hello.

This alone doesn't give us a lot of security, though, because we don't know if our loading component actually rendered or something else. So for this test, we have to look at the DOM. And what we can do is DOM snapshotting. So we add this snapshotDOM option to our renderToRenderStream. And that means that we can create a new full DOM snapshot for each render as it happens and look at them later and make assertions on them. Keep in mind, this might use a lot more memory, so do it sparingly. So here, we take withinDOM out of takeRender, and essentially, that's like screen or utils. If you were using the normal React Testing Library, you would have the same queries available there.

7. Testing Granular Rendering and Final Remarks

Short description:

We test that getByText for loading finds an element in the document, and in the second render, it's not in the document anymore, but hello is in the document. To test granular rendering, we use useTrackRenders to ensure that only the children re-render. This comprehensive test provides increased confidence and meets all special requirements.

So first, we test that getByText for loading is in the document, and in the second render, we test that it's not in the document anymore, but we want hello to be in the document. Both of these were nice, but there are still a few more things I want to test. Especially, I want to test that app only renders once and only the children re-render. So granular rendering, how do we do that? We use useTrackRenders. We add that to every component, and that's something that we have to keep in mind: we have to write these components for the test. And then we can use this renderedComponents thing that we take out of each render, and we can assert with a strict equality check that the first render only renders app and the loading component, and that the second render only re-renders component. And then we go ahead and we add our error boundary back in, and we put all of these into one big test, and this test actually gives me a lot more confidence. So we checked off our last special requirement.
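To illustrate what asserting on the rendered components of each render amounts to, here is a small model without React. `RenderTracker`, `trackRender`, and `commitRender` are hypothetical names for this sketch, and the two renders are written out by hand to match the sequence described in the talk.

```typescript
// Model only: trackRender stands in for a hook called at the top of each
// component, commitRender for "React finished a render". Both names are hypothetical.
class RenderTracker {
  private currentPass: string[] = [];
  readonly passes: string[][] = [];

  trackRender(componentName: string): void {
    this.currentPass.push(componentName);
  }

  commitRender(): void {
    this.passes.push(this.currentPass);
    this.currentPass = [];
  }
}

const tracker = new RenderTracker();

// First render: App renders and the child suspends, so the fallback is shown.
tracker.trackRender("App");
tracker.trackRender("LoadingComponent");
tracker.commitRender();

// Second render: the data arrived and only the suspended child renders.
tracker.trackRender("Component");
tracker.commitRender();

// App must appear in exactly one render.
const appRenderCount = tracker.passes.filter(
  (pass) => pass.indexOf("App") !== -1
).length;
console.log(JSON.stringify(tracker.passes), appRenderCount);
// [["App","LoadingComponent"],["Component"]] 1
```

Comparing the whole list for each render, rather than checking for a single name, is what catches a parent that re-renders when only a child should.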
