Video Summary and Transcription
The Talk discusses optimizing rendering of big tables using Flipper, a new version that is ten times faster with improved user interaction and richer data. It explores optimizing rendering with React, virtualization, filtering, sorting, and windowing techniques. The introduction of the Flipper Datasource packet simplifies handling updates, inserts, and removals. The performance of the Flipper data source package is excellent, even in a debug build of React, with minimal CPU usage. The Q&A session covers incremental sorting, dynamic row height, and the potential for two-dimensional virtualization in the future.
1. Optimizing Rendering of Big Tables with Flipper
I'm Michel Vestrater from Facebook, and today I'll talk about optimizing rendering of big tables using Flipper. The original implementation was slow, especially with filters and line wrapping issues. We built a new version that is ten times faster, with improved user interaction and richer data. The performance improvement was significant, with a ten times frame drop and reduced CPU usage.
Good morning, everyone. My name is Michel Vestrater. I work at Facebook, I think, because it might have another name today. But it's good to be here and it's very good to see all your faces without also seeing your bedroom and your laundry and your empty bowl of porridge. Thanks for having me.
I'm working currently on a project called Flipper which is an open source platform which is used primarily by mobile developers. And I'm going to talk about how to optimize rendering of really big tables. And in this talk, I'll feature a little bit about Flipper. I'll find an actually useful application of bitcoins and I will show you an open source library.
So first of all, there's a tool called ADB which is used for debugging mobile applications. And if you run, for example, ADB log, it just generates a lot of data. It can easily generate up to 100 lines per second or something. And so we're working on Flipper, a tool that allows you to like inspect layouts in a mobile device and see the network requests of your React Native application and so forth and so on. However, we're going to focus in this talk on one specific feature, which is the logs. And the logs, they basically show the very same output as you just saw from ADB, except that we parsed it so it can sort a filter.
Now, the original implementation we had isn't very fluent. Like it starts a bit, it doesn't really keep up, and especially if you apply some filters, it kind of slows down. And also, it doesn't properly wrap lines, which is really annoying if you use logs. So we figured that this is kind of too slow, and actually what we detected, even in production builds, if the filter is slightly complicated, and you have 100,000 log lines in there already, then a single render pass of our React component could easily eat up to 250 milliseconds, so you end up with four frames per second, which is really annoying.
And so we figured we needed a better implementation, and we built one. And so this is the new version, which is much more smooth, but actually keeps up with the data that arrives over ADB, and even if you're filtering, it does barely affect performance. And we made the user interaction a lot better as well, so we have links that are highlighted, lines wrap automatically if you resize the window, that kind of stuff. So we made the data a lot richer, and we even added things like being able to sort on any column you want.
And so how did this affect performance? And it turns out the performance of the new implementation was ten times faster. And we saw that so on Facebook, we measure like what the performance is, how many frames we drop, what CPU load is for our users, and we saw basically a ten times frame drop, and a much slower usage of CPU. And if you look at the profile in Chrome, we can actually explain that. And so the left two sections are basically the old situation where there's a lot of CPU going on and most of it is yellow. And yellow means this is running JavaScript code, our code of Flipper. And in the new situation, there's still a lot of CPU usage but it's spent differently. It's no longer spent on the yellow part, no longer on the scripting.
2. Optimizing Rendering with React
Now it's doing a lot of layouting and styling, and it means that it's cranking out much more frames per second. We set up a WebSocket connection to Coinbase and start listening for three different exchange rates. We introduce some state to store all the rows that arrive and apply filters and sorting functions. Then we render the rows one by one.
Now it's doing a lot of layouting and styling, and it means that it's cranking out much more frames per second. So, in short, purple is good. Yellow is bad. And so, how did we do that? And to explain it a little bit, I'm going to show you a small demo application which streams a lot of Bitcoin transactions so that we at least put all those wasted CPU cycles to good use. And I'm not as courageous as Sarah, so I just pre-recorded it. Anyway.
So let's start with the basis. What do we do? We set up a WebSocket connection to Coinbase and we start listening for three different exchange rates. And for every new message we arrive, we collect callbacks. And the data basically looks a little bit like this. Just an exchange of a certain rate. And so what happens here is that we're not really rendering at all. Sorry. I'm messing this up. So what Sarah can do with live coding, I can do with a video. Isn't that amazing? And so the idea here is that we don't render anything yet. And if you start streaming, you will see that we basically are receiving a thousand items per second. CPU isn't doing anything because we don't do anything with it. But this is our baseline. So conceptually, this is what is happening. So let's throw React into the mix.
So we have our little base components and we start to introduce some state in which we store all the rows that arrive. We also start with a base set of ten thousand rows to have something at least. And every time any row arrives, we update the state, allocate a new array and put it in there. And then if there's a filter active, we apply that to the dataset. And similarly, if the user wants to sort the sections by price, we apply a sorting function. So, then we map over all the rows and we render them out one by one. So if you run this... Oh, I think the presentation is broken when moving it over to a laptop. Anyway, that's what happens if you change it at the last moment.
3. Optimizing Rendering with Virtualization
If you render this, it renders pretty okay as long as the data is static. But as soon as we start to filter it really starts to slow down. We have now five frames per second and CPU at the bottom is maxing out. The obvious thing to do in these situations is apply virtualization. Take a terminal, for example, it used to have a fixed amount of scrollback lines, but engineers want unlimited lines for quick scrolling, searching, and filtering. The same holds for dashboards with real-time data updates. Applying virtualization is the typical solution, with React Virtual being a highly recommended library. It provides a hook for rendering large datasets efficiently.
So if you render this, it renders pretty okay as long as the data is static. But as soon as we start to filter it really starts to slow down. We have now five frames per second and CPU at the bottom is maxing out. And if we start to enable the stream, we actually are able to process like three new records per second and that's because we're rendering all of those 10,000 rows over and over again.
This is really slow, not a good idea. The obvious thing which we do in these kind of situations is apply virtualization. At this point, there's usually some people who are like, why are you rendering 10,000 rows in the first place? That's not a good user experience. Why show so much data? And my experience, that's usually a kind of a lame excuse to avoid a hard problem. And it's very simple to prove. Take a terminal. A decade ago, it used to have like a fixed amount of scrollback lines, like maybe 1,000. And what every engineer does is he puts it to unlimited. We want to have all those lines at our fingertips when we do an NPM install. We want to scroll back very quickly, search, filter. And I'm pretty sure that most engineers would quit that job if someone suggested that the terminal should be paginated. And the same holds for many dashboards that show a lot of commercial data, graphs, rows that update with realtime stock information. Having thousands of data points with hundreds of updates per second isn't really that strange at all.
So the typical thing to do at this point is to apply virtualization. There's a bunch of good libraries out there for React, React Window, Virtuoso, but the best one I found is React Virtual by Tanner Lindsay. And it's really good. Basically, the usage is pretty simple. It provides you a hook, but it also supports row wrapping. And the idea is that you call this useVirtualHook, you give it the amount of data you want to render and a container which is used for size calculations. And then it gives you back a bunch of items to map over, and for each of them, you apply some styling. It's a bunch of style, but it's really straightforward to build this.
So let's put this one to the test. So, again, on the static dataset, it scrolls very fluently. And the interesting thing is, like, once we start streaming in the data, we can now, instead of three, four items, we can render a – we can receive a couple of hundred items before we start to slow down. Now, filtering is still expensive. Like, once you apply the filtering and sorting, you see really the amount of the items you can process in the meantime drop down.
4. Optimizing Rendering: Filtering and Sorting
React Virtuall solves the problem of fast scrolling for static datasets, but it doesn't address the expensive data processing. Memoizing and debouncing can help to some extent, but they don't fully solve the performance issues. To optimize rendering, we can filter only the data that arrives, rather than the entire dataset. By sorting rows in a ref instead of state and updating the visible row set based on the filter, we can improve performance.
And we're nicely maxing out the CPU as well. So, React Virtuall solves the problem for us if we have a static dataset and we need to be able to scroll really fast. That's the problem it solves. However, this doesn't solve our problem. Because that's not the only part that's expensive, rendering. It's also the data processing. If every time we receive a new item, we have to map over the whole dataset, filter it again. And if they're sorting active, also have to re-sort the dataset. That is actually where the performance goes down the drain.
So, the virtualization we already had in the very first example I showed. And yet we still had 250 milliseconds of per render just because of all the processing that goes on. And there's a few tricks that people commonly apply. Like, we could memo the filter rows. But that doesn't really help. That only helps if the filter changes now and then. But not if your rows are changing all the time. If you have dozens of updates you have to process, then your memo is never going to have a positive cache hit. And similarly, you can de-bounce, which makes sure that at least your application kind of stays usable. But it doesn't drop the performance of a single render. And it does allow you to go beyond those four frames per second in our use case. So what can we do smarter?
One thing we can do smarter is not filtering the whole dataset we have. Actually, we already know for every item whether it matches the filter criteria or not. We only need to filter the data that arrives, not the data that we already have. So let's do that. So the first thing is that we're not going to sort our rows in state, which would cause a render every time we changed it, but we sort in ref. And the state we're going to contain is just the visible rows. Now, if a new event arrives, we apply it to the base set, and then we also check if it matches the current filter. If it matches the current filter, then we also update the visible row set. And then there's one other thing to do, is that whenever the filter criterion itself then we need to re-filter the whole set, but that's now the exceptional case. And so let's put this one to the test.
5. Optimizing Filtering, Sorting, and Windowing
We optimize filtering and sorting by only updating the state if the item matches the filter. This reduces the filtering cost from O(n) to a constant overhead. For sorting, we leverage the fact that the data is already sorted and use a binary search instead of a full sort. This improves performance from O(n log n) to log n. Additionally, we can window the data we process for further optimization.
So remember, the important thing is that we only update the state if the item matches the filter. So in the base case, our performance is the same as in the previous example. However, if we start filtering, now the performance goes up in the sense of the amount of new items we can process. Because now in 2 out of 3 of the transactions, we don't cause a render. So we leave much more CPU capacity available to process that incoming WebSocket. So, that's a good improvement already. This moves our filtering cost from the order of n to a constant overhead.
Can we do the same for sorting? And the answer is yes. We can also do the very same thing for sorting. How do we do that? The basic idea is that we leverage the fact that we know that whatever we're rendering is already sorted, if it were sorted before. So whenever new data arrives, we don't need to resort the full set. Instead, we can use a small utility from Lodash. And find by using sorted last index by, the binary, the insertion index at which we have to insert our new row. And then there's only a binary search instead of a full sort on the data. So they move from basically N log N to log N. Again, the same thing applies for if the sorting itself changed, we have to do the full set, but, again, this is the less common case.
So let's put this one to the test. And remember, the crucial thing here is that instead of sorting the full set, we do a binary search to find the insertion position. And so, again, we process the data. We start tailing. We start filtering. And we have the nice performance. And now we enable sorting. So now the data set is sorted. And yet it doesn't really affect our performance anymore. And normally sort is really expensive. Now it's not impacting that much. So now we optimize filtering, optimize sorting. Can we do even better than this? And the answer is yes. We can also start to window the data we process.
6. Applying Windowing Earlier
We can apply windowing earlier at the moment we process the data. If a new row is added outside the current view, we don't need to trigger rendering or re-render React.
Now it sounds a little bit controversial because we are already applying windowing, but we are only applying windowing to the render. But actually, if you think about it, we can apply the windowing earlier at the moment we process the data. So when a new row is added, but it's effectively added outside the current view, we don't even need to trigger a rendering. Even if it matches the filter and finds insertion index. If after that we find it's not within the viewport, there's actually no need to cause react to re-render and do those fertilization computations. So let's apply that trick as well.
7. Optimizing Rendering: Windowing and Performance
We introduce a window to capture the visible set of rows and use a force update trick to render React when needed. After each render, we store the rendered range and check if an insertion index is within the visible window. There's an exception case for triggering a render to ensure the scroll bar keeps growing. Testing shows improved performance with minimal CPU usage.
So now we don't have any state anymore. And to capture what we're currently looking at, we introduce a window that just starts the visible set of rows and we use a trick, force update, to force a render of react whenever we want. And if we don't want to render, react will never re-render.
Now the next thing we do is after every render, we store what range we did render, the range we got back from React virtual, sorry. And then we check when we found insertion index of a row, we check is it within the Then we check React to render, and otherwise not.
Now there's one exception case, and that is once in the hundred rows, we're still going to trigger a render, and that is to make sure that our scroll bar keeps growing if more data arrives, even if all data arrives outside of the current viewport. So let's test this one. So we start streaming, and you see that, like, we process a lot of data, especially if we don't tail, because then all the events happen outside the window. Every tail, we have still the same performance as before, but with our tailing, everything, almost everything is added outside the screen, so, we barely use any CPU. This is also the first time that the CPU actually drops below maximum usage.
8. Optimizing Rendering with Flipper Datasource
Now we have a longer pipeline for filtering, starting, and checking updates within the visible window before rendering. We've generalized this approach in the Flipper Datasource packet, which handles updates, inserts, and removals using a pipeline of operations. By transforming the initial operation and applying sorting, we can easily handle different types of operations. With the Flipper data source package, we can refactor our example to remove custom logic and rendering, and instead use the data source to handle events and update the view configuration when the filter or sorting criteria change.
So, now we have like a longer pipeline, we filter, we start, check if the update actually appears within the visible window, and only if that's all true, then we cause React to render. So, we did now add a lot of logic, and it was just for appends. How do we deal with updates? How do we deal with inserts? How do we deal with removals? And the solution I've showed you so far, we've generalized in a packet. It's called Flipper Datasource. It's maintained for now inside the Flipper repository on GitHub. But you can use this packet right away. And the basic idea is that actually all the other operations basically follow the same pattern. An update or a removal is not significantly different from an insert. Those four steps are basically a pipeline that's where we describe the initial operation. For example, appending an item. And then we apply sorting and we just transform the description of the operation. So an append operation becomes an insert at the different position with the very same item. And then we also have a reversal operation in case we're reverse sorting. And then we decide whether this operation ends up in the window. So first we transform the operation without actually thinking about rendering at all. And the other operations are also really easy to express in these very same terms. They go just through the same pipeline. And so let's refactor our example to leverage the Flipper data source package. So we remove all our custom logic, our effects. We still have the same view criteria. The user has still the same features. We're just going to wire it up differently. So we don't need React Virtual anymore. That's abstracted away. We remove all that rendering stuff, which is over here. There we go. And then we just import a few things from the packets. So we import the create data source, which creates the data source and we keep a reference to it. And the only thing we do when a new event arrives is we call append, and that stores the event in the data source. Then whenever the filter on sorting criteria is changed by the user, we just update the view configuration of the data source.
9. Optimizing Rendering with Flipper Data Source
We pass the data source to the data source renderer virtual, which results in the same functionality as what we built so far in this demo. The performance is excellent, even in a debug build of React. The CPU usage is minimal, even with a high volume of records and filtering. The Flipper data source package makes the demo performant and is used to power critical views in Flipper. It has potential for future use in powering charts.
That data source, we pass it to the data source renderer virtual. And with that, we basically have the same thing as what we built so far in this entire demo. It's just now a few lines of code.
So let's put this one to the test. Again, we start. We start tailing. Making it hard. And we start filtering. And actually see that, like, the performance is really good now. Even though this is only a debug build of React, it's easily keeping up CPU-wise.
So this is slightly more efficiently implemented as what we did so far. But now our CPU is barely breaking a sweat. Even though we're adding almost a thousand records per second and we're sorting and filtering them. And to make it even harder, we can start off with a large dataset. So instead of 10,000 items, let's start with 100,000. Again, we did the same thing. And what we will notice is that, this makes it a little bit more expensive. So CPU usage is slightly higher. But still, it's keeping up nicely. So we're not maxing out on CPU and JavaScripting. So it means that the rendering, the frame rate, stays stable at 50 frames per second.
So basically, this is all that was left to make this demo performant. And that is what Flipper data source package does. It's not perfect yet, but it works really well in those cases. We use it to power all our critical views in a flipper. You can just look at the source code. It's on our repository. And there are some future ideas. Like, the very same mechanism could be used, for example, to power charts. There's a small demo of it.
Q&A: Incremental Sorting and Dynamic Row Height
But there's a lot of possibility. Because like many visualizations, follow this filter sort and window. So, give it a try. Let me know what you think. If you have any questions, feel free to ask. Thank you. The first one is, how does incremental sorting work with nonlinear data? The incoming data doesn't have to be sorted at all. Like, you can sort on as many criteria as possible. As long as you have a stable sorting function, that works. Perfect. Another question is how does the virtual list handle dynamic row height? It measures and caches the measurements to offset the next items. That works in practice really well. Thanks for the great talk.
But there's a lot of possibility. Because like many visualizations, follow this filter sort and window. So, give it a try. Let me know what you think. If you have any questions, feel free to ask. Thank you.
That was an amazing talk. Thank you so much. Give it up for him again. All right. So, we have some questions that came in during the talk and I'll just ask you and then what we'll do is at the end, you'll pick your favorite question and that person can get a T-shirt. So, I'll just go in order of the way they've been upvoted so far.
The first one is, how does incremental sorting work with nonlinear data in terms of having to recompute the sorted data set, an incoming unsorted set of dates, for example? So, let me see if I understand the question correctly. So, how the sorting works is that it's based off a comparison function where you can, like, have a stable sorting output. For example, there can be a date indeed. And it's only compared relatively to other items. So, the incoming data doesn't have to be sorted at all. Like, you can sort on as many criteria as possible. As long as you have a stable sorting function, that works. Perfect.
And another question is how does the virtual list handle dynamic row height? That's a really good question. I've read the source code of it a while. So, what it does, it runs out the items, then measures them and caches the measurements and uses that to offset the next items. But because, like, the height is memorized per row, at least as long as the width doesn't change, that works in practice really well. You have to give a heuristic function for edit base height. But then that starts adjusting. Funnily enough, that's not really noticeable. Awesome. And another one. Thanks for the great talk.
Q&A: Flippa Data Source and Sleepless Nights
Does the Flippa data source have an API for virtualization in two dimensions? No, it currently only supports one-dimensional virtualization. Adding downsampling to the abstraction would be useful for plotting charts. Auto-zebra-rendering can be achieved using the array item index. There weren't many sleepless nights during the development of Flipper. You can download Flipper and try it out. Thank you for your questions and feel free to continue the discussion in the designated rooms.
Does the Flippa data source already have an API for virtualization in two dimensions, for example, for grids? No. It currently only does one dimensional virtualization. And also what I think is really interesting to add in the future is adding downsampling into the abstraction. So that will make it really useful to plot charts if we could dynamically downsample the events that come in.
Cool. And have you tried optimizing auto-zebra-rendering? So, for example, dark and light rows one after the other. No, I didn't give that a try. But that should not... So what you get back from useVirtual is just an array with items. So you can use the array item index for zebraing, instead of associating it with the data. That should work fine. Unless if you really care about the specific row is always gray and the other one's always white. But I don't think that's usually the case.
Nice. Now, there's so many questions, we won't be able to get to them all. So I'm just gonna do this last one to wrap us up, because this is one of my favorites. How many sleepless nights were there, getting to this end solution? Actually, not that many. The reason is, I wanted to build this years ago, but I never had a solid excuse, until I started working on Flipper. And so Flipper, to answer your question, yes, we use it in real life in Flipper. So you can just download Flipper and give it a try. So it didn't give that much sleepless nights. Cool, thank you so much. Now I know there were more questions, so you can hang out in the in-person speaker discussion room, which will be around the corner just outside the door, and if you're in a remote audience, you can go to the spacial react advanced, that's bit.ly forward slash spatial react advanced, to join the conversation. But you've got a big decision to make. What was your favorite question out of all of the ones that you saw? That would be the one about the sleepless nights. Sleepless nights. That's so nice of Gareth. Such a good question, right. So Dave or DJ Amy, you're going to have a t-shirt. If you're here in person, just come over to the front. If you're online, we will contact you.
Comments