Video Summary and Transcription
Today's talk focuses on GraphQL caching and improving performance using Fastify and Mercurius. The experiment involves federated services, resolver caching, and load testing with autocannon. Enabling caching with a 0-second TTL can increase throughput by 4 times. The async-cache-dedupe module allows for efficient caching and avoids unnecessary computations. Redis pipelining has improved requests per second by 100 times. Cache invalidation is an ongoing topic of development.
1. Introduction to GraphQL Caching
Today, Matteo Collina will talk about GraphQL caching and how to improve the performance of your GraphQL gateway by four times. He will use Fastify, one of the fastest web frameworks for Node.js, and Mercurius, the GraphQL adapter that runs on top of Fastify and integrates with the GraphQL JIT library for faster query execution.
Hi, everyone. I am Matteo Collina, and today I'm going to talk to you about GraphQL caching. Before we start, please follow me on Twitter at matteocollina. You can find it on the slide, so hey, here I am. I talk a lot about Node.js, JavaScript, GraphQL, open source, all the things, so I don't know, you might find it interesting.
So today we are going to talk about GraphQL. Before we start, though, one more thing: follow my newsletter, Adventures in Node-Land, at node-land.dev. Who am I? I'm Matteo, part of the Node.js Technical Steering Committee, and Chief Software Architect at a company called NearForm. Check us out, we are hiring, and we do a lot of GraphQL work, so if you want, it's a good company.
Anyway, going a little bit further back: when I was a kid in the 90s, yes, I am telling you how old I am, I was really, really impressed by the shows of David Copperfield. I don't know about you, but I was always fascinated by magic, right? How to make things disappear, how to make things fly, whatever. I found the shows very entertaining. And in fact, there is a lot of hard work behind magic. So in this talk, we are going to talk about magic and we are going to make things disappear. We are going to apply magic to GraphQL: we are going to show how to improve the performance of your GraphQL gateway by four times. How? By making things disappear.
So, how? Well, let's talk a little bit about the tools of the craft. We need tools, right? Things we're going to use for this little demonstration. First of all, we are going to use Fastify. Fastify is one of the fastest web frameworks for Node.js. It's very easy to use. It's similar to Express, but more modern and faster, and it has more features. All things that you will need. It's great! Check it out. We are also going to use Mercurius. Mercurius is the GraphQL adapter that you can run on top of Fastify. It's cool. Mercurius offers a few interesting features that make it unique. First of all, it integrates with the graphql-jit library, so it can take your query and compile it just in time, so that it executes way faster. And so on.
2. Tools, Experiment, and Magic
The tools discussed include a library called autocannon for load testing in JavaScript. The experiment involves two services federated by a gateway, offering user and post objects. The just-in-time compiler and the cache module will be used to enable resolver caching. The service is a simple user object with an ID and a name. Live load testing will be performed on dedicated hardware.
It also does a few more things like that for performance and speed reasons. It's great, so check it out. Oh, it also supports full federation, both as a gateway and as a microservice.
The last tool of the craft is a library called autocannon. autocannon is a tool that I wrote long ago to do load testing, and you can use it to script your load tests in JavaScript. It's great. I use it a lot. So these are our tools, right?
Okay. So we're going to use these three things. Let's talk a little bit about our experiment. We have two services that are federated by a gateway: one offers the user object, and the other one offers the post object. We are going to use the just-in-time compiler, and we will enable the cache for the resolvers depending on our settings. So we can run multiple experiments, right? You can see it here: we can run multiple experiments and see the impact of this cache module. So what does this look like? Let's see where things disappear or reappear.
What's the service? Well, this is an example of the service. Literally, it's a user object that has an ID and a name. Very simple, okay? It's nothing special here.
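To make this concrete, here is a minimal sketch of what such a user service's schema and resolver might look like. The schema, names, and in-memory data here are illustrative assumptions, not copied from the actual benchmark repo:

```javascript
// A hypothetical federated "user" service: a User type with an ID and a name,
// served straight from memory (no database involved), as in the benchmarks.
const schema = `
  type User {
    id: ID!
    name: String!
  }

  type Query {
    user(id: ID!): User
  }
`

// In-memory data store standing in for a database.
const users = {
  1: { id: '1', name: 'Alice' },
  2: { id: '2', name: 'Bob' }
}

const resolvers = {
  Query: {
    // Resolver: look the user up by id from memory.
    user (root, { id }) {
      return users[id]
    }
  }
}
```

In the real setup this schema and these resolvers would be passed to Mercurius when registering it on a Fastify instance.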
So it's time for some magic. Are you ready for the magic? Let's make things disappear. So how? Well, let's go back into our terminal. So this is connected to my server. So it's running on dedicated hardware. So I'm going to do live load testing. Oh, wow.
3. Mercurius Cache Repo and Experiments
In the mercurius-cache repo, we have benchmarks, gateway services for user and post data, and experiments using autocannon. Running the script without caching yields 3000 requests per second. With a zero-second TTL, that increases 4x. Let's explain this further.
Oh, wow. So let's look at my repo. All of this is in the mercurius-cache repo. We can see that we have our benchmarks, and this is the gateway that I just showed you. And we have our gateway services: this is the user service and this is the post service. Note that these services are all serving their data from memory, so there are no databases involved. They're really fast.
And this is our bench. So how do we benchmark things using autocannon? Basically we require autocannon, then we have our query, and we send our query as a body with 100 concurrent connections. That's it. Then we do several experiments using our bench script. In our bench script, you see that we are running all the services plus multiple examples: one with no cache, one with a zero-second time to live, one with a one-second time to live, and one with a ten-second time to live.
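A sketch of what such a bench setup could look like, using autocannon's documented options (`url`, `connections`, `method`, `headers`, `body`). The query, port, and path are illustrative assumptions, not the repo's actual values:

```javascript
// Hypothetical bench configuration in the style described above:
// POST a GraphQL query as the request body, with 100 concurrent connections.
const query = '{ user(id: "1") { id name } }'

const benchOptions = {
  url: 'http://localhost:3000/graphql',  // assumed gateway address
  method: 'POST',
  connections: 100,                      // 100 concurrent connections
  headers: { 'content-type': 'application/json' },
  body: JSON.stringify({ query })
}

// Running it would look roughly like:
//   const autocannon = require('autocannon')
//   autocannon(benchOptions).then(r => console.log(r.requests.average))
```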
So let's run this script. First of all, this is our baseline, our control check, right? It's the gateway with nothing cached. So we have run this, and, whoa! It does 3000 requests per second. Okay, that seems fast or not, depending on what you want to do. Latency is good though, so I'm pretty happy with the results. Now with a zero-second TTL, whoa! How? It just bumped 4x. I'm not doing any caching, it's a zero-second time to live. I'm just turning it on and it does 4x. And, whoa! Still 4x. Like, how is this possible? How does this work? Okay, let's leave this running and explain it in a second.
4. Caching and Deduplication
Our baseline has a P99 latency of 80 milliseconds, resulting in approximately 3000 requests per second. By enabling caching with a 0-second time to live, we can reduce latency to 18 milliseconds and multiply the requests per second and throughput by 4. The flame graph shows that the majority of time is now spent on caching, thanks to deduplication. The Node.js event loop diagram provides insight into the execution flow and the time the event loop is blocked between C++ and JavaScript, which is where a cache key is computed for deduplication.
So our baseline has a P99 latency, which is what you want to measure for latency, of 80 milliseconds, while for requests per second it gives you more or less 3000.
However, I can also create flame graphs. What is a flame graph? Well, it is a representation of where our CPU time is being spent. More specifically, here most of that time is being spent doing HTTP requests. By the way, if you have not seen my talks about undici and Node.js, please check them out, because you can speed up your HTTP calls quite a lot.
But the result is that the vast majority of the time is spent doing HTTP. So, what can we do? We need to reduce the HTTP. How can we improve this? Well, just by setting a 0-second time to live, we can reduce the latency to 18 milliseconds and multiply the requests per second and the throughput by 4. Whoa! This is quite an improvement for not having any caching at all. A zero-second TTL is not caching at all; we just enable the cache. And if we then enable actual caching with a longer TTL, it does not improve much.
Okay, so how is this possible? Well, this is the flame graph of our gateway now. As you can see, the HTTP request that was in the middle before is gone, and now we have in the middle a huge block of time being spent doing the caching. So literally, the bottleneck is now the caching system. But where did the HTTP call go? Where did it disappear? Well, what we are doing is deduplication, which is the key strategy that makes things incredibly faster, especially on the GraphQL side.
So, let's go back and talk a little bit about the Node.js event loop. You have probably seen this diagram about Node: a request comes in, it becomes an event, it goes into the event queue, it gets processed, and then this generates more asynchronous activity. What you have probably not seen is this other diagram. It is a different representation of the exact same event loop, but it shows it from the point of view of the JavaScript function being executed. When the event loop is not running anything, it's waiting for something, right? On the left and on the right of the diagram, the event loop is waiting. Then, when an event happens, it calls into C++, which calls into JavaScript, which typically schedules some next-ticks or some promises; then it goes back to C++, which in turn kicks off the promise and next-tick execution, and finally, once all of that is done and settled, it relinquishes control to the event loop. All the time in between, though, from the start of the C++ call to the end of it, is time where the event loop is blocked. So, in order to deduplicate requests, what we do is this: when our resolver is about to be executed, we compute a cache key, okay? And with that cache key, we can create a matching promise.
5. AsyncCacheDedupe Module
The async-cache-dedupe module allows you to compute the same cache key for a resolver and avoid executing it multiple times. It automatically caches the results and provides a fast and efficient way to avoid unnecessary computations.
And then we can complete our execution, right? However, when a follow-up execution comes in, we compute the exact same cache key and get the promise that we put there before, which might still be pending. We don't need to execute the same resolver two times; we can execute it only once, right? It's pretty great. We can avoid a lot of computations this way. This is what this module does. It's called async-cache-dedupe. You create a new cache, you define some asynchronous methods on it, and then it automatically caches the results. You can have a TTL, but it automatically dedupes and caches the result. It's phenomenal and it's really fast. So you can use this in all the other places where you want this behavior, right?
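The core trick can be sketched in plain JavaScript, without the module: compute a key, and if a promise for that key is already in flight, return it instead of running the function again. This is a simplified illustration of the idea, not async-cache-dedupe's actual implementation:

```javascript
// Minimal promise deduplication: concurrent calls with the same key share
// one in-flight promise, so the underlying function executes only once.
function dedupe (fn) {
  const inflight = new Map()
  return function (key) {
    if (inflight.has(key)) {
      return inflight.get(key)            // reuse the pending promise
    }
    const promise = Promise.resolve().then(() => fn(key))
    inflight.set(key, promise)
    // A zero-second TTL: drop the entry as soon as the promise settles.
    promise.finally(() => inflight.delete(key))
    return promise
  }
}

// Usage: two concurrent calls for the same user, one actual execution.
let calls = 0
const fetchUser = dedupe(async (id) => {
  calls++                                 // count real executions
  return { id, name: `user-${id}` }
})
```

Deleting the entry on settlement is what makes this "zero caching": only overlapping calls are collapsed, and nothing stale is ever served.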
6. Implementing Resolvers and Caching
When implementing a resolver in Node.js, you get four arguments: root, arguments, context, and info. By combining the resolver arguments, the info object, and other parameters, you can compute a cache key for each GraphQL resolver. However, in-process caches are problematic, and a naive use of Redis as a shared cache between nodes can lead to performance issues.
When you implement a resolver in Node.js, you get four arguments. You have the root, which is the current object; the arguments for the resolver; the context, which can include your Node.js request and response, database connections, all the things; and then the info object, which includes the definition of the query that you are computing.
Well, keep that in mind and wait a second. What you can do now is take an arbitrary object and JSON-ify it, right? You can call JSON.stringify on it. If you do that, depending on the order of the properties, you will get different JSON. However, there is a module called safe-stable-stringify which, independently of the ordering of the properties, will always generate the same JSON. So what we can do is use this module and combine it with the resolver arguments, the data on the info object, the root, and all those things, to create a hash key for that specific resolver invocation.
Now, how is it implemented? Well, as you can see here, it's pretty simple. We navigate the info object to get the current field selection, and then we create an object including the current resolved object, the arguments, the fields, and some more parameters. It's pretty great, you see: we can compute a cache key for each GraphQL resolver. This is what we call the zero-second TTL: we are deduplicating all the resolvers accessing your data.
Adding some caching does not improve much here, because the target services are very simple. They don't require a lot of CPU to compute, they don't have a database, they don't have anything. However, adding more caching will matter if you need it: increasing the time to live will improve your performance if the target services are slow or not fast enough. Well, all of this is very good, right? But in-process caches are problematic. We can't really increase the time to live too much, because it's all in-process: if the data expires on my node, it's not expiring on the other nodes. So how can we implement that? Well, one of the good solutions is to use something like Redis to implement a shared state, a shared cache, between all the nodes. Yeah, but we tried that, we implemented it, and it did not work. And it did not work mainly because in our benchmark we have a hundred GraphQL queries per second, each one invoking 20 resolvers. It turns out that if you want to fetch that data from the cache, that is 2,000 Redis GETs per second. And unfortunately, while the nominal round-trip time of Redis is 0.5 milliseconds, the actual round-trip time we saw was 15. So, we can't do much.
7. Redis Pipelining and Performance
We solved the problem of head-of-line blocking with auto-pipelining, a technique that batches multiple commands into one Redis pipeline, reducing network round-trip time. This logic in production has improved requests per second by 100 times, with a 15x expansion factor on Redis. Redis handles the traffic without any issues. However, naming things and cache invalidation remain challenges.
So, we can't do much. We need to parallelize these Redis GETs, right? Maybe you could use a connection pool or, I don't know. Well, there is something better. I actually solved this problem already.
Anyway, check out this talk that I did at RedisConf 2021. It explains how to solve the problem of head-of-line blocking with auto-pipelining. Basically, it's a technique that we applied to the Redis client that batches multiple commands happening in the same event-loop iteration into one single Redis pipeline, so that we send them as a batch and actually cut down the network round-trip time to the server. It's great, it works beautifully, and you can use it to really speed up your Redis access. And this is actually the same kind of trick we were doing before with the async deduplication. So, it's turtles all the way down. Happy days.
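The batching idea can be sketched without Redis at all: commands issued in the same event-loop iteration are queued and flushed together as one batch on the next tick. This is a simplified model of what auto-pipelining does in the Redis client, not its real implementation, and the "server" here is a fake that just counts round trips:

```javascript
// Toy auto-pipelining: commands queued in the same event-loop iteration
// are sent together as a single batch (one network round trip).
function createBatcher (sendBatch) {
  let queue = []
  let scheduled = false
  return function send (command) {
    return new Promise((resolve) => {
      queue.push({ command, resolve })
      if (!scheduled) {
        scheduled = true
        // Flush after the current iteration has queued everything.
        process.nextTick(() => {
          const batch = queue
          queue = []
          scheduled = false
          // One round trip for the whole batch instead of one per command.
          const results = sendBatch(batch.map(b => b.command))
          batch.forEach((b, i) => b.resolve(results[i]))
        })
      }
    })
  }
}

// Fake "server" that records how many round trips actually happened.
let roundTrips = 0
const get = createBatcher((commands) => {
  roundTrips++
  return commands.map(c => `value-of-${c}`)
})
```

Three `get()` calls issued in the same tick cost one round trip instead of three, which is exactly how batching cuts the effective round-trip time per command.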
So, we have all of this logic in production. It's important to say that this code in production is giving us an improvement of 100 times in the number of requests per second, with a 15-times expansion factor on Redis: for each complex query that we receive, on average we are doing 15 Redis GETs with different cache keys to verify that things work as we would like. It's pretty great, right? But it's also quite scary. By the way, Redis is not even blinking an eye; it's perfectly fine with all this traffic. Redis is amazing, by the way. Go use Redis. More Redis for everybody.
So, this is our real-life experience. This technique has been a phenomenal lifesaver recently: we were able to handle a huge peak of traffic without even blinking. So, yeah, check it out. It's great. However, there are two hard things in computer science, right? One is naming things, and the other is cache invalidation.
8. Cache Invalidation and Conclusion
We haven't discussed cache invalidation, but it's a fundamental topic. Although I've run out of time, we are actively working on implementing it in this module. Soon, you'll be able to invalidate the cache both locally and on Redis. Stay tuned for updates on Twitter and my newsletter. Thank you for watching!
Oh, come on, okay. That's really bad, right? Because we haven't talked about how we invalidate the cache, and this is one of the fundamental topics. However, we are almost at the 20-minute mark, so I've run out of time, and I'm not going to cover it in this talk.
I'm joking. We have not finished the implementation of this module; we are actually working on it. In reality, we'll be adding cache invalidation to async-cache-dedupe soon, so you'll be able to invalidate the cache both locally and on Redis sooner rather than later.
So, check it out, and keep an eye on my Twitter and my newsletter, because there will be some good announcements in the coming weeks. With that, I just wanted to say thank you. As I said, I am Matteo Collina, Chief Software Architect at NearForm. You can find me on Twitter at matteocollina. Please ask me any question you want on Twitter, and I will be very happy to respond as soon as I can. Thank you for watching this talk.