Hello, everybody. My name is Marco Locher and I'm part of the GraphCDN team. If you're interested in GraphCDN or in caching GraphQL in general, I hope you didn't miss Max's talk on how to edge cache GraphQL APIs earlier today.
In this lightning talk, I'd like to give you some pointers on how your schema can help when caching GraphQL APIs, both with document caches like GraphCDN and with normalized caches like the ones implemented in clients such as Apollo Client or urql.
The very first item I'd like to talk about might seem obvious, but having IDs on the types that you want to cache, and naming those ID fields consistently, is quite important. We've seen a couple of projects from our own users where changing this had a huge impact. If you can stick to id or _id, most clients and related GraphQL tooling will use those fields by default. If you can't, for example because your legacy APIs already use other names for those fields, you can configure your clients to use the appropriate fields instead. It is very important to make sure that this configuration is accurate. On GraphCDN, for example, you would configure those fields as so-called key fields. They define how objects can be found in the cache again and, more importantly, how they can be purged when you make modifications on your backend. Similarly, we recommend using globally unique IDs in your application, for example UUIDs or something similar. If that's not something you can accommodate, most clients implement workarounds for that as well.
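To make the role of key fields a bit more concrete, here is a minimal sketch of how a normalized cache might turn an object into a cache key. It is not Apollo Client's, urql's or GraphCDN's actual implementation; the keyFieldsByType map, the Product type with its sku field, and the NormalizedStore class are all hypothetical. Prefixing the key with the type name is one common workaround when IDs are only unique per type rather than globally.

```typescript
// Hypothetical normalized cache keyed by __typename plus a key field.
type Entity = { __typename: string; [field: string]: unknown };

// Hypothetical per-type configuration: by default a type is identified by `id`,
// falling back to `_id`; a legacy type can name its own key field.
const keyFieldsByType: Record<string, string> = {
  // Assumption for illustration: a legacy `Product` type is identified by `sku`.
  Product: "sku",
};

function cacheKey(entity: Entity): string | undefined {
  const keyField =
    keyFieldsByType[entity.__typename] ?? ("id" in entity ? "id" : "_id");
  const value = entity[keyField];
  // Without a key value the object cannot be stored or purged by identity.
  if (value === undefined || value === null) return undefined;
  // The typename prefix keeps keys unique even if IDs repeat across types.
  return `${entity.__typename}:${String(value)}`;
}

class NormalizedStore {
  private readonly entries = new Map<string, Entity>();

  write(entity: Entity): void {
    const key = cacheKey(entity);
    if (key === undefined) return;
    // Merge, so fields fetched by different queries end up on one record.
    this.entries.set(key, { ...this.entries.get(key), ...entity });
  }

  read(typename: string, id: string): Entity | undefined {
    return this.entries.get(`${typename}:${id}`);
  }

  // Purging after a backend change only works if keys are computed consistently.
  purge(typename: string, id: string): boolean {
    return this.entries.delete(`${typename}:${id}`);
  }
}
```

If the key-field configuration were wrong (say, Product keyed by id while the objects only carry sku), writes would silently be skipped and purges would miss, which is why the accuracy of that configuration matters.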
The second item is keeping your types consistent. When working with our users, we have seen schemas where a type was duplicated just to add a single field, even though the two types were otherwise identical and represented the same data. This makes your cache less efficient, because it now has to store the same data twice, depending on whether that extra field is present or not. Similarly, if you want to enrich your data with metadata that is only required in a very specific context (think of search results, for example, where you want to show the search term highlighted), we recommend following a pattern like the Relay cursor connections specification and putting that metadata on the connection or its edges, instead of extending your type with fields that only make sense in that one context. Even though the cursor connections spec is mainly aimed at pagination, it lends itself very well to being extended for similar use cases like the one I just described.
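As a rough illustration, the sketch below models a search result as a connection whose edges carry the search-specific metadata, while the node remains the ordinary Article type used everywhere else. The type names, the highlightedTitle edge field and the searchArticles resolver are hypothetical, and using the article ID as the cursor is a simplification.

```typescript
// Hypothetical shapes mirroring a Relay-style connection.
interface Article {
  __typename: "Article";
  id: string;
  title: string;
}

interface SearchResultEdge {
  cursor: string;
  node: Article; // the same Article type used elsewhere in the schema
  highlightedTitle: string; // assumption: search-only metadata lives on the edge
}

interface SearchResultConnection {
  edges: SearchResultEdge[];
  pageInfo: { hasNextPage: boolean; endCursor: string | null };
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical resolver: case-insensitive title search returning the first `first` hits.
function searchArticles(articles: Article[], term: string, first: number): SearchResultConnection {
  const pattern = new RegExp(escapeRegExp(term), "gi");
  const matches = articles.filter((a) => a.title.toLowerCase().includes(term.toLowerCase()));
  const edges = matches.slice(0, first).map((article): SearchResultEdge => ({
    cursor: article.id, // simplification: a real cursor would usually be opaque
    node: article, // unchanged object, so a normalized cache stores it only once
    highlightedTitle: article.title.replace(pattern, (m) => `<mark>${m}</mark>`),
  }));
  const last = edges[edges.length - 1];
  return {
    edges,
    pageInfo: { hasNextPage: matches.length > first, endCursor: last ? last.cursor : null },
  };
}
```

Because the highlight lives on the edge, the Article objects themselves never gain a context-specific field, so the cached article from a search is the same record a normal article query would use.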
Also very important is making sure that your cache is aware of your schema. Every cache offers some functionality whether or not it knows your schema. However, making your cache aware of your schema unlocks additional functionality that wouldn't be possible otherwise. It can make smarter decisions based on the types returned by your queries, which is especially important when you're using fragments or interfaces. A schema-aware cache can also return partial results from already cached data if the missing fields are nullable in the schema: while your app is already displaying some information to the user, the cache fetches the missing fields in the background. Without that knowledge, the whole query would have gone straight to your API, and the cache would not have been involved at all.
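The partial-result idea can be sketched in a few lines, assuming a flat object with scalar fields only. The hard-coded nullability map stands in for what a real cache would derive from the schema, and readWithSchema is a hypothetical helper, not any client's API: if every missing field is nullable, the cache can answer immediately with null placeholders and report which fields still need fetching; otherwise it has to wait for the network.

```typescript
// Hypothetical schema knowledge: which fields of each type are nullable.
type FieldNullability = Record<string, boolean>;
const schema: Record<string, FieldNullability> = {
  Article: { id: false, title: false, body: false, viewCount: true },
};

type CachedRecord = Record<string, unknown>;

interface PartialRead {
  data: Record<string, unknown> | null; // null: the cache cannot answer yet
  missingFields: string[]; // fields that still have to come from the API
}

function readWithSchema(
  typename: string,
  record: CachedRecord | undefined,
  requested: string[],
): PartialRead {
  if (record === undefined) return { data: null, missingFields: requested };
  // Unknown types have no known-nullable fields, so only full hits are served.
  const fields: FieldNullability = schema[typename] ?? {};
  const missingFields = requested.filter((f) => !(f in record));
  // Returning null for a field is only legal if the schema says it is nullable.
  const canServePartially = missingFields.every((f) => fields[f] === true);
  if (!canServePartially) return { data: null, missingFields };
  const data: Record<string, unknown> = {};
  for (const f of requested) data[f] = f in record ? record[f] : null;
  return { data, missingFields };
}
```

A caller would render data right away and fetch missingFields in the background; without the nullability information, the only safe choice is to send the whole query to the API.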
And lastly, if you're using a document cache like GraphCDN, in some cases you might actually be better off splitting your queries instead of sending just a single one. I know GraphQL is known for its flexibility and for letting you tailor each query to get exactly the data you need. But in some cases, having more than one query can be beneficial. For example, take a query that fetches a list of the most recent articles on a blog as well as a list of recommendations for the currently logged-in user. A document cache like GraphCDN looks at the whole response, and if that response is tied to a specific user, the cached data can only be reused for that same user. However, if you split the query into a public part and a private part, the public part can be reused for every single user, no matter whether they are logged in or not and no matter where they're based. And if you use a document cache together with a normalized cache in your GraphQL client, the client isn't negatively affected by the split either, since it will already have most or even all of the required data in its local cache and won't need a round trip to the server at all.
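A toy simulation may help show why the split pays off. The sketch below assumes a document cache that stores whole responses keyed by query text, and adds the user to the key when a response is marked private. The DocumentCache class, the public/private scope flag and the example queries are hypothetical; a real CDN decides scope from its own configuration and also keys on variables.

```typescript
// Toy document cache: whole responses stored under a key derived from the query.
type Scope = "public" | "private";

interface Operation {
  query: string;
  scope: Scope; // assumption: each operation is classified up front
}

class DocumentCache {
  private readonly store = new Map<string, unknown>();
  hits = 0;
  misses = 0;

  private key(op: Operation, userId: string | null): string {
    // Private responses must never be served to another user, so the user joins the key.
    return op.scope === "private" ? `${op.query}|user=${userId ?? "anonymous"}` : op.query;
  }

  fetch(op: Operation, userId: string | null, origin: () => unknown): unknown {
    const key = this.key(op, userId);
    if (this.store.has(key)) {
      this.hits++;
      return this.store.get(key);
    }
    this.misses++;
    const response = origin(); // stands in for a request to the GraphQL API
    this.store.set(key, response);
    return response;
  }
}

// One combined query: the per-user recommendations make the whole response private.
const combined: Operation = {
  query: "{ recentArticles { id title } recommendations { id title } }",
  scope: "private",
};
// The same data split in two: the public half can be shared by every visitor.
const recent: Operation = { query: "{ recentArticles { id title } }", scope: "public" };
const recommended: Operation = { query: "{ recommendations { id title } }", scope: "private" };

// Every user requests every operation once; returns the cache for inspection.
function simulate(ops: Operation[], users: (string | null)[]): DocumentCache {
  const cache = new DocumentCache();
  for (const user of users) {
    for (const op of ops) cache.fetch(op, user, () => ({ data: {} }));
  }
  return cache;
}
```

With three users, the combined query misses three times, because each response is private. After the split, recentArticles reaches the API only once and is served from the cache for the other two users, and only the small recommendations query still has to go to the API per user.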
Thank you very much for listening in. I hope there were some valuable takeaways for all of you. If you have any questions, I'm happy to answer them during the Q&A session or you can ping me on Twitter as well.