Video Summary and Transcription
The panel discussion focuses on the durability and scalability of GraphQL APIs. Testing GraphQL APIs is particularly challenging due to the infinite possibilities of queries and contextual information used by resolvers. Effective strategies include integration testing, observability, and unit testing. GitHub handles GraphQL API scaling by using a custom implementation of data loaders and server-side application-level caching. Netflix addresses traffic spikes through horizontal scaling, server-side throttling, and client retries. Effective caching strategies for GraphQL APIs include persisted queries, data loaders, and application-level caching. Error handling in GraphQL can be improved by translating errors to HTTP status codes and using the 'errors as data' approach. Recommended resources for building reliable APIs include 'Production Ready GraphQL' and materials on distributed systems and site reliability engineering.
1. Introduction to GraphQL API Durability
Today, we have an exciting panel discussing the durability of GraphQL APIs and how to ensure they can scale without issues. Mark from GitHub, Mandy from Apollo, and Tejas from Netflix will share their experiences. Let's start with the importance of testing in maintaining a reliable GraphQL API.
Thank you for joining us today. We've got a really exciting panel. I'm very excited to be joined, there's a lot of excitement, in case you can't tell, by these wonderful folks who have been using GraphQL for quite a long time in a lot of different environments and have really pushed GraphQL to the edges of what it's capable of. And so, you know, when we talk about GraphQL and dealing with GraphQL at scale, one of the things that we don't really talk about too often is the durability of GraphQL APIs. And what does that really mean, durability? Well, it's kind of like the SRE-type focus on graphs: how do we keep graphs up, and how do we make sure that they're able to scale and that we're not going to encounter issues? So that's going to be a lot of what today's panel is going to be about.
I'm going to turn over the floor for a quick sec just in case anyone had some follow-ups on introductions, if you had anything you wanted to add about what you're currently doing and how you're working in the GraphQL space today. I'm just going to go in the order I see you, so Mark, you're up first. Yes, sure. Thank you for having me on. I think, as the introduction said, I work at GitHub, on the API team. So we've got a setup that's not the most common for GraphQL, where we use it as a public API for third parties. So yeah, I'm just excited to be chatting about this with that context in mind. Awesome, cool, and thank you so much again for joining us. And then Mandy, I've got you in the next window over. Hello. So I'm a Solutions Architect at Apollo, which means I work with a lot of our enterprise customers and see the kinds of interesting challenges they bump up against using GraphQL at scale every day. So I'm very excited to be a part of this panel too. Awesome, cool. Thank you so much. And Tejas, that brings us to you. Yeah, sure. I'm a Software Engineer at Netflix, working on the API team, and we are currently building GraphQL for our studio ecosystem. Awesome, very cool. So as you can see, we've got a wide range of focus here, from some of the top GraphQL-consuming companies in the industry to Apollo's own Mandy. So all right, thank you so much for joining us. And without further ado, let's get into some of these questions. I'm going to start off with the first one, which is: it's actually pretty hard to know that a service is reliable without testing. So what types of testing do you find to be the most important when it comes to keeping a GraphQL API up and running smoothly? And I guess, yeah, I'll just lead off with Mark.
2. Testing GraphQL APIs and Ensuring Smooth Operation
Testing a GraphQL API can be challenging due to the infinite possibilities of queries and the use of contextual information in resolvers. It's important to extract logic outside of the GraphQL layer for better testing. Integration testing, observability, and unit testing are beneficial in maintaining smooth-running APIs. At Netflix, unit testing, integration testing, and end-to-end testing with and without mocking are used. Error handling and additional production ideas are also prioritized.
We'll just go through in the same order again, and we'll switch up the ordering later. Sure. Yeah, so this is an excellent question. I think it's good to acknowledge right off that testing a GraphQL API is kind of difficult by the nature of GraphQL, right? There's almost an infinity of possibilities for how people could query your graph, which is a feature, but also something that makes it hard to test every possibility. There's also the fact that the resolvers backing your graph often use contextual information from that kind of infamous context argument. So if you only focus on testing the graph itself, it's very hard to be confident in the backing logic behind the graph.
So I think my main thing, and I think I've said that a lot before, but I think focusing on extracting your logic outside of the GraphQL layer makes it so much easier to test that your domain logic is well tested. And then you can focus on testing the GraphQL parts separately. So we like to focus on more integration testing for GraphQL, test as many different queries as we can that represent client use cases, and test all our middleware layers, so our rate limiting. We'll talk about this more, I'm sure, about everything that's with error handling on the GraphQL side, so that would be my advice.
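Mark's advice about extracting logic out of the GraphQL layer can be sketched roughly like this; it's an illustrative Python example with hypothetical names, not GitHub's actual code:

```python
# Sketch: keep domain logic in plain functions that can be unit-tested
# without spinning up a GraphQL server. Names are hypothetical.

def repository_star_info(stars, viewer):
    """Pure domain logic: no GraphQL types or context involved."""
    return {
        "totalCount": len(stars),
        "viewerHasStarred": viewer in stars if viewer else False,
    }

def resolve_star_info(repository, context):
    """Thin resolver: only unpacks the GraphQL context, then delegates
    to the domain function, which carries all the interesting logic."""
    return repository_star_info(repository["stars"], context.get("viewer"))
```

The domain function can now be tested exhaustively on its own, while the GraphQL layer only needs a handful of integration tests covering representative client queries.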
Yeah, awesome. Thank you. Mandy, do you have a follow-up to that? What areas of testing do you find to be the most beneficial in helping APIs stay running smoothly? So in terms of testing, there's another way you can think about it with GraphQL APIs: because they're evolutionary in nature, you want to make sure that as you're releasing new features on your graph, you're doing so in a way that doesn't cause breaking changes for existing clients, and your observability tooling is a really important part of that story. What that means in practice is you'd probably be collecting operation traces and making sure that your clients identify themselves when they're using your graph, so that when you push changes, you can check against those operation traces and make sure that you're releasing changes to your graph in a way that's not going to break queries currently being made by existing consumers of your graph.
I love it. Yeah, that's a whole other avenue to think of: aside from the actual nuts-and-bolts testing, there's observability, seeing what's happening in an existing system and using that as a baseline for testing as well. It's really cool. Tejas, do you have a follow-up? How do you all handle testing your APIs at Netflix? Sure. Sorry. I'm going to reiterate some of the same things Mark and Mandy mentioned. I really like unit testing for the code that is inside the logic, inside the resolvers and data loaders. It's nice to separate that out. Integration tests are really great for testing context passing between the parent and child data fetchers or data loaders, etc. And then smoke tests we use a lot, actually, and we find them extremely useful for end-to-end testing of your GraphQL queries. These we do both with and without mocking; we really find them super useful. And then error handling is another one. You want to trigger error scenarios using mocks, because you're not always going to be able to test that behavior well end-to-end. And then for production, we have two other ideas I can share that we like to use.
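The mock-driven error-scenario testing Tejas describes might look something like this minimal Python sketch; the service and function names are made up for illustration:

```python
# Sketch: force a downstream failure with a mock, since real failures
# are hard to reproduce end-to-end. Names here are hypothetical.
from unittest import mock

def fetch_title(title_id, catalog_client):
    """Resolver-style helper that degrades gracefully on a timeout,
    returning a GraphQL-shaped response with an errors block."""
    try:
        return {"data": catalog_client.get(title_id), "errors": []}
    except TimeoutError:
        return {"data": None, "errors": [{"message": "catalog timeout"}]}

def test_timeout_surfaces_as_graphql_error():
    client = mock.Mock()
    client.get.side_effect = TimeoutError()  # trigger the error path
    result = fetch_title("tt001", client)
    assert result["data"] is None
    assert result["errors"][0]["message"] == "catalog timeout"
```

The mock lets the test hit the error path deterministically, something a live downstream dependency rarely allows.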
3. Handling GraphQL Failures and Preventing Downtime
We like to replay actual read queries from production and use canarying as strategies. It's important to focus on testing error paths and understand the error state when tests fail. Tejas will discuss how to handle GraphQL failures and prevent downtime.
We like to replay the actual read queries from production. This doesn't work well for mutations, but it at least gives you a little extra confidence that things are up and running. And then also canarying: routing a subset of traffic to the new code before you promote it to production. So those are some strategies we use.
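Canarying as Tejas describes it can be sketched as a deterministic traffic split; the percentage, hashing scheme, and names are illustrative assumptions, not Netflix's actual setup:

```python
# Sketch: route a fixed fraction of traffic to the canary build before
# promotion. Hashing the request id keeps routing sticky per id,
# unlike a random coin flip. Thresholds are illustrative.
import hashlib

CANARY_PERCENT = 5  # promote only once canary error rates look healthy

def route(request_id):
    """Hash the request id into 100 buckets; low buckets hit the canary."""
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < CANARY_PERCENT else "stable"
```

Because the split is deterministic, a given caller sees a consistent backend, which makes canary error rates easier to attribute.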
Yeah, all of that makes a ton of sense. I love the focus you put on testing error paths, right? Oftentimes we find ourselves getting into the trap of testing the happy path and making sure that works, but not really exploring the scenario where the test fails: what is the error state, and what will we experience there? So, yeah, that's also very important. This actually brings me to a really good next question, and I'll lead off with you, Tejas, if you don't mind. So how do you handle situations when, let's say, something was able to slip through testing, you didn't catch a scenario, and you're having GraphQL failures? Or how do you prevent that? When a system starts encountering errors, how do you prevent too much downtime? I think you might have muted it again.
4. Handling GraphQL Failures and Downtime Prevention
When encountering GraphQL failures, it's important to focus on reducing downtime and quickly resolving issues. This can involve actions like rollback or bouncing server instances. Great observability, distributed tracing, and metrics are essential for identifying problems and taking appropriate action.
Yeah, so when APIs go down, we want to focus on reducing the mean time to resolution, so we want to be up and running quickly again. This could include asking: what is the fastest action we can take to bring the systems back online? That's maybe a rollback, or it could be bouncing the server instances if there's some issue with the system itself. To do that, we need great observability into our system, and distributed tracing and metrics are key here for us. Even if we assume the issue is in the GraphQL server, at the end of the day, they are just regular servers, and we can apply the same practices to them. So we want to have everything from a memory leak to running out of threads covered by metrics, so that we can quickly track down what's going on and take the appropriate action based on that.
5. Strategies for Resilient GraphQL APIs
To prevent future failures in GraphQL APIs, it's important to implement strategies such as avoiding excessive logic in the GraphQL service, utilizing federation at the gateway layer, and using GraphQL-aware IPC metrics. Another useful strategy is functional sharding, which involves separating instances for different types of operations and callers. Separating subscriptions from queries and mutations also helps with scalability and resource requirements. Distributed tracing is crucial for troubleshooting system failures in GraphQL, especially in distributed systems. Whether it's a monolithic API or part of a microservices architecture, GraphQL is often integrated into distributed systems.
And then, to prevent some of these issues from happening in the future: most people tend to implement their GraphQL API as one graph, so it can be a single point of failure for your business. When it goes down, it can cause serious problems. So we want to avoid that if possible, and there are a few strategies I'm going to cherry-pick from.
One, we don't want to do too much in the GraphQL service. It's meant for data fetching and data loading, so try to avoid as much extra logic as possible; that way the surface area for failure decreases. And then with federation, which we do at Netflix, we can apply some of these tactics at the gateway layer, because the federated services themselves don't have as much of an impact on the overall graph, assuming they're spread out. So that's really helpful in making your service more resilient. And the one we have found extremely useful is GraphQL-aware IPC metrics. Generally, in GraphQL, you have different kinds of errors, but they all show up as a 200, right? Most of the time, your response is going to be a 200. So you want to be able to take those 400s and 500s we had in REST, set a different status, and emit that status to our metrics server. So we introspect the response that we are sending back and emit custom metrics based on what the error might be in the errors block or the data block. And then one last strategy I'll share. You know, we could go all day about this, but another one is: a GraphQL server is a stateless service, so another strategy to improve resilience is functional sharding. We tend to use that across a lot of places at Netflix. Basically, for example, we want to have a separate fleet of instances for something like subscriptions versus queries and mutations, so that you can separate them, because they have different semantics on the server. Subscriptions are long-lived connections, while queries and mutations might be quick-running. So you can separate failure based on that. And you might also apply this tactic to different kinds of callers. For instance, high-priority users versus maybe some kind of internal backend application calling the API. So those are some strategies we use.
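The "GraphQL-aware metrics" idea, deriving an HTTP-style status from a body that arrived as a 200, might be sketched like this; the error codes checked below are common conventions, not necessarily the ones Netflix actually uses:

```python
# Sketch: GraphQL usually returns HTTP 200 even on failure, so inspect
# the response body and emit a REST-like status to the metrics pipeline.
# The specific extension codes are illustrative assumptions.

def derive_status(response):
    """Map a GraphQL response body onto an HTTP-style status for metrics."""
    errors = response.get("errors") or []
    if not errors:
        return 200
    codes = {e.get("extensions", {}).get("code") for e in errors}
    if codes & {"GRAPHQL_VALIDATION_FAILED", "BAD_USER_INPUT"}:
        return 400  # client-side mistakes, REST's 4xx family
    if "UNAUTHENTICATED" in codes:
        return 401
    # Anything else in the errors block counts as a server-side 500
    return 500
```

A middleware would call something like this on every outgoing response and tag the metric with the derived status, restoring the 4xx/5xx dashboards teams had with REST.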
That's really cool. And another thing, too, that really helps, at least separating subscriptions from queries and mutations, is just a scale thing. Like you said, you know, WebSockets are long-lived connections and have different requirements as far as hardware goes. You need more memory and things like that than you would for most of your queries and mutations. So just separating those might allow some independent scaling as well, which is nice. So, yeah, moving on. Mandy, what do you think? What are some of your preferred methods for either dealing with failed systems or making sure that you're in a place where they're not failing for too long? So just building off of what Tejas said around distributed tracing, I think this is really key, because it's hard to troubleshoot some kind of system failure when you're just taking wild shots in the dark. And one of the things about GraphQL is that it is very often incorporated in some kind of distributed system, whether it's a monolithic GraphQL API in front of a bunch of microservices, or it's part of your BFF strategy, or you're using federation.
6. Handling Distributed Traces and Traffic Spikes
Having the context of distributed traces in place can help shorten the mean time to recovery for any particular failure. It's important to observe external calls that are coming out of GraphQL and not just time GraphQL resolvers themselves. Handling large spikes in traffic requires observability tooling to identify emerging patterns and provide alerts for unexpected events. Being alerted to spikes in traffic allows for proactive measures to be taken.
There's probably some number of services in a service graph, and if you're trying to troubleshoot what's gone wrong at any given moment, trying to do that just using system-level logs within the context of a given service, or high-level metrics, probably isn't going to get you where you want to go. So having the context of distributed traces in place can really help shorten the mean time to recovery for any particular failure in your system.
Yeah. Absolutely. Awesome. And Mark, do you have anything you want to add on to this? What does GitHub do when it comes to preventing downtime? Yeah. I'll say tracing is absolutely great. We use that as well. One thing I'll add that is a little tricky even with tracing is that, for us, it's very important to be able to tell what kind of pressure GraphQL exerts on our systems, especially data stores. Especially with things like Data Loader, which might fetch data in a different context than the specific field, sometimes you can look at a field and see that it's resolving very fast, but maybe it's enqueuing a lot of data to be loaded later. And if you don't observe this well, it can be really tricky to find the root cause of increased writes or reads on our database. So the thing that gave us the most bang for our buck was looking at the external calls that GraphQL queries make and seeing if a certain query, for example, has started making 500 MySQL queries because something was not optimized or we're missing a data loader somewhere. So, yeah, having observability into the external calls that are coming out of GraphQL, even more than just timing GraphQL resolvers themselves, has been really helpful.
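Observing external calls per GraphQL operation, as Mark describes, can be approximated by wrapping the data-store client so each operation's fan-out is visible; this is a hypothetical Python sketch, not GitHub's tooling:

```python
# Sketch: count downstream queries issued while resolving one GraphQL
# operation, so a missing data loader (an N+1) shows up as a spike in
# calls-per-operation. Names and the threshold are illustrative.

class CountingDB:
    def __init__(self, inner):
        self.inner = inner   # the real query function being wrapped
        self.calls = []

    def query(self, sql, *params):
        self.calls.append(sql)          # record every downstream call
        return self.inner(sql, *params)

def looks_like_n_plus_one(db, threshold=50):
    """Flag operations that fan out into suspiciously many queries."""
    return len(db.calls) > threshold
```

In practice the recorded counts would be emitted as per-operation metrics, so a regression like "this query now makes 500 MySQL calls" is visible the day it ships.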
Yeah, I mean, that makes a ton of sense, because that I/O eventually has to happen somewhere. And if you don't have the right checks in place, the bottleneck might end up affecting other parts of the system, which is more tricky, right? It starts creating latency for other requests, kind of like a red herring situation. So, yeah, that's really invaluable information. Thank you. Awesome. So, we've talked a little bit about keeping systems reliable and about downtime. Let's take a different route and go a little more positive: how do you handle large spikes in traffic? Ideally, this is something we want, right? We want to see more traffic coming to our APIs. But, at the same time, especially in situations where you might not know that traffic is coming, what are your preferred methods for handling it? And I'm going to start with Mandy this time and then jump around.
Well, I think you're probably going to start noticing a recurring theme in some of my answers, because I'm a big fan of observability tooling with respect to GraphQL APIs. So that's definitely really important to have in place, because it can help you identify patterns as they begin emerging, rather than being surprised by something you don't need to be surprised by. And with those kinds of tools in place, you can configure them to alert you when something unexpected is happening in your system, whether it's around errors or a big spike in traffic, and give you a heads-up that there might be something you want to deal with preemptively before something bad happens. I like that. So, kind of swooping in before that spike happens: once you get outside the bounds of normalcy for your API, being alerted to that so you can react immediately. Yeah, something along the lines of an ounce of prevention is worth a pound of cure, right? Yeah. Yeah.
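Alerting when traffic leaves the "bounds of normalcy" could be as simple as a rolling-baseline detector; the window size and multiplier here are illustrative, and real tooling would be far more sophisticated:

```python
# Sketch: flag a traffic spike when the current request count exceeds
# a multiple of the recent rolling average. Parameters are illustrative.
from collections import deque

class SpikeDetector:
    def __init__(self, window=12, factor=3.0):
        self.history = deque(maxlen=window)  # recent per-interval counts
        self.factor = factor

    def observe(self, count):
        """Record one interval's request count; return True on a spike."""
        if len(self.history) == self.history.maxlen:
            baseline = sum(self.history) / len(self.history)
            spike = count > self.factor * baseline
        else:
            spike = False  # not enough data for a baseline yet
        self.history.append(count)
        return spike
```

A detector like this would sit behind the alerting pipeline, paging someone before the spike turns into an outage.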
7. Strategies for Dealing with Spikes in Traffic
Tejas discusses strategies for dealing with spikes in traffic, including scaling horizontally, server-side throttling, client retries, and using an L7 proxy application gateway. It's important to prioritize a degraded service over no service at all. Mark doesn't have much to add.
I love it. Amazing. So Tejas, let's hear from you. What are some of your favorite ways of dealing with spikes in traffic?
Sure. Yeah. This happens all the time. It's impossible to anticipate every problem, or suddenly you launch a new service. For instance, when we launch something like Stranger Things, it's hard for us to anticipate how many more people will be watching that show. So there are a few strategies that we have tried using in the past.
So one is, the GraphQL server is usually a kind of stateless proxy, like I mentioned earlier, so it's easy to scale it horizontally. However, we do have to ensure that our downstream services can handle that load as well. Obviously, scaling horizontally will work for the GraphQL server itself; if everything is set up downstream, then we can go ahead and do that.
The other one, obviously: server-side throttling and client retries are a great toolkit to have as a backup in case you really need to go down that route. It gives the user a somewhat worse experience, but at least you're still up and running, and that can be a great one to use. And if, let's say, the traffic is malicious, we have an L7 proxy application gateway that we can use to reject those requests upfront. So yeah, those are some strategies we use.
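Server-side throttling plus client retries might be sketched as a token bucket paired with an exponential-backoff loop; the rates, limits, and names are illustrative assumptions:

```python
# Sketch: a token bucket for server-side throttling, and a client-side
# retry loop with exponential backoff. All parameters are illustrative.
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond 429 and let clients retry

def retry_with_backoff(call, attempts=3, base_delay=0.01):
    """Client side: retry throttled calls, doubling the wait each time."""
    for i in range(attempts):
        ok, result = call()
        if ok:
            return result
        time.sleep(base_delay * (2 ** i))
    raise RuntimeError("gave up after retries")
```

Together these give the degraded-but-alive behavior discussed above: the server sheds load it cannot absorb, and well-behaved clients eventually get through.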
I like that. I really like how you touched on degraded service is better than no service at all. It might not be as snappy as you want. It might have to make a couple of retries but that's better than the entire system being unavailable. So that's really cool. I like that a lot.
What about you, Mark? You got anything you want to add here? I don't have much to add. Those were amazing answers from Mandy and Tejas.
8. Caching Strategies and Scaling GraphQL
Scaling horizontally and protecting data stores behind downstream services are crucial for us. Rate limits help us deal with timeouts and prevent services from crumbling. Caching plays a vital role in scaling GraphQL, and at GitHub, we rely on a custom data loader implementation and server-side application-level caching. Netflix focuses on consistent data and uses caching strategically based on factors like consistency, availability, and performance.
I think it's very similar for us. Scaling horizontally does wonders. I do think, again, for us, protecting those data stores especially, behind downstream services, is the most important thing. And that's a really good point that Tejas made. Returning a 429 rate-limited response to a client might be just as bad an experience for them as a 500, but for us, it's the difference between dealing with timeouts and possibly our services crumbling, so we can get back up way earlier by having rate limits in place. I like that. That's cool. That's awesome.
I guess we'll stick on this same topic just a little bit more. If we're dealing with spikes in traffic, we're talking about being able to scale horizontally and downstream services being able to support that. And I feel like that's a conversation you can't have without mentioning caching, right? Caching is pretty much imperative all throughout your systems, but it's a different story than many folks are used to with REST APIs. Only slightly, but different enough. Do you want to maybe touch on that? Mark, I'll let you kick this one off. What are some caching strategies that y'all implement at GitHub, and how do they allow you to scale GraphQL? Yeah, so we don't do anything super magic when it comes to caching. We rely heavily on our custom data loader implementation to make sure we never repeat queries and avoid N+1 queries. I think we focus a lot on server-side, application-level caching, and we let teams decide: if there's a field that's very expensive to compute and can allow for some staleness, we'll cache it there in the resolver. But generally, Data Loader does wonders for us. We can do HTTP caching, but it doesn't give us as much as with the REST API, for example. So we focus a lot on application-level caching on our application servers. Yeah. That makes a lot of sense.
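The core trick of a data loader, batching and deduplicating lookups so repeated or N+1 queries collapse into one call, can be sketched in a few lines; this is a toy synchronous version for illustration, not GitHub's custom implementation:

```python
# Minimal sketch of the data-loader pattern: queue individual key
# lookups, dedupe them, and resolve them with one batched call.

class DataLoader:
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn  # maps [keys] -> [values] in one call
        self.queue = []
        self.cache = {}

    def load(self, key):
        if key not in self.cache and key not in self.queue:
            self.queue.append(key)       # dedupe: queue each key once
        return lambda: self.cache[key]   # deferred read, filled later

    def dispatch(self):
        if self.queue:
            values = self.batch_fn(self.queue)  # single batched query
            self.cache.update(zip(self.queue, values))
            self.queue = []
```

Real implementations (e.g. the JavaScript `dataloader` library) do the dispatch automatically on the event loop, but the batching-plus-cache idea is the same.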
Tejas, what about you? How does Netflix handle caching with GraphQL? Sure. Yeah. Currently, we are using GraphQL on the studio side of the ecosystem, which is almost an enterprise system, so we want to have extremely consistent data all the time for most of our workflows. I think when deciding whether to use caching, you need to ask a lot of different questions, especially on the server side. What is your appetite for losing consistency versus availability and performance? Are we using it to improve performance, or as a fallback? On the streaming side, the consumer side of Netflix, we use tons of fallback logic because availability is of paramount importance.
9. Caching Strategies and Reliable API Resources
We use client-side caching techniques like Relay's Global Object Identification spec and subscriptions to ensure data consistency. Another lesser-known caching strategy is persisted queries, specifically automatic persisted queries, which allow for smaller requests and the use of CDNs. Shifting from POST to GET requests enables the use of HTTP caching mechanisms. For resources on building reliable APIs, I highly recommend Marc-André Giroux's book, Production Ready GraphQL.
But we don't use GraphQL there yet, so maybe in the future we might have better ideas around that. On the studio side, we also do a lot of client-side caching where we find it super useful. We use Relay's Global Object Identification spec as a way to re-fetch just the specific data that you have invalidated, and subscriptions as a way to know that that data has changed. So those are some techniques we use, mostly on the client side for the studio, because the consistency of the data is so important.
Yeah, for sure. That makes a lot of sense. And Mandy, I'm sure this is something you deal with quite a lot, working with a lot of different companies, and they all probably need different caching strategies, so I'd be curious to hear your answer. So I'd say one of my favorite, perhaps lesser-known ways of caching in a GraphQL API is persisted queries, and Apollo's automatic persisted queries specifically. What this does is essentially let you send a hash of your query string as your request to your GraphQL API. After the query string has been seen once, it's cached against that hash, which means you're ultimately sending a much smaller request to your GraphQL API, which is a win in and of itself. But it also makes it possible, or more feasible anyway, to send your GraphQL requests in the form of GET requests, which, in turn, makes it more feasible to use something like a CDN with your GraphQL API. So that's one of my favorite ones.
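The automatic-persisted-queries flow Mandy describes roughly works like this; the request shapes follow the common APQ convention (SHA-256 hash in a `persistedQuery` extension), simplified here into a toy in-process sketch:

```python
# Sketch of the APQ handshake: the client first sends only a SHA-256
# hash; on a miss it retries once with the full query text, which the
# server then registers. Simplified; real servers persist the registry.
import hashlib

REGISTRY = {}  # server-side: hash -> query string

def apq_request(query):
    """Client side: build the hash-only request body."""
    return {"extensions": {"persistedQuery": {
        "version": 1,
        "sha256Hash": hashlib.sha256(query.encode()).hexdigest(),
    }}}

def serve(request, query=None):
    """Server side: resolve the hash, or ask for the full query."""
    h = request["extensions"]["persistedQuery"]["sha256Hash"]
    if h in REGISTRY:
        return {"query": REGISTRY[h]}  # hash hit: tiny request sufficed
    if query is not None:
        REGISTRY[h] = query            # register on the retry
        return {"query": query}
    return {"errors": [{"message": "PersistedQueryNotFound"}]}
```

Once only a short hash travels over the wire, the request fits comfortably in a GET URL, which is what unlocks CDN and HTTP caching.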
Yeah, that's really cool. Yeah, because then by shifting from POST to GET, that opens the door for, pretty much, all the HTTP caching mechanisms that work so well for handling GET requests. And a lot of that just comes naturally at that point, right? If you're just using proper TTLs, that could be your caching, essentially.
For sure. Yeah, awesome. I love it. So, let's see, we're getting kind of close on time, but I wanted to leave some room for this one. This is one of my favorite questions to ask on any panel, and I'll start with you, Mandy. Also, shameless plugs are more than welcome here. Do you have any resources that you recommend for folks about building reliable APIs? This is a topic that everyone can benefit from; where would you recommend folks go to learn more? I think my number one recommendation would be Marc-André's book, Production Ready GraphQL. That book is amazing. Everything you'd want to know about building a production-ready GraphQL API, all in one place, condensed in one book. Definitely go check it out. Awesome.
10. Resources for Building Resilient GraphQL APIs
Starting with reading about distributed systems and resiliency is crucial for building resilient GraphQL APIs. Tejas recommends exploring talks from conferences like InfoQ and Googling 'architecting for failure' for high-level insights. Additionally, the O'Reilly book 'Site Reliability Engineering' and resources on domain-driven design provide valuable knowledge on handling failures and working with distributed systems.
I love it. Yeah, Marc, let's follow up with you. You got anything? Thanks, Mandy. Mandy also has a book, so go check that out as well. It's great. As far as the panel question goes, I think the really important resource here is just reading about distributed systems in general, and resiliency. A lot of what we've talked about today has some GraphQL specifics, but at its core it's about how to build resilient systems and distributed systems. So I don't have specific resources in mind, but if you want to read about how to make your GraphQL API more resilient, starting there would be amazing.
For sure. I love it. Thank you. Tejas, do you have anything you want to add? Yeah. I'm going to double down on what Mark said. I think the key here is, whether it's a GraphQL API or a REST API, the important part is that you build resilient systems. My colleagues at Netflix, there's a whole resilience team, have shared their ideas at conferences, and on sites like InfoQ there are some excellent talks, not just from Netflix, but from other companies as well. I highly recommend Googling "architecting for failure", or going to the InfoQ website and finding some great talks there to learn from. They usually tend to be high-level, but there are some really neat ideas in there that you can bring back and apply to your distributed systems back home.
For sure. That's amazing. And I'll piggyback on this and add a couple of recommendations of my own. There's the O'Reilly book, Site Reliability Engineering. It talks about a lot of these topics: observability, understanding how to handle failures, best practices for dealing with these types of situations, understanding scaling. So it's a really good book, and it touches generally across distributed systems. And the other one that I recommend, which I think a lot of people skip out on, is anything on domain-driven design. It can be a really invaluable resource when you want to work with distributed or larger-scale systems where you need resiliency and there are a lot of connecting pieces. It's really good at helping you understand the relationships between those systems.
11. Handling Errors and Errors as Data in GraphQL
Handling errors in GraphQL requires careful consideration. Translating GraphQL errors to status codes can improve observability. Building custom observability for domain errors can provide valuable insights. The errors as data approach in GraphQL allows for better communication of errors to client developers. Using unions to express different states of the data response can be beneficial. Incorporating errors as data in the user interface can enhance the user experience. Sasha Solomon has given a talk on this topic. Tejas also suggests using errors as data.
So yeah, those would be my recommendations as well. It actually looks like we have a couple of minutes left, so I'm going to try and squeeze in a question or two from Discord. We'll do a speed round of questions here real quick. I think I can get one in, and this is a great one. We touched on this just a hair, but what are some best practices for handling errors? It's a very important part of reliable systems, but what are the things that you do to make errors more usable within GraphQL, and how do you take advantage of them? Sorry, Mark, why don't we jump in with you? Sure. I think Tejas mentioned something earlier about translating GraphQL errors to a status code for better observability. And I think that's one common pitfall that, if you don't think about it, you can really get bitten by: if you do go with GraphQL-specific errors, any existing observability systems you use might not be able to catch errors the way they'd catch a 400 in a REST API. So with things like validation errors, it's very hard to know when a client is having trouble with GraphQL unless you specifically watch for it. So I would say don't be afraid to build some custom observability for your domain errors, errors that aren't necessarily server errors. But, hey, this client hasn't been able to, whatever your domain is, check out or pay for something, because it's hitting errors. Got it. Yeah, that's really good. Mandy, what about you? How do you like to handle errors in GraphQL systems?
Well, one thing worth highlighting here is some of the interesting ways we can approach error handling with a GraphQL API that aren't necessarily available with other forms of APIs. Specifically, I'm referring to the errors-as-data approach, which gives us better ways to communicate errors to client developers. Rather than simply sending a top-level error, we can use something like a union to express the different states of the data response. This is particularly useful for distinguishing something completely unexpected, a true error, from something expected, like a user sending a mutation to update their username when that username is already taken. That's a state we can anticipate running into at some point in our system. One really good guideline I've heard and taken to heart is that when you're trying to engineer some aspect of the user interface around a particular error state, that's a good place to use errors as data, rather than relying on top-level GraphQL errors. It improves the experience for the client developers working on your application, as well as how they can support users and communicate those errors in the user interface. Yeah, I love that. I believe Sasha Solomon gave a talk on this particular topic as well, so if you're interested in that approach, definitely check it out. Tejas, did you have a follow-up on that? Yeah, I was going to suggest the same idea, using errors as data.
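The errors-as-data pattern Mandy describes, including her username example, can be modeled as a discriminated union. The type and field names below (`UpdateUsernamePayload`, `UsernameTakenError`, `suggestedUsername`) are hypothetical; the TypeScript union mirrors an SDL union like `union UpdateUsernamePayload = User | UsernameTakenError`.

```typescript
// The success case: the mutation worked and returns the updated user.
type User = { __typename: "User"; id: string; username: string };

// The *expected* failure case, expressed as schema data rather than a
// top-level GraphQL error, so the UI can be designed around it.
type UsernameTakenError = {
  __typename: "UsernameTakenError";
  message: string;
  suggestedUsername: string;
};

// Mirrors: union UpdateUsernamePayload = User | UsernameTakenError
type UpdateUsernamePayload = User | UsernameTakenError;

// The client branches on __typename, the same way a GraphQL client
// would branch on inline fragments (... on UsernameTakenError { ... }).
function renderResult(payload: UpdateUsernamePayload): string {
  switch (payload.__typename) {
    case "User":
      return `Username updated to ${payload.username}`;
    case "UsernameTakenError":
      return `${payload.message} Try "${payload.suggestedUsername}" instead.`;
  }
}
```

The design choice Mandy's guideline captures: if the UI needs to render something specific for a failure state (here, a suggested alternative username), that state belongs in the schema where it is typed and discoverable, while truly unexpected failures stay in the top-level `errors` block.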
Federation and Contact Information
At Netflix, we use a combination of the errors block approach and errors as data approach in our federation. Both approaches have their advantages and disadvantages, so it's important to evaluate your data and make an informed decision. As we're running out of time, the panelists share their contact information. Mandy can be found on Twitter as Mandy Wise, Mark and Tony have a podcast called GraphQL FM, and Tejas is available on Twitter as Tejas26. Thank you to the panelists for their valuable insights and the great conversation.
The only thing I can add is that at Netflix we do federation, so we have a lot of different domains, and some domains use the errors-block approach while others use the errors-as-data approach. Both can live together, assuming certain parts of the graph are potentially mutually exclusive. We've found pros and cons to both approaches, so it's best to look at your data and decide which way to go.
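Tejas's point about the two styles coexisting in one federated graph can be sketched as a single response shape. The field and type names below are hypothetical: one subgraph models failure as a union member inside `data`, while another lets its failures surface in the top-level `errors` block, and a client can collect both.

```typescript
// One response from a federated graph where subgraphs chose different
// error styles. Names are illustrative, not Netflix's actual schema.
interface GraphQLResponse {
  data?: {
    // Subgraph A uses errors-as-data: failure is a union member.
    updateUsername?:
      | { __typename: "User"; username: string }
      | { __typename: "UsernameTakenError"; message: string };
    // Subgraph B uses the errors block: this field just comes back null
    // and the matching entry lands in `errors` below.
    recommendations?: { titles: string[] } | null;
  };
  errors?: { message: string; path?: (string | number)[] }[];
}

// Collect every failure, whichever style produced it.
function collectFailures(res: GraphQLResponse): string[] {
  const failures: string[] = [];
  const update = res.data?.updateUsername;
  if (update && update.__typename === "UsernameTakenError") {
    failures.push(`updateUsername: ${update.message}`);
  }
  for (const err of res.errors ?? []) {
    failures.push(`${(err.path ?? []).join(".")}: ${err.message}`);
  }
  return failures;
}
```

This works because the two styles touch disjoint parts of the response, which is the "mutually exclusive parts of the graph" condition Tejas mentions.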
Absolutely. And that brings us up on time; I believe we've got close to two minutes left, so I'm just going to go around real quick. Mandy, one last time, where can everyone find you if folks have follow-up questions or anything like that? The best place to find me is Twitter. I'm Mandi Wise, that's Mandi with an i. Feel free to send me a message there. I'd love to chat.
Awesome. And, Mark, what about you? What's the best way for folks to reach you? Twitter is good. One other thing: Tony Guetta from Twitch and I have a podcast called GraphQL FM, and we're actually having Tejas on very soon to talk about some GraphQL stuff, so check that out. Awesome, I was going to shamelessly plug GraphQL FM for you if you didn't do it yourself, so I'm glad you dove in there. And Tejas, where can folks find you? Yeah, Twitter is good. My handle is my first name, Tejas, followed by 26, so you can shoot me a message there. Awesome. Cool. And you can find me on Twitter as Kurt Kemple; I'm also happy to answer any questions you all might have. Once again, thank you to our panelists for joining us. This was a great conversation, I learned a ton, and it's invaluable stuff for the community, so thank you for coming on and doing this. We really appreciate you. Thank you so much. And I think that's going to bring us up; we've probably got about a minute left where we can all just chitchat, since we can't take any more questions. Thanks, everyone.