Exploring the Data Mesh Powered by GraphQL

Hasura cofounder, fullstack developer working with React, GraphQL, Nodejs, Haskell, Docker, Postgres & Kubernetes.

Different approaches are being explored for building an operational data lake with an easy data access API, as well as a federated data access API, and GraphQL opens up opportunities for enabling these architectures by laying the foundation for a data mesh.

This talk was presented at GraphQL Galaxy 2022.

FAQ

GraphQL is a query language for APIs that enables clients to request exactly the data they need in a single API call. This can reduce the load on underlying data systems, increase performance, and help in managing complex data structures efficiently.

A data API layer is crucial for standardizing API interfaces, ensuring performance quality, and providing security compliance across various data sources and client types. It facilitates efficient data management and faster application development.

While GraphQL offers flexibility and efficient data fetching, it also presents challenges such as increased complexity in schema design, authorization, security, and optimal query planning across varied data sources.

GraphQL APIs can serve a wide range of clients including internal and external clients, clients operating on cloud or on-premises, and those across different teams or regions. This flexibility makes GraphQL suitable for diverse operational environments.

Serverless architectures can significantly enhance GraphQL performance by allowing precise data fetching that reduces unnecessary operations and costs. This architecture adapts well to the scalable and efficient nature of GraphQL.

Yes, GraphQL effectively prevents over-fetching and under-fetching by allowing clients to specify exactly what data they need. This not only optimizes data retrieval but also improves overall application performance and user experience.

GraphQL integrates data fetching with authorization logic, allowing for complex security rules that are tailored to user permissions and data access requirements. This integration helps prevent unauthorized data access and ensures compliance with security policies.

graphql api development

Tanmai Gopal

34 min

08 Dec, 2022

Video Summary and Transcription

This Talk discusses the challenges of working with data APIs and GraphQL, including standardization, performance, and security. It emphasizes the need to optimize data fetches and push down authorization logic. The concept of externalizing authorization and using a GraphQL engine is explored. The Talk also covers the generation of GraphQL schemas and APIs, as well as the implementation of node-level security. Overall, the focus is on designing and standardizing GraphQL for data APIs while addressing authorization challenges.

1. Introduction to Data APIs and GraphQL

Short description:

In this part, Tanmai discusses the need for a data API layer to address the challenges of working with different data sources and clients. He highlights the benefits of GraphQL in selecting and structuring data, but also acknowledges the challenges of standardization, performance, and security. Tanmai explains how performance optimization can vary depending on the data sources and shares examples of query plans. He also mentions the discussion around the N+1 problem in GraphQL.

Hi, folks. I'm Tanmai, co-founder and CEO at Hasura, and I'm going to talk to you a little bit about data APIs powered by GraphQL today. Increasingly, platform teams across organizations are setting up a data API layer to deal with the problem that you have so many different data sources and so many different types of clients. You need to solve problems of performance, standardization, and security to allow these clients to move quickly.

We have to deal with the fact that the domain data is coming from different sources: databases and services. Clients can be internal or external; they can be at the edge, on the cloud, or on-prem; they can be within the same team or across different teams. And we need a data API layer that can absorb and solve for standardizing the API, providing a certain performance quality of service, and providing security and compliance guarantees. As a data API, GraphQL can be a great fit, and we'll see some of the benefits of GraphQL in addressing these challenges as well.

So, GraphQL is a nice API because, as we all know, it allows us to select exactly the data that we need in a single API call. This has a pretty large impact if we're fetching models that have hundreds of attributes, where we can drastically reduce the load on the underlying data system. Increasingly, as we move to serverless, including serverless data systems, there's also a massive cost-saving impact when we're able to select exactly the data that we need. We all know that GraphQL has a really nice schema and a typed graph that allows us to structure the way we get our output, but it also allows us to structure our inputs and parameters fairly easily. That has an impact on our ability to handle increasing complexity. Think about this query here, where I'm fetching orders filtered by a greater-than condition on a particular value and ordered by the user ID in ascending order. Providing these input parameters and arguments is much easier with GraphQL than trying to do this with a REST API, for example. So layering on this complexity becomes much easier. When we take these niceties of GraphQL that we're all aware of and think about standardizing and scaling them, we run into some challenges. At its core, it's because the cost of providing this increased flexibility is that we need to do a little more work in solving for standardization and schema design, guaranteeing performance, and solving for authorization and security.

Let's take a look at performance, for example. If we think about the types of data sources that we have and the way that we execute a query across those data sources, the optimal data fetching can be very contextual. Take a simple example of fetching orders and, for each order, the user's name. Depending on the topology of this data, we might have varying query plans. For example, if it came from the same data source and that source supported JSON aggregation, and I had to implement a controller that responded with just this data, I could make a single query that performs the JSON aggregation at the data source itself. That means I'm not even doing a join that fetches a Cartesian product; I'm making a more efficient query that fetches just the order and the user, constructs the shape of the JSON, and sends that back to the client. Let's say it's coming from two different databases, in which case I would use something like an IN query and perform memoization, so that I'm not fetching duplicate entities in this cross-database join. If this was coming from two different services, then I'd have to make multiple API calls, but again, I would do a form of memoization to prevent duplicate entities being fetched within the same request. It's a variation of the data loader pattern. The point is that the query plan depends on the kind of data sources that we have, and the same query plan will not work across these different data sources. There's an interesting thread that popped up on Twitter a few weeks ago about how GraphQL doesn't create an N+1 problem, and Nick, one of the creators of GraphQL, chimed in saying that GraphQL doesn't create the N+1 problem, but because of the way that we typically think about executing a GraphQL query, it does make it harder to address that problem in a systematic way.
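The difference between the N+1 plan and the batched, memoized plan described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `ORDERS` and `USERS` data, the `UserSource` class, and both plan functions are hypothetical stand-ins for real data sources, not code from the talk.

```python
from typing import Dict, List

# Hypothetical in-memory stand-ins for two separate data sources.
ORDERS = [
    {"id": 1, "user_id": 10, "amount": 25},
    {"id": 2, "user_id": 11, "amount": 40},
    {"id": 3, "user_id": 10, "amount": 15},
]
USERS = {10: {"id": 10, "name": "Ada"}, 11: {"id": 11, "name": "Lin"}}


class UserSource:
    """Counts round trips so the two plans can be compared."""

    def __init__(self) -> None:
        self.calls = 0

    def get_one(self, user_id: int) -> dict:
        self.calls += 1
        return USERS[user_id]

    def get_many(self, user_ids: List[int]) -> Dict[int, dict]:
        # Stands in for a single `SELECT ... WHERE id IN (...)` round trip.
        self.calls += 1
        return {uid: USERS[uid] for uid in user_ids}


def naive_plan(source: UserSource) -> List[dict]:
    # One user lookup per order: the N+1 shape.
    return [
        {"id": o["id"], "user": {"name": source.get_one(o["user_id"])["name"]}}
        for o in ORDERS
    ]


def batched_plan(source: UserSource) -> List[dict]:
    # Deduplicate keys first so each user is fetched only once.
    unique_ids = sorted({o["user_id"] for o in ORDERS})
    users = source.get_many(unique_ids)
    return [
        {"id": o["id"], "user": {"name": users[o["user_id"]]["name"]}}
        for o in ORDERS
    ]
```

Both plans return the same response shape; only the number of round trips to the user source differs, which is the part a query planner would try to minimize.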

2. Challenges of Data Fetching and Authorization

Short description:

In this part, Tanmai discusses the challenges of integrating predicate pushdown with data fetching and the need to push down authorization logic. He emphasizes the importance of optimizing data fetches and explains the challenges of doing this across data sources.

And that's what we'll look at: how we can address those kinds of challenges. Now think about authorization. A very common challenge is that we have to integrate predicate pushdown with our data fetching. Look at the same query, where we're fetching orders and users, and let's say this query is being made by a regional manager who can only fetch orders placed within their region. If we made a naive request where we selected all of this data and then, after selecting it, started filtering by region, that would be terrible. We obviously can't do this when we have millions or billions of rows. What we want to do instead is fetch that data by selecting from the orders model or table, or whatever it is, where the region is equal to the current region. This is the predicate, and we're pushing that predicate down into our data fetch. We'd want to be able to push down our authorization logic with our data fetch as much as we possibly can. Doing this across data sources can become challenging.
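A minimal sketch of the regional manager example, assuming a single SQLite table named `orders` with a `region` column (both the schema and the sample rows are made up for illustration). It contrasts filtering after the fetch with pushing the predicate into the query.

```python
import sqlite3
from typing import List, Tuple


def make_db() -> sqlite3.Connection:
    # Hypothetical schema and rows, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders (id, region, amount) VALUES (?, ?, ?)",
        [(1, "emea", 25), (2, "apac", 40), (3, "emea", 15)],
    )
    return conn


def fetch_then_filter(conn: sqlite3.Connection, region: str) -> Tuple[List[tuple], int]:
    # Naive plan: pull every row, then apply the authorization rule in memory.
    rows = conn.execute(
        "SELECT id, region, amount FROM orders ORDER BY id"
    ).fetchall()
    return [r for r in rows if r[1] == region], len(rows)


def fetch_with_pushdown(conn: sqlite3.Connection, region: str) -> Tuple[List[tuple], int]:
    # Pushdown plan: the authorization predicate is part of the query itself.
    rows = conn.execute(
        "SELECT id, region, amount FROM orders WHERE region = ? ORDER BY id",
        (region,),
    ).fetchall()
    return rows, len(rows)
```

Each function returns the visible rows plus the number of rows that left the database. With three rows the difference is trivial, but it is the same difference that matters at millions or billions of rows.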

3. Challenges of Data API Design

Short description:

This section discusses the challenges of standardized design when dealing with fetching data, mutations, and query planning in a data API. It highlights the benefits of better query planning and the performance gains it can offer. It also addresses the need for externalizing authorization logic and the advantages it brings. The examples provided demonstrate the importance of a rigorous technical foundation in data API design.

This is also something that we need to think about standardizing. Let's take a few examples of where these challenges pop up. Think about fetching users along with the total number of orders that each user has. There's an aggregate property that is added to the user type. How do we want to think about this? Do we want to add this as an attribute to the user type, which is what you would do conventionally? But what if we wanted to have certain arguments on it, so that you only fetch those aggregates with a particular condition on the orders? Or what if you want to layer on other aggregate properties? What if the order service is a totally different service from the user service? How do we want to expose aggregations from a different service being, quote unquote, federated into a model definition that comes from a different service altogether? So we need a way to think about the design here.

Think about querying a parent by a property of the child: it's not a property of the parent, but a property of the child. Again, in situations where this is federated, it becomes a complicated thing to design. And you want to have a standardized design when dealing with these kinds of workloads as well.

When you think about mutations, you have two broad types. You can do a CRUD-ish style of mutations. That's nice because it composes well: you can say that I'm inserting an object and a bunch of related objects together, or I'm updating them or deleting them, and so on. Or you can think about a non-CRUD-ish style, more of a CQRS style of mutation, where you have a unique mutation for each type of action that you want to do. And although this is nice, it does make it hard to standardize. So you want to be able to handle both of these flavors well.

So if we want to address these challenges, given that we're going to have to do it anyway when we think about data APIs, let's take a look at what benefits we would get from a more rigorous technical foundation. Let's start with what would happen if we had better query planning. For this set of examples, I'm going to continue using this e-commerce example, where I have a user service, an order service, and a logistics service, and the user model has orders and items. Track order is an API call that performs business logic to interact with, say, a FedEx API or a UPS API to fetch the order status. If I look at a simple query like fetching orders and the user for each order, depending on how that data is laid out, I might have three different types of query plan. I might have the naive plan, where I do N+1 and have serialization and deserialization overhead. I might do batching where it's possible to batch, like a data loader, where I have to make at least two I/O calls and then perform JSON serialization and deserialization. Or, in the best case, where the data is laid out in a way that lets me make a single call, I do the JSON serialization and deserialization in just one place and stream that back.

Let's look at the benefit of this query planning in practice. We benchmarked P99s at about 1,000 RPS while increasing the depth of fetching data that comes from two different services. If we let better query planning happen, and not just default to a data loader style or N+1, there is a massive performance gain that we can get. You can see that the blue is where there's query planning, and the red is where I just have a GraphQL gateway that is federating out to two different GraphQL services that fetch from those underlying sources. The benefits add up: I'm fetching just the right slice of data from the underlying database, I'm performing a minimal amount of repeated JSON serialization, and then I finally get to the shape of data that I want to have.

When we think about the benefit that we want on the authorization side, let's take an example where we have this track order API, where I'm fetching information about the delivery status or the order status for a particular order ID. Typically, if you wanted authorization logic that guaranteed that when I'm tracking the order, I'm only looking at an order that I can look at as a user, one that belongs to me, we'd have to push down that authorization logic and ask the folks who've implemented the track order function to implement it. So what I would do today is that in the controller that does the track order, I would fetch the order information, fetch the user information that is related to that order, and then check a condition about the current session and how it's related to the order and that order's user. Then, depending on that condition, I would validate it, make an API call to the actual logistics service, and do whatever business logic needs to be done to return that order status. This authorization logic, in orange, is what I would like to be able to externalize, because we're doing this graph traversal where we're saying: if input.order.user is the current user, or some condition like that, I would like to evaluate that condition and then make this business logic happen.
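One way to picture externalizing the track order check is to keep the command as pure business logic and let the data API layer evaluate the rule before invoking it. The sketch below assumes an in-memory `ORDERS` lookup and a session dictionary with a `user_id` key; all names here (`with_policy`, `order_belongs_to_session_user`, and so on) are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical data: which user placed which order.
ORDERS: Dict[int, dict] = {
    100: {"id": 100, "user_id": 1},
    101: {"id": 101, "user_id": 2},
}

Session = Dict[str, int]
Predicate = Callable[[dict, Session], bool]


def order_belongs_to_session_user(command_input: dict, session: Session) -> bool:
    # Traverses input -> order -> user, i.e. input.order.user == current user.
    order = ORDERS.get(command_input["order_id"])
    return order is not None and order["user_id"] == session["user_id"]


def track_order(command_input: dict) -> dict:
    # Pure business logic; a real version would call a logistics API here.
    return {"order_id": command_input["order_id"], "status": "in_transit"}


def with_policy(predicate: Predicate, command: Callable[[dict], dict]):
    # The data API layer checks the rule first, then invokes the command.
    def guarded(command_input: dict, session: Session) -> dict:
        if not predicate(command_input, session):
            raise PermissionError("input rejected by authorization policy")
        return command(command_input)

    return guarded


guarded_track_order = with_policy(order_belongs_to_session_user, track_order)
```

The point of the split is that `track_order` no longer needs to know anything about sessions or ownership, so the same rule can be declared once and reused across commands.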

4. Authorization Logic and GraphQL Engine

Short description:

This part discusses the need to externalize authorization logic and the benefits of centralizing authorization. It explores the idea of predicates and aggregations in graph traversal and the importance of evaluating related predicates. The text also introduces the concept of decoupling the syntax and semantics of a GraphQL API from its execution by using a GraphQL engine. The engine uses a domain graph and allows for composing predicate and aggregation functions. It can plan queries and fetch data from the domain graph, providing a GraphQL schema derived from it. The focus is on accessing the domain through the GraphQL API rather than treating the API as the domain itself.

You can think about more complex scenarios where I have multiple queries and repeated authorization checks. It's doing the same data fetching and graph traversal that we saw in vanilla data access, but we're doing that for authorization as well. And the same benefits that we want on data access, we can start to bring to authorization as well. That's apart from the fact, of course, that we're now able to centralize authorization and guarantee a degree of security and compliance in the data API layer itself, across different types of data sources.

So, when you think about what powers this, one of the key ideas is predicates and aggregations that compose with graph traversal. Let's take this example where I'm fetching users and orders. If I want to make a query that fetches orders where the user's region is equal to a particular value, this is where I'm bringing a property of the child into the parent, and I'm able to evaluate a predicate on the child for the parent. And it's not just about the syntax that we'd want on the GraphQL API side; we'd also want our data API engine to be able to evaluate these related predicates and push them down into the underlying data sources whenever possible. This is interesting code to write even if the data is in the same place, and it's obviously very interesting to think about when it comes from two different data sources or two different services. And this is what we want to be able to do.

All right. So how would we go about doing this? That's going to be the second half of what I talk about today. The key insight here is that we want to make sure that the right information is available at the right level. Conventional ways of executing a GraphQL query couple the execution model to the GraphQL query and the GraphQL schema structure. For example, when you're servicing a query that fetches orders and users, we're forced into making a breadth-first traversal and calling resolver functions in that breadth-first way, looking at the GraphQL query and executing the logic there. When we're forced to write our logic in that particular way, we need a data layer that can understand those things. And this is very challenging to write, because generalizing a data layer across these data services becomes challenging; think about all the problems that we were talking about.

What we want to do instead is decouple the syntax and the semantics of our GraphQL API from how it's executed. We move from this GraphQL server idea to a GraphQL engine. The GraphQL engine uses a domain graph, not the GraphQL schema. It uses a description of the domain, a description of security policies on that domain, and how that maps to the underlying data sources. It allows a certain way of composing predicate functions and aggregation functions, and it can do the query planning to fetch data from that domain graph. On top of that, it exposes a GraphQL schema that is derived from that domain graph. But how a particular GraphQL query is processed is left entirely to the GraphQL engine: whether, for example, it's going to make multiple API calls or compile it to a single query. We are not thinking about it from the point of view of calling a GraphQL function.

Another way to think about this is that it's GraphQL without resolvers. The mental model I'm going for is this: instead of thinking of your GraphQL server as calling and executing a GraphQL function when you call a GraphQL API, what if we move to a model where we have a graph of our domain and we make it accessible over a GraphQL API? So we're focusing more on the domain and using the GraphQL API as a way to access that domain, rather than thinking of the GraphQL API as our domain.

So let's take a look at what this kind of specification can look like. We looked at GraphQL, row-level security, graph databases, relational algebra and how composition works, and techniques like the foreign data wrappers that systems like Postgres have, which help bring different types of systems into a relational system.
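The engine idea described in this section, where the plan is chosen from the domain description rather than from resolver code, can be sketched as a tiny planner. This is an assumption-laden toy: the `SOURCE_OF` mapping, the source names, and the `plan` function are hypothetical, and a real engine would emit executable queries rather than descriptive strings.

```python
from typing import Dict, List

# Hypothetical domain description: which data source holds each model.
SOURCE_OF: Dict[str, str] = {
    "orders": "pg_main",
    "users": "pg_main",
    "shipments": "logistics_api",
}


def plan(parent: str, child: str) -> List[str]:
    """Return the steps an engine might run for `parent { child { ... } }`."""
    if SOURCE_OF[parent] == SOURCE_OF[child]:
        # Same source: compile the whole selection into one query.
        return [f"single query on {SOURCE_OF[parent]}: join {parent} with {child}"]
    # Different sources: fetch the parent, then batch the child by unique keys.
    return [
        f"fetch {parent} from {SOURCE_OF[parent]}",
        f"batch fetch {child} from {SOURCE_OF[child]} by unique keys",
        "stitch results in memory",
    ]
```

The same GraphQL selection produces a one-step plan or a three-step plan depending only on where the data lives, which is the decoupling of syntax from execution that the talk argues for.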

5. GraphQL Data Specification and Domain Graph

Short description:

We can describe a GraphQL data specification that consists of a domain graph description, node level security, and conventions to generate the GraphQL schema and API. The domain graph describes models, fields, edges, and commands, which the GraphQL engine can interpret. In the e-commerce example, the graph represents connected nodes and commands that operate on the graph. The GraphQL engine can evaluate predicates and perform aggregations on the nodes. Predicate functions allow the evaluation of properties on models, including traversing edges and composing Boolean expressions. This enables complex evaluations on the domain graph.

So, looking at all of this, we can describe a GraphQL data specification that is composed of three pieces. The first is the domain graph description, which describes the models, edges, and commands that are part of that domain graph and that come from different data sources. We then have this notion of node-level security: an authorization language and policy framework that allows nodes of that domain graph to be validated, constrained, or filtered. And then finally, we have conventions and a particular grammar to generate the GraphQL schema and API from the underlying domain graph, incorporating these node-level security rules.

So let's take a look at what the domain graph looks like. This language is intended to describe the models, the fields, and the edges in the domain. It describes commands that can take a model as an input and return a model as an output. The GraphQL engine can interpret this domain graph, and it should provide predicate functions, aggregation functions, pagination functions, and so on that can operate on that domain graph.

Let's take a look at what this looks like with that e-commerce example, and visualize it. If you look at this graph of models, we have nodes that are connected together, and we have commands that take a graph as an input and return a graph as the output. This becomes the input to a GraphQL engine that is now able to evaluate predicates on these nodes. It can say: here's a particular expression where we want to take a node, traverse the graph, and evaluate a property to true or false. We want to be able to evaluate aggregations on sets of nodes. And the GraphQL engine should know how to select, stream, insert, delete, and update, or invoke commands on that particular graph. So this is the work that the GraphQL engine should be able to do, given this input.

If you take a look at the specific e-commerce example, we had a data source where we have a model for a user that has an ID, a name, and an edge to orders. We have orders that have ID, amount, and user ID as fields, an edge back to the user, and an edge to items, which then have the item description and so on. We have the track order API call, where there's an input model that takes an order ID, and you'll see that it has an edge to the order model itself. And you see the output model here, which is the order status, which has a description, a status, and so on, but also an edge to the order model. You see this command that takes an input model and returns a particular output model. So this is how we'd want our domain to be mapped into a graph: a set of models and edges.
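The e-commerce domain graph just described can be written down as plain data. The sketch below is not the specification's actual grammar; the `Edge`, `Model`, `Command`, and `DomainGraph` classes are hypothetical, and the `id` field on `Item` is an assumption, since the talk only mentions the item description.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Edge:
    name: str
    target: str  # name of the model this edge points to
    many: bool = False


@dataclass
class Model:
    name: str
    fields: List[str]
    edges: List[Edge] = field(default_factory=list)


@dataclass
class Command:
    name: str
    input_model: str
    output_model: str


@dataclass
class DomainGraph:
    models: Dict[str, Model]
    commands: Dict[str, Command]

    def dangling_edges(self) -> List[str]:
        # Edges whose target model is not defined anywhere in the graph.
        return [
            f"{m.name}.{e.name}"
            for m in self.models.values()
            for e in m.edges
            if e.target not in self.models
        ]


def ecommerce_graph() -> DomainGraph:
    models = [
        Model("User", ["id", "name"], [Edge("orders", "Order", many=True)]),
        Model(
            "Order",
            ["id", "amount", "user_id"],
            [Edge("user", "User"), Edge("items", "Item", many=True)],
        ),
        Model("Item", ["id", "description"]),  # `id` is an assumed field
        Model("TrackOrderInput", ["order_id"], [Edge("order", "Order")]),
        Model("OrderStatus", ["description", "status"], [Edge("order", "Order")]),
    ]
    commands = [Command("track_order", "TrackOrderInput", "OrderStatus")]
    return DomainGraph({m.name: m for m in models}, {c.name: c for c in commands})
```

Because the description is data rather than resolver code, an engine can check it (for example with `dangling_edges`) and derive predicates, policies, and a schema from it.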

Now, the key part of this is predicate functions, which actually make the composition happen. As soon as we have a user model, we want to generate a predicate function that can evaluate properties on the user model. So we can have and, or, and not expressions; we can have an id, and the id can be compared to a value with integer operators, while fields like name and region get string operators. But when we see an edge to orders from the user model, we're also able to traverse into the order Boolean expression, and into an order aggregate Boolean expression. This is what allows us to say things like: where the user's total orders is greater than 10. So you're able to evaluate a property of the user model by composing related entities as well. Similarly, on the order side, you can do the same thing, where you have Boolean expressions on the fields themselves, but you're also able to compose Boolean expressions across the edges. This is what allows us to have expressions like user.orders.aggregate.count greater than five, or order.user.region equal to a particular region. And this is what allows this kind of evaluation to happen.
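To make the composition of Boolean expressions across edges concrete, here is a small evaluator over an in-memory graph. The expression syntax (`_and`, `_or`, `_not`, `_eq`, `_gt`, `_lt`, and the `<edge>_aggregate` key with `count`) is a hypothetical notation chosen for this sketch, and list edges are only supported through the aggregate form.

```python
from typing import Any, Dict, List

OPS = {
    "_eq": lambda a, b: a == b,
    "_gt": lambda a, b: a > b,
    "_lt": lambda a, b: a < b,
}


def compare(value: Any, ops: Dict[str, Any]) -> bool:
    # Every operator in the mapping must hold for the value.
    return all(OPS[name](value, operand) for name, operand in ops.items())


def evaluate(node: Dict[str, Any], expr: Dict[str, Any]) -> bool:
    """Evaluate a Boolean expression against one node of an in-memory graph."""
    for key, sub in expr.items():
        if key == "_and":
            ok = all(evaluate(node, e) for e in sub)
        elif key == "_or":
            ok = any(evaluate(node, e) for e in sub)
        elif key == "_not":
            ok = not evaluate(node, sub)
        elif key.endswith("_aggregate"):
            # Aggregate over a list edge, e.g. orders_aggregate.count.
            related: List[dict] = node[key[: -len("_aggregate")]]
            ok = compare(len(related), sub["count"])
        elif isinstance(node[key], dict):
            # Object edge: recurse into the related node.
            ok = evaluate(node[key], sub)
        else:
            # Plain field: apply the comparison operators.
            ok = compare(node[key], sub)
        if not ok:
            return False
    return True


# Hypothetical sample graph: one user with two orders.
ada = {"id": 1, "region": "emea", "orders": []}
order_a = {"id": 100, "amount": 25, "user": ada}
order_b = {"id": 101, "amount": 40, "user": ada}
ada["orders"] = [order_a, order_b]
```

For example, `evaluate(order_a, {"user": {"region": {"_eq": "emea"}}})` checks a property of the related user from the order's side, and `evaluate(ada, {"orders_aggregate": {"count": {"_gt": 1}}})` checks an aggregate over the user's orders. An engine would push such expressions down to the data source where possible instead of evaluating them in memory.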

6. Node-Level Security in Data API Specification

Short description:

This part discusses the implementation of node-level security in the data API specification. It explains how permission policies and constraint predicates are used to control access to different models and nodes in the graph. The example of an eCommerce system is used to illustrate how node-level security rules can be applied to user models, order items, and track order API calls. The concept of creating a subgraph with accessible attributes and nodes is also introduced.

This is what will show up in our filtering, in our node-level security rules, in the arguments of the GraphQL API, and so on. Coming to node-level security, which is the second piece of this specification: for each model, we can attach a permission policy. That policy decides what attributes of that model are accessible, and it carries a constraint predicate that has to be met to filter or validate a particular node when it is accessed.

You can model allow and deny rules with and, or, and not operators, you can have multiple policies, and you can define how you'd want multiple policies to be composed together. So you can select overlapping parts of the graph, select the right pieces of data from overlapping parts of the graph, and so on. Again, taking a step back to look at what this looks like on our e-commerce example: on the user model, we'd have a node-level security rule that says if user.id is equal to the current session user ID, I should be able to access it. If I look at order items, I can access an item in an order if that order actually belongs to me, so item.order.user_id is equal to me. If you look at our track order API call, the input model can traverse the edge as well. So I can say that input.order.user_id should be equal to the current session user ID. I can traverse inputs in the graph as well, validate that, and check whether the operation is allowed, for example, or whether that data can be returned as a result of executing that command. If you visualize this from a graph point of view, you have this graph of models, and an NLS policy essentially allows you to create a subgraph with certain attributes and certain nodes that are accessible. If you take a look at a command, the input graph must be validated, meaning that this graph is actually allowed to exist for this command, and then the output is filtered so that you see the right response from executing that command. That's what node-level security looks like.
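A node-level security policy as described above has two parts: the attributes that are visible and a constraint predicate on the node. The following sketch applies such policies to in-memory nodes; the `Policy` class, the `POLICIES` table, the sample data, and the session shape are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Policy:
    allowed_fields: List[str]
    constraint: Callable[[dict, dict], bool]  # (node, session) -> bool


# Hypothetical policies for the e-commerce example.
POLICIES: Dict[str, Policy] = {
    "User": Policy(["id", "name"], lambda user, s: user["id"] == s["user_id"]),
    "Order": Policy(
        ["id", "amount"], lambda order, s: order["user_id"] == s["user_id"]
    ),
}

USERS = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": 2, "name": "Lin", "email": "lin@example.com"},
]
ORDERS = [
    {"id": 100, "amount": 25, "user_id": 1},
    {"id": 101, "amount": 40, "user_id": 2},
]


def apply_policy(model: str, nodes: List[dict], session: dict) -> List[dict]:
    # Keep only nodes that satisfy the constraint, and only the allowed fields.
    policy = POLICIES[model]
    return [
        {f: n[f] for f in policy.allowed_fields}
        for n in nodes
        if policy.constraint(n, session)
    ]
```

The result is the accessible subgraph for a session: fewer nodes (the constraint) and fewer attributes per node (the allowed fields). In an engine, the constraint would be pushed down into the data fetch rather than applied after it.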

7. Generating GraphQL Schema and API

Short description:

We have conventions to generate the GraphQL data schema and API, allowing for fetching models and performing aggregations. Mutations enable CRUD operations and invoking commands. The benefits include improved performance, the ability to compose and design the API, externalized authorization rules, and a standardized security layer. Additionally, the technique can be used to build automated caching for the data API.

Finally, based on these two things, we have conventions to generate the GraphQL data schema and API. For example, you have the query root, which allows you to fetch models and their edges with the predicate function, so you get filtering. You can fetch aggregations on lists or on related lists, which allows you to do users and users.orders aggregate, so you can fetch users along with their average order amount and total number of orders, or the average number of items in a single order, and that composes well into a query syntax.

On mutations, if you're on the CRUD side, you have insert, update, and delete semantics for models and edges where CRUD is possible and where you want to expose it. In other places, you have mutations for commands, where you can invoke those commands. That convention is what generates the right GraphQL schema from the underlying domain graph.
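As a rough illustration of generating a schema by convention, the sketch below turns a small model description into GraphQL SDL text. The naming conventions used here (`<Model>Filter`, pluralized query fields, `delete_<models>`) are invented for this example and are not the conventions of the specification discussed in the talk; the filter input is simplified to equality on each field, and only delete is shown on the CRUD side for brevity.

```python
from typing import Dict, List

# Hypothetical domain description. `crud` marks models that expose CRUD.
MODELS: Dict[str, dict] = {
    "User": {"fields": {"id": "Int", "name": "String"}, "crud": True},
    "Order": {"fields": {"id": "Int", "amount": "Int"}, "crud": True},
    "OrderStatus": {
        "fields": {"status": "String", "description": "String"},
        "crud": False,
    },
}
COMMANDS: Dict[str, dict] = {
    "track_order": {"args": {"order_id": "Int"}, "returns": "OrderStatus"},
}


def generate_sdl(models: Dict[str, dict], commands: Dict[str, dict]) -> str:
    types: List[str] = []
    queries: List[str] = []
    mutations: List[str] = []
    for name, model in models.items():
        body = "\n".join(f"  {f}: {t}" for f, t in model["fields"].items())
        types.append(f"type {name} {{\n{body}\n}}")
        if not model["crud"]:
            continue
        # Simplified filter input: equality on each field only.
        types.append(f"input {name}Filter {{\n{body}\n}}")
        root = name.lower() + "s"
        queries.append(f"  {root}(where: {name}Filter): [{name}!]!")
        mutations.append(f"  delete_{root}(where: {name}Filter!): [{name}!]!")
    for name, command in commands.items():
        # Commands are exposed as mutations with their declared arguments.
        args = ", ".join(f"{a}: {t}!" for a, t in command["args"].items())
        mutations.append(f"  {name}({args}): {command['returns']}")
    types.append("type Query {\n" + "\n".join(queries) + "\n}")
    types.append("type Mutation {\n" + "\n".join(mutations) + "\n}")
    return "\n\n".join(types)
```

Calling `print(generate_sdl(MODELS, COMMANDS))` shows the derived schema. The design point is that the schema is an output of the domain description, so adding a model or command changes the API without anyone hand-writing types or resolvers.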

Now we can see that the benefit here is on the performance side, on our ability to compose and design the API, and of course in externalizing the authorization rules, so that we can create a standardized security and compliance layer for our data API that is a unified approach across different kinds of data sources. A bonus benefit, apart from being declarative and being able to reason about things at the domain graph level and look at authorization rules, is that you can also use the same technique to build automated caching for the data API.

8. Authorization Rules and Scaling

Short description:

Let's take an example with a document model that has an ID and a title. Each document belongs to a team, and the team has a list of users. Authorization rules on documents determine if a user is allowed to operate on or access the document. This logic is hard to externalize, but now we have an approach to scale and standardize it across models and edges.

Let's take an example where I have a document model that has an ID and a title. Each document belongs to a team, and the team has a list of users, right? An authorization rule on document would say that if team.users contains the current user, I'm allowed to operate on or access this document. Say user 1, who belongs to team 1, makes a query to document. If user 2, who also belongs to team 1, queries the document, we want a cache hit to happen. And this cache hit can now happen because the authorization rule is the same, and it can determine that user 1 and user 2 are actually going to result in a cache hit. This kind of logic is hard to externalize unless we externalize the authorization rule itself. So our authorization rules are also kind of becoming cache keys. And that's what you would have done if you were building this yourself. But now we have an approach to scale this out and standardize it across whatever models we have and whatever edges we have to traverse.
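The document and team example can be sketched in a few lines of Python. Everything here is hypothetical (the `TEAM_OF_USER` and `DOCUMENTS` data, the `fetch_documents` function, the shape of the cache key); the sketch only illustrates the idea that the part of the session the authorization rule depends on, not the user ID, is what goes into the cache key.

```python
# Hypothetical data: which team each user belongs to, and which team owns each document.
TEAM_OF_USER = {"user1": "team1", "user2": "team1", "user3": "team2"}
DOCUMENTS = [
    {"id": 1, "title": "Roadmap", "team": "team1"},
    {"id": 2, "title": "Budget", "team": "team2"},
]

cache = {}
stats = {"backend_fetches": 0}


def fetch_documents(user_id):
    """Return documents the user may access, cached by the resolved authorization scope."""
    # The rule "document.team.users contains the current user" depends only on the
    # user's team, so the team (not the user ID) is what belongs in the cache key.
    team = TEAM_OF_USER[user_id]
    key = ("documents", team)
    if key not in cache:
        stats["backend_fetches"] += 1
        cache[key] = [d for d in DOCUMENTS if d["team"] == team]
    return cache[key]


fetch_documents("user1")  # miss: goes to the backend
fetch_documents("user2")  # hit: same team, so same authorization scope
```

If the cache were keyed by user ID instead, user 2 would always miss even though the result is identical, which is why having the rule available declaratively matters.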

9. Summary and Authorization Challenges

Short description:

We discussed the design and standardization of GraphQL for data APIs, as well as node-level security as an authorization method. The GraphQL Data Specification on GitHub provides a formalized grammar and a playground for API design exploration. If you're building a data platform or API, I'd love to hear from you. In a poll, authorization and security were identified as the top concern in GraphQL stacks. Performance was the second concern. When it comes to authorization, there are two main dimensions to consider: performance and the ability to externalize authorization logic. Integrating data fetch and authorization logic is crucial, especially in federated scenarios or when dealing with multiple data sources. Filtering data based on authorization rules should be done during the fetch process to avoid unnecessary data processing. Complex authorization rules may involve traversing the graph and evaluating properties of the data and user session.

So to quickly summarize, we looked at data APIs, and at how we can design and standardize the GraphQL approach for data APIs. We looked at node-level security as an authorization approach. We put this together on GitHub as the GraphQL Data Specification. There's a draft specification that formalizes the grammar, and there's a playground where you can input certain models and edges and see what the authorization looks like and what an API design would look like. You can use that if you're thinking about building a data platform, data service, or data API inside your organization.

If you're doing something like that, I'd love to hear from you. I'd love to exchange notes, but that brings me to time. Hopefully that was an interesting introduction to how data APIs work with GraphQL. Thank you.

And thanks for your poll question. I think it was a very interesting question. Let's maybe look at the results. The question was: in your GraphQL stack, which of the following is the biggest concern or blocker for your teams? The options were authorization and security, performance, not using GraphQL right now at all, or time spent in running resolvers. Authorization and security was the top result, and second was performance.

I guess, by the way, I'll just tell people that you can continue asking questions on Discord if you want, and I will relay these questions. So thinking about your talk, it looks like the topic of authorization and security just keeps coming up in every GraphQL conversation. What do you think will be the biggest difference for people if we adopt the idea you're suggesting in your talk?

Yeah, I think there are two main concerns with authorization. So let's leave GraphQL API security aside and talk about authorization. GraphQL API security is API security, like rate limiting, cost-based limiting, and so on, which is common to all APIs. There are some unique aspects to GraphQL that might require you to be more specific about it; for example, maybe rate limiting should be based on query complexity or query depth, which is a little bit unique to GraphQL. But broadly, any normal API security will give you the same kind of API security with GraphQL. So that's going to work okay, that's not a big deal. Disable introspection, all the standard stuff, not an issue.

But when we think about authorization in particular, it gets very interesting and challenging on two dimensions. The first dimension is performance, and the second is being able to externalize authorization logic. What I mean by performance is that, especially in a federated scenario where we have multiple services or multiple sources, but even with a single source where you're getting data from multiple API endpoints or multiple tables or whatever you have as your upstream source, the data fetch and the authorization logic need to be integrated together.

Extremely simple example: I'm fetching a list of orders from the system. When I do that, I should only fetch those orders that I have access to. You can't make a query for a billion orders and then filter that. You can't go to an upstream API and say, yeah, give me all the orders that you have, and only after that apply an authorization rule to filter out which orders can be accessed. Your upstream service will die. You will die, trying to process that much data. That's not what you want to do. Now imagine another authorization rule that is more complex, one that traverses the graph: you can access an order if order.account_manager.region is the same as the current user's region. You have to evaluate a complex property of the data and a complex property of the user session to decide whether somebody has access. And part of that information is outside of your current service? Exactly.
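As an illustration of integrating the authorization rule with the fetch, here is a minimal Python sketch using the standard library's `sqlite3` module with an in-memory database. The table names, columns, and the `fetch_orders` function are hypothetical. The rule "order.account_manager.region equals the current user's region" is expressed as a join predicate, so rows the user cannot access are never returned in the first place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account_managers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, account_manager_id INTEGER);
    INSERT INTO account_managers VALUES (1, 'emea'), (2, 'apac');
    INSERT INTO orders VALUES (10, 99.0, 1), (11, 25.0, 2), (12, 40.0, 1);
""")


def fetch_orders(session):
    """Fetch only the orders the session may see by pushing the rule into the query."""
    # The authorization rule traverses the orders -> account_managers edge and compares
    # against a session property, all inside the database rather than after the fetch.
    rows = conn.execute(
        """
        SELECT o.id, o.amount
        FROM orders AS o
        JOIN account_managers AS am ON am.id = o.account_manager_id
        WHERE am.region = ?
        ORDER BY o.id
        """,
        (session["region"],),
    )
    return rows.fetchall()


emea_orders = fetch_orders({"region": "emea"})
```

The alternative, selecting every order and filtering in application code, returns the same rows for a small table but scales with the total number of orders rather than with the number the user is allowed to see.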

10. Challenges of Authorization and API Design

Short description:

The overlap between optimizing data fetching and evaluating authorization rules in GraphQL poses a challenge. Naively approaching GraphQL without considering authorization can lead to problems. It is important to standardize authorization and place it in the right context. A draft specification is available for exploring API and authorization design. The goal is to put authorization in the right place and start organizing code accordingly. A working group for GraphQL authorization is suggested.

Could be, right? That's when it gets complex. So you're seeing that this overlap is happening. All the optimization work that we would do for fetching data across services is really similar to all of the optimization work that we would have to do when evaluating authorization rules on this data and integrating them with the data fetch. These are problems that any API would have, but if we approach GraphQL naively, the reason authorization becomes a problem is that there's no standard way of thinking about how to solve it. If you had a REST API, you're not really worried about this problem because you're in control of the API. Ultimately, the API endpoint is something as stupid as /orders, and you can do anything inside that code: complex authorization, complex data fetching, whatever, because you didn't provide any flexibility. If you're not providing any flexibility, it's all up to you and you can do whatever you want. With GraphQL, providing that flexibility but also integrating authorization with that flexibility is the reason it comes up as a top challenge.

Yeah, no, I like it, because one of the things is that there's a tendency when people are getting into GraphQL to put authorization in the wrong place, and then leave stuff open, or centralize the management of it even though those aren't the people who actually need to take care of authorization. And this is kind of trying to standardize it, or at least tell people: this is where it belongs, and maybe this is who needs to write it.

So do you think of this as kind of an extra spec, like a GraphQL plus X? What do you see as the first concrete ways for people to use or try these things?

Yeah, I put in a link to our GitHub repository where we're starting to put the draft of the specification together, so you can try it out. It's a specification; there's no runtime yet. I mean, what we've been doing at Hasura is kind of like a runtime for the same specification. So if you want a reference implementation that's close to the specification, you can look at Hasura, of course, to see how that works. But if you look at the specification itself, there's a draft you can start looking at, and you can start forming opinions on how to use the specification for your own API design and authorization design and so on.

Yeah. So theoretically, people who are writing manual resolvers today in TypeScript or JavaScript can also take this and use it to define the framework of where they should write these things. Even if, for now, they don't get the benefits of smarter execution, they can at least start organizing things, and then maybe later start introducing smarter execution engines.

Exactly. At least now you're starting to put your authorization in the right place, because otherwise you're putting your authorization code maybe in your GraphQL schema or something, and that's never going to work. So things like that. Let's do a working group, a GraphQL authorization working group.

Okay, so we are out of time, but we have very exciting things to continue the conversation with. So first of all, thanks, Tanmai. It's a pleasure as usual. It was very, very interesting.
