1. Introduction to GraphQL and its Potential
I'm an engineering manager at Apollo, and today I want to talk about how to fully leverage and consume a GraphQL API. GraphQL can be useful to people in your organization beyond just the developers. It can be the standard way that we model and query our business data. We will compare SQL to GraphQL and discuss the concerns of using GraphQL as a universal access point for data.
Hey, everyone. My name is Danielle, and I'm an engineering manager at Apollo where my team and I are responsible for building dev tools specifically that help people query and use GraphQL APIs. Today, I'm really excited to be sharing some of the ideas that inspire our work centered around how you can use GraphQL to connect people in your organizations beyond just your developers to data that would empower and enable them to do their jobs more effectively.
This talk is going to be a little bit different from others because instead of talking about how to build a GraphQL API and the many interesting technical challenges there, I want to talk about how to fully leverage and consume a GraphQL API. I believe that GraphQL can be useful to people in your organization way beyond just the developers who are using it to query their data. I believe that you can build a unified graph for your data for everyone to use and that it will empower people in your organization like you've never seen before.
Data accessibility is a really hard problem, and it's really hard to access data from all of our systems these days because we store it in all sorts of different places, different databases, different microservices, everything has been optimized to be for a different type of data, everything is queried in a slightly different way, and it's hard to figure all these systems out sometimes, even as developers, but there are a lot of people who could do their jobs more effectively if they could just plug into the data in our systems. And for product development, we've solved the situation of having many services that are all a little bit different by introducing a new layer with GraphQL and using it to create a singular API. And I believe that this new layer that we've introduced for our APIs with GraphQL can also be used to solve the more general problem of data access in our organizations. I believe that GraphQL can be the standard way that we model and query our business data for almost all use cases.
So with our time today, I want to walk you all through how to think about using GraphQL in this way as we pose this question: can GraphQL be the way that we create a universal access point for our data? And to get into this topic, I want to start by walking through a SQL query together and comparing SQL to GraphQL a little bit. So this is a query that I've written many times over myself, and it's an analytics question. For each account, how many users have I seen in the last 30 days? And if I break this query apart and look at the different elements of it, there are some distinct things that stand out. The select here lets me control what I'm asking for given a platter of options, which we have the exact same ability to do with GraphQL. The where here is a conditional selection. I only want to select users if I've seen them in the last 30 days. With GraphQL, we have nothing specifically in the language to express a filter like this, but we can still filter our data using arguments. The join here lets us select data across multiple tables. With GraphQL, you actually build your join logic into your schema. So the query writer doesn't have to know anything about how to join data to benefit from being able to query joined data. I actually think that the GraphQL experience here is much better for the data browser than the SQL experience because you're not reconstructing your business logic around joins. And then the last thing that I want to point out for now is this ability to group and count. This idea that we have aggregation and array functions that we can apply to our queries is something that I really miss in GraphQL.
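The query on the slide is not reproduced in this transcript. As a rough sketch of the comparison, the snippet below pairs a SQL query of that kind with a GraphQL counterpart. All table, column, and field names (`accounts`, `users`, `last_seen_at`, `activeUserCount`) are assumptions made for illustration, and the interval syntax is PostgreSQL-style. The function at the end shows roughly what a resolver behind the hypothetical computed field would have to do, since GraphQL has no built-in group-and-count.

```typescript
// Hypothetical SQL: select, join, where, and group/count in one statement.
const sqlQuery = `
  SELECT accounts.id, COUNT(users.id) AS active_users
  FROM accounts
  JOIN users ON users.account_id = accounts.id
  WHERE users.last_seen_at > NOW() - INTERVAL '30 days'
  GROUP BY accounts.id
`;

// Hypothetical GraphQL counterpart: the join lives in the schema, the filter
// is an argument, and the count must already exist as a computed field.
const graphqlQuery = `
  query {
    accounts {
      id
      activeUserCount(sinceDays: 30)
    }
  }
`;

interface User {
  id: string;
  accountId: string;
  lastSeenAt: Date;
}

// What a resolver for the hypothetical `activeUserCount` field might compute.
function activeUserCount(
  accountId: string,
  users: User[],
  sinceDays: number,
  now: Date
): number {
  const cutoff = now.getTime() - sinceDays * 24 * 60 * 60 * 1000;
  return users.filter(
    (u) => u.accountId === accountId && u.lastSeenAt.getTime() > cutoff
  ).length;
}
```

The point of the contrast is that the SQL author composes the count at query time, while the GraphQL author can only ask for it if someone anticipated the need in the schema.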
If you want to query something that's computed, you have to build those computed fields into your schema, which means you have to anticipate their needs, which you can do for applications like building layouts and clients, but you can't exhaustively anticipate every need that anyone is going to have when they're just casually browsing your data. So coming back to our question of if GraphQL can be the way to create a universal access point for our data, I think the main concerns with taking this approach are going to break down into three categories that I want to walk through together. Can I optimize my queries enough for it to make sense? GraphQL is built on top of anything. Or it could be built on top of anything. So it's going to be important to consider that we may be wanting to query very large swaths of data. Number two, can I express what I want to express? I think this one comes down to what I showed you with the SQL query where GraphQL is just kind of missing computation elements in the language itself. And then number three, can I see things the way that I want to see them? GraphQL was designed to be used by developers.
2. Optimizing GraphQL Queries
GraphQL queries can be optimized by mapping them directly to performant database queries, which is the most efficient approach. Tools like JoinMonster help in this process. For more information, refer to the blog post by Marc-Andre Giroux.
So it's not the most accessible thing to people from the data world, who are very technical but are used to working with data in table formats, doing things like applying Excel formulas, and being technical in that way.
So let's walk through these together and talk about whether or not these hurdles can be overcome with GraphQL.
So this first question of can I optimize my GraphQL queries enough for it to make sense? The thing that really comes to mind with this one for me is can we provide a mapping of our queries to an implementation that is performant? GraphQL is adding this layer of processing in your stack.
So the best thing we can try to do is make that layer as thin as possible and avoid adding extra processing at the GraphQL layer. And ideally, we can take the GraphQL queries that come in and map them directly to database queries, which is guaranteed to be the most performant outcome.
And there are a lot of tools out there that do this, or that help you do this, specifically with SQL. There are even companies that build GraphQL on top of SQL as a service. And in all of those tools, what you're trying to do is take your GraphQL query and identify the precise SQL query that needs to be made to fetch the data that was asked for.
And what I have on this slide is an example of one of those libraries, called JoinMonster. There's also a great blog post on the topic of GraphQL to SQL specifically, by Marc-Andre Giroux, that I've linked here on this slide.
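To make the idea of mapping a GraphQL query to one precise SQL query concrete, here is a minimal hand-rolled sketch. It does not use JoinMonster's API. The `Selection` shape, the `toSql` function, and the table and column names are all hypothetical. It takes an already-parsed selection (which fields were requested, with which arguments) and emits a single parameterized query that fetches only those columns.

```typescript
// A simplified, already-parsed view of a GraphQL selection (hypothetical shape).
interface Selection {
  table: string; // table assumed to back the GraphQL type
  fields: string[]; // requested fields, assumed to match column names
  args: Record<string, string | number>; // arguments, treated as equality filters
}

interface SqlQuery {
  text: string;
  values: Array<string | number>;
}

// Only allow plain identifiers so names can be interpolated safely.
function ident(name: string): string {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
    throw new Error(`Invalid identifier: ${name}`);
  }
  return `"${name}"`;
}

function toSql(sel: Selection): SqlQuery {
  if (sel.fields.length === 0) {
    throw new Error("At least one field must be selected");
  }
  const columns = sel.fields.map(ident).join(", ");
  const filters = Object.entries(sel.args);
  // Argument values go into placeholders, never into the SQL text itself.
  const where = filters
    .map(([key], i) => `${ident(key)} = $${i + 1}`)
    .join(" AND ");
  const text =
    `SELECT ${columns} FROM ${ident(sel.table)}` +
    (where ? ` WHERE ${where}` : "");
  return { text, values: filters.map(([, value]) => value) };
}

// { users(accountId: 42) { id name } } might be parsed into:
const example = toSql({
  table: "users",
  fields: ["id", "name"],
  args: { accountId: 42 },
});
console.log(example.text, example.values);
```

A real translator also has to handle nested selections (which become joins), pagination, and authorization, which is where libraries in this space earn their keep.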
3. Translating Druid Queries to GraphQL
Something that's a little bit closer to our hearts at Apollo is how to translate Druid queries to GraphQL. We built a Druid to GraphQL translator to support our product needs. We generated a portion of our GraphQL schema from Druid, allowing GraphQL queries to be transformed to Druid. This approach provided flexibility but had challenges with return data formatting and confusion in query execution.
Something that's a little bit closer to our hearts at Apollo, though, is how do you translate Druid queries to GraphQL? Druid is a time series database that's designed to help you query analytics data over large swaths of time. And we do use Druid for some stuff that we do at Apollo. And we actually built a Druid to GraphQL translator a couple years ago to support some of our product needs.
At the time, our goal was to build a flexible API for querying stats data. We basically generated a portion of our GraphQL schema from Druid, so you would run GraphQL queries and they would be transformed to Druid. So I want to show you all what that looks like, actually, in an example. So here, I've got a query for data on a service. And the part of the schema that's translated to Druid, or generated from Druid, is the stats part of the schema. And each field that I can query under stats actually corresponds to a table in Druid.
So, now my 2,285 queries or requests are going to be split out into the number of requests for each query that got made. And each of the things that I can group by, these are effectively columns in Druid. So, I can group by client name and segment the query further. And the more things you group by, the more parameters you have, the more results you'll have from your query. So, this is just kind of directly mapping to Druid. So, this I would say worked extremely well on the flexibility side of things. Because you can just make Druid queries which is not something I would've otherwise been able to do without needing to know how to connect to the Druid database.
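A sketch of how such a translation might work, under a simplified selection shape. `StatsSelection` and `toDruidGroupBy` are hypothetical names, not Apollo's implementation, and the field and metric names in the usage example are invented. The output follows the general shape of a Druid native `groupBy` query: each field selected under group by becomes a dimension, and each metric becomes an aggregation (summed here, as an assumption).

```typescript
// Simplified view of what was selected under `stats` (hypothetical shape).
interface StatsSelection {
  table: string; // field under `stats`, assumed to map to a Druid datasource
  from: string; // ISO-8601 start of the time range
  to: string; // ISO-8601 end of the time range
  groupBy: string[]; // fields selected under group by
  metrics: string[]; // metric fields selected
}

interface DruidGroupByQuery {
  queryType: "groupBy";
  dataSource: string;
  granularity: "all";
  intervals: string[];
  dimensions: string[];
  aggregations: Array<{ type: "longSum"; name: string; fieldName: string }>;
}

function toDruidGroupBy(sel: StatsSelection): DruidGroupByQuery {
  return {
    queryType: "groupBy",
    dataSource: sel.table,
    granularity: "all",
    intervals: [`${sel.from}/${sel.to}`],
    // Every extra field under group by adds a dimension, which changes how
    // the query executes and how many result rows come back.
    dimensions: [...sel.groupBy],
    aggregations: sel.metrics.map((metric) => ({
      type: "longSum" as const,
      name: metric,
      fieldName: metric,
    })),
  };
}

const druidQuery = toDruidGroupBy({
  table: "queryStats",
  from: "2021-01-01T00:00:00Z",
  to: "2021-01-31T00:00:00Z",
  groupBy: ["queryName", "clientName"],
  metrics: ["requestCount"],
});
console.log(JSON.stringify(druidQuery, null, 2));
```

The sketch also makes the later pain point visible: the selection set itself drives `dimensions`, so adding a field silently changes the execution plan rather than just the response shape.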
4. Flexibility and Translation of GraphQL
The flexible queries allow for quick iteration with feature work. However, there are pros and cons to this approach. It's possible to translate GraphQL into other languages, but API and analytics concerns have different goals. Request latency is less important in data analytics, where streaming and scanning large data results are prioritized.
The flexible queries also let us iterate really quickly with our feature work. Because we didn't have to know what our precise end goals were in order to get started. And this generated schema was nice to keep up to date. Because every time we added a new column to a table, that would just get pulled into the schema automatically.
On the flip side, though, I think the jury is still out as to whether or not this actually covered our product needs. It was a big pain point that our return data was not in a shape formatted for our client layouts, because we still had to do a lot of computation in our front end to get our data into the shapes that we needed. And I think the bigger anti-pattern with this is actually that it was confusing: if you added fields under group by, you were actually going to directly affect how the query was executed and the data that you got back. This ended up being kind of unintuitive to a lot of our teammates, who would have expected something like that to be put into an argument instead. So, I think there are pros and cons to an approach like this.
But the point is it's possible to translate GraphQL directly into other languages, especially complex database languages. And before we move on from the query optimization topic, I just want to highlight that API concerns are going to be different from analytics concerns. GraphQL schemas are typically built to be APIs. So, it's common to have things for pagination and other types of limitations built in. But if you're trying to do data analytics, request latency is not going to be as important as being able to stream large data results back and scan really large arrays of data. So, these two worlds actually have kind of competing goals in some ways. And that's something you're just going to have to reckon with.
5. Using Directives and graphql-lodash
GraphQL has a concept called directives, which can be applied to queries and schema. One project, graphql-lodash, builds on Lodash, which implements functions to transform arrays of objects. It provides support for applying Lodash functions to queries through directives. An example using the GitHub graph is shown, querying the top voted issues in the Apollo Server repository.
So, question number 2. Can I express what I want to express? I think the interesting thing about this one is that despite the language not having counting and aggregation functions built in, GraphQL does have this concept called directives, which can be applied to both queries and schema. And you can basically define logic and functions in directives. And if you apply them to your queries, that will give the schema some indication about how the query should be executed. And there are a lot of interesting things out there that people have done with directives, including things around authentication and skipping, including, and deferring fields. But what I wanted to show you today, in the spirit of what we're talking about with query flexibility, is a project called graphql-lodash.
Lodash is a utility library in JavaScript, and it implements a very large number of functions to transform arrays of objects. These functions include filter, count, min, max, sort, reverse, all sorts of things. And graphql-lodash is a Node package. You can add it to your server, and what it will do is provide support for applying Lodash functions to your queries through directives, so you can transform the results of your queries. And to really show you all what's going on with graphql-lodash, I thought we would jump into another example. And my favorite graph to query is the GitHub graph. So I thought we could try and ask the GitHub graph an analytics question, which is: what are the top voted issues in the Apollo Server repository?
So here I have started us with a query for the Apollo server repository. I've asked for a list of issues. And on each issue, we can actually query the reactions that people have had to that issue. So I thought a good way to proxy votes would be people providing a thumbs up reaction on issues. So if we look at our data, you can see that we have thumbs ups here, we have eyes. And what I want to do is kind of transform this result into the answer to our question. So the first thing I'm going to do is I'm going to actually map this edges array to try and count the number of thumbs up reactions that we have. So in edges, I'm going to say let's map this to node.content. And if I rerun this query, you'll see now that array is much simpler. It's not an array of objects. It's just an array of strings. And actually, instead of mapping, if I count by node.content, we'll get an actual count of the number of reactions. So now we know how many thumbs ups we have, how many eyes we have as reactions, but we don't know what our issues are yet. So let's ask for the titles of these issues. And I don't want in my results a kind of object of reactions. I want a single number that represents votes. So I'm going to get edges.thumbs-up here. And I'm going to alias this field reactions to votes.
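The first transformations in the demo (map, count by, then get the thumbs-up count) can be sketched in plain TypeScript. This is an illustration of what the directives do to the response, not graphql-lodash's own code. The `ReactionEdge` shape is a simplified assumption about the reactions connection, and treating a missing count as 0 is a simplification made here.

```typescript
// Simplified shape of the reactions connection returned for one issue.
interface ReactionEdge {
  node: { content: string };
}

// Equivalent of mapping `edges` to "node.content": objects become strings.
function mapToContent(edges: ReactionEdge[]): string[] {
  return edges.map((edge) => edge.node.content);
}

// Equivalent of counting `edges` by "node.content".
function countByContent(edges: ReactionEdge[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const edge of edges) {
    const key = edge.node.content;
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}

// Equivalent of getting the thumbs-up count and aliasing it to `votes`.
// Issues with no thumbs-up reactions get 0 in this sketch.
function votes(edges: ReactionEdge[]): number {
  return countByContent(edges)["THUMBS_UP"] ?? 0;
}
```

Each function corresponds to one step typed into the editor during the demo, applied to the `edges` array of a single issue.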
6. Transforming Data and Accessibility in GraphQL
In this example, we transformed the data by mapping, sorting, and filtering it to get the desired result. The ability to apply transforms to query results is powerful, but it can break the principle of GraphQL. While it's suitable for analyzing data within the console, it's not recommended for application development. If you need computed fields, they should be built into your schema. Lastly, GraphQL's accessibility for non-developers is a challenge due to the complexity of the language.
Because we've decided that a thumbs up is a vote. And now that I have basically the array of data that I want, I want to transform this into something that's a little bit easier to scan. So I don't really want this node object as a middle layer between my array and the title and votes of my issues. So for my edges array, I'm going to map to just my node. And what I really want to do is sort this array in descending order so that I can see the top voted issue. So I'm going to sort by votes for my array. And it looks like sort by is ascending by default, which makes sense. So I'm going to reverse this array. And it looks like I have some issues here that don't have any votes at all, which also makes sense. So I'm going to filter only for issues with votes. And now we've got the answer to our question. The top voted issue here is an Apollo Server Fastify playground issue. If we wanted to look into this more, I could get the URL and follow it. But we have to get back to the presentation.
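The remaining steps (map to node, sort by votes, reverse, filter) amount to a small array pipeline. The sketch below reproduces them in plain TypeScript on an assumed, simplified response shape; it shows the effect of the directives rather than how graphql-lodash implements them, and the issue titles in any sample data are invented.

```typescript
interface IssueNode {
  title: string;
  votes: number;
}

interface IssueEdge {
  node: IssueNode;
}

function topVotedIssues(edges: IssueEdge[]): IssueNode[] {
  return edges
    .map((edge) => edge.node) // map: drop the intermediate `node` wrapper
    .sort((a, b) => a.votes - b.votes) // sort by votes, ascending
    .reverse() // reverse: highest votes first
    .filter((issue) => issue.votes > 0); // filter: keep only issues with votes
}

const ranked = topVotedIssues([
  { node: { title: "Issue A", votes: 3 } },
  { node: { title: "Issue B", votes: 0 } },
  { node: { title: "Issue C", votes: 12 } },
]);
console.log(ranked.map((issue) => `${issue.votes} ${issue.title}`));
```

Because `map` returns a new array, the in-place `sort` and `reverse` never touch the original response.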
A few things I want to point out from this example. The ability to aggregate, group, and generally apply transforms to query results is a really powerful thing in my opinion. This is what lets you ask questions of your data and get answers within the context of your tool, without having to take your data out of that tool and move it to another tool. This is also what enables people to use your schema beyond what you may have currently imagined and built into your schema through computed fields. On the flip side, graphql-lodash is not particularly intuitive. It's taken me several hours of fiddling to feel comfortable with it. And even now I'm not an expert. And the even bigger, more important thing to point out about this example is that transforming our data like this is actually breaking a principle of GraphQL, which says that your responses need to be congruent to the queries that you sent. So for use cases like this, where you're writing queries for analyzing data within the console, I don't think that's a big deal, because you're not taking that experience outside of that single window. But if we were to try and actually take this query and put it into our code, that's where we're going to get into trouble and things are going to get iffy, because other developer tools, like code generation, are going to rely on us staying spec compliant. So this is something that I love, but I would not recommend using it for application development. If you need computed fields there, you should build them into your schema.
Finally, our third question, how do I see things the way that I want to see them? As I mentioned earlier, what motivates me most about GraphQL is this opportunity that I see to make data more accessible. And I think the last big issue we have to cover is whether or not GraphQL itself is going to be accessible enough for use by people who aren't developers. It's really hard to stare at a blank editor and get started with a query when you don't even know what the language is.
7. Working with GraphQL Responses
Today, I want to focus on sharing some thoughts on working with the responses from GraphQL queries. JSON is a beautiful format for developers and APIs, but it's not commonly used outside of the developer world. I want to show you Table mode, which allows you to interact with JSON data in a more user-friendly way. By making our tools more accessible, we enable people to go beyond what can be done with code. GraphQL can have a significant impact on organizations beyond just helping developers be more productive.
So I could give a whole separate talk on the query building side of this and the data discovery aspect of data browsing. And here, I have a picture of GraphiQL's Explorer on the left and Studio's Explorer on the right, both of which have thought quite a bit about how to actually help you write queries without needing to know exactly what to type. But unfortunately, today we don't have enough time to go into the query building side of things. So instead, I want to focus on sharing some thoughts with you on working with the responses from our queries.
And when we talk about API responses, at least GraphQL ones, we're talking about working with JSON data. JSON is a beautiful format for developers and APIs because you can express complex objects, it's human readable, and it's basically universally accepted and usable within our code. But the problem with JSON is that it's a very developer centric thing. It's not very common to work with outside of the developer world. And usually, when we're talking about data sets, we're talking about tables and CSVs and loading things into Excel. And to turn JSON into tables, it usually takes code to do that because it's not necessarily a given transformation. And if you're not comfortable writing code, then you're kind of stuck.
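As an illustration of the kind of code this transformation usually takes, here is a minimal sketch that flattens an array of nested JSON objects into rows with dotted column names and serializes them as CSV. The function names and the flattening rules (nested arrays kept as JSON text, nulls as empty cells) are choices made for this sketch, not a description of how any particular tool does it.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type Row = Record<string, string>;

// Flatten one nested value into a single row with dotted column names.
function flatten(value: Json, prefix = "", row: Row = {}): Row {
  if (value !== null && typeof value === "object" && !Array.isArray(value)) {
    for (const [key, child] of Object.entries(value)) {
      flatten(child, prefix ? `${prefix}.${key}` : key, row);
    }
  } else if (value === null) {
    row[prefix] = "";
  } else {
    // Scalars become one cell; nested arrays are kept as JSON text.
    row[prefix] = Array.isArray(value) ? JSON.stringify(value) : String(value);
  }
  return row;
}

function toCsv(items: Json[]): string {
  const rows = items.map((item) => flatten(item));
  // Column order follows first appearance across all rows.
  const columns: string[] = [];
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (!columns.includes(key)) columns.push(key);
    }
  }
  // Quote cells containing commas, quotes, or newlines; double inner quotes.
  const quote = (cell: string): string =>
    /[",\n]/.test(cell) ? `"${cell.replace(/"/g, '""')}"` : cell;
  const lines = [columns, ...rows.map((row) => columns.map((c) => row[c] ?? ""))];
  return lines.map((line) => line.map(quote).join(",")).join("\n");
}
```

Even this small version has to make judgment calls about nested arrays and missing fields, which is why the conversion is "not necessarily a given transformation".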
So, my last demo here is pretty quick, but I just want to show you all that there's more to GraphQL response browsing than just scrolling long arrays of JSON data. And I want to encourage you to always be expecting more from your tools. So, if we go back to our GitHub example, I want to just show you quickly Table mode, which is the idea that you could generate a table as best you can from JSON results and give people some tools to interact with their data in a way that's not JSON. So, with Table mode here, we can sort our columns by title alphabetically. We can also sort our votes, which would have helped us not even have to add this sort by directive. I can also download this data to a CSV if I wanted to and move it to another tool. So, by building accessibility into our tools like this, we're enabling people to go beyond just what you can do with code. So, I see a lot of pros to making our tools more accessible in this way. Table mode is much easier to scan data from, even for developer use cases. And something like this is just naturally going to feel more familiar and welcoming to everyone else. And I don't see a ton of downsides to building things into our tools like this, other than the eventuality that we don't want to overload our tools with too many things and make them too busy for any one use case. But beyond even working with data in your editor, I've seen people build integrations between GraphQL and other tools that are already familiar in their workflows, like Tableau. And I find those kinds of integrations and that kind of thinking really inspiring. So, as we wrap up, I want to leave you all with this thought. GraphQL can be impactful to your organization way beyond helping your developers be more productive.
I've talked with product managers who use the Graph to put queries into their product specs to kickstart projects, and designers who like to browse the Graph to figure out what data they can even add to mockups. I've taught our customer success team how to use the Graph to run admin mutations that don't yet exist in our admin app. And I aspire to one day maybe even teach our sales team how to use the Graph to look up information on behalf of their accounts. If our tools become accessible enough and our schemas are well designed and well built, maybe we won't even need a lot of our integrations and admin apps in the future, because everybody could just use the Graph.
8. Encouragement to Design Flexible Schema
I encourage you to design your schema with flexibility, expect more from your tools, and share your Graph with your organization. Try out the Explorer tool in Apollo Studio.
So I encourage you all to think about how to design your schema with flexibility so it can be used beyond the ways you've currently imagined. I encourage you to continue to always expect more from your tools, especially when it comes to making them more accessible to different groups of people. And most of all, I encourage you to share your Graph with your entire organization and to do the work to make your Graph work for everybody. If you're interested in trying out what I've been showing, that's the tool that my team builds called the Explorer in Apollo Studio, and it's free to use. Thank you all so much for tuning in and listening.
Q&A on graphql-lodash and Explorer
If you have any questions, please don't hesitate to ping me on the conference Discord or reach out on Twitter or ask in the Q&A. The @_ ("at underscore") functions come with a package called graphql-lodash, which is not packaged directly with Apollo Server. Most of my usage of graphql-lodash has been on the front end, in the Explorer. There are no noticeable performance issues with graphql-lodash. I have a dream of having charts in the Explorer, but it's not available currently.
If you have any questions, please don't hesitate to ping me on the conference Discord or reach out on Twitter or ask in the Q&A. My DMs are open and I look forward to seeing you on the internet.
Hey, so great talk and without further ado, I think we should jump right into the audience questions and the first question is from Rada. Oh sorry, I'm looking at the wrong questions. It's from Nikin. Is there any extra implementation needed to use the @_ ("at underscore") functions or do they come with Apollo Server?
Yeah, that's a great question. So, that is a set of directives that come with a package called graphql-lodash, and graphql-lodash is not packaged directly with Apollo Server, but you can absolutely use these two things together. graphql-lodash is just its own npm package. But the tool that I was showing you all to write those queries is what we call the Explorer. It's in Apollo Studio, and if you query through the Explorer, the Explorer actually extends the schema that you're using with those directives automatically. So, you can do front-end queries with graphql-lodash using the Explorer, but if you're using another query tool you would need to add that to your server.
Okay. Thanks for the question, Deegan. Next question is from TheWorstDev, that's a great nickname. Are there any noticeable performance issues with graphql-lodash? That's a good question. So, most of my usage of graphql-lodash has actually been on the front end, in the Explorer. And so the ways in which it has been slow have been when you're querying a large amount of data that you then transform on the front end. And the slowness there, I would not attribute to graphql-lodash. It's mostly just large amounts of data coming over the wire. I imagine if you put graphql-lodash on your server, it would be different and better performance-wise. But the challenge you'll have there is that if you use graphql-lodash and provide that to your clients, then you're going to be breaking the spec in other ways. So you want to be specific about where you use it and why you're choosing to use it.
Okay, and then TheWorstDev, who is now hopefully the best dev, has a follow-up question also. What other types of visualizations make sense? Would something like charts ever be in Explorer? I have a dream that charts would one day be in the Explorer, but they're not right now. But you can imagine all sorts of things. Like if you get an array of data back and it's all numbers, why wouldn't we give you a chart? Why wouldn't we have ways that you could transform your results to see them more visually? So yes, having charts in the Explorer is like a pipe dream of mine, though to actually bring that to fruition and make it practical for everyone as a generic use case could, I think, be a bit of a challenging problem. But not one that's, you know, not doable. You've got to put the bar really high for yourself, right, and make it work. But maybe TheWorstDev can just help you out.
That's true. I also, I was thinking while watching the talk, is this way of working something you cooked up in your own brain or is it something that you're doing at Apollo or maybe at one of your previous employers? Yeah.
Making GraphQL Accessible and Its Potential
A lot of the inspiration for making GraphQL more accessible comes from my experience of consuming GraphQL APIs for the last four years. People wanted their tools to be more accessible, especially when getting started. We have seen people integrate GraphQL into tools like Tableau. To use GraphQL as a generic data querying tool, APIs need to be designed in a way that allows for it. The GraphQL specification is consistent, strongly typed, and widely adopted in the developer world.
Yeah. That's a great question. A lot of the inspiration for the ideas in this talk around making GraphQL more accessible to folks who are just getting started and to folks from the query writing perspective comes from my own experience of consuming GraphQL APIs for the last four years to build apps. I've always been a front-end developer. So I've always come from the perspective of writing queries, not building schema and my experience building schema is much newer than my experience of learning GraphQL through the query world.
And what we've done at Apollo, that's related to this talk is we've done some user research on how people get started with GraphQL and write queries in general as they progress through their GraphQL journey. And when we did that user research about a year ago, we learned that people wanted their tools to be more accessible, especially when they were getting started because GraphQL itself is like a language. It's like code. You have to learn how to write it. And there are a lot of people who can benefit from seeing data if they could only write queries, but they get intimidated by looking at kind of a blank editor that tells them to write some code to do something.
A lot of what we've done has been informed by some research that we did do at Apollo to make GraphQL more accessible, to make the data in your API a little bit more discoverable, but yeah, a lot of it is kind of cooked up in my own brain. I'll say yes to that. Well, you could have just said yes, then. Sorry. We're looking for the long answer. We want to have your opinion and that's why we have you here speaking. So I'm just kidding with you.
We have a question from Hoang: thanks for the talk, that was great. Do you think GraphQL could actually be an accessible solution or would you prefer to use another tool to access the data? So, like using GraphQL over another tool like Tableau or something to access data, I will interpret it that way. I do think GraphQL can become the way that you access data generically. That's kind of the picture I was trying to paint with the talk. And I have seen people use GraphQL and integrate it into a tool like Tableau. I found that to be really inspirational and interesting. I think to get to that point, we need to design APIs in a way where they can be used that way. I don't think GraphQL out of the box can be used as a generic data querying tool. I think you have to use GraphQL to build your API into a generic tool, because of those bullet points I was talking about, with performance and flexible schema design being key to using GraphQL to make your data more generally accessible. But I do think it can be used. The fact that the GraphQL specification is consistent and strongly typed and already so adopted in the developer world, it all points to signs that it can be used that way. Okay.
Implementing GraphQL in Companies
The talk presents a vision for utilizing the GraphQL API to its full potential. The implementation depends on individual company practices, but the key is to design the schema for flexibility and direct translation to database queries, resulting in improved performance and wider adoption within the organization.
Then we have one more question. Oh, how do you see us getting from where we are now to working in your proposed way and how would you implement this in a company? Yeah. Well, so my talk is kind of trying to paint a vision for what the GraphQL API could be used to do. But it doesn't necessarily prescribe how you get there, because I think that's always going to depend on how your company does things and what's right for your company. But I think what I want you all to take away is that you should think about your API potentially being used this way, so that you can design your schema in a way where it can be used more flexibly. It can be more directly translated to database queries and be more performant. And the closer we get to that, the more people in your companies are going to be able to use it, and then there will be a natural draw. So, I think the way to get there is not a prescriptive formula. It's a mental model and a way of thinking that you have to adopt and bring to your companies.