Video Summary and Transcription
This talk discusses the life of a GraphQL developer at Yelp, covering tooling, processes, and the scale of GraphQL usage. It emphasizes the importance of making good schema choices and using GraphQL Faker for quick iteration. The talk also covers the challenges of scaling data loaders at Yelp and the code generation approach used to solve them, as well as the precommit hooks and GitHub bots that check schema changes. It discusses schema review, documentation, and GraphQL adoption. Lastly, it stresses the value of obsessive documentation, built with VuePress from markdown, and an in-house UI that serves as a query reference.
1. Life as a GraphQL Developer at Yelp
This talk is about the life of a GraphQL developer at Yelp. Mark, a lead on the client data API team, takes you through the tooling and processes the team has built. He shares the scale of GraphQL usage at Yelp: over 500 types in the schema, around 500 active persisted queries, and about 10,000 QPS. The talk also covers the importance of making good schema choices.
Hi, everyone. This talk is going to take you through what life is like as a GraphQL developer at Yelp. Specifically, I want to take you through some of the tooling and processes we've built to make our lives easier and to ship stuff safer and quicker.
So, my name is Mark, and I'm a lead on the client data API team. We maintain the GraphQL infrastructure used by the developers at Yelp to build our web and mobile apps. And if you want to get in touch with me about anything I'm about to say, I'm on Twitter at mark_larah.
If you're not familiar with Yelp, it's a place to connect with great local businesses: you can find good places to eat, or plumbers and movers, and read their reviews, that kind of thing. Here's a typical Yelp business page. Look at all this lovely data that we've pulled from the database. Looks useful, right? Here's what this page looks like without any data, which is not very useful.
So, I think we can all agree that data is good. And our job is to pipe it into this page somehow. So, what I want to do is take you on a journey, and we'll go through the experience together of making a pull request and adding schema for a new feature to our GraphQL service. Along the way, I'll share the processes and tooling we've added for folks, so you can get a taste for what our developer experience is like.
Now, I want to set the scene and share the scale of things and why we're invested in spending a lot of time on this. GraphQL is the modern standard for data fetching at Yelp. It's used by hundreds of developers across the org. There are over 500 types in the schema and 500 active persisted queries, meaning queries that have actually been used in production within the last two weeks. And as a whole, the GraphQL service gets around 10,000 QPS. So a lot of Yelp depends on this thing running smoothly and efficiently. Let's see what that looks like.
Okay. So, let's say we're making a brand-new product at Yelp. And there may be some existing schema that you could use. But mostly, we're gonna have to implement some new backend logic for this. So, the first thing we're gonna need to figure out is, well, what kind of query do you want to send? What kind of schema do we want? And obviously, we don't just want to add the first thing that comes to mind and commit that. Bad schema choices can be costly and hard to remove once they're used in prod.
2. Writing Dream Queries and Using GraphQL Faker
We encourage developers to write a dream query, which is the query they wish they could write if the schema was available. We have schema reviewers who help review and point to existing types. We use GraphQL Faker, an open-source tool, to iterate and test new schema before implementing resolvers.
So, we want to keep things fairly flexible up until the point that we actually commit the code. Now, maybe you're new to the company or new to GraphQL, and you haven't quite yet got a feel for what's in the rest of the schema or what idiomatic schema looks like in general. And so, when writing out the query for your new page, we encourage developers to just write out a thing that looks like a GraphQL query and does the thing that you want. And we call this the dream query. And that's the query that you wish you could write if the schema was magically available to power it.
And from there, we have a cross-org group of schema reviewers who can review the query and point to existing types and such. We've found this to be a good way of communicating and of onboarding folks in a less intimidating manner. I've linked to our blog post, which describes this process in more detail and why we like it. Go check it out at the link in the slide.
So, let's go ahead with our new feature. We've written a dream query that we want to use on our new webpage, and once we've got something that we're reasonably happy with and that looks good on paper, we can start to hammer out the schema that would power that query. Really, this can be done in parallel with writing the dream query. We're big fans of an open-source tool called GraphQL Faker. It's a bit like GraphiQL or GraphQL Playground in that it spins up an IDE for you to make queries in, but you also get an editor to add new schema and new types on the fly.
3. Using GraphQL Faker for Quick Iteration
You can query and extend your real GraphQL schema with autogenerated data using GraphQL Faker. It's globally available on our developer machines and can be easily plugged into your React app for quick iteration and parallelizing work between backend and front-end developers.
So, you can query those new types and get autogenerated lorem ipsum data back. It can also plug into and extend an existing schema: we use it to extend our real GraphQL schema, so you can make queries that combine real schema with your new work-in-progress schema. This is a great way to set up a quick iteration loop and get a feel for the new schema you want to add before spending time implementing the resolvers. We've made this globally available on our developer machines: type one command and you start up a GraphQL Faker instance, preconfigured to talk to one of our real dev instances of GraphQL, and it's all magically available. You can go further and plug this into the React app you're developing and try it out with real components in the UI, in tandem, so you don't have to wait for the backend. Yes, this is great for parallelizing work between backend and front-end devs. GraphQL Faker is great. I highly recommend using it in your development workflows and providing some light tooling around it, so it's easy to extend your real schema and use it in your React apps.
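GraphQL Faker itself works from SDL annotated with its own directives, and the snippet below is not its implementation. It is a minimal TypeScript sketch of the merging idea described above: real fields come from an existing backend, while fields that so far exist only in the work-in-progress schema get placeholder values. The names `FieldType`, `FAKE_VALUES`, and `resolveWithFakes`, and the flat map used to describe the new schema, are hypothetical.

```typescript
// Hypothetical scalar types that a work-in-progress field can declare.
type FieldType = "String" | "Int" | "Boolean";

// Placeholder values, in the spirit of GraphQL Faker's lorem ipsum output.
const FAKE_VALUES: Record<FieldType, string | number | boolean> = {
  String: "Lorem ipsum dolor sit amet",
  Int: 42,
  Boolean: true,
};

type Row = Record<string, unknown>;

// Answer a selection of fields: use real data where the existing backend has it,
// and a fake value for fields that so far exist only in the work-in-progress schema.
function resolveWithFakes(
  requestedFields: string[],
  realData: Row,
  wipFields: Record<string, FieldType>,
): Row {
  const result: Row = {};
  for (const field of requestedFields) {
    if (field in realData) {
      result[field] = realData[field];
    } else if (field in wipFields) {
      result[field] = FAKE_VALUES[wipFields[field]];
    } else {
      throw new Error(`Unknown field: ${field}`);
    }
  }
  return result;
}

// Example: "name" is real, "openingHoursSummary" has not been implemented yet.
const preview = resolveWithFakes(
  ["name", "openingHoursSummary"],
  { name: "Example Cafe" },
  { openingHoursSummary: "String" },
);
console.log(preview);
```

Replacing the placeholder with a real resolver later does not change the query the front end sends, which is what lets front-end and back-end work proceed in parallel.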
4. Implementing Resolvers and Using Data Loaders
We're happy with our new schema and want to implement the resolvers. Our architecture consists of different services owned by different teams, exposed through a GraphQL gateway service. To avoid unnecessary network requests and make good use of bulk endpoints, we use data loaders: batching and caching functions over an underlying resource, which you need when the GraphQL server has no direct access to where the data lives.
Okay. So we're very happy with our new schema, and we want to invest some time in making it work for realsies and implementing the resolvers.
Quick architecture snapshot. This is a rough overview of what our architecture looks like. Hopefully, it looks kind of familiar. We just have a bunch of different services owned by different teams that expose internal REST APIs for various products and use cases. We talk to all of these services through the GraphQL gateway service, which is our slightly monolithic GraphQL service written in Node with Apollo Server. Resolvers in that service need to fetch data by talking to these endpoints.
There are some common problems here. How do we avoid blowing up the backend with a bunch of wasted, unnecessary network requests? How do we properly utilize bulk endpoints? The main answer is data loaders. (Apollo also has a pattern called data sources.) Most of you probably already know about data loaders, and implementations exist in a few languages. If not, the TL;DR is that a data loader is a magic batching and caching function over some underlying resource. If you don't have direct access to where the data lives from your GraphQL server, and you need to connect to some external resource, then you probably should be using something like this.
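In Node, the usual implementation is the `dataloader` npm package. As a rough illustration of the batching and caching it provides, here is a stripped-down TypeScript sketch (not that package's code): every `load()` call made in the same tick is collected, one bulk request is sent for the distinct keys, and each key's promise is cached. `MiniLoader` is a hypothetical name.

```typescript
// A bulk function: takes many keys, returns one value per key in the same order.
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>;

interface Pending<K, V> {
  key: K;
  resolve: (value: V) => void;
  reject: (error: unknown) => void;
}

class MiniLoader<K, V> {
  private readonly batchFn: BatchFn<K, V>;
  private readonly cache = new Map<K, Promise<V>>();
  private queue: Pending<K, V>[] = [];

  constructor(batchFn: BatchFn<K, V>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V> {
    // Caching: a key requested before reuses the same promise.
    const cached = this.cache.get(key);
    if (cached !== undefined) return cached;

    const promise = new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // Batching: the first key queued schedules one dispatch after the
      // currently running synchronous code, so later load() calls join it.
      if (this.queue.length === 1) {
        void Promise.resolve().then(() => this.dispatch());
      }
    });
    this.cache.set(key, promise);
    return promise;
  }

  private async dispatch(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((p) => p.key));
      // Require one value per key, in the same order as the keys.
      if (values.length !== batch.length) {
        throw new Error(`Expected ${batch.length} values, got ${values.length}`);
      }
      batch.forEach((p, i) => p.resolve(values[i]));
    } catch (error) {
      batch.forEach((p) => p.reject(error));
    }
  }
}
```

Three resolvers asking for users 1, 2, and 1 in the same tick would trigger a single bulk call with keys [1, 2]. The real package adds features this sketch omits, such as cache priming and clearing.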
5. Challenges with Data Loaders at Yelp
We use data loaders to handle the hundreds of internal HTTP endpoints distributed across multiple services at Yelp. Writing and maintaining individual data loaders for each endpoint would be time-consuming and prone to inconsistencies. It's challenging to determine which endpoint a data loader should go to and how to handle error handling and logging consistently. This approach would result in messy and difficult-to-manage boilerplate code.
Okay. So we use data loaders. But as with many things in software and open source tooling, you get some basic examples in the docs and some tutorials online, and it gets tricky when you want to scale the thing up and plug it into the rest of Yelp. The issue is that we have hundreds of these internal HTTP endpoints distributed across hundreds of services. The vanilla approach would be to write hundreds of data loaders by hand to talk to all of those endpoints, and that would be pretty gross for a bunch of reasons. For example, we have multiple endpoints that return user information, each with a different representation of a user. So if someone made a user data loader, which endpoint does it go to? Who decides? How do we stop people from making multiple user data loaders? How do we get typing on these data loaders? How do we ensure that the error handling and logging are correct every time? It also just takes time to write, and it's another thing to maintain, et cetera. So you can imagine that all of this inconsistent boilerplate would get pretty yucky to deal with.
6. Code Generation, Precommits, and Schema Checks
We came up with a code gen layer called data loader code gen that generates data loaders for every specified endpoint. This eliminates the need to think about which data loader to use for each endpoint and saves time. We also recommend using generated types for resolvers to type check the JavaScript implementation. Precommits are used as an early warning system to catch issues before committing. We use tools like GraphQL Schema Linter to check naming conventions and style rules. GitHub bots, such as schema check bot, detect breaking changes in the schema and provide warnings. They also check for queries that use fields being removed to prevent breaking production. If no queries are found, the pull request passes the check.
The solution that we came up with was a codegen layer imaginatively named dataloader-codegen. We already have Swagger UI set up to document all of the internal endpoints across Yelp, so folks are already used to thinking in terms of these Swagger endpoints and their interfaces. They're widely used and well documented: a good source of truth. We want people to think in terms of those endpoints, and that is what dataloader-codegen lets them do.
We have a config file, and it generates a data loader for every endpoint that we specify. So you think in terms of the endpoints, not in terms of data loaders that might have subtle differences, and there's a strict one-to-one mapping. There's no need to wonder which data loader you need for an endpoint: there will always be exactly one, with a matching interface. This is great, because the data loader layer is now basically transparent. It's one less thing to worry about, it saves a bunch of time writing and maintaining loaders, and it has allowed us to scale up to hundreds of data loaders, all with exactly the same error handling, logging, et cetera. Even if you're not talking to REST endpoints behind the scenes, some of these concepts may still apply to your situation. I think the takeaway here is that codegen, and removing human-maintained things where possible, is good.
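To make the one-to-one mapping concrete, here is a toy TypeScript generator, assuming a hypothetical config with one entry per endpoint. It is not dataloader-codegen's actual config format or output. It emits one loader per configured endpoint, refuses duplicate entries, and routes every loader through the same helpers; the emitted file imports `DataLoader` from the `dataloader` package plus two hypothetical helpers, `callEndpoint` and `withLoggingAndErrors`.

```typescript
// Hypothetical config: one entry per internal endpoint that should get a loader.
interface EndpointConfig {
  name: string; // loader name, mirroring the Swagger operation, e.g. "getUsers"
  service: string; // service that owns the endpoint
  path: string; // bulk endpoint path
}

// Emit TypeScript source with exactly one loader per configured endpoint.
// Every loader goes through the same (hypothetical) helpers, so error handling
// and logging cannot drift between hand-written variants.
function generateLoaders(endpoints: EndpointConfig[]): string {
  const seen = new Set<string>();
  const header = [
    `// Generated file. Do not edit by hand.`,
    `import DataLoader from "dataloader";`,
    `import { callEndpoint, withLoggingAndErrors } from "./loaderHelpers";`,
  ].join("\n");

  const loaders = endpoints.map((ep) => {
    if (seen.has(ep.name)) {
      throw new Error(`Duplicate entry for ${ep.name}: endpoints map one-to-one to loaders`);
    }
    seen.add(ep.name);
    const quotedName = JSON.stringify(ep.name);
    return [
      `// Source: ${ep.service} ${ep.path}`,
      `export const ${ep.name}Loader = new DataLoader((keys) =>`,
      `  withLoggingAndErrors(${quotedName}, () =>`,
      `    callEndpoint(${JSON.stringify(ep.service)}, ${JSON.stringify(ep.path)}, keys),`,
      `  ),`,
      `);`,
    ].join("\n");
  });

  return [header, ...loaders].join("\n\n") + "\n";
}
```

Because the error handling and logging live in the template, changing them means editing one place and regenerating, rather than touching hundreds of hand-written loaders.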
So, anyway, we open sourced this, and if you're interested in learning more about it, you can check it out on GitHub at the link in the slide. The other thing that I highly recommend is generated types for resolvers. We use a good library from The Guild, GraphQL Code Generator, which does a bunch of other things as well, but we use it to generate types from our schema file and use those to type check the JavaScript implementation of our resolvers. That's just a nice, easy, quick win.
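To show why that is a quick win, here is a minimal TypeScript sketch. The "generated" types are hand-written stand-ins for what a generator would emit from the schema in the first comment; the real typescript-resolvers plugin output is more elaborate (for example, its resolver signatures also take an `info` argument). `Context.loadBusiness` is a hypothetical data loader.

```typescript
// Assumed schema:
//   type Business { id: ID!  name: String!  rating: Float }
//   type Query { business(id: ID!): Business }

// Stand-ins for generated types (a code generator would emit these from the schema).
type Business = { id: string; name: string; rating: number | null };
type QueryBusinessArgs = { id: string };
type QueryResolvers<Ctx> = {
  business: (parent: unknown, args: QueryBusinessArgs, ctx: Ctx) => Promise<Business | null>;
};

// Hand-written part: the context exposes a hypothetical data loader.
interface Context {
  loadBusiness: (
    id: string,
  ) => Promise<{ id: string; name: string; avg_rating?: number } | undefined>;
}

// Annotating the resolvers with the generated type means a missing `name`, or a
// `rating` returned as a string, fails type checking instead of failing at runtime.
const Query: QueryResolvers<Context> = {
  business: async (_parent, { id }, ctx) => {
    const row = await ctx.loadBusiness(id);
    if (row === undefined) return null;
    return { id: row.id, name: row.name, rating: row.avg_rating ?? null };
  },
};
```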
Okay, so now we're really ready to commit to our new schema by committing the code. We use precommit hooks as an early warning system for things that would break the build. A precommit hook performs checks just after you type git commit in your terminal, but before the commit is actually created. These are mostly linters, type checkers, and other things that also run in CI, but here they run in your terminal immediately, so you don't have to wait 20 minutes for CI to fail and then realize, oh, I had some whitespace bug. Aside from linting the JavaScript, we also lint the schema, using a tool called GraphQL Schema Linter, which lets us check against a basic set of naming conventions and style rules. It's a really good tool, and we run it in precommit. This is what happens if you try to sneak through some schema that we know for sure breaks the rules.
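A schema linter boils down to rules checked against the parsed schema. As a toy illustration only (GraphQL Schema Linter parses real SDL and ships its own configurable rule set), here is a TypeScript sketch of two naming rules over a simplified, hypothetical schema shape. A precommit hook would run a check like this and exit non-zero if it returns any messages, which aborts the commit.

```typescript
// Simplified, hypothetical view of a schema: type name -> field names.
type SchemaShape = Record<string, string[]>;

const PASCAL_CASE = /^[A-Z][A-Za-z0-9]*$/;
const CAMEL_CASE = /^[a-z][A-Za-z0-9]*$/;

// Returns one message per violation; an empty array means the check passes.
function lintSchema(schema: SchemaShape): string[] {
  const errors: string[] = [];
  for (const [typeName, fields] of Object.entries(schema)) {
    if (!PASCAL_CASE.test(typeName)) {
      errors.push(`Type "${typeName}" should be PascalCase`);
    }
    for (const field of fields) {
      if (!CAMEL_CASE.test(field)) {
        errors.push(`Field "${typeName}.${field}" should be camelCase`);
      }
    }
  }
  return errors;
}
```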
Assuming everything went well, it's now time to send your pull request, and the first thing that happens is that a couple of GitHub bots check more of your schema. The first one is schema check bot, which uses a package called GraphQL Inspector to detect any breaking changes to the schema. If it finds any breaking changes, such as removing or renaming a field, then we show a big warning sign like this. We go a step beyond that, too: we have an allow list of persisted queries, and we know when each query is being used, so we can go through all of the queries used in the last two weeks and say, hey, it looks like you're trying to remove this field, but these 20 queries are using this field in prod at the moment, so this change is going to break them. Maybe don't push this. That's a good warning system, and it stops us from breaking Yelp: the PR actually goes red and you can't push it. On the flip side, if you really do want to remove a field, after a proper deprecation process or while cleaning up some experiment, then the logic says, okay, you're removing a field, but we didn't find any documents being used in prod that use this field, so it's not going to break anything and you're good to go. The GitHub check just passes for you.
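The two checks described above can be sketched in TypeScript, assuming hypothetical flattened shapes: a schema as a map from type name to field names, and each allow-listed persisted query annotated with the `Type.field` coordinates it selects and when it was last used. Real schema diffing, as done by GraphQL Inspector, covers many more kinds of breaking change than removed fields; this only illustrates the "removed field, but still used in prod" logic.

```typescript
// Simplified, hypothetical schema: type name -> field names.
type Schema = Record<string, string[]>;

// Hypothetical record for one persisted query on the allow list.
interface PersistedQuery {
  id: string;
  fieldsUsed: string[]; // "Type.field" coordinates the query selects
  lastUsedAt: Date;
}

const TWO_WEEKS_MS = 14 * 24 * 60 * 60 * 1000;

// Fields present in the old schema but missing from the new one are breaking removals.
function removedFields(oldSchema: Schema, newSchema: Schema): string[] {
  const removed: string[] = [];
  for (const [typeName, fields] of Object.entries(oldSchema)) {
    const kept = new Set(newSchema[typeName] ?? []);
    for (const field of fields) {
      if (!kept.has(field)) removed.push(`${typeName}.${field}`);
    }
  }
  return removed;
}

// IDs of persisted queries used in the last two weeks that select a removed field.
function queriesBrokenBy(removed: string[], queries: PersistedQuery[], now: Date): string[] {
  const removedSet = new Set(removed);
  return queries
    .filter((q) => now.getTime() - q.lastUsedAt.getTime() <= TWO_WEEKS_MS)
    .filter((q) => q.fieldsUsed.some((f) => removedSet.has(f)))
    .map((q) => q.id);
}
```

The check would pass when `queriesBrokenBy` returns an empty list, which is the "removed, but nothing in prod uses it" case.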
7. Schema Review and Documentation
If you want to remove a field, but it's not being used in production, the GitHub check will pass. Schema suggestions bots provide feedback on potential issues that are hard to strictly lint against. Approval from the schema review group is the final step to ensure consistency. Strong documentation is essential for onboarding and maintaining consistency. We open sourced some pages related to schema design on GitHub. We welcome feedback and are hiring.
Another bot that UA on our team is currently working on is schema suggestions bot. This is for things that we think might apply to you but that are inherently hard to strictly lint against, so we don't want to fail the build on them. For example: hey, you're adding a new type called BusinessHours, but it looks like there's already a type called BusinessOpeningHours; do you want to use that instead? We just have a bunch of fancy regex rules to sniff out things that may break our schema design guidelines. If the bot thinks it has found something, you'll see an inline GitHub comment, and you can choose to act on it or ignore it as a false positive.
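The talk doesn't show the bot's actual regex rules, so the following is only a guess at the flavor of such a check, written as a hypothetical TypeScript heuristic: split PascalCase type names into words and flag existing types whose names contain every word of the new one. Its output is advisory, matching the bot's role of commenting rather than failing the build.

```typescript
// Split a PascalCase type name into lower-case words:
// "BusinessOpeningHours" -> ["business", "opening", "hours"].
function words(typeName: string): string[] {
  return (typeName.match(/[A-Z][a-z0-9]*/g) ?? []).map((w) => w.toLowerCase());
}

// Suggest existing types whose names contain every word of the new type's name.
// Advisory only: results would be posted as review comments, never as build failures.
function suggestExistingTypes(newType: string, existingTypes: string[]): string[] {
  const wanted = words(newType);
  return existingTypes.filter((existing) => {
    if (existing === newType) return false;
    const have = new Set(words(existing));
    return wanted.length > 0 && wanted.every((w) => have.has(w));
  });
}
```

For `BusinessHours` it would flag `BusinessOpeningHours`. Names with acronyms (say, `URLPreview`) split poorly with this regex, which is one reason a check like this should not fail the build.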
Once everything looks good, the final, final step is to get approval from someone in our schema review group. These are folks who are familiar with our schema design guidelines, and this step exists as a final safety check to make sure that the schema looks good. Initially, this was just a small handful of devs, to make sure that we had consistency in the first few types and to seed that knowledge across the company. Now it's open to everyone: there's a learning module that we ask people to go through, and once they've completed it, they can join the group and approve new schema. We aim for two reviewers per team, but it's uneven in places, so folks handle reviews for other teams where necessary. That's a good thing as well, because it means there are no team-specific opinions about schema, and everything looks more consistent across the company.
Okay, one final thing I wanted to mention is the value of strong documentation. In the first year that we were rolling out GraphQL, I'd say time was split absolutely evenly between coding and writing the internal documentation that bootstrapped knowledge of this new thing and served as onboarding material. For a platform that many developers are going to use over time, the docs are just as much a product as the code, and honestly, the thought and process that goes into documentation could be a talk in itself. It's not the most gripping subject matter, though, so that's a subject for another talk. The only big takeaways I'll give here are that most of the time, product people just want to focus on building products, and when it comes to infrastructure, they just want to see something to copy and paste rather than working from first principles. The whole "we're unopinionated, use X however you want" approach you sometimes see in open source is great for open source, where the authors don't know your specific setup. But internally at the company, we do know your setup, and it's our job as an infra team to have those opinions, create abstractions, and set guidelines as much as possible, to save everyone else at the company the time of doing that themselves and doing it a million different ways. So the docs list all of that as: just do this, we've found it works for us. I'm excited to say that just this week, we open sourced some of the pages related to schema design, so you can check them out on GitHub at the link on the slide to see how we do schema design at Yelp. Finally, if you think that we missed anything that we should be doing, then great, we'd love to hear from you. We are hiring, so go check out the careers page. And yeah, that's all I have. Thanks, everyone.
GraphQL Adoption and Gateway Service
It's interesting to see the level of GraphQL adoption and the investments being made. The GraphQL gateway service is becoming a monolith, causing technical and social problems. Splitting it up is the next big thing our team will work on, and we're exploring GraphQL Mesh and federation. The integration of GraphQL Faker allows for fast iteration and quick page development without waiting for back-end developers. This focus on a quick iteration loop is a key aspect of our approach.
So we have the question here. You asked: how mature is GraphQL adoption at your organization? What do you think about these answers? Is this what you were expecting? Yeah, it's very interesting to see who comes to these conferences and what kind of information applies to you at different stages of adoption. Obviously, the level of investment you might want to put into your infrastructure depends on how many people you expect to use it. It looks like it's used in many places; it's a popular choice. Hopefully, a lot of people are thinking about these kinds of investments.
Yeah, for sure. It looks like a lot of people are looking into it, and it's already being used. That's great. Let's move over to some Q&A from our audience now. We have a question here: the GraphQL gateway service sounds like a monolith. Are there any plans to break that apart? Yeah. As I showed in the architecture diagram, all of the data loaders and all of the backend logic are in this one GraphQL gateway service. As much as it's intended to be a thin-ish proxy over all of those different services, data loaders, and endpoints, it does get to be a bit of a monolith, because there's inevitably some logic in that service, and we're talking hundreds of these resolvers. That creates technical problems and social problems. The technical problems are the usual monolith woes: it becomes really hard to live in that monolith, because you have hundreds of tests that could flake whenever you're pushing your little change, there's thrash from dealing with merge conflicts, and you're dealing with other teams' conventions. Then there are social problems, because our team owns that service and the infrastructure behind it, so we have best practices that we think apply to everyone, but maybe some team wants to do things a specific way. There are also some performance asterisks, which I won't dive into. So it really seems like splitting it up in some way will be the next big thing our team works on. There are lots of answers to that sort of problem, such as federation and GraphQL Mesh, so we're definitely hard at work doing R&D on that. Nice, so there are plans, coming soon. Yes. Awesome.
So, in the talk, you brought up GraphQL Faker. That's something I've never used before, and the way you've integrated it is something I'll need to look into, because it looks very interesting. When you're managing GraphQL, you may not have data to start with, so you have this faker, right? That's one great part. Do you want to expand on any of that? Yeah. A lot of what we see are teams who have entire pages that go to production and might get thrown away, because we have business teams who need a really fast iteration loop. It helps to have something where you can spin up a page as a web developer without necessarily waiting for a back-end developer. You just want a thing that works, and you need to play around with the query and the schema as well. So that quick iteration loop is something we focus on a lot.
Tools, Drinks, and Documentation
Always on the lookout for good tools like GraphQL Faker. I love it when I come across new tools. What is your drink of choice when working on such high-scale projects? I'm a tea man myself, so I like a good green tea. Attendees, continue to ask any questions that you may have in the Andromeda Q&A channel. Is there a list of tools that Mark presented? I can share the slides on my Twitter, along with the list of tools. Everything mentioned in the slides was a huge effort across the whole team. If I had more time, I would have spent a lot more of it on the documentation: the value in thinking a lot about it, not just throwing information on a page, but building a considered flow of information, what's relevant, how it looks.
The more juice we can get out of that, the better and more productive everyone is. So, I'm always on the lookout for good tools like GraphQL Faker, and if anyone knows of any other tools like that, please let me know. Yeah, I love it when I come across new tools. I have so many tools. Tools help us no matter what you're looking into; there's probably a tool for it.
So, we have another question here. What is your drink of choice when working on such high-scale projects? Oh, that is a good question. I like that question. Depends on how the project is going, doesn't it? Yeah, I suppose. And what time I'm working on it. I'm a tea man myself, so I like a good green tea. Hot or cold? So, a good oolong, maybe? Yeah. Nice, nice. All right.
So, attendees, just continue to ask any questions that you may have in the Andromeda Q&A channel. We have another question here: does someone have a list of the tools that Mark presented? Is there a list somewhere? I can share the slides on my Twitter, and I'll share the list of tools. Yeah, I can do that. Okay, nice. And finally, I just want to say that everything mentioned in the slides was a huge effort across the whole team. I know some of them are watching, so I'm giving them a shout out. Thank you to everyone. Definitely. I'm not seeing any other questions. Is there anything in your talk that you wanted to expand on, or any future plans, anything else that you want to add for the audience? If I had more time, I would have spent a lot more time on the documentation. I alluded to it in the talk, and it really was a big focus. I can't emphasize enough the value in thinking a lot about that: not just throwing information on a page, but building a considered flow of information, what's relevant, how it looks.
Documentation and Query Reference
We value obsessive documentation and have found it to be highly beneficial. We use VuePress to generate our documentation site from markdown and have a hand-curated schema design guidelines page. Additionally, we have an in-house UI that displays the queries and mutations submitted to the allow list, serving as a reference for prior art. Thank you for attending the talk, and feel free to ask more questions in the speaker room.
Maybe borderline obsessive, but I think that's something that we got a lot of value out of. And this whole conference is dedicated to documentation and websites and stuff. So for anyone rolling this out to lots and lots of people, yeah, plus, plus.
Great. And we have another question here. Do you use any special tools to organize your documentation? Organization is always hard for some of us.
So we use VuePress as the markdown-based site generator. Other than that, it's just hand curated; we added a little theme to make it red for Yelp. The schema design guidelines page, which is now open source so you can see it, is hand curated and written in a way that hopefully flows. Something else I didn't include in the slides: apart from the GraphiQL schema docs tab, we also have an in-house UI where you can see all of the queries and mutations that have been submitted to the allow list. That lets you see all of the different queries at Yelp, and it serves as a reference for prior art: you can filter by team, see the evolution of one of those queries, et cetera.
Well, again, we want to thank you so much for this great talk. If any of the attendees have more questions for Mark, go to SpatialChat, into the speaker room, and you can ask them there. Mark, again, thank you so much. Thank you.