In Memory of Travails

Rate this content
Bookmark

Two aspects of resolvers have an outsized influence on their performance: the size of the execution context, and the way we compute their value. In the Node.js implementation of graphql, promises wrapping primitive values are especially disruptive, since they add a large computing overhead. The context size creates a memory usage baseline that can rise very quickly with even small additions to the context, when there are many concurrent contexts. The execution can create temporary objects, increasing memory usage. Often-run resolvers, such as those responsible for filling out large arrays of objects, can become performance bottlenecks.

At Auction.com, our search results page (SRP) requests up to 500 items of roughly 80 fields each. The query resolving these fields was suffering a high latency. We shall examine the tools to instrument our code and identify memory usage and CPU utilization bottlenecks.

Our realtime elements (e.g. realtime updates to the status of currently viewed properties) are implemented using a translation of  kafka messages to graphql updates. We shall present the tools and procedures to reduce memory usage and CPU usage when fanning out such messages.

This talk has been presented at JSNation US 2024, check out the latest edition of this JavaScript Conference.

FAQ

Auction.com introduced auto-scaling to handle memory spikes and nightly restarts to address a slow memory leak, ensuring the system remained stable.

Upgrading packages like GraphQL Redis subscriptions allowed auction.com to benefit from optimizations and improved performance, though the improvements were minor.

They used Google Chrome DevTools heap snapshots to identify and remove unnecessary memory allocations, such as location objects in GraphQL ASTs.

One key lesson was to carefully manage what is attached to the GraphQL context, ensuring only necessary data is included to optimize memory usage.

They modernized their code by leveraging native JavaScript features for Node.js 20, reducing reliance on polyfills and improving execution efficiency.

Memoizing snake case conversions led to a 3X performance increase, smoothing out message processing and improving overall system responsiveness.

Auction.com experienced frequent restarts in their Kubernetes cluster due to a 'reached heap limit allocation failed' error, indicating that Node.js was out of memory.

Auction.com tested memory consumption by running Kafka, Graph, and a client locally, connecting 4000 WebSockets, and sending messages through the setup to observe performance.

They replaced Lodash calls with native JavaScript iterators and used memoization to avoid redundant conversions, significantly improving performance.

They used JavaScript proxies to create lazy-loaded backends, reducing memory consumption by only creating backend objects when needed.

Gabriel Schulhof
Gabriel Schulhof
28 min
21 Nov, 2024

Comments

Sign in or register to post your comment.
Video Summary and Transcription
Hello, my name is Gabriel. I work at auction.com and I'm going to be talking about how we improved the performance of GraphQL resolvers for our GraphQL service. We had a problem with our subscription deployment, where we were experiencing a high number of restarts due to memory allocation failures. This prompted us to investigate and optimize the memory consumption of our resolvers. To assess the performance, we set up a local environment with Kafka, Graph, and a client that connected 4000 WebSockets to Graph. After running the test, we found that we were only able to process and distribute 16 messages to our clients. Yay. The memory consumption graph showed peaks and valleys as messages were delivered. Three distinct phases were observed: idle, Sockets connected with no messages, and messages being processed. We decided to optimize the context, which contains request-specific information and backend details. Since subscriptions primarily involve sending Kafka messages, we realized that the message itself often has all the necessary information. Therefore, we only create backends when a call is made to them. We optimized our backend creation process using the proxy object, which allows us to create backends lazily only when they are accessed. This resulted in less memory consumption without changing the code or the schema. The less memory consumption is evident in the second phase of the recording, where the plateau formed from having multiple contexts is significantly lower. Most of the savings were achieved by reducing temporary objects and using native iterators instead of Lodash calls for converting key names to Snakecase. All of a sudden, the performance increased by 18%, resulting in increased memory consumption. Upgrading to newer versions of GraphQL Redis subscriptions did not have a significant impact on memory usage. However, optimizing the conversion of key names to snake case by using memoization improved computational efficiency. Our performance significantly improved after implementing snake case. However, memory consumption remained high. To address the memory leak, we introduced auto-scaling and restarted the service every night. Additionally, we optimized the code generation process to improve memory consumption. We explored using heap snapshots in Google Dev tools to analyze and reduce memory consumption. By identifying unnecessary objects and removing them, we were able to free up memory and improve performance. We patched the location object to improve performance and reduce memory consumption. We also optimized data loaders to avoid N+1 queries and improve efficiency.
Available in Español: En Memoria de las Dificultades

1. Introduction

Short description:

Hello, my name is Gabriel. I work at auction.com and I'm going to be talking about how we improved the performance of GraphQL resolvers for our GraphQL service. We had a problem with our subscription deployment, where we were experiencing a high number of restarts due to memory allocation failures. This prompted us to investigate and optimize the memory consumption of our resolvers. To assess the performance, we set up a local environment with Kafka, Graph, and a client that connected 4000 WebSockets to Graph. After running the test, we found that we were only able to process and distribute 16 messages to our clients.

Hello, my name is Gabriel. I work at auction.com and I'm going to be talking about how we improved the performance of GraphQL resolvers for our GraphQL service.

So the problem that we found was that in the case of our subscription deployment, we were getting a lot of restarts. So this is the number of restarts and as you can see, we're on the order of like 200, 300 restarts in our Kubernetes cluster and all the restarts were happening because of this magic error message that said reached heap limit allocation failed. And this is basically the JavaScript engine telling Node.js that it's out of memory. Node.js dies and game over. You have to restart the pod. And so we started looking into how we can improve the memory consumption of our resolvers.

So then let's see a little bit of background. How do we GraphQL at auction.com, right? So we use for our subscriptions, we use Kafka topics and GraphQL subscription, Redis for PubSub. So in comes Kafka message, it goes into Redis, it comes out of Redis where it needs to. And then it goes off to the clients over WebSockets.

So to do the test for memory consumption and performance in general, what I did was I ran Kafka locally, I ran Graph locally and I ran a client locally that would produce like large number of WebSockets and connect them to Graph. So I connected 4000 WebSockets and I used KCAT to basically send messages to the local Kafka broker, which sent them to Graph, which sent them to Redis, which sent them to those 4000 Sockets. So that was the setup. And this was the initial result in, what is it? Let's see 250 seconds, that's four minutes and 10 seconds, I guess. Yeah, something like that. We were able to process and distribute to our clients a whopping 16 messages.

2. Optimizing the Context

Short description:

Yay. The memory consumption graph showed peaks and valleys as messages were delivered. Three distinct phases were observed: idle, Sockets connected with no messages, and messages being processed. We decided to optimize the context, which contains request-specific information and backend details. Since subscriptions primarily involve sending Kafka messages, we realized that the message itself often has all the necessary information. Therefore, we only create backends when a call is made to them.

Yay. So as you can imagine, that's not exactly stellar performance and you can see the memory consumption here. It has all these like peaks and valleys, peaks and valleys. So you can see that for any given message, it would allocate a whole bunch of memory and then finally deliver the message and then do that 15 more times. And you can see down the green little peaks at the bottom where the X axis is, that's when messages were actually delivered.

A little bit more about the experimental setup. So you can see that there are like three distinct phases to the memory consumption graph. One of them is when the whole process is just idle. That's just, you know, so you can have like a baseline of memory consumption. And then the second one is where we have 4000 Sockets connected, but there's absolutely no messages. So there's zero traffic, just the Sockets. And then finally, the last phase of the graph is where you have the messages coming in and the graph attempting to process them.

One of the things that we immediately thought about doing was attacking the context, so to speak. So the context, as you may or may not know, is the thing that every GraphQL request or subscription or what have you has in order for it to execute in a request specific way. So anything that is specific to that request, such as user credentials, the request itself, like what is it that the user wanted, all of that is attached to the context. Right. And in our case, since we might be accessing things behind the graph server, like backends, we have those things attached to the context as well. Right. And each backend has like, you know, five different methods, you know, get, put, post, delete and patch. And so for each of those backends and for each of those methods, we had a little wrapper that would, you know, encapsulate the backend specific details of that backend. And so, you know, things like URL and so forth. So you could just, you know, call the backend, call the method, you know, and just get the data without having to put the things like the server or the domain in like a million places in your code. And put it in one place and that's it. So if we change the backend, we could just change it in one place. But the problem is this all requires the wrapper objects to store on the context. And so we were like, okay, well, subscriptions, they hardly make any backend calls because they are really just Kafka messages being sent out. There's already all the information that people might want in the message itself. They very rarely access backends. And so, and even in the case of like our request response service, you know, you're not going to need like 40 backends for every single request, right? So we were like, okay, so let's keep the syntax that we have for accessing our backends, but let's not create the backends unless somebody actually attempts to make a call to those backends.

Check out more articles and videos

We constantly think of articles and videos that might spark Git people interest / skill us up or help building a stellar career

A Guide to React Rendering Behavior
React Advanced 2022React Advanced 2022
25 min
A Guide to React Rendering Behavior
Top Content
This transcription provides a brief guide to React rendering behavior. It explains the process of rendering, comparing new and old elements, and the importance of pure rendering without side effects. It also covers topics such as batching and double rendering, optimizing rendering and using context and Redux in React. Overall, it offers valuable insights for developers looking to understand and optimize React rendering.
Speeding Up Your React App With Less JavaScript
React Summit 2023React Summit 2023
32 min
Speeding Up Your React App With Less JavaScript
Top Content
Watch video: Speeding Up Your React App With Less JavaScript
Mishko, the creator of Angular and AngularJS, discusses the challenges of website performance and JavaScript hydration. He explains the differences between client-side and server-side rendering and introduces Quik as a solution for efficient component hydration. Mishko demonstrates examples of state management and intercommunication using Quik. He highlights the performance benefits of using Quik with React and emphasizes the importance of reducing JavaScript size for better performance. Finally, he mentions the use of QUIC in both MPA and SPA applications for improved startup performance.
React Concurrency, Explained
React Summit 2023React Summit 2023
23 min
React Concurrency, Explained
Top Content
Watch video: React Concurrency, Explained
React 18's concurrent rendering, specifically the useTransition hook, optimizes app performance by allowing non-urgent updates to be processed without freezing the UI. However, there are drawbacks such as longer processing time for non-urgent updates and increased CPU usage. The useTransition hook works similarly to throttling or bouncing, making it useful for addressing performance issues caused by multiple small components. Libraries like React Query may require the use of alternative APIs to handle urgent and non-urgent updates effectively.
The Future of Performance Tooling
JSNation 2022JSNation 2022
21 min
The Future of Performance Tooling
Top Content
Today's Talk discusses the future of performance tooling, focusing on user-centric, actionable, and contextual approaches. The introduction highlights Adi Osmani's expertise in performance tools and his passion for DevTools features. The Talk explores the integration of user flows into DevTools and Lighthouse, enabling performance measurement and optimization. It also showcases the import/export feature for user flows and the collaboration potential with Lighthouse. The Talk further delves into the use of flows with other tools like web page test and Cypress, offering cross-browser testing capabilities. The actionable aspect emphasizes the importance of metrics like Interaction to Next Paint and Total Blocking Time, as well as the improvements in Lighthouse and performance debugging tools. Lastly, the Talk emphasizes the iterative nature of performance improvement and the user-centric, actionable, and contextual future of performance tooling.
How React Compiler Performs on Real Code
React Advanced 2024React Advanced 2024
31 min
How React Compiler Performs on Real Code
Top Content
I'm Nadia, a developer experienced in performance, re-renders, and React. The React team released the React compiler, which eliminates the need for memoization. The compiler optimizes code by automatically memoizing components, props, and hook dependencies. It shows promise in managing changing references and improving performance. Real app testing and synthetic examples have been used to evaluate its effectiveness. The impact on initial load performance is minimal, but further investigation is needed for interactions performance. The React query library simplifies data fetching and caching. The compiler has limitations and may not catch every re-render, especially with external libraries. Enabling the compiler can improve performance but manual memorization is still necessary for optimal results. There are risks of overreliance and messy code, but the compiler can be used file by file or folder by folder with thorough testing. Practice makes incredible cats. Thank you, Nadia!
Debugging JS
React Summit 2023React Summit 2023
24 min
Debugging JS
Top Content
Watch video: Debugging JS
Debugging JavaScript is a crucial skill that is often overlooked in the industry. It is important to understand the problem, reproduce the issue, and identify the root cause. Having a variety of debugging tools and techniques, such as console methods and graphical debuggers, is beneficial. Replay is a time-traveling debugger for JavaScript that allows users to record and inspect bugs. It works with Redux, plain React, and even minified code with the help of source maps.

Workshops on related topic

React Performance Debugging Masterclass
React Summit 2023React Summit 2023
170 min
React Performance Debugging Masterclass
Top Content
Featured WorkshopFree
Ivan Akulov
Ivan Akulov
Ivan’s first attempts at performance debugging were chaotic. He would see a slow interaction, try a random optimization, see that it didn't help, and keep trying other optimizations until he found the right one (or gave up).
Back then, Ivan didn’t know how to use performance devtools well. He would do a recording in Chrome DevTools or React Profiler, poke around it, try clicking random things, and then close it in frustration a few minutes later. Now, Ivan knows exactly where and what to look for. And in this workshop, Ivan will teach you that too.
Here’s how this is going to work. We’ll take a slow app → debug it (using tools like Chrome DevTools, React Profiler, and why-did-you-render) → pinpoint the bottleneck → and then repeat, several times more. We won’t talk about the solutions (in 90% of the cases, it’s just the ol’ regular useMemo() or memo()). But we’ll talk about everything that comes before – and learn how to analyze any React performance problem, step by step.
(Note: This workshop is best suited for engineers who are already familiar with how useMemo() and memo() work – but want to get better at using the performance tools around React. Also, we’ll be covering interaction performance, not load speed, so you won’t hear a word about Lighthouse 🤐)
Building WebApps That Light Up the Internet with QwikCity
JSNation 2023JSNation 2023
170 min
Building WebApps That Light Up the Internet with QwikCity
Featured WorkshopFree
Miško Hevery
Miško Hevery
Building instant-on web applications at scale have been elusive. Real-world sites need tracking, analytics, and complex user interfaces and interactions. We always start with the best intentions but end up with a less-than-ideal site.
QwikCity is a new meta-framework that allows you to build large-scale applications with constant startup-up performance. We will look at how to build a QwikCity application and what makes it unique. The workshop will show you how to set up a QwikCitp project. How routing works with layout. The demo application will fetch data and present it to the user in an editable form. And finally, how one can use authentication. All of the basic parts for any large-scale applications.
Along the way, we will also look at what makes Qwik unique, and how resumability enables constant startup performance no matter the application complexity.
Build Modern Applications Using GraphQL and Javascript
Node Congress 2024Node Congress 2024
152 min
Build Modern Applications Using GraphQL and Javascript
Featured Workshop
Emanuel Scirlet
Miguel Henriques
2 authors
Come and learn how you can supercharge your modern and secure applications using GraphQL and Javascript. In this workshop we will build a GraphQL API and we will demonstrate the benefits of the query language for APIs and what use cases that are fit for it. Basic Javascript knowledge required.
Next.js 13: Data Fetching Strategies
React Day Berlin 2022React Day Berlin 2022
53 min
Next.js 13: Data Fetching Strategies
Top Content
WorkshopFree
Alice De Mauro
Alice De Mauro
- Introduction- Prerequisites for the workshop- Fetching strategies: fundamentals- Fetching strategies – hands-on: fetch API, cache (static VS dynamic), revalidate, suspense (parallel data fetching)- Test your build and serve it on Vercel- Future: Server components VS Client components- Workshop easter egg (unrelated to the topic, calling out accessibility)- Wrapping up
Building a Shopify App with React & Node
React Summit Remote Edition 2021React Summit Remote Edition 2021
87 min
Building a Shopify App with React & Node
Top Content
WorkshopFree
Jennifer Gray
Hanna Chen
2 authors
Shopify merchants have a diverse set of needs, and developers have a unique opportunity to meet those needs building apps. Building an app can be tough work but Shopify has created a set of tools and resources to help you build out a seamless app experience as quickly as possible. Get hands on experience building an embedded Shopify app using the Shopify App CLI, Polaris and Shopify App Bridge.We’ll show you how to create an app that accesses information from a development store and can run in your local environment.
React Performance Debugging
React Advanced 2023React Advanced 2023
148 min
React Performance Debugging
Workshop
Ivan Akulov
Ivan Akulov
Ivan’s first attempts at performance debugging were chaotic. He would see a slow interaction, try a random optimization, see that it didn't help, and keep trying other optimizations until he found the right one (or gave up).
Back then, Ivan didn’t know how to use performance devtools well. He would do a recording in Chrome DevTools or React Profiler, poke around it, try clicking random things, and then close it in frustration a few minutes later. Now, Ivan knows exactly where and what to look for. And in this workshop, Ivan will teach you that too.
Here’s how this is going to work. We’ll take a slow app → debug it (using tools like Chrome DevTools, React Profiler, and why-did-you-render) → pinpoint the bottleneck → and then repeat, several times more. We won’t talk about the solutions (in 90% of the cases, it’s just the ol’ regular useMemo() or memo()). But we’ll talk about everything that comes before – and learn how to analyze any React performance problem, step by step.
(Note: This workshop is best suited for engineers who are already familiar with how useMemo() and memo() work – but want to get better at using the performance tools around React. Also, we’ll be covering interaction performance, not load speed, so you won’t hear a word about Lighthouse 🤐)