Video Summary and Transcription
Async local storage is an API that has been around in Node for quite some time and is gaining popularity in other frameworks and runtimes. It allows for easier logging by eliminating the need to pass values through multiple functions. The storage frame in async local storage acts as a map, storing key-value pairs. Tasks and promise continuations are used to perform the next steps in a promise chain. Async Context is a TC39 proposal that adds async local storage to the JavaScript language.
1. Introduction to async local storage
Hello, I'm James Snell, a principal engineer at Cloudflare and a member of the Node Technical Steering Committee. I will be discussing async local storage and how it works, providing insight into this API that has been around in Node for quite some time and is now gaining popularity in frameworks and runtimes like Deno and Workers. Let's delve into the details.
All right, hello Node Congress. I am James Snell. I'm going to be talking to you a little bit about async local storage and how it works. It's an API that's been around in Node for quite some time now, a number of years. And it's starting to be picked up by a number of frameworks, Next.js and a few others, and it's being implemented in a variety of runtimes like Deno and Node and Workers and Bun. So, I kind of want to talk about how it works. Just kind of dig under the covers a little bit so you can get an understanding of what's going on.
For some background on me, I'm James Snell. I'm a principal engineer at Cloudflare. I work on the Workers platform. I've also been on the Node Technical Steering Committee for, oh wow, almost a decade now. I've been contributing to Node since 2015. And I am the co-chair and one of the founders of the WinterCG, that's the Web Interoperable Runtimes Community Group, where we're focused on getting all these different runtimes, Bun and Deno and Workers and all these different things together to talk about standardized APIs across the board. So, that's me. If you've heard of me before, it's probably because of some of the things I've contributed to Node over the years, things like the WHATWG URL implementation, the web crypto implementation, the web streams implementation, that kind of thing. But yeah, let's get into it.
2. Understanding async local storage
Let's dive into the details of async local storage and understand how it works under the covers.
All right, so if we look at the async local storage documentation on the Node website, this is the very helpful introduction that it gives, talking about, you know, these classes are used to associate state and propagate it through callbacks and promise chains. It doesn't really do a lot to help us understand what this API is doing under the covers. And if you haven't noticed, you know, Node docs in general don't do a great job of explaining things. So, we're going to dig into the details just a bit more, explain how this works, how it all comes together.
3. Using async local storage for logging
Let's imagine we have a function that needs to log a message with a request ID. Before async local storage, we had to pass the ID through all the functions that need it. However, async local storage allows us to create a container and specify the value we want to run with. The other functions don't need to know about the request ID, they can simply access it from the container when needed. This makes it much easier, especially when dealing with external dependencies. We can set the context and retrieve it later without modifying everything else.
All right, so let's look at an example. This is kind of the canonical example I like to use for async local storage. Let's imagine we have a function. You know, we have this export default function right here. This is the kind of the workers' way of exporting a function handler. We have this async fetch. We wanted to do something and eventually we want some logging to happen in here. So, you know, we have this log function that logs a message. You know, think about it essentially as just a console log, that kind of thing.
And what we want to do is define a request ID that gets logged with the message whenever it goes out. Now, kind of the traditional way of doing it before async local storage, the way we would have to do this is, you know, we'd come up with a request ID. So, here we're using the const ID equals crypto random UUID. And we actually have to plumb that ID through all the different functions that need it, or that end up in a log statement. So, here we have to modify do something and do something else to take that request ID and forward it on to log, even though the do something and do something else have, you know, don't do anything else with request ID itself. So, you know, this gets a bit complicated if these functions that you're passing this through are in some other library or some dependency in your code.
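A minimal sketch of the plumbing described above. The names doSomething, doSomethingElse, and log follow the talk; the handleRequest function, the lines array, and the message strings are assumptions added so the snippet runs on its own in Node rather than inside a Workers fetch handler.

```javascript
const { randomUUID } = require('node:crypto');

// Collected log lines, kept in an array so the output is easy to inspect
// (an assumption for illustration; real code would write to a logger).
const lines = [];

function log(requestId, message) {
  lines.push(`[${requestId}] ${message}`);
}

// Neither function uses requestId itself; each one only forwards it.
async function doSomethingElse(requestId) {
  log(requestId, 'doing something else');
}

async function doSomething(requestId) {
  log(requestId, 'doing something');
  await doSomethingElse(requestId);
}

// Hypothetical handler: creates the ID and threads it through every call.
async function handleRequest() {
  const id = randomUUID();
  await doSomething(id);
  return id;
}
```

Every function between the handler and log has to change its signature, which is the cost the rest of the talk sets out to remove.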
The same goes if they're doing complex asynchronous tasks, that kind of thing. It becomes kind of awkward to have to modify all of these functions just to pass this ID along in order to get it into the log statement. So, what if there was a way that we could just say, here's the ID, whenever log runs, use this ID, but not be forced to pass that along through all the different layers of functions that we want to call? That is what async local storage allows us to do.
With async local storage, we can create this holder, this container: request ID equals new async local storage. And we can specify in our fetch handler that, hey, we want to run with this value, in this case our request ID, a random UUID. We want to run this code with that value set. Right? And what changes here is that do something and do something else don't have to know anything about the request ID at all. They don't have to accept it as an argument or pass it along. When log is called, we don't have to tell it what request ID it's running under. When log is invoked, it will go to that async local storage container and just say, hey, get the value that's currently set for this when I'm running right now.
So it's going to give us a really simple way of saying, hey, store this value, and later on, in some other layer, we want to get the value. And notice that we're doing this past awaits. So this is happening asynchronously across multiple steps of a promise chain. It makes things a lot easier, especially if these do something and do something else functions are coming from some dependency that you have no control over and that you can't modify this way. You can set that context and retrieve it later without making any modifications to everything else.
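The async local storage version can be sketched as follows. AsyncLocalStorage, run, and getStore are the real Node API; handleRequest, the lines array, and the message strings are assumptions standing in for the Workers fetch handler shown in the talk.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');
const { randomUUID } = require('node:crypto');

// The container. It has no value until run() is called.
const requestId = new AsyncLocalStorage();

// Collected log lines (an assumption for illustration).
const lines = [];

function log(message) {
  // Asks the container for whatever value is current right now.
  lines.push(`[${requestId.getStore()}] ${message}`);
}

// No request ID parameters anywhere in the middle layers.
async function doSomethingElse() {
  log('doing something else');
}

async function doSomething() {
  log('doing something');
  await doSomethingElse();
  log('done'); // still sees the same value after the await
}

// Hypothetical handler: sets the value once, at the top.
async function handleRequest() {
  const id = randomUUID();
  await requestId.run(id, () => doSomething());
  return id;
}
```

Only the handler and log know about the container; doSomething and doSomethingElse could live in a dependency that cannot be modified.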
4. Exploring the Storage Frame
Let's explore what happens under the covers of async local storage. In Node, the storage frame is an abstract concept that acts as a map, storing key-value pairs. Each async local storage instance is a key, and it has a value at any given time. The value is initially undefined but gets set when you call run. Retrieving the current value is done with ALS getStore. There is not just a single frame in the application.
So that's the advantage of async local storage, but what is happening under the covers to allow this to work? Let's get into that.
All right. So we have this concept of a storage frame. Now this is in node. The storage frame is really more of an abstract concept and some of the other runtimes like workers, we actually have a, an internal class called, you know, async context frame and basically what this is, is a map, right? All it does is store key values. You know, you specify a key, specify a value, whatever. So acts like a pretty simple map.
How that relates to async local storage is that every async local storage variable, every instance of that, is a key. And at any given time, that key, that async local storage instance, has a value. Now, initially that value is always undefined. But whenever you call run, it sets that value. Okay. And then whenever you call ALS getStore, it retrieves the current value in that map. So think of it, you know, just this frame, that is essentially just a map. We could also just call this, you know, the async local storage map, right? Importantly, there's not just a single frame in the application.
5. Working with Storage Frames
Storage frames are immutable and use a copy-on-write strategy. When a new value is set for an async local storage variable, the existing frame is copied into a new instance. This allows for retrieval of values set on previous frames, even if there have been copies since then. The initial frame is empty, and subsequent frames are created as needed. Calling ALS run creates a new frame and sets the value specified. Subsequent calls to ALS run create new frames and add values, with the most recent frame becoming the current.
All right. Storage frames are immutable. We use a copy on write strategy. So basically whenever we set a new value for a new async local storage variable, we actually copy the existing frame into a new instance and then set that value, set that value there. So it's very important for, for performance. And we'll get into exactly why this works and how this works in a few minutes.
But the, you know, it's just important to understand that once a storage frame is created, that instance should remain unchanged. It is immutable. And what that allows us to do is anytime we want to reference that frame anywhere along in that promise chain or that async context flow, we can actually always retrieve the value that was set on that frame, on the relevant frame that was set, even though we might have any number of frames that have been copied since then.
We'll go into a little bit more detail on this, but, you know, the point of this is just to show the very simple primitives that are used here: a map, copy on write, right? We're just setting these values. It's just key values. That's it. Okay. So initially, when you start your application, there's always a frame. There's always going to be a current frame. That is initially empty. It has no values in it whatsoever.
The first time we call ALS run, and ALS being an async local storage, first time we call run, and we specify a value, what's going to happen is that initial empty frame is going to be copied into a new instance. And then we're in that new instance, we're going to set that ALS as the key equals this value that we passed in, this one, two, three in this case. And notice that when we're running inside that, inside ALS run, that new frame that we copied becomes the current.
That initial frame still exists. It's still there. Right? But that new frame becomes the current frame. And when we call ALS two, another instance of async local storage, we call ALS two run, we end up copying that second frame into a third, and then setting that ALS two value in addition to the ALS one, two, three that was already there. So in this case, we have three storage frame instances now. ALS run has been called twice, but then, right, on the inside, this second ALS two run, it's this third frame that actually becomes the current. The other two still exist.
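The three-frame walkthrough above can be observed from the outside with the real Node API. In this small sketch, the value 123 comes from the talk; the 'abc' value and the seen array are assumptions added for illustration.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

const als = new AsyncLocalStorage();
const als2 = new AsyncLocalStorage();

// Each entry records what [als, als2] look like at that point.
const seen = [];

// Outside any run(): the initial frame is empty.
seen.push([als.getStore(), als2.getStore()]);

als.run(123, () => {
  // Second frame: als has a value, als2 does not.
  seen.push([als.getStore(), als2.getStore()]);

  als2.run('abc', () => {
    // Third frame: a copy of the second, plus the als2 value.
    seen.push([als.getStore(), als2.getStore()]);
  });

  // Back in the second frame, which was never modified.
  seen.push([als.getStore(), als2.getStore()]);
});
```

The last entry is the interesting one: leaving the inner run brings back the earlier view of the world, which is what immutable frames make possible.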
6. Internal Working of ALS Run
ALS run sets a new frame as current and runs a callback. The original frame is restored afterwards. Each frame is a copy of the previous frame and sets a new value. This process ensures the right frame is used in each async activity.
The other two frames still exist. They haven't gone away yet. All right.
So we're essentially creating a stack of these frames, you know, these copies of these frames, and each copy is setting a new value in that map. All right. Okay.
So there's always a current storage frame. Run stores a reference to the current frame, copies the current frame, sets the new value, sets that new frame as current, runs that callback, and then restores the original frame. And this code right here is not the actual implementation. This is just the kind of a pseudocode implementation of this that shows the flow. Okay. So when run is called, this is the ALS run, we get the current frame. All right. Then we copy that current frame. We set that value that this in this case is the async local storage instance. So we're using that as the key. We set that to this value, and we set that new frame as current, run the callback. And notice in the finally, so whether the callback errors or not, whether it completes correctly or if there's an exception, we're still going to come back and we're going to restore the frame that was current when we started the run. All right. So we're basically just swapping these things in and out as we go through and run this code. Okay. All right.
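The flow just described can be modeled in a few lines. This is an illustrative sketch only, not the Node or Workers implementation: the class name SketchAsyncLocalStorage and the module-level currentFrame variable are assumptions, and a plain Map stands in for the frame.

```javascript
// Stand-in for the runtime's internal "current storage frame".
// Frames are treated as immutable: a frame is never changed after creation.
let currentFrame = new Map();

class SketchAsyncLocalStorage {
  run(value, fn) {
    const previous = currentFrame;  // remember the frame we started with
    const next = new Map(previous); // copy on write
    next.set(this, value);          // the instance itself is the key
    currentFrame = next;            // the copy becomes current
    try {
      return fn();
    } finally {
      currentFrame = previous;      // restored whether fn returns or throws
    }
  }

  getStore() {
    // Only ever looks at whatever frame is current.
    return currentFrame.get(this);
  }
}
```

Because the previous frame is never mutated, restoring it is just a reference swap; nothing has to be undone.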
So is that really just a map? There's a bit more to it than that. So we have these frames. Like I said, every time we call run, we go out and we copy this frame, we copy this map, add a new value, set it as current, all these kinds of things. But that doesn't quite give the entire picture because what we also have to know is like, how are we setting and resetting these things so that in every individual promise or every individual async activity that happens, the right frame is being used at any given time. So let's talk a little bit about how promise works. These are the internals. And I'm greatly oversimplifying this on purpose.
7. Internal Working of Promise
A promise has four values: a status (pending, resolved, or rejected), a result (value or exception), and two sets of reactions (resolve reactions and rejection reactions). The promise resolver modifies the state, and the micro task queue scheduler runs the tasks. The resolver sets the status and result, and passes the tasks to the scheduler to run.
But this should give a little bit of an understanding how this flow and what's going on under the covers. All right. So we have a promise, right? And inside a promise, we have basically four values. There's more than this, but like I said, I'm oversimplifying it. We have a status, which is the promise is either pending, resolved, or rejected. We have a result, which is either the value that resolved to or the exception. And we have a set of reactions, either resolve reactions or rejection reactions. There's also finally reactions, but I'm going to ignore those for now.
And what those are, are arrays of tasks, okay? For each of those. And tasks are, these are like the things to do after it resolves. So the things to do after it rejects. So these are the things that are set with then or catch or finally, that kind of thing. Okay. So the promise is that structure. Also, we have this promise resolver, which is the pair of functions, either the resolve or the reject, that modify that state. And then we have this micro task queue scheduler that actually runs the tasks.
Okay. So we have some bit of code. It calls resolve to resolve that promise. We're going to set the status to resolved. We're going to set the result to whatever value it is, and then we're going to take that array of tasks in the resolve reactions and pass those off to the micro task queue scheduler to run. And then at some point in the future, we'll tell the micro task scheduler, hey, run all of the tasks that you have collected up to this point. And the scheduler will go through and just start iterating through every single one of those tasks and say, okay, run this one, run this one, run this one. All right. That's an important step. Okay? So we have the promise, we have the resolver, we have the scheduler. The resolver sets the status of the promise, updates it, and causes the micro task scheduler to receive a set of tasks to run. All right.
8. Tasks and Promise Continuations
Tasks are used to perform the next step in a promise chain. A task is a function with resolve and reject functions attached to it. When a task represents a continuation of a resolved promise, its resolve and reject become the resolver for the next promise. Calling .then or .catch creates tasks that create new promises and return them.
Notice it says tasks and not functions. When you call .then on a promise, you're passing it a function. When you call .catch, you're passing it a function. And these are callbacks that are invoked. A task is not quite the same thing. A task is a bit more than just the callback. The task has the function, right, and then it has a resolve and reject function attached to it. Okay? And it is used to perform the next step in a promise chain.
So when the scheduler receives this thing, it's going to look at this. This resolve and reject is the resolver for the next promise that this task represents. So we took one promise, resolved it, and we have another task of then as a continuation, right? That is also represented by a promise. This resolve and reject, this task is the resolver for that second promise. Bit confusing. We don't need to go into all the details there. Like I said, I'm oversimplifying this quite a bit just to break it down. But this is what a task is, right? It's just a function, a callback that resolves another promise.
Calling then, calling catch, like I said, creates these tasks. So, this is essentially what is happening inside that promise then, or promise catch: we create another promise. We push the reaction into that promise reaction array. We're creating this task as the function, resolve and reject, and then we're returning the new promise that was created. Okay? Pretty straightforward, pretty simple.
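The promise, resolver, scheduler, and task pieces described in the last two sections can be put together in one deliberately oversimplified model. Everything here is an assumption made for illustration (SketchPromise, microtaskQueue, runMicrotasks); it is not how an engine implements promises, and it skips thenables, finally, and much else.

```javascript
// Queue of tasks waiting for the scheduler.
const microtaskQueue = [];

// The scheduler: run every queued task, including ones queued while draining.
function runMicrotasks() {
  while (microtaskQueue.length > 0) {
    const { task, value } = microtaskQueue.shift();
    try {
      task.resolve(task.fn(value)); // success settles the next promise
    } catch (err) {
      task.reject(err);             // failure rejects the next promise
    }
  }
}

class SketchPromise {
  status = 'pending';
  result = undefined;
  resolveReactions = [];
  rejectReactions = [];

  // The resolver pair: the only functions that change the state.
  resolve(value) { this.#settle('resolved', value, this.resolveReactions); }
  reject(error) { this.#settle('rejected', error, this.rejectReactions); }

  #settle(status, result, tasks) {
    if (this.status !== 'pending') return; // settle only once
    this.status = status;
    this.result = result;
    for (const task of tasks) microtaskQueue.push({ task, value: result });
  }

  then(onResolved = (v) => v, onRejected = (e) => { throw e; }) {
    const next = new SketchPromise();
    const resolve = (v) => next.resolve(v);
    const reject = (e) => next.reject(e);
    // A task is the callback plus the resolver for the next promise.
    const onOk = { fn: onResolved, resolve, reject };
    const onErr = { fn: onRejected, resolve, reject };
    if (this.status === 'pending') {
      this.resolveReactions.push(onOk);
      this.rejectReactions.push(onErr);
    } else {
      // Already settled: hand the matching task straight to the scheduler.
      const task = this.status === 'resolved' ? onOk : onErr;
      microtaskQueue.push({ task, value: this.result });
    }
    return next;
  }

  catch(onRejected) { return this.then(undefined, onRejected); }
}
```

Calling then never runs the callback directly. It only records a task, and the callback runs later, when the scheduler gets to it. That gap between creating a task and running it is where the storage frame has to be carried across.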
9. Working with Tasks and Async Context
Calling then, calling catch creates tasks that create new promises and return them. When creating a task, an additional field referencing the current storage frame is added. The task becomes a function with resolve, reject functions for the next promise, and a reference to the current frame. When the task is run, the scheduler stores a reference to the current frame, temporarily sets the captured frame as the current, runs the function, and restores the original frame in the finally block.
What about async context? Where does that play in all of this? Okay. So, what we do when we call, again, this is looking at then and catch. When we want to create that task, we have to add an additional field. And this additional field is a reference to whatever the current storage frame is. All right? So, like I said, you know, there's always a current frame. Initially, that's empty. But every time we call ALS run, we create a new frame and set it as the current. So, when I call promise.then, it's going to grab whatever the current frame is, a reference to that, and store that reference in the task itself. All right? So, the task is now a function, a resolve, reject for the next promise, and a reference to the current frame. Okay?
Now, when we run the task, again, this is in the scheduler. What the scheduler is going to do is say, okay, I want to get the current frame, whatever the frame is right now, before I run this task. And it's going to store a reference off to the side. And then it is going to take whatever storage frame that task has captured, and set it temporarily as the current. Okay? That's what we have here, current equals current frame. That's where we're setting the current one off to the side. We're replacing that with whatever one the task has captured. Then we run the function. All right? And again, if the function succeeds, we resolve the next promise with the result. If it throws, we reject. And notice in the finally, so whether it succeeds or fails, in the finally, we restore the frame that we set aside. So, every time we run a task: grab the current, replace it, run some code, restore the current. Okay? So, every time it runs, we're replacing this thing. So that when the task function runs here, it's going to see the frame that the task captured when the then was called, or when the catch was called. That's the one it's going to see as the current.
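A minimal sketch of the capture-and-restore step, under the same assumptions as the rest of this model: currentFrame stands in for the runtime's internal current frame, and createTask and runTask are hypothetical names for what then/catch and the scheduler would do.

```javascript
// Stand-in for the runtime's internal "current storage frame".
let currentFrame = new Map();

// What then()/catch() would build: the callback, the resolver for the next
// promise, and a reference to the frame that is current at creation time.
function createTask(fn, resolve, reject) {
  return { fn, resolve, reject, frame: currentFrame };
}

// What the scheduler would do for each task it runs.
function runTask(task, value) {
  const previous = currentFrame; // set the current frame aside
  currentFrame = task.frame;     // enter the frame the task captured
  try {
    task.resolve(task.fn(value));
  } catch (err) {
    task.reject(err);
  } finally {
    currentFrame = previous;     // restore, whether fn succeeded or threw
  }
}
```

The task only holds a reference to its frame. Nothing is copied when a task is created, which is why immutable frames matter: the captured frame is guaranteed to look the same when the task eventually runs.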
10. Working with ALS Frames
The current storage frame can be swapped in and out. Every call to ALS run creates a new frame. Calling ALS getStore retrieves the value from the current frame. Promise then, catch, and finally create new tasks that capture references to frames. Frames can be garbage collected and freed when there are no more references to them.
So, essentially, we're just swapping things in and out. Okay? All right. There's always a current storage frame. Initially empty. Every call to ALS run creates a new frame, that's a copy, plus the new value. Calling ALS getStore only looks at the current frame. So, it says whatever is current, get me the current value for that, if anything. Every call to promise then, catch, or finally creates a new task that captures a reference to the current frame. When a task is run, its captured frame is temporarily set as the current. And when there are no more references to a frame instance, that frame can be garbage collected and freed. So, for instance, like I said, there's initially an empty one. We call ALS run once. It creates a new frame and runs some code. If nothing in that run code captures a reference to that frame, so if we don't create another copy, if we don't create any tasks, there's nothing holding a reference to it. It's just another JavaScript object, which means it can be garbage collected and freed so we're not, you know, holding on to any memory, that kind of thing. So, it's important to note that I'm oversimplifying things quite a bit in here. There's a lot more going on under the covers, but this kind of gives you the basics, the high level of how all this works.
11. Working with Set Timeout and Async Context
When setTimeout is called, we capture a reference to the current frame and restore it when the function is invoked by the timer. Node's implementation is older and slower, copying frames for every promise and async resource. We want to improve this by using a model similar to other runtimes, such as Workers. Async Context is a TC39 proposal to add async local storage to the JavaScript language. Currently, async local storage is imported from node:async_hooks.
What about something like set timeout? Set timeout's not promise based. How does that work? Well, same way. When set timeout is called, we capture a reference to whatever the current frame is. When that function is actually invoked by the timer, the underlying timer, we restore that, right? But first, we capture the current frame, set it aside, restore the frame that we reference that we got when we called set timeout, invoke the callback function, and then restore the frame. So, we're just, you know, the entire time, we're just kind of swapping out these references so that the get current frame will always reflect the right one.
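The timer behavior can be checked with the real Node API. In this sketch the readLater helper, the 10 ms delay, and the 'a' and 'b' values are assumptions for illustration.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

const requestId = new AsyncLocalStorage();

// Resolves with the store value seen inside the timer callback.
function readLater() {
  return new Promise((resolve) => {
    setTimeout(() => resolve(requestId.getStore()), 10);
  });
}

// Two overlapping "requests". By the time either timer fires, both run()
// calls have long since returned, yet each callback sees the value that
// was current when its setTimeout() was called.
const results = Promise.all([
  requestId.run('a', readLater),
  requestId.run('b', readLater),
]);
```

Here results settles to the two values in order, 'a' then 'b', even though the callbacks run outside of any run call.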
Node does things a bit differently. Node's implementation, it's older. It uses async hooks and promise hook APIs. It's a lot slower. Frames end up getting copied every time a promise is created and every time an async resource is created, like a timer or next tick, rather than every time ALS run is called. And you can imagine that in an application that is creating tens or hundreds of thousands of promises, copying this frame every time a promise is created gets very expensive from a performance point of view. We are looking at improving this. Other runtimes, like Workers, use a model that's closer to what I described in here, where we're copying every time ALS run is invoked. We want to make Node's implementation like that. And we're using an obscure V8 API that's used in Workers, used in Chromium, that makes all of this a lot easier, and it's going to underlie some of the new things that are coming. I'm going to talk about it in just a second.
So, the Node implementation does work a little differently. It's a little slower. But we want to change it. We want to update it and basically modernize it. What is Async Context? Async Context is a new TC39 proposal. TC39 is the committee that actually standardizes the JavaScript language. The proposal is basically to add async local storage to the language itself, rather than it being this nonstandard API. But they are changing it. They are changing the name of it. Some of the details will change. So, right now, in order to use async local storage, you import it from node:async_hooks, and this is the same whatever platform you're on, whatever runtime you're on. You access it the same way. You're importing it from this node:async_hooks module.
12. Using Async Context with Async Local Storage
With Async Context, you don't have to import it anymore. It becomes a global. Get store is now get, and request ID run remains the same. There are some other differences with Async Local Storage API, but these are the key points. Just grab the frame, modify the value by copying it, and swap them in and out.
You create it with the new Async Local Storage constructor. You have getStore and run, right? With Async Context, for the most part, it's going to be a drop-in replacement. A few details change. You don't have to import it. It's going to be a global. getStore just becomes get, right? So, request ID get. Request ID run remains the same, exactly the same way. So if we look at the two, there are very minimal differences between these. There are some other differences with the Async Local Storage API, some things that Async Context will not pick up. I'm not going to go into those right now. They're not that important, and they're experimental features on Async Local Storage anyway, so it's not worth actually going into. These here are the pieces that you really need to understand.
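The difference in shape can be illustrated side by side. The talk does not name the proposal's constructor, and the proposal is not assumed to be available in any runtime, so this sketch uses a hypothetical VariableShim class built on AsyncLocalStorage purely to show get in place of getStore; the 'id-1' value is also an assumption.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

// Hypothetical stand-in so the snippet runs today. It only renames
// getStore() to get(); it is not the proposal's actual implementation.
class VariableShim {
  #als = new AsyncLocalStorage();
  run(value, fn, ...args) { return this.#als.run(value, fn, ...args); }
  get() { return this.#als.getStore(); }
}

// Today: imported from node:async_hooks, read with getStore().
const requestIdAls = new AsyncLocalStorage();
const viaAls = requestIdAls.run('id-1', () => requestIdAls.getStore());

// Shape described in the talk: same run(), but get() instead of getStore().
const requestIdVar = new VariableShim();
const viaVar = requestIdVar.run('id-1', () => requestIdVar.get());
```

Both viaAls and viaVar end up holding the same value, which is the sense in which the talk calls the new API close to a drop-in replacement.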
Okay, and that's it. So, I hope that that really helped. Just kind of draw back the curtain just a little bit so you can understand what is happening under the covers with Async Local Storage. Again, just grab the frame. We modify the value by copying it. We're just swapping those things in and out, and yeah, I hope that helps. Hope you enjoy the rest of your conference.
Adoption and Passing ALS Instance
64% of people have not used Async Local Storage before, which is not surprising. It's an API that most people are not aware of. The adoption of Async Local Storage seems to be quite high already. One common question is how to pass the ALS instance down the call stack without drilling it down. Different strategies include declaring it at a top-level scope or using a module as a dependency. These approaches help avoid passing it down directly.
First of all, let's take a look at the poll question that you posed to the audience before. The question was, have you used Async Local Storage within your application before? 64% of the people said no. Is that a big surprise to you? No, not a big surprise at all. It's one of those APIs that most folks aren't going to think about. They're not going to really know that it's there. I would have been way more shocked if those numbers had been flipped. Yeah, also, that was the second question in my mind. It's like, all right, how many people have used it before, but they didn't know that they used it? Probably quite a few. Now that it's starting to be incorporated in a number of frameworks and stuff, it's probably a lot more common to find under the covers. Yeah, I mean, I'm actually surprised that it's sort of one-third, two-thirds, that one-third have used it before, so the adoption seems to be quite high already.
Let's take a look at the questions that you guys submitted for James. Thank you so much. I'm just going to jump here to the first question. How would you suggest to pass the ALS instance to the code which needs it down the call stack? You still need to pass it, hopefully, without drilling it down. Yep, yep. This is actually one of the more common questions for ALS, so I'm really happy to see this one come through. It's clear that it doesn't actually solve the entire problem. How do you make this, you're not going to pass the log ID down or the request ID, but you still have to pass something down. All of the examples are going to show this thing as being declared as some top-level scope that whatever code needs to have access to it, to either setting the value or getting the value, needs to see. You can set that at global scope, you can set that in the top level. When I mean top level, the difference between that and the global, say, if it's a variable declared at the top of a function, and anything within it closes over it, or if it's on some context object that is available everywhere. It just has to be at some level that anything that needs it is going to be able to see it. Another strategy that I've seen is using a module for it where you can import that ALS instance from something as a dependency. Any other modules that need it can just do the same thing. They import that, get access to it, and you can set the value. There's a number of different strategies. Like I said, it doesn't solve every aspect of it, but it does at least give you part of the solution of avoiding passing things down.
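One of the strategies mentioned in the answer, keeping the instance in a scope that the code needing it closes over, might look like the sketch below. The factory createRequestContext and its two helpers are hypothetical names; in a module-based setup, the object it returns would be created once in its own module and imported wherever it is needed.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

// Hypothetical factory: the ALS instance lives in the closure, and callers
// only ever see the two helpers, never the instance itself.
function createRequestContext() {
  const requestId = new AsyncLocalStorage();
  return {
    withRequestId: (id, fn) => requestId.run(id, fn),
    currentRequestId: () => requestId.getStore(),
  };
}

// Created once; with modules this would be the exported value.
const context = createRequestContext();

// Any code that can see `context` can read the value without it being
// passed down the call stack.
function log(message) {
  return `[${context.currentRequestId()}] ${message}`;
}
```

As the answer says, something still has to be visible to the code that needs it; the difference is that it is one shared reference rather than a parameter on every function in between.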
Thank you. Thank you so much.
Memory Leaks, Performance, and Understanding APIs
Storing ALS in global scope can lead to memory leaks. Avoid using ALS in Node.js due to performance issues with async hooks. Understanding how APIs work internally is crucial for problem-solving and fixing bugs like memory leaks. It's important to look inside black boxes and remember our technical capabilities.
The next question is, are there any gotchas with memory leaks with ALS? Yes. With anything. If you're storing a value and it's going to be there, say, if you're putting it in the global scope, and you run this thing, anything that's going to hold on to a reference to that ALS instance, like again, if you're storing it in globalThis, that means that that value is likely to be retained much, much longer than you might have anticipated it to be. There's no way to unset that. In the Workers implementation, this frame that was talked about in the talk is basically a map. We copy that map every time we set a value. We reference count that thing internally so that when nothing else is pointing at it, nothing else needs it, and that reference count drops to zero for that particular copy, we can free that, the garbage collector can kick in and reclaim the memory. But if an implementation is not careful, it's possible, for instance, for the ALS to capture a reference to itself, in which case that thing is just going to stick around in memory and get put into old space very, very quickly. You've got to be careful. You've got to watch out.
Would you recommend avoiding ALS in Node.js until some of the performance issues are resolved? Unfortunately, yes. The implementation that is there is based on async hooks, and async hooks are very expensive to turn on. We want to fix this implementation. Bradley Farias has been working on this based on the implementation that's going into V8 for async context, and based on what we did in Workers, just modeling after that, it would be much, much faster. It's worth using it, but be aware that you're looking at potentially up to a 30% performance hit just turning on async hooks and using this, so you've got to be careful. Right. For now, just use it in runtimes like Cloudflare Workers, where that kind of stuff is optimized. Yeah, or places where the performance hit may not be as devastating, right? There are some service scenarios where a 30% hit is just not acceptable.
As the last question, and maybe you could provide us with a quick answer here, why is it important to understand how these APIs work internally? Isn't it just enough to know that it just works? You have to know how these things work just so you can solve problems, fix bugs like the memory leak issue, right? If you don't know how it works internally, then you'll never understand how to fix that memory leak and why the leak is actually there. For me, understanding the internals, the depths, everything that's happening under the covers is essential to writing good code. I also feel like that with more and more black boxes, obviously that we have to interact with every single day, it doesn't hurt every now and then to actually look inside of a black box and remind ourselves that we are in fact technically trained people. We are capable of understanding even this very deep down level stuff.
James, thank you so much for this amazing talk, and thank you so much for the contributions that you make to the Node.js community. Really awesome having you. Yeah, thanks for having me here. It's been fun.