Observability with diagnostics_channel and AsyncLocalStorage

Modern tracing products work by combining diagnostics_channel with AsyncLocalStorage. Let's build a tracer together to see how it works and what you can do to make your apps more observable.

Video Summary and Transcription
The video explores observability in Node.js using diagnostics_channel and AsyncLocalStorage. diagnostics_channel is a high-performance global event channel for passively broadcasting data about the current execution with minimal overhead, letting developers instrument applications without direct inter-module dependencies. Channels are created at the top level of a JavaScript file and can be subscribed to from anywhere in the application, which makes it practical to trace both sync and async code. AsyncLocalStorage, similar to lexical scoping but following the call path, propagates data through asynchronous callbacks and promise continuations without passing it as parameters; this is especially useful for correlating request IDs with log messages, as in the static file serving example. The talk then walks through building a tracer: a span is a container that stores data about the current execution along with IDs that form a graph, which is how tracers identify and correlate the different parts of execution in a Node.js application.

This talk was presented at Node Congress 2023.

FAQ

What is async local storage in Node.js?
Async local storage provides a way to store data that follows the call path rather than the scope of definition. Like lexical scoping, it lets code retrieve data defined elsewhere, but the data flows through asynchronous callbacks and promise continuations without having to be passed explicitly through function parameters.

How do you use async local storage?
Call the run() method on the store to set a value as the current state of the store for the duration of the given function. The value can later be retrieved with the getStore() method, ensuring that data flows with the execution context across asynchronous operations.

Why combine Diagnostics Channel with async local storage?
Combining the two allows decoupled code and detailed tracing of application behavior without direct inter-module dependencies. This supports efficient observability and debugging in Node.js applications by enabling passive observation and context tracking across asynchronous tasks.

How do you create a Diagnostics Channel?
Call the channel() function at the top of your JavaScript file and provide a name for your channel. You can then publish to the channel with its publish() method. Channels share a global namespace, so you can acquire the same channel anywhere in your application without explicitly sharing the channel object.

Can you subscribe to a channel for a module that is never loaded?
Yes. You can subscribe to channels for modules that are never loaded, which means events will never be published on them. This is particularly useful for tracing products designed to observe module behavior passively, without requiring an explicit connection between modules.

What is Diagnostics Channel?
Diagnostics Channel is a high-performance global event channel for passively broadcasting data about the current execution in Node.js applications. It works much like an event emitter but is optimized to be low-cost when no listeners are active, so developers can broadcast data without worrying about performance impact.

How does Diagnostics Channel stay fast?
Diagnostics Channel is designed to be as low-cost as possible when no subscribers are listening. This lets developers broadcast extensive data without significant performance penalties, keeping applications observable without degrading their operational efficiency.

1. Introduction to Diagnostics Channel

Short description:

Let's talk about observability with Diagnostics Channel and async local storage. Diagnostics Channel is a high-performance global event channel designed to be low-cost when nothing is actively listening. Channels are created at the top level of your JavaScript file and can be subscribed to from anywhere in the application. Each diagnostic channel should have a single object structure, and you can subscribe to channels for modules that are never loaded. An example is provided to demonstrate how to use the channel and publish data to it.

So let's talk about observability with Diagnostics Channel and async local storage. Hi there, I'm Stephen. I've been involved in Node.js core and the diagnostics working group for many years. I work at Datadog building diagnostics tools, and my pronouns are he/him.

So first of all, what is Diagnostics Channel? Diagnostics Channel is a high-performance global event channel for passively broadcasting data about the current execution. It's a lot like an event emitter, but specifically designed to be as low-cost as possible when nothing is actively listening. The idea is that users should feel comfortable broadcasting lots of things without worrying about the cost, since most of the time nothing will be observing.

So channels are created at the top level of your JavaScript file by calling the channel function and providing a name for your channel. Channels share a global namespace, allowing the same channel to be acquired anywhere in the application without needing to explicitly share the channel object, and your module name should generally be included in the name to avoid collisions with channels from other things. Once you have the channel object, you can start publishing to it. This is like the emit function on an event emitter, but creating channel objects ahead of time allows several optimizations, such as avoiding looking up the channel by name every time an event occurs, and making a publish call as close to zero-cost as possible when there are no listeners.

So each diagnostics channel should follow a convention of having a single object structure per channel; if you have differently shaped data sets to communicate, those should probably be separate channels. At the least, document the names and shapes of your channels. Anywhere in the application you can call channel again with the same name to acquire the same channel and then subscribe to it. The order of channel calls doesn't matter: whichever place calls it first will create the channel, and every subsequent call will retrieve it. You can even subscribe to channels for modules that are never loaded and therefore never have events published. This enables complete decoupling of code, allowing things like tracing products to passively observe your module behavior without needing any explicit connection between modules.

So let's look at an example. We start by defining our named channel at the top of the file. Then we write the function we want to broadcast some data about. When it gets called, it completes its internal setImmediate and calls its callback, broadcasting the data to the channel with the publish function. This could be handy for all sorts of things. For example, you might want to capture metrics of how many times doAThing did whatever it was supposed to do. The data could even be captured with the time of completion to chart activity over time. None of this needs to be specifically supported by doAThing, as the subscriber can decide on its own what to do with the message and whether it wants to capture any timing information. Publish is a no-op with essentially zero cost unless there are subscribers. Sometimes constructing the message itself can be costly if the thing runs very frequently, so the hasSubscribers property can be used to skip message construction entirely in performance-critical situations.
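A minimal sketch of that pattern; the doAThing function, the channel name, and the message shape are illustrative assumptions, not taken from the talk's slides:

```js
const dc = require('node:diagnostics_channel');

// Created once at the top level; the module name is included in the
// channel name to avoid collisions with other modules' channels.
const doneChannel = dc.channel('my-module.doAThing.done');

function doAThing(callback) {
  setImmediate(() => {
    const result = { status: 'ok' };
    // hasSubscribers lets us skip building the message entirely
    // when nobody is listening.
    if (doneChannel.hasSubscribers) {
      doneChannel.publish({ result, completedAt: Date.now() });
    }
    callback(null, result);
  });
}

// Elsewhere -- even in a module with no reference to this file -- the
// same name acquires the same channel, so the subscriber stays decoupled.
dc.subscribe('my-module.doAThing.done', (message) => {
  console.log('doAThing completed at', message.completedAt);
});

doAThing(() => {});
```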

2. Understanding Async Local Storage

Short description:

Async local storage is like lexical scope but follows the call path instead of the scope of definition. It allows us to propagate values through calls, callbacks, and promise continuations without needing to pass them as parameters. To use async local storage, you call the run() method on the store to pass the value, and retrieve it later with the getStore() method.

So what is async local storage? It's like lexical scope, but following the call path rather than the scope of definition. Lexical scope allows us to retrieve data from the scope outside of the function itself. However, when a calls b and we define 'thing' in an inner scope of a, there's no way to reach it in b. In more complicated situations, where we don't control the intermediate steps between where we've defined the variable and where we want to access it, passing it as a parameter to b is not feasible. So how can we get 'thing' into the event handler of 'something' when we can't pass it through the interface? This is where async local storage comes in. With async local storage, the value gets propagated through calls automatically without needing to be added to the arguments, and it flows into callbacks and promise continuations. This means that as long as a call, callback, or continuation path can be drawn between the two points, the value should be available. To pass the value through, you call the run method on the store, which will call the given function with that value as the current state of the store. To retrieve the value later, you call the getStore method.
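A minimal sketch of the a/b situation described above, assuming the function and variable names from the talk's description:

```js
const { AsyncLocalStorage } = require('node:async_hooks');

const store = new AsyncLocalStorage();

function b() {
  // 'thing' was never passed as a parameter, but it is reachable here
  // because b() runs within the call path started by store.run().
  console.log(store.getStore()); // => 'thing'
}

function a() {
  // run() sets the current state of the store for the given function
  // and for any async activity triggered within it.
  store.run('thing', () => {
    setImmediate(b); // flows through callbacks and promise continuations
  });
}

a();
```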

3. Understanding Async Local Storage and Tracing

Short description:

This means that async local storage allows values to be propagated through calls, callbacks, and promise continuations. A module for serving static files can use async local storage to store an ID and correlate it with log messages. Tracing a single action involves five events, including start and end events, async start and end events for callbacks, and an error event. Each event is assigned a separate channel, and they share a message object. Tracing async or sync code provides a scope for attributing activity to a message. Errors and return values are captured, and callback tracing includes error events and scoped execution with async start and end events.

This means that no matter how much complexity there is between the two points, as long as a call, callback, or continuation path can be drawn between them, the value should be available. To start passing the value through, you call the run method on the store, and it will call the given function with that value as the current state of the store until the function ends; any async activity triggered within that scope will also contain that value. To retrieve the value from the store, you can call the getStore method later.

So here we have a module for serving static files. It takes the folder to serve from and a log function. Notice that it returns a request handler function to be called later, so it's designed to be called at the top level, outside of the request, to produce the handler. That means there's nothing to pass any sort of request identity through, and no way to change the interface to differentiate concurrent requests. It would be very difficult to identify in the log which loading message pairs with which downloading message if the path is the same. With async local storage, we can store an ID which we can retrieve in the log function we provide, even though the log function is defined outside the request. Now we can stamp the request ID on every message, so we can correlate the exact request to every log message.
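A sketch of how such a module might look, assuming a plain HTTP server; serveStatic, the log format, and the use of randomUUID for the request ID are illustrative choices:

```js
const http = require('node:http');
const path = require('node:path');
const { createReadStream } = require('node:fs');
const { randomUUID } = require('node:crypto');
const { AsyncLocalStorage } = require('node:async_hooks');

const als = new AsyncLocalStorage();

// Defined outside the request, yet it can still see the current
// request's ID through async local storage.
function log(message) {
  console.log(`[${als.getStore()}] ${message}`);
}

// Called once at the top level; nothing request-specific is passed in,
// and the handler's interface never mentions a request ID.
function serveStatic(folder) {
  return (req, res) => {
    als.run(randomUUID(), () => {
      const file = path.join(folder, req.url); // no path sanitization -- just a sketch
      log(`loading ${file}`);
      const stream = createReadStream(file);
      stream.on('error', (err) => {
        log(`error: ${err.message}`);
        res.statusCode = 500;
        res.end();
      });
      stream.on('end', () => log(`done downloading ${file}`));
      stream.pipe(res);
    });
  };
}

http.createServer(serveStatic('./public')).listen(3000);
```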

So now let's put the two together. This next part is a bit dense, but it's a lower-level manual approach which is getting simplified, and I'll get to that in a bit. Tracing a single action can be expressed with five events: when the call starts and ends, an equivalent async start and end around callbacks, and an event for errors. The tracing prefix and the event-name suffixes in those channel names are important. The middle part can be whatever you want; as stated earlier, it's a good idea to include your module name in the channel name, so module.function is a reasonable naming pattern, but you can follow whatever pattern you like for the middle part as long as you're consistent about it and document it clearly.

Each of these events is given a separate channel, allowing subscribers to listen selectively to only the events relevant to their particular use case. All events within a single action share a single message object to allow sharing data between event handlers, so the end event will receive the same message object as the start event if they both come from the same traced call. If an error occurs, an error property will be added to the message, and the message will be published to the error channel. A result property also gets added to the message, appearing on the synchronous end event or the async start event.

When tracing sync code, the start and end events provide a scope in which any sync or async activity will be attributed to the given message. Trace sync doesn't need to associate async activity, but it provides a shared message object, making it easy to correlate the events from a single traceable action. If the traced function throws, the error will be added to the message object and an error event will be published. The return value is also stored on the message object and can be seen in the end event.

Tracing callbacks starts the same as sync tracing: callback tracing has a start and end event around the sync call and captures thrown errors. Additionally, callback errors are captured with error events, and callback execution is scoped by the async start and async end events. This is helpful for restoring span scope with async local storage if necessary.

For promise tracing, everything from sync tracing applies the same. Promise rejections are captured and published to the error channel, and async start marks the completion of the async task. For consistency with callbacks, async end is also published, though it's semantically identical to async start in this case.
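A hedged sketch of that manual five-channel approach, wrapping the hypothetical doAThing from the earlier example; the tracing:my-module.doAThing names follow the convention described above. (Node.js has since packaged this pattern into a tracingChannel helper on node:diagnostics_channel.)

```js
const dc = require('node:diagnostics_channel');

// Five channels for one traced action: tracing:<module.function>:<event>.
const ch = {
  start: dc.channel('tracing:my-module.doAThing:start'),
  end: dc.channel('tracing:my-module.doAThing:end'),
  asyncStart: dc.channel('tracing:my-module.doAThing:asyncStart'),
  asyncEnd: dc.channel('tracing:my-module.doAThing:asyncEnd'),
  error: dc.channel('tracing:my-module.doAThing:error'),
};

function tracedDoAThing(callback) {
  // One message object is shared by every event from this single action.
  const message = {};
  ch.start.publish(message);
  try {
    doAThing((err, result) => {
      if (err) {
        message.error = err;
        ch.error.publish(message);
      } else {
        message.result = result; // for callbacks, result rides on asyncStart
      }
      ch.asyncStart.publish(message);
      try {
        callback(err, result);
      } finally {
        ch.asyncEnd.publish(message);
      }
    });
  } catch (err) {
    // Synchronously thrown errors also land on the shared message.
    message.error = err;
    ch.error.publish(message);
    throw err;
  } finally {
    ch.end.publish(message);
  }
}
```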

4. Understanding Tracing and Span Lifecycle

Short description:

Due to the infinitely chainable nature of promises, there's no distinct end point for a promise, so the async end is triggered immediately. Now that traces are being published to tracing channels, that data must be transformed into something which meaningfully represents the collection of lifecycle events: a span. The span object is stored in async local storage using enterWith, to be made available to descending async activity as the parent, bound in the start event and restored in the end event. Tracers generally have an end marker for when the span ends, but when that happens varies: some tracers mark the span as done once the processing of the async task itself is done, while others consider themselves as owning their callback and anything descending from it, so they might trigger the end in the async end or async start event, or even pass the span through a whole graph and not complete it until the whole tree of spans completes.

Due to the infinitely chainable nature of promises, there's no distinct end point for a promise, so the async end is triggered immediately. This makes sense anyway, as promises are meant to model an async graph resolving and merging back, as expressed with async/await. There wouldn't be anything which descends internally like callbacks do, so the graph will have merged back into the awaiting code.

So, storing traces: now that we're publishing traces to tracing channels, we need to transform that data into something which meaningfully represents that collection of lifecycle events. This is typically called a span, which contains the metadata marking the logical start and end from the tracer's perspective, and captures whatever metadata it needs to identify what the application was doing that produced this span. For example, a span for an HTTP request would likely contain the method and URL. Here we're just passing in the message as-is, but realistically a tracer would do some additional processing.

The span object is then stored in async local storage using enterWith, to be made available to descending async activity as the parent, bound in the start event and restored in the end event. Using this enterWith method is not really recommended, but it's how we have to do things at the moment; better things are coming, which I'll get into in a moment. Between the async start and end events, marking the callback scope, the span stored on the message can be restored as the current span. This shouldn't actually be necessary, as async local storage is meant to handle it, but it can be handy: there are cases where context loss can happen, so this enables manually restoring the context if necessary. Tracers will generally have an end marker for when the span ends, but when that happens varies. Some tracers will mark the span as done once the processing of the async task itself is done, but some consider themselves as owning their callback, and therefore anything that happens within the callback and descending from it, so they might trigger the end event in the async end or async start handler, or even pass the span through a whole graph and not complete the span until the whole tree of spans completes.
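A sketch of a subscriber that builds spans from those events, assuming the five channels from the previous sketch; the span shape and the reportSpan function are hypothetical stand-ins for what a real tracer would do:

```js
const dc = require('node:diagnostics_channel');
const { AsyncLocalStorage } = require('node:async_hooks');
const { randomUUID } = require('node:crypto');

const als = new AsyncLocalStorage();

function reportSpan(span) {
  // Hypothetical reporter: a real tracer might aggregate spans into a
  // trace locally or stream each span immediately.
  console.log('span finished:', span);
}

dc.subscribe('tracing:my-module.doAThing:start', (message) => {
  const parent = als.getStore();
  // Link by IDs rather than holding the parent span object directly.
  const span = {
    spanId: randomUUID(),
    parentId: parent?.spanId,
    traceId: parent?.traceId ?? randomUUID(),
    startTime: Date.now(),
  };
  message.span = span;
  message.parentSpan = parent;
  // enterWith() makes the span current for descending async activity.
  als.enterWith(span);
});

dc.subscribe('tracing:my-module.doAThing:asyncStart', (message) => {
  // Restore span scope inside the callback, guarding against context loss.
  als.enterWith(message.span);
});

dc.subscribe('tracing:my-module.doAThing:end', (message) => {
  message.span.endTime = Date.now();
  reportSpan(message.span);
  als.enterWith(message.parentSpan); // restore the parent binding
});
```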

5. Understanding Spans and Tracers

Short description:

A span is a container to store the data about the current execution along with IDs to form a graph. Holding a parent span object directly can cause memory leaks and complicate event streaming, so it's generally preferred to use IDs for correlation linkage. When the end is triggered, the span is reported to the tracer.

A span is a container to store the data about the current execution, along with IDs to form a graph. Typically spans have their own unique ID and a separate unique ID for their graph, to correlate them later. Holding a parent span object directly can cause memory leaks and complicate event streaming, so it's generally preferred to use IDs for correlation linkage. When the end is triggered, the span is reported to the tracer. Like I said before, depending on the implementation of the tracer, this may aggregate into a trace object locally and be sent when the request completes, or the individual spans may be streamed immediately. It's up to the implementer.

Stephen Belanger
21 min
17 Apr, 2023

The internet offers thousands of articles and free of charge courses, showing how it is easy to train and deploy a simple AI model. At the same time in reality it is difficult to integrate a real model into the current infrastructure, debug, test, deploy, and monitor it properly. In this workshop, I will guide you through this process sharing tips, tricks, and favorite open source tools that will make your life much easier. So, at the end of the workshop, you will know where to start your deployment journey, what tools to use, and what questions to ask.