What's in a Node.js Bug – A Case Study

She is fueled by a passion for Node.js and its community.

Anna performs a deep dive into the anatomy of a regression that affected Node.js users in development and production in the past year, and analyzes how it gives us insight into how JS engines work under the hood and how Node.js itself is currently being developed.

This talk was presented at Node Congress 2025.

FAQ

The talk is about a Node.js bug related to character encoding and performance issues.

Anna is very passionate about character encoding and has given talks on the topic.

The speaker is Anna, a staff engineer at MongoDB and a former Node.js core contributor.

Character encoding is the conversion of characters into byte sequences to communicate with the operating system. It is important for representing different languages and characters in software applications.

Common character encodings mentioned include ASCII, ISO 8859-1 (Latin-1), UTF-8, and UTF-16.

The bug was discovered through bug reports in the Node.js core GitHub repo, indicating issues with UTF-8 and character encodings.

UTF-8 is popular because it is backward compatible with ASCII and can represent a wide range of characters.

It is advised to be careful with using the latest Node.js versions in production and to test against them if possible.

The bug involves a non-deterministic issue with UTF-8 and character encodings, triggered after many iterations in a Node.js application.

Anna is a staff engineer at MongoDB working on the developer tool suites.

node.js

Anna Henningsen

23 min

17 Apr, 2025


Video Summary and Transcription

I'm going to talk about character encodings, specifically a Node.js bug related to UTF-8. Two popular encodings are UTF-8 and UTF-16. The bug was discovered in August 2024 and was traced to a specific pull request in the Node.js core repository. The bug is caused by the fast write string method being used for UTF-8 even though it only copies the bytes of a one-byte (ISO 8859-1) string. Lessons learned include the importance of naming conventions and thorough testing beyond coverage analysis.

1. Introduction to Character Encodings

Short description:

I'm going to talk about a Node.js bug. I'm Anna, a staff engineer at MongoDB and a former Node.js core contributor. I'm passionate about character encoding. Let's discuss what character encodings are. Character encodings are conversions that everyone agrees on to talk to the outside world. Early standardized character encodings include ASCII and ISO 8859-1. Unicode came along to represent more characters by assigning a number to each character and then assigning a sequence of bytes to that number.

Yeah, hi, everyone, and thanks for joining this talk. So, I'm going to be talking about a Node.js bug.

Some quick intro for myself. Hey, I'm Anna. I am currently a staff engineer at MongoDB working on the developer tool suites. If you have ever used MongoDB, I'm sure a lot of you have at some point. I am working on a couple more things. I am also a former Node.js core contributor, have been quite active in the past, and a technical steering committee member on that side. I am very, very passionate about character encoding. I have given talks on that topic before. I know it may seem a little boring sometimes, or not like it's, you know, it isn't like bleeding edge technology, but it's still something that I think always makes for interesting conversations. This is how you can reach me. I also uploaded the slides at this link here, if you want to look for them at some point later.

Before we actually talk about the Node.js bug that I was referring to, let's do a quick refresher about what character encodings actually are. So, in a typical application, you have your code that you run. That's a separate program. And that somehow talks to an operating system or kernel, which takes care of talking to the outside world for your application. And the way we've built software, it happens to be the case that this operating system or kernel mostly receives and sends out information in the form of byte sequences. But your application typically works with character sequences, which are usually called strings. And so, in order to be able to work with strings inside your application, you need to have some kind of conversion that everybody agrees on in order to talk to the outside world. And those conversions are called character encodings.

Some of the early standardized character encodings: everybody knows ASCII, I'm sure. That was a way to encode most English language characters, and it still is. But that means that there are characters that you cannot represent. And then over time, obviously, people recognized the need to be able to represent other characters. So, one of the more popular ones, which is also special for historical reasons, is ISO 8859-1, which at least covers a lot of Central European characters. But obviously, that also reaches its limits. If you want to be able to represent Chinese characters, you're going to have to come up with something new, where you cannot represent every character by a single byte. And so, Unicode came along, and this started to be really popular in the late 90s, early 2000s. And that essentially converted this process into a two-step process where to each character that you want to be able to represent, you assign a number. And then to that number, you assign a sequence of bytes.
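
The two-step mapping can be made concrete in Node.js. The following is a minimal sketch that uses only the built-in `String.prototype.codePointAt` and `Buffer.from`; the helper name `describeCharacter` is made up for illustration and does not come from the talk.

```javascript
// Hypothetical helper: shows both steps of the Unicode mapping for one character.
function describeCharacter(ch) {
  // Step 1: character -> number (the Unicode code point).
  const codePoint = ch.codePointAt(0);
  // Step 2: number -> byte sequence, which depends on the chosen encoding.
  return {
    codePoint: 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0'),
    utf8: Buffer.from(ch, 'utf8').toString('hex'),
    utf16le: Buffer.from(ch, 'utf16le').toString('hex'),
  };
}

// 'A' (ASCII)
console.log(describeCharacter('A'));
// { codePoint: 'U+0041', utf8: '41', utf16le: '4100' }

// e with acute accent (in ISO 8859-1, outside ASCII)
console.log(describeCharacter('\u00e9'));
// { codePoint: 'U+00E9', utf8: 'c3a9', utf16le: 'e900' }

// A Chinese character (outside ISO 8859-1)
console.log(describeCharacter('\u4e2d'));
// { codePoint: 'U+4E2D', utf8: 'e4b8ad', utf16le: '2d4e' }
```

The same code point yields different byte sequences depending on the encoding, which is why both sides of a conversion have to agree on one.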

2. Character Encodings in JavaScript

Short description:

Two popular character encodings relevant today are UTF-8, which is backwards compatible with ASCII and does something special for characters outside the ASCII range, and UTF-16, which is not compatible with ASCII. V8 has many different internal representations for strings; a concatenated string, for example, can be represented internally as a reference to two other strings rather than as a single sequence in memory. JavaScript engines are smart about representing strings internally.

And so, two of the more popular ones, and the ones that are particularly relevant for this conversation today, are, one, UTF-8, which is backwards compatible with ASCII as far as the byte sequences are concerned. It just starts to do something special for characters that are outside the ASCII range. And, two, UTF-16, which is not compatible with ASCII at all. But if you look at these in more detail, you can see that there are still some shared representations of characters and some shared history when it comes to how exactly strings are being represented.
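
The compatibility claims can be checked directly with `Buffer`. This is a small sketch, not something shown in the talk; the sample strings are arbitrary.

```javascript
const asciiText = 'Node.js';

// For ASCII-only text, the ASCII, Latin-1 (ISO 8859-1) and UTF-8 encodings
// all produce identical bytes.
const sameBytes =
  Buffer.from(asciiText, 'ascii').equals(Buffer.from(asciiText, 'utf8')) &&
  Buffer.from(asciiText, 'latin1').equals(Buffer.from(asciiText, 'utf8'));
console.log(sameBytes); // true

// UTF-16 uses two bytes even for ASCII characters, so it is not ASCII-compatible.
console.log(Buffer.from('A', 'utf16le')); // <Buffer 41 00>

// Outside the ASCII range, Latin-1 and UTF-8 diverge.
const eAcute = '\u00e9';
console.log(Buffer.from(eAcute, 'latin1')); // <Buffer e9>
console.log(Buffer.from(eAcute, 'utf8')); // <Buffer c3 a9>
```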

And so, I think most of you as JavaScript developers will have heard the claim that JavaScript uses UTF-16, right? That's how strings work in JavaScript. And that's not entirely wrong, but it's also not true. And so, if we look into the source code of the JavaScript engine that Node.js uses and that Google Chrome uses, and that a lot of JavaScript applications use these days, we can look for all the string types that this implementation defines, and we can see there's actually a lot of very different representations for strings. And there are two that are particularly worth highlighting, which I'll show you in a second.

So, if you want to inspect, for example, how V8 internally represents this string here that I'm building together, it's a concatenation and then a repetition and a substring of that. There actually is a way to do that. You need to expose V8 internals for that. So, this is not something you could do in a production application. But there is this debug print helper that V8 provides, and you need to pass a special flag in order to enable it. And you can actually look at, you know, hey, what does this string contain? And so, for example, for this one, it's going to say it's a cons string type, which means it's concatenated. It is actually internally represented as the concatenation of two strings, not a single sequence in memory. And so, if we break this apart, we can see that, yes, it's a concatenation of a single-byte string, or one-byte string as V8 calls it, and a sliced string, which is backed by a two-byte string. And we'll get in a bit into why that's terrible naming. But the point that I'm trying to make here is that JavaScript engines are going to be smart about how they internally represent strings.
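
The inspection can be tried along these lines. This is only a sketch: the string below is an arbitrary stand-in for the one on the slide, and it assumes the helper in question is V8's `%DebugPrint` native, which is only parsed when Node.js is started with `--allow-natives-syntax`. What gets printed depends on the V8 version and build, so no output is shown here.

```javascript
// Run with: node --allow-natives-syntax inspect-string.js
// (the file name is hypothetical)

// Builds a string from a repetition, a substring and a concatenation,
// mixing Latin-1-only text with text that needs two bytes per character.
function buildExampleString() {
  const latin1Part = 'hello w\u00f6rld, '.repeat(3);
  const widePart = '\u65e5\u672c\u8a9e\u306e\u30c6\u30ad\u30b9\u30c8'.repeat(3).substring(2);
  return latin1Part + widePart;
}

// Calls %DebugPrint if natives syntax is enabled. Using new Function defers
// parsing, so this file still loads when the flag is missing.
function debugPrint(value) {
  try {
    new Function('v', '%DebugPrint(v);')(value);
    return true;
  } catch {
    return false; // flag not passed: the % syntax is a SyntaxError
  }
}

if (!debugPrint(buildExampleString())) {
  console.log('Re-run with --allow-natives-syntax to see V8\'s internal view.');
}
```

Whether V8 picks a cons or sliced representation for a given string is an engine decision, so the result may differ from what the talk shows.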

3. Node.js Bug and UTF-8 Character Encodings

Short description:

In August 2024, a Node.js bug related to UTF-8 and character encodings was discovered. The bug reports indicated that the issue was sporadic and non-deterministic, leading to an investigation into how JavaScript engines and Node.js core were handling strings. A test script was created to reproduce the issue, which involved converting strings to buffers using UTF-8 and converting them back. The bug was traced to a specific pull request in the Node.js core repository related to buffers and one-byte strings.

So, anyway, let's look specifically at what happened in this Node.js bug that this talk is supposed to be about. So, this happened in August of 2024. You can go to the Node.js issue tracker and see them. These are all from the Node.js core GitHub repo. And you can see that there are a ton of bug reports about something weird happening with UTF-8 and character encodings. And if you look at these reports in a bit more detail, there is something interesting about them that they all share. And this is these specific snippets: it's flaky. It's not deterministic. It fails after it has already run for a bit. It happens sporadically.

So, that's very interesting when you look at a bug that happens that way. Because most bugs are either, you know, something's broken or it's not. But here it seems like there's something more going on. And that actually leads into a very interesting rabbit hole about how JavaScript engines work these days and what Node.js core people are working on these days. But, as with any bug, let's start with trying to get a minimal reproduction. So, I'm writing this test script. It's very simple. It takes the string. It converts it from a string to a buffer using UTF-8, and converts it back using UTF-8. So, in theory, we should just get the same string back. However, apparently, that doesn't happen. And if we measure how long it takes for that to happen, we can see that it is actually non-deterministic. It's somewhere around 10,000 iterations of this loop. But not exactly. And so, it's still good enough as a reproduction to see whether this is failing or not, right?
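
A reproduction script in this spirit could look like the sketch below. The exact script from the slides is not part of the transcript, so the function name, the test string and the iteration limit are assumptions. On a Node.js release without the bug, the loop is expected to finish without a mismatch.

```javascript
// Returns the first iteration at which the UTF-8 round trip fails,
// or -1 if it never fails within maxIterations.
function findFirstRoundTripFailure(input, maxIterations) {
  for (let i = 0; i < maxIterations; i++) {
    // string -> bytes (UTF-8) -> string (UTF-8) should be the identity.
    const roundTripped = Buffer.from(input, 'utf8').toString('utf8');
    if (roundTripped !== input) return i;
  }
  return -1;
}

// 't\u00e9st' contains an accented e, which is inside ISO 8859-1 but outside ASCII.
const firstFailure = findFirstRoundTripFailure('t\u00e9st', 100000);
console.log(
  firstFailure === -1
    ? 'round trip always succeeded'
    : `first failure at iteration ${firstFailure}`
);
```

Because the function reports the failing iteration, running it several times on an affected build would show the varying count the talk describes.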

So, we can do the next thing that you typically do when you have a bug that you don't really know where it's coming from. You do git bisect. And git bisect points to this particular pull request in the Node.js core repository. So, some change that has to do with buffers, with one-byte strings, as they call it here in this PR and in the V8 API. And, yeah. So, this has something to do with it for sure.

4. Node.js Bug: Set Fast Method Calls

Short description:

The addition of set fast method calls in the Node.js code allows C++ methods to be exposed as JavaScript functions. There are three different methods for ASCII, Latin1, and UTF-8, which share the same implementation of the fast write string method. This shared implementation is significant in understanding the cause of the bug.

And we'll look a bit more into the exact diff here. And so, there is this addition of set fast method calls, and this is in the part of the file where C++ methods get exposed to JavaScript as JavaScript functions. And we can see that these three different methods, one for ASCII, one for Latin1, which is another name for ISO 8859-1, and one for UTF-8, share the same implementation of the fast method, which is just fast write string, but have different implementations of the slow one. And that's already interesting, because, you know, let's assume that these functions do what they're named after, right? And so, they convert JavaScript strings into byte sequences. They write them into a buffer. That's why it's called write string. And the fact that they all share the same implementation in the fast case, that's already interesting, right? So, if we look at that particular method a bit more, this fast write string, we kind of know where this bug is coming from. And we have all the puzzle pieces now at this point.

5. Node.js Bug: Fast API and Performance Optimization

Short description:

Node.js core contributors care about performance and the need for fast software. The introduction of set fast method calls in Node.js aims to optimize core work by reducing the cost of boundary crossings between C++ and JavaScript. This is done through the fast API, an experimental treasure trove of optimizations for JavaScript runtimes. However, the fast API has specific preconditions and only works one way, allowing calls from JavaScript to C++. It requires data to be laid out in a specific way and prohibits actions that trigger garbage collection.

So, what specifically happened here and why? So, first of all, this PR comes from the fact that Node.js core contributors care about performance. Part of that is the fact that in recent years, as y'all know, there have been alternative server-side JavaScript runtimes, or, you know, outside-of-the-browser runtimes, that are claiming to be faster. And I'm sure in some cases they actually are. Also, there's obviously some actual advantage to having fast software. It means your applications are spending less time doing things, and that can translate to real savings if you run your application in the cloud and you suddenly need fewer resources. That's always very, very nice for your wallet.

So, doing this kind of work is also a good way to contribute to Node.js. Because I know a lot of people want to. And, you know, if you come with the idea of wanting to contribute, you don't always have an idea of what you would want to contribute and which areas you would want to work on. But making things faster is something that you can always do, for most pieces of software, at least.

So, these set fast method calls that we saw earlier are part of something that's called the fast API in V8. And we kind of need to understand what this specifically is to understand why this bug is happening and why it is happening in this specific way. So, in the past, one of the easiest ways to optimize Node.js core has been to eliminate boundary crossings between C++ and JavaScript. So, your JavaScript application calls into C++. Or vice versa, if there's some event happening on the event loop, which happens in C and C++, and it needs to inform your application that something happened, it needs to call back into JavaScript. And these have always been fairly expensive parts of your applications. Because there's a lot of setup work that JavaScript engines do before and after you enter one of these specific areas of your code, if you want to call it that. And so, to solve this problem, V8 introduced something called the fast API. As far as I know, it's still technically experimental. But it has been proving itself to be a real treasure trove of optimizations for JavaScript runtimes. And so, that eliminates a lot of this overhead of C++ and JavaScript boundary crossings. It only works one way. So, you can only call C++ from JavaScript code, not the other way around. And it has some preconditions. You cannot always just make everything use the fast API and be done with it and never worry about it again. It only applies in some very specific situations that your JavaScript engine can detect at runtime. So, your data needs to be laid out in a way that allows for this optimization. The fast API function may not do anything that would trigger garbage collection. So, no creating new objects. No creating new strings, you know.

6. Node.js Bug: Fast Write String Method

Short description:

This bug in Node.js occurs once a particular function, buffer.from in this case, has been called often enough for V8 to optimize it, which happens on a separate thread and therefore after a non-deterministic number of calls. The bug arises because the fast write string method copies the bytes of a one-byte string unchanged into the destination, which is incorrect when it is used for a different character encoding such as UTF-8.

And it cannot call back into JavaScript. Because that could obviously also trigger garbage collection. Because you don't know what your JavaScript functions do. And so, the idea is, you write two implementations of a C++ function. One that is slow and one that is fast. And the fast calls are only being used after V8 detects that your code has run a lot. And so, it needs to optimize it. It wants to optimize it. And it recompiles your JavaScript function that was called a lot to be able to make use of the fast API. And one particular thing that's worth calling out about this is that V8, again, is a very smart piece of software. It may do this asynchronously.

So, like, it may take your JavaScript code, notice it has been called a lot, notice that it should be optimized, but then do that optimization on another thread in your application. So, like, it doesn't stop your code from running. It just, like, says, hey, this function needs to be optimized, recompile it, hands it off to another thread. And once it's done, it starts using that new implementation of your function. Super smart, really, if you think about it. And so, we go back to the example from earlier, and we look at this, and it kind of all comes together, right?

So, the reason why this happens after a flaky number of iterations, and only after a bunch of them, is that this function, this buffer.from in this case, needs to be called a lot in order to make the engine realize, hey, it's worth spending time to optimize this. It's worth spending energy on. Because the optimization happens in a separate thread, it only takes effect after an unknown, non-deterministic number of iterations. So, that is definitely where this bug comes from. And so, I told you there are two of these string representations that are worth paying special attention to, right? And so, one of them, one byte string, that's V8's name for ISO 8859-1, and two byte string, that's V8's name for UTF-16. And so, seq stands for sequential. So, those are the sequential representations where a string is really just one sequence of bytes in memory without any special substring or slice or concatenation or anything like that.

So, if we look again at this slide from earlier, where we looked at the implementation of the fast write string method, we can see, okay, so, this fast write string method has the prerequisite that the second argument it gets is a one byte string. And if you remember the previous slide, that means it's an ISO 8859-1 string. And so, it copies this into the destination from the source, from this ISO 8859-1 string, and this method doesn't do any other checking. So, if you remember, we were using this implementation for UTF-8 as well. And so, that's not what it should be doing. It just literally copies byte sequences from something that is one thing into something that's supposed to be something else. That's also why this bug, for example, didn't reproduce with all the characters.
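
The interaction between "only the hot path is wrong" and "the fast path copies ISO 8859-1 bytes" can be imitated in plain JavaScript. The following is a toy model only: the names `makeTieredUtf8Writer` and `hotThreshold` are invented, the switch here happens at a fixed call count whereas V8 decides asynchronously, and the model ignores the precondition that the real fast path only applies to one-byte strings.

```javascript
// Toy model of a function with a correct slow path and a buggy fast path.
function makeTieredUtf8Writer(hotThreshold) {
  let calls = 0;
  // Slow path: encodes the string as UTF-8, as intended.
  const slowWrite = (str) => Buffer.from(str, 'utf8');
  // Fast path modelled on the bug: one byte per character (ISO 8859-1),
  // with no re-encoding to UTF-8.
  const buggyFastWrite = (str) => Buffer.from(str, 'latin1');
  return (str) => {
    calls++;
    return calls > hotThreshold ? buggyFastWrite(str) : slowWrite(str);
  };
}

const write = makeTieredUtf8Writer(3);
const outputs = [];
for (let i = 0; i < 5; i++) {
  // The caller always decodes as UTF-8, because that is what it asked for.
  outputs.push(write('t\u00e9st').toString('utf8'));
}
console.log(outputs);
// The first three entries are the original string; in the last two the
// accented e has been replaced by U+FFFD, because the lone byte 0xE9 is
// not valid UTF-8.
```

With a threshold that varies from run to run instead of a fixed one, this would produce the "fails after a while, but not at a fixed point" behaviour from the bug reports.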

7. Node.js Bug: Lessons Learned

Short description:

This bug occurs due to the mismatch between the UTF-8 and ISO 8859-1 encodings, where certain characters trigger the bug while others are preserved. The bug was not caught during testing due to limitations in coverage and the absence of result checks in benchmarks. The Node.js testing process also failed to catch the bug, as it only occurs when a specific piece of code runs frequently. There are several lessons to learn from this, such as the importance of naming conventions and the need to thoroughly test code changes beyond coverage analysis.

Like, if you look at this slide again, you see it only happened with the accented E, whereas, for example, the other characters were preserved, because the accented characters are the ones where UTF-8 and ISO 8859-1 actually differ. This also, for example, wouldn't happen if the string contained any Chinese characters or emoji, because then it wouldn't have this prerequisite fulfilled of being a fast one byte string, that is, something that is representable using ISO 8859-1. And so, yeah, that's where this comes from.
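
The three cases can be summarised in a small classifier. This is a sketch under an assumption: it judges a string by its content, while V8 decides by the string's internal representation, so a string containing only ISO 8859-1 characters may still be stored as a two-byte string. The function name and labels are invented.

```javascript
// Hypothetical classifier for how a string relates to the bug described above.
function classifyForBug(str) {
  let maxCodeUnit = 0;
  for (let i = 0; i < str.length; i++) {
    maxCodeUnit = Math.max(maxCodeUnit, str.charCodeAt(i));
  }
  // ASCII only: ISO 8859-1 and UTF-8 bytes are identical, so nothing breaks.
  if (maxCodeUnit <= 0x7f) return 'ascii-only';
  // Representable in ISO 8859-1 but not ASCII: the encodings differ,
  // so these are the strings that could be corrupted.
  if (maxCodeUnit <= 0xff) return 'latin1-non-ascii';
  // Anything else cannot be a one-byte string, so the fast path is not taken.
  return 'outside-latin1';
}

console.log(classifyForBug('test')); // ascii-only
console.log(classifyForBug('t\u00e9st')); // latin1-non-ascii
console.log(classifyForBug('\u4e2d\u6587')); // outside-latin1
console.log(classifyForBug('\u{1F600}')); // outside-latin1
```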

And you obviously would be completely right to think, like, something should have caught this, right? It's very unfortunate. So, like, if you look at the coverage report on the pull request that I referenced, which, by the way, I absolutely don't want to blame the person who opened that pull request. This is something that is very easy to miss. Yeah, the coverage PR, the coverage data for that PR, it indicates that the new code paths were taken during testing. They just, like, didn't happen to be taken in the test that would cover these specific encoding questions. So, coverage doesn't really give you the answer here. There were also benchmarks that actually would have caught this issue, but, you know, you're writing benchmarks, those aren't tests, and so there was nothing in there that checked the results of these operations, which, you know, if you're writing benchmarks, that's absolutely understandable.

And Node.js also has this cool thing where it runs the test suites of a bunch of popular npm packages against a development build of Node.js. So, again, this is something that happens on the Node.js core repo before any release goes out. Again, there were some failures being caught. They were all looked at and none turned out to be related to this particular bug, most likely because this particular bug is only triggered if a particular piece of code runs a lot, and that just didn't happen to be the case for any of the tests in these npm packages. It's unfortunate, so this actually made it into a release. There are a bunch of things that we can learn from it, though, I think, and so some very direct software engineering things: you know, you've heard it before, naming is a hard problem. If one byte string had been called something like latin1str, or iso88591str, this might not have happened, because the assumption that it would only be triggered for ASCII strings wouldn't even have been a thing. Also, coverage is very limited in what it tells you. You need to actually test the change that you're making and not just make sure that your new and old code paths are covered.
