Trick one is perfect hashing, which reduces the number of branches. We have an array of string_views called names containing "http", "https", "ws", "ftp", "wss", and "file", and corresponding scheme constants: HTTP, NOT_SPECIAL, HTTPS, WS (which corresponds to WebSockets), FTP, WSS, and FILE. These correspond to the WHATWG URL special scheme types. To get the scheme type, instead of comparing against each name in turn, we use a perfect hash: an algorithm that computes, directly from the input, the correct position inside the names array, so a single comparison confirms the match. This is one of the examples.
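The idea can be sketched like this: the hash `(2 * length + first_byte) & 7` happens to send each special scheme to a distinct slot, so one table lookup plus one string comparison replaces a chain of branches. The exact table layout below is an illustration in the spirit of the technique, not necessarily Ada's source:

```cpp
#include <array>
#include <cassert>
#include <string_view>

// Scheme constants in slot order; NOT_SPECIAL covers everything else.
enum class scheme_type { HTTP, NOT_SPECIAL, HTTPS, WS, FTP, WSS, FILE_SCHEME };

// Each special scheme lands in a distinct slot under the hash below;
// unused slots stay empty.
constexpr std::array<std::string_view, 8> names = {
    "http", "", "https", "ws", "ftp", "wss", "file", ""};

scheme_type get_scheme_type(std::string_view scheme) {
  if (scheme.empty()) return scheme_type::NOT_SPECIAL;
  // Perfect hash: length and first byte pick the candidate slot directly.
  int slot = (2 * static_cast<int>(scheme.size()) +
              static_cast<unsigned char>(scheme[0])) & 7;
  // One comparison verifies the candidate; anything else is "not special".
  return scheme == names[slot] ? static_cast<scheme_type>(slot)
                               : scheme_type::NOT_SPECIAL;
}
```

Note that "http" (length 4, 'h' = 104) hashes to 0, "https" to 2, "ws" to 3, "ftp" to 4, "wss" to 5, and "file" to 6, matching the slot order above.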
The second trick is, of course, memoization tables. To reduce the number of branches and if statements, we combine bitwise operations with values precomputed in a table. For example, we have an is_bad_char table covering all 256 byte values that stores zero or one according to whether the byte is a bad character or not. This is a great example of improving the performance of a function at the cost of increasing the size of the binary.
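A minimal sketch of the table idea, classifying a byte with one load instead of a chain of comparisons (the table and function names here are illustrative; the WHATWG URL standard does strip ASCII tab, LF, and CR from input, which is what "bad character" means in this sketch):

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string_view>

// Build a 256-entry table at compile time: 1 for "bad" bytes, 0 otherwise.
constexpr std::array<uint8_t, 256> make_bad_char_table() {
  std::array<uint8_t, 256> table{};
  table['\t'] = 1;  // ASCII tab
  table['\n'] = 1;  // line feed
  table['\r'] = 1;  // carriage return
  return table;
}

constexpr std::array<uint8_t, 256> is_bad_char = make_bad_char_table();

bool has_bad_char(std::string_view input) {
  for (unsigned char c : input) {
    // One table load per byte instead of several comparisons.
    if (is_bad_char[c]) return true;
  }
  return false;
}
```

The trade-off is exactly as described in the talk: the table costs 256 bytes of binary size, but the per-byte classification becomes branch-free on the character's value.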
The third trick is vectorization: do not process byte by byte when you can process 16 bytes at a time. Modern processors support 16-byte SIMD operations, so we don't need to iterate through an array one character at a time. For example, to determine whether a particular string contains a tab or a newline character, we check 16 bytes per iteration. I'm not going to dive into the details for the sake of time, but the information is out there, and such tricks can substantially reduce the execution time of a basic for loop.

On top of these optimizations, we provided an efficient bridge between the JavaScript and C++ implementations. This was done particularly for the Node.js integration, so that serialization costs, the string conversions between C++ and JavaScript, are reduced as much as possible. Passing multiple strings across the boundary is expensive, so we pass one string with offsets: we take an href and return eight integers that correspond to protocol end, username end, host start, host end, and so on. If you know the protocol end, you can take the substring of the href from zero to protocol end to get the protocol, and likewise for the username and the other components. These are the kinds of optimizations that improved the code base by 60, 70, 80 percent.
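The tab-or-newline scan can be sketched with SSE2 intrinsics as below. This is a minimal illustration of the 16-bytes-at-a-time idea, not Ada's exact code, which dispatches per architecture:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <emmintrin.h>  // SSE2 intrinsics

bool has_tab_or_newline(std::string_view input) {
  const __m128i tab = _mm_set1_epi8('\t');
  const __m128i lf = _mm_set1_epi8('\n');
  const __m128i cr = _mm_set1_epi8('\r');
  size_t i = 0;
  // Compare 16 bytes per iteration against all three characters at once.
  for (; i + 16 <= input.size(); i += 16) {
    __m128i chunk = _mm_loadu_si128(
        reinterpret_cast<const __m128i*>(input.data() + i));
    __m128i match = _mm_or_si128(
        _mm_or_si128(_mm_cmpeq_epi8(chunk, tab), _mm_cmpeq_epi8(chunk, lf)),
        _mm_cmpeq_epi8(chunk, cr));
    if (_mm_movemask_epi8(match) != 0) return true;
  }
  // Scalar tail for the last (fewer than 16) bytes.
  for (; i < input.size(); i++) {
    char c = input[i];
    if (c == '\t' || c == '\n' || c == '\r') return true;
  }
  return false;
}
```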
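The one-string-plus-offsets bridge might look like this on the C++ side; the struct, field names, and widths are assumptions for illustration (following the field names mentioned in the talk), not Ada's actual ABI:

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Offsets into the normalized href; passing these integers across the
// C++/JavaScript boundary is much cheaper than passing many substrings.
struct url_components {
  uint32_t protocol_end;  // end of "https:" in the href
  uint32_t username_end;
  uint32_t host_start;
  uint32_t host_end;
  // ...the real layout carries more offsets (port, pathname, search, hash)
};

// The JavaScript side does the equivalent slicing with
// href.substring(0, protocol_end), and so on for each component.
std::string_view protocol(std::string_view href, const url_components& c) {
  return href.substr(0, c.protocol_end);
}
```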
Here's an example JavaScript benchmark. It takes lines of input, tries to parse each one, adds the length of the resulting href to an accumulator, and counts the good and bad URLs. This is done to defeat dead-code elimination in V8's JIT compiler, so that the parsing work cannot be optimized away. The benchmark is available at github.com/ada-url/js-url-benchmark; please take a look at it, and if there's something that we missed, please take the time to create an issue on the GitHub repository. On Node 18.15.0, this benchmark ran at around 0.8 million URLs per second. At that time, Deno 1.32.5 was doing 0.9 million and Bun 0.5.9 was around 1.5 million; on Node 20.1.0, it's now 2.9 million URLs per second. The Ada C++ library is safe and efficient.
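The anti-dead-code-elimination pattern the benchmark uses can be sketched as follows. The real benchmark is JavaScript; this C++ sketch uses a hypothetical stand-in parser, not Ada's API, to show the structure:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in parser: succeeds only when the input has a scheme.
bool parse_url(std::string_view input, std::string& href) {
  if (input.find("://") == std::string_view::npos) return false;
  href = std::string(input);  // a real parser would normalize here
  return true;
}

struct bench_result {
  size_t good = 0, bad = 0, total_href_length = 0;
};

// Accumulating the href lengths and the good/bad counts makes every
// parse observable, so the optimizer (or V8's JIT, in the JavaScript
// original) cannot eliminate the work being measured.
bench_result run(const std::vector<std::string>& lines) {
  bench_result r;
  std::string href;
  for (const auto& line : lines) {
    if (parse_url(line, href)) {
      r.good++;
      r.total_href_length += href.size();
    } else {
      r.bad++;
    }
  }
  return r;
}
```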