1. Introduction to Node Congress 2024
Paolo introduces himself and NearForm, a professional services company focused on delivering modern, performant, and elegant solutions, and recaps the HTTP versions: Node has stable implementations for HTTP/1.1 and HTTP/2, and HTTP/3 support is in progress.
Hello and welcome to Node Congress 2024. This is Milo, a new HTTP parser for Node.js.
First of all, let me introduce NearForm. We are a professional services company focused on delivering the most modern, performant, and elegant solutions to our digital partners. We are active in several countries around the world, and we're always looking for new talent, so please apply.
Being reckless sometimes pays off. Why is that? Let me prove it to you. First of all, I want to introduce myself. Hi again, I am Paolo. I'm a Node Technical Steering Committee member and a Staff DX Engineer at NearForm. You can find my handles at the bottom of the slide, and on the right-hand side you can see where I come from: Campobasso, in Italy, in the smallest region, Molise, which the rest of Italy pretends does not exist. But it's their loss, not mine. Let's go on.
We all love HTTP. Why is that? Because it's the most pervasive and most widely used protocol ever. Which version, though? Well, the thing is that despite being 30 years old, only three final versions of HTTP actually exist. Two, 0.9 and 1.0, were only drafts, so I don't count them as existing versions. The ones that made it to a final version are 1.1, 2, and 3. HTTP/1.1 is by far the most used; it is the historical one, the one you probably know, it is still in place and will not go anywhere anytime soon. HTTP/2 was created to address some of the problems of the TCP socket by building on the SPDY protocol, but the results were not really successful. Now we also have HTTP/3, which instead uses QUIC, which runs over UDP, which makes things more complex, especially for system administrators. I'm sorry for you folks, really.
What about Node? Node has stable implementations for HTTP/1.1 and HTTP/2, so there you're good to go. As for HTTP/3, we're not quite there yet: we are still working on the QUIC implementation, but we will get there. That's a promise.
2. HTTP Parsing and Introduction to Milo
Now to the topic of this talk: HTTP parsing. The current Node HTTP parser is called LLHTTP, written by Fedor Indutny in 2019. It has been the default since Node 12 and works brilliantly. However, LLHTTP is backward compatible with HTTP/0.9 and 1.0, which brings unnecessary complexity and vulnerabilities. To address these problems, Milo was developed. Milo is written in Rust, a flexible and performant language; the choice of Rust was deliberate, to explore its potential for contributing to Node with Rust code.
Now let's focus on the topic of this talk, which is HTTP parsing. What is the current Node HTTP parser as of today? It is called LLHTTP. It was written by Fedor Indutny in 2019 and it has been the default since Node 12. It works brilliantly. On the right-hand side you can see the state machine that it actually uses, made of 80 states, so it's very, very complex. The magic is in its underlying parser generator, LLParse. LLParse takes a state machine definition written in TypeScript, in a very specific subset of the language, and generates a C state machine. In other words, LLParse transpiles from TypeScript to C. You can easily see how such a transpiler can be hard to debug and to release. In addition, LLHTTP has always been backward compatible with HTTP/0.9 and 1.0, and this brings unnecessary complexity to address edge cases. It has also been lenient and tolerant towards broken HTTP implementations, such as those found in embedded devices. This is very dangerous because it opens the door to vulnerabilities, backdoors, and so forth. These are the usual problems of LLHTTP, which brought me to the decision to write Milo, as you will see in a bit.

Milo is the solution, of course; otherwise you wouldn't be here. We start fresh. Sorry for the horrible pun, I really apologize. This is Milo. Not the Milo that you actually expected, but this guy was also Milo. For those who don't know, what you're seeing is a tamias, a small squirrel, and this one in particular was named Milo. He was one of my wife's pets (girlfriend at the time), and he was also the very first pet I chose to name a piece of my software after. I now have the habit of naming my software after my current or former pets, and I have plenty of them: cats, dogs, horses, fishes, you name it. Anyway, this is Milo, or a Milo.
I will show you the other Milo in a bit. Actually, speaking of the last Milo, the one that you're actually here for, let's drop the bomb: Milo is written in Rust, period. Why is that? The language has proven flexible, powerful, and performant enough to achieve this specific task. It's low-level in its performance, but not low-level in how you write it. For instance, I did not know Rust at all before writing Milo, and I made this choice on purpose, as an experiment on myself, to see how hard it would be for a new contributor to embrace Rust in order to contribute to Node, if Node were to contain some Rust code.
3. Milo's Architecture and Code Generation
I'm not here to start a new language flame or to criticize LLHTTP or Fedor. Milo's architecture is inspired by LLHTTP but uses a simpler state machine. Leveraging Rust macros, Milo generates Rust code from YAML files, enabling flexible and powerful code generation. The Rust macro system is incredibly powerful, allowing code generation without limitations. Milo has a small memory footprint, and the compiled code closely resembles the original state machine definition.
It's not that hard. It can be done. Also, I am not here to start a new language flame. I'm not a troll, or at least not too much of one. So be gentle, please; do not start a language flame.
Also, I'm not criticizing LLHTTP or Fedor at all, because Fedor is a very good person, and LLHTTP was an amazing and performant parser that I loved. Its architecture is the inspiration for and the basis of Milo, obviously. I still have a state machine, but a much simpler one, with far fewer states: we dropped from 80 to 32, and I chose a declarative way to write states, with no restrictions on the code, leveraging Rust.
Now, it's not black magic. Nothing is black magic in IT, right? I just leverage macros. The Rust macro system is one of the most powerful, if not the most powerful, I have ever seen. The idea is that before your code gets compiled, a macro (and I'm not even talking about procedural macros) can run another piece of Rust that generates the Rust code which is eventually compiled. So a macro in Rust produces Rust code, and you have no limitations: you can do whatever you want. For instance, I load the list of methods, states, and so forth from YAML files that are not embedded or read at runtime, because they are processed at compile time; I generate Rust code, and I produce a Rust executable.
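The mechanism can be sketched with an ordinary declarative macro. This is an illustrative toy, not Milo's actual macro set: each input pair expands into a Rust function before compilation completes.

```rust
// Illustrative sketch: a declarative macro that generates one function
// per (name, value) pair at compile time, the same way Milo's macros
// turn state definitions into Rust functions. All names are invented.
macro_rules! states {
    ($(($name:ident, $code:expr)),* $(,)?) => {
        $(
            // Each pair expands into a plain Rust function.
            fn $name() -> u32 { $code }
        )*
    };
}

// Expands into `fn start() -> u32 { 0 }` and `fn finish() -> u32 { 1 }`.
states!((start, 0), (finish, 1));

fn main() {
    println!("start = {}, finish = {}", start(), finish());
}
```

In Milo's case the macro input comes from YAML parsed at compile time, but the principle is the same: Rust code generating Rust code.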
Also, there is a tool made specifically to debug procedural macros in Rust: since in Rust you usually use the syn and quote crates, with cargo expand you can see what these crates are actually doing with your code, so you have an additional inspection step before compile time. An example is worth more than a thousand words, so let's take a look. Even if you're not a Rust programmer, you can easily spot the similarity between the left-hand side and the right-hand side. On the left-hand side you have an actual state in Milo: the state after receiving a chunk, which goes back to the next chunk length if it receives a \r\n. Otherwise, if it receives two characters it is not expecting, it fails the parsing. Or, if it doesn't have at least two characters to make the comparison, it suspends execution, stops parsing for now, and returns to the caller. On the right-hand side, after expanding the macros (which, for the record, are the ones ending with an exclamation mark), you have the compiled code. The state becomes a function with a specific signature that you don't have to remember, because it's implicit in the macro. CRLF becomes \r\n, in the expanded syntax that Rust is expecting. The same goes for move_to, which becomes a call with the parser, a state, and a size. Finally, you can also return a constant. But you don't have to remember all these details, because the macros do that for you. Now, Milo is performant, right? But what about the memory footprint? Milo has a very small memory footprint.
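A toy version of the kind of state just described, with invented names rather than Milo's actual signatures, might look like this:

```rust
// Toy sketch of a chunk-boundary state; names are invented, not Milo's API.
// Ok(consumed) advances, Err(Pause) means "not enough data, return to the
// caller", Err(Fail) means the input is invalid.
#[derive(Debug, PartialEq)]
enum Suspend { Pause, Fail }

fn chunk_end(data: &[u8]) -> Result<usize, Suspend> {
    if data.len() < 2 {
        // Not enough bytes to compare: stop parsing for now.
        return Err(Suspend::Pause);
    }
    if &data[..2] == b"\r\n" {
        // Consume the CRLF and move on to the next chunk length.
        Ok(2)
    } else {
        // Two unexpected characters: fail the parsing.
        Err(Suspend::Fail)
    }
}

fn main() {
    assert_eq!(chunk_end(b"\r\nabc"), Ok(2));
    assert_eq!(chunk_end(b"x"), Err(Suspend::Pause));
    assert_eq!(chunk_end(b"xy"), Err(Suspend::Fail));
    println!("ok");
}
```

In Milo the same three branches are written declaratively through macros, and the expanded code is what actually gets compiled.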
4. Data Copying in Milo
Milo lets developers opt in to copying the data being parsed, which makes life easier and improves the developer experience. By default, Milo prioritizes performance and does not copy any data. With the opt-in enabled, Milo automatically copies the unconsumed part of the buffer and prepends it on the next iteration.
First of all, of course, I have to keep some flags and some counters, usually 32-bit, plus a few 64-bit counters, so probably less than a few hundred bytes in total. That's it.
Then, if the developer opts in to this behavior, which is disabled by default, Milo can copy some of the data being parsed. By default, Milo does not copy any data: the data is analyzed on the fly, and positions are returned to the caller without copying memory at all.
If you opt in, rather than just returning the number of unconsumed bytes to the caller, Milo will copy the unconsumed part of the buffer and prepend it on the next iteration of the execution. Rather than leaving you to remember to do that, Milo does it for you automatically. This is purely developer experience: it's totally opt-in, and it can make life easier in some cases. Otherwise, by default, Milo goes for performance.
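As a sketch of what this opt-in implies (invented names and a pretend record-based parser, not Milo's actual API), the parser keeps the unconsumed tail and prepends it on the next call:

```rust
// Sketch of the opt-in "manage unconsumed data" behavior; names invented.
struct Parser {
    manage_unconsumed: bool,
    unconsumed: Vec<u8>,
}

impl Parser {
    // Pretend parser: consumes whole 4-byte records, leaves the rest.
    fn parse(&mut self, input: &[u8]) -> usize {
        // Prepend whatever was left over from the previous call.
        let mut data = self.unconsumed.clone();
        data.extend_from_slice(input);
        let consumed = data.len() - data.len() % 4;
        if self.manage_unconsumed {
            // Remember the tail so the caller does not have to.
            self.unconsumed = data[consumed..].to_vec();
        }
        consumed
    }
}

fn main() {
    let mut p = Parser { manage_unconsumed: true, unconsumed: Vec::new() };
    assert_eq!(p.parse(b"abcdef"), 4); // "ef" is kept internally
    assert_eq!(p.parse(b"gh"), 4);     // "ef" + "gh" completes a record
    println!("ok");
}
```

With the flag disabled, the caller would instead receive the leftover count and be responsible for re-sending those bytes, which is the default, faster path.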
5. Milo in Action
Milo strictly follows the latest RFCs for HTTP, without any exceptions. The main function in Rust creates a parser using the Milo create function and sets callbacks for data handling. A payload consists of an offset into the input buffer and a size, and the parse method is called to process the data. A common interface is used across the different languages for consistency. Running the example with Cargo prints the debugging information.
Also, Milo cuts backward compatibility and leniency. We are strict. Period. We follow the latest RFCs for HTTP, which are RFC 9110 and RFC 9112, to the letter. We don't have any exception. Period. We should not, fingers crossed, have any vulnerability, by following the spec to the letter.
Now, I think you're pretty tired of hearing me speak, so let's get to the action. Let me show you Milo in action. This is Rust. Even if you don't know Rust, you should be capable of following my explanation. Let's just look at the main function. You create a parser using the Milo create function. You declare a message, and then you set a certain number of callbacks. These callbacks all have the same signature: they receive the parser and, eventually, a payload. The payload is made of an offset into the input buffer and a size, so basically a pointer and a length. If there is no payload, offset and size will both be zero, so the case is easy to detect in the code.
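The payload convention can be sketched like this (illustrative names and types, not Milo's actual ones):

```rust
// Sketch of the callback convention: every callback gets the parser plus
// an (offset, size) pair pointing into the input buffer; (0, 0) means
// "no payload". Names here are invented for illustration.
struct Parser {
    input: Vec<u8>,
}

// Extract the payload slice, or None when offset and size are both zero.
fn payload(parser: &Parser, offset: usize, size: usize) -> Option<&[u8]> {
    if offset == 0 && size == 0 {
        None
    } else {
        Some(&parser.input[offset..offset + size])
    }
}

fn main() {
    let parser = Parser { input: b"Content-Length: 12\r\n".to_vec() };
    // A hypothetical header-value event pointing at "12".
    assert_eq!(payload(&parser, 16, 2), Some(&b"12"[..]));
    assert_eq!(payload(&parser, 0, 0), None);
    println!("ok");
}
```

The key point is that no data is copied: the callback only receives coordinates into the buffer it already owns.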
Once you have that, you can play with the data and print it at runtime. Done. Last, you call the parse method. Now, I should have imported parse in the use milo statement, but trust me on this: I forgot. Anyway, if you're a Rust programmer, you're wondering why I haven't implemented parse as a method on the Parser struct. There is a reason, which comes from WebAssembly and also C++, and I will show it to you in a bit. I try to have a common interface across all three possible languages rather than three different implementations. I know it's not very idiomatic for Rust, but you know. If you run the example above with Cargo, you will see the debugging information printed.
6. C++ Workflow in Milo
The C++ workflow in Milo is straightforward. Cargo supports generating static libraries, which can be statically linked in any C or C++ executable. Cbindgen, created by Mozilla, generates fully-working C or C++ header files from a small TOML file.
That is, the positions of, for instance, the status code, the header names, the header values, and the body, as payloads. If you are a Node developer or contributor, you know that Node uses either JavaScript or C++. So in this case we are worrying about C++, and I've got you covered, don't worry. The C++ workflow is pretty easy.
First of all, Cargo supports generating static libraries out of the box. These static libraries (on macOS and Linux, .a files) can be statically linked into any C or C++ executable; basically, statically imported. And we have a tool that generates the header files: it's called cbindgen, created by Mozilla, and it generates a fully working C or C++ header file out of a very small TOML file, close to three or four lines. Done. Very easy.
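For illustration, this is the general shape of a Rust function exported with a C ABI, which is the kind of signature cbindgen scans to produce a header; the function name here is invented, and a real crate would also set `crate-type = ["staticlib"]` in Cargo.toml:

```rust
// Sketch: exposing a Rust function with a C ABI. cbindgen reads such
// signatures and emits the matching C/C++ declaration, e.g.:
//   uint32_t milo_example_add(uint32_t a, uint32_t b);
// The name is illustrative, not part of Milo's real surface.
#[no_mangle]
pub extern "C" fn milo_example_add(a: u32, b: u32) -> u32 {
    a + b
}

fn main() {
    // Callable from Rust too; from C it would be milo_example_add(2, 3).
    println!("{}", milo_example_add(2, 3));
}
```

`#[no_mangle]` keeps the symbol name stable, and `extern "C"` fixes the calling convention, which is what makes static linking from C or C++ work.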
7. C++ Version and WebAssembly in Milo
And this is the C++ version: you create a parser, declare a message, and set a callback. The only difference is that reinterpret_casts make the code slightly harder to read. At the end of the day, you call Milo parse with the parser, the message, and the length. Then a fun story: Node supports several architectures, but on SmartOS (a dialect of Solaris) Rust support is only experimental, so WebAssembly was chosen as an alternative. WebAssembly has always been a first-class citizen in Rust, and wasm-bindgen makes it easy to generate a fully working JS package made of a WebAssembly file and JavaScript glue code. Finally, Milo in Node.js with WebAssembly: we import Milo, prepare a message, and play with memory, since WebAssembly exports a memory space shared between the WebAssembly and JavaScript layers.
And this is the C++ version. Even if you're not a C++ programmer, you can easily spot the similarity with the Rust one. You create a parser, you declare a message, and then you set a callback, and you can recognize that the signature is the same. The only thing is that in this case the reinterpret_casts make the code slightly harder to read, but you get the idea.
At the end of the day, we do the same as we did in Rust, with the same signature: you call Milo parse with the parser, the message, and, in this case, the length, which you also have to specify; that's quite common in C++. And this is the output: C++ with Milo, with main and example. That's it, and you still get the body when you execute the code.
Now, here's a fun story. When I originally presented Milo at the Node.js Collaborator Summit in Bilbao in September of last year, I thought people would be happy that I gave them a way to embed Milo via a static library and a header file, without having to install the Rust toolchain. That was not the case; quite the opposite. They wanted to install the Rust toolchain rather than download a compiled file. But that was not really the issue. The real issue we all found while speaking together is that Node supports several architectures.
And unfortunately, one of these architectures is SmartOS, which is a dialect of Solaris. For SmartOS, or Solaris, Rust support is only at an experimental level: there is support, but it is not very extensive. We therefore chose not to include this kind of thing in Node for production, mission-critical environments, and so forth. We needed a different solution, which is WebAssembly. WebAssembly has always been a first-class citizen in Rust; Rust has always been able to compile to the Wasm architecture. We also have a battle-tested toolchain, wasm-bindgen, which makes it very, very easy to generate a fully working NPM package made of a WebAssembly file and JavaScript glue code. This glue code internally loads the Wasm file, transparently to the developer, as you will see in a bit.
Let's see it in action. This is Milo in Node.js with WebAssembly. We import Milo, we prepare a message, and now we have to play a little bit with the memory. The concept is that the WebAssembly module exports a memory space which is shared between the WebAssembly and JavaScript layers.
8. Memory Allocation and Parsing in Milo
If you put data inside that memory space, it's accessible by both layers. We allocate a memory buffer, create a JavaScript buffer over that memory, and create a parser and a callback. In JavaScript we don't need to carry the parser around, since the callback can be bound. We copy the message into the shared buffer, then call milo.parse with the parser, the pointer to the shared memory, and the length. The reason for using milo.parse is performance, as it avoids deserialization: the parser is passed as an integer to the WebAssembly side, where it is rebuilt as a parser within the performant WebAssembly space. If you find a better way, ping me.
If you put data inside that memory space, it's accessible by both layers, without any need for serialization and deserialization. So what we do is allocate a chunk of that memory, and we receive a pointer to its start. After that, we can create a JavaScript buffer over that memory (sorry for the ambiguity), passing the pointer and the length.

Finally, we get to the meat. We create a parser and, as we have been doing for C++ and Rust as well, we create the callback, which has the same signature. In this case we omit the parser: in JavaScript we can bind the callback, so we don't need to carry the parser with us, or we have it in the local scope and so forth. We can play with the message, get the offset and so on to extract the payload, and print at runtime what we got. Now, in order to trigger the parsing, we have to do two operations. First, we copy the message into the shared buffer, so we do buffer.set from the string as a buffer. Then we call milo.parse, passing the parser, the pointer to the shared memory, and the length. That's it.

The reason I'm using milo.parse rather than parser.parse, which is also what I hinted at in Rust, is that in WebAssembly, if I created the parser as a class, every time I called a method inside the WebAssembly from JavaScript I would have to deserialize the reference to the object, which would be very expensive. Instead, this way the parser is nothing but an integer, a primitive type that can be transparently passed back and forth to the WebAssembly. When it gets to Rust, that integer is a memory pointer that is rebuilt as a parser, but within the WebAssembly space, and therefore it is performant. Sorry for the very dense explanation; you can rewind and listen at a slower speed. But that's the idea.
That's the general idea behind this structure, and that's why I chose not to use impl in Rust. That's the explanation. If you find a better way, ping me, because I'm interested.
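The handle-as-integer trick can be sketched in plain Rust (invented names; Milo's real code lives behind wasm-bindgen): the caller holds nothing but an integer, and the Rust side rebuilds the parser from it.

```rust
// Sketch of passing a parser across a boundary as a plain integer,
// avoiding object (de)serialization. A Box pointer plays the role of
// the handle; names are illustrative, not Milo's actual API.
struct Parser { parsed: usize }

fn create() -> usize {
    // Leak a boxed parser and hand its address out as an integer.
    Box::into_raw(Box::new(Parser { parsed: 0 })) as usize
}

fn parse(handle: usize, len: usize) -> usize {
    // Rebuild the parser from the integer on the Rust side.
    let parser = unsafe { &mut *(handle as *mut Parser) };
    parser.parsed += len;
    parser.parsed
}

fn destroy(handle: usize) {
    // Reclaim the allocation when the caller is done.
    drop(unsafe { Box::from_raw(handle as *mut Parser) });
}

fn main() {
    let p = create();
    assert_eq!(parse(p, 10), 10);
    assert_eq!(parse(p, 5), 15);
    destroy(p);
    println!("ok");
}
```

Since the handle is a primitive, crossing the JS/Wasm boundary is essentially free; the cost of the pattern is that the caller must explicitly destroy the handle.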
9. Final Steps and Conclusion
This is the output, which is the same as the C++ one; nothing changed. What is missing? Performance numbers comparing Milo with LLHTTP in Node, using the C++ version. The Node.js integration for the WebAssembly part is still missing, with work ongoing to integrate Milo through WebAssembly. The performance gap of Milo in WebAssembly needs fixing, possibly by implementing SIMD. The LLHTTP test suite will be migrated to Milo to ensure no regressions. Quoting Albert Einstein: a person who never made a mistake never tried anything new.
This is the output, which is the same as the C++, nothing changed. And that's milo. We're done. You made it.
Now, what is missing? These are the performance numbers, first of all: Milo compared to LLHTTP in Node, using the C++ version. This is a preliminary run.
What is missing? First of all, the Node.js integration for the WebAssembly part. I'm now working on the WebAssembly side with undici, a smaller piece of software to play with, in order to integrate Milo. Then I would also like to fix the performance issue of Milo in WebAssembly, because at the moment LLHTTP is faster. I'm also thinking about implementing SIMD in WebAssembly, if I'm able to.
Finally, the last step, which is just for verifying correctness, is to migrate the LLHTTP test suite, itself partly inherited from the previous parser, to Milo, in order to ensure that no regressions are introduced. After that, Milo will be ready.
Now, something that I usually do in my talks is to finish with one quote from a person way smarter than I can possibly be. In this case, I've got one from Albert Einstein, which says, a person who never made a mistake never tried anything new.
The key here is that you always have to try something new, to dare even the impossible, even if you might not get to a solution, which is what I did when I wrote Milo. I didn't know Rust, I had never actually written an HTTP parser, and I tried both. Along the way I learned a lot about WebAssembly, HTTP parsing, Rust, and architecture. It has been an amazing trip.
I hope to get milo merged in Node as soon as possible. That's all I have for today. Once again, thanks for attending this talk. Feel free to ping me on Twitter, on LinkedIn, on GitHub, on email, anytime you want. Once again, thanks for attending Node Congress 2024, and thanks for attending my talk. Cheers.