Video Summary and Transcription
James Snell discusses challenges in current stream APIs and proposes a new, faster, and simpler streams API while questioning the existing Web Streams model. The discussion covers the case for a new API, driven by excessive ceremony, confusing locking mechanisms, and a complex specification, especially around the controller APIs. Challenges with controller API optimization, hidden buffering, and excessive memory usage are highlighted. The talk delves into the cost and complexity of promises in JavaScript, optimizing readable streams, and managing complexity in stream implementations. It explores optimizing stream processing in JavaScript, data chunk processing, back pressure policies, multi-consumer patterns, and efficient branch cursor management. The new API aims to simplify stream processing by eliminating unnecessary operations, reducing promise overhead, and focusing on iterators, transforms, and back pressure for efficiency and simplified design.
1. Challenges and Proposed Solution for Stream APIs
James Snell discusses challenges in current stream APIs, proposes a new, faster, and simpler streams API while questioning the existing Web Streams model.
All right. Hello, Node Congress. This is James Snell. I am happy to be joining Node Congress again. It's one of my favorite remote conferences. We're going to be talking today about stream APIs, in particular the fact that after implementing the Web Streams spec (readable stream, writable stream, transform stream) in Node and a couple of times in Cloudflare Workers, and looking at all the various implementations of the specification that are out there, there are some challenges. There are some things about this design, the spec, that really make it difficult to optimize and perform well. And we see these problems time and time again.
The Node implementation of readable stream, for instance, is an order of magnitude slower than Node streams. And in many benchmarks that we see across multiple runtimes, in various frameworks like Next and React and quite a few others, we consistently see readable streams performing significantly slower than alternatives like Node streams. And I got to a point recently where I came to the realization that while Web Streams was fantastic for what it was, working toward that goal of a unified streaming model across all these runtimes, compatible with the Web and Node, on the back end and on the edge, it was designed at a time when better options weren't necessarily available. And it's been almost 10 years now that we've had it.
What better options might there be? I really started to stop and take a look and just see how could a theoretical new streams API work and would it actually be a net improvement over the state of the art with Node streams and Web streams? And what might the ergonomics of that look like? How does it actually deal with improvements and deal with the complexity of streams in general? And so I came up with an experiment. And the experiment was can we design a new streams API and can it be simpler and faster, but still meet the needs of streaming data? And that's basically what I want to focus on. It's just presenting this basic idea. The idea is not to say, hey, let's go off and do it this way. The idea is really to start a conversation to really figure out if we can do this a better way and if so, what would that look like? So here we go.
2. Challenges of Existing Stream APIs
Discussing the necessity for a new API due to excessive ceremony, confusing locking mechanisms, and complex specifications, especially in controller APIs.
All right, so why a new API? Motivation: a number of issues. There's just excessive ceremony, too much boilerplate. Anybody that's created these streams with the underlying source and underlying sink, gotten a reader, dealt with reader locks, the read loop, these kinds of things, it becomes very clear that there's just a lot of boilerplate, repetitive code. So we'll get to that in a second.
Confusing locking: reader and writer locks that are easy to leak. You can lock a stream, which means you get a reader for it or a writer for it, and you start using it. If you forget to release that lock with the releaseLock() method, then no one else can use that stream. And it's really easy to just lose track of that or forget to unlock, after which the stream just doesn't work anymore.
We have a complex specification; there are 70-plus abstract operations and multiple overlapping state machines. Is the stream itself closed or errored? Is the controller in there closed or errored or closing? What state is the underlying queue in? What state is the reader in? There are so many different state machines at play, and they overlap and interact with each other in fairly complex ways. Controller API confusion: we have readable stream and writable stream, writer and reader. But they also have this internal API of controllers that have a completely different API than writers and readers. And these controllers have a completely different lifespan than the actual stream itself.
3. Challenges with Controller API and Memory Usage
Discussing the challenges with controller API optimization, hidden buffering, and excessive memory usage in streaming cases due to unbounded accumulation of data in branches.
I mean, you can detach them and use them externally from the stream in a way that makes it very confusing: where is this thing? Where is the data going at any given time? What is the state of this? But more importantly, that controller API makes it extremely difficult to optimize the way streams work in any real way, at least when it comes to the observable behaviors of what's going on in the implementation. So we'll talk a little bit more about that. Hidden buffering: readable stream and transform streams and tees and all these different things create additional buffers and back pressure, different ways of accumulating data in hidden collections internally. And what this ends up causing is significant memory thrashing, GC thrashing, exceedingly high memory usage for just basic streaming cases.
For instance, if you weren't aware, when you take a readable stream and you tee it into two branches per the spec, when you're reading from one branch, all of the data you read accumulates unchecked in the branch you did not read. The way that the spec is written, that data accumulates in an unbounded buffer with no back pressure. So you could just be reading one branch thinking it's going to be fine, then come to the other branch later and realize you just exploded your application's heap usage.
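A short runnable sketch of this hazard, using the standard Web Streams API (available as globals in modern Node): draining one tee branch while the other sits idle forces every chunk to pile up in the idle branch's internal queue, because tee() applies no backpressure between branches.

```javascript
// A source that produces 1000 chunks of 1 KiB each.
const source = new ReadableStream({
  start(controller) {
    for (let i = 0; i < 1000; i++) controller.enqueue(new Uint8Array(1024));
    controller.close();
  },
});

const [branchA, branchB] = source.tee();

// Drain branch A completely while branch B is untouched.
// Nothing slows this down: every chunk read here is also
// copied into branch B's unbounded internal queue.
let bytesFromA = 0;
for await (const chunk of branchA) bytesFromA += chunk.byteLength;

// Branch B now holds the entire megabyte in memory, ready to replay.
let bytesFromB = 0;
for await (const chunk of branchB) bytesFromB += chunk.byteLength;
```

Both counters end up at the full 1,024,000 bytes: the idle branch buffered everything the other branch read.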
Now that's for the spec. We'll get to the fact that, you know, the spec actually says one thing on how to implement all of these APIs. But then all of the implementers are free to do whatever else they want, which makes for an interesting dynamic. There are a large number of tests, what we call web platform tests, that verify the behavior of all these readable streams implementations. And they go into obscure details such as the exact order in which certain events are observable, whether or not certain promises resolve in a particular order, all these kinds of verifications on the observable behavior. But then, you know, the spec says, you know, basically implement however you want. Those aren't necessarily compatible ideas. In some cases, you can optimize. In other cases, you can't. So we'll talk about that a little bit more.
4. Promise Overhead and Readable Stream Complexity
Discussing the cost and complexity of promises in JavaScript, along with the challenges of optimizing readable streams and potential lack of support for async iterators in certain implementations.
And then there's just promise overhead. Promises are a very convenient and very important tool in JavaScript, but they have a definite cost. Every read creates multiple promises internally. Every write creates multiple promises internally. Again, this is per the spec. There are places where you can optimize these away, but a lot of these promises end up creating observable behaviors, which means you can't always optimize them away. You can't always get rid of them. And in the cases where you can't, the performance cost is actually quite steep.
So let's talk about some of the complexity here. Web streams: you get a readable stream somehow, and you get a reader from that. Let's just say you want to accumulate the chunks. Before we had async iterators, which readable stream now supports in the spec (not all implementations support it; Chrome, for instance, I don't think supports it yet), you would enter this while loop: await a single read, get a value and an indication of whether it's done. If done is true, you're done; you break out of the loop. Otherwise, you process the chunk you just read. And you always get one chunk at a time.
The chunk here could be a Uint8Array, could be a string, could be an object, a number, any value. A readable stream of this type, with reader.read(), can yield any value. And then afterwards, you have to remember to release the lock, which is actually quite easy to forget. What you don't see here are all the layers of complexity underneath. This readable stream, depending on what it's doing, if it's coming from the operating system, might just be doing a very optimized read down to a file descriptor. But if this readable stream is coming from user code, if some library said new ReadableStream and passed in the pull algorithm and the start algorithm, all of the stuff that feeds data, then the stream could be completely unoptimized.
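The loop just described, written out with the standard API; the try/finally around the lock is exactly the easy-to-forget ceremony the talk is pointing at.

```javascript
// A simple readable stream of string chunks.
const stream = new ReadableStream({
  start(controller) {
    controller.enqueue('hello, ');
    controller.enqueue('world');
    controller.close();
  },
});

const reader = stream.getReader(); // locks the stream
let result = '';
try {
  while (true) {
    // One await, one { value, done } pair, one chunk per iteration.
    const { value, done } = await reader.read();
    if (done) break;
    result += value;
  }
} finally {
  // Easy to forget: without this, the stream stays locked forever.
  reader.releaseLock();
}
```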
5. Hidden Complexities of Stream Implementations
Discussing the hidden complexities and significant overhead of streams due to unoptimized paths, memory leaks, and broken back pressure signaling causing unbounded buffer growth in transform streams.
You have no idea what work is happening or how many promises are underneath this thing. It actually hides quite a bit of complexity. Some of those paths can be optimized. Many of them cannot. And it's the fact that many of them cannot be optimized in any way that makes the streams overhead much more significant.
All right, so some of the things we see: memory leaks. There was a time when I was working on the Cloudflare Workers implementation when we saw transform buffers growing unbounded under load. This is because of the way the transform stream works: it's a push-in queue. You have a writer and a reader; you start adding stuff to the writer, and the writer side has the back pressure as data goes through this transform function in the middle.
Well, that transform function can operate without back pressure signaling. There is a way, when you write, where the transform can be an async function, and it will signal back to the writer: hold on, wait. But if that transform accepts the data synchronously and pushes it over to the readable side, which has its own buffer, then data just keeps flowing into that readable-side buffer, which means it can grow and grow unbounded with just broken back pressure signaling in the middle. Now, yes, that is an implementation detail. That is a bug that we had to fix in Cloudflare Workers. But the design of the spec, the way that writable stream, transform stream, and readable stream work, allows that to just happen if an implementation is not careful.
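The gap is easy to demonstrate with the standard TransformStream: a producer that never awaits writer.ready can push far past the configured high-water marks, and the only evidence is a deeply negative desiredSize.

```javascript
// An identity transform with tiny queues on both sides.
const ts = new TransformStream(
  { transform(chunk, controller) { controller.enqueue(chunk); } },
  { highWaterMark: 1 },  // writable side
  { highWaterMark: 1 },  // readable side
);

const writer = ts.writable.getWriter();

// A producer that ignores back pressure: fire-and-forget writes,
// never awaiting writer.ready or the promises returned by write().
for (let i = 0; i < 100; i++) {
  writer.write(i);
}

// desiredSize is now far below zero: the internal queue grew well
// past its high-water mark, exactly the unbounded growth described.
console.log(writer.desiredSize);
```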
6. Memory Consumption and Backpressure Issues
Discussing issues with the node-fetch implementation, memory consumption in teed streams, memory growth across branches, React server-side rendering complexities, and memory explosions in the AWS SDK due to ignored backpressure signaling.
node-fetch, as part of its implementation, had a bug where unconsumed bodies in a teed stream exhausted connection pools. What? Well, the unconsumed branch would just hold on to the file descriptor representing that connection until that branch was fully consumed. So if you left it and didn't consume it, it would eat up a file descriptor, leaking it, and you would not be able to open any more connections. We've hit this problem in Workers and in Node as well.
Growth in memory when branches are consumed at different rates: data in memory just continues to grow and grow on the slower-reading branches. And then there are just complexity bugs: React server-side rendering deadlocks with nested Suspense. And there were some benchmarks that came out in the ecosystem; Theo published these last October.
Well, in one of the requests in that benchmark, Next ended up creating something like 250 readable stream instances in a single request, each one pumping just a single chunk of data through another readable stream. The way that whole orchestration was occurring just ended up layering buffers upon buffers upon buffers, just to churn data through the pipeline. It was horribly inefficient. AWS SDK: memory explosion on slow processing. This is the same problem. This is just data accumulating in these buffers, primarily transform streams or writable streams, where the producers of data are just completely ignoring back pressure signaling.
7. Managing Complexity in Readable Streams
Discussing challenges of managing complexity in readable streams, the impact of unnecessary complexity on performance, and the need for optimizing user-created streams for speed.
You know, any signals that tell you when that buffer is completely full. And there have been some arguments about this. Complexity is the job, right? Readable streams are complex; as systems engineers, it's our job to manage this complexity. Yeah, but unnecessary complexity is a liability.
If it doesn't need to be there, if it doesn't need to be that complex, why do we need to deal with it? A simpler API allows us to focus on the real problems, not the quirks of the API. Naive implementations are going to be bad. Yes, you take a spec.
If you write code that exactly follows the spec, and typically these specs are written to the lowest common denominator with the most generic algorithms, these things are going to be slow. Yes, that is absolutely true. But the complexity and performance pitfalls can't always be optimized away. You can't always hide these things in readable streams and writable streams.
That's particularly true when somebody calls new ReadableStream, when they're constructing the stream themselves. It's only the cases where the runtime is providing the stream, like in fetch or when you're reading from the file system, that a runtime can optimize and make really fast. But those aren't the bottlenecks. The bottlenecks are the streams that users are creating in user code. Those are the things that need to be faster.
8. Optimizing Stream Implementations
Discussing the optimization of stream implementations, including the importance of design principles, back pressure, and batch chunk processing efficiency.
And there's just very little room in the stream implementation to make those optimizations possible. Readable stream is here to stay, but it doesn't have to be the only option. We have Node streams. We have readable streams. We can have another option on the table. These options can be designed to work together, to exist in parallel with each other. Design principles include streams as iterables, using async iterables of Uint8Array data, and avoiding unnecessary class hierarchies.
One approach is to pull data through transforms only when needed, optimizing performance by not processing data until necessary. Explicit back pressure ensures proper memory management by requiring attention to signals. For batching, yielding one Uint8Array at a time is less efficient than yielding an array of them. With these optimizations, stream implementations can be more efficient and effective.
9. Optimizing Stream Processing in JavaScript
JavaScript language usage, pull-through transform, explicit back pressure, and synchronous vs. asynchronous data consumption paths in Node streams.
This is part of the JavaScript language. We don't need any other classes. We don't need any other class hierarchy. We can just use what the language provides.
Pull through transform. What this basically means is instead of pushing data into a transform that operates on the data and then pushes it out to another queue, let's only transform the data as we're reading it. So as we're walking through this iterable, as we pull a chunk of data through the stream, then we'll apply the transform only when we're ready to have it, only when we're ready to process it.
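In code, a pull-through transform is nothing more than an async generator wrapping the source; mapChunks is a hypothetical helper name for illustration, not part of any shipped API.

```javascript
// Pull-through transform: fn runs only when the consumer calls next().
async function* mapChunks(source, fn) {
  for await (const chunk of source) {
    yield fn(chunk); // applied at pull time, no intermediate queue
  }
}

// A source generator standing in for a stream.
async function* numbers() {
  yield 1;
  yield 2;
  yield 3;
}

const results = [];
for await (const value of mapChunks(numbers(), (n) => n * 10)) {
  results.push(value);
}
// results: [10, 20, 30]
```

Nothing is computed ahead of the consumer; if iteration stops early, the remaining transforms simply never run.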
Explicit back pressure: right now, with readable stream and writable stream, an implementation can by default completely ignore back pressure signaling, completely ignore what it's doing to memory. Batch chunks: yielding one Uint8Array at a time is far slower than yielding an array of them. There is a synchronous path of consuming data and an asynchronous path of consuming data. Readable streams force you down the async path. Node streams kind of do sync and kind of do async; they ride the line in the middle, but there's no promise overhead in Node streams. Node streams are event emitter based, though, and the event emitter is primarily an asynchronous API.
10. Data Chunk Processing and Back Pressure Policies
Pushing data chunks, bytes-only streams, clean async separation, and back pressure policies: strict, block, drop oldest, and drop newest.
As a stream is operating, as you're pushing data in, you're pushing individual chunks, individual Uint8Arrays, into the stream. All of the chunks you push in might be available all at once when the consumer is ready to pull the data, when they're ready to actually consume it. So yielding a batch of them rather than one at a time allows us to amortize the cost of actually consuming the data. If the data is already there in the buffer, let's just provide it all at once.
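A minimal sketch of that batching idea: everything already sitting in the buffer is handed over in a single yield, so the consumer pays one await per batch rather than one per chunk. The buffer here is a plain array standing in for the stream's internal queue, and batchingSource is an illustrative name.

```javascript
// Wrap a plain array "buffer" in an async iterator that yields
// whatever is synchronously available as one batch.
function batchingSource(buffer) {
  return {
    async *[Symbol.asyncIterator]() {
      while (buffer.length > 0) {
        // splice drains everything available right now: one batch.
        yield buffer.splice(0, buffer.length);
      }
    },
  };
}

const buffer = [new Uint8Array(4), new Uint8Array(4), new Uint8Array(4)];
let awaitedIterations = 0;
let chunksSeen = 0;
for await (const batch of batchingSource(buffer)) {
  awaitedIterations += 1;     // one microtask round-trip per batch
  chunksSeen += batch.length; // but several chunks delivered
}
// Three chunks arrive in a single awaited batch.
```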
Bytes-only streams: Node streams and readable streams support any value type, but the majority of the use cases, particularly system streams like fetch or reading a file, are just bytes. It's important to point out, though, that nothing in this design requires it to be bytes.
Clean async separation: parallel APIs, no ambiguity, no Zalgo, no concern over timing. With the new design that I'm experimenting with, there's very clear separation: there's a path which is only sync and a path that is only async. And then we have minimal dependencies: AbortSignal is currently the only non-intrinsic API dependency in the design.
So let's talk about back pressure policies: strict, block, drop oldest, drop newest. Very straightforward. The default is strict, which means if you're not paying attention to back pressure, you'll get an error. Block is what readable stream and Node streams do now: the data will accumulate indefinitely as you're writing it. Drop oldest and drop newest are brand new. These are for when you can deal with lossy streams: if you fill up the queue, either the oldest or the newest entry is automatically dropped. This would be for live streams or data streams where you can tolerate a certain amount of loss.
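A hypothetical bounded queue makes the four policies concrete. The policy names follow the talk; the class itself is only an illustration of the semantics, not the proposed API.

```javascript
class BoundedQueue {
  constructor(highWaterMark, policy = 'strict') {
    this.items = [];
    this.highWaterMark = highWaterMark;
    this.policy = policy;
  }
  push(chunk) {
    if (this.items.length < this.highWaterMark) {
      this.items.push(chunk);
      return;
    }
    switch (this.policy) {
      case 'strict': // default: ignoring back pressure is an error
        throw new Error('back pressure violated');
      case 'block': // what streams do today: keep accumulating
        this.items.push(chunk);
        return;
      case 'drop-oldest': // lossy: evict the oldest entry
        this.items.shift();
        this.items.push(chunk);
        return;
      case 'drop-newest': // lossy: discard the incoming chunk
        return;
    }
  }
}

// A lossy live-data queue that keeps only the two most recent chunks.
const q = new BoundedQueue(2, 'drop-oldest');
q.push('a');
q.push('b');
q.push('c'); // evicts 'a'
// q.items is now ['b', 'c']
```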
11. Stream Data Accumulation and Namespace Object
The data accumulation with drop policies for lossy streams, API fundamentals with the stream namespace object, and the process of creating and consuming streams through iterators.
Some of the API fundamentals of this alternative design: we have the stream namespace object. Right now this isn't a class, it isn't something that you can create; there's no new Stream. It's just a namespace object, similar to console or globalThis in the language. stream is just something you import, and it has these methods: push, from, duplex. These methods return various artifacts.
The stream itself is just an iterator. You can create one with a generator, right here. So here we have a synchronous generator yielding individual chunks, which are strings in this case; these are automatically going to be converted to UTF-8 Uint8Arrays. stream.from is what actually returns the stream here, and what that's doing is taking that generator and wrapping it in an async iterator that returns one or more Uint8Arrays. And then we can take this and say push. In this case, we get a writer and a readable. Again, the readable is just an async iterator, and the writer is just a simple object that has write and the common operations you would expect. This is how you push data into that readable. So this is the approach where it essentially becomes an identity transform: you're writing on one side and reading it out on the other. Consuming streams is just a matter of consuming an iterator. Again, this is a language-level primitive.
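To make the shape of this concrete, here is a small stand-in implementation of the two entry points just described: from wraps any iterable in an async iterator (encoding strings to UTF-8), and push returns a writer plus a readable acting as an identity transform. This is a guess at the semantics for illustration, not the experimental API's actual code.

```javascript
const stream = {
  // Wrap a sync or async iterable; strings become UTF-8 Uint8Arrays.
  from(iterable) {
    const encoder = new TextEncoder();
    return (async function* () {
      for await (const chunk of iterable) {
        yield typeof chunk === 'string' ? encoder.encode(chunk) : chunk;
      }
    })();
  },

  // Identity transform: whatever you write comes out of the readable.
  push() {
    const queue = [];
    let notify = null;
    let done = false;
    const writer = {
      write(chunk) { queue.push(chunk); if (notify) notify(); },
      end() { done = true; if (notify) notify(); },
    };
    const readable = (async function* () {
      while (true) {
        if (queue.length > 0) {
          yield queue.splice(0, queue.length); // yield a whole batch
          continue;
        }
        if (done) return;
        await new Promise((resolve) => { notify = resolve; });
      }
    })();
    return { writer, readable };
  },
};

// Creating a stream from a generator of strings:
function* greet() { yield 'ab'; yield 'cd'; }
const encoded = [];
for await (const chunk of stream.from(greet())) encoded.push(chunk);
// encoded: two Uint8Arrays, [97, 98] and [99, 100]

// Pushing data in on one side and reading batches out the other:
const { writer, readable } = stream.push();
writer.write(1);
writer.write(2);
writer.end();
const batches = [];
for await (const batch of readable) batches.push(batch);
// batches: [[1, 2]] (both writes arrive as a single batch)
```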
12. Data Consumption and Transform Options
We have for await for chunking data and limiting production amounts. Data consumption involves pull-through transforms, with stateless and stateful options for modifying chunks. Pull pipelines manage data transforms efficiently.
We have for await. That's going to give you chunks; chunks of the readable here is an array, and it can contain one or more bits of data that are enqueued. We can specify a limit with this design, if you want to be able to limit the amount of data that is produced. You can await text: you can accumulate all of this as text or as bytes. There are multiple ways of actually consuming this data. Transforms: the options here, again, are pull-through. These transforms only run when you actually call next, when you actually pull the next chunk of data off that iterator.
And we have two forms in this design: stateless and stateful. Stateless is just a function. It is called for every batch of chunks that is pulled through. So when you call next, it's going to pass that batch through this function; you do whatever operations you want on it and return the modified chunks. Or if the stream is done, you just pass that signal down the line.
The stateful transform is a little more complicated. In this case it becomes an object; again not a class, just an object with an async generator (or, if it's a sync transform, a sync generator) that receives the source, which is the underlying async iterator we're consuming, and forwards it. This would be for when we're maintaining state, for a compression stream or an encryption stream, something that might have some internal buffering. Buffering is possible with the stateless form too, just a little more complicated. So we provide these two options. Pull pipelines.
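A sketch of the two forms, with shapes inferred from the description rather than taken from the actual API: stateless is a plain function over each pulled batch; stateful is an object whose generator receives the source and can carry state across batches.

```javascript
// Stateless: a plain function, called for every batch pulled through.
const addOne = (chunks) => chunks.map((n) => n + 1);

// Stateful: an object with a generator that consumes the source and
// keeps state across batches (a running total here, standing in for
// compression or encryption state).
const runningSum = {
  async *transform(source) {
    let total = 0; // state survives across batches
    for await (const chunks of source) {
      for (const n of chunks) {
        total += n;
        yield total;
      }
    }
  },
};

// A source that yields batches of chunks.
async function* batchedNumbers() {
  yield [1, 2];
  yield [3];
}

// Applying the stateless form at pull time:
const mapped = [];
for await (const chunks of batchedNumbers()) mapped.push(...addOne(chunks));
// mapped: [2, 3, 4]

// Applying the stateful form:
const totals = [];
for await (const t of runningSum.transform(batchedNumbers())) totals.push(t);
// totals: [1, 3, 6]
```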
13. Stream Processing and Multi-Consumer Patterns
The async iterator protocol handles data chunks with compress and encrypt transforms. Data processing involves a for await (const chunks ...) loop for transformation, and pipelining to a file writer with back pressure. Multi-consumer patterns, as tee alternatives, support efficient data sharing with minimal memory overhead and shared queues.
You know, this is entirely just the async iterator protocol. The source is just an iterator: it has a next function on it that pulls the next chunk of data. The stream's pull wraps that in a way that allows us to grab multiple chunks synchronously if they're available. Notice that we have compress and encrypt here; those are transform functions that are attached. So as we iterate over the output, which again is just an async iterator, every time we call next, that pipeline is first going to call next on the source, get a chunk of data, then pass it to the compression transform, and that flows through the encrypt transform.
And then we get the data here through our for await (const chunks of output), and we process those. Everything we see in this for await is going to be the transformed data. And if we want to pipe that to a destination, we can just call stream.pipeTo. It's going to go through the same process, except those chunks of data are going to go to that file writer instead. Again, the file writer here is just the writer object I mentioned earlier. That is the piece that has enforced back pressure. So this pipeTo operation is forbidden from overwhelming that writer by default, which is very important. It can't just keep pushing data in; it is actually required to follow the back pressure signaling.
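The pull pipeline just described can be sketched as ordinary generator composition; pipeline and the transform names here are illustrative, not the real API.

```javascript
// Compose transforms around a source iterator. Each next() on the
// result walks back through every stage to the source.
function pipeline(source, ...transforms) {
  return transforms.reduce((iterator, transform) => transform(iterator), source);
}

// Stand-in transforms; real ones would compress and encrypt bytes.
const compress = async function* (source) {
  for await (const chunk of source) yield `compressed(${chunk})`;
};
const encrypt = async function* (source) {
  for await (const chunk of source) yield `encrypted(${chunk})`;
};

async function* sourceChunks() {
  yield 'a';
  yield 'b';
}

// Each iteration pulls one chunk through compress, then encrypt.
const output = [];
for await (const chunk of pipeline(sourceChunks(), compress, encrypt)) {
  output.push(chunk);
}
// output: ['encrypted(compressed(a))', 'encrypted(compressed(b))']
```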
All right, multi-consumer patterns. This is where we get into tee alternatives. There are two: broadcast and share. Broadcast is push-based. Imagine a tee where some other thing is pushing data into those streams. So let's say you create a transform stream, you get the writer side and the reader side, and then you call tee on that reader side; however many times you call tee, those branches are going to receive whatever data you write to the writer. Same thing here, only much more efficient: we don't have the memory overhead. Every branch we get off here can have its own transforms applied to it, but they all share the same underlying queue. Imagine them just as cursors on the same data set. We have a single buffer.
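A toy version of the cursors-on-one-buffer idea, simplified to a pre-filled buffer so the cursor bookkeeping stays visible; the real design would also drop data behind the slowest cursor and apply back pressure. share is an illustrative name.

```javascript
// Give each branch its own cursor into one shared buffer: no copies.
function share(buffer, branchCount) {
  const cursors = new Array(branchCount).fill(0);
  return cursors.map((_, branch) => ({
    async *[Symbol.asyncIterator]() {
      while (cursors[branch] < buffer.length) {
        yield buffer[cursors[branch]++]; // advance only this cursor
      }
    },
  }));
}

const data = ['x', 'y', 'z'];
const [branchA, branchB] = share(data, 2);

const seenByA = [];
for await (const chunk of branchA) seenByA.push(chunk);
const seenByB = [];
for await (const chunk of branchB) seenByB.push(chunk);
// Both branches observe all three chunks from the single shared buffer.
```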
14. Branch Cursor Management and Benchmark Results
Each branch has its cursor, ensuring efficient data consumption with strict back pressure signaling. Setting a high watermark is crucial to prevent overwhelming the queue, maintaining faster operation than readable streams. Samples and benchmarks in the provided repo showcase detailed usage and benchmark results.
Each one of these branches has its own cursor. As they consume the data, we can drop off the data that's been consumed by everyone. The back pressure signal operates at the rate of the slowest reader, so we're never going to overwhelm the queue, and we never have multiple copies of the data. It is just one queue as data is pulled through, so it operates much more efficiently.
The back pressure signaling is strict by default, which means you just have more control over what is actually happening with your memory. On the pull side, we just have an async iterator or sync iterator, and we're just pulling through; all of the branches can get that data. Again, it's one shared queue with a single cursor. The back pressure signaling here is based on that slowest reader. So with this one, you do have to be a little more careful; there are ways of overwhelming the queue.
So you want to set a high watermark with this, just so you can manage that a bit better, but it still operates much faster than what we have with readable stream. Some samples and some benchmarks: I have a repo, and I'll share the link in the notes and in the chat for this session. It's on GitHub, under jasnell, new streams. All of the samples and benchmarks are there. The samples themselves are pretty straightforward; we have both HTML and TypeScript, and we also have a sample Cloudflare Worker where this is being used. The samples break down how to actually create the streams and how to consume them, and they provide much more detail on how to use this experimental new API, so you really get a sense for how it works and a feel for the difference in style that it has. Benchmark results: this is where it gets really interesting.
15. Optimizing Performance with Built-in Primitives
Faster data processing with JavaScript's built-in primitives surpasses other stream implementations. Significant performance boosts achieved by eliminating unnecessary operations like buffering and promise overhead. Async iteration with batch processing enhances performance by reducing micro-task deferrals and buffer reads.
These numbers are produced from Node, but I've seen similar numbers with Deno, Bun, all of the browsers (Chrome, Firefox, Safari), and with Cloudflare Workers. The point is that no matter what the underlying readable stream implementation is, this is faster, and it's also faster than Node streams in many cases. And this really comes down to the fact that it's just using built-in language primitives rather than the additional layers of abstraction and classes and promises that the runtime implementations provide on top.
This is leveraging the bare-bones, basic implementation of features that are built into the JavaScript language. Some of these numbers, you look at this: chained transforms 86 times faster, 70 times faster. I've seen this up to 120 times faster. These may seem hard to believe, but if you really break down what is going on, what's happening with the memory, what's happening with the CPU, all of the operations that are elided, that are completely unnecessary, they're just gone from this approach.
The transforms still work. The transformation still happens; we're still pumping data through the pipeline. But all this buffering is gone, all of the promise overhead is gone, and the microtasks are gone in some cases, or at least significantly reduced. So these numbers become much more understandable; that's why we're able to get such a high multiplier on performance. Batching, that async iterator of arrays of Uint8Arrays rather than single ones, ends up being the single most significant performance boost, because otherwise you're reading just one chunk of data at a time, one Uint8Array at a time, where every one is a promise, every one is a microtask deferral, every one has to go off and read the buffer.
16. Efficiency and Simplified Design
Async iteration and smaller data chunks improve performance. New API excels in reducing async overhead, particularly promises. Simplified design with iterators and transforms, no class hierarchy or additional controllers, ensuring API consistency.
We can just say: oh, I have ten chunks available, here's ten chunks right now. That ends up being a major, major performance difference over what we have now with readable stream. Where the performance really shines is chained transforms: 70 to 100 times faster, up to 120 times faster. Simple pipelines can be up to 20 or 22x faster. These are pipelines that are doing moderately complex transformations.
Async iteration just on its own is faster. Handling smaller chunks of data is faster. Again, there's just less overhead with this approach. The new API really excels when async overhead dominates. When I say async overhead, I mean specifically promises. Promises are horribly expensive in some cases. I've talked about this before; go back and look at my old Broken Promises workshop from a number of years ago.
This is a long-term problem in the language: promises are convenient, but they have a cost, and they're definitely not cheap. So, just a summary of the benefits of this design. It's a simple mental model. We're just dealing with iterators. There are no classes. A writer is just an object that has write and end methods on it. Transforms are just functions, or an object with a function. There's no class hierarchy, no controllers, no readers, no writers, no "is it a BYOB read or a default read?" The APIs are consistent.
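Based on that description, the shapes involved can be sketched as plain language constructs. These are illustrative shapes only, assumed from the talk rather than taken from the reference implementation:

```javascript
// Hypothetical sketch: a transform is just an async generator function
// that consumes one async iterable and produces another.
async function* upperCase(source) {
  for await (const chunk of source) yield chunk.toUpperCase();
}

// A writer is just a plain object with write and end methods;
// no class hierarchy, no controller, no reader/writer locking.
const writer = {
  chunks: [],
  write(chunk) { this.chunks.push(chunk); },
  end() { this.done = true; },
};
```

There is nothing to subclass and nothing to lock; any object with the right methods works as a writer, and any async generator works as a transform.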
17. Simplified Stream Processing
Elimination of writer-controller distinction. Focus on iterators, transforms, and backpressure. Reduced promise overhead with batch processing. Type safety preference for bytes, optional protocol extensions for flexibility. Simplified implementation with streamlined state machines, leveraging language features for ease.
When writing data, the distinction between writers and controllers is eliminated. The approach now revolves around iterators, iterables, and transforms as functions and simple objects. Explicit backpressure is emphasized, with no default unbound buffers. Batch processing significantly reduces promise overhead. The design focuses on type safety with a preference for bytes, although flexibility exists in implementation. Protocol extensibility is achievable through optional extensions, maintaining familiar patterns.
The implementation simplifies with fewer abstract operations, no locking, and streamlined state machines. Runtime implementations can leverage language-level features like generators for easier implementation. This simplification allows a clear focus on individual tasks within the pipeline, such as transforming data or producing data. Coordination of error and closing states is abstracted away, leaving only essential data production and transformation responsibilities to the components.
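The point about coordination of error and closing states being abstracted away follows from how generators already behave: `for await` propagates thrown errors to the consumer and calls the generator's `return()` on early exit, so a `finally` block is all the cleanup machinery a producer needs. A minimal sketch:

```javascript
// Sketch: the language's own generator semantics replace a hand-rolled
// state machine for close/error coordination.
let cleanedUp = false;

async function* source() {
  try {
    yield 1;
    yield 2;
    yield 3;
  } finally {
    // Runs on normal completion, on an early break by the consumer,
    // and on an error thrown downstream -- no extra coordination needed.
    cleanedUp = true;
  }
}
```

When a consumer breaks out of a `for await` loop over this iterator, the runtime resumes the generator at the `finally` block automatically, which is exactly the "essential responsibilities only" simplification described above.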
18. Flexible Design and Implementation
Design flexibility with data types. Optional protocol extensions for flexibility. Streamlined implementation with fewer operations and simpler state machines. Leveraging language features for ease of implementation, focusing on specific tasks and simplifying error handling.
But again, that's just an arbitrary decision here. Nothing in this design actually requires it to be bytes, because nothing in the stream pipeline actually operates on the bytes. It just sees Uint8Arrays and passes them along. The only place where there's anything byte related is when we take a string and automatically encode it as a set of UTF-8 bytes, and that's only at certain layers, as the data is flowing out.
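That string-to-UTF-8 step is itself expressible as an ordinary transform using the standard `TextEncoder`. This is my own sketch of the idea, not code from the reference implementation:

```javascript
// Sketch: the only byte-aware layer -- a transform that encodes strings
// to UTF-8 Uint8Arrays and passes binary chunks through untouched.
const encoder = new TextEncoder();

async function* encodeStrings(source) {
  for await (const chunk of source) {
    yield typeof chunk === 'string' ? encoder.encode(chunk) : chunk;
  }
}
```

Everything upstream and downstream of this transform can stay type-agnostic, which is why the bytes-only preference is an arbitrary rather than structural choice.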
Protocol extensibility, this is something I didn't get into. It's in the repo, in the reference implementation of this alternative approach, that there are ways of extending it. There are ways you can make it more flexible if you need it, but they're all optional. It's very familiar patterns. We're just talking about for await and function calls. That's it. Easier implementation: there are fewer abstract operations, no locking, simpler state machines. The runtime implementations don't really have to implement their own state machine here. They can rely on what's already at the language level as far as generators, async iterables, and sync iterables are concerned, which really simplifies the model.
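"For await and function calls" is the whole composition model: a pipeline is just nested function calls, and pumping it is a loop. A minimal sketch under that assumption (`pump` and `double` are illustrative names, not repo APIs):

```javascript
// Sketch: driving a pipeline is just for-await over composed functions.
async function pump(source, writer) {
  for await (const chunk of source) writer.write(chunk);
  writer.end();
}

// A transform is just an async generator function...
async function* double(source) {
  for await (const n of source) yield n * 2;
}
```

A full pipeline then reads as plain composition, e.g. `await pump(double(input), sink)`, with backpressure falling out of each `await` naturally.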
You can focus a transform just on what the transform is doing, for instance. You can focus the iterator on just producing the data. You don't have to coordinate error states and closing states and all these different overlapping state machine concerns. It's just an iterator that produces data. That's it. Getting started: again, it's in the repo. You can play with this. Once you've checked out the repo, just npm test runs all of it. There are samples, there are benchmarks, all there in the repo. Again, the repo is located at github.com/jasnell/new-streams (J-A-S-N-E-L-L slash new dash streams). You can go take a look at it. I highly encourage you just to get it, play with it and see what's going on.
19. Encouraging Exploration and Engagement
Encouragement to explore resources, including readme and API doc. Author's book in progress and blog post on stream API challenges. Invitation to engage with the material and participate in discussions.
Some of the key resources: there's a readme, there's an API doc, there's a migration guide. All these things are designed to help you get familiar with this so you can experiment and play with this new idea.
So that's it. I know I went through that really fast. I'll be on the chat. I'll be able to walk through any questions folks have. The repo link here, again, go check it out.
As a side note, I am writing a book. It's JavaScript in Depth, through Manning. There's the link. It's not quite done yet. I'm probably about nine or ten chapters into writing it. I've got a few more to go, but it is out in Early Access. If you want to take a look, you can just go to Manning's page and check it out.
I did write a blog post on this topic for the Cloudflare blog: We Deserve a Better Streams API for JavaScript. This really breaks down a lot of the rationale. It goes through some more of the examples, a lot of what's wrong with readable stream and why we might want to get away from it, and what the parts are that are difficult to optimize around, talking about backpressure, good in theory but broken in practice. It really goes through the details. It's a long post, but I think well worth the read. Anyway, that's it. I will wrap there, and then we can move the conversation to chat. Thank you all, and I hope you enjoy the rest of the sessions.