Video Summary and Transcription
This talk discusses how TypeScript definitions are automatically generated for Cloudflare Workers using runtime type information. The encoding and transformation of type information is explained, with runtime API types being encoded in C++ and further processed in TypeScript. The process of improving type definitions and achieving compatibility is also covered, including fixing problems with iterators and using human input to improve developer ergonomics. The talk concludes with a plan to build Types as a Service, a Cloudflare Worker that dynamically generates npm packages containing TypeScript definition files.
1. Introduction to Cloudflare Workers
In this part, I will discuss how we automatically generate TypeScript definitions for Cloudflare Workers using TypeScript. We will cover the problems with the handwritten types approach we used a few years ago, our auto-generation approach using runtime type information, transformations to improve our types, manual type overrides to improve developer ergonomics, and compatibility dates. We will also explore the Workers runtime's runtime type information system.
Hello, everyone. Welcome to my talk on how we automatically generate TypeScript definitions for Cloudflare Workers using TypeScript.
As a quick introduction, I'm Brendan. I created Miniflare, a fully local simulator for Cloudflare Workers. I've mentioned Cloudflare Workers a few times already, but what are they? Workers is a functions-as-a-service platform. You write some HTTP handling code, publish it to our platform, and we give you a URL you can hit to run it. We deploy your code to all Cloudflare edge locations, so your users get low-latency access wherever they are. Importantly, there's practically no cold-start time, because our runtime is based on V8 isolates, not containers or virtual machines. In addition to standard web APIs, we provide nonstandard APIs for key-value storage.
The key points so far: we have a custom V8-based runtime, like Node or Deno. We implement mostly web-standard APIs, like browsers, but also some nonstandard APIs specifically for server-side use cases. With all this in mind, why do we want types? We want type checking to prevent errors at runtime, and we want autocompletion in IDEs. To start, we'll cover how we used to do handwritten types a couple of years ago, then we'll look at our auto-generation approach using runtime type information. After that, we'll look at some transformations we can do to improve our types, and then we'll focus on how we allow manual type overrides to improve developer ergonomics. And finally, we'll have a quick look at compatibility dates.
First off, how we did types a few years ago. Like most runtimes, we had an npm package providing global types for our runtime. Importantly, this package only included Cloudflare-specific APIs, like HTMLRewriter, and not web standards. There were quite a few problems with this approach. Firstly, these types were handwritten, making them error-prone to update. Secondly, these types weren't typically updated by the team implementing the runtime, making them slow to update with new APIs. On the right, we've got an example of a type error using a new API, even though that code should run fine in workers. However, the biggest problem was our reliance on TypeScript's lib.webworker. This included APIs like MessageChannel, which we didn't implement, meaning code that type checked wouldn't run in our runtime. And workers at the time weren't fully spec compliant. We were also missing experimental APIs that we had implemented. So there are a few problems here. Now that we've seen all the problems, what should we do about them? The workers runtime provides a runtime type information system. This allows us to query types at runtime, and it was originally used for automated fuzz testing.
2. Encoding and Transforming Type Information
Runtime type information is encoded with Cap'n Proto, which we use as a language- and platform-independent binary format. It's basically a typed version of JSON. We can encode all runtime API types in C++ and do further processing in TypeScript. This allows us to use the TypeScript compiler API for rendering TypeScript. With a bit more work to convert Cap'n Proto types into TypeScript, we have auto-generated types. The types still aren't perfect, but all the information we need to fix these problems is already in the types.
Runtime type information is encoded with Cap'n Proto, which we use as a language- and platform-independent binary format, kind of like Protocol Buffers if you've used those before. It's basically a typed version of JSON.
This is our Cap'n Proto schema for type information. You can see that this kind of maps to TypeScript types. The really nice thing about Cap'n Proto is that it can generate encoding and decoding code for you from your schema for many different languages. Importantly, this means that we don't have to implement all stages of the auto-generation pipeline in the same language.
We can encode all runtime API types in C++ and do further processing in TypeScript. This allows us to use the TypeScript compiler API for rendering TypeScript. What we want to do is build interfaces from runtime type information. We'll try to build the interface we've just seen. We'll build from the bottom of the tree, so we'll start with the key parameter for the get method, then we'll build the get method's return type and the method itself, then we'll build the interface containing the get method, then we'll create a placeholder source file and printer so we can print the interface out to a string and log it to the console. If we run that, we get what we expected.
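As a sketch of that bottom-up construction using the TypeScript compiler API (the exact KVNamespace shape here is an assumption for illustration, not the real generated definition):

```typescript
import * as ts from "typescript";

const { factory } = ts;

// 1. The `key` parameter for the get method: `key: string`
const keyParam = factory.createParameterDeclaration(
  /* modifiers */ undefined,
  /* dotDotDotToken */ undefined,
  "key",
  /* questionToken */ undefined,
  factory.createKeywordTypeNode(ts.SyntaxKind.StringKeyword)
);

// 2. The get method's return type: `Promise<string | null>`
const returnType = factory.createTypeReferenceNode("Promise", [
  factory.createUnionTypeNode([
    factory.createKeywordTypeNode(ts.SyntaxKind.StringKeyword),
    factory.createLiteralTypeNode(factory.createNull()),
  ]),
]);

// 3. The method itself: `get(key: string): Promise<string | null>;`
const getMethod = factory.createMethodSignature(
  undefined, "get", undefined, undefined, [keyParam], returnType
);

// 4. The interface containing the get method
const iface = factory.createInterfaceDeclaration(
  undefined, "KVNamespace", undefined, undefined, [getMethod]
);

// 5. A placeholder source file and a printer, to render the node to a string
const file = ts.createSourceFile("types.d.ts", "", ts.ScriptTarget.Latest);
const printer = ts.createPrinter();
const rendered = printer.printNode(ts.EmitHint.Unspecified, iface, file);
console.log(rendered);
```

Running this prints the interface we set out to build, with the get method taking a string key and returning a promise.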
All that for three lines of TypeScript. It's a very verbose API, but with a bit more work to convert Cap'n Proto types into TypeScript, we have auto-generated types. This pretty much solves all of our problems from earlier. We've got exactly the APIs that are implemented in the runtime, with the minor spec differences. But the types still aren't perfect. For example, iterators don't have correctly typed values, we can't use any global functions or constants, and we don't have function overloads, so TypeScript can't narrow return types given arguments.
Luckily, all the information we need to fix these problems is already in the types. What we need to do is transform them into a form TypeScript recognizes. So we'll start with fixing iterators. This is what our types look like at the moment. We want to use TypeScript's built-in IterableIterator type instead, and the transformation looks something like this.
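As a sketch of the target shape (KVListResult and its method are made-up names for illustration), once a method returns the built-in IterableIterator, iteration and spreading type-check with the right element type:

```typescript
// Hypothetical interface after the transformation, using IterableIterator:
interface KVListResult {
  keys(): IterableIterator<string>;
}

// A generator-backed stub shows that iteration now works as expected:
const result: KVListResult = {
  *keys() {
    yield "key1";
    yield "key2";
  },
};

// Spread and for...of both type-check; each element is typed as string.
const keys = [...result.keys()];
console.log(keys);
```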
To fix globals, we need to extract ServiceWorkerGlobalScope's members into the global scope, and this will need to include superclasses too. So, again, we need to perform something like this. How do we actually do this? Let's look at a simpler example. Say we want to replace all strings with numbers. We can write a TypeScript transformer for this that recursively visits all nodes. If we find a string token, we replace it with a number. Then we can use the ts.transform API to apply this to an AST node. We start at the root KVNamespace declaration, and we do a depth-first search until we find a string.
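A minimal version of that transformer might look like this (the input declaration is parsed from a string here purely for illustration):

```typescript
import * as ts from "typescript";

// A small declaration to act as the tree we'll visit:
const file = ts.createSourceFile(
  "input.d.ts",
  "interface KVNamespace { get(key: string): string; }",
  ts.ScriptTarget.Latest
);

// Transformer factory: recursively visit all nodes, replacing every
// `string` keyword type with `number`.
const stringToNumber: ts.TransformerFactory<ts.SourceFile> = (ctx) => {
  const visit = (node: ts.Node): ts.Node => {
    if (node.kind === ts.SyntaxKind.StringKeyword) {
      return ts.factory.createKeywordTypeNode(ts.SyntaxKind.NumberKeyword);
    }
    // Depth-first search: recurse into all children
    return ts.visitEachChild(node, visit, ctx);
  };
  return (root) => ts.visitNode(root, visit, ts.isSourceFile);
};

// Apply the transformer and print the result:
const [transformed] = ts.transform(file, [stringToNumber]).transformed;
const output = ts.createPrinter().printFile(transformed);
console.log(output);
```

Every occurrence of the string keyword type, in both the parameter and return position, comes out as number.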
3. Improving Type Definitions and Compatibility
Applying this technique to our other transformations allows us to fix the first two problems, so iterators are now typed correctly, and we can use global functions and constants. We need some additional human input to determine the correspondence between input and output types in the C++ code for KV namespaces. We also need human input to improve developer ergonomics, allowing partial TypeScript code to be inserted alongside the C++ code. With that, we've solved all the problems from earlier and have a solid set of type definitions for Workers. We put breaking changes behind compatibility flags, but generating types for all possible combinations of flags is infeasible. Instead, we plan to build Types as a Service, a Cloudflare Worker that dynamically generates npm packages containing TypeScript definition files based on the selected compatibility date and flags.
When we do, we replace it with a number, and we keep going until every string has been replaced. Applying this technique to our other transformations allows us to fix the first two problems, so iterators are now typed correctly, and we can use global functions and constants.
For overloads, we need something a little more complicated. What we want in our TypeScript types is something like this, where we have a correspondence between input and output types. For example, when we specify text, we expect a string. When we specify arrayBuffer, we expect an ArrayBuffer, and so on. The problem is that in the C++ code for KV namespaces, there's no correspondence between the input type and the output result. How do we know what to generate? We need some additional human input here.
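The overloads we want might look roughly like this (a sketch; the method shapes in the real definitions differ):

```typescript
// Sketch: overloads encode the correspondence between the `type`
// argument and the resolved value type (shapes assumed for illustration).
interface KVNamespaceSketch {
  get(key: string, type: "text"): Promise<string | null>;
  get(key: string, type: "json"): Promise<unknown>;
  get(key: string, type: "arrayBuffer"): Promise<ArrayBuffer | null>;
}

// A stub implementation demonstrates the compile-time narrowing:
const kv = {
  async get(_key: string, type: string): Promise<any> {
    return type === "arrayBuffer" ? new ArrayBuffer(4) : "value";
  },
} as KVNamespaceSketch;

kv.get("key", "text").then((value) => {
  // `value` is narrowed to `string | null` here, no cast needed
  console.log(typeof value);
});
```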
There are other places where we need human input too, to improve developer ergonomics. We need it for function overloads. We also need additional input to add type parameters. Sometimes you want to rename types to be less verbose, sometimes you want to replace the auto-generated definition with something different, and sometimes you want to hide a type because it shouldn't be exposed to end users. We solve this by allowing partial TypeScript code to be inserted alongside the C++ code. This is encoded with the runtime type information in Cap'n Proto, and we merge it with the generated definition. We can use this for all the cases we've highlighted, and because we're co-locating overrides with the C++ code, it's much easier for runtime developers to add these and keep them up to date.
With that, we've solved all the problems from earlier, and we have a solid set of type definitions for Workers. But there's one more thing we can do to make them even better. Sometimes people make mistakes. Developers are no different, and sometimes we introduce bugs into our code. Sometimes fixing those bugs would introduce breaking behaviour changes, but we don't want to break existing deployed code. So when you upload your worker code, you have to specify a compatibility date.
We put breaking changes behind compatibility flags, which have a date when they're enabled by default. If your worker's compatibility date is after a flag's default-on date, then that flag will be enabled. The problem is that some flags change the public API surface and therefore the TypeScript types. For instance, the global_navigator flag adds a new navigator constant to the global scope. There are currently 41 compatibility flags, although not all of them change the type surface, but if we generated a version of the types for every possible combination of flags, we'd end up with around 2 trillion type definition files, which is infeasible.
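The arithmetic behind that figure: 41 independent on/off flags give 2^41 possible combinations, which is about 2.2 trillion.

```typescript
// 41 independent boolean flags => 2^41 possible combinations
const combinations = 2 ** 41;
console.log(combinations); // 2199023255552, roughly 2.2 trillion
```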
Instead, our solution at the moment is to generate an entry point for the types for each compatibility date that changes the public API surface. This doesn't completely solve the problem, as users can still selectively enable and disable flags. What we're planning to do is build Types as a Service, which will be a Cloudflare Worker that dynamically generates npm packages containing TypeScript definition files based on the selected compatibility date and flags. So I'd encourage you to try out Cloudflare Workers if you haven't already. You can find all of the type generation scripts in the cloudflare/workerd GitHub repo, and you can find me on GitHub and Twitter. Thank you very much for listening.