Video Summary and Transcription
ASTs, or abstract syntax trees, are used by popular tools like Babel, TypeScript, ESLint, and Prettier to improve the developer experience. They have various use cases including compiling and code analysis. Editor tooling and writing tools can be enhanced using ASTs, with examples including formatting with Prettier, type annotations in JetBrains editors, and code mods for framework upgrades. ts-morph is a useful tool for code transformations, while Tree-sitter is a portable tool that supports many languages and can be used to build IDEs or text editors in the browser.
1. Introduction to ASTs
Today I'll be talking about ASTs and how trees aren't just foliage. ASTs, or abstract syntax trees, represent the important bits of our code. Popular tools like Babel, TypeScript, ESLint, and Prettier use ASTs to improve the developer experience. ASTs have various use cases including compiling and code analysis. ESLint, an AST-based tool, is widely used for code analysis.
Hi, everyone. My name's Chris, and today I'll be talking about ASTs and how trees aren't just foliage. First, who am I? I stream programming way too much on Twitch, I'm a lazy YouTuber, and I'm a senior front end engineer at Fairwinds. At Fairwinds, we can help you optimize and manage your Kubernetes clusters at scale.
So first, what is an AST? It's a type of syntax tree. Concrete syntax trees represent every detail of our code. That means the parentheses, the brackets, the semicolons, the indentation, all of it. Abstract syntax trees actually represent the important bits of our code. So if we think about an if statement, the clause as well as the block are separate nodes but still related and linked together. Let's take a look at an image that explains it a bit more. So here we have an example of a statement sequence. Within that statement sequence, we see a while loop and a return statement. Within the while loop, we see the condition and the body. And within the condition, we see a comparison operator and the things being compared. All of these things kind of branch and tie together, almost like a family tree, which is kind of interesting.
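To make the tree in that image concrete, here is a minimal sketch of how such an AST could be written down as plain data. The snippet being modeled (a while loop followed by a return) and all node names are assumptions chosen for illustration; they are not taken from the slide or from any real parser.

```typescript
// Hand-rolled AST for the hypothetical snippet:
//   while (b != 0) { a = a - b; }
//   return a;
// Node names are illustrative only, not those of a real parser.
type Expr =
  | { type: "Identifier"; name: string }
  | { type: "NumberLiteral"; value: number }
  | { type: "BinaryExpression"; operator: string; left: Expr; right: Expr };

type Stmt =
  | { type: "StatementSequence"; body: Stmt[] }
  | { type: "WhileStatement"; condition: Expr; body: Stmt }
  | { type: "Assignment"; target: string; value: Expr }
  | { type: "ReturnStatement"; argument: Expr };

const program: Stmt = {
  type: "StatementSequence",
  body: [
    {
      type: "WhileStatement",
      // The condition: a comparison operator and the things being compared.
      condition: {
        type: "BinaryExpression",
        operator: "!=",
        left: { type: "Identifier", name: "b" },
        right: { type: "NumberLiteral", value: 0 },
      },
      // The body of the loop.
      body: {
        type: "Assignment",
        target: "a",
        value: {
          type: "BinaryExpression",
          operator: "-",
          left: { type: "Identifier", name: "a" },
          right: { type: "Identifier", name: "b" },
        },
      },
    },
    { type: "ReturnStatement", argument: { type: "Identifier", name: "a" } },
  ],
};

// Collect node types in depth-first order to show how the nodes nest.
function nodeTypes(node: Stmt | Expr, out: string[] = []): string[] {
  out.push(node.type);
  switch (node.type) {
    case "StatementSequence":
      for (const child of node.body) nodeTypes(child, out);
      break;
    case "WhileStatement":
      nodeTypes(node.condition, out);
      nodeTypes(node.body, out);
      break;
    case "Assignment":
      nodeTypes(node.value, out);
      break;
    case "ReturnStatement":
      nodeTypes(node.argument, out);
      break;
    case "BinaryExpression":
      nodeTypes(node.left, out);
      nodeTypes(node.right, out);
      break;
    // Identifier and NumberLiteral are leaves with no children.
  }
  return out;
}

const types = nodeTypes(program);
console.log(types.join(" > "));
```

Notice that the parentheses, braces, and semicolons of the snippet appear nowhere in the data: that is the "abstract" part.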
Some popular tools that most of us are using day to day are Babel, TypeScript, ESLint, and Prettier. All of these use ASTs to make our developer experience better. So what are they doing? What are some common use cases for them? Why would you want to use ASTs to write dev tools? Well, some use cases are compiling, code analysis, editor tooling, and code mods. For compiling, we could think maybe about TypeScript being turned into JavaScript via tsc, which stands for TypeScript Compiler. Back in the day, turning ES6 into ES5 via Babel was one way to go. Now we use more modern versions of JavaScript in browsers that might not support them yet. That's what Babel is doing. And technically, Babel used to be called 6to5 when it was limited to ES6 and ES5. We also have JSX being turned into React.createElement. That's another compiler step. And compilers in general for other languages, you know, turning C into assembly. That's another way of compiling using ASTs. When we think about code analysis, ESLint is probably the first thing we think of. But there are some older tools like JSHint that were not AST-based. The reason ESLint is the one we all care about and think about these days is that it's easier to write a rule or an extension to ESLint via ASTs than the way JSHint was doing it back in the day.
2. Editor Tooling and Writing Tools with ASTs
For editor tooling, we could consider formatting with Prettier, type annotations in JetBrains editors, semantic highlighting in VS Code, and code mods for framework upgrades. Writing tools using ASTs involves using TypeScript (ts), unist, ts-morph, and Tree-sitter. Unist provides a baseline for other syntax trees like hast, mdast, xast, and sast (for CSS, SCSS, and Less). ts-morph is useful for code transformations like react-codemod.
For Editor tooling, we could consider formatting. Prettier is breaking our code down into an AST and building it back up again with the standard indentation, where the brackets go, all those things in a pure standard formatted way that we don't have to care about or manually do by hand, which is awesome. Type annotations in JetBrains editors are really cool. So you could get the names of arguments to a function annotated. If you're not passing an object with props, if you're passing ordered arguments, having those named is really nice as a developer experience. They can also annotate inferred types when you're assigning a variable or a return value.
Another really interesting editor tooling aspect is syntax highlighting. Keep in mind that most syntax highlighting is TextMate grammar-based, which is fundamentally regular expressions under the hood, but VS Code ships a feature called semantic highlighting that uses knowledge from the language servers for various languages to give us a little bit more accurate highlighting of what's going on in our code.
And finally, code mods. So think about framework upgrades. Nowadays, we don't have to do import React from 'react', and if you have a codebase that is doing it, if you've updated your bundlers or compilers, you might not need to do that anymore. So instead, we could have react-codemod remove that for us automatically throughout the entire codebase instead of going file by file, manually removing those things. Another option is turning function.bind(this) into an arrow function, which is what a .bind(this) for a function is doing under the hood, but it's just syntactic sugar. All of these things are just nice for framework and library authors to implement as things that help us upgrade and stay up to date, because they don't want to deal with bugs for legacy versions of their framework anyway.
So how could you write some tools using ASTs? What are some tools to write tools? Well, let's look at ts, that is TypeScript. Let's look at unist, ts-morph, and Tree-sitter. Those are the ones that I'm really going to dig into. So ts is literally TypeScript. You would just import TypeScript, call ts.createSourceFile with the text snippet that you have, and then you can iterate every node within that tree using forEachChild, looking for the node type that you care about. Here, I'm looking for a class declaration.
Unist is very interesting because otherwise there is no standard for how ASTs should specify props and how we can understand what parts of the code they're from and what type of node it is, etc. One thing to remember, though, is that unist is not intended to be self-sufficient, it's a baseline for other things to use. So, hast for HTML, mdast for Markdown, xast for XML, or sast for CSS, SCSS, and Less. All of these tools add their own props to that node type that help give us a little more information about what the code is doing and more metadata that we can use to manipulate that code or understand it better. One important thing is there's something like 32 utility functions written for unist, and there's probably even more. All of these utility functions can be used for any unist-compatible syntax tree, which means you don't have to write a lot of those visitor pattern functions by hand, which is really cool.
ts-morph is very interesting because it is useful for code transformations. Think code mods like react-codemod. I believe react-codemod uses jscodeshift, but I could be wrong about that. But still, ts-morph is something I have experience with from my own usage, and it's quite useful.
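The TypeScript walk described above (create a source file from a snippet, then visit every node looking for class declarations) can be sketched roughly as follows. This is not the code from the slide; the sample source text, the file name snippet.ts, and the findClassNames helper are assumptions made for illustration. It assumes the typescript package is installed.

```typescript
import * as ts from "typescript";

// Hypothetical snippet to analyze; any TypeScript text would do.
const source = `
class Greeter {}
function helper() {}
class Parser {}
`;

// Parse the text into an AST. The file name is only a label here;
// nothing is read from disk.
const sourceFile = ts.createSourceFile(
  "snippet.ts",
  source,
  ts.ScriptTarget.Latest,
  /* setParentNodes */ true
);

// Recursively visit every node and record the names of class declarations.
function findClassNames(root: ts.Node): string[] {
  const names: string[] = [];
  const visit = (node: ts.Node): void => {
    if (ts.isClassDeclaration(node) && node.name) {
      names.push(node.name.text);
    }
    // forEachChild only visits direct children, so recurse from each one.
    ts.forEachChild(node, visit);
  };
  visit(root);
  return names;
}

const classNames = findClassNames(sourceFile);
console.log(classNames);
```

Because forEachChild only visits direct children, the recursion inside visit is what turns it into a full traversal.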
3. AST Tools and Tree Sitter
One thing to remember is that ts-morph has functions for manipulating code, but you may need to write custom functions for traversing your code. ESLint allows you to create your own AST-based tools and enforce coding standards. Tree-sitter is a portable tool written in pure C that supports many languages and can be compiled to Wasm. It can be used in the browser to build IDEs or text editors without a server. Some Tree-sitter parsers may not compile to Wasm properly, but you can still support over 40 languages in the browser.
One thing to remember, though, is that ts-morph is not unist compatible. I believe it's using a node structure very similar to TypeScript's, so you're not going to be able to use any of those utility functions from the unist ecosystem. ts-morph does have some really good functions for manipulating code, but as far as traversing your code, you may want to write some custom functions for that yourself.
And ESLint is another interesting way of creating your own AST-based tools. You could write your own rules that enforce coding standards at your organization, and that's awesome! One thing to remember, though, is it's not the same AST structure that you've become familiar with from unist or TypeScript. The reason for that is very simple. It existed before those things were prevalent enough to, you know, build on top of.
And another interesting part of writing a rule for ESLint is that you pass an object to ESLint where the prop name is a selector and the value for that prop is a callback function that ESLint is going to call as it traverses the tree. Why is this important? Well, if every rule traversed the tree from top to bottom on its own, ESLint would be really slow. So instead it can do a single traversal, call your callback function when it hits a node that you're looking for, and then from there you can navigate relative to that node that you found. So you can still crawl the tree a bit, but you don't want to do an entire traversal most of the time.
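The selector-and-callback shape, and the reason a single traversal matters, can be sketched without installing ESLint. In the sketch below the two rules follow the general shape of an ESLint rule (a create function that returns an object keyed by node type), but the local types, the lint function, and the sample tree are simplified stand-ins written for this illustration. They are not ESLint's actual implementation.

```typescript
// Minimal local types so the sketch runs without ESLint installed.
interface AstNode { type: string; [key: string]: unknown }
interface Report { node: AstNode; message: string }
interface RuleContext { report(r: Report): void }
type Listeners = Record<string, (node: AstNode) => void>;
interface Rule { create(context: RuleContext): Listeners }

// Each rule returns an object whose keys are selectors (here plain node
// types) and whose values are callbacks.
const noDebugger: Rule = {
  create(context) {
    return {
      DebuggerStatement(node) {
        context.report({ node, message: "Unexpected 'debugger' statement." });
      },
    };
  },
};

const noVar: Rule = {
  create(context) {
    return {
      VariableDeclaration(node) {
        if (node.kind === "var") {
          context.report({ node, message: "Use let or const instead of var." });
        }
      },
    };
  },
};

function isNode(value: unknown): value is AstNode {
  return typeof value === "object" && value !== null &&
    typeof (value as { type?: unknown }).type === "string";
}

// Stand-in for the linter core: ONE walk over the tree, dispatching each
// node to every rule that registered a callback for its type.
function lint(root: AstNode, rules: Rule[]): { messages: string[]; visited: number } {
  const messages: string[] = [];
  const context: RuleContext = {
    report(r) { messages.push(r.message); },
  };
  const listenerSets = rules.map((rule) => rule.create(context));
  let visited = 0;

  const walk = (node: AstNode): void => {
    visited++;
    for (const listeners of listenerSets) {
      const listener = listeners[node.type];
      if (listener) listener(node);
    }
    // Recurse into any property that holds a node or an array of nodes.
    for (const key of Object.keys(node)) {
      const value = node[key];
      const children = Array.isArray(value) ? value : [value];
      for (const child of children) {
        if (isNode(child)) walk(child);
      }
    }
  };

  walk(root);
  return { messages, visited };
}

// Hypothetical ESTree-like tree for: var a; debugger; const b;
const tree: AstNode = {
  type: "Program",
  body: [
    { type: "VariableDeclaration", kind: "var", declarations: [] },
    { type: "DebuggerStatement" },
    { type: "VariableDeclaration", kind: "const", declarations: [] },
  ],
};

const result = lint(tree, [noDebugger, noVar]);
console.log(result.messages, result.visited);
```

With two rules and four nodes, each node is visited once rather than once per rule, which is the saving described above. A real rule would also carry metadata and could use richer selectors than bare node types.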
And last but not least, Tree-sitter is very interesting. It ships with Neovim by default, I believe, and it supports a lot of languages. It's written in pure C, which makes it really portable. So it's usable in many languages, and that means that it's also compilable to Wasm, which is cool. Like, you can use it in your browser now. So you could write basically an IDE or a text editor of some kind with, you know, parser-specific knowledge right there in the browser with no server necessary. One thing to remember about the Wasm output is that some of the parsers for Tree-sitter don't necessarily compile to Wasm very well. I ran into it a couple years ago, maybe a year ago, where I think like eight of the 40-plus parsers that I had available did not compile to Wasm properly, which is fine. But maybe they've improved it, and there's always room for more to be created. So yeah, you could support 40-plus languages in the browser writing your own tool. So yes, I know I kind of glossed over a few things, I jumped a little quickly, but hopefully you have enough keywords and knowledge to go research some of these things and write some tools that will help improve your life and the lives of the developers on your team.