Video Summary and Transcription
This Talk explores the power of Abstract Syntax Trees (ASTs) in software development. It covers the use of ASTs in maintaining React examples, automating dependency identification, and introducing generic typing. The Talk also discusses using ASTs to reason about and generate code, as well as their application in building ESLint plugins and code mods. Additionally, it highlights resources for exploring ASTs, testing AST scripts, and building Babel plugins.
1. Introduction to ASTs
I'm going to talk to you about ASTs and the power they provide. Let's get started.
I should have an intro like that all the time. That would be really good. So, yeah, as you've heard there, I'm going to talk to you about ASTs, because these are a topic which, before I joined AG Grid, was always a little bit scary, always a little bit like, oh, I don't know if I know how to use those. But once you get into it and get your hands dirty, you realize there is so much power in using them. And that's what I want to convey to you today and maybe get you started on your journey with ASTs. So let's get started.
2. AG Grid Mission and Key Tasks
The mission for us at AG Grid is to build the best JavaScript data grid. We have two key tasks: maintaining React examples and introducing generic typing. We want to provide code that aligns with your needs and ensure seamless integration with TypeScript.
So the mission for us at AG Grid is to build the best JavaScript data grid. And so there's a number of tasks that we need to do to make that happen. And when I joined the company two years ago, there were these two key tasks that needed to be done.
We needed to maintain thousands of React examples so that when you come to our docs, you can see, I want this feature, I want it in React. I'm still using classes because my company hasn't upgraded yet. Or I'm using hooks or I'm using hooks with TypeScript. So we want to give you the code exactly the way that you want to consume it.
And then the second one is we love TypeScript. So we want to introduce generic typing across our product. So if you give us an interface for your row data, we want that to flow through all of our interfaces so you get lovely auto-completion and type checking. So the question is, how am I going to do this?
3. Introduction to ASTs and React Examples
In our docs, you'll find numerous examples and different ways to import AgGrid. We're adding generic typing across interfaces, avoiding the slow and error-prone brute force approach. ASTs provide a powerful tool to complete tasks efficiently. Let's start by maintaining thousands of React examples and understanding abstract syntax trees.
So in our docs, you'll see a lot of these examples. As I said, we've got three versions for classes, hooks, hooks with TypeScript. And then there's two different ways you can import AgGrid across 500 different feature examples. And you can see we're up to 3000 demos. So hopefully you'll be like, I'm not going to do that manually. Because if you're going to do it manually, you're going to have to employ lots of people, you're going to make mistakes, your documentation is going to be buggy and no one is going to win. So there's no real brute force approach for that.
And then if we look at the second task, which we'll try and solve, is we're adding this generic typing across hundreds of interfaces. So you can pass this interface to our AgGrid React component, and then we need it to flow through everywhere else where you might have a callback or handlers. So for this one, you could do the brute force approach. You could find where you use row data, go in there, change the type, rebuild. You know, that type is then, you need to move down the hierarchy a bit more, so you find those interfaces, add it in, and you see it's just this kind of cycle which is a bit slow and tedious and a bit error-prone as well. So we don't really want to do that. You could, but probably don't want to do it.
So this is where we get to this quote, which may or may not be from Abraham Lincoln: give me six hours to chop down a tree and I'll spend the first four hours sharpening the axe. So the idea here is, instead of just jumping straight into a task, we're going to make sure we're using the right tool. And if we have got an axe to hand, we're going to make sure we know how to use it and that it's sharp and ready to go. And so ASTs, that's going to be our axe and that's how we're going to complete these tasks. So, let's tackle our first one. We want to maintain thousands of React examples, but the starting point is that we've got all of these as vanilla TypeScript examples. So that is our starting point and we need to get via the AST into React examples.
So I should probably stop there and say, well, what is an abstract syntax tree? I've been saying these things, these great products or great tools, but what is it? So a kind of a bit more formal definition is an abstract syntax tree is a hierarchical representation of the structure of a program. So it captures the syntax, the semantics in a tree-like structure. Or you can think of it, take your text file of your code and it turns it into a tree which you can write code against. So you'll have nodes and it's all kind of recurses down. So there's a lot of recursion in ASTs, but we've all passed our interview questions with traversing trees. So you should all be experts at this. So as a simple case, this is what an abstract syntax tree would look like. So we've got this sum of A and B.
4. Understanding ASTs and Tree Structures
We have a variable declaration node with the name 'sum' and a binary expression inside it. The tree structure allows for nested expressions.
So we've got this sum of A and B. It's a variable declaration node which then has a name, which is the sum. And then inside that variable declaration, it then goes through to a binary expression. So you can see you've got the operator and you've got the right and left identifiers nicely switched in my diagram, just to keep you on your toes. But this is the idea. So you take your code and it forms this tree. And you can see how this tree can just keep going. Say A isn't an identifier, maybe that's another expression, and then that would go down further into another tree.
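As a rough illustration of the tree just described, here is a minimal sketch in TypeScript. The node shapes (`VariableDeclaration`, `BinaryExpression`, `Identifier`) are simplified, hypothetical types made up for this example and are not the node definitions of any real parser. The recursive helper shows how a traversal keeps working when an operand is itself another expression.

```typescript
// Simplified, hypothetical node shapes for illustration only.
type Expr =
  | { kind: "Identifier"; name: string }
  | { kind: "BinaryExpression"; operator: "+" | "-" | "*" | "/"; left: Expr; right: Expr };

interface VariableDeclaration {
  kind: "VariableDeclaration";
  name: string;
  initializer: Expr;
}

// The tree for: const sum = a + b;
const sumDeclaration: VariableDeclaration = {
  kind: "VariableDeclaration",
  name: "sum",
  initializer: {
    kind: "BinaryExpression",
    operator: "+",
    left: { kind: "Identifier", name: "a" },
    right: { kind: "Identifier", name: "b" },
  },
};

// Recursively walk an expression and list the identifiers it uses, left to right.
function collectIdentifiers(expr: Expr): string[] {
  if (expr.kind === "Identifier") {
    return [expr.name];
  }
  return collectIdentifiers(expr.left).concat(collectIdentifiers(expr.right));
}

// If `a` is replaced by another expression, the tree simply gets deeper.
const nestedExpression: Expr = {
  kind: "BinaryExpression",
  operator: "+",
  left: {
    kind: "BinaryExpression",
    operator: "*",
    left: { kind: "Identifier", name: "x" },
    right: { kind: "Identifier", name: "y" },
  },
  right: { kind: "Identifier", name: "b" },
};

const simpleNames = collectIdentifiers(sumDeclaration.initializer);
const nestedNames = collectIdentifiers(nestedExpression);
```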
5. Writing a Code Generator and Using ASTs
The first step in writing a code generator is to categorize the source code. For AgGrid, we have static config, callbacks, and standalone functions. ASTs are used to identify these sections. TypeScript makes it easy to create ASTs and extract information. We generate the AST, traverse it, and pass the model to the React code generator. The callback changes from a variable to useState and from a function to useCallback.
So the first part of writing a code generator is to categorize the source code that you're gonna feed into it. So part of the grid options for AgGrid is we'll have these static config. So like what's the default column definitions, the column definitions themselves, and then we've got these callbacks.
So they're gonna be turned into React callbacks. And then you might have these standalone functions, which are just like an artifact of the example for like clicking a button to make something happen. So we're gonna split these out, and we're gonna use ASTs to identify these different sections. And this is a great tool to introduce yourself to ASTs. And we'll have a live demo of this in a bit. But you can put your code in there, select what parser you want. I'm focusing on TypeScript, but there's lots of different parsers that you can use, and they all have their own version of the AST. And with this, you can drill down, okay, this is what a function looks like. And so from this, you can learn, okay, when I'm looking for this, I need to match this kind of structure.
So in TypeScript, it's quite easy to create one of these ASTs. You give it the source text, you use the TypeScript compiler itself, say, create a source file, and that gives you back an AST. And then you can write code to traverse that tree and we'll pull out all the information we want. So this is one of the matches that we've implemented. It's basically saying when I'm going through the nodes, is this a function call? And if I've found something which looks like a function call, then I'm gonna pull out the information of what's the name of this function, what are the parameters, and what's the body. Because we're gonna need to reorganize these and put these into the React format. And the TypeScript gives us some automatic or built-in ways of calling functions on a node to find out what it is. So it might look like something like this. But these are kind of like real small details and it will be different depending on what you're actually trying to write. But the concept is you can use TypeScript to identify these nodes.
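The matcher on the slide is not reproduced in this transcript, so the following is only a sketch of the idea under some assumptions: it uses the `typescript` npm package, matches function declarations, and stores the result in a hypothetical `FunctionInfo` model. The names `extractFunctions` and `onBtnClick` are invented for the example.

```typescript
import * as ts from "typescript";

// Hypothetical model for the pieces a code generator would need later.
interface FunctionInfo {
  name: string;
  params: string[];
  body: string;
}

function extractFunctions(sourceText: string): FunctionInfo[] {
  // Parse the source text into an AST (no type checking is involved here).
  const sourceFile = ts.createSourceFile(
    "example.ts",
    sourceText,
    ts.ScriptTarget.Latest,
    /* setParentNodes */ true
  );

  const found: FunctionInfo[] = [];

  const visit = (node: ts.Node): void => {
    // Built-in type guards tell us what kind of node we are looking at.
    if (ts.isFunctionDeclaration(node) && node.name && node.body) {
      found.push({
        name: node.name.text,
        params: node.parameters.map((p) => p.name.getText(sourceFile)),
        body: node.body.getText(sourceFile),
      });
    }
    // Recurse into the children to cover the whole tree.
    ts.forEachChild(node, visit);
  };

  visit(sourceFile);
  return found;
}

const extracted = extractFunctions(
  "function onBtnClick(rowCount: number) { return rowCount * 2; }"
);
```

A real generator would need more matchers (for example for variable statements holding the static config), but each one follows the same shape: a type guard, then reading properties off the node.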
So we'll generate the AST, we'll traverse it, pull out all of the code into our model, and then we're gonna pass that model into our React code generator. So this is what that callback might look like. So instead of a variable, we've now got useState. Instead of just a function, we've got useCallback. We run our example. When you click the button, the row heights are meant to change, but nothing's happening. Now I'm guessing some of you have probably already noticed what was wrong with it. If I go back.
6. Using AST to Automate Dependency Identification
No dependency. Our code generator can't simply transform the function. We need to identify the dependencies from the TypeScript code. Can we use the AST to automate this? Yes, of course.
Can you see what's wrong? No dependency. So we've got a stale closure. So even though we're updating the state, our callback isn't being re-created and picking up that change. So what this means is our code generator can't simply just transform the function. We've got to work out as well, well, for a callback, what are the dependencies that I need to put in my array. So when we ran into this issue, the gut response was, oh no. We're going to have to do this manually. We're going to have to go through these examples and update the dependencies. And you know, you think, oh no, I don't want to do that. Because it needs the dependencies to work, but we need to identify them from the TypeScript code. So the question really is, can we use the AST to automate this? And I won't drag on, yes. Of course we can, otherwise I wouldn't be here speaking. I'd be still updating the React examples. So let's have a look at what this might look like. And this is also, I guess, the example of how you can use this tool to work things out about how you need to write your code and what you're looking for.
7. Exploring Dependency Identification with ASTs
So we can come in here and if we start clicking on things, it will highlight it on the right. We can have a look and say, well actually this is in the return statement. It's defined here. We can see from our function body that params actually comes from the parameters. The only one that I need to check is swimming height. We can update our generator and then swimming height will be automatically picked up as a dependency. Any new React examples that are written, they'll also flow through the same process. This is a very simple version of the React hooks exhaustive-deps check.
So we can come in here and if we start clicking on things, it will highlight it on the right. So here I've selected default height. So this is, I guess, a constant. So we can have a look and say, well actually this is in the return statement. So, okay, it's an identifier, so that's something I've used, but where is it defined? Well, it's defined here. And if we go here, we can say, well this is a variable declaration. So it's like, well actually this variable has been defined within my function, so I'm not gonna need to add that to my dependency array.
So if we look for some more variables in here, so, well we could see sport, and that's an identifier. But that's not actually something which should be in the dependency array because, well, it's actually the params object, which we might need to check. So here we can click on the params one and see where that exists in the tree. So this is a property access expression. And you have to traverse down that tree to find the params. And then once you've got to the bottom where you can't traverse anymore, then you can find out, well, this identifier itself is the one I should check. But again, we can see from our body, our function that params actually comes from the parameters. So we've got to build up this kind of logic of, well, actually, the only one that I need to check is swimming height. So that's something which there's an identifier within my function body. Oops. That hasn't been either defined inside as another variable or it's coming from the parameters. So we come back.
So then you might end up with some logic like this, saying, well, find me all the property accesses. So look for all those identifiers. And then, is it a local variable? Is it one of the arguments that has been passed in? And then also have some kind of global check, because we might be using things from the global scope that we don't want to include in our dependencies. And once we've got this kind of logic, we can update our generator and then swimming height will be automatically picked up as a dependency and put in there, and then our example starts working again, which is great. And then that also means this is future-proofed. So any new React examples that are written, they'll also flow through the same process. And we can use the AST there just to automatically get this right. So you're not only saving time now, you're saving future time as well. And now what we've done here is a very simple version of the React hooks exhaustive-deps check. So the code for this is much more complicated because they've got to cover a lot of use cases, but it's the same process.
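The logic shown on the slide is not in the transcript, so here is a minimal sketch of the same three checks (local variable, parameter, global) using the `typescript` compiler API. Everything named here is an assumption for illustration: the function `findDependencies`, the tiny `KNOWN_GLOBALS` list, and the identifiers `getRowHeight`, `defaultHeight` and `swimmingHeight` in the sample source. It deliberately ignores many cases the real exhaustive-deps rule handles, such as destructuring, nested functions and shadowing.

```typescript
import * as ts from "typescript";

// Hypothetical, deliberately tiny list of globals we never want as dependencies.
const KNOWN_GLOBALS = new Set<string>(["console", "Math", "undefined", "window"]);

function findDependencies(functionSource: string): string[] {
  const sourceFile = ts.createSourceFile(
    "example.ts",
    functionSource,
    ts.ScriptTarget.Latest,
    /* setParentNodes */ true // needed so we can inspect node.parent below
  );

  const fn = sourceFile.statements.find(ts.isFunctionDeclaration);
  if (!fn || !fn.body) {
    return [];
  }

  // Names that arrive via the parameter list.
  const params = new Set<string>();
  for (const p of fn.parameters) {
    if (ts.isIdentifier(p.name)) params.add(p.name.text);
  }

  const locals = new Set<string>();
  const used = new Set<string>();

  const visit = (node: ts.Node): void => {
    if (ts.isVariableDeclaration(node) && ts.isIdentifier(node.name)) {
      // Declared inside the function body, so not a dependency.
      locals.add(node.name.text);
    } else if (ts.isIdentifier(node)) {
      const parent = node.parent;
      // In `params.data.sport`, only the root `params` is a reference;
      // `data` and `sport` are property names, as is `height` in `{ height: x }`.
      const isPropertyName =
        (ts.isPropertyAccessExpression(parent) && parent.name === node) ||
        (ts.isPropertyAssignment(parent) && parent.name === node);
      if (!isPropertyName) used.add(node.text);
    }
    ts.forEachChild(node, visit);
  };
  visit(fn.body);

  return Array.from(used)
    .filter((n) => !locals.has(n) && !params.has(n) && !KNOWN_GLOBALS.has(n))
    .sort();
}

const exampleSource = `
function getRowHeight(params) {
  const defaultHeight = 50;
  return params.data.sport === "Swimming" ? swimmingHeight : defaultHeight;
}`;

const dependencies = findDependencies(exampleSource);
```

The returned names are what a generator could place in the `useCallback` dependency array.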
8. Introducing Generic Typing with ts-morph
The task involves introducing generic typing across the code base, which includes updating interfaces and passing generics to the row node. It's a challenging task with hundreds of interfaces and types. To simplify the process, I recommend using ts-morph, a project that wraps the TypeScript compiler and provides a higher-level abstraction. With ts-morph, you can extract interfaces and reason about them easily. By looking for the data property and checking its type, we can make further progress.
If you look at the source code for this, you'll see it's all talking about nodes: where is this accessed? Where has it been defined? And so hopefully it gives you an idea of how these code linters are working and what they're doing.
So then we think, relax, we've done that task, take a bit of a break and then get onto our second task, which is introducing generic typing across the code base.
So yeah, this is something which, before I worked at AG Grid, I wanted. I had added my own interfaces in, and so when I came in I said, can I do this, can I make it better? And so I've got permission, but now I've got to actually do the work.
The kind of changes that this involves is taking, say, IDoesFilterPassParams, where you see we've got this data as an any, which we want to be a generic. And also we need to pass that generic down to our row node. So whenever someone's accessing data off the row node, they have the correct interface there. So there's a lot of interfaces and there's a lot of chaining of interfaces that would have to go through an update.
So why is this difficult? Well, there's about three to 400 interfaces and types and there's this interface. So it's gonna be possible to do it by brute force but you might miss some or it's just gonna be quite tedious. And that's where I wanted to suggest an even sharper axe to you.
So there's this project called ts-morph. And what this has done is it's taken the TypeScript compiler and it's wrapped those matcher functions with a much higher level abstraction. And so this is gonna make it a lot easier to, I guess, write your logic in a way where you don't actually have to know the difference between the different types of node, whether it's just a variable expression or whether it's an interface. So it does a lot of the work for you.
So with this, we import ts-morph and we'll point to a ts-config file and that will build us a project. And then from that project, you can say, give me this source file. So here I'm saying, give me the grid options source file. So that gives it to me. And then I can say, well actually give me all the import declarations because I want to know where the files are that contain all of the interfaces used on our grid options. And then we iterate over all those imports, get the source files. And then from each of those source files, we extract all the interfaces. So you can see the code we're writing here because of ts-morph is actually quite straightforward. We're not having to really get nitty gritty, but we're extracting all the information. We're collecting all these interfaces so that we can then reason about them. And then we're going to do something a bit like this. So we've got all our interfaces. We're then going to look for the data property on them. So using this like get property. And then we'll look at the type, say, is that an any type? And then this is the next stage, which probably I should have highlighted a bit more in the slides.
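The reading steps above can be sketched roughly as follows. This is an assumption-laden illustration, not the slide code: instead of pointing at a real tsconfig file it builds a ts-morph project on an in-memory file system, and the file names, the interfaces `FilterParams`, `SortParams` and `GridOptions`, and the helpers `collectImportedInterfaces` and `hasAnyTypedData` are all hypothetical.

```typescript
import { Project, SourceFile, InterfaceDeclaration } from "ts-morph";

// Hypothetical in-memory stand-ins for the real grid options files.
// With a real code base you would pass { tsConfigFilePath: "..." } instead.
const readProject = new Project({ useInMemoryFileSystem: true });

readProject.createSourceFile(
  "filterParams.ts",
  `
export interface FilterParams { data: any; value: string; }
export interface SortParams { column: string; }
`
);

const gridOptionsFile = readProject.createSourceFile(
  "gridOptions.ts",
  `
import { FilterParams, SortParams } from "./filterParams";
export interface GridOptions {
  onFilter?(params: FilterParams): void;
  onSort?(params: SortParams): void;
}
`
);

// Follow each import of the entry file and gather the interfaces declared there.
function collectImportedInterfaces(entry: SourceFile): InterfaceDeclaration[] {
  const collected: InterfaceDeclaration[] = [];
  for (const importDecl of entry.getImportDeclarations()) {
    const imported = importDecl.getModuleSpecifierSourceFile();
    if (imported) {
      collected.push(...imported.getInterfaces());
    }
  }
  return collected;
}

// Does this interface have a `data` property explicitly typed as `any`?
function hasAnyTypedData(iface: InterfaceDeclaration): boolean {
  return iface.getProperty("data")?.getTypeNode()?.getText() === "any";
}

const candidateNames = collectImportedInterfaces(gridOptionsFile)
  .filter(hasAnyTypedData)
  .map((iface) => iface.getName());
```

The interfaces that pass this check are the ones that would be handed to the next stage, where the code is actually changed.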
9. Using ASTs to Generate and Reason About Code
We can use abstract syntax trees to reason about and generate code. astexplorer.net is a great site for experimenting with code. Understanding ASTs can help with contributing to code linters. Finding documentation about ASTs can still be challenging, but there are resources available. A good article on dev.ca provides examples of writing a Babel plugin.
Is that we're not just reading now, we're going to start setting, and we're going to start changing our code via these calls. So we've got this set type. So we're going to set it to TData. And then we're also going to add a type parameter to that interface. And give it the same matching name. And we're going to record that we have updated this interface. And then once we've updated all that level of interfaces, we can run through and look for this hierarchy chain. So if we find an interface that is extending another one, which we've updated, then we repeat the process. And so add the generic types to that again. And push that into our queue and repeat the process so that we're going to traverse these trees automatically.
That looks like the same slide. Yep. And then at the end, you call project.saveSync(). And that will push all those changes into your code files. Well, it won't generate the PR for you, but then you can see we've made a lot of code updates just from our single generator. And the nice thing about this is, you can then say, well, actually it wasn't quite right, I'm going to tweak it, you know, discard all those changes, run it again. So that instead of individually tweaking all these files, we've got a really quick way of getting a big code change done, but in quite a controlled and structured way.
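To make the write stage concrete, here is a minimal sketch of the update-then-propagate loop using ts-morph. It is an illustration under assumptions rather than the generator from the talk: the function name `addGenerics`, the choice of `any` as the default for `TData`, and the sample interfaces are hypothetical, and it only follows direct `extends` clauses (it does not, for example, update properties that reference an updated interface).

```typescript
import { Project, InterfaceDeclaration } from "ts-morph";

function addGenerics(project: Project): string[] {
  // Gather every interface in the project up front.
  const interfaces: InterfaceDeclaration[] = [];
  for (const sourceFile of project.getSourceFiles()) {
    interfaces.push(...sourceFile.getInterfaces());
  }

  const updated = new Set<string>();
  const queue: InterfaceDeclaration[] = [];

  // Step 1: `data: any` becomes `data: TData`, and the interface gains <TData>.
  for (const iface of interfaces) {
    const dataProp = iface.getProperty("data");
    if (dataProp && dataProp.getTypeNode()?.getText() === "any") {
      dataProp.setType("TData");
      iface.addTypeParameter({ name: "TData", default: "any" });
      updated.add(iface.getName());
      queue.push(iface);
    }
  }

  // Step 2: walk up the hierarchy. Anything extending an updated interface
  // gets the same type parameter and passes it to its base.
  while (queue.length > 0) {
    const base = queue.shift()!;
    const baseName = base.getName();
    for (const iface of interfaces) {
      if (updated.has(iface.getName())) continue;
      const ext = iface
        .getExtends()
        .find((e) => e.getExpression().getText() === baseName);
      if (!ext) continue;
      ext.replaceWithText(`${baseName}<TData>`);
      iface.addTypeParameter({ name: "TData", default: "any" });
      updated.add(iface.getName());
      queue.push(iface);
    }
  }

  // Write every modified file back to the project's file system.
  project.saveSync();
  return Array.from(updated).sort();
}

// Hypothetical input, kept in memory so nothing on disk is touched.
const writeProject = new Project({ useInMemoryFileSystem: true });
const paramsFile = writeProject.createSourceFile(
  "params.ts",
  `
export interface BaseParams { data: any; }
export interface FilterParams extends BaseParams { value: string; }
export interface Unrelated { id: number; }
`
);

const updatedNames = addGenerics(writeProject);
```

Because the whole change comes from one script, discarding the diff, tweaking the script and running it again is cheap, which is the workflow described above.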
So my takeaways, and I think I'm just about right on time. You can use abstract syntax trees to both reason about and generate code. astexplorer.net is a great site for you to put your code in and start playing around and seeing what it might look like. And then hopefully this will help you to understand and maybe even contribute to code linters. So with that, I'm going to leave you to sharpen your AST axes. Thank you.
I think it was about four years ago that I had the plan to write an ESLint plugin and I had to look at an AST. And what I found back then is that it's doable but it's really daunting, because it's a big file. Even for a small piece of code, like five lines of code, it's a really long AST. And it was really hard back then to find any documentation about what stuff means. Is that still the case? Yes, I would say that's probably still the case. Great motivation for the people here, yeah. But then, I mean, let me follow that. I think there's probably a lot more content out there now where people have written examples about, this is how I've written my first Babel plugin. So there's a good article on dev.ca that I can share as well.
10. Using TS Morph and Generating ASTs
There is content available with cut-down examples to get something running. AG stands for Agnostic Grid, supporting TypeScript, React, Angular, Vue, and more. TS Morph can be integrated into a pipeline tool like Codacy. You can generate ASTs from vanilla JS, but TypeScript preserves types. AST is a general technology applicable to various code types.
Where someone has said, well I've written a plugin which pulls out the debug statements or turns asserts into console.logs. So I think there is content out there now where people have given, I guess, cut down examples of, this is all you need to do to actually get something running. So the situation improved, but it's still a bit painful. Still a bit painful, yeah.
Side question, what does AG stand for? AG stands for Agnostic. Agnostic Grid. Agnostic Grid, so yeah, we support vanilla TypeScript or React, Angular, Vue, as well as Solid. So it's agnostic and you can use it within any of those frameworks. So it could be AST grid also. Yes.
All right, first question from the audience from Anonymous. Can we integrate TS Morph into a pipeline tool like Codacy? Do you think we should do that? I don't see why not. Because, yeah, so TS Morph is just an npm package. And so I guess if you've got access to node in your build pipeline, then you could run it and have those changes. I guess you'd want to make sure that it's not changing code too much before you commit it and have controls in that place, but yeah, I don't see why not. All right, cool, thank you.
A question from Chris. Can you generate ASTs from vanilla JS also instead of TypeScript like you are doing? And if so, are there any limitations? So the reason I used TypeScript for this was that I wanted to preserve types and bring types across from the TypeScript code to be able to write React hooks with TypeScript. And so yeah, there are JavaScript parsers. The fact that this was TypeScript is no restriction. It's just that you've then got extra type information. So the JavaScript AST will probably be simpler as well. Yeah, I think AST is a general technology. Correct me if I'm wrong, but you can even get an AST for a CSS file, right? Yeah, yeah, yeah. It's a general concept. So I think Babel is another one which will produce its own AST. Yeah, it's just a way of defining any code. If it's front-end code, back-end code, styling, content, interaction, it doesn't matter. So it's, like we said, daunting to get into, but interesting anyway, and you can do a lot.
11. Building ESLint Plugin for Class Names
I built the class names package with ESLint to enforce consistency in the code base. Although it took a few days to develop the ESLint plugin, it future-proofed the code and saved time in the long run. People in the code base are grateful for this improvement.
Yeah. The thing I built with ESLint was for the classnames package, which everyone uses, I think, right? And in the code base where I was working, some people were importing it as cx, and other people were importing it as classNames. So it was inconsistent, and I just thought, well, I'll just enforce this with a little ESLint rule. I could have just done a find and replace, and the entire code base would have taken 10 minutes, including getting the PR through. I spent, like, three, four days writing this ESLint plugin just to enforce this. But then, that's where you've then future-proofed the code. So the find and replace method is good for one-offs, but if you want to practically then enforce that going forward, it's good to have that kind of rule. So, you know, you'll get that time back. Yeah, there's probably still people in that code base thanking me, oh, Mattin built this, oh, thank you, Mattin.
12. ASTs and Code Mods
There are limited online resources for common problems solved with ASTs. The use cases can be quite specific, and the code examples often have assumptions baked into them. Writing a completely generic solution is challenging, but tools like code generators can provide targeted scopes. The tool to generate code examples is developed once and rarely updated. However, escape hatches can be built for edge cases. As for code mods using ASTs, outside of AG Grid, an exciting project called Mitosis by builder.io offers AST converters to support a wide range of frameworks.
All right, next question from Anonymous. Oh, I didn't mark this one as complete. Are there resources online for common problems solved with ASTs? Except, for example, the callback example? Just in general, yeah, like I said, there's not a lot of content. You have got to go looking, which is a bit unfortunate, and I think that's probably because people aren't aware, maybe there haven't been enough talks telling people how to use these ASTs. So maybe I need to go and write a blog post about this. But then also the use cases can be quite specific. So the code that we've got has assumptions baked into it, which really simplify and reduce the problem scope. That's why the code that we've got to identify dependencies can fit in this window, whereas the official React exhaustive-deps check is thousands of lines long. So I think that's where it's tricky, because to write it completely generically is really hard, and that's why we've got these tools to do it for us. But when you want to use that, it's nice to have a targeted scope for it.
And by the way, is that tool to generate the code examples an ongoing thing that's continuously being developed, or was it developed once and it's done? Now we never look at it, and hopefully never touch it again. Yeah, it's pretty much never looked at. There'll be times where there might be an edge case, but this is where the beauty of these kinds of tools is that you can build yourself escape hatches. It's like the 80/20 rule. We can get the code generator 80% of the way, but that last 20% to make it completely foolproof would take so much longer to implement. Whereas sometimes it's good just to have an escape hatch, and we'll just manually write an example that the generator can't handle. And that gives us the best of both worlds. We can write a really complex example, you know, just manually write the React, but then the majority of the work, which is just take this example and convert it, is hands off.
A question from Anonymous. Have you ever built code mods using ASTs? And do you think it would be possible to transpile code from any language to another language using ASTs? So let's start with the first question. Have you ever built any code mods? Not outside of AG Grid. No. Then I think what I would say is I'd point you at a really exciting project from builder.io called Mitosis. So for their builder.io components that they've got, they themselves write them in this Mitosis language, and then they've got AST converters to React, Svelte, Angular, a much wider range than what we cover. So that is an interesting product to look at if you yourself need to support lots of different frameworks with your core component. So there's people out there doing this. So yeah, so that's Mitosis.
13. Exploring AST Testing and Babel Plugins
Mitosis is a recommended resource to explore ASTs. Testing AST scripts can be done using a snapshot test approach or by comparing the generated AST with the expected output. Codeshiftcommunity.com provides AST examples for those interested. Many Babel plugins can be built using ASTs, including one to remove debugger statements from a codebase.
So yeah, so that's Mitosis. Yeah, we've seen Miska's talk and mentioned it. And yeah, I already have it on my to do list so check it out, Mitosis.
Question from Nick. Is there a good way to test AST scripts? Important question. Yeah, so I guess the nice thing about AST scripts is you could have quite a nice snapshot test approach, where you have a well-defined input, or lots of different kinds of inputs, pass them into your AST script and then validate what comes out. So they really do lend themselves nicely to testing. Or if you want to do it across the code base, you know, have a clean checkout and see the Git diff that it produces. Yeah, it's pretty proven technology by now, so it's pretty stable. So yeah, you can test it. Of course you can just have a script and generate the AST and compare the string with the output you expect, right. Yeah. So you have some stability. Yeah, good question, Nick. I think if you're depending on it with such a big code base, then it's smart to test stuff.
Another question, but a note from anonymous. Codeshiftcommunity.com features AST examples. So anyone here, you can see the link up there. If anyone is interested in looking at ASTs, if you're doing something at your company and you think, oh, my God, I can do this with ASTs, there are some examples at codeshiftcommunity.com. So thanks a lot, anonymous. Thank you. For sharing that with us.
All right, that is the last question. Oh, we have a new one coming in. Any ideas on any Babel plug-in that can be built using an AST? There's a lot already. Yes, there's, yeah, you could do a lot. And so that's where, yeah, I should have linked this article, but it was basically, they started really simply by saying, well, can I remove debugger statements from a codebase using a Babel plug-in? And so the lines, the code to do that is really quite small. So yeah, so there, there are resources out there. I should have got the link, but yeah, I'm sure there are a lot and that you can write your own as well. All right.
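The article mentioned here is not linked, and it uses Babel. As a stand-in, the sketch below shows the same idea (dropping `debugger` statements) with the TypeScript compiler API rather than a Babel plugin, so it is a swapped-in technique and not the article's code. The function name `removeDebuggerStatements` and the sample input are hypothetical.

```typescript
import * as ts from "typescript";

function removeDebuggerStatements(sourceText: string): string {
  const sourceFile = ts.createSourceFile(
    "input.ts",
    sourceText,
    ts.ScriptTarget.Latest,
    /* setParentNodes */ true
  );

  const transformer: ts.TransformerFactory<ts.SourceFile> = (context) => (root) => {
    // Returning undefined from a visitor removes that node from its parent.
    const visit = (node: ts.Node): ts.Node | undefined =>
      ts.isDebuggerStatement(node)
        ? undefined
        : ts.visitEachChild(node, visit, context);
    return ts.visitEachChild(root, visit, context);
  };

  const result = ts.transform(sourceFile, [transformer]);
  // Print the transformed tree back out as source text.
  const output = ts.createPrinter().printFile(result.transformed[0]);
  result.dispose();
  return output;
}

const cleaned = removeDebuggerStatements(`
function compute(a: number) {
  debugger;
  if (a > 1) {
    debugger;
    return a * 2;
  }
  return a;
}
`);
```

A function with a fixed input and a printed output like this also fits the snapshot-style testing discussed a moment ago: feed in known source text and compare the result with what you expect.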