It only controls how creative the next token in the output stream will be. That works well for truly creative work, like writing a blog post or poetry, or even generating images or music. But for something as deterministic as programming, in my experimentation leaving it at 1 always produced the best results. So I'm going to save you some time: just leave it at 1 and don't bother tuning that parameter, because it most likely won't help you.
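As a minimal sketch of what that looks like in practice, here is a request where temperature is simply left at its default of 1. This assumes the OpenAI Node SDK, which is just one example; the model name and prompts are placeholders, not the talk's actual setup.

```ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Temperature stays at its default of 1, per the advice above;
// tuning it down rarely helps for code-generation tasks.
const response = await client.chat.completions.create({
  model: "gpt-4o", // placeholder model name
  temperature: 1,
  messages: [
    { role: "system", content: "You write jscodeshift codemods." },
    { role: "user", content: "Generate a codemod that renames foo() to bar()." },
  ],
});

console.log(response.choices[0].message.content);
```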
So anyway, having the test cases, I wanted to ask the AI to generate the actual codemod, and then use those test cases to verify that the codemod actually does what I want it to do. I'd run the test cases, and if there were any failures, I would construct a new prompt, feed that back into the AI, and ask it to generate a new codemod that fixes the failing tests. Then I'd run through it all again with the new codemod. Unfortunately, I could never get the AI to generate a good list of test case descriptions, or correct test cases for those descriptions, so that plan never worked out for me. I had to revise it, and this is the final plan that I came up with and what I have implemented.
So anyway, the first step is the same: I let the user provide an input, the expected output, and a description. For the second step, instead of asking for test case descriptions, I just ask the AI to generate other possible inputs, as in: given this input, what could it look like in a user's actual project? Then I pause the program execution and ask the user (which is me, when I'm running the CLI) to take a look at the generated inputs and, if needed, modify them before letting the program continue, and this made a huge difference. This is key. So again: break up the execution, pause, and let the user verify that the AI is on the right track.
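Here is a rough sketch of that pause-and-review step, assuming a Node CLI. The function name and the scratch-file approach are my own illustration, not the actual implementation from the talk.

```ts
import { writeFile, readFile } from "node:fs/promises";
import { createInterface } from "node:readline/promises";
import { stdin, stdout } from "node:process";

// Take the AI-generated candidate inputs, let the user review and edit them,
// and return whatever the user ends up with. Writing them to a scratch file
// is just one way to make the list editable while the CLI is paused.
export async function reviewGeneratedInputs(
  candidates: string[]
): Promise<string[]> {
  const file = "generated-inputs.json";
  await writeFile(file, JSON.stringify(candidates, null, 2));

  // Pause execution until the user has looked at (and possibly edited) the file.
  const rl = createInterface({ input: stdin, output: stdout });
  await rl.question(
    `Review ${file}, edit it if needed, then press Enter to continue... `
  );
  rl.close();

  return JSON.parse(await readFile(file, "utf8")) as string[];
}
```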
Alright, given the initial input, the expected output, and the description, plus the generated set of inputs, I can now ask the AI to generate the codemod from all of that. But when it comes to verifying or testing the codemod, we don't have any unit tests, because I couldn't get the AI to generate those for me. What we do have is ESLint, Prettier, and the TypeScript compiler. We can run all of those static code analysis tools and look at their outputs. If I get any ESLint or TypeScript errors, I construct a new prompt, jump back in at step 3 to generate a new codemod, and then run it all again. If I just iterate between steps 3 and 6 like this, three or four times, I very often end up with a codemod without any ESLint or TypeScript errors. That's a great starting point for some final tweaks before we have a codemod we can actually ship to our users.

Sometimes, though, the AI never manages to fix all of the errors in the code. So after five iterations I just stop. I give up. At that point it's better to start over from the beginning, from step 1, and run it all again. Sometimes I have to do that two or three times, but in the end I very often get a very good codemod that only needs some final tweaks before we write test cases for it and ship it to our users.

As I mentioned, a key thing here is to collaborate with the AI. We pause right after generating the possible inputs to let the user verify and modify that generated list. And we also get assistance at the end, where we use our static analysis tools like ESLint and the TypeScript compiler to feed errors back to the AI.
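Here is a sketch of that feedback loop in TypeScript. Every name in it (generateCodemod, runStaticAnalysis, the prompt builders, the CodemodContext shape) is a hypothetical stand-in for the real CLI steps described above, not the actual implementation.

```ts
// Hypothetical stand-ins for the real CLI steps described in the talk.
interface CodemodContext {
  input: string;
  expectedOutput: string;
  description: string;
  generatedInputs: string[];
}

interface Deps {
  buildInitialPrompt(ctx: CodemodContext): string;
  buildFixPrompt(codemod: string, errors: string[]): string;
  generateCodemod(prompt: string): Promise<string>;      // step 3: ask the AI for a codemod
  runStaticAnalysis(codemod: string): Promise<string[]>; // step 6: ESLint + TypeScript diagnostics
}

const MAX_ITERATIONS = 5; // after five failed attempts, give up and restart from step 1

export async function generateWithFeedback(
  ctx: CodemodContext,
  deps: Deps
): Promise<string | null> {
  let prompt = deps.buildInitialPrompt(ctx);

  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const codemod = await deps.generateCodemod(prompt);   // step 3
    const errors = await deps.runStaticAnalysis(codemod); // step 6

    if (errors.length === 0) {
      return codemod; // clean: ready for final manual tweaks and real test cases
    }

    // Feed the ESLint/TypeScript diagnostics back into a new prompt and retry.
    prompt = deps.buildFixPrompt(codemod, errors);
  }

  return null; // caller starts over from step 1
}
```

The cap of five iterations mirrors the point in the talk: if the diagnostics aren't converging, regenerating from scratch is usually faster than forcing more repair rounds.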