It only controls how creative the next token in the output stream will be. That works well for truly creative work, like writing a blog post or poetry, or even generating images or music. But for something as deterministic as programming, in my experimentation leaving it at 1 always produced the best results. So I'm going to save you some time: just leave it at 1 and don't bother tuning that parameter, because it most likely won't help you.
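As a minimal sketch of what that looks like in practice, here is a request where temperature is simply left at its default of 1. This assumes the OpenAI Node SDK, which is just one example; the model name and prompts are placeholders, not the talk's actual setup.

```ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Temperature stays at its default of 1, per the advice above;
// tuning it down rarely helps for code-generation tasks.
const response = await client.chat.completions.create({
  model: "gpt-4o", // placeholder model name
  temperature: 1,
  messages: [
    { role: "system", content: "You write jscodeshift codemods." },
    { role: "user", content: "Generate a codemod that renames foo() to bar()." },
  ],
});

console.log(response.choices[0].message.content);
```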
So anyway, having the test cases, I wanted to ask the AI to generate the actual codemod, and then use those test cases to verify that the codemod actually does what I want it to do. I'd run the test cases, and if there were any failures, I would construct a new prompt, feed that back into the AI, and ask it to generate a new codemod that fixes the failing tests. Then I'd run through it all again with the new codemod. Unfortunately, I could never get the AI to generate a good list of test case descriptions, or correct test cases for those descriptions, so that plan never worked out for me. I had to revise it, and this is the final plan that I came up with and what I have implemented.
So anyway, the first step is the same: I let the user provide an input, the expected output, and a description. For the second step, instead of asking for test case descriptions, I just ask the AI to generate other possible inputs, as in: given this input, what could it look like in a user's actual project? Then I pause the program execution and ask the user (which is me, when I'm running the CLI) to take a look at the generated inputs and, if needed, modify them before letting the program continue, and this made a huge difference. This is key. So again: break up the execution, pause, and let the user verify that the AI is on the right track.
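Here is a rough sketch of that pause-and-review step, assuming a Node CLI. The function name and the scratch-file approach are my own illustration, not the actual implementation from the talk.

```ts
import { writeFile, readFile } from "node:fs/promises";
import { createInterface } from "node:readline/promises";
import { stdin, stdout } from "node:process";

// Take the AI-generated candidate inputs, let the user review and edit them,
// and return whatever the user ends up with. Writing them to a scratch file
// is just one way to make the list editable while the CLI is paused.
export async function reviewGeneratedInputs(
  candidates: string[]
): Promise<string[]> {
  const file = "generated-inputs.json";
  await writeFile(file, JSON.stringify(candidates, null, 2));

  // Pause execution until the user has looked at (and possibly edited) the file.
  const rl = createInterface({ input: stdin, output: stdout });
  await rl.question(
    `Review ${file}, edit it if needed, then press Enter to continue... `
  );
  rl.close();

  return JSON.parse(await readFile(file, "utf8")) as string[];
}
```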
Alright, given the initial input, the expected output, and the description, plus the generated set of inputs, I can now ask the AI to generate the codemod from all of that. But when it comes to verifying or testing the codemod, we don't have any unit tests, because I couldn't get the AI to generate those for me. What we do have is ESLint, Prettier, and the TypeScript compiler. We can run all of those static code analysis tools and look at their outputs. If I get any ESLint or TypeScript errors, I construct a new prompt, jump back in at step 3 to generate a new codemod, and then run it all again. If I just iterate between steps 3 and 6 like this, three or four times, I very often end up with a codemod without any ESLint or TypeScript errors. That's a great starting point for some final tweaks before we have a codemod we can actually ship to our users.

Sometimes, though, the AI never manages to fix all of the errors in the code. So after five iterations I just stop. I give up. At that point it's better to start over from the beginning, from step 1, and run it all again. Sometimes I have to do that two or three times, but in the end I very often get a very good codemod that only needs some final tweaks before we write test cases for it and ship it to our users.

As I mentioned, a key thing here is to collaborate with the AI. We pause right after generating the possible inputs to let the user verify and modify that generated list. And we also get assistance at the end, where we use our static analysis tools like ESLint and the TypeScript compiler to feed errors back to the AI.
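Here is a sketch of that feedback loop in TypeScript. Every name in it (generateCodemod, runStaticAnalysis, the prompt builders, the CodemodContext shape) is a hypothetical stand-in for the real CLI steps described above, not the actual implementation.

```ts
// Hypothetical stand-ins for the real CLI steps described in the talk.
interface CodemodContext {
  input: string;
  expectedOutput: string;
  description: string;
  generatedInputs: string[];
}

interface Deps {
  buildInitialPrompt(ctx: CodemodContext): string;
  buildFixPrompt(codemod: string, errors: string[]): string;
  generateCodemod(prompt: string): Promise<string>;      // step 3: ask the AI for a codemod
  runStaticAnalysis(codemod: string): Promise<string[]>; // step 6: ESLint + TypeScript diagnostics
}

const MAX_ITERATIONS = 5; // after five failed attempts, give up and restart from step 1

export async function generateWithFeedback(
  ctx: CodemodContext,
  deps: Deps
): Promise<string | null> {
  let prompt = deps.buildInitialPrompt(ctx);

  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const codemod = await deps.generateCodemod(prompt);   // step 3
    const errors = await deps.runStaticAnalysis(codemod); // step 6

    if (errors.length === 0) {
      return codemod; // clean: ready for final manual tweaks and real test cases
    }

    // Feed the ESLint/TypeScript diagnostics back into a new prompt and retry.
    prompt = deps.buildFixPrompt(codemod, errors);
  }

  return null; // caller starts over from step 1
}
```

The cap of five iterations mirrors the point in the talk: if the diagnostics aren't converging, regenerating from scratch is usually faster than forcing more repair rounds.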