The loop iteration will not complete until this step succeeds. And of course, the whole point of Ralph is to have an AI coding agent run autonomously. To do that, you cannot have it keep checking in with a developer to ask, hey, can I edit this file? Can I use this command? So people often run it with the skip-permissions flag (--dangerously-skip-permissions), which can be quite dangerous. People have actually almost wiped their entire file systems with this option, because Claude Code sees, hey, this Mac is running out of storage space, but I need more storage for this task, so let me just go find some stuff that I can delete so I can continue my work. Which is pretty bad, of course.

So I want to highlight this line in the script, which runs docker sandbox run claude. That is basically a sandboxed version of Claude Code: it still runs without the permission checks, but it cannot go and destroy your whole laptop or whatever other device you have. In my opinion that is the better, much safer way to work.
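To make the shape of that loop concrete, here is a minimal Python sketch rather than the actual script from the talk. Only the base command docker sandbox run claude comes from the talk. The completion marker, the iteration cap, and piping the prompt on stdin are all assumptions made for illustration, so adjust them to however your own setup passes the prompt.

```python
import subprocess
from typing import Callable

# Base command as named in the talk; everything else here is a hypothetical sketch.
SANDBOX_CMD = ["docker", "sandbox", "run", "claude"]
DONE_MARKER = "ALL_TASKS_DONE"  # hypothetical marker the prompt asks the agent to print


def run_in_sandbox(prompt: str) -> str:
    """Run one agent iteration in the sandbox and return its stdout.

    Assumption: the prompt is piped on stdin; your setup may pass it differently.
    """
    result = subprocess.run(SANDBOX_CMD, input=prompt, capture_output=True, text=True)
    return result.stdout


def ralph_loop(
    prompt: str,
    run_once: Callable[[str], str] = run_in_sandbox,
    max_iterations: int = 20,
) -> int:
    """Re-run the same prompt with a fresh agent until the marker appears.

    Returns the number of iterations that were run (capped at max_iterations).
    """
    for iteration in range(1, max_iterations + 1):
        output = run_once(prompt)
        if DONE_MARKER in output:
            return iteration
    return max_iterations
```

The runner is passed in as a parameter so the loop logic can be tried with a fake agent before pointing it at the real sandbox, and the iteration cap keeps a confused agent from burning tokens forever.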
So let's talk about this specification that we keep mentioning. You don't have to write it all by hand, and I recommend using an LLM to help you write a thorough spec, but it does take some iteration with the LLM. So here's an example in Claude Code. Here I tell it what I want to create, and it's not a super clear description yet, but that doesn't matter, because we tell Claude Code, hey, I want you to have a discussion with me about this and interview me to get clarity on the implementation. Claude will then keep asking you questions until it thinks it has clarity, and then you tell it to make the specification files.

Once you get this output, you have to go over it line by line and review the whole thing to make sure it actually aligns with your vision. If it doesn't match, you tell Claude, or you just edit it yourself, until you have something you 100% agree with. The other file is the implementation plan, which is basically a sectioned set of tasks that matches the specification. You also have to check this one line by line to make sure it matches your vision, so that Claude doesn't go off and do a bunch of stuff you don't want it to do. So you really have to be thorough in this preparation step.
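As an illustration of what "a sectioned set of tasks" could look like in practice, here is a small Python sketch that reads such a plan and picks the next open task. The format is an assumption, not something shown in the talk: it presumes a Markdown file with "#" section headings and "- [ ]" / "- [x]" checkboxes, and the function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Assumed plan format: Markdown headings for sections, checkboxes for tasks.
TASK_RE = re.compile(r"^\s*- \[( |x|X)\] (.+)$")


def parse_plan(text: str) -> List[Tuple[str, str, bool]]:
    """Return (section, task title, done) for every checkbox line in the plan."""
    tasks = []
    section = ""
    for line in text.splitlines():
        if line.startswith("#"):
            # A heading starts a new section; strip the leading '#' characters.
            section = line.lstrip("#").strip()
            continue
        match = TASK_RE.match(line)
        if match:
            done = match.group(1) != " "
            tasks.append((section, match.group(2).strip(), done))
    return tasks


def next_open_task(text: str) -> Optional[Tuple[str, str]]:
    """Return (section, title) of the first unchecked task, or None if all are done."""
    for section, title, done in parse_plan(text):
        if not done:
            return (section, title)
    return None
```

A plan in this shape is also easy to review line by line, because each task is a single line under a named section.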
For this example, I used it to create a boilerplate for an internal hackathon that we're going to run later this year. I found the quality to be very high, especially considering that it was mostly hands-off, apart from the upfront time investment in writing the spec and the plan. It was also on the $20 a month plan, and I wasn't even using Opus; I was using Sonnet 4.5, which gave very good results.

So I know we said this would be an autonomous coding approach, but that doesn't mean we don't have to do any work. Especially when you're getting started with Ralph, it's best practice to watch how it's doing in the beginning and see whether it's going off the rails. Then you can update your prompts or your specification to fine-tune it until it does what you want it to do. And then you can let it off to the races.

Of course, there are downsides to using Ralph. It requires a lot more planning upfront. You have to be careful not to make the tasks too big, or else it will still end up compacting because the context window fills up, and that's not good. It's not always the fastest approach: for some tasks, it's still faster to just do it by hand or with tab completion. And another point is that you can pretty easily use up all the tokens in your subscription.

Of course, there are also a bunch of upsides. You can run this autonomously while you're doing other tasks, or relaxing, whatever. It's great for exploring proofs of concept or other tasks where it's very clear what needs to be done; with a really good specification, it's able to do its best work. And you will easily use up all the tokens in your subscription, which is great if you often find that you aren't using the whole subscription you're paying for.