
Designing Sandboxed Dev Environments for Coding Agents

Naresh Ramesh

Cloudflare


Coding agents have crossed the threshold of solving non-trivial problems alongside engineers, and in many cases they now run autonomously. Their most surprising ability has been how effective they are at non-coding tasks too, the moment the problem's universe is represented on a filesystem or is accessible from a Unix-like environment. Agents will soon be writing, accessing and powering software, and this talk covers the fun challenges of designing sandboxed dev environments across these paradigms: enabling agents to run entirely within sandboxes, or to use sandboxes as programmable, ephemeral tools.

This talk was presented at AI Coding Summit 2026. Check out the latest edition of this tech conference.

FAQ

The demo serves to illustrate how an agent can be used to make a pixel animated goose follow the cursor by implementing changes in the code base.

The sandbox environment provides isolation with a micro VM, a container running Ubuntu with pre-installed dev tools, bash sessions, and a file system in which the agent can run commands and read and write files.

Networking is crucial for routing traffic from the agent and the browser to the specific ephemeral port where the dev server runs inside the sandbox, so that everything works seamlessly.

Persistence ensures that all files and changes made by the agent survive sessions, enabling continuity and quick resumption of tasks.

Micro VMs provide light overhead and fast startup times with hardware-level isolation, making them a preferred choice over traditional VMs or containers for efficient and secure sandboxing.

The preview URL acts as a server location that helps the agent and browser connect to the specific environment and port where the demo is running.

Sandbox environments address isolation, networking, persistence, and security challenges, allowing coding platforms to handle dynamically generated code securely and efficiently.

Isolation is critical to protect against potential attacks since code from unknown users is executed, necessitating a secure boundary for the agent's operations.

The durable object ensures globally unique addressability, enabling precise routing to specific sandboxes for executing tasks.

The text mentions exploring lightweight V8-based isolates for running code, providing similar flexibility as micro VM-based environments but with potentially less overhead.


Naresh Ramesh

20 min

26 Feb, 2026


Video Summary and Transcription

A demo of an animated goose following the cursor in a unique environment with real-time updates. Advanced models for long-running agents performing complex tasks. Exploring sandbox environments for agent coding, focusing on isolation, containers, networking, and persistence. Challenges in sandbox orchestration, including handling heavy processing, managing sessions, and security concerns. Importance of agent control, Micro VMs for isolation, and networking in sandbox environments. Strategies for efficient sandbox operations with pre-built images, persistent volumes, and warm pools. Utilizing sandbox primitives for simple usage with focus on isolation, networking, and persistence.


1. Exploring a Demo Environment

Short description:

A demo of an animated goose following the cursor, running in a remote environment behind a preview URL. The agent makes the code changes and the result updates in real time. Models have advanced to the point where long-running agents can perform complex tasks, which requires computers designed specifically for them.

I am a big fan of demos, so let's start with one. Here is a very silly pixel-animated goose, and I want to make this goose follow my cursor. So let's ask an agent to do that. Behind the scenes, an environment has spun up, the dev server has started, and we have a preview URL. The agent is going through my code base and thinking about how to implement this change. Once it has reasoned through the change, it should make it, and we should be able to see the result. There we go. This talk is about all the different things that happened to make this demo work. The most important thing to notice is that there's no localhost here. This is a real demo running somewhere else, and the traffic from the agent and from my browser to this preview URL had to find its way across the internet to a very specific port in a specific environment. And if I close this tab and come back tomorrow, the agent should ideally be able to pick up where it left off. That's what we will try to understand. So this preview URL is a computer. Of course, it's a server, but a computer is a nicer way to say it, and I like that. And every agent needs a computer. Models have become good enough that long-running agents are now real. They are able to do a lot of very powerful things, and the computer we give them also needs to be designed in a very specific way for it to be useful to the agent.

2. Unraveling Sandbox Environments

Short description:

Discussing the layers of a sandbox environment for agent coding, focusing on isolation, container functions, networking, and persistence. Exploring design decisions for building and utilizing sandbox environments to create various platforms or products, emphasizing the tools agents require for complex operations.

I'm Naresh. In my past life, I worked on coding agents that twice topped SWE-bench, and now I build the sandbox environments that they run in. Let's talk about everything that just happened in that demo and break it down into layers. The first is the isolation boundary itself. There's a micro VM with its own kernel and its own network stack, so there's no shared-kernel attack surface with anything else running on the same machine. Inside it runs a container. It's running Ubuntu, an OS we are all very familiar with, with a bunch of commonly used dev tools pre-installed.

Then there are bash sessions and a file system. This is where the agent runs commands, reads and writes files, and installs packages: all the things that you and I would do whenever we work on a code base like this. Then there is the networking bit. The agent probably ran npm start inside the container. The dev server is running on an ephemeral port, but getting traffic from the agent and our browser to that particular port is slightly non-obvious. And finally, there is persistence. All the installed files and all the changes made by the agent need to survive across sessions.

We will go through some of the design decisions behind two things: first, building a sandbox environment like this, and second, using a sandbox environment like this to build whatever platforms or products you are interested in. Let's start with the runtime. What does the agent actually get to work with? Agents ideally need the same tools that developers need. You could technically just give an agent a very small REPL and say, okay, run the code that you need, but that's super restricted. Ideally, we want to give it a full Linux instance, a real computer, so that it can perform non-trivial, complex operations.

3. Challenges in Sandbox Orchestration

Short description:

Discussing the challenges of sandbox orchestration, the need for agents to handle heavy lifting, starting background processes for real-time interactions, managing multiple sessions, and considering security issues like prompt injections.

But the challenge is: how much of the orchestration heavy lifting can the sandbox itself do? Because the less the sandbox does, the more your agent has to. The table stakes are very simple. This is the baseline: run commands, read and write files, clone repos, install packages. Every sandbox, well, every container, has this, and it's necessary. But what actually separates a code runner from a real-time environment is the next layer. The agent needs to be able to start background processes. In this demo, the dev server was an example of a background process. It needs to keep running in the background, with its output streamed so that the agent can react to output as it happens, not after the command exits.
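To make the background-process requirement concrete, here is a minimal sketch in TypeScript for Node.js (an assumption; the talk does not say what the sandbox runtime is written in). It starts a long-running command, streams its stdout line by line to a callback, and resolves as soon as a readiness pattern appears instead of waiting for the process to exit. LineBuffer and startAndWaitForReady are hypothetical names, not a platform API.

```typescript
import { spawn } from "node:child_process";

// Turns a stream of arbitrary text chunks into complete lines. A trailing
// partial line stays buffered until the chunk that finishes it arrives.
export class LineBuffer {
  private pending = "";

  push(chunk: string): string[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    this.pending = lines.pop() ?? "";
    return lines;
  }
}

// Starts a long-running command (for example a dev server) and resolves with
// its PID once a stdout line matches `ready` (pass a non-global RegExp).
// The process keeps running; every line is also passed to `onLine`.
export function startAndWaitForReady(
  command: string,
  args: string[],
  ready: RegExp,
  onLine: (line: string) => void = () => {},
  timeoutMs = 30_000,
): Promise<number> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    const buffer = new LineBuffer();
    const timer = setTimeout(() => {
      child.kill();
      reject(new Error(`no readiness line within ${timeoutMs} ms`));
    }, timeoutMs);

    child.stderr.resume(); // drain stderr so the child never blocks on a full pipe
    child.stdout.setEncoding("utf8");
    child.stdout.on("data", (chunk: string) => {
      for (const line of buffer.push(chunk)) {
        onLine(line);
        if (ready.test(line)) {
          clearTimeout(timer);
          resolve(child.pid ?? -1);
        }
      }
    });
    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });
    // Exiting before the readiness line means the server never came up.
    // If it exits later, the promise is already settled and this is a no-op.
    child.on("exit", (code) => {
      clearTimeout(timer);
      reject(new Error(`process exited with code ${code} before becoming ready`));
    });
  });
}
```

For example, startAndWaitForReady("npm", ["start"], /listening|ready/i, console.log) would resolve once a typical dev server logs that it is up; the exact log line differs between tools, so the pattern is an assumption.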

Then there's continuity. When we work in a terminal, a bash session makes sure that the commands we run happen sequentially and build on each other's state, depending on how you use bash, really. The agent should have the same kind of capability. And it's not just a single session: ideally, it should be able to manage multiple sessions within the same sandbox. And when you want to build interactive experiences, giving your users access to the sandbox environment so that they can look at what the agent has been up to, or even run commands of their own, you might want to connect an xterm instance.
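One common way to get this kind of continuity is to keep a single long-lived bash process per session and delimit each command's output with a unique marker that also carries the exit status. The sketch below assumes Node.js and bash; ShellSession, wrapCommand and parseResult are hypothetical names, not a platform API. Because one process handles every command, state such as the working directory and exported variables carries over, and creating several ShellSession instances gives you multiple independent sessions in one sandbox.

```typescript
import { spawn, type ChildProcessWithoutNullStreams } from "node:child_process";
import { randomUUID } from "node:crypto";

export interface CommandResult {
  output: string;
  exitCode: number;
}

// Runs the command in a brace group (same shell, so cd and exports persist)
// with stdin from /dev/null so it cannot swallow the marker line, then prints
// "<marker>:<exit status>" on its own line.
export function wrapCommand(command: string, marker: string): string {
  return `{ ${command}\n} </dev/null\nprintf '\\n${marker}:%s\\n' "$?"\n`;
}

// Returns the command's output and exit code once the marker line has
// arrived, or null while the command is still running.
export function parseResult(buffer: string, marker: string): CommandResult | null {
  const match = new RegExp(`\\n${marker}:(\\d+)\\n`).exec(buffer);
  if (!match) return null;
  return {
    output: buffer.slice(0, match.index).replace(/\n$/, ""),
    exitCode: Number(match[1]),
  };
}

// One long-lived bash process per session; commands run strictly in order.
export class ShellSession {
  private readonly proc: ChildProcessWithoutNullStreams;
  private buffer = "";
  private queue: Promise<unknown> = Promise.resolve();

  constructor() {
    this.proc = spawn("bash", ["--noprofile", "--norc"]);
    this.proc.stdout.setEncoding("utf8");
    this.proc.stdout.on("data", (chunk: string) => {
      this.buffer += chunk;
    });
    this.proc.stdin.write("exec 2>&1\n"); // fold stderr into the same stream
  }

  run(command: string): Promise<CommandResult> {
    const result = this.queue.then(() => this.runOne(command));
    this.queue = result.catch(() => undefined);
    return result;
  }

  close(): void {
    this.proc.stdin.end();
  }

  private runOne(command: string): Promise<CommandResult> {
    const marker = `__DONE_${randomUUID().replace(/-/g, "")}`;
    this.buffer = "";
    return new Promise((resolve) => {
      const check = () => {
        const parsed = parseResult(this.buffer, marker);
        if (parsed) {
          this.proc.stdout.off("data", check);
          resolve(parsed);
        }
      };
      this.proc.stdout.on("data", check);
      this.proc.stdin.write(wrapCommand(command, marker));
    });
  }
}
```

With this design, await session.run("cd /tmp") followed by await session.run("pwd") would report /tmp, because both commands execute in the same shell. A terminal UI such as xterm would typically need a pseudo-terminal instead of plain pipes, which this sketch does not attempt.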

The important thing to remember is that you are running code that you didn't write, and executing instructions from a user whom you don't control and don't want to control. That means the attack surface isn't just the code; it's everything the agent can reach. The paradox is that we want the agent to be able to do a lot and reach a lot of surfaces, but it still needs some guardrails around it. So the common problems you're thinking about are: what packages did the agent install? Was it prompted to install them? Are there any possibilities of prompt injection that could affect what the agent does inside this environment?
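As one narrow guardrail for the "what packages did the agent install?" question, a platform could inspect install commands before running them and flag packages that are not on an approved list. This is a hypothetical, heuristic sketch: the allowlist, the set of recognized package managers and unapprovedPackages are all assumptions. It is easy to bypass (for example by piping a downloaded script into a shell), so it complements isolation rather than replacing it.

```typescript
// Hypothetical mapping from package manager to the subcommands that add packages.
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const INSTALL_VERBS = new Map<string, ReadonlySet<string>>([
  ["npm", new Set(["install", "i", "add"])],
  ["pnpm", new Set(["add"])],
  ["yarn", new Set(["add"])],
  ["pip", new Set(["install"])],
]);

// Strips version specifiers: "left-pad@1.3.0" -> "left-pad",
// "@types/node@20" -> "@types/node", "Requests==2.31" -> "requests".
function packageName(tool: string, spec: string): string {
  if (tool === "pip") return spec.split(/[<>=!~\[;]/)[0].toLowerCase();
  const at = spec.indexOf("@", spec.startsWith("@") ? 1 : 0);
  return at === -1 ? spec : spec.slice(0, at);
}

// Returns the packages a single install command would add that are not on the
// allowlist. Flags are skipped; anything that is not a recognized install
// command yields an empty list, so this is a heuristic, not a security boundary.
export function unapprovedPackages(command: string, allowlist: ReadonlySet<string>): string[] {
  const [tool = "", verb = "", ...rest] = command.trim().split(/\s+/);
  const verbs = INSTALL_VERBS.get(tool);
  if (!verbs || !verbs.has(verb)) return [];
  return rest
    .filter((token) => !token.startsWith("-"))
    .map((token) => packageName(tool, token))
    .filter((name) => !allowlist.has(name));
}
```

A non-empty result could be used to pause the agent and ask the user, or simply to log what was installed for later review.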

4. Agent Control and Isolation Methods

Short description:

Discussing the importance of agent control, isolation methods like Micro VMs for sandbox environments, and networking considerations for sandbox accessibility.

You don't know what the user asked your agent to build, and every message shapes the decisions your agent takes. You don't review it, so you don't control it. And finally, you don't have full control over what the agent reads before it writes the code: files in a repo, web pages, a previous turn in the conversation, all of it shapes its output. All of this increases the attack surface, and this is why isolation isn't optional. The way we do it matters.

At this point, I think we're all familiar with the idea of "let's just run it in a Docker container". Sure. But what's wrapping that container itself? What's the boundary? The simplest option is plain namespaces, sharing the kernel with the host. It's fast and lightweight, but every syscall from every container hits the same kernel, which is fine for your own code but not for arbitrary agent-generated code. At the other end of the spectrum are full VMs, where you get genuine isolation, but they come with 100 MB of overhead and cold boot times of five to 30 seconds, all of which is really impractical at scale. Micro VMs are the middle ground, and I actually don't like calling them a middle ground, because they are better than both alternatives. They come with very light overhead, they are extremely fast to start, and they have their own kernel per sandbox. I think the most common implementation is Firecracker, which was built by AWS for Lambda, and it is specifically designed for hardware-level isolation without the full VM overhead. This highlighted column is what you ideally want to use for your sandboxes.

The third bit is networking. How do the agent and the browser find the right sandbox? Let's look at this URL. The subdomain contains a couple of things: a port and a unique session token. But there's nothing here that points to a specific machine, and that's intentional, because you can't pre-provision DNS per sandbox and you can't just assign a stable IP. So every time a request comes in, the machine needs to be resolved by the platform.

5. Sandbox Networking and Persistence

Short description:

Discussing the importance of resolving machines for requests, ensuring correct port connections, and the impact of persistence on cold starts.

Extract the relevant information, either directly from the subdomain or through an internal map, do a fast lookup, and open a TCP connection to the machine. All of this needs to happen behind the scenes.
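The resolution steps above can be sketched as a small routing function. The subdomain layout used here ("<port>-<sandboxId>-<token>.<previewDomain>"), the in-memory registry and the function names are all hypothetical; a real platform would use its own URL scheme and a distributed lookup. The token is compared in constant time with Node's crypto.timingSafeEqual.

```typescript
import { timingSafeEqual } from "node:crypto";

// Hypothetical layout: "<port>-<sandboxId>-<token>.<previewDomain>".
export interface PreviewTarget {
  port: number;
  sandboxId: string;
  token: string;
}

export interface SandboxRecord {
  address: string; // where the sandbox's machine can currently be reached
  token: string; // secret issued when the preview URL was created
}

export function parsePreviewHost(host: string, previewDomain: string): PreviewTarget | null {
  const hostname = host.split(":")[0].toLowerCase(); // drop any ":443"
  const suffix = `.${previewDomain}`;
  if (!hostname.endsWith(suffix)) return null;
  const label = hostname.slice(0, -suffix.length);
  const match = /^(\d{1,5})-([a-z0-9]+)-([a-z0-9]+)$/.exec(label);
  if (!match) return null;
  const port = Number(match[1]);
  if (port < 1 || port > 65535) return null;
  return { port, sandboxId: match[2], token: match[3] };
}

function sameToken(a: string, b: string): boolean {
  const x = Buffer.from(a);
  const y = Buffer.from(b);
  return x.length === y.length && timingSafeEqual(x, y);
}

// Resolves an incoming Host header to "address:port", or null if the URL is
// malformed, the sandbox is unknown, or the token does not match.
export function resolveUpstream(
  host: string,
  previewDomain: string,
  registry: ReadonlyMap<string, SandboxRecord>,
): string | null {
  const target = parsePreviewHost(host, previewDomain);
  if (!target) return null;
  const record = registry.get(target.sandboxId);
  if (!record || !sameToken(record.token, target.token)) return null;
  return `${record.address}:${target.port}`;
}
```

The returned "address:port" string is what a proxy would then pass to something like net.connect to open the TCP connection.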

The second bit is that even after the request reaches the right machine, you also want to make sure it connects to the right port, and that the connection is established in a way that keeps working. Dev servers, for instance, do hot reloading, which means there's a persistent WebSocket connection with changes going back and forth all the time. This needs to just work consistently, without any additional setup.
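For the WebSocket case, a proxy cannot simply forward individual HTTP requests; it has to tunnel the upgraded connection. Here is a minimal Node.js sketch, assuming the proxy sits in front of a single upstream dev-server port: it replays the original request head to the upstream and then pipes raw bytes in both directions, so hot-reload messages flow without the proxy parsing WebSocket frames. attachUpgradeTunnel and the helper names are hypothetical.

```typescript
import * as http from "node:http";
import * as net from "node:net";
import type { Duplex } from "node:stream";

// True for "Connection: Upgrade" plus "Upgrade: websocket" (e.g. a dev
// server's hot-reload socket). Connection may list several tokens.
export function isWebSocketUpgrade(headers: http.IncomingHttpHeaders): boolean {
  const connection = String(headers.connection ?? "").toLowerCase();
  const upgrade = String(headers.upgrade ?? "").toLowerCase();
  const tokens = connection.split(",").map((token) => token.trim());
  return tokens.includes("upgrade") && upgrade === "websocket";
}

// Rebuilds the raw HTTP/1.1 request head so it can be replayed upstream.
export function serializeRequestHead(req: { method?: string; url?: string; rawHeaders: string[] }): string {
  let head = `${req.method ?? "GET"} ${req.url ?? "/"} HTTP/1.1\r\n`;
  for (let i = 0; i < req.rawHeaders.length; i += 2) {
    head += `${req.rawHeaders[i]}: ${req.rawHeaders[i + 1]}\r\n`;
  }
  return `${head}\r\n`;
}

// Tunnels WebSocket upgrades to one upstream port: replay the request head,
// then pipe raw bytes both ways (including the upstream's 101 response).
export function attachUpgradeTunnel(server: http.Server, upstreamHost: string, upstreamPort: number): void {
  server.on("upgrade", (req: http.IncomingMessage, socket: Duplex, head: Buffer) => {
    if (!isWebSocketUpgrade(req.headers)) {
      socket.destroy();
      return;
    }
    const upstream = net.connect(upstreamPort, upstreamHost, () => {
      upstream.write(serializeRequestHead(req));
      if (head.length > 0) upstream.write(head);
      upstream.pipe(socket);
      socket.pipe(upstream);
    });
    upstream.on("error", () => socket.destroy());
    socket.on("error", () => upstream.destroy());
  });
}
```

In use, this would be wired to an existing proxy server, for example attachUpgradeTunnel(server, "127.0.0.1", 5173), alongside whatever handler forwards ordinary HTTP requests.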

And the last bit is persistence. I think this is actually the most important one, because persistence is not just about what you need to store and restore; the way you think about persistence also affects cold starts. To see why, let's start with the two things that dominate cold starts. One is the infrastructure itself that needs to come up. How long does it take for the VM to exist? How long before you can connect to it? That's one part of it.

6. Efficient Sandbox Operation Strategies

Short description:

Discussing application readiness, optimizing with pre-built images, persistent volumes, snapshots, and warm pools for efficient sandbox operations and reduced startup times based on use cases.

The second is application readiness. When a user comes in, whether they're starting a fresh session or continuing an older one, you want your application to reach the state where your agent can immediately continue working. This is the gap the developer is going to feel, because this is where things can become really slow, and this is the part you need to optimize. There are a few ways. The simplest, easiest win is pre-built images. You build an image that has all the code bases, dependencies and everything else you need, and you can keep rebuilding it in the background if things keep changing. This is the image that boots up within the VM. The second option, if you have slightly more dynamic needs and can't always bake everything into an image, is persistent volumes. You mount these volumes into the ephemeral containers, so all the data survives restarts. You can also attach these volumes to multiple sandboxes.

The third option, where things start to get very powerful, is snapshots. If the platform supports it, you should be able to snapshot the entire disk, or even part of it, and even snapshot the state of the memory itself. This lets you pause a VM and resume it at the exact same point, so the agent can just continue working. Or you can snapshot an instance and start multiple sandboxes from it, which enables things like forking the agent state and having agents pursue different options. And finally, you can have warm pools: if you're able to predict the state your sandbox needs to be in before the agent starts working on it, you can maintain a pool of ready sandboxes and get going immediately. All of these have a fairly significant impact on how long it takes before the agent can start working. These numbers are just illustrative, but you can see why each step drastically reduces the time it takes for your application to be ready. Of course, the way you optimize also depends on the use case you're covering. Code interpreters, for instance, are embedded in chat UIs where the model generates some code and you want a response immediately, so maybe you want a warm pool. With coding agents, your users expect the agents to run for a longer time, so you think about it differently. For RL training environments, you are mainly thinking about aggregate throughput, and so on. Different use cases call for different optimizations.
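A warm pool can be as simple as keeping a fixed number of sandboxes booting ahead of demand and handing one out on request, falling back to a cold boot when the pool is empty. In this sketch, boot is a hypothetical function standing in for whatever creates a sandbox (from a pre-built image or a snapshot); WarmPool and its parameters are assumptions, not a platform API.

```typescript
// Keeps `target` sandboxes booting or booted ahead of demand so an agent can
// start on a warm one; falls back to a cold boot when the pool is empty.
export class WarmPool<T> {
  private readonly boot: () => Promise<T>;
  private readonly target: number;
  private idle: Promise<T>[] = [];

  constructor(boot: () => Promise<T>, target: number) {
    this.boot = boot;
    this.target = target;
    this.refill();
  }

  get size(): number {
    return this.idle.length;
  }

  acquire(): { sandbox: Promise<T>; warm: boolean } {
    const pooled = this.idle.shift();
    const result = pooled
      ? { sandbox: pooled, warm: true }
      : { sandbox: this.boot(), warm: false };
    this.refill(); // start replacing what was just handed out
    return result;
  }

  private refill(): void {
    while (this.idle.length < this.target) {
      const booting = this.boot();
      // Mark the rejection as handled while pooled; whoever acquires this
      // promise still sees the failure when awaiting it.
      booting.catch(() => {});
      this.idle.push(booting);
    }
  }
}
```

Swapping boot for a restore-from-snapshot call is one way the warm-pool and snapshot strategies could be combined; sizing target is the throughput-versus-cost trade-off the use cases above describe.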

7. Sandbox Usage and Architecture

Short description:

Discussing sandbox primitives, the simplicity of sandbox usage with isolation, starting processes, networking, and persistence methods, and the architecture's focus on reaching specific sandboxes efficiently.

These are all the primitives that platforms need to build so that you can leverage them however you want. When all of this is done properly, you should just be able to say "I want to resume an older instance of something I was working on", and here we go: we're back in the same state we were in. So let's look under the hood and see what this looks like from your point of view, as someone using a sandbox. This whole demo that we just looked at is, in its simplest form, just eight lines of code. Let's break it down a bit.

The first part is isolation. Every session gets its own VM. You can think about separate sandboxes per user or per org; there are all sorts of things you could do. Then there are the methods that let you start processes, wait for a particular process to be ready, and read and write files whenever you want. Then the networking bit: everything we addressed earlier in the networking section can be just one line. You get back a URL, and that's all you need to think about. And finally, you want some persistence. In this demo, frankly, all I'm doing is storing my code in a bucket, but you could also opt into some of the more advanced strategies depending on your use case, as we discussed.
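The talk does not show the eight lines themselves, so the following is only a hypothetical TypeScript sketch of the same shape: one sandbox per session (isolation), restore saved files from a bucket (persistence), start the dev server and wait for its port (processes), and expose that port as a preview URL (networking). The Sandbox and SandboxProvider interfaces, the method names, the npm start and port 5173 defaults, and the in-memory fakes are all assumptions made so the example runs on its own; they are not the Cloudflare Sandbox SDK's API.

```typescript
// Hypothetical sandbox surface; method names are illustrative, not a real SDK.
export interface Sandbox {
  writeFile(path: string, contents: string): Promise<void>;
  readFile(path: string): Promise<string>;
  startProcess(command: string): Promise<void>;
  waitForPort(port: number): Promise<void>;
  exposePort(port: number): Promise<string>;
}

export interface SandboxProvider {
  get(sessionId: string): Sandbox; // same id -> same isolated sandbox
}

export type Bucket = Map<string, string>; // stand-in for object storage

export async function openWorkspace(
  provider: SandboxProvider,
  sessionId: string,
  bucket: Bucket,
  files: string[],
  devCommand = "npm start",
  devPort = 5173,
): Promise<string> {
  const sandbox = provider.get(sessionId); // isolation
  for (const path of files) { // persistence: restore saved code
    const saved = bucket.get(`${sessionId}/${path}`);
    if (saved !== undefined) await sandbox.writeFile(path, saved);
  }
  await sandbox.startProcess(devCommand); // background process
  await sandbox.waitForPort(devPort);
  return sandbox.exposePort(devPort); // networking: preview URL
}

export async function saveWorkspace(
  provider: SandboxProvider,
  sessionId: string,
  bucket: Bucket,
  files: string[],
): Promise<void> {
  const sandbox = provider.get(sessionId);
  for (const path of files) {
    bucket.set(`${sessionId}/${path}`, await sandbox.readFile(path));
  }
}

// In-memory stand-ins so the flow runs without a real platform.
export class FakeSandbox implements Sandbox {
  readonly files = new Map<string, string>();
  readonly processes: string[] = [];
  private readonly id: string;

  constructor(id: string) {
    this.id = id;
  }
  async writeFile(path: string, contents: string): Promise<void> {
    this.files.set(path, contents);
  }
  async readFile(path: string): Promise<string> {
    const contents = this.files.get(path);
    if (contents === undefined) throw new Error(`no such file: ${path}`);
    return contents;
  }
  async startProcess(command: string): Promise<void> {
    this.processes.push(command);
  }
  async waitForPort(_port: number): Promise<void> {}
  async exposePort(port: number): Promise<string> {
    return `https://${port}-${this.id}.preview.example.dev`;
  }
}

export class FakeProvider implements SandboxProvider {
  readonly sandboxes = new Map<string, FakeSandbox>();

  get(sessionId: string): FakeSandbox {
    let sandbox = this.sandboxes.get(sessionId);
    if (!sandbox) {
      sandbox = new FakeSandbox(sessionId);
      this.sandboxes.set(sessionId, sandbox);
    }
    return sandbox;
  }
}
```

With the fake provider, openWorkspace(new FakeProvider(), "alice", bucket, ["src/goose.ts"]) resolves to https://5173-alice.preview.example.dev, and calling saveWorkspace at the end of a turn writes the current files back to the bucket.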

So this is what the architecture looks like, at least in the Cloudflare context where I work. Requests from the agent and the browser reach the edge, the Worker. From there, they go to a Durable Object, and the most important thing it offers in this context is that it's globally uniquely addressable, which is exactly what we wanted: we want to reach a very specific sandbox. The Durable Object makes sure you can do the routing and reach that particular sandbox, and within the sandbox, you do all the things you want to do.
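The routing step can be sketched as a Worker-style fetch handler. idFromName and get are real methods on a Cloudflare Durable Object namespace binding, and idFromName returns the same ID for the same name every time, which is what gives each sandbox a single global address. The binding name SANDBOX, the rule that the sandbox name is the first hostname label, and the small structural types (used instead of the official type definitions so the sketch stays self-contained) are assumptions.

```typescript
// Structural stand-ins for the two namespace methods used below; in a real
// Worker these come from a Durable Object namespace binding.
export interface StubLike {
  fetch(request: Request): Promise<Response>;
}
export interface NamespaceLike<Id> {
  idFromName(name: string): Id;
  get(id: Id): StubLike;
}

// Hypothetical rule: the sandbox name is the first label of the hostname,
// e.g. "alice.preview.example.dev" -> "alice".
export function sandboxNameFor(url: URL): string | null {
  const label = url.hostname.split(".")[0];
  return /^[a-z0-9-]+$/.test(label) ? label : null;
}

export async function routeToSandbox<Id>(request: Request, ns: NamespaceLike<Id>): Promise<Response> {
  const name = sandboxNameFor(new URL(request.url));
  if (name === null) return new Response("unknown sandbox", { status: 404 });
  // idFromName is deterministic: the same name always yields the same object,
  // so every request for "alice" reaches the one instance that owns that sandbox.
  const stub = ns.get(ns.idFromName(name));
  return stub.fetch(request);
}

// Worker entry point; SANDBOX is an assumed binding name.
export default {
  fetch(request: Request, env: { SANDBOX: NamespaceLike<unknown> }): Promise<Response> {
    return routeToSandbox(request, env.SANDBOX);
  },
};
```

Keeping the routing rule in a pure function like sandboxNameFor makes it easy to change the URL scheme without touching the Durable Object plumbing.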
