Let's Talk About Sandboxes

Starting around late 2025, people in the infra community began talking about sandbox technology. The general idea goes like this: in the age of agents, every agent should have its own computer, just like a person does. That computer is the sandbox.
If you're not familiar with sandbox technology, you can think of it simply as a virtual machine. Funnily enough, nearly 15 years ago, when I was interning at EMC, I was working on something closely related to today's sandboxes: Xen. Over the last two decades, we built sandboxes for humans. Now, we're building sandboxes for AI. That feels pretty surreal. But if virtualization has been around for decades, why is everyone suddenly revisiting what seems like an extremely mature technology? Because the requirements are completely different now.
In this post, we'll start from real-world sandbox use cases, explore the underlying technology, and discuss what the future of sandboxes looks like.
Starting with Manus
One of the biggest stories in AI in 2025 was Meta's acquisition of AI startup Manus for over $2 billion. The name Manus comes from the Latin motto "Mens et Manus" — "mind and hand" — embodying the belief that knowledge must be put into action to make an impact. From 2022 to 2024, everyone was using ChatGPT to chat. Manus showed the world that AI can do more than just provide information — it can actually get things done. Once AI can do work, it's no longer just a chatbot. It's an agent. So here's the question: what does an agent have to do with a sandbox? Can't the agent just use the user's computer or a regular server?
Let's take Manus as an example. Manus built its agents in the cloud. Suppose Manus is serving 1,000 concurrent users (in reality, far more!). Each user's request spawns an agent, meaning 1,000 agents need to run simultaneously. Now imagine all 1,000 agents on a single server. What happens? Each agent wants to install its own packages, configure its own environment, have its own filesystem, and claim its own CPU resources. So what do you do? Obviously, each agent needs its own office to work in. That office is the sandbox.

Under the hood, Manus used microVM-based sandbox technology to solve exactly this problem. Each agent runs in an isolated sandbox with its own browser, terminal, and filesystem, capable of running Python, JavaScript, Bash, and more in isolation. By the time of its acquisition, Manus had created over 80 million virtual computers — each one a standalone sandbox.
Of course, there's only one Manus. But do we, as individual developers, still need sandboxes in our daily work with agents? Absolutely. If you're a programmer, chances are you've used Claude Code. Claude Code is a coding agent that downloads packages, runs scripts, and reads/writes files on your local machine. We trust Anthropic's engineering and credibility, so we're comfortable letting their agent operate on our computers. Claude Code also promises to mostly stay within a user-specified directory, which gives a sense of security. But when we use something more powerful and more permission-hungry — like OpenClaw (well, congrats Peter for joining OpenAI!), or some brand-new no-name product — would you still dare run it directly on your local machine? I doubt it.
OpenClaw is a great example. This open-source AI agent can run shell commands directly on your OS, control browsers, manage local files, and even talk with you via Telegram. It's so powerful that people call it "a real-life Jarvis." But power comes with risk — security researchers have found third-party OpenClaw skills performing data exfiltration and prompt injection without user awareness. This is where sandboxes become a necessity.
Sandbox Technologies
Broadly speaking, a sandbox is just an isolated environment. An EC2 instance can be a sandbox. A Docker container can be a sandbox. A microVM can be a sandbox. But they behave very differently when it comes to AI agent workloads. Let's start with a high-level comparison:
Technology Comparison
| | EC2 | K8s + Containers | Docker | gVisor | MicroVM |
| --- | --- | --- | --- | --- | --- |
| Startup time | Minutes | Seconds | Seconds | Seconds | <200ms |
| Isolation | Full VM | Shared kernel | Shared kernel | User-space kernel | Dedicated kernel |
| Resource overhead | Very high | Medium | Low | Low | Very low (~5MB) |
| Elasticity | Poor | Good | Manual | Depends on orchestrator | Good |
| Ops complexity | Low (but expensive) | Very high | Low | Medium | Medium |
| Fit for AI agents | Too slow, too heavy | Works but complex | Security insufficient | Compatibility issues | Best balance |
Let's walk through each one.
EC2 / Traditional VMs
EC2 is too heavy. Slow to start (minutes), expensive, and lacking good auto-scaling. If you need to instantly spin up a fleet of agents for parallel deep research, EC2 simply can't deliver. In short, EC2 was designed for long-running services, not for fast-start, disposable agent sandboxes.
K8s + Containers
K8s solves the elasticity problem but introduces operational complexity. And its default container isolation shares the host kernel — if AI-generated code triggers a kernel vulnerability, it could escape the container. Google recently open-sourced the Agent Sandbox project (using gVisor), which is K8s's formal response to the sandbox use case.
Docker Containers
Fast to start, mature ecosystem, developer-friendly. Docker has even launched a dedicated Docker Sandboxes product for AI agents. But the core problem remains: containers share the host kernel, so isolation isn't strong enough.
gVisor
A user-space kernel developed by Google. Its core component, Sentry, reimplements a large number of Linux syscalls in Go, intercepting and handling them in user space. This dramatically reduces the attack surface. More secure than containers, lighter than microVMs, but with compatibility trade-offs — not all syscalls are perfectly emulated. I/O-heavy workloads can see 10-30% overhead.
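To make the idea of user-space syscall interception concrete, here is a toy sketch in Python. This is purely conceptual — gVisor's Sentry is written in Go and far more sophisticated — but it shows the core pattern: only reimplemented syscalls are handled, and anything else never reaches the host kernel. All names here are illustrative.

```python
# Conceptual sketch of user-space syscall interception (NOT gVisor's
# actual implementation): a minimal "sentry" that serves the syscalls
# it has reimplemented and denies everything else.
ALLOWED = {
    "read": lambda fd, n: f"read {n} bytes from fd {fd}",
    "write": lambda fd, data: f"wrote {len(data)} bytes to fd {fd}",
}

def sentry(syscall: str, *args):
    handler = ALLOWED.get(syscall)
    if handler is None:
        # Unimplemented syscalls are rejected in user space,
        # shrinking the attack surface exposed to the host kernel.
        raise PermissionError(f"syscall {syscall!r} blocked")
    return handler(*args)
```

The compatibility trade-off mentioned above falls out of this design: any syscall the user-space kernel hasn't reimplemented simply doesn't work.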
MicroVM
The current best answer for sandbox isolation. AWS's open-source Firecracker gives each microVM its own dedicated kernel with hardware-level isolation via KVM. Extremely lean: <5MB memory overhead, <125ms startup, up to 150 microVMs per second on a single host. Most sandbox platforms on the market (such as E2B, Daytona, etc.) are built on Firecracker or similar microVM technology.
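For a sense of how lean Firecracker's surface is: a microVM is configured through a handful of PUT requests to a REST API served over a Unix socket. The sketch below only builds the request payloads (field names follow Firecracker's documented API; the kernel and rootfs paths are placeholders) — actually booting a VM would mean sending each payload to the socket.

```python
import json

def firecracker_config(kernel: str, rootfs: str, vcpus: int = 1, mem_mib: int = 128):
    """Build the PUT payloads for Firecracker's control API.

    Field names follow the Firecracker API docs; paths are placeholders.
    In real use, each payload is PUT to the matching endpoint on the
    firecracker Unix socket before sending the InstanceStart action.
    """
    return {
        "/boot-source": {
            "kernel_image_path": kernel,
            "boot_args": "console=ttyS0 reboot=k panic=1",
        },
        "/drives/rootfs": {
            "drive_id": "rootfs",
            "path_on_host": rootfs,
            "is_root_device": True,
            "is_read_only": False,
        },
        "/machine-config": {"vcpu_count": vcpus, "mem_size_mib": mem_mib},
        "/actions": {"action_type": "InstanceStart"},
    }

payloads = firecracker_config("vmlinux.bin", "rootfs.ext4")
print(json.dumps(payloads["/machine-config"]))
```

That a whole VM boots from four small JSON payloads is a big part of why sub-200ms startup and thousands of concurrent microVMs are feasible.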
Takeaway: From an isolation standpoint, microVMs have provided a great answer — security approaching traditional VMs, performance approaching containers. But choosing the right isolation technology is only step one. When you actually start using sandboxes, you'll find a deeper set of questions to answer.
Beyond Isolation: What Actually Matters
The isolation question is largely settled — microVMs win. But the differences between sandbox solutions today are no longer about how they isolate. They're about everything else: how you integrate them, whether they keep state, and what primitives they expose. Four questions separate a great sandbox from a merely functional one:
| Property | What it means |
| --- | --- |
| Embeddable | A library, not a service. No daemon, no root, no cloud account. |
| Stateful | Persistent sessions — pause today, resume tomorrow. Your agent's environment survives across sessions. |
| Snapshots | Capture a running VM's full state. Fork, rollback, migrate, share. Version control for runtime environments. |
| Hardware isolation | Dedicated kernel per sandbox, not namespace tricks. Table stakes — but not everyone delivers it. |
Embeddable: A library, not a service
Most sandbox solutions assume you're running a separate service. Cloud solutions require calling a remote API, signing up for an account, and managing API keys. Docker requires a background daemon and typically root privileges. K8s, well, don't even get me started.
For a use case as simple as "I just want to safely run a piece of AI-generated code in my Python script," these are all way too heavy.
This reminds me of an analogy from the database world. Before SQLite, if you wanted to use a database in your application, you had to install MySQL or PostgreSQL, configure connections, and manage a separate database process. SQLite's revolution was this: it's an embedded database — no separate process, no network configuration, just a library linked into your application. This made databases ubiquitous — from mobile apps to browsers to embedded devices.
The sandbox space needs the same "SQLite moment." The ideal sandbox should be a library you embed into your application — not a service you deploy alongside it.
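Python's standard library makes the analogy tangible: a full SQL database with no server process, no connection strings, and no ops, in a few library calls.

```python
import sqlite3

# No server process, no network config: the database is just a library call.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello, embedded world",))
row = conn.execute("SELECT body FROM notes").fetchone()
print(row[0])  # → hello, embedded world
conn.close()
```

An embeddable sandbox should feel exactly like this: import, call, done.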
Stateful: More than disposable
Most sandbox solutions today lean toward transient design. Disposable sandboxes are great for one-shot tasks: a user sends a request, the agent spins up a sandbox, executes, and destroys it. Clean and simple.
But as agents grow more capable, more and more use cases demand stateful sandboxes. Imagine a long-running coding agent: yesterday it installed a bunch of dependencies, configured the dev environment, and got the tests passing. You come back today to continue working — of course you want all of that state to still be there. Rebuilding from scratch every time is a terrible experience.
| Use case | Transient sandbox | Stateful sandbox |
| --- | --- | --- |
| One-shot code execution | Perfect | Not needed |
| Coding agent, ongoing development | Rebuild env every time | State persists across sessions |
| Personal agent, daily assistant | Loses accumulated context | Memory and tools always available |
| Deep research, parallel exploration | Fork multiple sandboxes | Branch from checkpoints |
A truly great sandbox should support persistent sessions: assign a sandbox an ID, pause it today, resume it tomorrow, all state intact. Just like closing your laptop lid and opening it the next day — everything is still there.
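The pause/resume contract can be sketched as a tiny toy in Python. This is a hypothetical illustration of the semantics, not any real sandbox product's API: state is keyed by a session ID, and "pausing" tears nothing down.

```python
# Toy illustration of persistent sessions (hypothetical API, not a
# real product): state is keyed by session_id and survives pause/resume.
class SessionStore:
    def __init__(self):
        self._sessions = {}

    def resume(self, session_id: str) -> dict:
        # First resume creates the session; later resumes return it intact.
        return self._sessions.setdefault(session_id, {"files": {}, "packages": []})

    def pause(self, session_id: str) -> None:
        # Nothing is destroyed: state simply stays in the store.
        pass

store = SessionStore()
env = store.resume("agent-42")
env["packages"].append("numpy")   # "yesterday": install a dependency
store.pause("agent-42")
env2 = store.resume("agent-42")   # "today": same environment, intact
print(env2["packages"])
```

The point of the sketch is the invariant, not the implementation: resuming with the same ID must hand back the same environment.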
Snapshots: Version control for runtime environments
This is where things get really interesting.
A Docker image is a filesystem snapshot — it records what's installed. But a VM snapshot captures a running VM's full state — disk, running processes, open network connections, everything. When you restore a snapshot, you're not cold-starting an environment; you're resuming a live one. The browser is already open. The server is already listening. The dev environment is already warm.
This single primitive — capturing and restoring live VM state — unlocks a surprising number of capabilities:

Templates — Not just "Python is installed," but "Python is running, the browser is open, the MCP server is listening." Users restore a template and are instantly in a working state, skipping all startup overhead.
Fork / Branch — From a single snapshot, fork multiple sandboxes to explore different directions in parallel. A deep research agent hits a decision point? Snapshot, fork into two sandboxes, try both paths simultaneously. An eval harness? Fork 100 sandboxes from the same checkpoint, run 100 strategies.
Rollback — Agent broke something? Instantly revert to a previous checkpoint. Like git checkout, but for the entire running environment.
Migration — Move a running sandbox between machines. From your desktop to your laptop, or from local to cloud for a burst of heavy computation.
Crash recovery — Long-running agents periodically snapshot. If they crash, they don't restart from zero — they resume from the last checkpoint.
Docker images can't do any of this. They're static filesystem layers. A live VM snapshot is fundamentally more powerful — it's version control for runtime environments.
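The snapshot primitives above — checkpoint, fork, rollback — can be modeled in a few lines. This is a toy in-memory model of the semantics, not a real VM snapshot implementation; a real system captures disk, memory, and device state rather than a Python dict.

```python
import copy

# Toy model of snapshot semantics (illustrative only, not a real VM
# implementation): capture full state, then fork or roll back from it.
class ToyVM:
    def __init__(self, state=None):
        self.state = state if state is not None else {"files": {}, "procs": []}

    def snapshot(self) -> dict:
        return copy.deepcopy(self.state)          # checkpoint everything

    def restore(self, snap: dict) -> None:
        self.state = copy.deepcopy(snap)          # rollback / resume

    def fork(self) -> "ToyVM":
        return ToyVM(copy.deepcopy(self.state))   # branch from here

vm = ToyVM()
vm.state["procs"].append("dev-server")   # warm the environment
vm.state["files"]["config"] = "ready"
checkpoint = vm.snapshot()               # template / checkpoint
a, b = vm.fork(), vm.fork()              # explore two paths in parallel
a.state["files"]["plan.txt"] = "path A"  # path A diverges; b is untouched
vm.state["files"].clear()                # agent breaks something...
vm.restore(checkpoint)                   # ...instantly roll back
print(vm.state["files"])
```

Forks are fully independent, and rollback restores the exact checkpointed state — the same guarantees the list above asks of a real sandbox.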
Hardware isolation: The baseline
This one is table stakes. Every sandbox needs it, but it's not what differentiates products anymore.
Docker containers share the host kernel — a kernel vulnerability can lead to container escape. You have no control over what code an AI agent generates. An innocent-looking script could trigger a kernel bug and break out of the container.

A microVM gives each sandbox its own kernel. The boundary is hardware virtualization, not namespaces. Even if the code inside exploits a kernel bug, it's exploiting the guest kernel, not yours. The blast radius is contained.
This is a necessary foundation — but most serious sandbox providers already deliver it. The real differentiation lies in the three properties above.
Comparing Existing Solutions
Let's stack these four properties against the current landscape:
| | Embeddable | Stateful | Snapshots | Hardware isolation |
| --- | --- | --- | --- | --- |
| E2B | No — Cloud API | 24h session limit | Pause/resume only | Yes — microVM |
| Modal | No — Cloud API | No — Transient | No | Partial — gVisor |
| Daytona | No — Cloud API | Yes — Stateful | No | Partial — Docker by default |
| Docker Sandbox | Partial — Requires daemon + root | Yes — Persistent | No | Partial — microVM |
| Self-hosted K8s | No — Very heavy | Yes — Varies | No | Partial — Varies |
As you can see, no existing solution checks all four boxes. That's a gap waiting to be filled.
BoxLite: The SQLite for Sandboxes
This brings me to a project I think fills this gap: BoxLite.
BoxLite's positioning is clear — be the SQLite of sandboxes. It's an embeddable microVM runtime written in Rust.

Here's what the core features look like in practice:
Embeddable. No daemon, no root, no Docker, no cloud account. `pip install boxlite`, write three lines of code, and you have a hardware-isolated sandbox running inside your application. It runs on your machine — no network latency, works offline, no usage fees.
```python
import asyncio
import boxlite

async def main():
    async with boxlite.SimpleBox(image="python:slim") as box:
        result = await box.exec("python", "-c", "print('Hello from sandbox!')")
        print(result.stdout)

asyncio.run(main())
```
Stateful. Assign a session_id to a sandbox, resume it later with the same ID. All files, installed packages, and environment configurations are exactly as you left them. Since sandboxes run on your own machine, pausing them costs nothing — no cloud bills for idle time, no 24-hour expiration clocks.
Snapshots. Capture a running Box's full state and restore it instantly. Build a template where the browser, dev server, and MCP tools are already running; fork a sandbox at a decision point to explore two paths in parallel; rollback when an agent breaks something; migrate a running environment between machines. This is version control for runtime environments — something Docker images fundamentally cannot do.
Hardware isolation. Every Box is a microVM with its own dedicated kernel — hardware-level virtualization via KVM on Linux and Hypervisor.framework on macOS. Fully OCI-compatible, so any Docker Hub image works out of the box. Would you let a random AI agent run `rm -rf /` on your machine? With BoxLite, you can — it'll only destroy the sandbox.
The BoxLite team has also built some higher-level projects: ClaudeBox uses BoxLite to isolate Claude Code execution, and boxlite-mcp provides an MCP server that integrates directly with Claude Desktop, letting AI agents operate browsers and run commands in an isolated desktop environment.
Conclusion
Back to the question we started with: why is a technology that's been around for decades suddenly being revisited in the age of AI?
| Virtualization 15 years ago | Sandboxes today |
| --- | --- |
| Built for humans | Built for AI agents |
| Startup: minutes | Startup: milliseconds |
| Lifespan: months / years | Lifespan: seconds / hours |
| Concurrency: dozens | Concurrency: thousands |
| Deployment: managed by ops teams | Deployment: pip install |
| Core need: resource utilization | Core need: security + speed + DX |
The requirements have changed, but the core need — "provide an isolated, secure space for computation" — has never changed.
The way I see it, the future of sandboxes will be layered. Cloud-based managed sandboxes will serve SaaS products and high-concurrency scenarios. But for the broader developer community — those writing code locally, running agents locally, building applications locally — sandboxes need to be as lightweight, embeddable, and local-first as SQLite. BoxLite is doing exactly that.
Giving every developer and every agent an isolated, secure environment at their fingertips — that's probably the most important thing happening in the sandbox space right now.