Let's Talk About Sandboxes

Yingjun Wu

Starting around late 2025, people in the infra community began talking about sandbox technology. The general idea goes like this: in the age of agents, every agent should have its own computer, just like a person does. That computer is the sandbox.

If you're not familiar with sandbox technology, you can think of it simply as a virtual machine. Funny enough, nearly 15 years ago when I was interning at EMC, I was working on something closely related to today's sandboxes - Xen. Over the last two decades, we built sandboxes for humans. Now, we're building sandboxes for AI. That feels pretty surreal. But if virtualization has been around for decades, why is everyone suddenly revisiting what seems like an extremely mature technology? Well, because the requirements are completely different now.

In this post, we'll start from real-world sandbox use cases, explore the underlying technology, and discuss what the future of sandboxes looks like.

Starting with Manus

One of the biggest stories in AI in 2025 was Meta's acquisition of AI startup Manus for over $2 billion. The name Manus comes from the Latin motto "Mens et Manus" ("mind and hand"), embodying the belief that knowledge must be put into action to make an impact. From 2022 to 2024, everyone was using ChatGPT to chat. Manus showed the world that AI can do more than just provide information: it can actually get things done. Once AI can do work, it's no longer just a chatbot. It's an agent. So here's the question: what does an agent have to do with a sandbox? Can't the agent just use the user's computer or a regular server?

Let's take Manus as an example. Manus built its agents in the cloud. Suppose Manus is serving 1,000 concurrent users (in reality, far more!). Each user's request spawns an agent, meaning 1,000 agents need to run simultaneously. Now imagine all 1,000 agents on a single server. What happens? Each agent wants to install its own packages, configure its own environment, have its own filesystem, and claim its own CPU resources. So what do you do? Obviously, each agent needs its own office to work in. That office is the sandbox.

Under the hood, Manus used microVM-based sandbox technology to solve exactly this problem. Each agent runs in an isolated sandbox with its own browser, terminal, and filesystem, capable of running Python, JavaScript, Bash, and more in isolation. By the time of its acquisition, Manus had created over 80 million virtual computers, each one a standalone sandbox.

Of course, there's only one Manus. But do we, as individual developers, still need sandboxes in our daily work with agents? Absolutely. If you're a programmer, chances are you've used Claude Code. Claude Code is a coding agent that downloads packages, runs scripts, and reads/writes files on your local machine. We trust Anthropic's engineering and credibility, so we're comfortable letting their agent operate on our computers. Claude Code also promises to mostly stay within a user-specified directory, which gives a sense of security. But when we use something more powerful and more permission-hungry, like OpenClaw (well, congrats Peter for joining OpenAI!), or some brand-new no-name product, would you still dare run it directly on your local machine? I doubt it.

OpenClaw is a great example. This open-source AI agent can run shell commands directly on your OS, control browsers, manage local files, and talk with you via Telegram. It's so powerful that people call it "a real-life Jarvis." But power comes with risk: security teams have found a couple of third-party OpenClaw skills performing data exfiltration and prompt injection without user awareness. This is where sandboxes become a necessity.

Sandbox Technologies

Broadly speaking, a sandbox is just an isolated environment. An EC2 instance can be a sandbox. A Docker container can be a sandbox. A microVM can be a sandbox. But they behave very differently when it comes to AI agent workloads. Let's start with a high-level comparison:

Technology Comparison

	EC2	K8s + Containers	Docker	gVisor	MicroVM
Startup time	Minutes	Seconds	Seconds	Seconds	<200ms
Isolation	Full VM	Shared kernel	Shared kernel	User-space kernel	Dedicated kernel
Resource overhead	Very high	Medium	Low	Low	Very low (~5MB)
Elasticity	Poor	Good	Manual	Depends on orchestrator	Good
Ops complexity	Low (but expensive)	Very high	Low	Medium	Medium
Fit for AI agents	Too slow, too heavy	Works but complex	Security insufficient	Compatibility issues	Best balance

Let's walk through each one.

EC2 / Traditional VMs

EC2 is too heavy. Slow to start (minutes), expensive, and lacking good auto-scaling. If you need to instantly spin up a fleet of agents for parallel deep research, EC2 simply can't deliver. In short, EC2 was designed for long-running services, not for fast-start, disposable agent sandboxes.

K8s + Containers

K8s solves the elasticity problem but introduces operational complexity. And its default container isolation shares the host kernel: if AI-generated code triggers a kernel vulnerability, it could escape the container. Google recently open-sourced the Agent Sandbox project (using gVisor), which is K8s's formal response to the sandbox use case.

Docker Containers

Fast to start, mature ecosystem, developer-friendly. Docker has even launched a dedicated Docker Sandboxes product for AI agents. But the core problem remains: containers share the host kernel, so isolation isn't strong enough.

gVisor

A user-space kernel developed by Google. Its core component, Sentry, reimplements a large number of Linux syscalls in Go, intercepting and handling them in user space. This dramatically reduces the attack surface. More secure than containers, lighter than microVMs, but with compatibility trade-offs: not all syscalls are perfectly emulated. I/O-heavy workloads can see 10-30% overhead.
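To make the "handle syscalls in user space" idea concrete, here is a toy sketch in Python. It is not how gVisor is implemented (the real Sentry is written in Go and covers far more of Linux); ToySentry and its three "syscalls" are made-up names for illustration. The point is only the shape: calls are answered by a table of handlers inside the sandbox process, and anything missing from the table fails, which is the compatibility trade-off described above.

```python
import errno


class ToySentry:
    """Toy stand-in for a user-space kernel: it answers 'syscalls' itself
    instead of forwarding them to the host. Not gVisor's real design."""

    def __init__(self):
        self._files = {}  # in-memory filesystem: path -> bytes
        self._handlers = {
            "write_file": self._write_file,
            "read_file": self._read_file,
            "getpid": lambda: 1,  # the sandboxed process sees a fake PID
        }

    def _write_file(self, path, data):
        self._files[path] = data
        return len(data)

    def _read_file(self, path):
        if path not in self._files:
            return -errno.ENOENT
        return self._files[path]

    def syscall(self, name, *args):
        handler = self._handlers.get(name)
        if handler is None:
            # Not in the table: the call fails instead of reaching the host.
            return -errno.ENOSYS
        return handler(*args)


sentry = ToySentry()
print(sentry.syscall("write_file", "/tmp/a.txt", b"hello"))
print(sentry.syscall("read_file", "/tmp/a.txt"))
print(sentry.syscall("not_implemented_call", 8))
```

The smaller the handler table, the smaller the attack surface, and the more programs break. That tension is the whole design space of a user-space kernel.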

MicroVM

The current best answer for sandbox isolation. AWS's open-source Firecracker gives each microVM its own dedicated kernel with hardware-level isolation via KVM. Extremely lean: <5MB memory overhead, <125ms startup, up to 150 microVMs per second on a single host. Most sandbox platforms on the market (such as E2B, Daytona, etc.) are built on Firecracker or similar microVM technology.
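A quick back-of-the-envelope calculation shows what those Firecracker figures mean for a fleet. The sketch below reuses the 1,000-agent scenario from the Manus example and treats the quoted numbers as bounds; it counts only per-microVM overhead, not the memory each guest actually needs, and it is not a benchmark.

```python
# Figures quoted above, treated as bounds rather than measurements.
MEMORY_OVERHEAD_MB = 5      # per-microVM memory overhead (<5MB)
LAUNCH_RATE_PER_SEC = 150   # microVM creations per second on a single host


def fleet_estimate(num_sandboxes: int) -> dict:
    """Estimate overhead memory and minimum launch time for a fleet on one host."""
    return {
        "overhead_mb": num_sandboxes * MEMORY_OVERHEAD_MB,
        "launch_seconds": num_sandboxes / LAUNCH_RATE_PER_SEC,
    }


estimate = fleet_estimate(1000)
print(f"overhead: at most {estimate['overhead_mb']} MB")
print(f"launch time: at least {estimate['launch_seconds']:.1f} s")
```

Under these assumptions, 1,000 sandboxes cost at most about 5 GB of overhead and take a little under 7 seconds to launch on one host, which is the gap between "minutes per VM" and something usable for disposable agents.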

Takeaway: From an isolation standpoint, microVMs provide a great answer, with security approaching traditional VMs and performance approaching containers. But choosing the right isolation technology is only step one. When you actually start using sandboxes, you'll find a deeper set of questions to answer.

Beyond Isolation: What Actually Matters

The isolation question is largely settled: microVMs win. The differences between sandbox solutions today are no longer about how they isolate. They're about everything else: how you integrate them, whether they keep state, and what primitives they expose. Four properties separate a great sandbox from a merely functional one:

Property	What it means
Embeddable	A library, not a service. No daemon, no root, no cloud account.
Stateful	Persistent sessions: pause today, resume tomorrow. Your agent's environment survives across sessions.
Snapshots	Capture a running VM's full state. Fork, rollback, migrate, share. Version control for runtime environments.
Hardware isolation	Dedicated kernel per sandbox, not namespace tricks. Table stakes, but not everyone delivers it.

Embeddable: A library, not a service

Most sandbox solutions assume you're running a separate service. Cloud solutions require calling a remote API, signing up for an account, and managing API keys. Docker requires a background daemon and typically root privileges. K8s, well, don't even get me started.

For a use case as simple as "I just want to safely run a piece of AI-generated code in my Python script," these are all way too heavy.

This reminds me of an analogy from the database world. Before SQLite, if you wanted to use a database in your application, you had to install MySQL or PostgreSQL, configure connections, and manage a separate database process. SQLite's revolution was this: it's an embedded database, with no separate process and no network configuration, just a library linked into your application. This made databases ubiquitous, from mobile apps to browsers to embedded devices.

The sandbox space needs the same "SQLite moment." The ideal sandbox should be a library you embed into your application, not a service you deploy alongside it.
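For readers who haven't felt the difference, this is what "a library, not a service" looks like on the database side, using Python's built-in sqlite3 module. The table name and row are made up for illustration.

```python
import sqlite3

# Opening a connection is the whole "deployment": no server, no port, no account.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (agent TEXT, status TEXT)")
conn.execute("INSERT INTO runs VALUES (?, ?)", ("agent-1", "done"))
rows = conn.execute("SELECT agent, status FROM runs").fetchall()
print(rows)
conn.close()
```

Nothing was installed, started, or configured before the first line ran. That is the bar an embeddable sandbox has to clear.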

Stateful: More than disposable

Most sandbox solutions today lean toward transient design. Disposable sandboxes are great for one-shot tasks: a user sends a request, the agent spins up a sandbox, executes, and destroys it. Clean and simple.

But as agents grow more capable, more and more use cases demand stateful sandboxes. Imagine a long-running coding agent: yesterday it installed a bunch of dependencies, configured the dev environment, and got the tests passing. You come back today to continue working, and of course you want all of that state to still be there. Rebuilding from scratch every time is a terrible experience.

Use case	Transient sandbox	Stateful sandbox
One-shot code execution	Perfect	Not needed
Coding agent, ongoing development	Rebuild env every time	State persists across sessions
Personal agent, daily assistant	Loses accumulated context	Memory and tools always available
Deep research, parallel exploration	Fork multiple sandboxes	Branch from checkpoints

A truly great sandbox should support persistent sessions: assign a sandbox an ID, pause it today, resume it tomorrow, all state intact. Just like closing your laptop lid and opening it the next day: everything is still there.
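The pause-by-ID, resume-by-ID contract can be sketched in a few lines. This is a hypothetical toy, not any product's API: ToySessionStore persists a small dict as JSON, where a real sandbox would persist a VM's disk and memory. What it shows is the interface, in which state is keyed by a session ID and outlives the object (or process) that created it.

```python
import json
import tempfile
from pathlib import Path


class ToySessionStore:
    """Hypothetical sketch: persists per-session state as JSON files on disk."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def pause(self, session_id: str, state: dict) -> None:
        (self.root / f"{session_id}.json").write_text(json.dumps(state))

    def resume(self, session_id: str) -> dict:
        path = self.root / f"{session_id}.json"
        if not path.exists():
            return {"packages": [], "files": {}}  # unknown ID: fresh environment
        return json.loads(path.read_text())


root = Path(tempfile.mkdtemp())

# "Today": set up the environment, then pause.
store = ToySessionStore(root)
state = store.resume("coding-agent")
state["packages"].append("pytest")
state["files"]["notes.txt"] = "tests passing"
store.pause("coding-agent", state)

# "Tomorrow": a new store object (think: a new process) resumes by ID.
resumed = ToySessionStore(root).resume("coding-agent")
print(resumed)
```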

Snapshots: Version control for runtime environments

This is where things get really interesting.

A Docker image is a filesystem snapshot: it records what's installed. But a VM snapshot captures a running VM's full state: disk, running processes, open network connections, everything. When you restore a snapshot, you're not cold-starting an environment; you're resuming a live one. The browser is already open. The server is already listening. The dev environment is already warm.

This single primitive (capturing and restoring live VM state) unlocks a surprising number of capabilities:

Templates: Not just "Python is installed," but "Python is running, the browser is open, the MCP server is listening." Users restore a template and are instantly in a working state, skipping all startup overhead.

Fork / Branch: From a single snapshot, fork multiple sandboxes to explore different directions in parallel. A deep research agent hits a decision point? Snapshot, fork into two sandboxes, try both paths simultaneously. An eval harness? Fork 100 sandboxes from the same checkpoint, run 100 strategies.

Rollback: Agent broke something? Instantly revert to a previous checkpoint. Like git checkout, but for the entire running environment.

Migration: Move a running sandbox between machines. From your desktop to your laptop, or from local to cloud for a burst of heavy computation.

Crash recovery: Long-running agents periodically snapshot. If they crash, they don't restart from zero; they resume from the last checkpoint.

Docker images can't do any of this. They're static filesystem layers. A live VM snapshot is fundamentally more powerful: it's version control for runtime environments.
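Fork and rollback both fall out of the same two operations, snapshot and restore. The toy model below makes that visible. ToySandbox is a made-up class, and its state dict is a stand-in for disk, processes, and connections; real VM snapshots are far more involved, but the semantics are the same: a snapshot is an immutable copy, and every restore is independent.

```python
import copy


class ToySandbox:
    """Toy model: 'state' stands in for disk, processes, and connections."""

    def __init__(self, state=None):
        self.state = state if state is not None else {"files": {}, "processes": []}

    def snapshot(self) -> dict:
        # Deep copy so later changes to the sandbox cannot alter the snapshot.
        return copy.deepcopy(self.state)

    @classmethod
    def restore(cls, snap: dict) -> "ToySandbox":
        # Each restore gets its own copy, so forks never share state.
        return cls(copy.deepcopy(snap))


base = ToySandbox()
base.state["processes"].append("dev-server")
checkpoint = base.snapshot()

# Fork: two sandboxes from one checkpoint explore different paths.
fork_a = ToySandbox.restore(checkpoint)
fork_b = ToySandbox.restore(checkpoint)
fork_a.state["files"]["plan.txt"] = "strategy A"
fork_b.state["files"]["plan.txt"] = "strategy B"

# Rollback: the agent breaks something, so restore the checkpoint.
base.state["processes"].clear()
base = ToySandbox.restore(checkpoint)

print(fork_a.state["files"], fork_b.state["files"], base.state["processes"])
```

Templates, migration, and crash recovery are the same restore call with a snapshot that came from somewhere else: a shared library of snapshots, another machine, or the last periodic checkpoint.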

Hardware isolation: The baseline

This one is table stakes. Every sandbox needs it, but it's not what differentiates products anymore.

Docker containers share the host kernel, so a kernel vulnerability can lead to container escape. You have no control over what code an AI agent generates. An innocent-looking script could trigger a kernel bug and break out of the container.

A microVM gives each sandbox its own kernel. The boundary is hardware virtualization, not namespaces. Even if the code inside exploits a kernel bug, it's exploiting the guest kernel, not yours. The blast radius is contained.

This is a necessary foundation, but most serious sandbox providers already deliver it. The real differentiation lies in the three properties above.

Comparing Existing Solutions

Let's stack these four properties against the current landscape:

	Embeddable	Stateful	Snapshots	Hardware isolation
E2B	No (cloud API)	24h session limit	Pause/resume only	Yes (microVM)
Modal	No (cloud API)	No (transient)	No	Partial (gVisor)
Daytona	No (cloud API)	Yes (stateful)	No	Partial (Docker by default)
Docker Sandbox	Partial (requires daemon + root)	Yes (persistent)	No	Partial (microVM)
Self-hosted K8s	No (very heavy)	Yes (varies)	No	Partial (varies)

As you can see, no existing solution checks all four boxes. That's a gap waiting to be filled.
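The "no one checks all four boxes" claim can be verified mechanically by encoding the table as data. The ratings below are copied from the table, with one reading of my own: E2B's "24h session limit" and "Pause/resume only" cells carry no explicit Yes or No, so they are encoded as "Partial". Only a plain "Yes" counts as a full check.

```python
PROPERTIES = ("embeddable", "stateful", "snapshots", "hardware_isolation")

# Ratings from the comparison table; E2B's two "Partial" cells are an interpretation.
LANDSCAPE = {
    "E2B":             ("No", "Partial", "Partial", "Yes"),
    "Modal":           ("No", "No", "No", "Partial"),
    "Daytona":         ("No", "Yes", "No", "Partial"),
    "Docker Sandbox":  ("Partial", "Yes", "No", "Partial"),
    "Self-hosted K8s": ("No", "Yes", "No", "Partial"),
}


def fully_supported(ratings):
    """Return the properties a solution delivers without caveats."""
    return [prop for prop, rating in zip(PROPERTIES, ratings) if rating == "Yes"]


complete = [
    name
    for name, ratings in LANDSCAPE.items()
    if len(fully_supported(ratings)) == len(PROPERTIES)
]
print(complete)
for name, ratings in LANDSCAPE.items():
    print(name, "->", fully_supported(ratings))
```

The list of complete solutions comes back empty, and no row manages more than one unqualified "Yes".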

BoxLite: The SQLite for Sandboxes

This brings me to a project I think fills this gap: BoxLite.

BoxLite's positioning is clear: be the SQLite of sandboxes. It's an embeddable microVM runtime written in Rust.

Here's what the core features look like in practice:

Embeddable. No daemon, no root, no Docker, no cloud account. Run pip install boxlite, write three lines of code, and you have a hardware-isolated sandbox running inside your application. It runs on your machine: no network latency, works offline, no usage fees.

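import asyncio
import boxlite

async def main():
    # Boot a microVM from the python:slim image; it is cleaned up when the block exits.
    async with boxlite.SimpleBox(image="python:slim") as box:
        result = await box.exec("python", "-c", "print('Hello from sandbox!')")
        print(result.stdout)

asyncio.run(main())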

Stateful. Assign a session_id to a sandbox, resume it later with the same ID. All files, installed packages, and environment configurations are exactly as you left them. Since sandboxes run on your own machine, pausing them costs nothing: no cloud bills for idle time, no 24-hour expiration clocks.

Snapshots. Capture a running Box's full state and restore it instantly. Build a template where the browser, dev server, and MCP tools are already running; fork a sandbox at a decision point to explore two paths in parallel; roll back when an agent breaks something; migrate a running environment between machines. This is version control for runtime environments, something Docker images fundamentally cannot do.

Hardware isolation. Every Box is a microVM with its own dedicated kernel, using hardware-level virtualization via KVM on Linux and Hypervisor.framework on macOS. Fully OCI-compatible, so any Docker Hub image works out of the box. Would you let a random AI agent run rm -rf / on your machine? With BoxLite, you can: it'll only destroy the sandbox.

The BoxLite team has also built some higher-level projects: ClaudeBox uses BoxLite to isolate Claude Code execution, and boxlite-mcp provides an MCP server that integrates directly with Claude Desktop, letting AI agents operate browsers and run commands in an isolated desktop environment.

Conclusion

Back to the question we started with: why is a technology that's been around for decades suddenly being revisited in the age of AI?

Virtualization 15 years ago	Sandboxes today
Built for humans	Built for AI agents
Startup: minutes	Startup: milliseconds
Lifespan: months / years	Lifespan: seconds / hours
Concurrency: dozens	Concurrency: thousands
Deployment: managed by ops teams	Deployment: pip install
Core need: resource utilization	Core need: security + speed + DX

The requirements have changed, but the core need, "provide an isolated, secure space for computation," has never changed.

The way I see it, the future of sandboxes will be layered. Cloud-based managed sandboxes will serve SaaS products and high-concurrency scenarios. But for the broader developer community (those writing code locally, running agents locally, building applications locally), sandboxes need to be as lightweight, embeddable, and local-first as SQLite. BoxLite is doing exactly that.

Giving every developer and every agent an isolated, secure environment at their fingertips: that's probably the most important thing happening in the sandbox space right now.