<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[BoxLite]]></title><description><![CDATA[BoxLite]]></description><link>https://boxlite.hashnode.dev</link><generator>RSS for Node</generator><lastBuildDate>Mon, 11 May 2026 11:05:50 GMT</lastBuildDate><atom:link href="https://boxlite.hashnode.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><atom:link rel="first" href="https://boxlite.hashnode.dev/rss.xml"/><item><title><![CDATA[From Stateful Stream Processing to Stateful Sandbox]]></title><description><![CDATA[<p>Five years ago, we started building <a target="_blank" href="https://github.com/risingwavelabs/risingwave">RisingWave</a>, an open-source streaming database. There was no AI writing code back then; every line was written by hand.</p>
<p>Technically, we were tackling one of the hardest problems in the database world: <strong>stateful computation</strong>, which means keeping a system running continuously while performing complex stateful operations, sustaining high throughput, absorbing workload spikes, recovering in seconds from node failures, and maintaining consistency across all state. The system maintains a continuously correct, queryable state over streaming data. Aggregations, joins, materialized views: every operator holds state, and every piece of state must be reliably managed.</p>
<p>After five years of refinement, RisingWave is running in production at thousands of companies. Production environments are unforgiving: workload spikes, node failures, and storage hiccups happen constantly. What keeps things standing is the architectural direction we chose from day one: compute-storage separation, S3 as primary storage, fully stateless compute nodes.</p>
<h2 id="heading-on-call-and-ai-sre">On-Call and AI SRE</h2>
<p>Streaming database users aren't running offline reports. They're running real-time fraud detection, payment settlement, and live monitoring alerts, all on the critical path. One minute of downtime means one minute of transactions halted, fraud checks disabled, alerts missed. On-call isn't "fix it tomorrow"; it's "fix it now." And distributed system failures don't respect working hours: 3 AM alerts, cascading failures across dozens of nodes, a SQL query hitting an unseen corner case that stalls checkpointing. This is everyday life.</p>
<p>So we started building AI SRE. The most painful parts of incident response (alert triage, log analysis, fault diagnosis) are largely pattern-based. An experienced SRE sees a certain alert and automatically runs through a mental playbook. These playbooks can be encoded for AI agents.</p>
<p>But a useful AI SRE agent can't just be a chatbot that reads logs and gives advice. It has to <strong>take action</strong>: connect to the meta node to check barrier status, run diagnostic scripts to scan shared buffer backlogs across compute nodes, replay problematic SQL in a staging environment to reproduce bugs, analyze crash dumps hundreds of megabytes in size. These are heavy operations, some potentially destructive. You don't want an AI agent running diagnostic scripts directly on production machines.</p>
<p>So the agent needs an isolated execution environment: a sandbox. Then we discovered that the existing sandbox solutions simply don't work.</p>
<h2 id="heading-the-fundamental-problem-with-ephemeral-sandboxes">The Fundamental Problem with Ephemeral Sandboxes</h2>
<p>The AI agent sandbox market is already quite hot: <a target="_blank" href="https://github.com/firecracker-microvm/firecracker">Firecracker</a> microVMs, <a target="_blank" href="https://github.com/google/gvisor">gVisor</a> containers, V8 isolates. Isolation technologies are flourishing. But if you look closely at the architecture of these solutions, they're all fundamentally designed around <strong>ephemeral execution</strong>: a sandbox is created, code runs, and when it finishes or times out, the sandbox is reclaimed. Session limits range from tens of minutes to twenty-four hours, after which all state is destroyed. Some solutions aggressively reclaim idle sandboxes for resource efficiency, causing multi-second cold start delays.</p>
<p>For running a simple Python script, ephemeral sandboxes are fine. But AI agents work very differently.</p>
<p>When an AI SRE agent investigates an incident, it first spends several minutes setting up the environment  installing diagnostic tools, pulling logs, configuring production-matching connection info. Then it runs analyses, generates intermediate files, starts auxiliary processes. The sandbox accumulates significant state. Then the agent pauses  waiting for human approval, or simply because one round of LLM conversation has ended.</p>
<p>Under the ephemeral model, pause = state death. Next time, rebuild from scratch. This isn't just wasted time; the intermediate state accumulated during the previous exploration often can't be reconstructed, because the reasoning chain that produced it is gone.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771470375670/eea6c4f6-eefa-495b-b735-bcf5136443e2.png" alt class="image--center mx-auto" /></p>
<p>This problem isn't unique to AI SRE. Coding agents need to persist development environments across sessions. Browser agents need to maintain browser context. RL training needs snapshot and restore. <strong>Everyone seriously building AI agents eventually hits the same wall: sandboxes need state.</strong></p>
<h2 id="heading-from-an-isolation-problem-to-a-state-management-problem">From an Isolation Problem to a State Management Problem</h2>
<p>Most sandbox solutions focus their energy on <strong>isolation</strong>: how to prevent untrusted code from escaping. This matters, but it's only half the problem. Once a sandbox needs to be stateful, you're no longer dealing with an isolation problem; you're dealing with a <strong>state management problem</strong>.</p>
<p>Specifically: state can't live only on local disk. That ties the sandbox to a single machine, and if the machine dies, the state is gone. Filesystem changes need to be persisted to object storage, with local disk serving only as cache.</p>
<p>Sandboxes need snapshots, and they must be incremental; snapshotting an entire filesystem each time is unusable in production. Any snapshot should serve as a rollback point: if the agent breaks the environment, revert to the last good state. From a snapshot, you should also be able to fork new instances, and forks must be copy-on-write so agents can explore multiple paths at low cost.</p>
<p>Compute-storage separation means idle sandboxes can release compute resources while retaining disk state, and be restored when needed. At scale, this directly determines cost.</p>
<p>Isolation still can't be compromised. AI agents execute untrusted code, so each sandbox should be a micro-VM running its own Linux kernel with hardware-level isolation. This is not container namespace isolation: breaking out requires a hypervisor exploit, not a kernel exploit.</p>
<p>If you've built any kind of long-running stateful system (databases, stream processing, distributed storage), you've seen all of these challenges before. They're universal state management problems, just in a different context.</p>
<h2 id="heading-state-management-from-stream-processing-to-sandbox">State Management: From Stream Processing to Sandbox</h2>
<p>We spent five years solving stateful computation's state management challenges in RisingWave. When we started thinking about how to build a stateful sandbox, we found that many of the core challenges are shared.</p>
<p><strong>Persistence: S3 as source of truth.</strong> RisingWave chose S3-as-primary-storage from day one. Our custom storage engine Hummock writes all state as immutable SSTables to S3, organized by table ID and epoch, never doing in-place updates. This makes compute nodes truly stateless: if one dies, spin up a new one, pull state from S3, recover in seconds.</p>
<p>For sandboxes, the same principle applies: filesystem state needs to be persisted to object storage, with local disk serving only as cache. But the shape of sandbox state is different from a database: it's not KV pairs, but an entire filesystem (OS, packages, user files, intermediate data from running processes). How to efficiently sync filesystem changes to S3 is one of the core problems we're currently exploring.</p>
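<p>To make "object storage as source of truth, local disk as cache" concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: <code>ObjectStore</code> is a hypothetical in-memory stand-in for S3, <code>CachedFiles</code> is an invented name, and neither reflects how BoxLite or RisingWave actually implement persistence.</p>
<pre><code class="lang-python">class ObjectStore:
    """Hypothetical in-memory stand-in for an S3 bucket."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]


class CachedFiles:
    """Local cache in front of the object store (assumed design)."""

    def __init__(self, store):
        self.store = store
        self.cache = {}     # path to bytes held on "local disk"
        self.dirty = set()  # paths written since the last sync

    def write(self, path, data):
        self.cache[path] = data
        self.dirty.add(path)

    def read(self, path):
        if path not in self.cache:  # cache miss: fetch from the store
            self.cache[path] = self.store.get(path)
        return self.cache[path]

    def sync(self):
        """Upload only the dirty paths; return how many were uploaded."""
        uploaded = len(self.dirty)
        for path in sorted(self.dirty):
            self.store.put(path, self.cache[path])
        self.dirty.clear()
        return uploaded


store = ObjectStore()
files = CachedFiles(store)
files.write("/etc/app.conf", b"debug=1")
files.write("/tmp/trace.log", b"barrier stalled")
first_sync = files.sync()

# Simulate losing the machine: a fresh, empty cache still sees synced state.
recovered = CachedFiles(store)
recovered_conf = recovered.read("/etc/app.conf")
</code></pre>
<p>The point of the sketch is the failure case at the end: once changes are synced, the local cache is disposable, which is what lets a sandbox move between machines.</p>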
<p><strong>Checkpointing: must be incremental.</strong> RisingWave implements epoch-based asynchronous checkpointing: the meta node injects a barrier into the data stream every second, operators asynchronously dump local state to a shared buffer upon receiving the barrier, which then uploads to S3 in the background, so checkpointing doesn't block data processing. The key is that checkpoints are incremental, only persisting changed data.</p>
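<p>The incremental part can be sketched in a few lines of Python. This is a toy model, not RisingWave's code: <code>CheckpointedState</code>, <code>barrier</code>, and <code>recover</code> are hypothetical names, the "upload" is just a list append, and deletions are ignored for brevity.</p>
<pre><code class="lang-python">class CheckpointedState:
    """Toy operator state that checkpoints only what changed per epoch."""

    def __init__(self):
        self.state = {}
        self.changed = {}      # keys modified in the current epoch
        self.epoch = 0
        self.checkpoints = []  # list of (epoch, delta) pairs, oldest first

    def update(self, key, value):
        self.state[key] = value
        self.changed[key] = value

    def barrier(self):
        """Seal the current epoch and persist only the changed keys."""
        self.epoch += 1
        delta = dict(self.changed)
        self.checkpoints.append((self.epoch, delta))
        self.changed = {}
        return delta


def recover(checkpoints):
    """Rebuild state by applying the per-epoch deltas in order."""
    state = {}
    for _epoch, delta in checkpoints:
        state.update(delta)
    return state


op = CheckpointedState()
op.update("clicks", 1)
op.update("views", 10)
delta_1 = op.barrier()   # both keys changed in epoch 1

op.update("clicks", 2)
delta_2 = op.barrier()   # only "clicks" changed in epoch 2

restored = recover(op.checkpoints)
</code></pre>
<p>The second checkpoint contains one key rather than the whole state, which is the property that keeps checkpoint cost proportional to the change rate.</p>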
<p>Sandbox snapshots face the same constraint: full snapshots of an entire filesystem are too slow and too expensive. A natural approach is to leverage copy-on-write disk formats (such as <a target="_blank" href="https://qemu-project.gitlab.io/qemu/system/images.html">QCOW2</a>'s overlay mechanism): each snapshot freezes the current layer and creates a new overlay that only records subsequent writes. This way, snapshot cost scales with the amount of change, not the total filesystem size.</p>
<p><strong>Rollback and Fork.</strong> When a RisingWave node fails, it loads the latest checkpoint from S3, replays the last few seconds of data, and recovers in seconds. For sandboxes, if each snapshot is an overlay layer, rollback means discarding overlays after the target point; fork means creating multiple independent overlay chains from the same snapshot.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771470402070/9f3dc0a1-53b8-42b0-80e1-57ec4ad47803.png" alt class="image--center mx-auto" /></p>
<p>The agent can fork a branch at snap-2 to try plan B while the original sandbox continues with plan A, keeping whichever result is better. If neither works, roll back to snap-1 and start over. Each fork shares all previous overlay data  only new writes consume additional space.</p>
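<p>The overlay-chain idea behind snapshot, fork, and rollback can be modeled in Python. This is a simplified sketch with hypothetical names (<code>Layer</code>, <code>Sandbox</code>): it tracks whole files by path, whereas a real copy-on-write format such as QCOW2 works on disk blocks, and it is not BoxLite's implementation.</p>
<pre><code class="lang-python">class Layer:
    """One copy-on-write layer: holds only writes made on top of its parent."""

    def __init__(self, parent=None):
        self.parent = parent
        self.writes = {}

    def lookup(self, path):
        layer = self
        while layer is not None:  # walk down the chain, newest layer first
            if path in layer.writes:
                return layer.writes[path]
            layer = layer.parent
        return None


class Sandbox:
    def __init__(self, base=None):
        self.top = Layer(parent=base)

    def write(self, path, data):
        self.top.writes[path] = data

    def read(self, path):
        return self.top.lookup(path)

    def snapshot(self):
        """Freeze the current layer and start an empty overlay on top of it."""
        frozen = self.top
        self.top = Layer(parent=frozen)
        return frozen

    def rollback(self, snap):
        """Discard everything written after snap."""
        self.top = Layer(parent=snap)

    @staticmethod
    def fork(snap):
        """New sandbox that shares all layers up to snap."""
        return Sandbox(base=snap)


box = Sandbox()
box.write("/app/config", "v1")
snap_1 = box.snapshot()
box.write("/app/config", "v2")
snap_2 = box.snapshot()

plan_b = Sandbox.fork(snap_2)   # branch at snap-2
plan_b.write("/app/plan", "B")
box.write("/app/plan", "A")     # original continues with plan A

box.rollback(snap_1)            # neither worked: back to snap-1
</code></pre>
<p>Forking allocates one empty layer and nothing else, so the cost of a branch is the cost of what it later writes.</p>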
<p><strong>Compute-storage separation and elasticity.</strong> RisingWave's compute nodes are stateless, so scaling means adding or removing compute nodes without migrating data. LSM-tree compaction is offloaded to dedicated compactor nodes, avoiding resource contention with computation. For sandboxes, compute-storage separation means idle sandboxes can pause (releasing CPU and memory while disk state stays in the persistence layer) and cold-start back when needed. In large-scale agent deployments, a large number of sandboxes are idle at any given moment, making this a direct cost driver.</p>
<p><strong>Isolation.</strong> This is a dimension unique to sandboxes. Stream processing operators run in a trusted environment and don't need strong isolation. But AI agents execute untrusted code, so each sandbox should be a micro-VM running its own Linux kernel, with hardware-level isolation via KVM or Hypervisor.framework. This is not container namespace isolation: breaking out requires a hypervisor exploit, not a kernel exploit. BoxLite implements this layer using <a target="_blank" href="https://github.com/containers/libkrun">libkrun</a>, a lightweight KVM-based VMM library that provides near-container startup speed and resource overhead, but with VM-level isolation strength.</p>
<p>Side by side:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771470421979/42d8fb38-bf80-4acf-9053-86684eae87c7.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-sandboxes-should-be-embedded">Sandboxes Should Be Embedded</h2>
<p>Everything above is about cloud-level architecture. But I believe sandboxes should first be <strong>embedded</strong>.</p>
<p>Look at the database world. The hottest databases of recent years aren't the most feature-rich cloud databases. They're <a target="_blank" href="https://www.sqlite.org/">SQLite</a> and <a target="_blank" href="https://duckdb.org/">DuckDB</a>, because they're embedded: just import and use. This doesn't mean cloud doesn't matter. It means that trying an idea or validating a scenario on your own machine is always the most natural, fastest way. Get it working locally, understand it, then decide whether to move to the cloud.</p>
<p>BoxLite applies this philosophy to sandboxes. Run <code>pip install boxlite</code>, and three lines of code spin up a hardware-isolated micro-VM locally. No daemon, no root, no complex deployment. When a sandbox becomes a library you can import, every AI agent developer can use it directly: start locally, connect to cloud-based S3 persistence and elastic scaling when needed. Local-first, cloud-ready.</p>
<h2 id="heading-the-future-of-agent-infra">The Future of Agent Infra</h2>
<p>BoxLite was started by my friend Dorian Zheng in mid-2025. When I began exploring the possibilities of sandboxes with him late last year, we increasingly realized two things: first, stateful sandboxes are a severely underestimated developer pain point; second, the cloud-native stateful system experience we accumulated at RisingWave is almost directly transferable to this domain. This alignment wasn't planned; it emerged from doing the work.</p>
<p>Agentic AI is just getting started. Today, attention is still focused on model capabilities: smarter reasoning, longer context, better tool use. But when agents truly start running at scale, the bottleneck will inevitably shift from the model layer to the infrastructure layer. Agents need reliable execution environments, state management, and secure isolation. These problems don't have good answers yet.</p>
<p>This is the best time to build agent infrastructure.</p>
<hr />
<p><em>BoxLite is open source on GitHub:</em> <a target="_blank" href="https://github.com/boxlite-ai/boxlite"><em>github.com/boxlite-ai/boxlite</em></a></p>
]]></description><link>https://boxlite.hashnode.dev/from-stateful-stream-processing-to-stateful-sandbox</link><guid isPermaLink="true">https://boxlite.hashnode.dev/from-stateful-stream-processing-to-stateful-sandbox</guid><category><![CDATA[AI]]></category><category><![CDATA[ai agents]]></category><category><![CDATA[Cloud Computing]]></category><dc:creator><![CDATA[Yingjun Wu]]></dc:creator></item><item><title><![CDATA[Let's Talk About Sandboxes]]></title><description><![CDATA[<p>Starting around late 2025, people in the infra community began talking about sandbox technology. The general idea goes like this: in the age of agents, every agent should have its own computer, just like a person does. That computer is the sandbox.</p>
<p>If you're not familiar with sandbox technology, you can think of it simply as a virtual machine. Funny enough, nearly 15 years ago when I was interning at <a target="_blank" href="https://en.wikipedia.org/wiki/EMC_Corporation">EMC</a>, I was working on something closely related to today's sandboxes - <a target="_blank" href="https://xenproject.org/">Xen</a>. Over the last two decades, we built sandboxes for humans. Now, we're building sandboxes for AI. That feels pretty surreal. But if virtualization has been around for decades, why is everyone suddenly revisiting what seems like an extremely mature technology? Well, because the requirements are completely different now.</p>
<p>In this post, we'll start from real-world sandbox use cases, explore the underlying technology, and discuss what the future of sandboxes looks like.</p>
<h2 id="heading-starting-with-manus">Starting with Manus</h2>
<p>One of the biggest stories in AI in 2025 was Meta's acquisition of AI startup <a target="_blank" href="https://manus.im/">Manus</a> for over $2 billion. The name Manus comes from the Latin motto "Mens et Manus" ("mind and hand"), embodying the belief that knowledge must be put into action to make an impact. From 2022 to 2024, everyone was using <a target="_blank" href="https://chatgpt.com/">ChatGPT</a> to chat. Manus showed the world that AI can do more than just provide information; it can actually get things done. Once AI can do work, it's no longer just a chatbot. It's an agent. So here's the question: what does an agent have to do with a sandbox? Can't the agent just use the user's computer or a regular server?</p>
<p>Let's take Manus as an example. Manus built its agents in the cloud. Suppose Manus is serving 1,000 concurrent users (in reality, far more!). Each user's request spawns an agent, meaning 1,000 agents need to run simultaneously. Now imagine all 1,000 agents on a single server. What happens? Each agent wants to install its own packages, configure its own environment, have its own filesystem, and claim its own CPU resources. So what do you do? Obviously, each agent needs its own office to work in. That office is the sandbox.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771150372567/0730a5d3-2233-452a-8f3a-c50b143e657f.png" alt class="image--center mx-auto" /></p>
<p>Under the hood, Manus used microVM-based sandbox technology to solve exactly this problem. Each agent runs in an isolated sandbox with its own browser, terminal, and filesystem, capable of running Python, JavaScript, Bash, and more in isolation. By the time of its acquisition, Manus had created over 80 million virtual computers, each one a standalone sandbox.</p>
<p>Of course, there's only one Manus. But do we, as individual developers, still need sandboxes in our daily work with agents? Absolutely. If you're a programmer, chances are you've used <a target="_blank" href="https://docs.anthropic.com/en/docs/claude-code">Claude Code</a>. Claude Code is a coding agent that downloads packages, runs scripts, and reads/writes files on your local machine. We trust Anthropic's engineering and credibility, so we're comfortable letting their agent operate on our computers. Claude Code also promises to mostly stay within a user-specified directory, which gives a sense of security. But when we use something more powerful, more permission-hungry, like <a target="_blank" href="https://github.com/openclaw/openclaw">OpenClaw</a> (well, congrats Peter for joining OpenAI!) or some brand-new no-name product, would you still dare run it directly on your local machine? I doubt it.</p>
<p>OpenClaw is a great example. This open-source AI agent can run shell commands directly on your OS, control browsers, manage local files, and talk with you via Telegram. It's so powerful that people call it "a real-life Jarvis." But power comes with risk: security teams have found a couple of third-party OpenClaw skills performing data exfiltration and prompt injection without user awareness. This is where sandboxes become a necessity.</p>
<h2 id="heading-sandbox-technologies">Sandbox Technologies</h2>
<p>Broadly speaking, a sandbox is just an isolated environment. An EC2 instance can be a sandbox. A Docker container can be a sandbox. A microVM can be a sandbox. But they behave very differently when it comes to AI agent workloads. Let's start with a high-level comparison:</p>
<h3 id="heading-technology-comparison">Technology Comparison</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td>EC2</td><td>K8s + Containers</td><td>Docker</td><td><a target="_blank" href="https://gvisor.dev/">gVisor</a></td><td>MicroVM</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Startup time</strong></td><td>Minutes</td><td>Seconds</td><td>Seconds</td><td>Seconds</td><td>&lt;200ms</td></tr>
<tr>
<td><strong>Isolation</strong></td><td>Full VM</td><td>Shared kernel</td><td>Shared kernel</td><td>User-space kernel</td><td>Dedicated kernel</td></tr>
<tr>
<td><strong>Resource overhead</strong></td><td>Very high</td><td>Medium</td><td>Low</td><td>Low</td><td>Very low (~5MB)</td></tr>
<tr>
<td><strong>Elasticity</strong></td><td>Poor</td><td>Good</td><td>Manual</td><td>Depends on orchestrator</td><td>Good</td></tr>
<tr>
<td><strong>Ops complexity</strong></td><td>Low (but expensive)</td><td>Very high</td><td>Low</td><td>Medium</td><td>Medium</td></tr>
<tr>
<td><strong>Fit for AI agents</strong></td><td>Too slow, too heavy</td><td>Works but complex</td><td>Security insufficient</td><td>Compatibility issues</td><td>Best balance</td></tr>
</tbody>
</table>
</div><p>Let's walk through each one.</p>
<h3 id="heading-ec2-traditional-vms">EC2 / Traditional VMs</h3>
<p>EC2 is too heavy. Slow to start (minutes), expensive, and lacking good auto-scaling. If you need to instantly spin up a fleet of agents for parallel deep research, EC2 simply can't deliver. In short, EC2 was designed for long-running services, not for fast-start, disposable agent sandboxes.</p>
<h3 id="heading-k8s-containers">K8s + Containers</h3>
<p>K8s solves the elasticity problem but introduces operational complexity. And its default container isolation shares the host kernel: if AI-generated code triggers a kernel vulnerability, it could escape the container. Google recently open-sourced the <a target="_blank" href="https://github.com/kubernetes-sigs/agent-sandbox">Agent Sandbox</a> project (using gVisor), which is K8s's formal response to the sandbox use case.</p>
<h3 id="heading-docker-containers">Docker Containers</h3>
<p>Fast to start, mature ecosystem, developer-friendly. Docker has even launched a dedicated <a target="_blank" href="https://docs.docker.com/ai/sandboxes/">Docker Sandboxes</a> product for AI agents. But the core problem remains: containers share the host kernel, so isolation isn't strong enough.</p>
<h3 id="heading-gvisor">gVisor</h3>
<p>A user-space kernel developed by Google. Its core component, Sentry, reimplements a large number of Linux syscalls in Go, intercepting and handling them in user space. This dramatically reduces the attack surface. More secure than containers, lighter than microVMs, but with compatibility trade-offs: not all syscalls are perfectly emulated. I/O-heavy workloads can see 10-30% overhead.</p>
<h3 id="heading-microvm">MicroVM</h3>
<p>The current best answer for sandbox isolation. AWS's open-source Firecracker gives each microVM its own dedicated kernel with hardware-level isolation via KVM. Extremely lean: &lt;5MB memory overhead, &lt;125ms startup, up to 150 microVMs per second on a single host. Most sandbox platforms on the market (such as <a target="_blank" href="https://e2b.dev/">E2B</a>, <a target="_blank" href="https://www.daytona.io/">Daytona</a>, etc.) are built on Firecracker or similar microVM technology.</p>
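<p>To put those figures in perspective, here is a back-of-the-envelope calculation in Python for the 1,000-agent scenario from earlier. It treats the quoted numbers as upper bounds on a single host and counts only VMM overhead, not the memory each guest itself needs, so read it as a rough sketch rather than a capacity plan.</p>
<pre><code class="lang-python">AGENTS = 1000           # concurrent agents, from the Manus example
MAX_CREATE_RATE = 150   # microVMs per second on one host (quoted peak)
MAX_OVERHEAD_MB = 5     # VMM memory overhead per microVM (quoted upper bound)

# Best case: launching at the peak rate the whole time.
min_launch_seconds = AGENTS / MAX_CREATE_RATE

# Worst case for overhead: every microVM at the quoted bound.
max_overhead_mb = AGENTS * MAX_OVERHEAD_MB

print(f"launch time: at least {min_launch_seconds:.1f} s")
print(f"VMM overhead: at most {max_overhead_mb} MB")
</code></pre>
<p>In other words, under these assumptions a full fleet comes up in seconds and the isolation layer itself costs a few gigabytes at most; the guests' own workloads dominate.</p>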
<blockquote>
<p><strong>Takeaway: From an isolation standpoint, microVMs have provided a great answer: security approaching traditional VMs, performance approaching containers. But choosing the right isolation technology is only step one. When you actually start using sandboxes, you'll find a deeper set of questions to answer.</strong></p>
</blockquote>
<h2 id="heading-beyond-isolation-what-actually-matters">Beyond Isolation: What Actually Matters</h2>
<p>The isolation question is largely settled: microVMs win. But the differences between sandbox solutions today are no longer about <em>how they isolate</em>. They're about everything else: how you integrate them, whether they keep state, and what primitives they expose. Four properties separate a great sandbox from a merely functional one:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Property</td><td>What it means</td></tr>
</thead>
<tbody>
<tr>
<td>Embeddable</td><td>A library, not a service. No daemon, no root, no cloud account.</td></tr>
<tr>
<td>Stateful</td><td>Persistent sessions: pause today, resume tomorrow. Your agent's environment survives across sessions.</td></tr>
<tr>
<td>Snapshots</td><td>Capture a running VM's full state. Fork, rollback, migrate, share. Version control for runtime environments.</td></tr>
<tr>
<td>Hardware isolation</td><td>Dedicated kernel per sandbox, not namespace tricks. Table stakes, but not everyone delivers it.</td></tr>
</tbody>
</table>
</div><h3 id="heading-embeddable-a-library-not-a-service">Embeddable: A library, not a service</h3>
<p>Most sandbox solutions assume you're running a separate service. Cloud solutions require calling a remote API, signing up for an account, and managing API keys. Docker requires a background daemon and typically root privileges. K8s, well, don't even get me started.</p>
<p>For a use case as simple as "I just want to safely run a piece of AI-generated code in my Python script," these are all way too heavy.</p>
<p>This reminds me of an analogy from the database world. Before SQLite, if you wanted to use a database in your application, you had to install MySQL or PostgreSQL, configure connections, and manage a separate database process. SQLite's revolution was this: it's an embedded database  no separate process, no network configuration, just a library linked into your application. This made databases ubiquitous  from mobile apps to browsers to embedded devices.</p>
<p>The sandbox space needs the same "SQLite moment." The ideal sandbox should be a library you embed into your application  not a service you deploy alongside it.</p>
<h3 id="heading-stateful-more-than-disposable">Stateful: More than disposable</h3>
<p>Most sandbox solutions today lean toward transient design. Disposable sandboxes are great for one-shot tasks: a user sends a request, the agent spins up a sandbox, executes, and destroys it. Clean and simple.</p>
<p>But as agents grow more capable, more and more use cases demand stateful sandboxes. Imagine a long-running coding agent: yesterday it installed a bunch of dependencies, configured the dev environment, and got the tests passing. You come back today to continue working, and of course you want all of that state to still be there. Rebuilding from scratch every time is a terrible experience.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Use case</td><td>Transient sandbox</td><td>Stateful sandbox</td></tr>
</thead>
<tbody>
<tr>
<td>One-shot code execution</td><td>Perfect</td><td>Not needed</td></tr>
<tr>
<td>Coding agent, ongoing development</td><td>Rebuild env every time</td><td>State persists across sessions</td></tr>
<tr>
<td>Personal agent, daily assistant</td><td>Loses accumulated context</td><td>Memory and tools always available</td></tr>
<tr>
<td>Deep research, parallel exploration</td><td>Fork multiple sandboxes</td><td>Branch from checkpoints</td></tr>
</tbody>
</table>
</div><p>A truly great sandbox should support persistent sessions: assign a sandbox an ID, pause it today, resume it tomorrow, all state intact. Just like closing your laptop lid and opening it the next day: everything is still there.</p>
<h3 id="heading-snapshots-version-control-for-runtime-environments">Snapshots: Version control for runtime environments</h3>
<p>This is where things get really interesting.</p>
<p>A Docker image is a filesystem snapshot: it records what's <em>installed</em>. But a VM snapshot captures a running VM's full state: disk, running processes, open network connections, everything. When you restore a snapshot, you're not cold-starting an environment; you're resuming a <em>live</em> one. The browser is already open. The server is already listening. The dev environment is already warm.</p>
<p>This single primitive (capturing and restoring live VM state) unlocks a surprising number of capabilities:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771150730126/3a94b38a-7633-43eb-8d20-e6846ba67635.png" alt class="image--center mx-auto" /></p>
<p><strong>Templates</strong>: Not just "Python is installed," but "Python is running, the browser is open, the MCP server is listening." Users restore a template and are <em>instantly</em> in a working state, skipping all startup overhead.</p>
<p><strong>Fork / Branch</strong>: From a single snapshot, fork multiple sandboxes to explore different directions in parallel. A deep research agent hits a decision point? Snapshot, fork into two sandboxes, try both paths simultaneously. An eval harness? Fork 100 sandboxes from the same checkpoint, run 100 strategies.</p>
<p><strong>Rollback</strong>: Agent broke something? Instantly revert to a previous checkpoint. Like <code>git checkout</code>, but for the entire running environment.</p>
<p><strong>Migration</strong>: Move a running sandbox between machines. From your desktop to your laptop, or from local to cloud for a burst of heavy computation.</p>
<p><strong>Crash recovery</strong>: Long-running agents periodically snapshot. If they crash, they don't restart from zero; they resume from the last checkpoint.</p>
<p>Docker images can't do any of this. They're static filesystem layers. A live VM snapshot is fundamentally more powerful: it's version control for runtime environments.</p>
<h3 id="heading-hardware-isolation-the-baseline">Hardware isolation: The baseline</h3>
<p>This one is table stakes. Every sandbox needs it, but it's not what differentiates products anymore.</p>
<p>Docker containers share the host kernel, so a kernel vulnerability can lead to container escape. You have no control over what code an AI agent generates. An innocent-looking script could trigger a kernel bug and break out of the container.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771150771703/1a7fd148-5cd2-4e9c-9f7f-d0ce6f678499.png" alt class="image--center mx-auto" /></p>
<p>A microVM gives each sandbox its own kernel. The boundary is hardware virtualization, not namespaces. Even if the code inside exploits a kernel bug, it's exploiting the <em>guest</em> kernel, not yours. The blast radius is contained.</p>
<p>This is a necessary foundation, but most serious sandbox providers already deliver it. The real differentiation lies in the three properties above.</p>
<h2 id="heading-comparing-existing-solutions">Comparing Existing Solutions</h2>
<p>Let's stack these four properties against the current landscape:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td>Embeddable</td><td>Stateful</td><td>Snapshots</td><td>Hardware isolation</td></tr>
</thead>
<tbody>
<tr>
<td><a target="_blank" href="https://e2b.dev/"><strong>E2B</strong></a></td><td>No (Cloud API)</td><td>24h session limit</td><td>Pause/resume only</td><td>Yes (microVM)</td></tr>
<tr>
<td><a target="_blank" href="https://modal.com/"><strong>Modal</strong></a></td><td>No (Cloud API)</td><td>No (Transient)</td><td>No</td><td>Partial (gVisor)</td></tr>
<tr>
<td><a target="_blank" href="https://www.daytona.io/"><strong>Daytona</strong></a></td><td>No (Cloud API)</td><td>Yes (Stateful)</td><td>No</td><td>Partial (Docker by default)</td></tr>
<tr>
<td><a target="_blank" href="https://docs.docker.com/ai/sandboxes/"><strong>Docker Sandbox</strong></a></td><td>Partial (Requires daemon + root)</td><td>Yes (Persistent)</td><td>No</td><td>Partial (microVM)</td></tr>
<tr>
<td><strong>Self-hosted K8s</strong></td><td>No (Very heavy)</td><td>Yes (Varies)</td><td>No</td><td>Partial (Varies)</td></tr>
</tbody>
</table>
</div><p>As you can see, <strong>no existing solution checks all four boxes.</strong> That's a gap waiting to be filled.</p>
<h2 id="heading-boxlite-the-sqlite-for-sandboxes">BoxLite: The SQLite for Sandboxes</h2>
<p>This brings me to a project I think fills this gap: <a target="_blank" href="https://github.com/boxlite-ai/boxlite">BoxLite</a>.</p>
<p>BoxLite's positioning is clear: be the SQLite of sandboxes. It's an embeddable microVM runtime written in Rust:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771156440213/2631ac80-cb7f-437b-b34f-4234edcf5f22.png" alt class="image--center mx-auto" /></p>
<p>Here's what the core features look like in practice:</p>
<p><strong>Embeddable.</strong> No daemon, no root, no Docker, no cloud account. <code>pip install boxlite</code>, write a few lines of code, and you have a hardware-isolated sandbox running inside your application. It runs on your machine, so there is no network latency, it works offline, and there are no usage fees.</p>
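<pre><code class="lang-python"><span class="hljs-keyword">import</span> asyncio
<span class="hljs-keyword">import</span> boxlite

<span class="hljs-comment"># `async with` needs a running event loop, so wrap it in a coroutine.</span>
<span class="hljs-keyword">async</span> <span class="hljs-keyword">def</span> main():
    <span class="hljs-keyword">async</span> <span class="hljs-keyword">with</span> boxlite.SimpleBox(image=<span class="hljs-string">"python:slim"</span>) <span class="hljs-keyword">as</span> box:
        result = <span class="hljs-keyword">await</span> box.exec(<span class="hljs-string">"python"</span>, <span class="hljs-string">"-c"</span>, <span class="hljs-string">"print('Hello from sandbox!')"</span>)
        print(result.stdout)

asyncio.run(main())
</code></pre>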
<p><strong>Stateful.</strong> Assign a <code>session_id</code> to a sandbox and resume it later with the same ID. All files, installed packages, and environment configurations are exactly as you left them. Since sandboxes run on your own machine, pausing them costs nothing: no cloud bills for idle time, no 24-hour expiration clocks.</p>
<p><strong>Snapshots.</strong> Capture a running Box's full state and restore it instantly. Build a template where the browser, dev server, and MCP tools are already running; fork a sandbox at a decision point to explore two paths in parallel; roll back when an agent breaks something; migrate a running environment between machines. This is version control for runtime environments, something Docker images fundamentally cannot do.</p>
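<p>To make the fork and rollback ideas concrete, here is a toy model in plain Python. It deliberately does not use BoxLite's API: <code>ToyBox</code>, <code>snapshot</code>, and <code>restore</code> are hypothetical names, and the "state" is just a dictionary standing in for a VM's memory and disk. Treat it as a sketch of the semantics, not of how BoxLite implements them.</p>
<pre><code class="lang-python">import copy


class ToyBox:
    """Hypothetical stand-in for a sandbox; its state is a dict of files."""

    def __init__(self, files=None):
        self.files = dict(files or {})

    def snapshot(self):
        # Capture an independent copy of the current state.
        return copy.deepcopy(self.files)

    @classmethod
    def restore(cls, snap):
        # Build a fresh box from a snapshot; the snapshot stays untouched.
        return cls(copy.deepcopy(snap))


box = ToyBox({"app.py": "print('v1')"})
checkpoint = box.snapshot()

# Fork: two boxes explore different paths from the same decision point.
path_a = ToyBox.restore(checkpoint)
path_b = ToyBox.restore(checkpoint)
path_a.files["app.py"] = "print('refactor A')"
path_b.files["app.py"] = "print('refactor B')"

# Rollback: an agent wipes the original box, so restore the checkpoint.
box.files.clear()
box = ToyBox.restore(checkpoint)
print(box.files)
</code></pre>
<p>The point of the sketch is that a snapshot is an immutable reference point: forks diverge from it independently, and a rollback is simply another restore from the same snapshot.</p>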
<p><strong>Hardware isolation.</strong> Every Box is a microVM with its own dedicated kernel, using hardware-level virtualization via KVM on Linux and Hypervisor.framework on macOS. It is fully OCI-compatible, so any Docker Hub image works out of the box. Would you let a random AI agent run <code>rm -rf /</code> on your machine? With BoxLite, you can, because it'll only destroy the sandbox.</p>
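<p>If you want to check whether a host exposes one of these hypervisors before trying a microVM runtime, a minimal sketch looks like the following. This is not BoxLite's own detection logic; <code>detect_hypervisor_backend</code> is a hypothetical helper, and it only assumes the standard OS interfaces: the <code>/dev/kvm</code> device node on Linux and the <code>kern.hv_support</code> sysctl on macOS.</p>
<pre><code class="lang-python">import os
import platform
import subprocess


def detect_hypervisor_backend():
    """Return "kvm", "hypervisor.framework", or None for this host."""
    system = platform.system()
    if system == "Linux":
        # KVM is exposed as a device node; the user needs read/write access.
        if os.access("/dev/kvm", os.R_OK | os.W_OK):
            return "kvm"
        return None
    if system == "Darwin":
        # kern.hv_support reports 1 when Hypervisor.framework is usable.
        try:
            out = subprocess.run(
                ["sysctl", "-n", "kern.hv_support"],
                capture_output=True, text=True, check=False,
            ).stdout.strip()
        except OSError:
            return None
        return "hypervisor.framework" if out == "1" else None
    return None


print(detect_hypervisor_backend())
</code></pre>
<p>A result of <code>None</code> usually means virtualization is disabled in firmware, the current user lacks access to the device, or the code is running inside a VM without nested virtualization.</p>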
<p>The BoxLite team has also built some higher-level projects: <a target="_blank" href="https://github.com/boxlite-ai/claudebox">ClaudeBox</a> uses BoxLite to isolate Claude Code execution, and <a target="_blank" href="https://github.com/boxlite-labs/boxlite-mcp">boxlite-mcp</a> provides an MCP server that integrates directly with Claude Desktop, letting AI agents operate browsers and run commands in an isolated desktop environment.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Back to the question we started with: why is a technology that's been around for decades suddenly being revisited in the age of AI?</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Virtualization 15 years ago</td><td>Sandboxes today</td></tr>
</thead>
<tbody>
<tr>
<td>Built for humans</td><td>Built for AI agents</td></tr>
<tr>
<td>Startup: minutes</td><td>Startup: milliseconds</td></tr>
<tr>
<td>Lifespan: months / years</td><td>Lifespan: seconds / hours</td></tr>
<tr>
<td>Concurrency: dozens</td><td>Concurrency: thousands</td></tr>
<tr>
<td>Deployment: managed by ops teams</td><td>Deployment: pip install</td></tr>
<tr>
<td>Core need: resource utilization</td><td>Core need: security + speed + DX</td></tr>
</tbody>
</table>
</div><p>The requirements have changed, but the core need, "provide an isolated, secure space for computation," has never changed.</p>
<p>The way I see it, the future of sandboxes will be layered. Cloud-based managed sandboxes will serve SaaS products and high-concurrency scenarios. But for the broader developer community  those writing code locally, running agents locally, building applications locally  sandboxes need to be as lightweight, embeddable, and local-first as SQLite. <a target="_blank" href="https://github.com/boxlite-ai/boxlite">BoxLite</a> is doing exactly that.</p>
<p>Giving every developer and every agent an isolated, secure environment at their fingertips: that's probably the most important thing happening in the sandbox space right now.</p>
]]></description><link>https://boxlite.hashnode.dev/lets-talk-about-sandboxes</link><guid isPermaLink="true">https://boxlite.hashnode.dev/lets-talk-about-sandboxes</guid><category><![CDATA[AI]]></category><category><![CDATA[agentic AI]]></category><category><![CDATA[virtual machine]]></category><category><![CDATA[Cloud Computing]]></category><dc:creator><![CDATA[Yingjun Wu]]></dc:creator></item></channel></rss>