
LLMs can reason. But reasoning alone doesn't get much done.
Adding code execution to an AI agent is harder than it looks. Your agent needs a real computer (filesystem, shell, package manager, persistent state), but handing it access to your infrastructure is dangerous.
Think about it this way: you use one laptop. You are an n of one. But agents are going to run millions of tasks, and each one needs its own computer to work from. That's the infrastructure shift happening right now. Satya Nadella put it plainly: "Every agent needs a computer." The question is what that computer looks like, and how you give one to each agent safely.
LangSmith Sandboxes are our answer. Here's why they matter, and why building this yourself is harder than it sounds.
What becomes possible when an agent has a computer
Think about what Cursor, Claude Code, or ChatGPT's code interpreter can do that a plain chat interface can't. They don't just answer questions: they run the code, see the error, fix it, run it again, and hand you something that works. That feedback loop is what makes them useful.
That same loop is what separates a demo agent from a production agent. Once your agent can execute, a whole category of work opens up:
- A coding assistant that doesn't just suggest a fix: it applies the fix, runs your tests, and confirms nothing broke
- A data analyst that pulls a CSV, runs Python against it, and hands you a formatted report
- A CI agent that clones your repo, installs dependencies, runs the full test suite, and opens a PR (like OpenSWE)
- A research agent that browses, scrapes, synthesizes, and writes — not just searches
- A content pipeline that generates, renders, and exports finished artifacts
- An RL or eval harness that needs to spin up environments in parallel, run episodes at burst scale, and tear them down immediately — zero to thousands of sandboxes, then back to zero
The common thread: these agents need more than a token stream. They need a place to work.
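The last item on that list, bursting from zero to many environments and back, is largely a concurrency-control problem on the caller's side. Here is a minimal sketch, assuming a hypothetical `run_episode` coroutine that stands in for "create a sandbox, run one episode, tear it down". No real sandbox API is used; the names and numbers are illustrative.

```python
import asyncio


async def run_episode(episode_id: int) -> dict:
    """Hypothetical stand-in: create a sandbox, run one episode, tear it down."""
    await asyncio.sleep(0.01)  # placeholder for real sandbox work
    return {"episode": episode_id, "reward": episode_id % 3}


async def run_burst(n_episodes: int, max_in_flight: int) -> list[dict]:
    """Run n_episodes concurrently, never more than max_in_flight at once."""
    gate = asyncio.Semaphore(max_in_flight)

    async def guarded(episode_id: int) -> dict:
        async with gate:
            return await run_episode(episode_id)

    # gather preserves input order, so results[i] belongs to episode i
    return await asyncio.gather(*(guarded(i) for i in range(n_episodes)))


if __name__ == "__main__":
    results = asyncio.run(run_burst(n_episodes=200, max_in_flight=50))
    print(len(results), "episodes finished")
```

The semaphore caps how many environments exist at any moment, which keeps a burst inside whatever quota or budget you set, while everything still drains back to zero when the batch completes.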
Why you can't just hand your agent your laptop
The obvious next question is: why not just let the agent run code locally? Or in a Docker container? Teams do this in early prototypes. It stops working in production for two reasons.
First: agents run untrusted code by definition.
The code your agent executes might come from a model, a user prompt, a cloned repo, or an installed package. You didn't write it. You can't fully vet it. In September 2025, a self-replicating npm worm called Shai-Hulud backdoored 500+ packages with code that ran from install-time lifecycle scripts, before any validation could run. A second wave in November hit 796 more packages and 25,000+ GitHub repos in hours. An agent that installs npm packages as part of its workflow is exposed to exactly this.
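To make the exposure concrete: npm runs a package's `preinstall`, `install`, and `postinstall` scripts automatically during installation. The sketch below lists those hooks from a `package.json` so a pipeline can at least see them before anything runs. The function name and the sample manifest are illustrative, and this is a visibility check, not a defense against a compromised package.

```python
import json

# npm lifecycle hooks that run automatically during `npm install`
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def find_install_hooks(package_json_text: str) -> dict[str, str]:
    """Return the install-time scripts declared in a package.json document."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts", {})
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}


if __name__ == "__main__":
    sample = json.dumps({
        "name": "example-package",  # made-up manifest for illustration
        "scripts": {"preinstall": "node setup.js", "test": "jest"},
    })
    print(find_install_hooks(sample))
```

Installing with `npm install --ignore-scripts` skips these hooks entirely, at the cost of breaking packages that genuinely need them to build. Neither measure covers code that runs once the package is imported, which is why the isolation boundary still matters.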
Second: containers aren't enough.
The common instinct is "just run it in Docker." Containers are great for isolating known, vetted application code (e.g., a web server or a background job). They're not designed for an agent that's installing arbitrary dependencies, running model-generated scripts, and persisting state across a long-running session. And critically: containers share a kernel with the host. A kernel exploit reaches through them. Copy Fail (CVE-2026-31431) is a 732-byte Python script that roots every major Linux distribution back to 2017 via the kernel crypto API. AI tooling found it in about an hour.
A shared-kernel container boundary is not a strong isolation boundary. For untrusted, model-generated code, you need hardware-level separation.
LangSmith Sandboxes: a computer for every agent
The mental model that helps here: a sandbox needs to be two things at once. It needs the instant startup of a serverless function because you can't make an agent wait two minutes for a VM to boot. And it needs the statefulness of a full machine because agents aren't stateless request-handlers; they're mid-session workers that install dependencies, edit files, and pick up where they left off.
LangSmith Sandboxes are built for that model. Each one is a hardware-virtualized microVM: not a container, but a full machine with its own kernel. The agent gets:
Agent
└── its own computer
    ├── filesystem
    ├── shell
    ├── package manager
    ├── network access
    ├── code execution
    └── persistent state
It can install packages, run scripts, edit files, spin up a local server, and keep working across a long session, all without touching your production infrastructure or any other agent's sandbox. When the work is done, the sandbox disappears.
You access it through the same LangSmith SDK and API key you already use:
from langsmith import Client

client = Client()  # picks up your LangSmith API key from the environment
sandbox = client.create_sandbox()

# Give the agent a shell
result = sandbox.run("pip install pandas && python analysis.py")
print(result.stdout)
It just takes one call, and your agent has a computer.
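With a `run`-style call like that, the run, observe, fix loop described earlier is a few lines of control flow. Below is a minimal sketch under stated assumptions: `RunResult`, `Shell`, `FakeShell`, and `run_until_passing` are hypothetical names, the `exit_code` and `stderr` fields are assumed rather than taken from the SDK, and `propose_fix` stands in for a model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass
class RunResult:
    """Hypothetical result shape; a real SDK's result object may differ."""
    exit_code: int
    stdout: str = ""
    stderr: str = ""


class Shell(Protocol):
    """Anything that can write a file and run a command (an assumption)."""
    def write(self, path: str, content: str) -> None: ...
    def run(self, command: str) -> RunResult: ...


@dataclass
class FakeShell:
    """In-memory stand-in so the sketch runs without any sandbox."""
    files: dict[str, str] = field(default_factory=dict)

    def write(self, path: str, content: str) -> None:
        self.files[path] = content

    def run(self, command: str) -> RunResult:
        # Pretend the command fails until some file contains the word "fixed".
        if any("fixed" in text for text in self.files.values()):
            return RunResult(exit_code=0, stdout="ok")
        return RunResult(exit_code=1, stderr="NameError: name 'x' is not defined")


def run_until_passing(
    shell: Shell,
    path: str,
    source: str,
    command: str,
    propose_fix: Callable[[str, str], str],
    max_attempts: int = 3,
) -> tuple[bool, str]:
    """Write source, run command, and feed failures back to propose_fix."""
    for _ in range(max_attempts):
        shell.write(path, source)
        result = shell.run(command)
        if result.exit_code == 0:
            return True, source
        # propose_fix stands in for a model call: (old source, error) -> new source
        source = propose_fix(source, result.stderr)
    return False, source


if __name__ == "__main__":
    passed, final = run_until_passing(
        FakeShell(), "analysis.py", "print(x)", "python analysis.py",
        propose_fix=lambda src, err: "x = 1  # fixed\nprint(x)",
    )
    print(passed)
```

The attempt cap matters in practice: without it, an agent that cannot fix the error keeps burning compute and tokens on the same failure.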
There's also a less obvious benefit for teams running GPU workloads: when sandboxes spin up instantly, your GPU doesn't sit idle waiting for CPU compute to provision. Fast sandboxes act as a GPU efficiency multiplier, and the savings compound quickly at scale.
What you get beyond basic execution
A sandbox is more than a place to run code. The GA release ships a set of primitives that make agent workflows production-ready:
Snapshots and forks: Capture a sandbox mid-session and boot new ones from it. Forks use copy-on-write, so spinning up ten parallel branches costs roughly the same as one. When your agent goes down the wrong path, restore and try again, without rebuilding from scratch.
Blueprints for pre-warmed environments: Define a base image (your repo cloned, your deps installed, your config in place) and boot sandboxes from it in seconds instead of minutes.
Service URLs: If the agent starts a local web server — say, to preview a generated report — you get an authenticated URL you can open in a browser or share with a teammate. No port forwarding.
Auth proxy: Outbound requests from the sandbox flow through a proxy that injects credentials at the network layer. Secrets never touch the agent runtime.
Creator-private by default: Only the user who launched a sandbox (and workspace admins) can access it. Share when you're ready.
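Copy-on-write is why forks are cheap: each fork shares the snapshot's data and stores only what it changes. Here is a toy model using Python's `collections.ChainMap`, with a plain dict standing in for a disk image. It illustrates the idea only and says nothing about how LangSmith implements snapshots or forks.

```python
from collections import ChainMap

# A snapshot is modeled as a plain dict of path -> file contents.
snapshot = {"app.py": "print('v1')", "requirements.txt": "pandas"}


def fork(base: ChainMap) -> ChainMap:
    """Return a fork that shares base's layers and writes to a fresh one."""
    return base.new_child()


root = ChainMap(snapshot)
branch_a = fork(root)
branch_b = fork(root)

# Writes land in each fork's own top layer; the shared snapshot is untouched.
branch_a["app.py"] = "print('attempt A')"
branch_b["notes.txt"] = "attempt B scratch file"

print(branch_a["app.py"])     # the fork's private copy
print(branch_b["app.py"])     # falls through to the shared snapshot
print(len(branch_a.maps[0]))  # number of entries owned by branch A alone
```

Discarding a branch that went wrong is just dropping its top layer, which is the "restore and try again" step in miniature. A real copy-on-write layer also needs a way to record deletions, which this toy model leaves out.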
When to reach for Sandboxes
Sandboxes are the right layer when your agent needs to do something, not just say something. Concretely:
- Your agent generates code and you want it to verify that code runs before responding
- You're building a coding assistant, CI agent, or data pipeline that operates on real files
- You're running multi-step workflows where state needs to persist across tool calls
- You need burst capacity (e.g., thousands of parallel environments for RL training or evals) that has to scale from zero in seconds
- You're accepting any user-supplied input that could end up being executed
Sandboxes are overkill if your agent only calls APIs with fixed schemas and never executes dynamic code. A retrieval agent that searches docs and returns citations doesn't need one. An agent that writes and runs a Python script does.
How teams are using this today
At monday.com, Sandboxes power their Sidekick AI assistant, giving it a secure environment to write and run code for advanced user workflows, including data analysis and multimedia generation.
"LangSmith Sandboxes are helping us make our Sidekick much more capable for monday.com users. With secure environments, Sidekick can write and run code, and use the results to create richer workflows, like running data analysis and generating multimedia."
— Omri Bruchim, AI Platform Group Manager, monday.com
The shift worth paying attention to
For the last few years, making an agent more capable meant giving it better tools: a search API, a calculator, a database connector. That's still true. But the ceiling on what predefined tools can do is low.
The agents that will actually replace workflows (not just assist with them!) are the ones that can pick up whatever tool they need, run it, see what happened, and adapt. That's what having a computer makes possible. It's not an infrastructure detail. It's the difference between an agent that can think and an agent that can act.
You use one laptop. Your agents will each need their own. LangSmith Sandboxes are how you give them one.
Get started: Try LangSmith Sandboxes or read the docs.






