
LangSmith Sandboxes are Generally Available

Mukhil Loganathan
May 13, 2026
6 min

Key Takeaways

  • LangSmith Sandboxes are now GA — Each sandbox runs as a hardware-virtualized microVM, fully kernel-isolated from your services and from other sandboxes. That makes them genuinely safe for running untrusted, model-generated code, a guarantee containers alone can't provide.
  • Agents need real isolation, not just "sandbox" features — Real-world supply chain attacks and kernel exploits (like the Shai-Hulud npm worm and Copy Fail CVE) show that running agent code in containers or eval boundaries is dangerously insufficient for production workloads.
  • GA ships powerful new primitives for agent workflows — Snapshots and cheap copy-on-write forks, Blueprints for pre-warmed environments, Service URLs, a Sandbox CLI, and an Auth Proxy make LangSmith Sandboxes a full execution platform.

Today, LangSmith Sandboxes are Generally Available: secure, scalable environments built for agent code execution, and integrated with the Deep Agents SDK and the LangSmith platform.

Each sandbox is a hardware-virtualized microVM, kernel-isolated from your services and from other sandboxes. Sandboxes use the same SDK and API key as the rest of LangSmith and work with any framework or custom code.

Try LangSmith Sandboxes

Why do agents need sandboxes?

Over the past year, a new class of agents has started to use code execution as part of their core workflow. Systems like Cursor, Claude Code, OpenSWE, and Deep Agents don’t just call predefined tools. They generate code, install dependencies, run tests, inspect failures, and edit files.

A few common workloads that need code execution:

  • A coding assistant that runs and validates its own output before responding
  • A CI-style agent that clones a repo, installs deps, runs tests, and opens a PR (like OpenSWE)
  • A data analysis agent that runs Python against a dataset

These agents need a computer-like environment with a filesystem, package manager, shell, and persistent state. They also need isolation, because the code they run may be generated by a model, pulled from an external dependency, or supplied by a user.

Most teams start by running this on a laptop. That works for a prototype, but it breaks down in production.
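That laptop starting point usually amounts to something like the sketch below (illustrative only, not anyone's production setup): write the model's output to a file and run it as a subprocess, with the same privileges and filesystem access as your own process.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for untrusted, model-generated code. In a real agent loop this
# string comes back from the model; here it just counts files in the cwd.
generated_code = "import os; print(len(os.listdir('.')))"

# Write the generated code to a temp file...
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(generated_code)
    path = f.name

# ...and run it with full access to your user account, environment
# variables, and network. Nothing stops it from reading ~/.ssh or .env.
result = subprocess.run([sys.executable, path], capture_output=True, text=True)
os.unlink(path)
print(result.stdout.strip())
```

The problem isn't that this fails to run; it's that it runs too well. The child process inherits everything the parent can touch, which is exactly what the incidents below exploit.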

Agent code needs strong isolation

The risks of running agent code outside a real isolation boundary aren't theoretical:

  • Supply-chain attacks can reach into your runtime: In September 2025, the self-replicating Shai-Hulud npm worm backdoored 500+ packages including @ctrl/tinycolor, executing in preinstall before any tests ran. A second wave in November hit 796 more packages (20M+ weekly downloads) and 25,000+ GitHub repos in hours.
  • "Sandbox" features aren't always sandboxes: n8n had six RCE CVEs disclosed in a single day, including CVE-2026-1470 (CVSS 9.9) bypassing the JS expression sandbox and CVE-2026-0863 breaking out of the Python task executor. A JS eval boundary is not isolation.
  • Containers share a kernel, and kernels break: Copy Fail (CVE-2026-31431) is a kernel vulnerability in the Linux crypto API; a 732-byte Python script exploits it to root every major Linux distribution going back to 2017. AI tooling surfaced it in about an hour. Containers can't help here because they share the host's kernel, so an agent running the wrong script escapes straight to the host.
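The shared-kernel point is easy to see firsthand. From inside any container, the kernel version a process observes is the host's, because containers virtualize the userland, not the kernel:

```python
import platform

# Run this inside a container and it reports the *host* kernel version.
# Every container on the machine shares that one kernel, so a single
# kernel exploit applies to all of them at once.
kernel = platform.release()
print(kernel)
```

A microVM, by contrast, boots its own guest kernel, so a kernel exploit inside the sandbox stops at the hardware virtualization boundary.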

Containers weren’t built for agent workloads. They’re designed to run known, vetted application code statelessly, such as a web server that handles fixed operations and disappears. Agents are the opposite. They want stateful little computers where they can install packages, edit files, follow long-running threads of work, and come back to where they left off. And the code they run is untrusted by definition. LangSmith Sandboxes are built for that execution model.

LangSmith Sandboxes

LangSmith Sandboxes give agents a computer-like environment they can use without putting your infrastructure at risk. Each sandbox runs as an ephemeral microVM with its own filesystem, shell, package manager, and network boundary. Agents can write code, install dependencies, run tests, and keep working across long-running sessions, while the sandbox stays isolated from your services and from other sandboxes.

They’re managed through the same LangSmith SDK and API key teams already use, so you can attach secure code execution to an agent workflow without building the runtime layer yourself. Sandboxes work with Deep Agents, Open SWE, LangSmith Deployment, LangSmith Fleet, and custom code. They also include the production controls teams need around credentials, resource limits, lifecycle, and access, with GA adding new capabilities for parallel workloads, snapshotting, and enterprise security.

New features with the GA release

  • Snapshots and cheap forks: Capture a running sandbox, or build one from a Docker image, then boot new sandboxes from it. Forks share state via copy-on-write, so spinning up ten parallel branches costs about the same as one. When your agent goes down a wrong path, you can restore and try a different branch.
  • Pause when inactive: Idle sandboxes pause automatically, so you don’t pay for resources that aren’t doing anything.
  • Service URLs: Authenticated HTTP access to anything running inside a sandbox. Open a sandbox-hosted preview in a browser, hit it from a script, or share the URL with a teammate. No port forwarding needed.
  • Sandbox CLI: Build snapshots from Dockerfiles, manage sandboxes, open interactive consoles, tunnel raw TCP, and use standard tools (ssh, scp, rsync, sftp) against a sandbox like any Linux box.
  • Creator-private by default: Sandboxes ship with creator-specific auth, so only the user who launched a sandbox (and workspace admins) can shell into it or open its Service URLs. Grant access to other workspace members when you are ready to share.
  • Auth Proxy with custom callbacks: Outbound requests from a sandbox flow through a proxy that injects credentials at the network layer, so secrets never touch the runtime. New in GA: callbacks let you plug in custom secret resolution for advanced setups (per-tenant tokens, vault lookups, audit hooks). Also allowlist/denylist domains to control your access boundary.
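To make the fork economics concrete, here is a toy model of copy-on-write state sharing (this is not the real Sandboxes API or its storage layer, just the semantics): each fork layers a private writable map over a shared read-only snapshot, so ten branches cost ten initially-empty layers rather than ten full copies.

```python
from collections import ChainMap

# A "snapshot": shared, read-only base state (think: a captured filesystem).
snapshot = {"/app/main.py": "print('hi')", "/etc/config": "v1"}

def fork(base):
    # Copy-on-write in miniature: reads fall through to the shared base,
    # while writes land only in the empty top-layer dict.
    return ChainMap({}, base)

a, b = fork(snapshot), fork(snapshot)
a["/etc/config"] = "v2"  # branch A diverges...

# ...while branch B and the snapshot itself are untouched.
print(a["/etc/config"], b["/etc/config"])  # prints: v2 v1
```

This is also why restoring after a wrong turn is cheap: throwing away a bad branch discards only that branch's top layer, never the shared snapshot.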

How teams are using Sandboxes

Sandboxes are already helping teams move from agents that answer questions to agents that can do work safely. At monday.com, that means giving Sidekick a secure environment to write and run code for more advanced user workflows.

LangSmith Sandboxes are helping us make Sidekick, our AI assistant, much more capable for monday.com users. With secure environments, Sidekick can write and run code, and use the results to create richer workflows, like running data analysis and generating multimedia.

- Omri Bruchim, AI Platform Group Manager, monday.com

What's Coming Next

  • Local-to-cloud agents. Develop an agent against a sandbox on your laptop, then promote the same agent to a cloud-hosted sandbox with no code changes.
  • Shared volumes so agents can collaborate. Agent 1 writes to a volume, then Agent 2 picks up where it left off.
  • Volume Mounts. Mount your own blob storage or git repository for instant access on startup.
  • Full execution tracing of every process and network call inside the VM, doubling as an audit log.

Join our Slack community to share what matters most for your workflows.

Get Started

You can start using LangSmith Sandboxes with one line of code, using your existing SDK and API key.

Try LangSmith Sandboxes or read the docs.
