
feat(providers): add @langchain/wasmsh — in-process shell + Python sandbox #427

Open
Johann-Peter Hartmann (johannhartmann) wants to merge 11 commits into langchain-ai:main from mayflower:feat/wasmsh-sandbox

Conversation

@johannhartmann


Summary

Adds @langchain/wasmsh, a sandbox provider that runs a full Bash-compatible
shell (88 utilities including grep, sed, awk, jq, curl) and Python 3.13 with
pip — entirely in-process via WebAssembly. No containers, no cloud services,
no API keys.

  • Node.js: WasmshSandbox.createNode() — spawns a local host process
  • Browser: WasmshSandbox.createBrowserWorker() — runs in a Web Worker

Backed by wasmsh and
Pyodide, the shell and Python share a virtual
filesystem. Agents get execute, read_file, write_file, edit_file,
ls, grep, glob — same tools as remote sandboxes, zero infrastructure.

What's included

| Commit | Change |
| --- | --- |
| feat: expose filesystemOptions in createDeepAgent | Allows tuning token eviction thresholds per agent |
| feat: add @langchain/wasmsh sandbox provider | Main provider package with 32 unit + 97 integration tests |
| feat: add browser build and LLM agent integration tests | Browser entry for deepagents core, Playwright e2e, agent tests |
| fix: browser subagent state handling | Use runtime.state instead of getCurrentTaskInput() in browser environments |
| chore: add changeset | Changeset for release |

Why not just use containers?

Containers are the right choice for untrusted code or system-level operations.
Wasmsh is for the common case where agents need a filesystem, a shell, and
Python — and you don't want to spin up infrastructure for it. Tests run in
<1s, CI needs no secrets, and it works in the browser.

Test plan

  • 32 unit tests (mocked session, all sandbox operations)
  • 97 integration tests (standard test suite, LLM agent tests)
  • 821 core tests pass (no regressions, typecheck clean)
  • 125 node-vfs tests pass (no regressions to other providers)
  • Playwright browser e2e tests
  • pnpm format:check clean
  • pnpm lint clean (0 errors)
  • pnpm build succeeds

@changeset-bot

changeset-bot Bot commented Apr 5, 2026


🦋 Changeset detected

Latest commit: 3409442

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 4 packages
| Name | Type |
| --- | --- |
| deepagents | Minor |
| @langchain/wasmsh | Minor |
| deepagents-acp | Patch |
| @deepagents/evals | Patch |


@johannhartmann

Author

See the example at https://deepagents.data.mayflower.tech/, a demo in which the deepagent, the sandbox, and the LLM all run in the browser. The source is in https://github.com/mayflower/langchainjs-mediapipe-agent-demo.

@pkg-pr-new

pkg-pr-new Bot commented Apr 6, 2026


npm i https://pkg.pr.new/deepagents-acp@427
npm i https://pkg.pr.new/deepagents@427
npm i https://pkg.pr.new/@langchain/sandbox-standard-tests@427

commit: cbe8fa5

WasmshSandbox wraps the wasmsh-pyodide npm package to provide a
WASM-based sandbox backend for DeepAgents. Runs bash (88 utilities) +
Python 3.13 with micropip in-process — no containers, no server.

Supports Node.js (child process) and browser (Web Worker) modes.
Includes unit tests (97/97 pass), integration tests, and examples.
- Add browser entry points for deepagents core (index.browser.ts)
- Add Playwright E2E tests running a DeepAgent with wasmsh in-browser
- Add Node.js agent integration tests exercising file ops, Python,
  shell scripting, and multi-step pipelines
@christian-bromann

Christian Bromann (christian-bromann) commented Apr 7, 2026

Member

Johann-Peter Hartmann (@johannhartmann) this is rad! 🤯 Let me discuss with the team and get back to you!

Brings the @langchain/wasmsh provider to parity with @langchain/quickjs by
adding an interpreter middleware on top of the existing sandbox surface.

# New exports

* `createWasmshInterpreterMiddleware(options)` — exposes the sandbox as a
  single `py_eval` agent tool. Variables, imports, and defined functions
  persist across calls within the same session (via the sandbox's
  globals pickle, transparently). Top-level `await` works.
* `WasmshFilesystemBackend(sandbox, { namespace })` — adapts a
  WasmshSandbox as a deepagents `BackendProtocolV2` memory backend with
  optional namespace prefixing. Composable as a sub-backend in
  `CompositeBackend`. Mirrors `WasmshFilesystemBackend` from the Python
  adapter.
* `scanSkillReferences`, `loadSkill`, `installPendingSkills` — Python
  skills loading. Scans user code for `import skills.<name>` and stages
  the matching skill directory from a `BackendProtocol` into the sandbox
  VFS under `/skills/<package_name>/`. Auto-generates `__init__.py`.
* `WasmshSandbox.runPtc(code, tools, onHostCall)` — passthrough to the
  underlying npm session's runPtc method (requires
  @mayflowergmbh/wasmsh-pyodide ≥ 0.6.4; surfaces a clear error against
  older builds via duck-type check).
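
To make the skills-scanning step concrete, here is a minimal sketch of a scanner for `import skills.<name>` references. The function name `findSkillImports` and the regex are illustrative assumptions, not the actual `scanSkillReferences` implementation; the sketch also accepts `from skills.<name> import ...`, which the description above does not confirm.

```typescript
// Hypothetical sketch: collect skill names referenced by Python source text.
// The pattern is an assumption; the real scanner may accept other import forms.
const SKILL_IMPORT_SOURCE =
  "^[ \\t]*(?:import|from)[ \\t]+skills\\.([A-Za-z_][A-Za-z0-9_]*)";

function findSkillImports(pythonSource: string): string[] {
  const pattern = new RegExp(SKILL_IMPORT_SOURCE, "gm");
  const names = new Set<string>();
  let match: RegExpExecArray | null;
  // Walk every line-anchored match and record the captured skill name once.
  while ((match = pattern.exec(pythonSource)) !== null) {
    names.add(match[1]);
  }
  return Array.from(names).sort();
}
```

A caller could then stage each returned name before running the code, skipping names that were already staged earlier in the session.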

# Programmatic tool calling

Selected agent tools can be exposed inside the sandbox as
`tools.<snake_name>` awaitables. The model can fan out via
`asyncio.gather`, branch on results, and chain dependent calls — all
within one `py_eval` invocation. PTC calls bypass the regular ToolNode
path, so `interrupt_on` approval hooks are not enforced; treat the
allowlist as the permission boundary.

PTC config shapes mirror the QuickJS middleware:
* `false` (default) — disabled.
* `true` — every agent tool except the default vfs helpers.
* `string[]` — explicit allowlist.
* `{ include: string[] }` / `{ exclude: string[] }` — include/exclude shapes.
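
The config shapes above can be sketched as a single resolver. Everything here is hypothetical: the `PtcConfig` type, the `resolvePtcTools` name, and the contents of `DEFAULT_VFS_TOOLS` (taken from the file tools listed in the PR summary) are assumptions, and the real middleware may treat `exclude` differently.

```typescript
// Hypothetical union mirroring the four documented config shapes.
type PtcConfig =
  | boolean
  | string[]
  | { include: string[] }
  | { exclude: string[] };

// Assumed set of "default vfs helpers"; not confirmed by the source.
const DEFAULT_VFS_TOOLS = new Set([
  "read_file",
  "write_file",
  "edit_file",
  "ls",
  "grep",
  "glob",
]);

// Returns the names of the agent tools that would be exposed inside the sandbox.
function resolvePtcTools(config: PtcConfig, agentTools: string[]): string[] {
  if (config === false) return [];
  const nonVfs = agentTools.filter((name) => !DEFAULT_VFS_TOOLS.has(name));
  if (config === true) return nonVfs;
  if (Array.isArray(config)) {
    return agentTools.filter((name) => config.includes(name));
  }
  if ("include" in config) {
    return agentTools.filter((name) => config.include.includes(name));
  }
  // Assumption: `exclude` starts from the same set that `true` would expose.
  return nonVfs.filter((name) => !config.exclude.includes(name));
}
```

Because PTC calls bypass approval hooks, a resolver like this is the only place where permissions are decided, which is why unknown names in an allowlist are simply dropped rather than exposed.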

# Tests

* 16 new unit tests in `middleware.test.ts` cover: tool registration,
  custom tool name, sandbox round-trip via runPtc, error envelope
  formatting, skills scanner, snake-case conversion, Python identifier
  validation, envelope formatting (incl. truncation + error blocks).
* Existing sandbox tests (30) continue to pass.
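
As an illustration of the snake-case conversion and Python identifier validation covered by these tests, a minimal sketch follows. The names `toSnakeName` and `isPythonIdentifier` are hypothetical, and the ASCII-only identifier rule is a simplifying assumption (Python itself also allows Unicode identifiers).

```typescript
// Python 3 hard keywords; a tool name that collides with one cannot be
// used as an attribute such as `tools.<name>`.
const PYTHON_KEYWORDS = new Set([
  "False", "None", "True", "and", "as", "assert", "async", "await", "break",
  "class", "continue", "def", "del", "elif", "else", "except", "finally",
  "for", "from", "global", "if", "import", "in", "is", "lambda", "nonlocal",
  "not", "or", "pass", "raise", "return", "try", "while", "with", "yield",
]);

// Hypothetical helper: "web-search" and "fetchUrl" become "web_search" and "fetch_url".
function toSnakeName(toolName: string): string {
  return toolName
    .replace(/([a-z0-9])([A-Z])/g, "$1_$2") // split camelCase boundaries
    .replace(/[-\s.]+/g, "_") // dashes, whitespace and dots become underscores
    .toLowerCase();
}

// Hypothetical helper: ASCII-only check, stricter than Python's own rule.
function isPythonIdentifier(name: string): boolean {
  return /^[A-Za-z_][A-Za-z0-9_]*$/.test(name) && !PYTHON_KEYWORDS.has(name);
}
```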

# Notes

Bumps the `@mayflowergmbh/wasmsh-pyodide` dependency to `^0.6.4`,
which adds the runPtc client method needed for the PTC bridge.

@corridor-security corridor-security Bot left a comment


Security Issues

  • Path Traversal within WasmshFilesystemBackend namespace
    The WasmshFilesystemBackend's #scope() method (filesystem-backend.ts:59-65) concatenates the configured namespace prefix with the caller-supplied path using simple string concatenation — e.g. ${this.#namespace}${abs} — without normalizing .. segments or verifying that the resulting path stays beneath the namespace root. Because the downstream WasmshSandbox (backed by Pyodide's POSIX VFS via @mayflowergmbh/wasmsh-pyodide) does resolve .. segments at the filesystem level, a traversal payload is fully effective inside the sandbox.

The path flows from user input through the LangChain tool layer with no sanitization: the read_file, write_file, and edit_file tools registered in libs/deepagents/src/middleware/fs.ts accept any file_path string (z.string(), no pattern constraint) and pass it verbatim to resolvedBackend.read/write/edit(), which then calls this.#scope(filePath) and forwards the result to the sandbox. If a WasmshFilesystemBackend is used as a namespaced sub-backend — the explicit design goal of the class — an LLM agent or any caller with tool-call access can pass ../../skills/secret.py to escape from /memories into /skills (or any other directory in the shared VFS), enabling unauthorized cross-namespace reads and writes.

Recommendations

  • Normalize and enforce containment in #scope: resolve the joined path with path.posix.resolve(namespace, userPath) and throw (or return an error) if the result does not start with the namespace prefix.
  • Apply the same containment check in #unscope to avoid leaking non-namespaced paths back to callers.
  • Consider adding a .includes("../") / .includes("..") early-exit guard as a defense-in-depth measure before the resolve step.



The namespace scoping concatenates the namespace and the user-supplied path without validating for directory traversal ("..") or enforcing that the resolved path stays under the namespace. This allows escaping the intended namespace in the wasm VFS and accessing other directories within the same sandbox.

Vulnerable code:

#scope(path: string | null | undefined): string {
  if (path == null) return this.#namespace || "/";
  if (!this.#namespace) return path;
  const abs = path.startsWith("/") ? path : `/${path}`;
  if (abs === "/") return this.#namespace || "/";
  return `${this.#namespace}${abs}`;
}

The #scope() method at filesystem-backend.ts:59-65 performs simple string concatenation (return `${this.#namespace}${abs}`) without normalizing `..` segments or verifying the resulting path stays within the namespace. The Pyodide POSIX VFS (@mayflowergmbh/wasmsh-pyodide) that backs WasmshSandbox does resolve `..` at the filesystem layer, so traversal payloads are fully effective inside the sandbox.

The attack is directly reachable: the read_file, write_file, and edit_file LangChain tools registered in libs/deepagents/src/middleware/fs.ts accept file_path as a plain z.string() with no pattern constraint and pass the value verbatim to resolvedBackend.read/write/edit(), which calls #scope(). An LLM agent (or an adversarial prompt injected into its context) can pass "../../skills/secret.py" to escape from /memories into /skills, enabling cross-namespace reads and writes.

Remediation: Normalize and enforce containment in #scope. Resolve the final path and reject it if it falls outside the namespace boundary:

import path from 'path'; // posix in browser builds

#scope(p?: string | null): string {
  const ns = this.#namespace || '/';
  if (p == null || p === '/') return ns;
  const rel = p.startsWith('/') ? p.slice(1) : p;
  const full = path.posix.resolve(ns, rel);
  const boundary = ns.endsWith('/') ? ns : ns + '/';
  if (full !== ns && !full.startsWith(boundary)) {
    throw new Error('invalid_path: traversal outside namespace');
  }
  return full;
}

Also harden #unscope so it only strips the namespace prefix when the path actually starts with it.

Attack Path
  1. An LLM agent — or an adversarial prompt injection — issues a read_file tool call with file_path = "../../skills/secret.py".

  2. libs/deepagents/src/middleware/fs.ts:586 passes the value verbatim to resolvedBackend.read("../../skills/secret.py", offset, limit) with no sanitization.

  3. resolvedBackend is a WasmshFilesystemBackend configured with namespace "/memories". Its read() method (filesystem-backend.ts:91) calls this.#scope("../../skills/secret.py").

  4. #scope (filesystem-backend.ts:59-65) returns "/memories/../../skills/secret.py" — no traversal check performed.

  5. That path is forwarded to WasmshSandbox.read("/memories/../../skills/secret.py"), which builds a shell command with shellQuote (escapes shell-special characters but does not strip ..).

  6. The Pyodide WASM session resolves the .. components and returns the content of /skills/secret.py, escaping the /memories namespace.

  7. The same technique applies to write_file and edit_file, enabling cross-namespace writes.
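
Steps 3 to 6 of the attack path can be reproduced in isolation. In this sketch, `naiveScope` is a free-function copy of the unguarded logic quoted above, and Node's `posix.normalize` is used only as a stand-in for how the sandbox VFS resolves `..` segments; that equivalence is an assumption.

```typescript
import { posix } from "node:path";

// Free-function copy of the unguarded scoping logic quoted in the review.
function naiveScope(namespace: string, path: string): string {
  const abs = path.startsWith("/") ? path : `/${path}`;
  if (abs === "/") return namespace || "/";
  return `${namespace}${abs}`;
}

// Step 4: plain concatenation keeps the ".." segments intact.
const scopedPath = naiveScope("/memories", "../../skills/secret.py");

// Step 6: a POSIX resolver (standing in for the sandbox VFS) collapses them,
// so the final path no longer sits under /memories.
const resolvedPath = posix.normalize(scopedPath);

console.log(scopedPath); // "/memories/../../skills/secret.py"
console.log(resolvedPath); // "/skills/secret.py"
```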


* Replace `instanceof Uint8Array` with the `typeof === "string"` /
  fallthrough pattern used in `internal.ts::toInitialFiles`.
* Replace `instanceof Error` in the skills loader's catch with a
  structural check on the `message` property.
* Drop `console.warn` in favour of a single `process.stderr.write`
  to surface broken skills without tripping `no-console`.
* Apply oxfmt across the new files.
…runPtc passthrough

The original 16 unit tests covered middleware shape, the scanner regex, and
the formatting helpers. This adds the previously-untested surface:

* `filesystem-backend.test.ts` (12 cases) — namespace prefix application
  across every protocol method, namespace normalisation (trailing slash,
  no leading slash, `/` → bare root), result-path unscope for ls/glob/grep/
  upload/download, and pass-through of error results.
* `skills.test.ts` (12 cases) — `loadSkill` covering synthesised vs.
  author-supplied `__init__.py`, kebab→snake renaming, invalid skill
  names, empty dir, missing entrypoint, download errors, no-module
  metadata; `installPendingSkills` covering scanner-driven staging,
  caching across calls, per-skill failure isolation, and the no-skill
  short-circuit.
* `ptc.test.ts` (12 cases) — every `ptc` config shape (false / true /
  array / `{include}` / `{exclude}`), self-tool exclusion, kebab→snake
  exposure, identifier validation, plus the `onHostCall` dispatcher
  pipeline: success path, UnknownToolError, isolated throw → error envelope.
* `sandbox-runPtc.test.ts` (3 cases) — the `WasmshSandbox.runPtc`
  passthrough forwards the right shape, the duck-check surfaces a clear
  error against older sessions, and a stopped sandbox rejects cleanly.

Total: 85 unit tests pass (up from 46).
Johann-Peter Hartmann (johannhartmann) added a commit to mayflower/wasmsh that referenced this pull request May 17, 2026
…mBackend

`WasmshFilesystemBackend.#scope` concatenated the configured namespace
with the caller-supplied path, then handed the result to the wasmsh
sandbox — whose Pyodide VFS resolves `..` segments at the filesystem
layer. An LLM-controlled `file_path` like `../../skills/secret.py`
would resolve to a different namespace (or root) once the sandbox saw
it, defeating the very isolation the `namespace=` knob promises.

The fix:

* Normalise the joined path with `posixpath.normpath` and assert the
  result still sits at-or-below the namespace prefix; reject otherwise.
* Apply the matching containment check on the inbound (`_unscope`)
  side so an upstream bug elsewhere can't leak non-namespaced paths
  into the caller's view.
* Anchor the prefix match with a trailing slash so a sibling whose
  name shares the namespace prefix (`/memstore` vs `/mem`) is rejected.
* Surface the rejection as `WasmshNamespaceEscapeError` — a subclass of
  `PermissionError` so existing error-handlers that map OS permission
  errors to `"permission_denied"` continue to do the right thing
  without an additional catch.

Adds 6 regression tests covering direct `..`, multi-segment payloads,
interior `..` landing outside, sibling-prefix attacks, allowed `./`
and interior `..` that stays inside, plus an upstream-leak check on
the unscope path.

Caught by corridor-security on langchain-ai/deepagentsjs#427.
`WasmshFilesystemBackend.#scope` concatenated the configured namespace
with the caller-supplied path, then handed the result to the wasmsh
sandbox — whose Pyodide VFS resolves `..` segments at the filesystem
layer. An LLM-controlled `file_path` like `../../skills/secret.py`
would resolve to a different namespace (or root) once the sandbox saw
it, defeating the very isolation the `namespace=` knob promises.

The fix:

* Normalise the joined path with `posix.resolve` and assert the result
  still sits at-or-below the namespace prefix; reject otherwise.
* Apply the matching containment check on the inbound (`#unscope`)
  side so an upstream bug elsewhere can't leak non-namespaced paths.
* Anchor the prefix match with a trailing slash so a sibling whose
  name shares the namespace prefix (`/memstore` vs `/mem`) is rejected.
* Surface the rejection as `WasmshNamespaceEscapeError`.

Adds 8 regression tests covering direct `..`, multi-segment payloads,
interior `..` landing outside, sibling-prefix attacks, allowed `./`
and interior `..` that stays inside, plus the upstream-leak check on
the unscope path and the no-namespace passthrough.

Reported by corridor-security on PR langchain-ai#427.
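
The fix described in this commit message can be sketched as follows. This is not the code from the commit: `NamespaceEscapeError` is a hypothetical stand-in for `WasmshNamespaceEscapeError`, the free functions replace the private `#scope` / `#unscope` / `#isContained` methods, and the namespace is assumed to be an absolute path without a trailing slash.

```typescript
import { posix } from "node:path";

// Hypothetical stand-in for the exported WasmshNamespaceEscapeError.
class NamespaceEscapeError extends Error {
  constructor(path: string, namespace: string) {
    super(`path "${path}" escapes namespace "${namespace}"`);
    this.name = "NamespaceEscapeError";
  }
}

// The trailing-slash anchor rejects siblings such as /memstore for /mem.
function isContained(namespace: string, resolved: string): boolean {
  const boundary = namespace.endsWith("/") ? namespace : `${namespace}/`;
  return resolved === namespace || resolved.startsWith(boundary);
}

// Outbound: resolve the caller's path under the namespace, then verify it.
function scope(namespace: string, userPath: string): string {
  const relative = userPath.replace(/^\/+/, "");
  const resolved = posix.resolve(namespace, relative);
  if (!isContained(namespace, resolved)) {
    throw new NamespaceEscapeError(userPath, namespace);
  }
  return resolved;
}

// Inbound: refuse to hand back paths that the sandbox reports from elsewhere.
function unscope(namespace: string, sandboxPath: string): string {
  const resolved = posix.normalize(sandboxPath);
  if (!isContained(namespace, resolved)) {
    throw new NamespaceEscapeError(sandboxPath, namespace);
  }
  const rest = resolved.slice(namespace.length);
  return rest === "" ? "/" : rest;
}
```

Keeping `isContained` separate from the resolver means the anchor check can be tested on its own, independent of whatever normalization happens before it.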
@johannhartmann

Author

Thanks for the catch — this is a real escape. Fixed in 799d461 on this branch.

Fix:
#scope now resolves the joined path via posix.resolve and rejects if the result no longer sits at-or-below the namespace prefix. #unscope applies the matching containment check on inbound paths so an upstream bug elsewhere can't leak non-namespaced paths back to callers. The prefix match is anchored with a trailing slash so a sibling whose name shares the prefix (/memstore vs /mem) is also rejected — your recommendation list covered that case implicitly via posix.resolve, but I made it explicit so the guard reads top-to-bottom.

The rejection surfaces as WasmshNamespaceEscapeError (exported).

Tests added (8 cases in filesystem-backend.test.ts):

  • direct .. escape rejected on every protocol method (read/write/edit/ls/glob/grep/upload/download)
  • multi-segment ../../skills/x payload
  • interior /x/../../etc/passwd that resolves outside
  • sibling-prefix attack (/../memstore/x against /mem)
  • benign ./sub and interior .. that stays inside (still allowed)
  • upstream-leak guard on #unscope (simulated sandbox returns /etc/passwd)
  • no-namespace passthrough (containment doesn't apply when no namespace configured)

The same bug was in the Python adapter (langchain_wasmsh.WasmshFilesystemBackend) and is fixed identically in mayflower/wasmsh@29a6c2f, with 6 corresponding regression tests on the Python side.

…ytes, cover skills seam

Addresses findings from the comprehensive PR review on langchain-ai#427.

# Silent-failure fixes (high impact)

* `installPendingSkills` now re-throws `WasmshNamespaceEscapeError`
  instead of demoting it to a stderr log. A malicious skill metadata
  `path` containing `..` segments would otherwise have the namespace
  guard's signal swallowed by the best-effort skill-load catch.
* `asBytes` now validates its argument with a structural ArrayBufferView
  check and throws on unsupported shapes. The previous `return content
  as Uint8Array` cast would silently propagate `undefined byteLength`,
  poisoning the bundle-size cap and surfacing as a cryptic TypeError at
  upload time.
* `process.stderr.write` in the best-effort skill-load catch is now
  guarded against undefined `process.stderr`, so the path works in
  browser environments (the provider ships a browser build).
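
A minimal sketch of the two guards described above, under stated assumptions: this `asBytes` is a re-implementation for illustration (the real function's accepted inputs may differ), and `writeStderrIfAvailable` is a hypothetical helper name.

```typescript
// Illustrative re-implementation: accept strings and any ArrayBufferView,
// and fail loudly on everything else instead of casting.
function asBytes(content: unknown): Uint8Array {
  if (typeof content === "string") {
    return new TextEncoder().encode(content);
  }
  if (ArrayBuffer.isView(content)) {
    // Re-wrap the view so byteLength always reflects the visible window.
    return new Uint8Array(content.buffer, content.byteOffset, content.byteLength);
  }
  throw new TypeError(`unsupported content type: ${typeof content}`);
}

// Hypothetical helper: write to stderr only where it exists (Node.js);
// returns false in environments such as browsers.
function writeStderrIfAvailable(line: string): boolean {
  const proc = (globalThis as unknown as {
    process?: { stderr?: { write?: (chunk: string) => unknown } };
  }).process;
  const stderr = proc?.stderr;
  if (!stderr || typeof stderr.write !== "function") return false;
  stderr.write(line);
  return true;
}
```

With the structural check, a plain object that merely looks like a buffer is rejected at the call site, so a size cap computed from `byteLength` never sees `undefined`.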

# Comment accuracy

* `middleware.ts` claimed to inject `timeoutMs` into the system prompt
  and described `afterAgent` as a careful "no-op by design" — neither
  was true. Replaced with accurate explanations of the actual behaviour.
* `types.ts` `timeoutMs` docstring updated to reflect that the option is
  accepted for API parity but not yet wired into the prompt or budget.

# Coverage gaps closed

* `middleware-skills.test.ts` (new, 2 cases) — wires
  `createWasmshInterpreterMiddleware({ skillsBackend })` with a mocked
  `getCurrentTaskInput` and verifies the skill is actually staged into
  the sandbox before the eval runs, plus the negative case (no
  references → no uploads). Closes the highest-priority test gap
  identified in the review: the middleware ↔ skills loader seam.
* `filesystem-backend.test.ts` — new test that exercises
  `#isContained`'s trailing-`/` anchor directly via an `#unscope` leak,
  not via the resolver. Closes the gap where the sibling-prefix
  rejection only worked because of `posix.resolve`, masking a
  hypothetical regression in the anchor check.
* `skills.test.ts` — two new cases: (a) `WasmshNamespaceEscapeError`
  thrown from inside `loadSkill` is re-thrown by `installPendingSkills`,
  not swallowed; (b) `asBytes` rejects non-binary backend content.

Total: 98 unit tests pass (up from 93). Typecheck, oxlint, oxfmt clean.
Three follow-ups from the PR review:

- Add `WasmshLogger` interface and thread it through the two catch sites
  that previously swallowed errors into stderr: PTC dispatch
  (`dispatchHostCall`) and best-effort skill loading
  (`installPendingSkills`). When a logger is configured it becomes
  authoritative; the stderr fallback only fires when no logger is wired.
  Logger contract documents that implementations must not throw, and the
  middleware swallows logger exceptions so observability bugs can't break
  the agent loop.
- Deterministic agent integration test driven by a scripted chat model
  that emits a prebuilt tool-call → final-answer sequence. Pins the
  full LLM → middleware → sandbox.runPtc → ToolMessage → next-turn shape
  without the LLM round-trip.
- Adapter-layer integration test for `WasmshSandbox.runPtc` against real
  Pyodide, covering plain eval, host_call round-trip, error envelopes,
  and globals persistence across calls. Gated on built Pyodide assets.
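
The logger contract in the first follow-up can be sketched like this. The `SketchLogger` shape and the `reportSwallowedError` helper are hypothetical; the real `WasmshLogger` interface may expose different methods.

```typescript
// Hypothetical logger shape; the real WasmshLogger interface may differ.
interface SketchLogger {
  warn(message: string, error?: unknown): void;
}

// Structural check on `message` rather than `instanceof Error`.
function describeError(error: unknown): string {
  if (typeof error === "object" && error !== null && "message" in error) {
    return String((error as { message: unknown }).message);
  }
  return String(error);
}

// When a logger is configured it is authoritative; the fallback (stderr in
// the description above) only fires when no logger is wired.
function reportSwallowedError(
  message: string,
  error: unknown,
  logger: SketchLogger | undefined,
  fallback: (line: string) => void,
): void {
  if (logger) {
    try {
      logger.warn(message, error);
    } catch {
      // A failing logger must not break the agent loop.
    }
    return;
  }
  fallback(`${message}: ${describeError(error)}\n`);
}
```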