fix(orchestrator): prefer running sandbox on host IP lookups by dobrac · Pull Request #2387 · e2b-dev/infra

dobrac · 2026-04-14T07:44:33Z

Fixes GetByHostPort to handle IP slot reuse during sandbox lifecycle transitions. When a stopping sandbox frees an IP slot that is immediately claimed by a new starting sandbox, both temporarily coexist in the map with the same IP. Previously, GetByHostPort returned the first match (which could be the stale stopping sandbox). Now it applies a priority:

Running → return immediately
Starting preferred over Stopping as fallback
Stopping only if nothing better exists

cursor · 2026-04-14T07:44:40Z

PR Summary

Medium Risk
Changes IP-based sandbox routing behavior, which could impact traffic/volume attribution if selection logic is wrong. Added tests reduce regression risk, but this code path is security-sensitive (cross-tenant routing).

Overview
Updates Map.GetByHostPort to prefer StatusRunning sandboxes for a matching host IP, with a fallback to the best non-running match (favoring StatusStarting over StatusStopping) when no running sandbox exists, and adds unit tests covering running-vs-stopping preference, starting-vs-stopping preference, and stopping-only fallback.

^{Reviewed by Cursor Bugbot for commit f4db401. Bugbot is set up for automated code reviews on this repo. Configure here.}

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

…ver stopping in fallback - Switch tests to use require.NoError/require.Same to match repo conventions - In GetByHostPort fallback, prefer StatusStarting over StatusStopping so that when an IP slot is freed and reused, we route to the new sandbox - Add TestGetByHostPortPrefersStartingOverStopping test Co-Authored-By: jakub.dobry <jakub.dobry@e2b.dev>

Co-Authored-By: jakub.dobry <jakub.dobry@e2b.dev>

claude · 2026-04-15T06:45:57Z

+		if sbx.IsRunning() {
 			return sbx, nil
 		}
+
+		// Prefer a starting sandbox over a stopping one so that when an IP
+		// slot is freed by a stopping sandbox and immediately reused by a
+		// new starting sandbox, we route to the new sandbox.
+		if fallback == nil || SandboxStatus(sbx.status.Load()) == StatusStarting {
+			fallback = sbx
+		}


🔴 Two defects remain in the GetByHostPort fallback logic: (1) a TOCTOU race where the sandbox status is read twice via separate atomic loads—once inside IsRunning() and once in the fallback condition—allowing a concurrent MarkRunning() CAS to cause the function to return a stale Stopping sandbox instead of a newly-Running one, directly defeating the PR's core security goal; (2) a non-deterministic overwrite where the condition SandboxStatus(sbx.status.Load()) == StatusStarting unconditionally replaces any existing Starting fallback with another Starting sandbox (acknowledged but unaddressed in the PR checklist). Fix both by reading status once per iteration: status := SandboxStatus(sbx.status.Load()), using it for both the IsRunning check and the fallback condition, and guarding the fallback overwrite with status == StatusStarting && SandboxStatus(fallback.status.Load()) \!= StatusStarting.

Extended reasoning...

Bug 1 — TOCTOU double-read race (normal severity)

The fallback logic in GetByHostPort (map.go:105-114) makes two separate atomic loads of the same sandbox's status field. IsRunning() (sandbox.go:249) calls s.status.Load() once, and the fallback condition on line 112 calls sbx.status.Load() again. Because these are independent atomic operations with no lock between them, a concurrent MarkRunning() CAS (0→1) can execute between the two reads, making the two reads observe different values for the same field.

Step-by-step proof of the race:

Map iteration reaches sandbox X whose status is StatusStarting (0).

Thread A: sbx.IsRunning() → status.Load() → returns 0 (Starting) → false → does NOT early-return.

Thread B (concurrent): MarkRunning() CAS → 0→1; sandbox X is now StatusRunning (1).

Thread A continues: fallback condition evaluates SandboxStatus(sbx.status.Load()) → returns 1 (Running, NOT Starting).

Condition: fallback == nil || 1 == StatusStarting = fallback == nil || false.

Suppose a StatusStopping sandbox was already encountered in an earlier iteration, so fallback \!= nil: neither branch is true → fallback remains the Stopping sandbox.

No other Running sandbox is found → function returns the Stopping sandbox.

This directly defeats the PR's primary security goal: preventing cross-tenant NFS volume access and TCP traffic misattribution by preferring Running sandboxes. A Running sandbox exists on the IP but is silently bypassed.

Bug 2 — Non-deterministic StatusStarting overwrite (nit)

The fallback condition if fallback == nil || SandboxStatus(sbx.status.Load()) == StatusStarting is unconditional on the second branch: any StatusStarting candidate always overwrites the current fallback, even when the current fallback is already a StatusStarting sandbox. Because Go map iteration order is intentionally randomised, when two StatusStarting sandboxes share the same IP, the one encountered last (randomly) wins.

Step-by-step proof:
Suppose iteration order is [Starting-A, Starting-B] (or [Starting-B, Starting-A] — both orders possible):

Iteration 1: fallback is nil → condition fallback == nil is true → fallback = Starting-A.

Iteration 2: SandboxStatus(sbx.status.Load()) == StatusStarting is true → fallback = Starting-B (overwrites).
In the reversed order, Starting-A would win instead. The PR description explicitly acknowledges this case in the Review Checklist but does not fix it.

While two StatusStarting sandboxes on the same IP is rare (requires the async slot-return goroutine in sandbox.go:1264 to return the slot before the Remove callback fires, allowing a new sandbox to acquire the same slot), it is a reachable scenario as confirmed by code analysis of the cleanup/LIFO ordering.

Impact: Both bugs affect the security-sensitive code paths that route NFS proxy and TCP firewall traffic. The TOCTOU race can cause a Running sandbox to be silently bypassed in favor of a Stopping one; the non-deterministic overwrite can route traffic to the wrong tenant when two Starting sandboxes share an IP.

Fix: Read status once per iteration and use the cached value for both checks:

status := SandboxStatus(sbx.status.Load()) if status == StatusRunning { return sbx, nil } if fallback == nil || (status == StatusStarting && SandboxStatus(fallback.status.Load()) \!= StatusStarting) { fallback = sbx }

This eliminates the TOCTOU window and ensures a Starting sandbox is never overwritten by another Starting sandbox.

fix(orchestrator): prefer running sandbox on host IP lookups

ac071b1

dobrac added aardvark codex labels Apr 14, 2026 — with ChatGPT Codex Connector

dobrac added aardvark codex labels Apr 14, 2026

e2b-request-same-site-reviewers bot assigned ValentaTomas Apr 14, 2026

This comment was marked as resolved.

Sign in to view

devin-ai-integration bot reviewed Apr 14, 2026

View reviewed changes

devin-ai-integration bot and others added 2 commits April 14, 2026 22:32

fix(orchestrator): add t.Parallel() to satisfy paralleltest linter

f4db401

Co-Authored-By: jakub.dobry <jakub.dobry@e2b.dev>

dobrac marked this pull request as ready for review April 15, 2026 06:29

dobrac requested review from ValentaTomas and jakubno as code owners April 15, 2026 06:29

dobrac added the bug Something isn't working label Apr 15, 2026

dobrac closed this Apr 15, 2026

claude bot reviewed Apr 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(orchestrator): prefer running sandbox on host IP lookups#2387

fix(orchestrator): prefer running sandbox on host IP lookups#2387
dobrac wants to merge 3 commits intomainfrom
codex/address-vulnerability-with-stopping-sandboxes

dobrac commented Apr 14, 2026 •

edited by devin-ai-integration bot

Loading

Uh oh!

cursor bot commented Apr 14, 2026 •

edited

Loading

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

devin-ai-integration bot left a comment

Uh oh!

claude bot Apr 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dobrac commented Apr 14, 2026 • edited by devin-ai-integration bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cursor bot commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Summary

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

devin-ai-integration bot left a comment

Choose a reason for hiding this comment

✅ Devin Review: No Issues Found

Uh oh!

claude bot Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dobrac commented Apr 14, 2026 •

edited by devin-ai-integration bot

Loading

cursor bot commented Apr 14, 2026 •

edited

Loading