Update benchmark behavior #104

Merged
HeyGarrison merged 8 commits into master from update-benchmark-runner
May 1, 2026

@HeyGarrison
Contributor

Summary

  • Minor benchmark reliability updates.

@github-actions
Contributor

github-actions Bot commented Apr 30, 2026

Sandbox Benchmark Results

Sequential

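| # | Provider | Score | Median TTI | P95 | P99 | Status |
|---|----------|-------|------------|-----|-----|--------|
| 1 | declaw | 98.9 | 0.05s | 0.21s | 0.21s | 10/10 |
| 2 | daytona | 97.8 | 0.19s | 0.27s | 0.27s | 10/10 |
| 3 | upstash | 95.1 | 0.43s | 0.58s | 0.58s | 10/10 |
| 4 | archil | 94.4 | 0.45s | 0.72s | 0.72s | 10/10 |
| 5 | blaxel | 94.4 | 0.53s | 0.60s | 0.60s | 10/10 |
| 6 | e2b | 93.9 | 0.52s | 0.74s | 0.74s | 10/10 |
| 7 | vercel | 92.7 | 0.60s | 0.93s | 0.93s | 10/10 |
| 8 | hopx | 83.5 | 1.54s | 1.80s | 1.80s | 10/10 |
| 9 | namespace | 82.0 | 1.71s | 1.94s | 1.94s | 10/10 |
| 10 | modal | 78.3 | 1.89s | 2.58s | 2.58s | 10/10 |
| 11 | cloudflare | 73.7 | 2.13s | 3.39s | 3.39s | 10/10 |
| 12 | runloop | 73.0 | 2.04s | 3.69s | 3.69s | 10/10 |
| 13 | codesandbox | 72.7 | 2.49s | 3.08s | 3.08s | 10/10 |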

Staggered

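| # | Provider | Score | Median TTI | P95 | P99 | Status |
|---|----------|-------|------------|-----|-----|--------|
| 1 | declaw | 99.5 | 0.05s | 0.06s | 0.06s | 10/10 |
| 2 | daytona | 96.6 | 0.17s | 0.58s | 0.58s | 10/10 |
| 3 | archil | 96.3 | 0.30s | 0.47s | 0.47s | 10/10 |
| 4 | upstash | 95.8 | 0.41s | 0.43s | 0.43s | 10/10 |
| 5 | blaxel | 94.6 | 0.52s | 0.57s | 0.57s | 10/10 |
| 6 | e2b | 94.2 | 0.50s | 0.70s | 0.70s | 10/10 |
| 7 | vercel | 91.1 | 0.59s | 1.33s | 1.33s | 10/10 |
| 8 | hopx | 82.5 | 1.42s | 2.26s | 2.26s | 10/10 |
| 9 | namespace | 80.5 | 1.83s | 2.12s | 2.12s | 10/10 |
| 10 | modal | 78.5 | 1.89s | 2.54s | 2.54s | 10/10 |
| 11 | codesandbox | 75.2 | 2.30s | 2.75s | 2.75s | 10/10 |
| 12 | runloop | 73.8 | 1.99s | 3.57s | 3.57s | 10/10 |
| 13 | cloudflare | 73.7 | 2.28s | 3.16s | 3.16s | 10/10 |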

Burst

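| # | Provider | Score | Median TTI | P95 | P99 | Status |
|---|----------|-------|------------|-----|-----|--------|
| 1 | declaw | 98.4 | 0.07s | 0.30s | 0.30s | 10/10 |
| 2 | daytona | 97.4 | 0.21s | 0.35s | 0.35s | 10/10 |
| 3 | archil | 95.3 | 0.45s | 0.50s | 0.50s | 10/10 |
| 4 | upstash | 94.8 | 0.44s | 0.65s | 0.65s | 10/10 |
| 5 | e2b | 94.2 | 0.48s | 0.72s | 0.72s | 10/10 |
| 6 | blaxel | 94.1 | 0.55s | 0.66s | 0.66s | 10/10 |
| 7 | vercel | 93.3 | 0.58s | 0.80s | 0.80s | 10/10 |
| 8 | hopx | 79.9 | 1.99s | 2.03s | 2.03s | 10/10 |
| 9 | modal | 79.8 | 1.87s | 2.25s | 2.25s | 10/10 |
| 10 | namespace | 79.6 | 1.96s | 2.16s | 2.16s | 10/10 |
| 11 | runloop | 75.6 | 2.04s | 3.04s | 3.04s | 10/10 |
| 12 | cloudflare | 68.5 | 2.22s | 4.53s | 4.53s | 10/10 |
| 13 | codesandbox | 66.9 | 3.07s | 3.66s | 3.66s | 10/10 |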
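
In all three sandbox tables, the P95 and P99 columns are identical for every provider. That is the expected outcome if percentiles are taken by nearest rank over the 10 iterations reported in the Status column, because both ranks round up to the slowest sample. The sketch below illustrates this under that assumption; the benchmark's actual percentile method is not shown on this page, and `percentile` and the sample values are hypothetical.

```typescript
// Nearest-rank percentile (an assumed method, not confirmed by the PR).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  // Sort a copy ascending so the caller's array is left untouched.
  const sorted = [...samples].sort((a, b) => a - b);
  // 1-based nearest rank: the smallest rank covering p percent of samples.
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Ten made-up samples in arbitrary units, standing in for 10 iterations.
const samples = [3, 1, 2, 10, 9, 8, 7, 6, 5, 4];
console.log(percentile(samples, 95), percentile(samples, 99)); // 10 10
```

With only 10 samples, any percentile above 90 resolves to the maximum under this method, so separating P95 from P99 would need more iterations or an interpolating estimator.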

View full run · SVGs available as build artifacts

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Pull request overview

This PR aims to improve benchmark reliability by detecting potential sandbox/container reuse across iterations and concurrent launches.

Changes:

  • Adds a per-run “reuse detector” (nonce + remembered identity signals) for sequential, concurrent, and staggered benchmark modes.
  • Introduces an in-sandbox identity probe that writes marker files and collects runtime/namespace signals to flag suspected reuse.
  • Threads the reuse detector through runIteration so different benchmark runners share a single detector per run.
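
A minimal sketch of how such a detector might work, assuming a probe that reports a marker-file nonce plus a couple of runtime signals. The names `ReuseDetector`, `IdentityProbe`, `markerNonce`, `bootId`, and `hostname` are hypothetical; the PR's actual types and heuristics are not shown on this page.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical result of the in-sandbox identity probe.
interface IdentityProbe {
  markerNonce: string | null; // nonce read from a marker file, if one already existed
  bootId: string;             // assumed runtime signal, e.g. a boot identifier
  hostname: string;           // assumed namespace signal
}

interface ReuseVerdict {
  reused: boolean;
  reasons: string[];
}

// One instance per benchmark run: holds the run nonce and remembered signals.
class ReuseDetector {
  private readonly seen = new Set<string>();

  constructor(readonly nonce: string) {}

  check(probe: IdentityProbe): ReuseVerdict {
    const reasons: string[] = [];
    // A marker carrying this run's nonce means an earlier launch wrote here.
    if (probe.markerNonce === this.nonce) {
      reasons.push("marker file from this run already present");
    }
    // Identical identity signals suggest the same underlying sandbox.
    const key = `${probe.bootId}|${probe.hostname}`;
    if (this.seen.has(key)) {
      reasons.push("identity signals match an earlier sandbox");
    }
    this.seen.add(key);
    return { reused: reasons.length > 0, reasons };
  }
}

const detector = new ReuseDetector(randomUUID());
const first = detector.check({ markerNonce: null, bootId: "boot-a", hostname: "sbx-1" });
const second = detector.check({ markerNonce: detector.nonce, bootId: "boot-a", hostname: "sbx-1" });
console.log(first.reused, second.reused); // false true
```

Scoping the nonce to a single run means markers left behind by earlier runs are ignored in this sketch; only reuse within the current run is flagged.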

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
|------|-------------|
| src/sandbox/staggered.ts | Creates a per-run reuse detector and passes it into runIteration for staggered launches. |
| src/sandbox/concurrent.ts | Creates a per-run reuse detector and passes it into runIteration for concurrent launches. |
| src/sandbox/benchmark.ts | Implements reuse detection: run nonce + identity probe parsing + heuristics, and extends runIteration to accept the detector. |
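
The per-run wiring described for the runners can be sketched as follows: a runner creates one detector and hands the same instance to every `runIteration` call, so concurrent launches are compared against each other. Everything here is a hypothetical stand-in, including the `Detector` shape, the simplified `runIteration` signature, and the use of a sandbox id as the only identity signal.

```typescript
// Hypothetical, simplified detector: remembers identities seen during one run.
type Detector = { seen: Set<string> };

interface IterationResult {
  sandboxId: string;
  suspectedReuse: boolean;
}

// Stand-in for runIteration: launches a sandbox, then consults the shared detector.
async function runIteration(
  launch: () => Promise<string>,
  detector: Detector,
): Promise<IterationResult> {
  const sandboxId = await launch();
  // No await between the check and the insert, so the pair cannot interleave.
  const suspectedReuse = detector.seen.has(sandboxId);
  detector.seen.add(sandboxId);
  return { sandboxId, suspectedReuse };
}

// Stand-in for a concurrent runner: one detector per run, shared by all launches.
async function runConcurrent(
  launchers: Array<() => Promise<string>>,
): Promise<IterationResult[]> {
  const detector: Detector = { seen: new Set<string>() };
  return Promise.all(launchers.map((launch) => runIteration(launch, detector)));
}

runConcurrent([async () => "a", async () => "b", async () => "a"]).then((results) => {
  console.log(results.map((r) => r.suspectedReuse)); // [ false, false, true ]
});
```

Creating the detector inside the runner, rather than at module scope, keeps state from leaking between separate benchmark runs.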

Comment thread src/sandbox/benchmark.ts Outdated
Comment thread src/sandbox/benchmark.ts
Comment thread src/sandbox/benchmark.ts Outdated
@github-actions
Contributor

github-actions Bot commented May 1, 2026

Browser Benchmark Results

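| # | Provider | Score | Create | Connect | Navigate | Release | Total | Status |
|---|----------|-------|--------|---------|----------|---------|-------|--------|
| 1 | Browserbase | 94.2 | 0.22s | 0.11s | 0.13s | 0.14s | 0.63s | 10/10 |
| 2 | Hyperbrowser | 93.8 | 0.14s | 0.18s | 0.10s | 0.18s | 0.61s | 10/10 |
| 3 | Kernel | 91.3 | 0.08s | 0.47s | 0.15s | 0.07s | 0.80s | 10/10 |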

View full run · SVG available as build artifact

@github-actions
Contributor

github-actions Bot commented May 1, 2026

Storage Benchmark Results

1MB Files

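| # | Provider | Score | Download | Throughput | Upload | Status |
|---|----------|-------|----------|------------|--------|--------|
| 1 | Cloudflare R2 | 94.8 | 0.11s | 76.0 Mbps | 0.19s | 1000/1000 |
| 2 | AWS S3 | 94.7 | 0.15s | 55.6 Mbps | 0.06s | 1000/1000 |
| 3 | Tigris | 93.7 | 0.44s | 19.0 Mbps | 0.21s | 1000/1000 |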

4MB Files

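| # | Provider | Score | Download | Throughput | Upload | Status |
|---|----------|-------|----------|------------|--------|--------|
| 1 | Cloudflare R2 | 95.0 | 0.17s | 194.3 Mbps | 0.33s | 1000/1000 |
| 2 | AWS S3 | 92.7 | 0.81s | 41.3 Mbps | 0.24s | 1000/1000 |
| 3 | Tigris | 91.6 | 1.07s | 31.4 Mbps | 0.44s | 1000/1000 |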

10MB Files

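| # | Provider | Score | Download | Throughput | Upload | Status |
|---|----------|-------|----------|------------|--------|--------|
| 1 | Cloudflare R2 | 94.3 | 0.33s | 251.5 Mbps | 0.73s | 1000/1000 |
| 2 | AWS S3 | 91.1 | 1.08s | 77.3 Mbps | 0.44s | 1000/1000 |
| 3 | Tigris | 84.6 | 3.64s | 23.1 Mbps | 1.02s | 1000/1000 |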

16MB Files

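| # | Provider | Score | Download | Throughput | Upload | Status |
|---|----------|-------|----------|------------|--------|--------|
| 1 | Cloudflare R2 | 94.0 | 0.49s | 272.2 Mbps | 0.79s | 1000/1000 |
| 2 | AWS S3 | 85.7 | 3.35s | 40.1 Mbps | 0.50s | 1000/1000 |
| 3 | Tigris | 81.4 | 4.59s | 29.2 Mbps | 1.06s | 1000/1000 |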

View full run · SVGs available as build artifacts

@HeyGarrison merged commit 11eb505 into master on May 1, 2026
41 of 44 checks passed