fix(reborn): mount per-task workspace at /workspace + use scoped paths by pranavraja99 · Pull Request #97 · nearai/benchmarks

pranavraja99 · 2026-06-12T05:05:44Z

Problem

Reborn pinchbench/clawbench tasks that use coding tools (write_file, read_file, list_dir, shell) failed with dispatch_failure_kind=InputEncode → HostUnavailable{Capability} → terminated turn (empty response, 0 score). 15/26 pinchbench tasks were affected; reproduced on both DeepSeek-V4-Flash and Qwen3.5-122B.

This was initially misattributed to an ironclaw main regression (approval gates). That attribution was wrong: the bug is entirely bench-side. Evidence: on the same new-main ironclaw build, a run with an absolute workspace path produced 3 InputEncode failures, while a run with a scoped /workspace path produced 0 and dispatch completed.

Root cause

resolve_workspace_placeholder resolved {{WORKSPACE}} to the absolute host dir (ws_base/task_id) and the prompt told the model to use absolute paths. But reborn's coding tools scope paths to the /workspace mount and reject absolute host paths (coding/paths.rs::resolve_path → input_error() = InputEncode) — even under reborn-yolo, tools do not take raw host paths. (The old-fork passing run had {{WORKSPACE}} unresolved, so the model used /workspace/... and it worked.)

Second, the reborn /workspace mount defaulted to {local_dev_root}/workspace, which the bench never seeds or scores — so even valid scoped writes landed where the grader couldn't see them, and seeded inputs were invisible to the agent.
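To make the rejection concrete, here is a minimal sketch of scoped path resolution. It is not ironclaw's `coding/paths.rs::resolve_path`; the function name `resolve_scoped`, the `ScopeError` type, and the exact rejection rules are assumptions made for illustration. It only shows why an absolute host path fails while a `/workspace/...` path maps onto whatever host dir backs the mount.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical stand-in for the tool's input error (InputEncode in the logs).
#[derive(Debug, PartialEq)]
enum ScopeError {
    OutsideMount,
}

/// Map a scoped path such as `/workspace/blog_post.md` onto `mount_root`,
/// the host dir that backs the mount. Illustration only.
fn resolve_scoped(mount_root: &Path, requested: &Path) -> Result<PathBuf, ScopeError> {
    // Anything that does not start with the `/workspace` component is rejected,
    // which includes absolute host paths like ws_base/task_id/...
    let rel = requested
        .strip_prefix("/workspace")
        .map_err(|_| ScopeError::OutsideMount)?;

    let mut out = mount_root.to_path_buf();
    for comp in rel.components() {
        match comp {
            Component::Normal(part) => out.push(part),
            Component::CurDir => {}
            // `..` (or a stray root/prefix) could escape the mount.
            _ => return Err(ScopeError::OutsideMount),
        }
    }
    Ok(out)
}
```

Under these assumptions, the same file name succeeds or fails depending only on which prefix the prompt told the model to use, and where a successful write lands depends only on `mount_root`, which is the second half of the bug.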

Fix

- run_reborn_conversation gains workspace_root: Option<&Path> and threads it into RebornBuildInput::with_local_dev_workspace_root, mounting the per-task workspace dir (ws_base/task_id, the exact dir the suite seeds inputs into and the scorer reads from) at /workspace.
- run_reborn_task resolves {{WORKSPACE}} → /workspace (scoped) instead of the absolute host dir. The legacy host-fs path keeps the absolute form (its tools take raw host paths).
- Standalone reborn CLI command passes None (unchanged behavior).
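The two decisions in this fix can be sketched as follows. This is an illustration under stated assumptions, not the code in reborn_runner.rs/runner.rs: the `Runner` enum and the helpers `render_prompt` and `mount_root` are hypothetical names, and the real change goes through `RebornBuildInput::with_local_dev_workspace_root`, whose signature is not reproduced here.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical switch between the two execution paths described above.
#[derive(Clone, Copy)]
enum Runner {
    Reborn,
    LegacyHostFs,
}

/// Substitute `{{WORKSPACE}}` in a task prompt.
/// Reborn gets the scoped mount path; the legacy host-fs path keeps the
/// absolute host dir because its tools take raw host paths.
fn render_prompt(prompt: &str, runner: Runner, host_dir: &Path) -> String {
    let value = match runner {
        Runner::Reborn => "/workspace".to_string(),
        Runner::LegacyHostFs => host_dir.display().to_string(),
    };
    prompt.replace("{{WORKSPACE}}", &value)
}

/// Pick the host dir that backs `/workspace`.
/// `Some(dir)` is the per-task dir (ws_base/task_id); `None` models the
/// standalone CLI case, which keeps the `{local_dev_root}/workspace` default.
fn mount_root(workspace_root: Option<&Path>, local_dev_root: &Path) -> PathBuf {
    match workspace_root {
        Some(dir) => dir.to_path_buf(),
        None => local_dev_root.join("workspace"),
    }
}
```

Keeping the override as an `Option` means callers that pass `None` see no behavior change, while the bench passes the same dir it seeds and scores.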

Verification

task_03_blog (reborn, against current ironclaw main):

- 0 InputEncode (was 3): coding tools dispatch cleanly.
- blog_post.md now lands in workspaces/task_03_blog/ (the scorer's dir); previously it was orphaned in reborn-task_03_blog/workspace/.

cargo check passes against the pinned ironclaw branch. No ironclaw change required.

🤖 Generated with Claude Code

Reborn pinchbench/clawbench tasks that use coding tools (write_file, read_file, list_dir, shell) were failing with `dispatch_failure_kind=InputEncode` → `HostUnavailable{Capability}` → terminated turn (empty response, 0 score). 15/26 pinchbench tasks were affected; reproduced on both DeepSeek-V4-Flash and Qwen3.5-122B.

Root cause (bench-side, not ironclaw): `resolve_workspace_placeholder` resolved `{{WORKSPACE}}` to the absolute host dir (ws_base/task_id) and told the model to use absolute paths. Reborn's coding tools scope paths to the `/workspace` mount and reject absolute host paths (coding/paths.rs::resolve_path -> InputEncode); even under reborn-yolo, tools do NOT take raw host paths.

Additionally the reborn `/workspace` mount defaulted to `{local_dev_root}/workspace`, which the bench never seeds or scores, so even valid scoped writes landed where the grader couldn't see them.

Fix:
- run_reborn_conversation gains a `workspace_root: Option<&Path>` and pumps it into `RebornBuildInput::with_local_dev_workspace_root`, mounting the per-task workspace dir (ws_base/task_id, the exact dir the suite seeds inputs into and the scorer reads) at `/workspace`.
- run_reborn_task resolves `{{WORKSPACE}}` -> `/workspace` (scoped) instead of the absolute host dir; the legacy host-fs path keeps the absolute form.

Verified: task_03_blog dispatches with 0 InputEncode and writes blog_post.md into the scorer's workspaces/task_03_blog dir (previously orphaned in reborn-task_03_blog/workspace). Confirmed against current ironclaw main; no ironclaw change needed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…as: 0/26)

Folds in the workspace fix from #97 (a19f428), which is the root cause of reborn pinchbench scoring 0/26: reborn coding tools scope paths to the /workspace mount and reject absolute host paths, and the mount defaulted to {local_dev_root}/workspace, a dir the bench never seeds or scores. So writes failed (HostUnavailable/dispatch_failure) or landed where the grader couldn't read, grade() errored, and score_task_result kept the 'pending' placeholder.

- run_reborn_conversation gains workspace_root: Option<&Path>, mounting it via with_local_dev_workspace_root so /workspace == the per-task dir the suite seeds + the scorer reads (ws_base/task_id).
- run_reborn_task resolves {{WORKSPACE}} -> /workspace (scoped); legacy host-fs path keeps the absolute form.

Verified against ironclaw main: pinchbench task_01_calendar -> 0.833, task_03_blog -> 0.92 (real graded breakdowns), vs 0/'pending' before.

Supersedes #97 (carries its fix on top of the ironclaw-main repin + API sync).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

pranavraja99 · 2026-06-15T21:51:41Z

Superseded by #108 — the workspace-mount fix (a19f428) is folded into #108 on top of the ironclaw-main repin + API sync, since they overlap in reborn_runner.rs/runner.rs and #97's old API (with_provider) no longer builds against ironclaw main. Verified there: pinchbench reborn now scores real values (task_01_calendar 0.833, task_03_blog 0.92) instead of 0/'pending'. Closing in favor of #108.

… stranded after early merges (#110)

* fix(reborn): mount per-task workspace at /workspace + scoped paths (was: 0/26)

  Folds in the workspace fix from #97 (a19f428), which is the root cause of reborn pinchbench scoring 0/26: reborn coding tools scope paths to the /workspace mount and reject absolute host paths, and the mount defaulted to {local_dev_root}/workspace, a dir the bench never seeds or scores. So writes failed (HostUnavailable/dispatch_failure) or landed where the grader couldn't read, grade() errored, and score_task_result kept the 'pending' placeholder.

  - run_reborn_conversation gains workspace_root: Option<&Path>, mounting it via with_local_dev_workspace_root so /workspace == the per-task dir the suite seeds + the scorer reads (ws_base/task_id).
  - run_reborn_task resolves {{WORKSPACE}} -> /workspace (scoped); legacy host-fs path keeps the absolute form.

  Verified against ironclaw main: pinchbench task_01_calendar -> 0.833, task_03_blog -> 0.92 (real graded breakdowns), vs 0/'pending' before.

  Supersedes #97 (carries its fix on top of the ironclaw-main repin + API sync).

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci(bench): no-baseline comment shows score + links the viewer

  The baseline-less result comment only printed [download results], with no viewer link and no score, so a fully-scored run (synced to the viewer) looked like a dead end, and a 0.0 run gave no signal. The run IS in the viewer regardless of baseline; surface the headline pass%/avg-score from run.json + a [browse run + per-task trajectories] link (same VIEWER_BASE format-pr-comment.sh uses).

  Repro that motivated this: pinchbench (no baseline) on PR #4841.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

pranavraja99 closed this Jun 15, 2026

This was referenced Jun 15, 2026

deps: track ironclaw main; drop reborn-extra-capabilities; fix reborn workspace mount (0/26) #108

Merged

fix(reborn): restore workspace-mount (0/26) + no-baseline viewer link stranded after early merges #110

Merged

pranavraja99 wants to merge 1 commit into main from fix/reborn-workspace-scoped-path
