fix(reborn): mount per-task workspace at /workspace + use scoped paths#97

Closed
pranavraja99 wants to merge 1 commit into main from fix/reborn-workspace-scoped-path

Conversation

@pranavraja99

Problem

Reborn pinchbench/clawbench tasks that use coding tools (write_file, read_file, list_dir, shell) failed with dispatch_failure_kind=InputEncode → HostUnavailable{Capability} → terminated turn (empty response, 0 score). 15/26 pinchbench tasks were affected; reproduced on both DeepSeek-V4-Flash and Qwen3.5-122B.

This was initially misattributed to a regression on ironclaw main (approval gates). It is not one: the bug is entirely bench-side. On the same current-main ironclaw build, an absolute workspace path produced 3 InputEncode failures, while a scoped /workspace path produced 0 and dispatch completed.

Root cause

resolve_workspace_placeholder resolved {{WORKSPACE}} to the absolute host dir (ws_base/task_id), and the prompt told the model to use absolute paths. But reborn's coding tools scope paths to the /workspace mount and reject absolute host paths (coding/paths.rs::resolve_path → input_error() = InputEncode). Even under reborn-yolo, tools do not take raw host paths. (The old-fork passing run had {{WORKSPACE}} unresolved, so the model used /workspace/... and it worked.)

Second, the reborn /workspace mount defaulted to {local_dev_root}/workspace, which the bench never seeds or scores — so even valid scoped writes landed where the grader couldn't see them, and seeded inputs were invisible to the agent.
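To make the rejection concrete, here is a minimal sketch of mount-scoped path resolution. It is not the code in coding/paths.rs: `resolve_scoped`, `ScopeError`, and the host directory layout are hypothetical names chosen for illustration, and the real resolver may differ in detail. The sketch only assumes what is described above: paths under /workspace are accepted and mapped onto the mounted host dir, and anything else is refused.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical error standing in for the InputEncode failure described above.
#[derive(Debug, PartialEq)]
enum ScopeError {
    OutsideMount,
}

/// Map a tool-supplied path onto the host directory mounted at `/workspace`.
/// Illustrative only; not the signature used in coding/paths.rs.
fn resolve_scoped(tool_path: &str, mount_host_dir: &Path) -> Result<PathBuf, ScopeError> {
    // `strip_prefix` compares whole path components, so `/workspacefoo`
    // does not count as being under `/workspace`.
    let rel = Path::new(tool_path)
        .strip_prefix("/workspace")
        .map_err(|_| ScopeError::OutsideMount)?;

    let mut out = mount_host_dir.to_path_buf();
    for comp in rel.components() {
        match comp {
            Component::Normal(part) => out.push(part),
            Component::CurDir => {}
            // `..` (or any root/prefix component) could escape the mount.
            _ => return Err(ScopeError::OutsideMount),
        }
    }
    Ok(out)
}
```

Under these assumptions, a prompt that hands the model the absolute host dir leads to tool calls that fail at resolution, while a /workspace/... path resolves into whichever host dir is mounted. That is why the mount target matters as much as the path form.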

Fix

  • run_reborn_conversation gains workspace_root: Option<&Path> and threads it into RebornBuildInput::with_local_dev_workspace_root, mounting the per-task workspace dir (ws_base/task_id — the exact dir the suite seeds inputs into and the scorer reads from) at /workspace.
  • run_reborn_task resolves {{WORKSPACE}} → /workspace (scoped) instead of the absolute host dir. The legacy host-fs path keeps the absolute form (its tools take raw host paths).
  • Standalone reborn CLI command passes None (unchanged behavior).
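The two halves of the fix can be sketched as follows. This is a simplified illustration, not the code in reborn_runner.rs or runner.rs: `Runtime`, `substitute_workspace`, and `mount_root` are hypothetical stand-ins, and the real change threads the directory through RebornBuildInput::with_local_dev_workspace_root rather than a free function.

```rust
use std::path::{Path, PathBuf};

/// Which execution path runs the task (hypothetical enum for this sketch).
enum Runtime {
    Reborn,
    LegacyHostFs,
}

/// Substitute `{{WORKSPACE}}` in a task prompt.
/// Reborn gets the scoped mount path; the legacy host-fs path keeps the
/// absolute host dir because its tools take raw host paths.
fn substitute_workspace(prompt: &str, runtime: &Runtime, host_dir: &Path) -> String {
    let replacement = match runtime {
        Runtime::Reborn => "/workspace".to_string(),
        Runtime::LegacyHostFs => host_dir.display().to_string(),
    };
    prompt.replace("{{WORKSPACE}}", &replacement)
}

/// Pick the host directory to mount at `/workspace`.
/// `None` keeps the previous default of `{local_dev_root}/workspace`,
/// which is what the standalone CLI command continues to use.
fn mount_root(workspace_root: Option<&Path>, local_dev_root: &Path) -> PathBuf {
    match workspace_root {
        Some(dir) => dir.to_path_buf(),
        None => local_dev_root.join("workspace"),
    }
}
```

Passing `Some(ws_base/task_id)` from the bench makes the agent's /workspace the same directory the suite seeds and the scorer reads, while `None` leaves existing callers untouched.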

Verification

task_03_blog (reborn, against current ironclaw main):

  • 0 InputEncode (was 3) — coding tools dispatch cleanly.
  • blog_post.md now lands in workspaces/task_03_blog/ (the scorer's dir) — previously orphaned in reborn-task_03_blog/workspace/.

cargo check passes against the pinned ironclaw branch. No ironclaw change required.

🤖 Generated with Claude Code

Reborn pinchbench/clawbench tasks that use coding tools (write_file,
read_file, list_dir, shell) were failing with `dispatch_failure_kind=
InputEncode` → `HostUnavailable{Capability}` → terminated turn (empty
response, 0 score). 15/26 pinchbench tasks were affected; reproduced on
both DeepSeek-V4-Flash and Qwen3.5-122B.

Root cause (bench-side, not ironclaw): `resolve_workspace_placeholder`
resolved `{{WORKSPACE}}` to the absolute host dir (ws_base/task_id) and
told the model to use absolute paths. Reborn's coding tools scope paths to
the `/workspace` mount and reject absolute host paths
(coding/paths.rs::resolve_path -> InputEncode) — even under reborn-yolo,
tools do NOT take raw host paths. Additionally the reborn `/workspace`
mount defaulted to `{local_dev_root}/workspace`, which the bench never
seeds or scores, so even valid scoped writes landed where the grader
couldn't see them.

Fix:
- run_reborn_conversation gains a `workspace_root: Option<&Path>` and pumps
  it into `RebornBuildInput::with_local_dev_workspace_root`, mounting the
  per-task workspace dir (ws_base/task_id — the exact dir the suite seeds
  inputs into and the scorer reads) at `/workspace`.
- run_reborn_task resolves `{{WORKSPACE}}` -> `/workspace` (scoped) instead
  of the absolute host dir; the legacy host-fs path keeps the absolute form.

Verified: task_03_blog dispatches with 0 InputEncode and writes
blog_post.md into the scorer's workspaces/task_03_blog dir (previously
orphaned in reborn-task_03_blog/workspace). Confirmed against current
ironclaw main — no ironclaw change needed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
pranavraja99 added a commit that referenced this pull request Jun 15, 2026
…as: 0/26)

Folds in the workspace fix from #97 (a19f428), which is the root cause of
reborn pinchbench scoring 0/26: reborn coding tools scope paths to the
/workspace mount and reject absolute host paths, and the mount defaulted to
{local_dev_root}/workspace — a dir the bench never seeds or scores. So writes
failed (HostUnavailable/dispatch_failure) or landed where the grader couldn't
read, grade() errored, and score_task_result kept the 'pending' placeholder.

- run_reborn_conversation gains workspace_root: Option<&Path>, mounting it via
  with_local_dev_workspace_root so /workspace == the per-task dir the suite
  seeds + the scorer reads (ws_base/task_id).
- run_reborn_task resolves {{WORKSPACE}} -> /workspace (scoped); legacy host-fs
  path keeps the absolute form.

Verified against ironclaw main: pinchbench task_01_calendar -> 0.833,
task_03_blog -> 0.92 (real graded breakdowns), vs 0/'pending' before.
Supersedes #97 (carries its fix on top of the ironclaw-main repin + API sync).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@pranavraja99

Superseded by #108 — the workspace-mount fix (a19f428) is folded into #108 on top of the ironclaw-main repin + API sync, since they overlap in reborn_runner.rs/runner.rs and #97's old API (with_provider) no longer builds against ironclaw main. Verified there: pinchbench reborn now scores real values (task_01_calendar 0.833, task_03_blog 0.92) instead of 0/'pending'. Closing in favor of #108.

pranavraja99 added a commit that referenced this pull request Jun 15, 2026
… stranded after early merges (#110)

* fix(reborn): mount per-task workspace at /workspace + scoped paths (was: 0/26)

Folds in the workspace fix from #97 (a19f428), which is the root cause of
reborn pinchbench scoring 0/26: reborn coding tools scope paths to the
/workspace mount and reject absolute host paths, and the mount defaulted to
{local_dev_root}/workspace — a dir the bench never seeds or scores. So writes
failed (HostUnavailable/dispatch_failure) or landed where the grader couldn't
read, grade() errored, and score_task_result kept the 'pending' placeholder.

- run_reborn_conversation gains workspace_root: Option<&Path>, mounting it via
  with_local_dev_workspace_root so /workspace == the per-task dir the suite
  seeds + the scorer reads (ws_base/task_id).
- run_reborn_task resolves {{WORKSPACE}} -> /workspace (scoped); legacy host-fs
  path keeps the absolute form.

Verified against ironclaw main: pinchbench task_01_calendar -> 0.833,
task_03_blog -> 0.92 (real graded breakdowns), vs 0/'pending' before.
Supersedes #97 (carries its fix on top of the ironclaw-main repin + API sync).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci(bench): no-baseline comment shows score + links the viewer

The baseline-less result comment only printed [download results] — no viewer
link and no score, so a fully-scored run (synced to the viewer) looked like a
dead end, and a 0.0 run gave no signal. The run IS in the viewer regardless of
baseline; surface the headline pass%/avg-score from run.json + a
[browse run + per-task trajectories] link (same VIEWER_BASE format-pr-comment.sh
uses). Repro that motivated this: pinchbench (no baseline) on PR #4841.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>