fix(reborn): mount per-task workspace at /workspace + use scoped paths #97
Closed
pranavraja99 wants to merge 1 commit into
Reborn pinchbench/clawbench tasks that use coding tools (write_file,
read_file, list_dir, shell) were failing with
`dispatch_failure_kind=InputEncode` → `HostUnavailable{Capability}` →
terminated turn (empty response, 0 score). 15/26 pinchbench tasks were
affected; reproduced on both DeepSeek-V4-Flash and Qwen3.5-122B.
Root cause (bench-side, not ironclaw): `resolve_workspace_placeholder`
resolved `{{WORKSPACE}}` to the absolute host dir (ws_base/task_id) and
told the model to use absolute paths. Reborn's coding tools scope paths to
the `/workspace` mount and reject absolute host paths
(coding/paths.rs::resolve_path -> InputEncode) — even under reborn-yolo,
tools do NOT take raw host paths. Additionally the reborn `/workspace`
mount defaulted to `{local_dev_root}/workspace`, which the bench never
seeds or scores, so even valid scoped writes landed where the grader
couldn't see them.
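
The rejection described above can be illustrated with a minimal sketch. This is not ironclaw's actual `coding/paths.rs::resolve_path`; the function name `scope_to_workspace` and the `ScopeError` type are hypothetical, and the rule (accept relative paths and paths under `/workspace`, reject everything else) is an assumption inferred from the behavior reported here.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical error type; stands in for whatever surfaces as `InputEncode`.
#[derive(Debug, PartialEq)]
pub enum ScopeError {
    /// Rooted path that does not live under the `/workspace` mount.
    OutsideMount,
    /// Path that tries to climb out of the mount with `..`.
    ParentTraversal,
}

/// Returns the path relative to the `/workspace` mount, or an error if the
/// input is a rooted path outside the mount or contains `..`.
pub fn scope_to_workspace(input: &str) -> Result<PathBuf, ScopeError> {
    let path = Path::new(input);
    let relative = if path.has_root() {
        // Component-wise prefix check, so `/workspacefoo` does not match.
        path.strip_prefix("/workspace")
            .map_err(|_| ScopeError::OutsideMount)?
    } else {
        path
    };

    let mut out = PathBuf::new();
    for component in relative.components() {
        match component {
            Component::Normal(part) => out.push(part),
            Component::CurDir => {}
            // `..` (or anything else unexpected) is refused outright.
            _ => return Err(ScopeError::ParentTraversal),
        }
    }
    Ok(out)
}
```

Under these assumptions, `/workspace/blog_post.md` is accepted, while an absolute host path such as `/tmp/ws_base/task_03_blog/blog_post.md` (an illustrative path) is refused, which matches the failure mode reported for the old prompt.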
Fix:
- run_reborn_conversation gains a `workspace_root: Option<&Path>` and pumps
it into `RebornBuildInput::with_local_dev_workspace_root`, mounting the
per-task workspace dir (ws_base/task_id — the exact dir the suite seeds
inputs into and the scorer reads) at `/workspace`.
- run_reborn_task resolves `{{WORKSPACE}}` -> `/workspace` (scoped) instead
of the absolute host dir; the legacy host-fs path keeps the absolute form.
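
A rough sketch of the placeholder change, assuming a simple string substitution. The `WorkspaceMode` enum and the `resolve_placeholder` function are hypothetical names for illustration; the real `resolve_workspace_placeholder` signature is not shown in this PR.

```rust
use std::path::Path;

/// Hypothetical switch between the two execution paths named in the fix.
pub enum WorkspaceMode<'a> {
    /// Reborn: tools see the per-task dir through the `/workspace` mount.
    Scoped,
    /// Legacy host-fs: tools take raw host paths, so keep the absolute dir.
    HostFs(&'a Path),
}

/// Replaces every `{{WORKSPACE}}` in the prompt with the form the
/// selected tool stack can actually resolve.
pub fn resolve_placeholder(prompt: &str, mode: WorkspaceMode<'_>) -> String {
    let target = match mode {
        WorkspaceMode::Scoped => "/workspace".to_string(),
        WorkspaceMode::HostFs(dir) => dir.display().to_string(),
    };
    prompt.replace("{{WORKSPACE}}", &target)
}
```

With `WorkspaceMode::Scoped`, a prompt containing `{{WORKSPACE}}/blog_post.md` becomes `/workspace/blog_post.md`; with `HostFs`, it keeps the absolute host form, as the legacy path expects.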
Verified: task_03_blog dispatches with 0 InputEncode and writes
blog_post.md into the scorer's workspaces/task_03_blog dir (previously
orphaned in reborn-task_03_blog/workspace). Confirmed against current
ironclaw main — no ironclaw change needed.
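
To make the "orphaned" write concrete, here is a small sketch of how the choice of mount root decides where a scoped write ends up on the host. The helpers `mount_root` and `host_location` are hypothetical and only model the `Option<&Path>` fallback described above; they are not the `RebornBuildInput` API, and the `/runs` prefix in the usage note is an invented example location.

```rust
use std::path::{Path, PathBuf};

/// Picks the host directory backing `/workspace`.
/// `None` models the old default of `{local_dev_root}/workspace`.
pub fn mount_root(local_dev_root: &Path, workspace_root: Option<&Path>) -> PathBuf {
    match workspace_root {
        Some(root) => root.to_path_buf(),
        None => local_dev_root.join("workspace"),
    }
}

/// Maps a mount-relative path (assumed already validated and relative)
/// to its location on the host.
pub fn host_location(mount_root: &Path, scoped_relative: &str) -> PathBuf {
    mount_root.join(scoped_relative)
}
```

For example, with a local dev root of `/runs/reborn-task_03_blog` and no explicit workspace root, `blog_post.md` maps to `/runs/reborn-task_03_blog/workspace/blog_post.md`; passing `/runs/workspaces/task_03_blog` as the workspace root maps it to `/runs/workspaces/task_03_blog/blog_post.md`, the directory the scorer reads.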
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
pranavraja99 added a commit that referenced this pull request Jun 15, 2026

…as: 0/26)

Folds in the workspace fix from #97 (a19f428), which is the root cause of reborn pinchbench scoring 0/26: reborn coding tools scope paths to the /workspace mount and reject absolute host paths, and the mount defaulted to {local_dev_root}/workspace, a dir the bench never seeds or scores. So writes failed (HostUnavailable/dispatch_failure) or landed where the grader couldn't read, grade() errored, and score_task_result kept the 'pending' placeholder.

- run_reborn_conversation gains workspace_root: Option<&Path>, mounting it via with_local_dev_workspace_root so /workspace == the per-task dir the suite seeds + the scorer reads (ws_base/task_id).
- run_reborn_task resolves {{WORKSPACE}} -> /workspace (scoped); legacy host-fs path keeps the absolute form.

Verified against ironclaw main: pinchbench task_01_calendar -> 0.833, task_03_blog -> 0.92 (real graded breakdowns), vs 0/'pending' before.

Supersedes #97 (carries its fix on top of the ironclaw-main repin + API sync).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Superseded by #108 — the workspace-mount fix (a19f428) is folded into #108 on top of the ironclaw-main repin + API sync, since they overlap in |
pranavraja99 added a commit that referenced this pull request Jun 15, 2026

… stranded after early merges (#110)

* fix(reborn): mount per-task workspace at /workspace + scoped paths (was: 0/26)

  Folds in the workspace fix from #97 (a19f428), which is the root cause of reborn pinchbench scoring 0/26: reborn coding tools scope paths to the /workspace mount and reject absolute host paths, and the mount defaulted to {local_dev_root}/workspace, a dir the bench never seeds or scores. So writes failed (HostUnavailable/dispatch_failure) or landed where the grader couldn't read, grade() errored, and score_task_result kept the 'pending' placeholder.

  - run_reborn_conversation gains workspace_root: Option<&Path>, mounting it via with_local_dev_workspace_root so /workspace == the per-task dir the suite seeds + the scorer reads (ws_base/task_id).
  - run_reborn_task resolves {{WORKSPACE}} -> /workspace (scoped); legacy host-fs path keeps the absolute form.

  Verified against ironclaw main: pinchbench task_01_calendar -> 0.833, task_03_blog -> 0.92 (real graded breakdowns), vs 0/'pending' before.

  Supersedes #97 (carries its fix on top of the ironclaw-main repin + API sync).

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci(bench): no-baseline comment shows score + links the viewer

  The baseline-less result comment only printed [download results], with no viewer link and no score, so a fully-scored run (synced to the viewer) looked like a dead end, and a 0.0 run gave no signal. The run IS in the viewer regardless of baseline; surface the headline pass%/avg-score from run.json + a [browse run + per-task trajectories] link (same VIEWER_BASE format-pr-comment.sh uses).

  Repro that motivated this: pinchbench (no baseline) on PR #4841.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem

Reborn pinchbench/clawbench tasks that use coding tools (`write_file`, `read_file`, `list_dir`, `shell`) failed with `dispatch_failure_kind=InputEncode` → `HostUnavailable{Capability}` → terminated turn (empty response, 0 score). 15/26 pinchbench tasks were affected; reproduced on both DeepSeek-V4-Flash and Qwen3.5-122B.

This was initially misattributed to an ironclaw `main` regression (approval gates). It is not: ironclaw is innocent, and the bug is entirely bench-side. Proven: on the same new-main ironclaw, an absolute workspace path → 3 `InputEncode`; a scoped `/workspace` path → 0, dispatch completes.

Root cause

`resolve_workspace_placeholder` resolved `{{WORKSPACE}}` to the absolute host dir (`ws_base/task_id`) and the prompt told the model to use absolute paths. But reborn's coding tools scope paths to the `/workspace` mount and reject absolute host paths (`coding/paths.rs::resolve_path` → `input_error()` = `InputEncode`); even under reborn-yolo, tools do not take raw host paths. (The old-fork passing run had `{{WORKSPACE}}` unresolved, so the model used `/workspace/...` and it worked.)

Second, the reborn `/workspace` mount defaulted to `{local_dev_root}/workspace`, which the bench never seeds or scores, so even valid scoped writes landed where the grader couldn't see them, and seeded inputs were invisible to the agent.

Fix

- `run_reborn_conversation` gains `workspace_root: Option<&Path>` and threads it into `RebornBuildInput::with_local_dev_workspace_root`, mounting the per-task workspace dir (`ws_base/task_id`, the exact dir the suite seeds inputs into and the scorer reads from) at `/workspace`.
- `run_reborn_task` resolves `{{WORKSPACE}}` → `/workspace` (scoped) instead of the absolute host dir. The legacy host-fs path keeps the absolute form (its tools take raw host paths).
- `reborn` CLI command passes `None` (unchanged behavior).

Verification

`task_03_blog` (reborn, against current ironclaw main):

- 0 `InputEncode` (was 3): coding tools dispatch cleanly.
- `blog_post.md` now lands in `workspaces/task_03_blog/` (the scorer's dir); previously orphaned in `reborn-task_03_blog/workspace/`.
- `cargo check` passes against the pinned ironclaw branch. No ironclaw change required.

🤖 Generated with Claude Code