Skip to content

fix: stop publishing rejected PRs + worker/snapshot/fork fixes (#85)#86

Open
konard wants to merge 9 commits into
Payel-git-ol:optimize/stream-marshal-oncefrom
konard:issue-85-d15da1c8214b
Open

fix: stop publishing rejected PRs + worker/snapshot/fork fixes (#85)#86
konard wants to merge 9 commits into
Payel-git-ol:optimize/stream-marshal-oncefrom
konard:issue-85-d15da1c8214b

Conversation

@konard

@konard konard commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes the four problems reported in #85 — the orchestrator finished work but published a pull request even when validation rejected the solution, and several edge cases caused crashes or false success. Each problem is fixed at the root cause with a regression test.

Problems & fixes

1. Validation as a gate (boss/task.go)

Logs showed Boss validation rejected immediately followed by Creating GitHub pull request. The validation result was computed but never used to stop publishing.

  • After validateSolution, compute approved := validation == nil || validation.Approved.
  • When the gate is enabled and the solution is not approved, skip the GitHub publish, set task.Status = "rejected", and emit a rejected status carrying the boss review so the UI can show the required fixes.
  • Fail-open: validateSolution returns Approved: true on AI errors, so transient failures don't block delivery.
  • Escape hatch: set VALIDATION_GATE=off to restore the previous always-publish behaviour.

2. Worker timeout → false success (groupchat/orchestrator.go, worker/assign.go, worker/assign_helpers.go)

A worker failed with ollama rpc DeadlineExceeded and produced no code, yet the manager reported success. Root cause: the group chat orchestrator only emitted agent errors as events and Run returned nil, so callers never saw the failure.

  • Orchestrator now records agent errors (recordError / Errors()) instead of dropping them.
  • Both the single-worker and group-chat paths now fail when a code task produces zero files, wrapping the underlying agent error.
  • isCodeTask keeps the check scoped to code tasks (document/research/presentation may legitimately produce no code files).

3. Snapshot crash on result symlink (boss/project.go)

The snapshot crashed with copy_file_range: is a directory because it followed the result nix-build symlink into a /nix/store directory.

  • copyFile now returns (skipped bool, err error) and: skips /nix/store symlinks (the build artifact), skips directories and broken symlinks, and recreates ordinary symlinks instead of dereferencing them.

4. 403 → fork fallback (github/fork.go, boss/github.go)

When the bot lacks write access to the upstream repo, the push failed with 403 and no PR was created.

  • New github/fork.go: IsPermissionError, ForkRepository (POST /forks), WaitForRepository (polls until the fork is ready), PushBranchToRemote.
  • createPullRequest detects a permission error, forks the repo, pushes the branch to the fork, and opens a cross-repo PR using the forkOwner:branch head format.

Tests

  • boss/snapshot_test.go — nix-store symlink skipped, directory symlink skipped, regular file copied, full prepareSnapshotDir ignores the result artifact.
  • github/fork_test.goIsPermissionError table (nil / 403 / "Permission … denied" / "Write access … not granted" / unrelated).
  • worker/result_gate_test.goisCodeTask classification.
go vet ./...        # clean
go test ./...       # all packages pass
gofmt -l <changed>  # clean

Closes #85

konard added 6 commits June 14, 2026 18:49
Adding .gitkeep for PR creation (default mode).
This file will be removed when the task is complete.

Issue: Payel-git-ol#85
…t-ol#85)

Stop the pipeline from publishing a pull request when the Boss
validation rejected the solution. Previously 'Boss validation rejected'
was immediately followed by 'Creating GitHub pull request'.

The gate is fail-open (validateSolution returns Approved on AI errors)
and can be disabled with VALIDATION_GATE=off.
Payel-git-ol#85)

The group chat orchestrator swallowed agent errors (emitted as events
only), so a worker timeout (ollama DeadlineExceeded) produced zero files
yet the manager still reported success. Now record agent errors and
treat an empty file set as a failure for code tasks.

Adds isCodeTask helper + test.
…ayel-git-ol#85)

The snapshot crashed with 'copy_file_range: is a directory' because it
followed the 'result' nix-build symlink into /nix/store. copyFile now
reports skipped entries, skips /nix/store symlinks, directories and
broken symlinks, and recreates ordinary symlinks.
…-ol#85)

When pushing the branch fails with a permission error (no write access
to the upstream repo), fork the repository, wait for it to be ready,
push the branch to the fork and open a cross-repo pull request using the
'forkOwner:branch' head format.
@konard konard changed the title [WIP] FIX_PROBLEMS_PR fix: stop publishing rejected PRs + worker/snapshot/fork fixes (#85) Jun 14, 2026
@konard konard marked this pull request as ready for review June 14, 2026 19:02
@konard

konard commented Jun 14, 2026

Copy link
Copy Markdown
Contributor Author

Working session summary

Both CI commands pass (build=0, vet=0). Everything is complete. Here's the summary:

Done — PR #86 updated and ready for review

PR: #86

All four problems from issue #85 are fixed at the root cause, each with a regression test:

# Problem Fix Files
1 Validation ignored — PR published after "Boss validation rejected" Gate the GitHub publish on validation.Approved; emit a rejected status with the boss review; fail-open + VALIDATION_GATE=off escape hatch boss/task.go
2 Worker timeout reported as success Orchestrator records agent errors instead of dropping them; code tasks now fail when zero files are produced groupchat/orchestrator.go, worker/assign.go, worker/assign_helpers.go
3 Snapshot crash copy_file_range: is a directory on the result symlink copyFile skips /nix/store symlinks, directories and broken symlinks; recreates ordinary symlinks boss/project.go
4 403 push failure → no PR Detect permission error, fork the repo, wait for readiness, push to fork, open cross-repo PR (forkOwner:branch) github/fork.go (new), boss/github.go

Tests added: boss/snapshot_test.go, github/fork_test.go, worker/result_gate_test.go.

Verification (run locally):

  • go build ./agents/... ./apigateway/... ./orchestrator/... ./user/... → exit 0
  • go vet ./... → exit 0 (matches the Go Workspace CI job)
  • go test ./... → all packages pass
  • gofmt -l on all changed files → clean

Housekeeping: removed the placeholder .gitkeep, committed in 5 logical commits, pushed to issue-85-d15da1c8214b, and marked PR #86 ready for review. origin/master is already an ancestor of the branch. Note: PR #86 targets optimize/stream-marshal-once (a stacked-PR base set up before this task) — I left the base unchanged since the fixes are independent of it. No CI checks are reported on the head fork branch, so I ran the CI commands locally to confirm they pass.


This summary was automatically extracted from the AI working session output.

@konard

konard commented Jun 14, 2026

Copy link
Copy Markdown
Contributor Author

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Public pricing estimate: $5.216796
  • Calculated by Anthropic: $5.951189
  • Difference: $0.734393 (+14.08%)

📊 Context and tokens usage:

Claude Opus 4.8: (2 sub-sessions)

  1. 118.3K / 1M (12%) input tokens, 32.7K / 128K (26%) output tokens
  2. 39.3K / 1M (4%) input tokens, 7.0K / 128K (5%) output tokens

Total: (27.2K new + 197.9K cache writes + 5.4M cache reads) input tokens, 45.6K output tokens, $5.216796 cost

🤖 Models used:

  • Tool: Anthropic Claude Code
  • Requested: opus
  • Model: Claude Opus 4.8 (claude-opus-4-8)

📎 Log file uploaded as Gist (2246KB)


Now working session is ended, feel free to review and add any feedback on the solution draft.

@Payel-git-ol

Copy link
Copy Markdown
Owner

fix conflicts

@konard konard marked this pull request as draft June 14, 2026 19:04
@konard

konard commented Jun 14, 2026

Copy link
Copy Markdown
Contributor Author

🤖 AI Work Session Started

Starting automated work session at 2026-06-14T19:04:56.807Z

The PR has been converted to draft mode while work is in progress.

This comment marks the beginning of an AI work session. Please wait for the session to finish, and provide your feedback.

@konard

konard commented Jun 14, 2026

Copy link
Copy Markdown
Contributor Author

🔄 Auto-restart triggered (iteration 1)

Reason: Merge conflicts detected

Starting new session to address the issues.


Auto-restart-until-mergeable mode is active. This run will stop after 5 restart iterations.

konard added 3 commits June 14, 2026 19:08
…into issue-85-d15da1c8214b

# Conflicts:
#	orchestrator/internal/service/rules/boss/github.go
#	orchestrator/internal/service/rules/boss/task.go
…into issue-85-d15da1c8214b

# Conflicts:
#	orchestrator/internal/service/rules/boss/github.go
#	orchestrator/internal/service/rules/boss/task.go
@konard konard marked this pull request as ready for review June 14, 2026 23:32
@konard

konard commented Jun 14, 2026

Copy link
Copy Markdown
Contributor Author

✅ Merge conflicts resolved — ready for review

The PR was conflicting because the base branch optimize/stream-marshal-once advanced after this branch was created. I merged the latest base in and resolved the conflicts. CI is green and the PR is now MERGEABLE / CLEAN.

Conflicts resolved

orchestrator/internal/service/rules/boss/github.go — The base branch changed createPullRequest / pushToGitHub to return (url, errorReason) so the frontend GitHub node can show a clear failure message instead of hanging in "building". My issue #85 fork-fallback returned a single value. Reconciled: pushBranchToFork now returns (headRef, reason) and propagates the specific fork-failure cause (couldn't fork / fork not ready / push failed) into data["githubError"].

orchestrator/internal/service/rules/boss/task.go — Combined the base branch's changes (isCodeLikeTask, githubErr capture, Nix-store snapshot before building the data map) with the issue #85 validation gate (skip GitHub publish when the boss rejects, set status rejected).

Note: an earlier auto-pushed merge commit had a weaker resolution that dropped the specific fork-failure reason. I merged it with -X ours so the better resolution wins while keeping history forward-moving.

All four issue #85 fixes intact after the merge

  1. Validation as a gate — pipeline skips publishing to GitHub when the boss doesn't approve the solution.
  2. Worker result gate — an empty file set for code tasks is now treated as a failure (no more silent "success" on an ollama timeout) and agent errors in group chat are recorded instead of swallowed.
  3. Snapshot fix — directories and nix-store symlinks are handled correctly during the Nix snapshot.
  4. 403 → fork fallback — on Permission denied push, the bot forks the repo and opens a cross-repo PR.

Verification

  • go build ./... — clean
  • go vet ./... — clean
  • Full orchestrator test suite — passing
  • CI: Build all Go modules ✅ · Build & test (orchestrator)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants