fix(polecat): pre-spawn duplicate-branch check in mol-polecat-work#1
Open
fix(polecat): pre-spawn duplicate-branch check in mol-polecat-work#1
Conversation
…all#799) Pool-agent sessions resumed through the control-dispatcher path dropped --dangerously-skip-permissions, --settings, and other schema-default launch flags. The previous shouldPreserveStoredRuntimeCommand treated a stored command that exactly equalled the bare provider binary ("claude") as complete and preserved it, skipping BuildProviderLaunchCommand. The resulting claude --resume <uuid> ran without the unrestricted permission mode flag and wedged on interactive tool-call prompts. Rebuild when the stored command is only the bare binary while still preserving longer stored commands that already carry flags (so existing paths that persist fully-resolved command strings continue to win). The API path (internal/api/session_runtime.go) and the CLI worker-boundary path (cmd/gc/worker_handle.go) carry parallel copies of the helper — update both and add regression tests covering each. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pool-managed and ephemeral cursor-agent sessions were timing out during create because the cursor profile had no ReadyPromptPrefix or ReadyDelayMs, so the reconciler could not detect when the TUI was ready to accept piped-in prompts. Named sessions worked because their prompt is delivered as an exec arg, bypassing readiness detection. cursor-agent 2026.04+ renders its composer input with a U+2192 (right arrow) prefix after roughly 5-10s of startup (workspace-trust gate + statsig + model load). Match the pattern used by the claude/codex/ gemini/copilot profiles: ReadyPromptPrefix: "\u2192 " ReadyDelayMs: 10000 Verified on macOS arm64 with cursor-agent 2026.04.17: ephemeral cursor workers now claim beads, execute mol-do-work, write the requested files, and run gc runtime drain-ack cleanly instead of cycling stopped -> creating -> stopped against a deadline_exceeded outcome. Made-with: Cursor
…nhall#1196) ## Summary - Change `dolt-gc-nudge` interval from **6h → 1h** - Change default `GC_DOLT_GC_THRESHOLD_BYTES` from **2 GiB → 0** (run unconditionally; knob preserved as optional escape hatch) - Update runbook + script comments with empirical evidence ## Why A beads perf harness run (2026-04) shows Dolt's auto-GC **does not fire under bd's fork-per-op CLI workload** even with `auto_gc_behavior.enable: true` and the managed `config.yaml` we ship. Unbounded commit-graph growth causes two prod symptoms: - Disk bloat — ~120 GB after a few days of agent work - Tail latency — at 143 MB of accumulated history: **p99 = 16.9s, max = 18.6s**; cost scales roughly linearly with history size, extrapolating to multi-minute ops at GB scale Manual `CALL DOLT_GC()` on a 148 MB store reclaims 43% in 4.6s, so a periodic nudge is both necessary and cheap. The pre-tune defaults (2 GiB threshold + 6h cooldown) allow bloat to climb well into the tail-latency danger zone between nudges. At 1h unconditional GC, the commit graph stays bounded and p99 stays low. ## Test plan - [x] `go test ./cmd/gc/ -run 'DoltGCNudge|Order'` passes locally (existing tests already set `GC_DOLT_GC_THRESHOLD_BYTES=0` explicitly, so no test changes needed) - [ ] CI green - [ ] Prod observation post-merge: `gc doctor` reports clean `dolt-noms-size` on cities that previously bloated 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…atomic-materialize fix: atomically materialize system pack files
…nv-vars fix: persist supervisor provider env vars
The zombie scan flagged every dolt sql-server PID except the main city server as a zombie. Rig-local Dolt servers configured via dolt.port in config.yaml are legitimate — the scan now reads rig configs from the metadata cache and excludes PIDs listening on known rig ports. Closes gastownhall#1217 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When dolt-provider-state.json has a stale PID (dolt was restarted but state not yet refreshed) or is absent entirely (e.g. after a crash), publishManagedDoltRuntimeState now falls back to inspecting the actual running process via repairedManagedDoltRuntimeState, which probes the port holder and verifies process ownership. On success it repairs the provider state file atomically before writing dolt-state.json, so all subsequent readers see a consistent view. Previously the function returned an immediate error in both cases, causing gc doctor and gc beads health to fail permanently until gc start was run — even though dolt was running and healthy. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…r-disable managed dolt: disable upstream load-avg auto-GC scheduler (workaround dolt#10944)
…me-under-2m-complete Reduce cmd/gc test runtime under two minutes
Add a regression test for the builtin cursor provider so future cleanup cannot silently drop its readiness prefix or readiness delay and recreate the startup deadline_exceeded loop.
…astownhall#1222) ## Summary - Fix `gc-nudge` DOLT_GC() failure: replace `--database` (invalid in dolt 1.86.3) with `--use-db` flag - Fix darwin build: cast `stat.Dev` from `int32` to `uint64` in `fsys.readRegularFileSnapshot` ## Changes - `examples/dolt/commands/gc-nudge/run.sh` — use `--use-db` instead of `--database` when calling `dolt sql` - `internal/fsys/read_regular_unix.go` — cast `stat.Dev` to `uint64` for darwin compatibility - `cmd/gc/dolt_gc_nudge_script_test.go` — update test assertions to match new flag ## Test plan - [x] All gc-nudge script tests updated and pass - [x] `stat.Dev` cast verified on darwin Closes ga-66a 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…atcher-pool-resume-flags fix: rebuild bare resume command to include schema defaults (gastownhall#799)
…readiness-detection fix(profiles): add readiness tuning to cursor provider
…de-rig-locals fix(dolt-health): exclude rig-local Dolt servers from zombie scan
…all#1208) ## Summary Fixes a build break on darwin introduced in c3e6f17. `unix.Stat_t.Dev` is `int32` on darwin / BSDs and `uint64` on Linux; the new `readRegularFileSnapshot` (`internal/fsys/read_regular_unix.go`) assigned it directly to `fileIdentity.dev` (`uint64`), so the build fails on darwin with: ``` cannot use stat.Dev (variable of type int32) as uint64 value in struct literal ``` This keeps the direct cast in the unix snapshot path, aligns the reflective identity path with the same signed-device bit preservation, adds in-package coverage for both synthetic signed-device handling and the real snapshot-to-Lstat identity flow, and wires a Darwin compile guard into `make test` / `make test-cover`. Closes gastownhall#1207 ## Testing - [x] `go test ./internal/fsys` - [x] `make test-fsys-darwin-compile` - [x] `make check` (maintainer follow-up; local pre-commit hook) - [ ] `make check-docs` — N/A, no docs touched - [ ] `make test-integration` — N/A, no runtime/controller/workflow change ## Checklist - [x] Linked an issue (gastownhall#1207) - [x] Added targeted `internal/fsys` regression coverage - [x] Added a default-path Darwin compile guard - [x] No user-facing surface — internal helper - [x] No breaking change Co-authored-by: Julian Knutsen <julianknutsen@users.noreply.github.com>
Providers like OpenCode need different commands for ACP vs tmux transport (e.g. 'opencode acp' for ACP, bare 'opencode' for tmux). The existing Args field applies to both transports with no override. Add ACPCommand and ACPArgs fields to ProviderSpec and ResolvedProvider. When transport is 'acp', BuildProviderLaunchCommand uses ACPCommandString() which falls back field-by-field to Command/Args, allowing partial overrides.
…udge-complete fix(mail): nudge reply recipients from human
## Summary - run the triage-label workflow for reopened issues/PRs and PRs marked ready for review - skip draft PRs so they do not enter triage until ready - fail loudly if the workflow cannot resolve an issue/PR number ## Tests - git diff --check
…y-hardening Harden release provenance
…session-id fix(mail): use session bead id for senders
Three interacting bugs orphaned in_progress beads when pool sessions churned, leaving them invisible to every work_query tier (no assignee match, status not "ready"): 1. reapStaleSessionBeads closed any session whose tmux probe failed, including ones past startup that held active claims. Restrict to sessions stuck in the "creating" state (or with pending_create_claim set) — by design those cannot have claimed work yet, since claim is the worker's first post-startup action. Sessions past creating with a dead tmux are left for the lifecycle reconciler to restart so the original assignee resumes the work. 2. unclaimWorkAssignedToRetiredSessionBead and the default EffectiveOnDeath/EffectiveOnBoot shell hooks all cleared the assignee on in_progress beads but never reset status. Reset to "open" so a fresh worker can re-claim via Tier 3 of the work_query (gc.routed_to + --unassigned). 3. Belt-and-suspenders against any future close path that bypasses (1): closeBead now refuses to close a session bead while non-session work is still assigned to it. The reconciler relies on the assignee link to wake the session and resume claims; closing under live claims would strand the work. Callers that legitimately need to retire an active session must drain or unclaim first. Evidence on a live city: 17 codex-max session beads cycled in ~1 hour (9 "stale-session", plus drained/orphaned/duplicate). Four PR-review finalize beads ended up status=in_progress + assignee="" + routed_to= gascity/workflows.codex-max, invisible to all four work_query tiers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up for closed PR gastownhall#1340.\n\nPreserves Julian Knutsen's original order-check cache work and includes the reviewed cold-cache fallback fix plus generated API/dashboard artifacts.
…er-orphan-claims-complete-20260428 fix(session): preserve in_progress claims across worker churn
…ync-session-starts-complete perf(sessions): enqueue reconciler starts asynchronously
…tions-permissions chore: reduce workflow token permissions
…lease-security chore: harden release security docs and publishing
…hell-trust-boundaries Harden controller shell trust boundaries
…-codeql Add fork-safe advanced CodeQL workflow
…-deps Pin build and CI dependencies
Polecats were creating duplicate PRs against the same bead because re-spawn didn't check whether another polecat alias already had an open branch/PR for that bead. Triage of pa-9zyx4 found four such double-PR cases (pa-v4oi6.7, .19, .23, .24) where polecat/capable/* and polecat/rictus/* simultaneously targeted the same issue ID. Add a check in mol-polecat-work.workspace-setup that runs before fresh branch creation. When metadata.branch is empty: - Scan origin for any polecat/* branch carrying this bead ID, matched precisely (anchored to '/' before and end/'@'/'-' after, so pa-wgb8g does not match pa-wgb8g2 or pa-wgb8g.2). - Same-alias match → adopt as ours by setting metadata.branch; the existing rejection-aware-resume path checks it out below. - Different-alias match → don't open a duplicate. Mail the witness with both branches, drop the claim back to the pool, drain-ack and exit. - Both → adopt our own; nudge witness about the other-alias branches. Bumps formula version 8 → 9.
vbtcl
pushed a commit
that referenced
this pull request
May 6, 2026
…gastownhall#1203) ## Summary The session-first migration (`dd90ac0a`, Mar 8 2026) deleted the Agent Protocol primitive (`internal/agent/agent.go` and the `Agent`/`Handle` interfaces) and moved its responsibilities into `internal/session/` (lifecycle) and `internal/runtime/` (providers). Commit `be8debd8` then renamed `agent.*` events to `session.*`. The primitive list in AGENTS.md and the architecture docs in `engdocs/architecture/` continued to describe the deleted abstraction — for ~50 days. This PR aligns the docs with HEAD. **No code changes.** ## Changes - **Renamed** `engdocs/architecture/agent-protocol.md` → `session.md` (88% similarity preserved by `git mv`). Title and summary rewritten to frame as primitive #1 (Session) with a History note pointing at `dd90ac0a`. - **AGENTS.md**: - Renamed primitive #1 from "Agent Protocol" to "Session"; reference `agent.SessionNameFor` (in the `agent` helper package, not `session`) - Fixed Messaging derivation: `Session.Nudge()` (delegating to `runtime.Provider.Nudge()`) — `SendPrompt()` doesn't exist - Fixed Health Patrol derivation language - Updated progressive capability table (Level 0-1: Agent → Session) - Replaced `agent` with `worker` in the canonical-domain list; footnoted `internal/agent/` as a small helper package - Added new **"Active migrations"** subsection documenting: - Worker boundary migration (started `12a0a848`, in progress) — names the CI test `TestGCNonTestFilesStayOnWorkerBoundary` that enforces non-test `cmd/gc/` files route through `worker.Handle` - Session-first migration (completed `dd90ac0a`) - **`engdocs/architecture/`**: updated `nine-concepts.md`, `index.md`, `glossary.md`, `messaging.md`, `dispatch.md`, `controller.md`, `prompt-templates.md` — Agent Protocol references replaced with Session, broken links fixed. - **`event-bus.md`**: rewrote the event-type constants table from `Agent*`/`agent.*` to `Session*`/`session.*` (matching `internal/events/events.go:20-38` exactly, including the new `SessionUpdated` constant the old table was missing). Renamed `AutomationFired`/`Completed`/`Failed` to `OrderFired`/`Completed`/`Failed`. Updated the storage-format example. - **`health-patrol.md`**: fixed six event names renamed by `be8debd8` (`agent.*` → `session.*`); replaced obsolete description of the removed `internal/agent.Agent` interface with current helper-package responsibilities; disambiguated the Erlang/OTP table's "Worker" row from gascity's `internal/worker/` package. - **`test/integration/E2E-PROVIDER-GAPS.md`**: three references to `agent.started` → `session.woke`. - **`examples/gastown/packs/gastown/formulas/mol-digest-generate.toml`**: runnable formula's `gc events --type=...` filter updated from broken `agent.*` event names to current `session.*` names. - Refreshed "Last verified against code" stamps on all touched docs to 2026-04-25. ## Why The spec maintenance rule (originally in `specs/architecture.md`, now distributed across the per-doc "Last verified" stamps) requires updating docs in the same commit as referenced symbol renames. That discipline wasn't honored on `dd90ac0a` or `be8debd8`, leaving the doc set describing a deleted primitive. A new contributor reading AGENTS.md was being directed to look for an abstraction that doesn't exist. ## Test plan - [x] `make build` clean - [x] `make check` clean (full `./...` suite green) - [x] `make check-docs` clean - [x] Repo-wide grep for `agent-protocol.md` returns 0 hits (no broken cross-doc links) - [x] All `Agent Protocol` references that remain are in intentional History/migration footnotes (4 files) - [x] All `session.*` event constants in docs match `internal/events/events.go:20-38` exactly - [x] `agent.*` event references remain only in `engdocs/archive/` (3 files, deliberately preserved historical analysis) ## Pre-flight review Multi-agent review pipeline (3 Anthropic reviewers + Gas City contributor checker) iterated 2 rounds before convergence. Iteration 1 caught a blocker (the worker-boundary claim was factually wrong) plus 5 majors; iteration 2 verified all fixes and approved with zero remaining findings. Codex and Copilot CLI were unresponsive in this session and skipped per the review skill's documented fallback policy. --------- Co-authored-by: sjarmak <sjarmak@users.noreply.github.com> Co-authored-by: Julian Knutsen <julianknutsen@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Polecats were creating duplicate PRs against the same bead because re-spawn didn't check whether another polecat alias already had an open branch/PR for that bead. Triage of bead
pa-9zyx4found four such double-PR cases (pa-v4oi6.7,.19,.23,.24) wherepolecat/capable/*andpolecat/rictus/*simultaneously targeted the same issue ID.This adds a check in
mol-polecat-work.workspace-setupthat runs before fresh branch creation. Whenmetadata.branchis empty:polecat/*branch carrying this bead ID, matched precisely (anchored to/before and end/@/-after — sopa-wgb8gdoes not matchpa-wgb8g2orpa-wgb8g.2).metadata.branch; the existing rejection-aware-resume path below checks it out.drain-ackand exit.Bumps formula version 8 → 9.
Source bead
Filed (in partcl rig's beads database, since that's where the symptom was observed) as
pa-wgb8g. The fix has to land here ingascitybecause the formula and dispatcher live in this repo, not in partcl.Test plan
python3 -c 'import tomllib; tomllib.load(open(...))')bash -nsyntax check/<bead>([@-]|$)matchespolecat/<alias>/<bead>-...andpolecat/<alias>/<bead>@..., does NOT matchpolecat/<alias>/<bead>2-...orpolecat/<alias>/<bead>.2-...The last two require driving a real polecat through the formula and aren't easily automated. They are the right follow-up if a test harness for formula execution exists.
Notes for reviewers
git ls-remote(the equivalent path the bead description offered) rather thangh pr list.git ls-remotedoesn't require GitHub auth and runs in any rig regardless of forge.metadata.duplicate_detectedis set in the OTHER-only path so the next polecat that picks the bead can see what happened on the previous attempt.🤖 Generated with Claude Code