fix(polecat): pre-spawn duplicate-branch check in mol-polecat-work by vbtcl · Pull Request #1 · partcleda/gascity

vbtcl · 2026-04-29T22:00:20Z

Summary

Polecats were creating duplicate PRs against the same bead because re-spawn didn't check whether another polecat alias already had an open branch/PR for that bead. Triage of bead pa-9zyx4 found four such double-PR cases (pa-v4oi6.7, .19, .23, .24) where polecat/capable/* and polecat/rictus/* simultaneously targeted the same issue ID.

This adds a check in mol-polecat-work.workspace-setup that runs before fresh branch creation. When metadata.branch is empty:

Scan origin for any polecat/* branch carrying this bead ID, matched precisely (anchored to / before and end/@/- after — so pa-wgb8g does not match pa-wgb8g2 or pa-wgb8g.2).
Same-alias match → adopt as ours by setting metadata.branch; the existing rejection-aware-resume path below checks it out.
Different-alias match → don't open a duplicate. Mail the witness with both branches, drop the claim back to the pool, drain-ack and exit.
Both → adopt our own; nudge witness about the other-alias branches.

Bumps formula version 8 → 9.

Source bead

Filed (in partcl rig's beads database, since that's where the symptom was observed) as pa-wgb8g. The fix has to land here in gascity because the formula and dispatcher live in this repo, not in partcl.

Test plan

TOML parses (python3 -c 'import tomllib; tomllib.load(open(...))')
Rendered shell snippet passes bash -n syntax check
Regex /<bead>([@-]|$) matches polecat/<alias>/<bead>-... and polecat/<alias>/<bead>@..., does NOT match polecat/<alias>/<bead>2-... or polecat/<alias>/<bead>.2-...
Decision logic verified manually for SAME-only, OTHER-only, BOTH, NONE cases
End-to-end: contrived re-spawn against a bead with an existing same-alias branch — should resume (not implemented as automated test; would need the dispatcher in the loop)
End-to-end: contrived re-spawn against a bead with an existing different-alias branch — should release claim, mail witness, exit

The last two require driving a real polecat through the formula and aren't easily automated. They are the right follow-up if a test harness for formula execution exists.

Notes for reviewers

I left the dispatcher-side guard (Go code) for a separate change. The bead description says both layers should ideally guard; this PR is the formula half.
The check uses git ls-remote (the equivalent path the bead description offered) rather than gh pr list. git ls-remote doesn't require GitHub auth and runs in any rig regardless of forge.
metadata.duplicate_detected is set in the OTHER-only path so the next polecat that picks the bead can see what happened on the previous attempt.

🤖 Generated with Claude Code

…all#799) Pool-agent sessions resumed through the control-dispatcher path dropped --dangerously-skip-permissions, --settings, and other schema-default launch flags. The previous shouldPreserveStoredRuntimeCommand treated a stored command that exactly equalled the bare provider binary ("claude") as complete and preserved it, skipping BuildProviderLaunchCommand. The resulting claude --resume <uuid> ran without the unrestricted permission mode flag and wedged on interactive tool-call prompts. Rebuild when the stored command is only the bare binary while still preserving longer stored commands that already carry flags (so existing paths that persist fully-resolved command strings continue to win). The API path (internal/api/session_runtime.go) and the CLI worker-boundary path (cmd/gc/worker_handle.go) carry parallel copies of the helper — update both and add regression tests covering each. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Pool-managed and ephemeral cursor-agent sessions were timing out during create because the cursor profile had no ReadyPromptPrefix or ReadyDelayMs, so the reconciler could not detect when the TUI was ready to accept piped-in prompts. Named sessions worked because their prompt is delivered as an exec arg, bypassing readiness detection. cursor-agent 2026.04+ renders its composer input with a U+2192 (right arrow) prefix after roughly 5-10s of startup (workspace-trust gate + statsig + model load). Match the pattern used by the claude/codex/ gemini/copilot profiles: ReadyPromptPrefix: "\u2192 " ReadyDelayMs: 10000 Verified on macOS arm64 with cursor-agent 2026.04.17: ephemeral cursor workers now claim beads, execute mol-do-work, write the requested files, and run gc runtime drain-ack cleanly instead of cycling stopped -> creating -> stopped against a deadline_exceeded outcome. Made-with: Cursor

…nhall#1196) ## Summary - Change `dolt-gc-nudge` interval from **6h → 1h** - Change default `GC_DOLT_GC_THRESHOLD_BYTES` from **2 GiB → 0** (run unconditionally; knob preserved as optional escape hatch) - Update runbook + script comments with empirical evidence ## Why A beads perf harness run (2026-04) shows Dolt's auto-GC **does not fire under bd's fork-per-op CLI workload** even with `auto_gc_behavior.enable: true` and the managed `config.yaml` we ship. Unbounded commit-graph growth causes two prod symptoms: - Disk bloat — ~120 GB after a few days of agent work - Tail latency — at 143 MB of accumulated history: **p99 = 16.9s, max = 18.6s**; cost scales roughly linearly with history size, extrapolating to multi-minute ops at GB scale Manual `CALL DOLT_GC()` on a 148 MB store reclaims 43% in 4.6s, so a periodic nudge is both necessary and cheap. The pre-tune defaults (2 GiB threshold + 6h cooldown) allow bloat to climb well into the tail-latency danger zone between nudges. At 1h unconditional GC, the commit graph stays bounded and p99 stays low. ## Test plan - [x] `go test ./cmd/gc/ -run 'DoltGCNudge|Order'` passes locally (existing tests already set `GC_DOLT_GC_THRESHOLD_BYTES=0` explicitly, so no test changes needed) - [ ] CI green - [ ] Prod observation post-merge: `gc doctor` reports clean `dolt-noms-size` on cities that previously bloated 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…atomic-materialize fix: atomically materialize system pack files

…nv-vars fix: persist supervisor provider env vars

The zombie scan flagged every dolt sql-server PID except the main city server as a zombie. Rig-local Dolt servers configured via dolt.port in config.yaml are legitimate — the scan now reads rig configs from the metadata cache and excludes PIDs listening on known rig ports. Closes gastownhall#1217 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When dolt-provider-state.json has a stale PID (dolt was restarted but state not yet refreshed) or is absent entirely (e.g. after a crash), publishManagedDoltRuntimeState now falls back to inspecting the actual running process via repairedManagedDoltRuntimeState, which probes the port holder and verifies process ownership. On success it repairs the provider state file atomically before writing dolt-state.json, so all subsequent readers see a consistent view. Previously the function returned an immediate error in both cases, causing gc doctor and gc beads health to fail permanently until gc start was run — even though dolt was running and healthy. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… dolt#10944) (gastownhall#1200)

…r-disable managed dolt: disable upstream load-avg auto-GC scheduler (workaround dolt#10944)

…me-under-2m-complete Reduce cmd/gc test runtime under two minutes

Add a regression test for the builtin cursor provider so future cleanup cannot silently drop its readiness prefix or readiness delay and recreate the startup deadline_exceeded loop.

…astownhall#1222) ## Summary - Fix `gc-nudge` DOLT_GC() failure: replace `--database` (invalid in dolt 1.86.3) with `--use-db` flag - Fix darwin build: cast `stat.Dev` from `int32` to `uint64` in `fsys.readRegularFileSnapshot` ## Changes - `examples/dolt/commands/gc-nudge/run.sh` — use `--use-db` instead of `--database` when calling `dolt sql` - `internal/fsys/read_regular_unix.go` — cast `stat.Dev` to `uint64` for darwin compatibility - `cmd/gc/dolt_gc_nudge_script_test.go` — update test assertions to match new flag ## Test plan - [x] All gc-nudge script tests updated and pass - [x] `stat.Dev` cast verified on darwin Closes ga-66a 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…atcher-pool-resume-flags fix: rebuild bare resume command to include schema defaults (gastownhall#799)

…readiness-detection fix(profiles): add readiness tuning to cursor provider

…de-rig-locals fix(dolt-health): exclude rig-local Dolt servers from zombie scan

…all#1208) ## Summary Fixes a build break on darwin introduced in c3e6f17. `unix.Stat_t.Dev` is `int32` on darwin / BSDs and `uint64` on Linux; the new `readRegularFileSnapshot` (`internal/fsys/read_regular_unix.go`) assigned it directly to `fileIdentity.dev` (`uint64`), so the build fails on darwin with: ``` cannot use stat.Dev (variable of type int32) as uint64 value in struct literal ``` This keeps the direct cast in the unix snapshot path, aligns the reflective identity path with the same signed-device bit preservation, adds in-package coverage for both synthetic signed-device handling and the real snapshot-to-Lstat identity flow, and wires a Darwin compile guard into `make test` / `make test-cover`. Closes gastownhall#1207 ## Testing - [x] `go test ./internal/fsys` - [x] `make test-fsys-darwin-compile` - [x] `make check` (maintainer follow-up; local pre-commit hook) - [ ] `make check-docs` — N/A, no docs touched - [ ] `make test-integration` — N/A, no runtime/controller/workflow change ## Checklist - [x] Linked an issue (gastownhall#1207) - [x] Added targeted `internal/fsys` regression coverage - [x] Added a default-path Darwin compile guard - [x] No user-facing surface — internal helper - [x] No breaking change Co-authored-by: Julian Knutsen <julianknutsen@users.noreply.github.com>

Providers like OpenCode need different commands for ACP vs tmux transport (e.g. 'opencode acp' for ACP, bare 'opencode' for tmux). The existing Args field applies to both transports with no override. Add ACPCommand and ACPArgs fields to ProviderSpec and ResolvedProvider. When transport is 'acp', BuildProviderLaunchCommand uses ACPCommandString() which falls back field-by-field to Command/Args, allowing partial overrides.

…udge-complete fix(mail): nudge reply recipients from human

## Summary - run the triage-label workflow for reopened issues/PRs and PRs marked ready for review - skip draft PRs so they do not enter triage until ready - fail loudly if the workflow cannot resolve an issue/PR number ## Tests - git diff --check

…y-hardening Harden release provenance

…session-id fix(mail): use session bead id for senders

Three interacting bugs orphaned in_progress beads when pool sessions churned, leaving them invisible to every work_query tier (no assignee match, status not "ready"): 1. reapStaleSessionBeads closed any session whose tmux probe failed, including ones past startup that held active claims. Restrict to sessions stuck in the "creating" state (or with pending_create_claim set) — by design those cannot have claimed work yet, since claim is the worker's first post-startup action. Sessions past creating with a dead tmux are left for the lifecycle reconciler to restart so the original assignee resumes the work. 2. unclaimWorkAssignedToRetiredSessionBead and the default EffectiveOnDeath/EffectiveOnBoot shell hooks all cleared the assignee on in_progress beads but never reset status. Reset to "open" so a fresh worker can re-claim via Tier 3 of the work_query (gc.routed_to + --unassigned). 3. Belt-and-suspenders against any future close path that bypasses (1): closeBead now refuses to close a session bead while non-session work is still assigned to it. The reconciler relies on the assignee link to wake the session and resume claims; closing under live claims would strand the work. Callers that legitimately need to retire an active session must drain or unclaim first. Evidence on a live city: 17 codex-max session beads cycled in ~1 hour (9 "stale-session", plus drained/orphaned/duplicate). Four PR-review finalize beads ended up status=in_progress + assignee="" + routed_to= gascity/workflows.codex-max, invisible to all four work_query tiers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Follow-up for closed PR gastownhall#1340.\n\nPreserves Julian Knutsen's original order-check cache work and includes the reviewed cold-cache fallback fix plus generated API/dashboard artifacts.

…er-orphan-claims-complete-20260428 fix(session): preserve in_progress claims across worker churn

…ync-session-starts-complete perf(sessions): enqueue reconciler starts asynchronously

…tions-permissions chore: reduce workflow token permissions

…lease-security chore: harden release security docs and publishing

…hell-trust-boundaries Harden controller shell trust boundaries

…-codeql Add fork-safe advanced CodeQL workflow

…-deps Pin build and CI dependencies

Polecats were creating duplicate PRs against the same bead because re-spawn didn't check whether another polecat alias already had an open branch/PR for that bead. Triage of pa-9zyx4 found four such double-PR cases (pa-v4oi6.7, .19, .23, .24) where polecat/capable/* and polecat/rictus/* simultaneously targeted the same issue ID. Add a check in mol-polecat-work.workspace-setup that runs before fresh branch creation. When metadata.branch is empty: - Scan origin for any polecat/* branch carrying this bead ID, matched precisely (anchored to '/' before and end/'@'/'-' after, so pa-wgb8g does not match pa-wgb8g2 or pa-wgb8g.2). - Same-alias match → adopt as ours by setting metadata.branch; the existing rejection-aware-resume path checks it out below. - Different-alias match → don't open a duplicate. Mail the witness with both branches, drop the claim back to the pool, drain-ack and exit. - Both → adopt our own; nudge witness about the other-alias branches. Bumps formula version 8 → 9.

…gastownhall#1203) ## Summary The session-first migration (`dd90ac0a`, Mar 8 2026) deleted the Agent Protocol primitive (`internal/agent/agent.go` and the `Agent`/`Handle` interfaces) and moved its responsibilities into `internal/session/` (lifecycle) and `internal/runtime/` (providers). Commit `be8debd8` then renamed `agent.*` events to `session.*`. The primitive list in AGENTS.md and the architecture docs in `engdocs/architecture/` continued to describe the deleted abstraction — for ~50 days. This PR aligns the docs with HEAD. **No code changes.** ## Changes - **Renamed** `engdocs/architecture/agent-protocol.md` → `session.md` (88% similarity preserved by `git mv`). Title and summary rewritten to frame as primitive #1 (Session) with a History note pointing at `dd90ac0a`. - **AGENTS.md**: - Renamed primitive #1 from "Agent Protocol" to "Session"; reference `agent.SessionNameFor` (in the `agent` helper package, not `session`) - Fixed Messaging derivation: `Session.Nudge()` (delegating to `runtime.Provider.Nudge()`) — `SendPrompt()` doesn't exist - Fixed Health Patrol derivation language - Updated progressive capability table (Level 0-1: Agent → Session) - Replaced `agent` with `worker` in the canonical-domain list; footnoted `internal/agent/` as a small helper package - Added new **"Active migrations"** subsection documenting: - Worker boundary migration (started `12a0a848`, in progress) — names the CI test `TestGCNonTestFilesStayOnWorkerBoundary` that enforces non-test `cmd/gc/` files route through `worker.Handle` - Session-first migration (completed `dd90ac0a`) - **`engdocs/architecture/`**: updated `nine-concepts.md`, `index.md`, `glossary.md`, `messaging.md`, `dispatch.md`, `controller.md`, `prompt-templates.md` — Agent Protocol references replaced with Session, broken links fixed. - **`event-bus.md`**: rewrote the event-type constants table from `Agent*`/`agent.*` to `Session*`/`session.*` (matching `internal/events/events.go:20-38` exactly, including the new `SessionUpdated` constant the old table was missing). Renamed `AutomationFired`/`Completed`/`Failed` to `OrderFired`/`Completed`/`Failed`. Updated the storage-format example. - **`health-patrol.md`**: fixed six event names renamed by `be8debd8` (`agent.*` → `session.*`); replaced obsolete description of the removed `internal/agent.Agent` interface with current helper-package responsibilities; disambiguated the Erlang/OTP table's "Worker" row from gascity's `internal/worker/` package. - **`test/integration/E2E-PROVIDER-GAPS.md`**: three references to `agent.started` → `session.woke`. - **`examples/gastown/packs/gastown/formulas/mol-digest-generate.toml`**: runnable formula's `gc events --type=...` filter updated from broken `agent.*` event names to current `session.*` names. - Refreshed "Last verified against code" stamps on all touched docs to 2026-04-25. ## Why The spec maintenance rule (originally in `specs/architecture.md`, now distributed across the per-doc "Last verified" stamps) requires updating docs in the same commit as referenced symbol renames. That discipline wasn't honored on `dd90ac0a` or `be8debd8`, leaving the doc set describing a deleted primitive. A new contributor reading AGENTS.md was being directed to look for an abstraction that doesn't exist. ## Test plan - [x] `make build` clean - [x] `make check` clean (full `./...` suite green) - [x] `make check-docs` clean - [x] Repo-wide grep for `agent-protocol.md` returns 0 hits (no broken cross-doc links) - [x] All `Agent Protocol` references that remain are in intentional History/migration footnotes (4 files) - [x] All `session.*` event constants in docs match `internal/events/events.go:20-38` exactly - [x] `agent.*` event references remain only in `engdocs/archive/` (3 files, deliberately preserved historical analysis) ## Pre-flight review Multi-agent review pipeline (3 Anthropic reviewers + Gas City contributor checker) iterated 2 rounds before convergence. Iteration 1 caught a blocker (the worker-boundary claim was factually wrong) plus 5 majors; iteration 2 verified all fixes and approved with zero remaining findings. Codex and Copilot CLI were unresponsive in this session and skipped per the review skill's documented fallback policy. --------- Co-authored-by: sjarmak <sjarmak@users.noreply.github.com> Co-authored-by: Julian Knutsen <julianknutsen@users.noreply.github.com>

Jim Wordelman and others added 30 commits April 22, 2026 14:05

fix: atomically materialize system pack files

32b2a64

fix: persist supervisor provider env vars

62e4881

fix: avoid no-op system pack rewrites

c3e6f17

fix: secure supervisor service files

4367ec3

Merge pull request gastownhall#1172 from gastownhall/fix/system-pack-…

7ed2d38

…atomic-materialize fix: atomically materialize system pack files

fix: refresh cache from bd hook events

584c265

Merge pull request gastownhall#1198 from gastownhall/fix/supervisor-e…

73f52d5

…nv-vars fix: persist supervisor provider env vars

ci: publish asset-based Homebrew formula

3c024cc

docs: document gascity core and emergency tap paths

07005b5

managed dolt: disable upstream load-avg auto-GC scheduler (workaround…

48a1e9b

… dolt#10944) (gastownhall#1200)

Fix gc-beads-bd empty GC scheduler override

81ab8dd

Merge pull request gastownhall#1200 from gastownhall/dolt-gc-schedule…

167423b

…r-disable managed dolt: disable upstream load-avg auto-GC scheduler (workaround dolt#10944)

Reduce cmd/gc test runtime under two minutes (gastownhall#1245)

2ea55af

fix: preserve cmd/gc process coverage and dolt cleanup

72139e9

Merge pull request gastownhall#1246 from gastownhall/fix/cmd-gc-runti…

3341b5a

…me-under-2m-complete Reduce cmd/gc test runtime under two minutes

test: cover settings rebuild in API resume regression

4630840

test(profiles): lock cursor readiness hints

e4891b9

Add a regression test for the builtin cursor provider so future cleanup cannot silently drop its readiness prefix or readiness delay and recreate the startup deadline_exceeded loop.

fix(dolt-health): parse quoted rig ports

4b578dc

Merge pull request gastownhall#1125 from quad341/fix/799-control-disp…

e5783dc

…atcher-pool-resume-flags fix: rebuild bare resume command to include schema defaults (gastownhall#799)

Merge pull request gastownhall#1133 from m1ddl3w4r3/fix/cursor-agent-…

70c6e79

…readiness-detection fix(profiles): add readiness tuning to cursor provider

Merge pull request gastownhall#1218 from boylec/fix/dolt-health-exclu…

da057be

…de-rig-locals fix(dolt-health): exclude rig-local Dolt servers from zombie scan

fix: include protocolVersion in ACP initialize handshake

0ef7ce4

julianknutsen and others added 28 commits April 28, 2026 08:18

fix(mail): queue reply nudges for exec provider

b86d5ff

Merge pull request gastownhall#1404 from gastownhall/fix/mail-reply-n…

fde8e6e

…udge-complete fix(mail): nudge reply recipients from human

Merge pull request gastownhall#1405 from gastownhall/chore/p1-securit…

3facf10

…y-hardening Harden release provenance

fix(mail): preserve sender display names

4f0187f

Merge pull request gastownhall#1375 from gastownhall/fix/mail-sender-…

19e8dc7

…session-id fix(mail): use session bead id for senders

fix(session): preserve pending-create recovery on cleanup

638b284

perf(orders): cache order check read model (gastownhall#1408)

36e15be

Follow-up for closed PR gastownhall#1340.\n\nPreserves Julian Knutsen's original order-check cache work and includes the reviewed cold-cache fallback fix plus generated API/dashboard artifacts.

Merge pull request gastownhall#1407 from gastownhall/fix/session-reap…

45b581b

…er-orphan-claims-complete-20260428 fix(session): preserve in_progress claims across worker churn

perf: enqueue session starts asynchronously

283d658

fix: guard async session start commits

d57a64d

Merge pull request gastownhall#1409 from gastownhall/adopt/pr-1341-as…

0d4ec3a

…ync-session-starts-complete perf(sessions): enqueue reconciler starts asynchronously

chore: reduce workflow token permissions

0e5a928

Merge pull request gastownhall#1415 from gastownhall/chore/gascity-ac…

83eb7d8

…tions-permissions chore: reduce workflow token permissions

chore: harden gascity release security

6b44fc2

ci: grant label workflows pull request permissions

27b21e7

Merge pull request gastownhall#1416 from gastownhall/chore/gascity-re…

f345761

…lease-security chore: harden release security docs and publishing

harden controller shell trust boundaries

2b4767b

Merge pull request gastownhall#1423 from gastownhall/harden/gascity-s…

0dd0b4e

…hell-trust-boundaries Harden controller shell trust boundaries

Add fork-safe CodeQL workflow

b470386

Merge pull request gastownhall#1450 from gastownhall/harden/fork-safe…

96d330a

…-codeql Add fork-safe advanced CodeQL workflow

Add OpenSSF Scorecard workflow

465e450

Add OpenSSF Scorecard workflow

43fb18b

chore: pin build tool dependencies

1c92d20

test: isolate provider binary lookup test

76a5d48

Merge pull request gastownhall#1458 from gastownhall/harden/pin-build…

3ecdc72

…-deps Pin build and CI dependencies

github-actions Bot added the status/needs-triage label Apr 29, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(polecat): pre-spawn duplicate-branch check in mol-polecat-work#1

fix(polecat): pre-spawn duplicate-branch check in mol-polecat-work#1
vbtcl wants to merge 148 commits intomainfrom
fix/polecat-dup-branch-check

vbtcl commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

11 participants

Conversation

vbtcl commented Apr 29, 2026

Summary

Source bead

Test plan

Notes for reviewers

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

11 participants