## Summary
### Why?
The orchestrator's build stage needs to drive the `BuildRunner` contract
end-to-end: trigger the runner, persist the result, and poll `Status`
until terminal so the batch state machine can react. Polling has to
behave like the rest of the pipeline (queue-driven, partition-isolated,
restart-safe) rather than running as an in-process timer loop.
### What?
This PR stacks on top of the `BuildRunner` interface, noop, and
`PublishAfter` PRs and wires the runner into the orchestrator pipeline.
The build poll loop runs as queue traffic inside the existing
`buildsignal` consumer (no separate stage). On each delivery it loads the
`Build` from storage, calls `BuildRunner.Status`, persists the result via
`BuildStore.UpdateStatus`, publishes the batch ID to `speculate` so the
state machine re-evaluates, and re-publishes itself via
`Publisher.PublishAfter` until the build reaches a terminal state. A
webhook-capable backend can publish into the same topic — the consumer
cannot tell a poll-driven message from a push.
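The per-delivery flow described above can be sketched roughly as follows. This is a minimal, in-memory illustration rather than the PR's actual code: `fakeRunner`, `memQueue`, `handle`, `drain`, the `Status` constants, and the `Build` fields are all hypothetical stand-ins, and the sketch assumes the batch ID is published to `speculate` on every delivery, as the paragraph above describes.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical stand-ins; the PR's real types live in its own packages.
type BuildID string
type Status int

const (
	Accepted Status = iota
	Running
	Succeeded
	Failed
)

func (s Status) terminal() bool { return s == Succeeded || s == Failed }

type Build struct {
	ID      BuildID
	BatchID string
	Status  Status
}

// fakeRunner replays a scripted sequence of statuses, one per Status call.
type fakeRunner struct{ script []Status }

func (r *fakeRunner) Status(id BuildID) (Status, error) {
	if len(r.script) == 0 {
		return 0, fmt.Errorf("no scripted status left for %q", id)
	}
	s := r.script[0]
	r.script = r.script[1:]
	return s, nil
}

// memQueue records publishes instead of talking to a broker.
type memQueue struct {
	speculate []string        // batch IDs sent to the speculate topic
	delays    []time.Duration // delay of each PublishAfter re-publish
}

// pollDelay uses the default delays quoted in this description.
func pollDelay(s Status) time.Duration {
	if s == Accepted {
		return 5000 * time.Millisecond
	}
	return 2000 * time.Millisecond
}

// handle processes one delivery and reports whether it re-published itself.
func handle(id BuildID, r *fakeRunner, store map[BuildID]*Build, q *memQueue) (bool, error) {
	b, ok := store[id] // only the ID was on the queue; reload the Build
	if !ok {
		return false, fmt.Errorf("build %q not found", id)
	}
	st, err := r.Status(b.ID)
	if err != nil {
		return false, err
	}
	b.Status = st                                // UpdateStatus stand-in
	q.speculate = append(q.speculate, b.BatchID) // nudge the state machine
	if st.terminal() {
		return false, nil // loop ends: nothing left to poll
	}
	q.delays = append(q.delays, pollDelay(st)) // PublishAfter stand-in
	return true, nil
}

// drain plays the broker: it redelivers immediately, ignoring the delay,
// and returns the number of deliveries attempted.
func drain(id BuildID, r *fakeRunner, store map[BuildID]*Build, q *memQueue) (int, error) {
	for n := 1; ; n++ {
		again, err := handle(id, r, store, q)
		if err != nil || !again {
			return n, err
		}
	}
}

func main() {
	store := map[BuildID]*Build{"b1": {ID: "b1", BatchID: "batch-1"}}
	r := &fakeRunner{script: []Status{Accepted, Running, Succeeded}}
	q := &memQueue{}
	n, err := drain("b1", r, store, q)
	fmt.Println("deliveries:", n, "delays:", q.delays, "speculate:", len(q.speculate), "err:", err)
}
```

Because `handle` keeps no state between calls, a message published by a webhook-capable backend would take exactly the same path as a poll-driven one.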
Only the build **ID** travels on the queue (`entity.BuildID`); the
consumer reloads the full `Build` from `BuildStore`, keeping the message
small and storage the single source of truth — the same ID-on-the-queue,
load-from-storage pattern the rest of the pipeline already uses for
batches and requests. Both controllers use the runner's
`entity.BuildID`-based signatures: `Trigger` returns a build ID and
`Status` takes one.
Pieces:
- `orchestrator/controller/build`: assembles `base` from
`batch.Dependencies` and `head` from `batch.Contains`, calls
`Trigger`, persists the initial `Build{Accepted}` via
`BuildStore.Create` (`ErrAlreadyExists` is swallowed for redelivery),
publishes the build ID to `buildsignal`.
- `orchestrator/controller/buildsignal`: the polling consumer described
above. It loads the `Build` by ID, then polls. The re-publish delays
default to `PollDelayAcceptedMs=5000` and `PollDelayRunningMs=2000`
(declared as vars so tests can override them; a TODO notes they should
move into the `queueconfig` extension). Error classification: only the
`PublishAfter` re-schedule is wrapped as retryable
(`errs.NewRetryableError`). It is the poll loop's heartbeat, so a
transient enqueue failure nacks and replays (up to `MaxAttempts`)
instead of rejecting the loop's only live message straight to the DLQ.
Deserialization, the `Build` load, `Status`, `UpdateStatus`, and the
speculate publish stay non-retryable and reject to the DLQ on first
failure, where an operational republish is the recovery path.
- `example/server/orchestrator/main.go`: passes the `BuildRunner` to
both `build.NewController` and `buildsignal.NewController`; pipeline
diagram updated.
- root `BUILD.bazel`: adds `# gazelle:exclude .claude` so gazelle does
not index nested worktrees as duplicate rule definitions and corrupt
the canonical BUILD files.
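The error classification in the `buildsignal` piece above can be illustrated with a small sketch. Everything here is an assumption made for illustration: `RetryableError` stands in for whatever `errs.NewRetryableError` actually returns, and `classify`, `poll`, `steps`, and the `Outcome` values are hypothetical names. The sketch also assumes that a retryable error goes to the DLQ once `MaxAttempts` is exhausted, which this description does not spell out.

```go
package main

import (
	"errors"
	"fmt"
)

// RetryableError is a hypothetical stand-in for the errs package's type.
type RetryableError struct{ Err error }

func (e *RetryableError) Error() string { return "retryable: " + e.Err.Error() }
func (e *RetryableError) Unwrap() error { return e.Err }

// NewRetryableError marks err as safe to nack and replay.
func NewRetryableError(err error) error { return &RetryableError{Err: err} }

type Outcome string

const (
	Ack    Outcome = "ack"
	Nack   Outcome = "nack"   // redeliver the same message
	Reject Outcome = "reject" // send to the DLQ
)

// classify nacks retryable errors while attempts remain (attempt is
// 1-based) and rejects everything else on first failure.
func classify(err error, attempt, maxAttempts int) Outcome {
	if err == nil {
		return Ack
	}
	var re *RetryableError
	if errors.As(err, &re) && attempt < maxAttempts {
		return Nack
	}
	return Reject
}

// steps holds the two calls whose failures are classified differently.
type steps struct {
	status     func() error // stands in for BuildRunner.Status
	reschedule func() error // stands in for Publisher.PublishAfter
}

// poll wraps only the re-schedule failure as retryable.
func poll(s steps) error {
	if err := s.status(); err != nil {
		return fmt.Errorf("status: %w", err) // non-retryable
	}
	if err := s.reschedule(); err != nil {
		return NewRetryableError(fmt.Errorf("publish after: %w", err))
	}
	return nil
}

var errBoom = errors.New("transient failure")

func main() {
	ok := func() error { return nil }
	fail := func() error { return errBoom }
	fmt.Println(classify(poll(steps{status: ok, reschedule: fail}), 1, 3))
	fmt.Println(classify(poll(steps{status: fail, reschedule: ok}), 1, 3))
}
```

Keeping the wrapping at the single `PublishAfter` call site means the dispatcher only needs one `errors.As` check to decide between a nack and a reject.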
## Test Plan
- ✅ `bazel test //extension/buildrunner/... //orchestrator/controller/build/... //orchestrator/controller/buildsignal/... //extension/queue/...` — all pass.
- ✅ `make fmt lint check-tidy check-gazelle check-mocks` — clean.
- ✅ `make build` — all targets compile.
- New coverage: build controller persist+publish path (with
`ErrAlreadyExists` swallow), buildsignal poll loop (terminal forwards
to speculate, non-terminal re-publishes via `PublishAfter` with
per-status delay, retryable re-publish failure, non-retryable build-load
/ `Status` / `UpdateStatus` failures reject to DLQ).
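The `ErrAlreadyExists` swallow exercised by the new build controller coverage can be sketched as below. This is an illustrative stand-in, not the PR's test or implementation: `memStore`, `persistAccepted`, and the `Build` fields are hypothetical, and the sketch assumes a redelivered message resolves to the same build ID as the first delivery.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAlreadyExists is a local stand-in for the store's sentinel error.
var ErrAlreadyExists = errors.New("build already exists")

type BuildID string

type Build struct {
	ID     BuildID
	Status string
}

// memStore is an in-memory stand-in for BuildStore.
type memStore struct{ builds map[BuildID]Build }

// Create refuses to overwrite an existing build.
func (s *memStore) Create(b Build) error {
	if _, ok := s.builds[b.ID]; ok {
		return ErrAlreadyExists
	}
	s.builds[b.ID] = b
	return nil
}

// persistAccepted stores the initial build and treats a duplicate as
// success, so a redelivered message does not fail the controller.
func persistAccepted(s *memStore, id BuildID) error {
	err := s.Create(Build{ID: id, Status: "Accepted"})
	if err != nil && !errors.Is(err, ErrAlreadyExists) {
		return fmt.Errorf("create build %q: %w", id, err)
	}
	return nil
}

func main() {
	s := &memStore{builds: map[BuildID]Build{}}
	fmt.Println(persistAccepted(s, "b1")) // first delivery
	fmt.Println(persistAccepted(s, "b1")) // redelivery, duplicate swallowed
	fmt.Println(len(s.builds))
}
```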