feat(buildrunner): add Buildkite BuildRunner implementation#190
Open
JamyDev wants to merge 8 commits into
Open
feat(buildrunner): add Buildkite BuildRunner implementation#190JamyDev wants to merge 8 commits into
JamyDev wants to merge 8 commits into
Conversation
Implements buildrunner.BuildRunner and buildrunner.Factory backed by the Buildkite CI platform, under submitqueue/extension/buildrunner/buildkite. Design: - Trigger/Cancel return promptly (per the async/sync contract in the build-runner RFC): both enqueue work on a buffered channel and a background worker contacts Buildkite, keeping the orchestrator's queue loops decoupled from Buildkite availability. - The in-memory build-ID -> Buildkite-ref map is a pure latency cache, not the source of truth. The SQ build ID is stamped into the Buildkite build's metadata at create time; Status and Cancel re-derive the ref via a metadata-filtered build lookup on cache miss, so they remain correct after a process restart that empties the cache (no orphaned Accepted-forever builds). - The background worker retries transient create/cancel failures with backoff. If a submission exhausts its retries, the build is recorded as a submission failure and Status reports terminal Failed (with the reason in BuildMetadata["error"]) rather than polling Accepted forever. - mapState collapses Buildkite states into BuildStatus; unrecognised states map to the non-terminal Unknown (not Failed), so an unknown state does not terminally fail a batch. - Each job is dispatched to its own goroutine so a retrying submit does not head-of-line block other builds. Errors are returned plain (classification left to the controller, per core/errs). Includes config knobs (queue sizes, submit timeout, retry attempts/backoff, test overrides) and unit tests covering trigger/status/cancel, cache-miss recovery, retry-then-succeed, retry-exhaustion-fails, and state mapping. Known limitation: a process restart while a trigger job is still buffered (never submitted to Buildkite) leaves the build Accepted with nothing to recover; catching that belongs to an orchestrator-level Accepted deadline, which is out of scope for this change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
behinddwalls
reviewed
Jun 4, 2026
… via HTTP transport - Replace New(Config) with NewBuildRunner(Params) following the MergeChecker Params pattern: Config holds queue-specific settings, HTTPClient is injected by the caller with auth and base URL already configured via transports - Remove APIToken from Config; auth is now the caller's concern via a transport layer (e.g. Authorization-header RoundTripper) - Remove BaseURL from Config and client; client now uses relative paths so BaseURLTransport on the injected client resolves them to absolute URLs - Delete factory.go; the Factory belongs in the example server (per-queue wiring is an application-level concern, not an extension concern)
Replace the channel-based async machinery (triggerCh, cancelCh, background worker goroutine, retry loop, submitFailures map) with direct synchronous Buildkite API calls in Trigger and Cancel. Errors propagate to the caller so the queue consumer can nack and retry through the normal path. Config is simplified to QueueName only; the channel buffer sizes, submit timeout, max attempts, and backoff fields are removed. Tests drop the drainTrigger/drainCancel helpers and async-specific cases in favour of straightforward call-and-assert patterns.
behinddwalls
reviewed
Jun 5, 2026
behinddwalls
reviewed
Jun 5, 2026
Comment on lines
+89
to
+91
| if httpClient == nil { | ||
| httpClient = http.DefaultClient | ||
| } |
Collaborator
There was a problem hiding this comment.
should we fail given we expect transports to be wired up and be passed and if nil, there is no point is continueing?
behinddwalls
approved these changes
Jun 5, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Adds
submitqueue/extension/buildrunner/buildkite— aBuildRunnerimplementation backed by the Buildkite CI platform. The package is four files:buildkite.go—runnerstruct implementingBuildRunner.Trigger,.Status, and.Cancelclient.go— thin HTTP wrapper over the three Buildkite REST endpoints (create/get/cancel build)buildkite_test.go— 16 unit tests backed by an in-processhttptestserverBUILD.bazel— Bazel build fileTriggercalls the Buildkite API to create a build, encoding the base and head change URIs as JSON-encoded environment variables (SQ_BASE_URIS,SQ_HEAD_URIS,SQ_QUEUE). The Buildkitepipeline script uses these to fetch diffs via the GitHub API, apply them in order, and run CI.
StatusandCancelderive the Buildkite build number from theBuildIDstring — no localstate is needed.
Auth and base URL are injected via HTTP transport layers on the caller-provided
*http.Client(usinghttpclient.BaseURLTransport+ an auth transport), following the existinghttpclientpattern and keeping credentials out of the struct.
Why
The
BuildRunnerextension interface was designed to support multiple CI platforms as pluggable backends. This adds the first real (non-noop) implementation. Buildkite is the target CIplatform for speculative merges: its pipeline model maps cleanly onto the base/head separation in
BuildRunner.Trigger— the pipeline applies the base layer first (potentially cacheableacross batches), then the head layer, then runs the full test suite.
Using environment variables to pass change URIs keeps the SQ→Buildkite contract simple and auditable; the pipeline script is responsible for materialising the diffs, isolating SQ from
Buildkite's pipeline YAML.
Tests
All three interface methods are tested against an in-process
httptestserver — no external dependencies or mocks:Trigger: validates HTTP method, request payload (env var keys/values), URI flattening across multipleChangeentries, and that a nil base produces[](notnull) in JSONStatus: state mapping — exhaustive table over all documented Buildkite states (creating,scheduled,running,blocked,passed,failed,not_run,skipped,canceling,canceled) plus unknown future states (mapped to non-terminalUnknownso the poll loop keeps waiting rather than failing a batch on an unrecognised state); also validates build URL isreturned in metadata
Cancel: verifies the cancel endpoint is called; verifies 422 (already-terminal build) is treated as a no-op