feat(extension): Add BuildManager interface for CI/CD integration#106
feat(extension): Add BuildManager interface for CI/CD integration#106djuloori2794 wants to merge 1 commit into
Conversation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| // Returns: | ||
| // - string: Unique build ID that can be used with Poll and CancelBuild methods | ||
| // - error: ErrInvalidRequest if validation fails | ||
| Schedule(ctx context.Context, queueName string, changes []entity.BuildChange) (string, error) |
There was a problem hiding this comment.
does it make sense to say just "TriggerBuild", whether it's done right away vs scheduled...it's impl details?
There was a problem hiding this comment.
I think this is another constraint that we need to put on the implementation by explicitly stating the expected behavior in comments. Which is: implementation should be asynchronous, return fast, and Poll is to be used to check statuses.
There was a problem hiding this comment.
which means that "done right away" is not an option to the implementation.
If we believe that the implementation can deduce builds which already happen based on provided change, it can return the build status along with the buildID.
"123, accepted, nil"
"124, succeeded, nil"
So SQ will save on not doing the next polling call.
| // - BuildStatus: Current state of the build | ||
| // - BuildMetadata: Additional metadata about the build (e.g., build URL, commit SHA, duration) | ||
| // - error: ErrBuildNotFound if the build doesn't exist | ||
| Poll(ctx context.Context, buildID string) (entity.BuildStatus, entity.BuildMetadata, error) |
There was a problem hiding this comment.
I would rename this to `BuildStatus".. poller is just an impl detail?
There was a problem hiding this comment.
Not going to nitpick names, but here again is where behavior matters. We can say in the comments, that unlike "Schedule" this implementation might be synchronous and potentially lengthy.
Full abstractions are not efficient unless they are implemented according to expected usage.
Unfortunately, the language itself has limited ways to contract this so we have to rely on conventions (explicit in comments).
| // | ||
| // Returns: | ||
| // - error: ErrBuildNotFound if the build doesn't exist | ||
| CancelBuild(ctx context.Context, buildID string) error |
There was a problem hiding this comment.
we can drop Build and say Cancel
| "github.com/uber/submitqueue/entity" | ||
| ) | ||
|
|
||
| // BuildManager is a vendor-agnostic interface for managing builds with external CI/CD providers. |
There was a problem hiding this comment.
can we avoid saying CI/CD completely in this interface as it can be anything as per impl...
|
|
||
| // BuildManager is a vendor-agnostic interface for managing builds with external CI/CD providers. | ||
| // Implementations provide integration with specific CI systems (BuildKite, Jenkins, etc.) | ||
| // to schedule builds, poll their status, and cancel running builds. |
There was a problem hiding this comment.
poll is again a semantics of impl
There was a problem hiding this comment.
Yet we made it a part of the interface.
| // Implementations are long-lived singletons (one per build provider) initialized at service | ||
| // startup, similar to Storage and other extension components. They should manage connection | ||
| // pooling, caching, and other resources for the lifetime of the service. | ||
| // |
There was a problem hiding this comment.
i would avoid saying this, maybe simplify it.
BuildManager is responsible for triggering, cancelling and checking status of builds generated by underlying implementations.
Implementations must be thread-safe.
There was a problem hiding this comment.
This is what I asked for so I take this comment.
I think comments need to provide an instruction of how the implementation will be used - as it will ultimately dictate the implementation itself. Can the implementation hold heavy state in the object, or should it use a shared state elsewhere?
A requirement on just thread safety may not be enough to create efficient implementations, as someone implementing them may oversynchronize for the sake of safety.
| // | ||
| // All implementations must be thread-safe and support concurrent operations. | ||
| type BuildManager interface { | ||
| // Schedule submits a list of changes to the CI provider for processing. |
| // The implementation is responsible for: | ||
| // - Looking up the job name from the queue configuration | ||
| // - Creating appropriate builds/jobs for each change based on its action | ||
| // - Handling dependencies between changes (order may be significant) |
There was a problem hiding this comment.
generally order is always important
| // | ||
| // The implementation is responsible for: | ||
| // - Looking up the job name from the queue configuration | ||
| // - Creating appropriate builds/jobs for each change based on its action |
There was a problem hiding this comment.
jobs is very impl specific? so we should just say build
| // - changes: List of changes to process. Order may be significant for dependencies. | ||
| // | ||
| // Returns: | ||
| // - string: Unique build ID that can be used with Poll and CancelBuild methods |
There was a problem hiding this comment.
replace Poll with BuildStatus or for checking the build status
|
|
||
| // BuildManager is a vendor-agnostic interface for managing builds with external CI/CD providers. | ||
| // Implementations provide integration with specific CI systems (BuildKite, Jenkins, etc.) | ||
| // to schedule builds, poll their status, and cancel running builds. |
There was a problem hiding this comment.
Yet we made it a part of the interface.
| // Implementations are long-lived singletons (one per build provider) initialized at service | ||
| // startup, similar to Storage and other extension components. They should manage connection | ||
| // pooling, caching, and other resources for the lifetime of the service. | ||
| // |
There was a problem hiding this comment.
This is what I asked for so I take this comment.
I think comments need to provide an instruction of how the implementation will be used - as it will ultimately dictate the implementation itself. Can the implementation hold heavy state in the object, or should it use a shared state elsewhere?
A requirement on just thread safety may not be enough to create efficient implementations, as someone implementing them may oversynchronize for the sake of safety.
| // Returns: | ||
| // - string: Unique build ID that can be used with Poll and CancelBuild methods | ||
| // - error: ErrInvalidRequest if validation fails | ||
| Schedule(ctx context.Context, queueName string, changes []entity.BuildChange) (string, error) |
There was a problem hiding this comment.
I think this is another constraint that we need to put on the implementation by explicitly stating the expected behavior in comments. Which is: implementation should be asynchronous, return fast, and Poll is to be used to check statuses.
| // Returns: | ||
| // - string: Unique build ID that can be used with Poll and CancelBuild methods | ||
| // - error: ErrInvalidRequest if validation fails | ||
| Schedule(ctx context.Context, queueName string, changes []entity.BuildChange) (string, error) |
There was a problem hiding this comment.
which means that "done right away" is not an option to the implementation.
If we believe that the implementation can deduce builds which already happen based on provided change, it can return the build status along with the buildID.
"123, accepted, nil"
"124, succeeded, nil"
So SQ will save on not doing the next polling call.
| // - BuildStatus: Current state of the build | ||
| // - BuildMetadata: Additional metadata about the build (e.g., build URL, commit SHA, duration) | ||
| // - error: ErrBuildNotFound if the build doesn't exist | ||
| Poll(ctx context.Context, buildID string) (entity.BuildStatus, entity.BuildMetadata, error) |
There was a problem hiding this comment.
Not going to nitpick names, but here again is where behavior matters. We can say in the comments, that unlike "Schedule" this implementation might be synchronous and potentially lengthy.
Full abstractions are not efficient unless they are implemented according to expected usage.
Unfortunately, the language itself has limited ways to contract this so we have to rely on conventions (explicit in comments).
| Schedules builds, polls their status, and cancels running builds. | ||
|
|
||
| ```go | ||
| type BuildManager interface { |
There was a problem hiding this comment.
plz avoid copying the code to the docs.
There was a problem hiding this comment.
examples on "how to" are ok
|
|
||
| **Thread-safety**: All implementations must be thread-safe and support concurrent operations. | ||
|
|
||
| **Singleton Design**: Implementations are long-lived singletons (one per build provider) initialized at service startup, similar to Storage and other extension components. They should manage connection pooling, caching, and other resources for the lifetime of the service. |
There was a problem hiding this comment.
Also important to note how transient errors are handled. Are the implementation expected to reestablish the connection if it is lost, and give back errors on interface requests in the interim?
## Summary Adds a vendor-agnostic `BuildManager` interface under `extension/build` so the orchestrator can drive builds against an external provider (BuildKite, Jenkins, an internal job runner, etc.) without coupling to any one of them. Ports the closed upstream PR #106, refined per its review feedback (naming, contract clarity, no CI/CD or caller-specific wording, no premature error sentinels). ### Interface `BuildManager` exposes four verbs — `Trigger`, `Status`, `Cancel`, `Close`: - `Trigger` returns `(buildID, status, error)`; an implementation that can deduce an already-finished build returns its terminal status so the caller can skip the first `Status` poll, otherwise `Accepted`. - `Status` may be synchronous and lengthy (a provider round trip); `Trigger`/`Cancel` must return promptly. - The godoc spells out what is required of an implementation: concurrency safety, internal recovery from transient connectivity failures, transient-only local state, and plain errors left for the controller to classify per `core/errs`. No custom error sentinels ship yet — they land with the first implementation that needs them. ### Entity changes - `entity.BuildStatus` collapses to `{Unknown, Accepted, Succeeded, Failed, Cancelled}` — drops `Queued`/`Running`/`Blocked` (provider-specific stages) and renames `Passed → Succeeded`. Orchestrator build/buildsignal references move to `Accepted`. - Adds `entity.ChangeAction`, `entity.BuildChange`, `entity.BuildMetadata` as `Trigger` inputs/outputs. ### Out of scope No backend implementation and no orchestrator wiring — those are follow-ups. ## Test Plan - ✅ \`make fmt && make lint\` - ✅ \`make check-tidy && make check-gazelle && make check-mocks\` - ✅ \`bazel build //...\` - ✅ \`bazel test //... --test_tag_filters=-integration\` — 33/33 (integration tests require Docker, not run here) ## Issues (none)
## Summary Adds a vendor-agnostic `BuildManager` interface under `extension/build` so the orchestrator can drive builds against an external provider (BuildKite, Jenkins, an internal job runner, etc.) without coupling to any one of them. Ports the closed upstream PR #106, refined per its review feedback (naming, contract clarity, no CI/CD or caller-specific wording, no premature error sentinels). ### Interface `BuildManager` exposes four verbs — `Trigger`, `Status`, `Cancel`, `Close`: - `Trigger` returns `(buildID, status, error)`; an implementation that can deduce an already-finished build returns its terminal status so the caller can skip the first `Status` poll, otherwise `Accepted`. - `Status` may be synchronous and lengthy (a provider round trip); `Trigger`/`Cancel` must return promptly. - The godoc spells out what is required of an implementation: concurrency safety, internal recovery from transient connectivity failures, transient-only local state, and plain errors left for the controller to classify per `core/errs`. No custom error sentinels ship yet — they land with the first implementation that needs them. ### Entity changes - `entity.BuildStatus` is `{Unknown, Accepted, Running, Succeeded, Failed, Cancelled}`. This drops the provider-specific `Queued`/`Blocked` stages and renames `Passed → Succeeded`, while keeping `Running` as a first-class non-terminal state so callers can distinguish a queued/accepted build from one that is actively executing. Orchestrator build/buildsignal references use `Accepted`. - Adds `entity.ChangeAction`, `entity.BuildChange`, `entity.BuildMetadata` as `Trigger` inputs/outputs. ### Out of scope No backend implementation and no orchestrator wiring — those are follow-ups. ## Test Plan - ✅ \`make fmt && make lint\` - ✅ \`make check-tidy && make check-gazelle && make check-mocks\` - ✅ \`bazel build //...\` - ✅ \`bazel test //... --test_tag_filters=-integration\` (integration tests require Docker, not run here) ## Issues (none)
Add vendor-agnostic BuildManager interface for managing builds with external CI providers (BuildKite, Jenkins, etc.). Enables orchestrator to schedule builds, poll status, and cancel running builds.