feat(extension): add BuildManager extension for CI/CD integration#65
feat(extension): add BuildManager extension for CI/CD integration#65djuloori2794 wants to merge 18 commits into
Conversation
| // | ||
| // For database storage, this single string format avoids the complexity of | ||
| // composite keys and makes it easy to use as a primary key or index. | ||
| type BuildID string |
There was a problem hiding this comment.
we can just store string...
There was a problem hiding this comment.
We probably want to discuss how we want to type this and similar. I am seeing typed approach and "raw string" approach used between different entities.
|
|
||
| // BuildState represents the execution state of a build. | ||
| // This is a string enum following SubmitQueue's pattern of using string enums with sentinel values. | ||
| type BuildState string |
There was a problem hiding this comment.
https://github.com/uber/submitqueue/blob/main/entity/build.go#L8C2-L8C19 we should move it here
Add vendor-agnostic BuildManager interface for managing builds with external CI providers (BuildKite, Jenkins, etc.). Enables orchestrator to schedule builds, poll status, and cancel running builds. Key components: - BuildManager interface with ScheduleBuild, Poll, CancelBuild methods - BuildID string type with URI format (provider://id) - BuildState enum with IsTerminal() for state management - BuildStatus with timestamps, URLs, and metadata - Sentinel errors for common failure modes Implementations: - mock/: mockgen-generated gomock mocks for unit testing - inmemory/: functional in-memory implementation for integration testing - Simulates async build execution with goroutines - Deterministic test behavior based on batch IDs - State callbacks for testing without time.Sleep Test coverage: - Complete coverage of BuildState.IsTerminal() business logic - BuildID parsing and construction - In-memory implementation with state transitions - Mock compilation and usage verification Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…d code - Simplify ScheduleBuild to 3 parameters (head, base, jobName) - Change all BuildID types to plain string - Return entity.BuildStatus enum directly from Poll - Remove BuildStatusBlocked state - Remove in-memory test implementation (use gomock instead) - Remove unused entity/build package (BuildID, BuildStatus struct) - Consolidate build state into entity.BuildStatus enum - Fix go:generate directive to use mockgen directly This reduces the interface from 10 parameters to 3, removes ~1000 lines of unused code, and eliminates custom types in favor of primitives. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
255217e to
6ca9bc9
Compare
| @@ -0,0 +1,40 @@ | |||
| package build | |||
There was a problem hiding this comment.
In Go, it is not idiomatic to have each type in a separate file
| // | ||
| // For database storage, this single string format avoids the complexity of | ||
| // composite keys and makes it easy to use as a primary key or index. | ||
| type BuildID string |
There was a problem hiding this comment.
We probably want to discuss how we want to type this and similar. I am seeing typed approach and "raw string" approach used between different entities.
| expectError: false, | ||
| }, | ||
| { | ||
| name: "valid mock ID", |
There was a problem hiding this comment.
from a functional standpoint, "valid jenkins ID" and "valid mock ID" are the same tests.
AI is clearly overdid on this one.
| ) | ||
|
|
||
| // TestBuildState_IsTerminal tests the IsTerminal method for all build states. | ||
| func TestBuildState_IsTerminal(t *testing.T) { |
There was a problem hiding this comment.
change detector test. likely not needed at all.
| for _, tc := range testCases { | ||
| t.Run(tc.name, func(t *testing.T) { | ||
| assert.Equal(t, tc.expected, string(tc.state), | ||
| "BuildState constant should have expected string value") |
There was a problem hiding this comment.
AI..... please stop....
| } | ||
|
|
||
| // TestBuildState_TerminalStates verifies all terminal states are accounted for. | ||
| func TestBuildState_TerminalStates(t *testing.T) { |
There was a problem hiding this comment.
this change detector test repeats another change detector test TestBuildState_IsTerminal
|
|
||
| // FinishedAt is the Unix timestamp in milliseconds when the build finished (passed, failed, or cancelled). | ||
| // Zero if the build has not finished yet. | ||
| FinishedAt int64 |
There was a problem hiding this comment.
why do we need all these fields in SQ?
| @@ -0,0 +1,62 @@ | |||
| package build | |||
There was a problem hiding this comment.
seems my comments on prev revision do not reflect inline on this revision, but please check them as some of them are still relevant.
| ScheduleBuild( | ||
| ctx context.Context, | ||
| head string, | ||
| base []string, |
There was a problem hiding this comment.
ScheduleBuild implementation should probably not rely on its knowledge of the Batches.
I think we need to translate Batches to real buildable arguments before we call this function.
There was a problem hiding this comment.
would it make sense to have list of Change(s) to be applied on main and list of Change(s) for head?
ScheduleBuild(ctx context.Context, queue string, base []Change, head []Change)
There was a problem hiding this comment.
I was having a same impression, but after discussing with @behinddwalls we thought all of these could go into the implementation details.
| Poll(ctx context.Context, id string) (entity.BuildStatus, error) | ||
|
|
||
| // CancelBuild requests cancellation of a queued or running build. | ||
| // Builds that have already completed cannot be cancelled. |
There was a problem hiding this comment.
SQ can still request it. It's up for the implementation how it wants to process it at this point.
| // - error: ErrBuildNotFound if the build doesn't exist, ErrProviderUnavailable if CI provider is unreachable | ||
| Poll(ctx context.Context, id string) (entity.BuildStatus, error) | ||
|
|
||
| // CancelBuild requests cancellation of a queued or running build. |
There was a problem hiding this comment.
Worth saying that the implementation is asynchronous and does not need to wait for the cancellation to complete.
| // Returns: | ||
| // - error: ErrBuildNotFound if the build doesn't exist, | ||
| // ErrBuildNotCancellable if the build has already finished, | ||
| // ErrProviderUnavailable if the CI provider is unreachable |
There was a problem hiding this comment.
Do we need a special error for this, if yes, why?
The provider may be reachable but error out this request for another reason.
There was a problem hiding this comment.
The intent here is to treat provider availability failures differently from request failures.
| ctx context.Context, | ||
| head string, | ||
| base []string, | ||
| jobName string, |
There was a problem hiding this comment.
seems like this is in queconfig now? What about just passing the QueName?
There was a problem hiding this comment.
instead of jobName?
… with actions Refactor BuildManager interface based on review feedback to operate on individual changes (diff IDs, PR numbers) rather than batches, and to accept explicit actions for each change. Changes: - Add BuildAction enum (validate, apply) in entity/build.go - Add BuildChange struct with ChangeID and Action fields - Replace BuildManager.ScheduleBuild() with Schedule() - Takes queueName parameter for job configuration lookup - Takes []BuildChange instead of batch-focused parameters - Returns only error (implementations track builds internally) - Update mocks and tests to match new interface signature The new interface allows callers to specify what action to perform on each change (validate for testing, apply for landing), with the implementation responsible for looking up job configuration based on queue name. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…cepted Address review feedback to improve BuildManager interface design and add missing build status for better lifecycle tracking. Changes: - Add BuildStatusAccepted status to entity/build.go - Represents build accepted by CI provider (between queued and running) - Non-terminal state in the build lifecycle - Added test coverage in entity/build_test.go - Update BuildManager.Schedule to return build ID - Returns (string, error) instead of just error - Build ID enables callers to poll and cancel builds - Updated mocks and tests to match new signature - Enhance BuildManager interface documentation - Clarify implementations can be singletons or on-demand instances - Allows flexibility based on CI provider requirements - Clarify CancelBuild behavior - Document asynchronous operation (fire-and-forget) - Remove restriction on terminal state cancellation - Implementation decides how to handle terminal state requests - Callers should use Poll to verify cancellation Build lifecycle now: queued → accepted → running → passed/failed/cancelled Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nges Update README.md to reflect all recent changes to BuildManager interface and entity types. Changes: - Update interface signature from ScheduleBuild to Schedule - New parameters: queueName and []BuildChange instead of head, base, jobName - Document Close() method - Add documentation for new types - BuildChange struct (ChangeID, Action) - BuildAction enum (validate, apply) - Update BuildStatus lifecycle - Add BuildStatusAccepted state - Document lifecycle: queued → accepted → running → passed/failed/cancelled - Enhance interface documentation - Clarify singleton vs on-demand implementation flexibility - Document CancelBuild asynchronous behavior - Update terminal state cancellation handling - Update usage examples - Show new Schedule API with BuildChange - Update mock examples - Add cancellation workflow example - Update provider implementation guide - Add queue config integration steps - Document action handling (validate vs apply) - Add implementation design choice guidance - Revise design principles - Replace "No persistent state" with "Implementation flexibility" - Add "Change-focused" and "Asynchronous cancellation" Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| const ( | ||
| // BuildActionUnknown is the sentinel value for uninitialized actions. | ||
| BuildActionUnknown BuildAction = "" | ||
| // BuildActionValidate runs validation/testing on the change without applying it. |
There was a problem hiding this comment.
actually, it will have to apply it first
| } | ||
|
|
||
| // BuildAction defines the action to perform on a change submitted to the build system. | ||
| type BuildAction string |
Simplify BuildManager interface by removing unnecessary build statuses and removing unexpected errors from the interface contract. Changes to BuildStatus enum: - Remove BuildStatusQueued - no longer needed - Remove BuildStatusRunning - no longer needed - Keep only 4 statuses: accepted, passed, failed, cancelled - Update lifecycle: accepted → passed/failed/cancelled - Update tests to remove queued and running test cases Changes to BuildManager interface: - Remove ErrProviderUnavailable from all method contracts - Schedule: only ErrInvalidRequest expected - Poll: only ErrBuildNotFound expected - CancelBuild: only ErrBuildNotFound and ErrBuildNotCancellable expected - Implementations can still handle provider errors internally - Only expected/actionable errors are part of the interface contract Documentation updates: - Update BuildStatus enum in README - Update build lifecycle diagram - Remove ErrProviderUnavailable from error handling section - Remove 5xx error mapping from implementation guide - Update ErrBuildNotCancellable description The simplified interface makes it clearer what errors callers need to handle explicitly, while implementations retain flexibility to handle provider-specific issues internally. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…enum Enhance BuildManager.Poll to return build metadata and reorder BuildAction constants for better semantic ordering. Changes to Poll method: - Add map[string]string return value for build metadata - Document common metadata keys (build_url, commit_sha, duration_ms, etc.) - Update interface signature: Poll(ctx, id) → (BuildStatus, map[string]string, error) - Update mocks and tests to handle metadata return value Changes to BuildAction enum: - Reorder constants: BuildActionApply now comes before BuildActionValidate - Apply is the primary action, validate is secondary - Update all tests and examples to reflect new ordering Documentation updates: - Add Build Metadata section explaining common keys - Update all Poll usage examples to show metadata handling - Update mock examples with metadata assertions - Clarify that metadata keys are implementation-defined The metadata map allows implementations to return provider-specific information (build URLs, commit SHAs, timestamps, etc.) without changing the interface. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| ScheduleBuild( | ||
| ctx context.Context, | ||
| head string, | ||
| base []string, |
There was a problem hiding this comment.
would it make sense to have list of Change(s) to be applied on main and list of Change(s) for head?
ScheduleBuild(ctx context.Context, queue string, base []Change, head []Change)
Remove ErrProviderUnavailable and its helper functions since it's no longer part of the BuildManager interface contract. Update remaining error comments to reflect current interface. Changes: - Remove ErrProviderUnavailable, IsProviderUnavailable, WrapProviderUnavailable - Update ErrInvalidRequest comment to reference Schedule parameters (queueName, changes) instead of outdated parameters (baseSHA, batchToBeTest.ID, etc.) - Update ErrBuildNotCancellable comment to clarify conditions are implementation-defined Only expected errors (ErrBuildNotFound, ErrBuildNotCancellable, ErrInvalidRequest) remain in errors.go, matching the BuildManager interface contract. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rename BuildStatusPassed to BuildStatusSucceeded for better semantic clarity. Change the status value from "passed" to "succeeded". Changes: - Rename BuildStatusPassed → BuildStatusSucceeded - Change status value "passed" → "succeeded" - Update IsTerminal() method comment and implementation - Update all test cases from "passed is terminal" → "succeeded is terminal" - Update all documentation references from passed → succeeded - Update mock tests to use BuildStatusSucceeded - Update error comments to reference succeeded instead of passed Build lifecycle now: accepted → succeeded/failed/cancelled The term "succeeded" is more consistent with standard CI/CD terminology and provides clearer semantics for the successful completion state. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…CancelBuild Rename parameter from generic 'id' to more descriptive 'buildID' in Poll and CancelBuild methods for better clarity and consistency. Changes: - Poll(ctx, id) → Poll(ctx, buildID) - CancelBuild(ctx, id) → CancelBuild(ctx, buildID) - Update parameter documentation from "id:" to "buildID:" - Update README interface signature - Regenerate mocks with updated parameter names The more descriptive parameter name makes the API clearer and more self-documenting. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rename BuildAction type to ChangeAction for better semantic clarity and consistency with the domain model. Changes: - Rename type: BuildAction → ChangeAction - Rename constants: - BuildActionUnknown → ChangeActionUnknown - BuildActionApply → ChangeActionApply - BuildActionValidate → ChangeActionValidate - Update all references in tests and documentation - Update BuildChange struct to use ChangeAction type - Update all usage examples in README - Update implementation guide references The term "ChangeAction" is more semantically accurate as it describes the action to perform on a change, not on a build. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace raw map[string]string with named BuildMetadata type in Poll method. This provides better type safety and allows for future extension with methods. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update documentation to use entity.BuildMetadata instead of map[string]string. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| @@ -24,17 +21,12 @@ const ( | |||
| // BuildStatusCancelled indicates the build was cancelled before completion. | |||
There was a problem hiding this comment.
Important to mention that this is SQ that canceled the build. Any Build system originated cancellation is an error in SQ's world.
| ChangeActionUnknown ChangeAction = "" | ||
| // ChangeActionApply applies the change to the target branch. | ||
| ChangeActionApply ChangeAction = "apply" | ||
| // ChangeActionValidate runs validation/testing on the change without applying it. |
There was a problem hiding this comment.
It still has to apply the change first, then build targets changed in this change vs previous change in a chain
| // ChangeID is the unique identifier for this change. | ||
| // This is typically a diff ID (e.g., "D12345") or PR number (e.g., "42"), | ||
| // depending on the source control provider. | ||
| ChangeID string |
There was a problem hiding this comment.
This has to be either Change structure (already defined in entity model) or list or change URLs
|
|
||
| **Thread-safety**: All implementations must be thread-safe and support concurrent operations. | ||
|
|
||
| **Implementation Design**: Implementations may be designed as heavy singletons (with connection pooling and caching) or lightweight instances created on-demand, depending on the CI provider's requirements and implementation strategy. |
There was a problem hiding this comment.
I assume we initialize build manager only once (at least one per each build provider), which means that the implementation needs to be long living. Similar to other components like Storage.
|
|
||
| ### Cancelling Builds | ||
|
|
||
| Cancellation is **asynchronous** - the method initiates the cancellation request and returns immediately. Use `Poll` to check if the build has transitioned to `BuildStatusCancelled`. |
There was a problem hiding this comment.
I think BuildStatusCancelled will be set without confirmation from the build system.
| - **Batch operations**: Poll/cancel multiple builds at once | ||
| - **Streaming logs**: Real-time log streaming from builds | ||
| - **Build caching**: Cache build status to reduce API calls | ||
| - **Partial cancellation**: Cancel individual changes in a multi-change build |
| // | ||
| // Returns: | ||
| // - error: ErrBuildNotFound if the build doesn't exist, | ||
| // ErrBuildNotCancellable if the build cannot be cancelled (implementation-specific) |
There was a problem hiding this comment.
I do not think it needs to return ErrBuildNotCancellable
From SW perspective, it should be an idempotent operation. The build needs to be stopped only to save capacity, its results are not relevant anymore, if it managed to complete or fail it should not be an error.
An error is when an infra failure occurs within the Build ecosystem processing this request. Even in this case SQ will likely only need to see this error for observability and can still proceed with the workflow.
| CancelBuild(ctx context.Context, buildID string) error | ||
|
|
||
| // Close gracefully shuts down the build manager. | ||
| // Implementations should cancel pending requests, close HTTP clients, and clean up resources. |
There was a problem hiding this comment.
should they cancel with error?
- Change BuildChange to use Change entity instead of string ChangeID for better type safety - Add context parameter to Close() method for shutdown timeout control - Remove ErrBuildNotCancellable error - CancelBuild now only returns ErrBuildNotFound - Clarify BuildStatusCancelled is set immediately by SubmitQueue without waiting for build system confirmation - Update ChangeActionValidate comment to reflect it applies the change first, then validates - Clarify implementations are long-lived singletons initialized at service startup - Update README with all interface changes and usage examples Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update ChangeActionValidate comment to explicitly describe the sequence: applies the change first, then validates by running respective validation/test suites. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove redundant "via CancelBuild" from comment - "cancelled by SubmitQueue" is sufficient. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the explanatory comment about context controlling shutdown timeout from the Close method documentation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove context parameter from Close() to be consistent with other extensions (Storage, Queue, Publisher, Subscriber) which all use the standard io.Closer pattern: Close() error Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
|
||
| This format allows the implementation to encode both the provider name and provider-specific build identifier in a single string. | ||
|
|
||
| ### Build Status Enum |
There was a problem hiding this comment.
avoid exact domain definitions in docs as they run out of sync. Prefer samples / examples, only when needed.
| // This is used by BuildManager to specify what changes to build and what action to perform. | ||
| type BuildChange struct { | ||
| // Change is the code change to process. | ||
| // This references the same Change entity used in Request, containing the source provider |
There was a problem hiding this comment.
at this moment Change is just a list of URLs and provider is encoded into schema part of it
There was a problem hiding this comment.
isn't line number 76 good enough?
| } | ||
| } | ||
|
|
||
| func TestBuildChange_Creation(t *testing.T) { |
| ) | ||
|
|
||
| if err != nil { | ||
| t.Fatalf("unexpected error: %v", err) |
| ) | ||
|
|
||
| // TestMockBuildManager_Compilation verifies the mock compiles and basic setup works. | ||
| func TestMockBuildManager_Compilation(t *testing.T) { |
There was a problem hiding this comment.
I do not think we need a test for a mock :)
| @@ -0,0 +1,60 @@ | |||
| package mock | |||
There was a problem hiding this comment.
use gomock bazel rule, refer to docs on how
| @@ -0,0 +1,347 @@ | |||
| # BuildManager Extension | |||
There was a problem hiding this comment.
plz consider significantly reducing this doc, only leave what's applicable to Build Manager specifically.
|
|
||
| - **Webhook support**: Accept push notifications from CI providers instead of polling | ||
| - **Build artifacts**: Track and retrieve build artifacts | ||
| - **Retry logic**: Automatic retry for transient failures |
There was a problem hiding this comment.
I think we'll handle retries on the Queue/Controller
| - **Webhook support**: Accept push notifications from CI providers instead of polling | ||
| - **Build artifacts**: Track and retrieve build artifacts | ||
| - **Retry logic**: Automatic retry for transient failures | ||
| - **Batch operations**: Poll/cancel multiple builds at once |
There was a problem hiding this comment.
I can remove it, but if we have to cancel multiple builds at once. Can't we do it in a single call if the provider supports it?
| - **Build artifacts**: Track and retrieve build artifacts | ||
| - **Retry logic**: Automatic retry for transient failures | ||
| - **Batch operations**: Poll/cancel multiple builds at once | ||
| - **Streaming logs**: Real-time log streaming from builds |
There was a problem hiding this comment.
also not sure if really needed.
a URL for the build may be good enough?
I suggest to avoid documenting speculations, which AI very likes to do.
There was a problem hiding this comment.
Hmm, definitely not a critical requirement. I can remove them!
|
Will just redo the PR! |
Add vendor-agnostic BuildManager interface for managing builds with external CI providers (BuildKite, Jenkins, etc.). Enables orchestrator to schedule builds, poll status, and cancel running builds.