Skip to content

feat(speculate): naive happy-path speculation in speculate controller#148

Merged
sbalabanov merged 1 commit into
mainfrom
sergeyb/speculate
May 20, 2026
Merged

feat(speculate): naive happy-path speculation in speculate controller#148
sbalabanov merged 1 commit into
mainfrom
sergeyb/speculate

Conversation

@sbalabanov
Copy link
Copy Markdown
Contributor

Summary

Replaces the speculate controller's unconditional fan-out (always publishing to both build and merge) with a state-machine driver that advances a batch one step per invocation along a single happy-path speculation chain (batch.Dependencies + [batch.ID]).

Per call, based on batch.State:

State Action
Created / Scored publish to build, transition to Speculating
Speculating + all deps Succeeded publish to merge, transition to Finalizing
Speculating, deps still in flight no-op (re-evaluated on next event)
Speculating, dep Failed/Cancelled no-op + warn (TODO: replan)
Finalizing / terminal no-op

Inspired by the legacy Java SpeculationTree/StrategyImpl (under dev-platform/cicd/submitqueue). Deliberately omitted from this PR — these are TODOs for follow-ups:

  • Score-driven prioritization (the legacy NodeScorer / PrioritizerImpl).
  • 2^N tree of orderings with true/false branches per batch (ExhaustiveEnumerator).
  • Failure replanning when a dependency fails.
  • Conflict resolution and capacity limits.

Implementation notes

  • Publishes first, then CAS-updates state via the existing optimistic-locking UpdateState. A publish failure leaves the state unchanged for the retry to re-attempt cleanly.
  • Speculation chain is currently only logged; the build controller does not yet receive it as part of the message (TODO: thread SpeculationPathInfo through to build/buildsignal).
  • Re-triggering across batches still relies on the existing merge → speculate and buildsignal → speculate hops; downstream batches getting re-evaluated when an upstream batch concludes is a separate gap (not addressed here).

Test plan

  • go test ./orchestrator/controller/speculate/... — passes
  • go test ./orchestrator/controller/... — passes (no regressions in adjacent controllers)
  • go vet ./... — clean
  • New table/state tests cover: start-from-Created, start-from-Scored, finalize-no-deps, finalize-all-deps-succeeded, waiting-on-deps, failed-dep halts, terminal/finalizing no-op, storage failures (primary + dep fetch), publish failures (start + finalize), bad payload.

🤖 Generated with Claude Code

@sbalabanov sbalabanov requested review from a team and behinddwalls as code owners April 29, 2026 19:57
@sbalabanov sbalabanov marked this pull request as draft April 29, 2026 19:59
// startSpeculation kicks off CI for this batch on top of the speculative head
// (batch.Dependencies assumed to all pass), then transitions to Speculating.
func (c *Controller) startSpeculation(ctx context.Context, batch entity.Batch) error {
c.logger.Infow("starting speculation",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably want to use the context aware logger for tracing/ maybe we deal it with separately to track request journey in the logs end to end

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not today, I just want to have something working first

@sbalabanov sbalabanov force-pushed the sergeyb/speculate branch 4 times, most recently from 70eb275 to dae5582 Compare May 7, 2026 22:46
Replaces the unconditional fan-out with a state-machine driver that
advances a batch one step per invocation along a single happy-path
speculation chain (batch.Dependencies + [batch.ID]):

  - Created/Scored             → publish to build, Speculating
  - Speculating + deps OK      → publish to merge, Merging
  - Speculating + dep failed   → fail batch, log per request, publish to conclude
  - Speculating, waiting       → no-op (re-evaluated when deps advance)
  - Merging/terminal           → no-op
  - Unrecognized state         → error (nack so the event is retried)

When a dependency reaches Failed/Cancelled the batch cannot land on top
of it; rather than livelock in Speculating forever the controller now
emits a RequestStatusError log per contained request, transitions the
batch to Failed, and publishes to conclude so the request store and
log get reconciled.

Assumes every in-flight build will pass and ignores score; failure
replanning, score-driven prioritization, and the full Java-style
SpeculationTree (2^N orderings + scoring) are explicit TODOs. Publishes
before CAS-updating state so a publish failure leaves state unchanged
for retry.

Also renames BatchStateFinalizing → BatchStateMerging across the entity,
batch controller, and tests.
@sbalabanov sbalabanov force-pushed the sergeyb/speculate branch from dae5582 to 43fb559 Compare May 20, 2026 22:36
@sbalabanov sbalabanov marked this pull request as ready for review May 20, 2026 22:39
@sbalabanov sbalabanov merged commit 17775b4 into main May 20, 2026
13 checks passed
@github-actions github-actions Bot deleted the sergeyb/speculate branch May 20, 2026 22:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants