Skip to content

feat(controller): hooking up merge controller#105

Closed
kevinlnew wants to merge 5 commits into
kevin.new/github-landproviderfrom
kevin.new/mergecontroller
Closed

feat(controller): hooking up merge controller#105
kevinlnew wants to merge 5 commits into
kevin.new/github-landproviderfrom
kevin.new/mergecontroller

Conversation

@kevinlnew
Copy link
Copy Markdown
Contributor

@kevinlnew kevinlnew commented Feb 27, 2026

Summary

feat(controller): hooking up merge controller

What?
Hooked up the merge controller with the land provider.
Also wired up the example server with the merge controller along with default github land provider.

Idempotency checks:

  • Batch state
    • if batch is already terminal, this means we can skip landing and store updates and publish the message.
    • otherwise, try to land.
  • Landing:
    • Implementation can pass back ErrIsAlreadyLanded, which is a happy path signal to set the batch to succeeded.
  • Errors
    • All the returned errors except bad payload and ErrLandRejected are retryable.

Test Plan

Unit tests, CI

Issues

Stack

  1. feat(landprovider): Adding land provider extension interface #104
  2. feat(landprovider): basic github implementation #109
  3. @ feat(controller): hooking up merge controller #105

@kevinlnew kevinlnew force-pushed the kevin.new/landprovider branch from 2cffebf to e8b31bc Compare February 27, 2026 21:46
@kevinlnew kevinlnew force-pushed the kevin.new/mergecontroller branch from 0815e08 to ec6b0b5 Compare February 27, 2026 21:46
@kevinlnew kevinlnew force-pushed the kevin.new/landprovider branch from e8b31bc to bfbfc86 Compare February 27, 2026 21:57
@kevinlnew kevinlnew force-pushed the kevin.new/mergecontroller branch from ec6b0b5 to 6c6c3e3 Compare February 27, 2026 22:11
@kevinlnew kevinlnew changed the title feat(controller): hooking up merge controller [WIP] feat(controller): hooking up merge controller Feb 27, 2026
Comment thread entity/batch.go Outdated
// BatchStateLandFailed is the state of a batch that has failed to land.
BatchStateLandFailed BatchState = "land_failed"
// BatchStateLandSucceeded is the state of a batch that has successfully landed.
BatchStateLandSucceeded BatchState = "land_succeeded"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it differ from "succeeded" ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed land_failed and land_succeeded, they weren't necessary. Just artifacts from me trying to understand state transitions.
The merge controller now just updates the batch to the terminal states (succeeded and failed) for the "concluder" controller to pick up later.

Comment thread example/server/orchestrator/main.go Outdated
Comment thread orchestrator/controller/merge/merge.go Outdated
Comment thread orchestrator/controller/merge/merge.go Outdated
)
c.metricsScope.Counter("idempotent_skip").Inc(1)
case entity.BatchStateLanding:
newState, err := c.land(ctx, batch)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should also process the land error "already merged" for another idempotency check.
If it is for the future, add TODO comment.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the IsAlreadyLanded error to the interface in a previous PR in the stack and hooked it up to the land function

Comment thread orchestrator/controller/merge/merge.go Outdated
Comment thread orchestrator/controller/merge/merge.go Outdated
Comment thread orchestrator/controller/merge/merge.go Outdated
@kevinlnew kevinlnew force-pushed the kevin.new/landprovider branch from bfbfc86 to c78ac99 Compare March 2, 2026 18:25
@kevinlnew kevinlnew force-pushed the kevin.new/mergecontroller branch from 6c6c3e3 to 40bfe00 Compare March 2, 2026 18:25
@kevinlnew kevinlnew changed the base branch from kevin.new/landprovider to kevin.new/github-landprovider March 2, 2026 18:25
@kevinlnew kevinlnew requested a review from sbalabanov March 2, 2026 18:25
@kevinlnew kevinlnew force-pushed the kevin.new/github-landprovider branch from 28f56d9 to d9f72cd Compare March 2, 2026 18:31
@kevinlnew kevinlnew force-pushed the kevin.new/mergecontroller branch 3 times, most recently from b6bd849 to bab9708 Compare March 2, 2026 20:41
@kevinlnew kevinlnew force-pushed the kevin.new/github-landprovider branch from d511180 to 32190ca Compare March 2, 2026 20:41
@kevinlnew kevinlnew force-pushed the kevin.new/mergecontroller branch from bab9708 to 93bcc1f Compare March 2, 2026 20:54
@kevinlnew kevinlnew marked this pull request as ready for review March 2, 2026 22:29
@kevinlnew kevinlnew requested review from a team and behinddwalls as code owners March 2, 2026 22:29
@kevinlnew kevinlnew changed the title [WIP] feat(controller): hooking up merge controller feat(controller): hooking up merge controller Mar 3, 2026
c.metricsScope.Counter("received").Inc(1)

msg := delivery.Message()
batchID := string(msg.Payload)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to define a serialization format for this. Using direct string-to-byte conversion is Golang-specific.
Probably a separate serializable structure.

// previous attempt; skip straight to publish.
batch, err := c.storage.GetBatchStore().Get(ctx, batchID)
if err != nil {
c.logger.Errorw("failed to fetch batch",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as long as we return error to upstream, we do not have to log. Otherwise most likely we'll log twice.
Logging should happen at error chain termination (top-most) point.

)
c.metricsScope.Counter("publish_errors").Inc(1)
return errs.NewRetryableError(fmt.Errorf("failed to publish to merge-signal: %w", err))
c.metricsScope.Counter("batch_fetch_errors").Inc(1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plz rebase and integrate with new metrics infrastructure that processes errors automagically

"error", err,
)
c.metricsScope.Counter("batch_update_errors").Inc(1)
return errs.NewRetryableError(fmt.Errorf("failed to update batch state: %w", err))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should only be retryable if the error is expected type, like version mismatch, but we seem to retry on all of the errors? If the database is operational but "table does not exist" is the error, this is not retryable.


batch.State = newState
} else {
c.logger.Infow("batch already in terminal state, skipping to publish",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be a warning level, as this should only happen on infra errors

err := c.landProvider.Land(ctx, batch.Queue, entries)

switch {
case err == nil, landprovider.IsAlreadyLanded(err):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be at least log landprovider.IsAlreadyLanded, but better tally too?
this also should not happen for a normal workflow, so good to have some observability
also, comment here scenarios which could cause this

"error", err,
)
c.metricsScope.Counter("land_rejected").Inc(1)
return entity.BatchStateFailed, nil
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'll need to propagate the error information about why land was rejected. Can be done in another diff, but then needs a TODO comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants