rpadmin: add PreRestartProbe and PostRestartProbe #170
Merged
Adds Go bindings for the broker-local restart probes introduced in
Redpanda 25.1 (src/v/redpanda/admin/api-doc/broker.json):
GET /v1/broker/pre_restart_probe -> PreRestartCheckResult{Risks}
GET /v1/broker/post_restart_probe -> PostRestartCheckResult{LoadReclaimedPercent}
PreRestartProbe lets a caller ask the broker whether restarting it is
safe right now. The response groups the partitions that would be
affected into four risk categories:
- rf1_offline (RF=1 partitions go offline; usually OK)
- full_acks_produce_unavailable (acks=-1 produce would be rejected)
- unavailable (both produce and consume rejected)
- acks1_data_loss (acks=1 produce could lose data)
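A minimal Go sketch of how a caller might decode the pre-restart response and count the affected partitions per category. The category keys come from the list above. The wrapping "risks" key and the PreRestartResult/RestartRisks types are hypothetical stand-ins for rpadmin's PreRestartCheckResult, and each partition entry is kept as raw JSON because its shape is not described here (broker.json is the authoritative schema).

```go
package main

import "encoding/json"

// RestartRisks mirrors the four risk categories named in this PR. Entries are
// kept as raw JSON because the per-partition shape is not specified here.
type RestartRisks struct {
	RF1Offline                 []json.RawMessage `json:"rf1_offline"`
	FullAcksProduceUnavailable []json.RawMessage `json:"full_acks_produce_unavailable"`
	Unavailable                []json.RawMessage `json:"unavailable"`
	Acks1DataLoss              []json.RawMessage `json:"acks1_data_loss"`
}

// PreRestartResult is a hypothetical stand-in for rpadmin's PreRestartCheckResult.
// The "risks" JSON key is an assumption.
type PreRestartResult struct {
	Risks RestartRisks `json:"risks"`
}

// decodePreRestart parses a pre_restart_probe response body.
func decodePreRestart(body []byte) (PreRestartResult, error) {
	var r PreRestartResult
	err := json.Unmarshal(body, &r)
	return r, err
}

// counts returns how many partitions each category reported. A missing
// category decodes to a nil slice and therefore counts as zero.
func (r RestartRisks) counts() map[string]int {
	return map[string]int{
		"rf1_offline":                   len(r.RF1Offline),
		"full_acks_produce_unavailable": len(r.FullAcksProduceUnavailable),
		"unavailable":                   len(r.Unavailable),
		"acks1_data_loss":               len(r.Acks1DataLoss),
	}
}
```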
PostRestartProbe reports how much load the broker has reclaimed
since the most recent restart, as a percentage of in-sync replicas.
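A similar sketch for the post-restart response. It assumes a JSON body carrying load_reclaimed_pc (the field the integration tests check) and decodes it into a hypothetical PostRestartResult. Using float64 is an assumption chosen so that either an integer or a fractional percentage decodes; the range check mirrors the [0, 100] bound the tests assert.

```go
package main

import (
	"encoding/json"
	"errors"
)

// PostRestartResult is a hypothetical stand-in for rpadmin's PostRestartCheckResult.
type PostRestartResult struct {
	LoadReclaimedPercent float64 `json:"load_reclaimed_pc"`
}

// decodePostRestart parses a post_restart_probe body and rejects a
// percentage outside [0, 100].
func decodePostRestart(body []byte) (PostRestartResult, error) {
	var r PostRestartResult
	if err := json.Unmarshal(body, &r); err != nil {
		return r, err
	}
	if r.LoadReclaimedPercent < 0 || r.LoadReclaimedPercent > 100 {
		return r, errors.New("load_reclaimed_pc out of range [0, 100]")
	}
	return r, nil
}
```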
Both probes answer for the broker that handles the request, so callers
typically scope the client to a specific broker via ForHost before
invoking. The optional limit parameter caps the number of partitions
returned per risk category (server default is 128).
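Because each probe answers only for the broker that receives the request, scoping to one broker comes down to which admin address the request is sent to. The sketch below builds that URL directly with net/url to show the path and the limit query parameter; probeURL, the plain-HTTP scheme, and the example address are illustrative assumptions, not rpadmin internals (9644 is Redpanda's default admin port). Treating limit <= 0 as "omit the parameter" is also an assumption, which lets the server default of 128 apply.

```go
package main

import (
	"net/url"
	"strconv"
)

// probeURL builds the request URL for one broker's restart probe.
// probe is "pre_restart_probe" or "post_restart_probe".
// A non-positive limit omits the query parameter entirely.
func probeURL(adminAddr, probe string, limit int) string {
	u := url.URL{
		Scheme: "http",
		Host:   adminAddr,
		Path:   "/v1/broker/" + probe,
	}
	if limit > 0 {
		u.RawQuery = url.Values{"limit": {strconv.Itoa(limit)}}.Encode()
	}
	return u.String()
}
```

For example, probeURL("broker-0:9644", "pre_restart_probe", 1) targets only broker-0 and asks for at most one partition per category.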
The K8s operator's rolling-restart gate is the immediate consumer:
today it relies on the cluster-wide health overview, which lags pod
state and conflates "this broker isn't safe to restart" with "the
cluster has any under-replicated partition". The per-broker probe
gives the operator a precise per-decision signal.
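One way an operator-side gate could turn the probe result into a go/no-go decision, sketched under assumptions: riskCounts and safeToRestart are hypothetical, and tolerating rf1_offline by default simply follows the "usually OK" note above. A real gate may weigh the categories differently.

```go
package main

// riskCounts holds per-category partition counts from a pre-restart probe.
// Field names follow the four categories in this PR; the type is hypothetical.
type riskCounts struct {
	RF1Offline                 int
	FullAcksProduceUnavailable int
	Unavailable                int
	Acks1DataLoss              int
}

// safeToRestart blocks on any category that rejects produce/consume or risks
// acks=1 data loss. RF=1 partitions going offline are tolerated only when
// allowRF1Offline is set.
func safeToRestart(c riskCounts, allowRF1Offline bool) bool {
	if c.FullAcksProduceUnavailable > 0 || c.Unavailable > 0 || c.Acks1DataLoss > 0 {
		return false
	}
	return allowRF1Offline || c.RF1Offline == 0
}
```

Because this policy only checks whether each category is empty, the per-category limit cap does not change its answer, so a small limit keeps responses cheap without weakening the gate.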
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
✅ Snyk checks have passed. No issues have been found so far.
CI flagged the table-driven test cases: when the longest key in a map literal forces aligned spacing, shorter keys need padding to match. Pre-existing lint issues in other files on main are unaffected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andrewstucki (Contributor) requested changes on May 20, 2026
Can you either swap this to run a real broker via something like testcontainers or at least link to the restart probe API docs?
Author (Contributor): Will do, will run a real broker with testcontainers. Thank you.
Replace the httptest mocks for PreRestartProbe/PostRestartProbe with integration tests that drive a real Redpanda broker via testcontainers (redpandadata/redpanda-nightly:latest). The pre-restart test creates an RF=1 topic and asserts the partitions surface in rf1_offline; both tests also exercise the limit query-param round-trip. The probe doc comments now also link directly to the admin API spec in src/v/redpanda/admin/api-doc/broker.json, addressing both halves of the reviewer's suggestion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Test workflow already lists every module in its matrix but only fires on *.go path changes, so a PR that touches just go.mod/go.sum (e.g. a dependency bump) bypassed CI. Mirror the Lint workflow's trigger so module-only updates run the tests too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 20, 2026
andrewstucki approved these changes on May 21, 2026
Summary
Adds Go bindings for the broker-local restart probes introduced in Redpanda 25.1 (`src/v/redpanda/admin/api-doc/broker.json`):

| Endpoint | Method | Returns |
|---|---|---|
| `GET /v1/broker/pre_restart_probe` | `PreRestartProbe(ctx, limit)` | `PreRestartCheckResult{Risks}` |
| `GET /v1/broker/post_restart_probe` | `PostRestartProbe(ctx, limit)` | `PostRestartCheckResult{LoadReclaimedPercent}` |

`PreRestartProbe` asks the broker whether restarting it right now would affect partitions, and groups the affected partitions into four risk categories:

- `rf1_offline`: RF=1 partitions go offline (generally acceptable risk)
- `full_acks_produce_unavailable`: `acks=-1` produce would be rejected
- `unavailable`: both produce and consume rejected
- `acks1_data_loss`: `acks=1` produce could lose data

`PostRestartProbe` reports how much load this broker has reclaimed since the most recent restart, as a percentage of in-sync replicas.
Both probes answer for the broker that handles the request, so callers typically scope the client to a specific broker via `ForHost` before invoking. The optional `limit` parameter caps the number of partitions returned per risk category (server default is 128).
Why
The K8s operator's rolling-restart gate is the immediate consumer
(redpanda-operator#1537):
today it relies on the cluster-wide health overview, which lags pod
state and conflates "this broker isn't safe to restart" with "the
cluster has any under-replicated partition."
`PreRestartProbe` gives the operator a precise per-decision signal; `PostRestartProbe` lets it defer the next pod delete until the just-restarted broker has actually caught up, rather than merely passing a liveness check.
Tracked in Redpanda's ENG-222.
Testing
Per review feedback, the probes are now exercised end-to-end against a real Redpanda broker rather than an `httptest` mock. The integration tests in `rpadmin/api_broker_test.go` spin up `redpandadata/redpanda-nightly:latest` via testcontainers-go's `redpanda` module (the same dependency already used by `kvstore` and `redpanda-otel-exporter`) and drive the rpadmin client against the container's admin API.
- `TestPreRestartProbe`
  - uses `kadm` to create an RF=1 topic and asserts its partitions surface in `rf1_offline` (polled, since the probe trails topic creation briefly)
  - …JSON-tag drift between client and server)
  - `limit=1` caps every category to ≤1 entry
- `TestPostRestartProbe`
  - `load_reclaimed_pc` ∈ [0, 100]
  - `limit` propagates without the broker rejecting the request

Both tests skip under `go test -short` and gate behind testcontainers' Docker discovery, so contributors without Docker can still run the rest of the rpadmin suite. The probe doc comments now also link directly to the admin API spec (`broker.json`) for callers who want to verify the response shape without spelunking the rpadmin source.
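The `limit=1` assertion amounts to checking that no category returns more entries than requested. A hypothetical helper over per-category counts (a simplification of the real response, not the test's actual code) might look like this:

```go
package main

import "fmt"

// checkLimit returns an error naming a category whose entry count exceeds
// limit, or nil if every category respects the cap.
func checkLimit(counts map[string]int, limit int) error {
	for category, n := range counts {
		if n > limit {
			return fmt.Errorf("category %s has %d entries, want at most %d", category, n, limit)
		}
	}
	return nil
}
```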
CI: the `Test` workflow already runs the rpadmin matrix entry on every PR touching `**.go`; this PR also broadens its trigger paths to include `**/go.mod` and `**/go.sum` so dependency-only updates run the tests too.

Test plan

- `go test -run "TestPreRestartProbe|TestPostRestartProbe" ./...` against `redpandadata/redpanda-nightly:latest`: both green
- `go test -short ./...`: integration tests skip cleanly
- `go vet ./...`
- `golangci-lint run --new-from-rev=main ./...`: 0 issues

🤖 Generated with Claude Code