Skip to content

admin: Support creating incidents and adding serials#8740

Open
beautifulentropy wants to merge 3 commits intomainfrom
admin-tool-sa-incidents
Open

admin: Support creating incidents and adding serials#8740
beautifulentropy wants to merge 3 commits intomainfrom
admin-tool-sa-incidents

Conversation

@beautifulentropy
Copy link
Copy Markdown
Member

@beautifulentropy beautifulentropy commented Apr 29, 2026

Add a StorageAuthorityAdmin gRPC service and four admin subcommands that move incident management into the admin tool. Register the service only for the admin client and write through a new incidents_sa_admin MySQL user; the existing incidents_sa user stays SELECT-only.

Incidents can impact tens to hundreds of millions of certificates, and operators may load serials from multiple overlapping time spans during the affected window. admin load-incident-serials reads serials from a file, fans them out across N workers (default 10), and streams them to the SA in batches of 10,000. The server re-batches at the same threshold and uses INSERT IGNORE on insertion, so overlapping bulk loads and within-batch duplicates are idempotent.

The remaining subcommands cover CRUD on the incidents table:

  • admin create-incident adds a row and creates the per-incident serials table.
  • admin list-incidents prints each incident's name, enabled state, renewBy, and URL.
  • admin update-incident changes any subset of url, renewBy, and enabled on an incident by name.

Fixes #6943

@beautifulentropy beautifulentropy force-pushed the admin-tool-sa-incidents branch 7 times, most recently from da7c02c to 5392654 Compare April 29, 2026 20:59
@beautifulentropy beautifulentropy force-pushed the admin-tool-sa-incidents branch from 5392654 to 24c16e5 Compare April 29, 2026 21:01
@beautifulentropy beautifulentropy marked this pull request as ready for review April 29, 2026 21:10
@beautifulentropy beautifulentropy requested a review from a team as a code owner April 29, 2026 21:10
@github-actions
Copy link
Copy Markdown
Contributor

@beautifulentropy, this PR appears to contain configuration and/or SQL schema changes. Please ensure that a corresponding deployment ticket has been filed with the new values.

Copy link
Copy Markdown
Contributor

@aarongable aarongable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't done a detailed review of admin/incident_test.go or saa_test.go, but I've done a pass on everything else.

Comment thread cmd/admin/dryrun.go
var _ saAdminClient = (*dryRunSAAdmin)(nil)

func (d dryRunSAAdmin) CreateIncident(_ context.Context, req *sapb.CreateIncidentRequest, _ ...grpc.CallOption) (*sapb.Incident, error) {
d.log.Infof("dry-run: Create incident %q (url=%q, renewBy=%s)", req.SerialTable, req.Url, req.RenewBy.AsTime())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a heads up that the blog PR totally overhauls how the dry-run clients above log, so keep an eye on that depending on whether this lands before or after the blog PR.

Comment thread sa/proto/sa.proto
Comment thread cmd/admin/incident.go
return errors.New("-parallelism must be > 0")
}

file, err := os.Open(s.serialsFile)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea: if s.serialsFile is -, then open stdin. Alternatively, add documentation noting that a user could pass /dev/stdin as the file to read from if they want to stream serials on stdin.

Comment thread sa/proto/sa.proto
Comment thread sa/proto/sa.proto Outdated
Comment thread sa/db/02-users_next.sql Outdated
Comment thread cmd/boulder-sa/main.go Outdated
Comment thread cmd/admin/incident.go Outdated
Comment thread sa/saa.go

var incidentTable string
var buf []string

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth adding an explicit check that the table exists, so we can provide a better error message if it doesn't? Totally optional, feel free to ignore.

@aarongable aarongable requested review from a team and jsha and removed request for a team April 30, 2026 00:01
aarongable
aarongable previously approved these changes May 2, 2026
Comment thread sa/saa.go Outdated
Copy link
Copy Markdown
Contributor

@jsha jsha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like AddSerialsToIncident could be a plain RPC rather than a streaming one, and it would simplify the code significantly. We have a batching mechanism in the load-incident-serials subcommand, so we know the messages will never be too big. In fact, they'd be the same size as each streaming message is in the current setup.

Comment thread cmd/admin/incident.go Outdated
Comment thread cmd/admin/incident.go
Comment on lines +130 to +136
if s.enable != "" {
v, err := strconv.ParseBool(s.enable)
if err != nil {
return fmt.Errorf("parsing -enable as bool: %w", err)
}
req.Enabled = &v
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use flag.BoolVar for this.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flag.BoolVar binds to a bool, so -enable=false and "flag not passed" both result in false. We need a third state to distinguish between those.

Comment thread cmd/admin/incident.go Outdated
Comment thread sa/saa.go
Comment on lines +176 to +179
// INSERT IGNORE no-ops duplicate-key violations from concurrent or
// repeated inserts, avoiding the next-key locks ON DUPLICATE KEY UPDATE
// would take. Only the serial PK can fire today; revisit if the schema
// gains a FK, CHECK, or NOT NULL column we do not populate.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is a little redundant, since this is the plain meaning of INSERT IGNORE. It would be more informative for the method (and maybe the admin subcommand) to document that duplicates will be ignored.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this as more of a "why" comment covering the choice of INSERT IGNORE, which comes with some benefits and some surprising complications. INSERT IGNORE skips rows for FK and CHECK violations, which the simple CRUD tests at TestAddSerialsToIncident() on saa_test.go:242 would catch. However, for NOT NULL violations and string truncation, it inserts the row anyway with the column's implicit default and emits a warning which is suppressed. So a future schema change that adds NOT NULL without a default to incident_* lands silently: row count is correct which means that test passes, and the new column ends up with junk default values.

The ON DUPLICATE KEY UPDATE note is also important. We picked INSERT IGNORE over ODKU to avoid the extra row locks ODKU takes for UPDATEs, which can deadlock under concurrent bulk loads.

@beautifulentropy
Copy link
Copy Markdown
Member Author

It seems like AddSerialsToIncident could be a plain RPC rather than a streaming one, and it would simplify the code significantly. We have a batching mechanism in the load-incident-serials subcommand, so we know the messages will never be too big. In fact, they'd be the same size as each streaming message is in the current setup.

I'm happy to refactor this into a unary RPC, but I do like the way the streaming form reads. The per-batch messages are also slightly smaller since they don't carry the incident table name on every call, though the difference is negligible when compared against the size of a typical batch of serials.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

admin: Add commands to create and enable incident tables

3 participants