Skip to content

Extend TLS auto-reload to CPM and monitoring service clients#19045

Open
kaisecheng wants to merge 11 commits intoelastic:mainfrom
kaisecheng:xpack-reload-certs
Open

Extend TLS auto-reload to CPM and monitoring service clients#19045
kaisecheng wants to merge 11 commits intoelastic:mainfrom
kaisecheng:xpack-reload-certs

Conversation

@kaisecheng
Copy link
Copy Markdown
Contributor

@kaisecheng kaisecheng commented Apr 24, 2026

Release notes

Logstash now automatically rebuilds the Elasticsearch clients used by Central Pipeline Management (CPM) and legacy internal monitoring when their TLS certificate files change on disk. This removes the need to restart Logstash after certificate rotation for those services. Requires ssl.reload.automatic: true and config.reload.automatic: true, which is automatically enabled when xpack.management.enabled: true.

What does this PR do?

Extends the SSL file watch infrastructure introduced in #18978 to the x-pack background service clients that live outside the pipeline lifecycle:

  • ElasticsearchSource (CPM's pipeline-config fetcher) rebuilds its ES client when CPM TLS files change.
  • LicenseReader in both LicenseManager instances (CPM's and monitoring's) rebuilds the license-check client on TLS rotation.

Tracking IDs

SslFileTracker keeps one set of registered paths per id. X-pack service ids use a leading . to keep them disjoint from user pipeline ids:

Tracking ID Owner Cert paths from Reload mechanism
:".cpm" ElasticsearchSource (CPM config fetcher) xpack.management.elasticsearch.ssl.* Rebuild ES fetch client (this PR)
:".cpm_license" LicenseReader inside CPM's LicenseManager xpack.management.elasticsearch.ssl.* Rebuild license-check client (this PR)
:".monitoring-logstash" The monitoring pipeline itself (elasticsearch output) xpack.monitoring.elasticsearch.ssl.* Pipeline reload via PR2 mechanism
:".monitoring_license" LicenseReader inside monitoring's LicenseManager xpack.monitoring.elasticsearch.ssl.* Rebuild license-check client (this PR)

Implementation notes

  • Moved the creation of Tracker to Runner, not Agent. CPM's ElasticsearchSource is built during :after_bootstrap_checks before Agent.new, so Agent cannot own the tracker. The exact creation site is between bootstrap_checks.each and :after_bootstrap_checks: CPM's BootstrapCheck flips config.reload.automatic = true during each, and CPM's hook builds its source on :after_bootstrap_checks. Anywhere outside that window either misreads the activation gate or constructs the source before the tracker exists.

  • LogStash::Helpers::ElasticsearchClientHolder owns one lazy-built ES client. The factory create(tracker, id) { build_client } returns either:

    • Lazy (no tracker) — memoise the factory result.
    • SslRebuildable (with tracker) — call tracker.consume_stale on every get; when stale, close the old client and rebuild before returning.

    LicenseReader and ElasticsearchSource just call holder.get. Rebuild lifecycle is invisible to callers.

Why is it important/What is the impact to the user?

Without this PR, rotating certificates used by CPM or internal monitoring forces a full Logstash restart, which interrupts pipeline processing. With this PR, certificate rotation is hot-applied by the next scheduled fetch tick, matching the user-pipeline behavior that PR2 already delivered.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • Integration tests cover 3 end-to-end scenarios
    • CPM cert rotation rebuilds both the CPM ES client and the license-reader client; subsequent pipeline-config push proves the rebuilt CPM client still connects (cpm_tls_hot_reload_spec.rb)
    • Monitoring cert rotation with config.reload.automatic: true rebuilds the license-reader client; monitoring docs continue to ship (monitoring_tls_hot_reload_spec.rb)
    • Monitoring cert rotation with config.reload.automatic: false (default) is silently ignored; no error logged (monitoring_tls_hot_reload_spec.rb)

How to test this PR locally

Manual

  1. Start Logstash with CPM enabled and config.reload.automatic: true + ssl.reload.automatic: true
  2. Confirm pipelines fetch from ES via TLS
  3. Trigger a content change on the CA bundle referenced by xpack.management.elasticsearch.ssl.certificate_authority. The tracker hashes regular file content with SHA-256, so any byte change is detected:
    • Quick smoke test: printf '\n' >> /path/to/your/ca.crt. Adds a trailing newline outside the PEM block, so ES connectivity is unaffected.
    • Realistic rotation: generate a second CA with openssl req -x509 ... and append its PEM to the bundle (the connection survives because both the old and the new CA are trusted). See cpm_tls_hot_reload_spec.rb for the full pattern.
  4. Set log.level: debug (the rebuild events are logged at DEBUG). Within ~30s expect two rebuilt elasticsearch client `<id>` on certificate change log lines (one from CPM, one from LicenseReader), no errors.

Related issues

@github-actions
Copy link
Copy Markdown
Contributor

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)
  • run exhaustive tests : Run the exhaustive tests Buildkite pipeline.

@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Apr 24, 2026

This pull request does not have a backport label. Could you fix it @kaisecheng? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
  • If no backport is necessary, please add the backport-skip label

Move tracker creation from Agent to Runner so x-pack services (CPM and
monitoring) receive it via their constructors at build time. Removes the
post-construction wiring step that previously attached the tracker after
each source was created.
- CPM: CA cert rotation rebuilds both ElasticsearchSource and LicenseReader clients, and the new connection remains functional
- monitoring (auto-reload on): CA cert rotation rebuilds the license reader client and monitoring docs continue to accumulate
- monitoring (auto-reload off): cert rotation is ignored and no error is raised
@kaisecheng
Copy link
Copy Markdown
Contributor Author

run exhaustive tests

Copy link
Copy Markdown
Member

@yaauie yaauie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, it looks like this does what it intends to do.

I would prefer to encapsulate the complexity of rebuilding the client so that the callers can instantiate a client-holder with or without a tracker, and then can use that client-holder to get a current client without needing to manage the client's lifecycle.

I have opened kaisecheng#1 as an alternative.

Comment thread x-pack/lib/helpers/ssl_rebuildable.rb Outdated
Comment on lines +28 to +30
def client
@client ||= @client_factory.call
end
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we pull the complexity into this method so that the caller doesn't need to concern itself with sending #maybe_rebuild?

One approach is kaisecheng#1

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adopted in 0d77add. Thanks for the cleaner encapsulation.

For symlink paths, every client invocation now calls tracker.consume_stale to refresh the latest mtime. This doubles the calls vs my prior maybe_rebuild in the happy path. For the retry case (retry_handler.try(10.times, ...)), worst case multiplies by 10. For regular files this is negligible. Failure retries are slow by nature anyway, so I'm not too concerned about the doubled calls

Comment thread logstash-core/lib/logstash/ssl_file_tracker.rb
Copy link
Copy Markdown
Member

@andsel andsel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments on tests.

return if ids.empty?
target_ids = Set.new(Array(ids).map(&:to_sym))
target_ids = Array(ids).map(&:to_sym)
return if target_ids.empty?
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a reason why target_ids is not anymore a Set of symbols and switched to Array which could contains duplicates?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

target_ids can never be duplicated. Updated the comment to remove "Array" from the parameter description

Comment thread logstash-core/spec/logstash/runner_spec.rb Outdated
Comment on lines +568 to +573
it "returns absolute paths for every configured SSL suffix" do
allow(settings).to receive(:get_value).with("xpack.management.elasticsearch.ssl.certificate_authority").and_return("/tmp/ca.crt")
allow(settings).to receive(:get_value).with("xpack.management.elasticsearch.ssl.truststore.path").and_return("relative/ts.p12")
allow(settings).to receive(:get_value).with("xpack.management.elasticsearch.ssl.keystore.path").and_return(nil)
allow(settings).to receive(:get_value).with("xpack.management.elasticsearch.ssl.certificate").and_return("")
allow(settings).to receive(:get_value).with("xpack.management.elasticsearch.ssl.key").and_return(nil)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of mocking this way, we could use the same pattern used in other tests, where we clone the SETTINGS and set just these settings we are interested into using the helper mock_settings, like in

let(:settings) { mock_settings("pipeline.batch.metrics.sampling_mode" => batch_sampling_mode) }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cloning SETTINGS won't work for xpack configs, as core doesn't register xpack and raises error for unknown

kaisecheng and others added 4 commits April 30, 2026 12:09
Introduces a new `LogStash::Helpers::ElasticsearchClientHolder` that callers
can use to get a current elasticsearch client WITHOUT needing to manage the
state of that client.

Provides two implementations, one of which is selected depending on whether
a tracker is provided:

 - `Lazy`: simply creates the client on first use and returns the same
   client instance for all subsequent calls
 - `SslRebuildable`: similarly lazily creates the client on first use, but
   rebuilds clients that have been marked stale by the provided tracker
   before returning them to the caller.
@elasticmachine
Copy link
Copy Markdown

💚 Build Succeeded

History

cc @kaisecheng

@kaisecheng kaisecheng requested review from andsel and yaauie April 30, 2026 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

License checker supports cert reload

4 participants