Extend TLS auto-reload to CPM and monitoring service clients#19045
Extend TLS auto-reload to CPM and monitoring service clients#19045kaisecheng wants to merge 11 commits intoelastic:mainfrom
Conversation
🤖 GitHub commentsJust comment with:
|
|
This pull request does not have a backport label. Could you fix it @kaisecheng? 🙏
|
Add SSL tracking API to LicenseReader
Move tracker creation from Agent to Runner so x-pack services (CPM and monitoring) receive it via their constructors at build time. Removes the post-construction wiring step that previously attached the tracker after each source was created.
- CPM: CA cert rotation rebuilds both ElasticsearchSource and LicenseReader clients, and the new connection remains functional - monitoring (auto-reload on): CA cert rotation rebuilds the license reader client and monitoring docs continue to accumulate - monitoring (auto-reload off): cert rotation is ignored and no error is raised
57a920f to
0ecb053
Compare
|
run exhaustive tests |
yaauie
left a comment
There was a problem hiding this comment.
Overall, it looks like this does what it intends to do.
I would prefer to encapsulate the complexity of rebuilding the client so that the callers can instantiate a client-holder with or without a tracker, and then can use that client-holder to get a current client without needing to manage the client's lifecycle.
I have opened kaisecheng#1 as an alternative.
| def client | ||
| @client ||= @client_factory.call | ||
| end |
There was a problem hiding this comment.
Could we pull the complexity into this method so that the caller doesn't need to concern itself with sending #maybe_rebuild?
One approach is kaisecheng#1
There was a problem hiding this comment.
Adopted in 0d77add. Thanks for the cleaner encapsulation.
For symlink paths, every client invocation now calls tracker.consume_stale to refresh the latest mtime. This doubles the calls vs my prior maybe_rebuild in the happy path. For the retry case (retry_handler.try(10.times, ...)), worst case multiplies by 10. For regular files this is negligible. Failure retries are slow by nature anyway, so I'm not too concerned about the doubled calls
| return if ids.empty? | ||
| target_ids = Set.new(Array(ids).map(&:to_sym)) | ||
| target_ids = Array(ids).map(&:to_sym) | ||
| return if target_ids.empty? |
There was a problem hiding this comment.
There is a reason why target_ids is not anymore a Set of symbols and switched to Array which could contains duplicates?
There was a problem hiding this comment.
target_ids can never be duplicated. Updated the comment to remove "Array" from the parameter description
| it "returns absolute paths for every configured SSL suffix" do | ||
| allow(settings).to receive(:get_value).with("xpack.management.elasticsearch.ssl.certificate_authority").and_return("/tmp/ca.crt") | ||
| allow(settings).to receive(:get_value).with("xpack.management.elasticsearch.ssl.truststore.path").and_return("relative/ts.p12") | ||
| allow(settings).to receive(:get_value).with("xpack.management.elasticsearch.ssl.keystore.path").and_return(nil) | ||
| allow(settings).to receive(:get_value).with("xpack.management.elasticsearch.ssl.certificate").and_return("") | ||
| allow(settings).to receive(:get_value).with("xpack.management.elasticsearch.ssl.key").and_return(nil) |
There was a problem hiding this comment.
Instead of mocking this way, we could use the same pattern used in other tests, where we clone the SETTINGS and set just these settings we are interested into using the helper mock_settings, like in
There was a problem hiding this comment.
cloning SETTINGS won't work for xpack configs, as core doesn't register xpack and raises error for unknown
Introduces a new `LogStash::Helpers::ElasticsearchClientHolder` that callers can use to get a current elasticsearch client WITHOUT needing to manage the state of that client. Provides two implementations, one of which is selected depending on whether a tracker is provided: - `Lazy`: simply creates the client on first use and returns the same client instance for all subsequent calls - `SslRebuildable`: similarly lazily creates the client on first use, but rebuilds clients that have been marked stale by the provided tracker before returning them to the caller.
💚 Build Succeeded
History
cc @kaisecheng |
Release notes
Logstash now automatically rebuilds the Elasticsearch clients used by Central Pipeline Management (CPM) and legacy internal monitoring when their TLS certificate files change on disk. This removes the need to restart Logstash after certificate rotation for those services. Requires
ssl.reload.automatic: trueandconfig.reload.automatic: true, which is automatically enabled whenxpack.management.enabled: true.What does this PR do?
Extends the SSL file watch infrastructure introduced in #18978 to the x-pack background service clients that live outside the pipeline lifecycle:
ElasticsearchSource(CPM's pipeline-config fetcher) rebuilds its ES client when CPM TLS files change.LicenseReaderin bothLicenseManagerinstances (CPM's and monitoring's) rebuilds the license-check client on TLS rotation.Tracking IDs
SslFileTrackerkeeps one set of registered paths per id. X-pack service ids use a leading.to keep them disjoint from user pipeline ids::".cpm"ElasticsearchSource(CPM config fetcher)xpack.management.elasticsearch.ssl.*:".cpm_license"LicenseReaderinside CPM'sLicenseManagerxpack.management.elasticsearch.ssl.*:".monitoring-logstash"elasticsearchoutput)xpack.monitoring.elasticsearch.ssl.*:".monitoring_license"LicenseReaderinside monitoring'sLicenseManagerxpack.monitoring.elasticsearch.ssl.*Implementation notes
Moved the creation of Tracker to
Runner, notAgent. CPM'sElasticsearchSourceis built during:after_bootstrap_checksbeforeAgent.new, soAgentcannot own the tracker. The exact creation site is betweenbootstrap_checks.eachand:after_bootstrap_checks: CPM'sBootstrapCheckflipsconfig.reload.automatic = trueduringeach, and CPM's hook builds its source on:after_bootstrap_checks. Anywhere outside that window either misreads the activation gate or constructs the source before the tracker exists.LogStash::Helpers::ElasticsearchClientHolderowns one lazy-built ES client. The factorycreate(tracker, id) { build_client }returns either:Lazy(no tracker) — memoise the factory result.SslRebuildable(with tracker) — calltracker.consume_staleon everyget; when stale, close the old client and rebuild before returning.LicenseReaderandElasticsearchSourcejust callholder.get. Rebuild lifecycle is invisible to callers.Why is it important/What is the impact to the user?
Without this PR, rotating certificates used by CPM or internal monitoring forces a full Logstash restart, which interrupts pipeline processing. With this PR, certificate rotation is hot-applied by the next scheduled fetch tick, matching the user-pipeline behavior that PR2 already delivered.
Checklist
Author's Checklist
cpm_tls_hot_reload_spec.rb)config.reload.automatic: truerebuilds the license-reader client; monitoring docs continue to ship (monitoring_tls_hot_reload_spec.rb)config.reload.automatic: false(default) is silently ignored; no error logged (monitoring_tls_hot_reload_spec.rb)How to test this PR locally
Manual
config.reload.automatic: true+ssl.reload.automatic: truexpack.management.elasticsearch.ssl.certificate_authority. The tracker hashes regular file content with SHA-256, so any byte change is detected:printf '\n' >> /path/to/your/ca.crt. Adds a trailing newline outside the PEM block, so ES connectivity is unaffected.openssl req -x509 ...and append its PEM to the bundle (the connection survives because both the old and the new CA are trusted). Seecpm_tls_hot_reload_spec.rbfor the full pattern.log.level: debug(the rebuild events are logged at DEBUG). Within ~30s expect tworebuilt elasticsearch client `<id>` on certificate changelog lines (one from CPM, one from LicenseReader), no errors.Related issues