This pull request does not have a backport label. Could you fix it @ycombinator? 🙏
✅ Vale Linting Results: No issues found on modified lines! The Vale linter checks documentation changes against the Elastic Docs style guide. To use Vale locally or report issues, refer to the Elastic style guide for Vale.
This pull request is now in conflicts. Could you fix it @ycombinator? 🙏
michel-laterman left a comment:
I'm assuming we'll need to update the expected state for this test once Kibana changes so that opamp agents don't appear as "Updating"
michel-laterman left a comment:
We should also consider moving the opamp e2e test into its own suite so it can be easily extended in the future
@Mergifyio backport 9.2 9.3
✅ Backports have been created
* Implement API boilerplate for POST /v1/opamp endpoint
* Add OpAMP section to dev doc
* Flesh out dev doc
* Update dev doc to use Fleet enrollment token
* Check feature flag before handling OpAMP requests
* Allow running specific tests with TEST_RUN env var
* Removing irrelevant file
* WIP: Reimplement using opamp-go server package
* Update spec
* Move OpAMP documentation to separate file
* Remove error that's no longer needed
* Update OpAMP feature flag test to use Enabled() method
The test previously referenced ErrOpAMPDisabled and handleOpAMP which
no longer exist. The feature flag check now happens at route registration
time, so test the Enabled() method directly instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Disable HTTP keep-alive for OpAMP requests to fix EOF errors
The server's IdleTimeout (30s) matches the OTel Collector's polling
interval (~30s), causing a race where the server closes the idle
connection just as the client tries to reuse it. Setting Connection:
close on OpAMP responses forces a fresh connection per poll, eliminating
the race with negligible overhead given the 30s polling interval.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Adding configuration files to be used by OpAMP E2E test
* WIP: Adding OpAMP E2E test
* Fix otelcol template data to use nested OpAMP keys
The otelcol-opamp.tpl template accesses {{ .OpAMP.APIKey }} and
{{ .OpAMP.InstanceUID }}, so the template data must nest these under
an "OpAMP" key rather than passing them as flat top-level keys.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Use distinct filename for otelcol config in TestOpAMP
The otelcol config was being written to config.yml, overwriting the
fleet-server config in the same temp dir. Rename it to otelcol.yml.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Make otelcol-contrib download URL platform-aware in TestOpAMP
Use runtime.GOOS and runtime.GOARCH to build the download URL
dynamically instead of hardcoding darwin_arm64. Also chmod the
extracted binary since extractTarGz doesn't preserve permissions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Fix resp.Body handling in TestOpAMP
Use explicit Close() instead of defer since resp is reassigned later
in the function, which would cause the deferred close to act on the
wrong response.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Increase TestOpAMP timeout and use defer for cleanup
Increase context timeout from 1 to 3 minutes to account for the
otelcol-contrib download. Use defer for cancel() and cmd.Wait() so
cleanup happens even on test failure.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Start OTel Collector in TestOpAMP
Extract instanceUID and apiKey into variables, remove the placeholder
time.Sleep, and start the otelcol-contrib binary with the OpAMP
extension config pointing at fleet-server.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Verify agent enrollment in TestOpAMP
Poll Kibana via AgentIsOnline to confirm the OTel Collector was
enrolled as an agent in Fleet Server after connecting via OpAMP.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Extract OTel Collector version into package-level constant
Move the hardcoded otelcol-contrib version into otelColContribVersion
in const.go so it can be easily updated in one place.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Continue writing TestOpAMP e2e test
- Configure fleet-server with a static policy token for dummy-policy so
that GetEnrollmentTokenForPolicyID can find the enrollment token
- Fetch enrollment token before the raw POST to /v1/opamp
- Add Authorization and Content-Type headers to the raw POST
- Assert HTTP 200 response from the raw POST
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix TestOpAMP e2e test
- Enroll a dummy agent before starting the OTel Collector to initialize
the .fleet-agents index. Without this, findEnrolledAgent fails with
index_not_found_exception in a standalone fleet-server environment
(unlike agent-managed fleet-server which self-enrolls on startup).
- Add AgentHasStatus scaffold method that accepts multiple acceptable
statuses, and AgentIsUpdating that delegates to it.
- Use AgentIsUpdating in TestOpAMP: OpAMP agents communicate via the
OpAMP protocol rather than Fleet's normal checkin/ack protocol, so
they never acknowledge the initial policy change action and Kibana
shows them as "updating" rather than "online".
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fixing conflicts during rebase
* Download OTel Contrib source and build collector from it
* Running go fmt
* Fetch entire Agent doc from ES and make finer-grained assertions on its contents
* Check status from doc field
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
(cherry picked from commit 7aededf)
Co-authored-by: Shaunak Kashyap <ycombinator@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
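The commit "Update OpAMP feature flag test to use Enabled() method" notes that the flag check now happens at route registration time. A minimal Go sketch of that pattern, assuming a hypothetical `opampFlags` type whose `enableOpAMP` field stands in for `feature_flags.enable_opamp`; fleet-server's real config types, router, and OpAMP handler are not shown. With the flag off, the route is never registered, so a request simply gets a 404.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// opampFlags is a hypothetical stand-in for fleet-server's feature-flag
// config; the real type and field names may differ.
type opampFlags struct {
	enableOpAMP bool // assumed to mirror feature_flags.enable_opamp
}

// Enabled reports whether the OpAMP endpoint should be registered.
func (f opampFlags) Enabled() bool { return f.enableOpAMP }

// newRouter registers /v1/opamp only when the flag is on, so a server with
// the flag off has no such route at all and answers 404.
func newRouter(f opampFlags) *http.ServeMux {
	mux := http.NewServeMux()
	if f.Enabled() {
		mux.HandleFunc("/v1/opamp", func(w http.ResponseWriter, r *http.Request) {
			w.WriteHeader(http.StatusOK)
		})
	}
	return mux
}

// opampStatus sends an in-memory POST to /v1/opamp and returns the status code.
func opampStatus(f opampFlags) int {
	rec := httptest.NewRecorder()
	newRouter(f).ServeHTTP(rec, httptest.NewRequest(http.MethodPost, "/v1/opamp", nil))
	return rec.Code
}

func main() {
	fmt.Println(opampStatus(opampFlags{enableOpAMP: true}))  // 200
	fmt.Println(opampStatus(opampFlags{enableOpAMP: false})) // 404
}
```

Because the decision is made once at startup, a unit test can check `Enabled()` directly instead of exercising a handler-level "disabled" error.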
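The commit "Disable HTTP keep-alive for OpAMP requests to fix EOF errors" sets `Connection: close` on OpAMP responses. A minimal sketch of one way to do that in Go, assuming a generic middleware; `closeAfterReply` and `probe` are hypothetical names, and fleet-server may set the header elsewhere. Go's `net/http` server honors the header by closing the connection after the reply, and the client records this in `resp.Close`.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// closeAfterReply sets "Connection: close" on every response. Go's net/http
// server then closes the TCP connection after writing the reply, so a client
// polling every ~30s never reuses a connection the server may already be
// dropping because of its own IdleTimeout.
func closeAfterReply(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Connection", "close")
		next.ServeHTTP(w, r)
	})
}

// probe starts a test server with the wrapper, sends one POST, and reports
// the status code and whether the client saw a close directive.
func probe() (int, bool, error) {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	srv := httptest.NewServer(closeAfterReply(ok))
	defer srv.Close()

	resp, err := http.Post(srv.URL+"/v1/opamp", "application/x-protobuf", nil)
	if err != nil {
		return 0, false, err
	}
	defer resp.Body.Close()
	_, _ = io.Copy(io.Discard, resp.Body)
	// resp.Close is set by the client when the response directs it to
	// close the connection after reading the body.
	return resp.StatusCode, resp.Close, nil
}

func main() {
	code, closed, err := probe()
	fmt.Println(code, closed, err)
}
```

The cost is one TCP (and possibly TLS) handshake per poll, which the commit message judges negligible at a 30s polling interval.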
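The commit "Fix otelcol template data to use nested OpAMP keys" is about the shape of the data passed to `text/template`. A small sketch, assuming an illustrative template fragment (not the real `otelcol-opamp.tpl`) that reuses only the two field references named in the commit; the `render` helper is hypothetical. With `missingkey=error`, flat top-level keys fail loudly instead of rendering `<no value>`.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// otelcolTpl is an illustrative fragment, not the real otelcol-opamp.tpl; it
// only reuses the two field references named in the commit message.
const otelcolTpl = `instance_uid: {{ .OpAMP.InstanceUID }}
api_key: {{ .OpAMP.APIKey }}
`

// render executes the template with missingkey=error so a mis-shaped data
// map fails instead of silently rendering "<no value>".
func render(data map[string]any) (string, error) {
	tpl, err := template.New("otelcol").Option("missingkey=error").Parse(otelcolTpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Nested under "OpAMP": matches {{ .OpAMP.APIKey }} and {{ .OpAMP.InstanceUID }}.
	nested := map[string]any{
		"OpAMP": map[string]string{"InstanceUID": "uid-123", "APIKey": "key-abc"},
	}
	out, err := render(nested)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)

	// Flat top-level keys: there is no "OpAMP" entry, so execution fails.
	flat := map[string]any{"InstanceUID": "uid-123", "APIKey": "key-abc"}
	_, err = render(flat)
	fmt.Println(err != nil) // true
}
```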
What is the problem this PR solves?
This PR ensures that an OTel Collector (from an upstream contrib release) is able to successfully connect to Fleet Server over OpAMP. Preliminary OpAMP support was added to Fleet Server in #6270; this PR is a follow-up to that work.
Note: A follow-up PR will add a test (or extend this one) to ensure that the EDOT Collector is also able to successfully connect to Fleet Server over OpAMP (#6394).
How does this PR solve the problem?
This PR adds a new E2E test, TestStandAloneRunningSuite/TestOpAMP, which downloads and extracts the OTel Collector binary from an upstream contrib release, configures it with the opamp extension, configures Fleet Server to turn on the feature_flags.enable_opamp feature flag, runs the Collector, and verifies that the Collector connects to Fleet Server over OpAMP.
How to test this PR locally
Design Checklist
Checklist
./changelog/fragments using the changelog tool
Related issues