[oracle] Reduce false negatives in metric system tests by shmsr · Pull Request #17959 · elastic/integrations

shmsr · 2026-03-21T10:10:23Z

Summary

This PR reduces false negatives in the Oracle metric system tests by making each metric datastream test wait for at least two indexed documents before the test case is considered complete. The change applies to the memory, performance, sysmetric, system_statistics, and tablespace system test configs.

Problem

The failures in build 40045 were initially reported as Oracle connection failures, but the full job evidence showed a more specific pattern. The uploaded oracle-system JUnit artifact reported that the built-in elastic-agent logs - ... checks failed because the Oracle metric input briefly transitioned from HEALTHY to DEGRADED with ORA-12541: TNS:no listener. However, the main Buildkite log also showed that the same datastreams eventually reached healthy Oracle containers and successfully indexed documents.

That combination matters. It means the package was not failing because it could not collect data at all. Instead, the system suite was failing because a transient Oracle connection error appeared in the agent logs during the test lifecycle, even though collection recovered and data was indexed successfully.

Why this fix

The Oracle package test configs do not have a package-side option to narrowly suppress or filter the built-in elastic-agent logs assertion for a known transient Oracle error. The practical lever available in the package is to change when the test considers collection successful.

Using assert.min_count is the safest way to do that. It keeps the change local to the system tests, does not alter Oracle package runtime behavior, and avoids relying on changes in elastic-package itself. It also fits this failure mode better than assert.hit_count, which is exact and therefore brittle for metric collection, or assert.fields_present, which validates document contents but does not reliably extend the test long enough past the first successful fetch.

Why `min_count: 2`

A value of 2 is the smallest that meaningfully changes the timing of the tests. With the previous behavior, a test could finish immediately after the first successful document and start teardown while the sql/metrics input was still active. If Oracle disappeared during or just after that first success window, the input could briefly emit a HEALTHY -> DEGRADED event with ORA-12541, and the log assertion would fail the test even though the datastream had already proven it could collect data.

Requiring at least two documents forces the test to survive one additional successful collection cycle. That gives the Oracle metric input more time to settle after startup and makes the test less sensitive to a single transient connection blip around the first successful fetch. The value is intentionally minimal: it extends the observation window without turning the tests into exact-hit-count checks or adding more delay than necessary.
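As a rough illustration, the relevant fragment of one system test config might look like the sketch below. The file path follows the usual elastic-package layout and is an assumption here, as are the comments; the actual diff is not reproduced in this description, and any existing keys in each file (service, vars, and so on) are left out.

```yaml
# Sketch only. Assumed location, following the elastic-package convention:
#   packages/oracle/data_stream/<data_stream>/_dev/test/system/test-default-config.yml
# Existing keys in the real file are omitted.
assert:
  # Wait for at least two indexed documents before the test case completes.
  # Unlike hit_count, this is a lower bound, so extra documents do not fail the test.
  min_count: 2
```

The same fragment would apply to each of the five metric datastreams: memory, performance, sysmetric, system_statistics, and tablespace.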

Evidence used

The change is based on the logs from build 40045.

The uploaded JUnit artifact build/test-results/oracle-system-1774069460280051161.xml showed five failures, all caused by transient HEALTHY -> DEGRADED Oracle metric input log events with ORA-12541. At the same time, the main Buildkite console log showed that memory, performance, sysmetric, system_statistics, and tablespace all progressed to healthy Oracle containers and indexed documents successfully. Based on that evidence, this PR is aimed at reducing a false-negative test outcome rather than changing the behavior of the integration itself.

Test plan

- `elastic-package test system --data-streams memory -v --report-format human`
- `elastic-package test system --data-streams performance -v --report-format human`
- `elastic-package test system --data-streams sysmetric -v --report-format human`
- `elastic-package test system --data-streams system_statistics -v --report-format human`
- `elastic-package test system --data-streams tablespace -v --report-format human`
- Inspect build 40045 logs and the uploaded oracle-system JUnit artifact.
- Confirm from the Buildkite console log that the affected Oracle metric datastreams still indexed data successfully.

Local execution is currently blocked in this environment because Docker is unavailable to elastic-package.

Related issues

Relates #17957.
Relates #17958.

Require two documents in Oracle metric system tests so the agent has time to settle after the first successful collection. This reduces false failures from transient sql.query degradations that occur even when data is indexed successfully.

elastic-vault-github-plugin-prod · 2026-03-21T10:47:13Z

🚀 Benchmarks report

Package `oracle` 👍(0) 💚(0) 💔(1)

| Data stream | Previous EPS | New EPS | Diff (%) | Result |
| --- | --- | --- | --- | --- |
| `database_audit` | 17241.38 | 12345.68 | -4895.7 (-28.4%) | 💔 |

To see the full report, comment with `/test benchmark fullreport`.

elasticmachine · 2026-03-21T10:47:15Z

💚 Build Succeeded

Buildkite Build
Commit: 29f9427

cc @shmsr

shmsr requested a review from a team as a code owner March 21, 2026 10:10

shmsr self-assigned this Mar 21, 2026

shmsr changed the title ~~[oracle] Reduce transient system test log failures~~ [oracle] Reduce false negatives in metric system tests Mar 21, 2026

andrewkroh added the Integration:oracle (Oracle) and Team:Obs-InfraObs (Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations]) labels Mar 23, 2026

shmsr wants to merge 1 commit into elastic:main from shmsr:fix/oracle-healthcheck-startup-tolerance
