Skip to content

filebeat: add autodiscover takeover reingest integration test#49606

Draft
github-actions[bot] wants to merge 3 commits intomainfrom
ai/filestream-autodiscover-takeover-test-49579-17218a1339d5e990
Draft

filebeat: add autodiscover takeover reingest integration test#49606
github-actions[bot] wants to merge 3 commits intomainfrom
ai/filestream-autodiscover-takeover-test-49579-17218a1339d5e990

Conversation

@github-actions
Copy link
Contributor

Summary

Adds a new integration regression test for autodiscover + filestream takeover to reproduce issue #49579.

Changes

  • Added TestAutodiscoverFilestreamTakeOverDoesNotReingest in filebeat/tests/integration/autodiscover_test.go.
  • Added test config template filebeat/tests/integration/testdata/autodiscover/docker-take-over.yml.

Test behavior

  1. Start a Docker flog container.
  2. Run Filebeat with a static filestream input (id: container-logs) to create initial state/events.
  3. Restart Filebeat with Docker autodiscover and filestream take_over.from_ids: [container-logs].
  4. Assert new events after restart stay below threshold (detecting re-ingestion).

Validation

  • Attempted: go test ./filebeat/tests/integration -run TestAutodiscoverFilestreamTakeOverDoesNotReingest -tags integration -count=1
  • In this runner, full execution is constrained (heavy integration dependencies / docker availability), so please run the command above in CI or a Docker-enabled dev environment.

Requested context: issue #49579.


What is this? | From workflow: Mention in Issue

Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions bot requested a review from a team as a code owner March 23, 2026 16:36
@github-actions github-actions bot added the ai label Mar 23, 2026
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Mar 23, 2026
@mergify
Copy link
Contributor

mergify bot commented Mar 23, 2026

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @github-actions[bot]? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@belimawr belimawr self-assigned this Mar 23, 2026
@belimawr belimawr marked this pull request as draft March 23, 2026 16:37
@belimawr belimawr added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Mar 23, 2026
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Mar 23, 2026
@coderabbitai
Copy link

coderabbitai bot commented Mar 23, 2026

📝 Walkthrough

Walkthrough

This change adds an integration test and corresponding configuration to verify that Filebeat's filestream input with take_over.enabled: true does not re-ingest files when used alongside autodiscover. The test starts a Docker container, ingests initial logs, stops Filebeat, restarts it with an updated autodiscover configuration, and verifies that the number of additional ingested lines remains below a threshold after the take-over occurs.

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Linked Issues check ✅ Passed The PR adds a regression test that reproduces issue #49579 by verifying that filestream take_over with autodiscover does not cause file re-ingestion.
Out of Scope Changes check ✅ Passed All changes are scoped to test implementation and configuration; no production code modifications outside PR objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ai/filestream-autodiscover-takeover-test-49579-17218a1339d5e990
  • 🛠️ Update Documentation: Commit on current branch
  • 🛠️ Update Documentation: Create PR

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@filebeat/tests/integration/autodiscover_test.go`:
- Around line 157-188: The test uses a fixed cap of 8 lines in the
require.LessOrEqual assertion comparing newEvents (computed from initialEvents
and eventsAfterTakeOver) which makes the test flaky on slow CI; instead compute
a dynamic cap based on the wall-clock elapsed time between when initialEvents is
recorded and when eventsAfterTakeOver is recorded (use the same scope where
initialEvents, filebeat.Start()/WaitLogsContains and the Sleep happen), then
replace the static "8" in the require.LessOrEqual with a derived maxAllowed
value (e.g., maxAllowed = int(elapsed.Seconds())*estimatedLinesPerSecond +
grace) using conservative estimatedLinesPerSecond and a small grace margin to
tolerate legitimate lines produced while Filebeat was stopped/restarting; update
the assertion around newEvents (the require.LessOrEqual call) to use this
computed maxAllowed so the test is stable across slower hosts.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: aa432a05-f384-49df-b908-e0d77ed02b3e

📥 Commits

Reviewing files that changed from the base of the PR and between 11d1fe7 and d1db5b7.

📒 Files selected for processing (2)
  • filebeat/tests/integration/autodiscover_test.go
  • filebeat/tests/integration/testdata/autodiscover/docker-take-over.yml

Comment on lines +157 to +188
filebeat.Stop()
initialEvents := filebeat.CountFileLines(outputFile)

cfgYAML := getConfig(
t,
map[string]any{
"containerID": containerID,
"homePath": workDir,
},
"autodiscover",
"docker-take-over.yml",
)
filebeat.WriteConfigFile(cfgYAML)
filebeat.Start()

filebeat.WaitLogsContains(
fmt.Sprintf(
`"message":"Input 'filestream' starting","service.name":"filebeat","id":"%s-logs"`,
containerID,
),
30*time.Second,
"Filestream did not start for the test container using autodiscover")

time.Sleep(5 * time.Second)

eventsAfterTakeOver := filebeat.CountFileLines(outputFile)
newEvents := eventsAfterTakeOver - initialEvents
require.LessOrEqual(
t,
newEvents,
8,
"autodiscover filestream takeover re-ingested old events",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Flaky threshold: fixed 8-line cap can fail without re-ingestion

Line 180 + Lines 184-188 use a fixed wait and static cap, but newEvents also includes legitimate lines produced while Filebeat is stopped/restarting. On slower CI hosts this can exceed 8 even when takeover is correct, making the regression test nondeterministic.

Proposed fix (derive cap from elapsed wall time)
 func TestAutodiscoverFilestreamTakeOverDoesNotReingest(t *testing.T) {
@@
-	filebeat.Stop()
+	filebeat.Stop()
+	stoppedAt := time.Now()
 	initialEvents := filebeat.CountFileLines(outputFile)
@@
-	time.Sleep(5 * time.Second)
+	time.Sleep(5 * time.Second)

 	eventsAfterTakeOver := filebeat.CountFileLines(outputFile)
 	newEvents := eventsAfterTakeOver - initialEvents
+	// flog emits ~1 line/sec. Account for downtime + observation window + small slack.
+	maxExpectedNewEvents := int(time.Since(stoppedAt).Seconds()) + 3
 	require.LessOrEqual(
 		t,
 		newEvents,
-		8,
+		maxExpectedNewEvents,
 		"autodiscover filestream takeover re-ingested old events",
 	)
 }

As per coding guidelines, this is a merge-blocking reliability issue because it can cause false test failures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@filebeat/tests/integration/autodiscover_test.go` around lines 157 - 188, The
test uses a fixed cap of 8 lines in the require.LessOrEqual assertion comparing
newEvents (computed from initialEvents and eventsAfterTakeOver) which makes the
test flaky on slow CI; instead compute a dynamic cap based on the wall-clock
elapsed time between when initialEvents is recorded and when eventsAfterTakeOver
is recorded (use the same scope where initialEvents,
filebeat.Start()/WaitLogsContains and the Sleep happen), then replace the static
"8" in the require.LessOrEqual with a derived maxAllowed value (e.g., maxAllowed
= int(elapsed.Seconds())*estimatedLinesPerSecond + grace) using conservative
estimatedLinesPerSecond and a small grace margin to tolerate legitimate lines
produced while Filebeat was stopped/restarting; update the assertion around
newEvents (the require.LessOrEqual call) to use this computed maxAllowed so the
test is stable across slower hosts.

GenAI-Assisted: Yes
Human-Reviewed: Yes
Tool: Cursor-CLI, Model: GPT-5.3 Codex High
 - Use files instead of adding the config in the middle of the test
 - Remove time.Sleep
 - Fix duplication count
@github-actions
Copy link
Contributor Author

TL;DR

golangci-lint failed on all 3 OS jobs due to three new lint findings in filebeat/tests/integration/autodiscover_test.go (typo, staticcheck simplification, and testifylint assertion style). Update those lines and re-run CI.

Remediation

  • In filebeat/tests/integration/autodiscover_test.go, change occuredoccurred, replace require.EqualValuesf(...) with require.Equalf(...), and simplify the if err != nil { return false }; return true block to return err == nil.
  • Re-run the same PR workflow (golangci-lint) and confirm lint (ubuntu-latest), lint (macos-latest), and lint (windows-latest) all pass.
Investigation details

Root Cause

The run failed consistently across all runners because the same three linter violations were introduced in the new integration test file. This is a source-level lint issue, not an environment-specific failure.

Evidence

  • Workflow: https://github.com/elastic/beats/actions/runs/23459008504
  • Job/step: lint (ubuntu-latest)golangci-lint (same findings in macOS/windows jobs)
  • Key log excerpt:
    • filebeat/tests/integration/autodiscover_test.go:177:26: \occured` is a misspelling of `occurred` (misspell)`
    • filebeat/tests/integration/autodiscover_test.go:218:3: S1008: should use 'return err == nil' ... (staticcheck)
    • filebeat/tests/integration/autodiscover_test.go:173:2: equal-values: use require.Equalf (testifylint)

Validation

  • Not run locally (read-only detective workflow); diagnosis is based on failed job logs for run 23459008504.

Follow-up

  • After applying the three lint fixes, no workflow/config changes should be required.

What is this? | From workflow: PR Actions Detective

Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Filestream] TakeOver with Autodiscover causes files re-ingestion.

1 participant