Tag retried-test non-final attempts as skip in JUnit report#11443
Tag retried-test non-final attempts as skip in JUnit report#11443mhdatie wants to merge 1 commit into
Conversation
The Develocity testRetry plugin emits one <testcase> per attempt sharing the same (classname, name); CI treats only the final attempt as authoritative but Test Optimization was surfacing earlier failed attempts as real failures. Mark non-final attempts with final_status=skip in the JUnit XML post-processor so reports match what CI considers the outcome. Jira: APMLP-1297 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@bric3 Tagged you for early review to compare between this change and the alternative provided in the description before I open it up for review 🙇 . |
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
I'm getting a tad concerned by the lack of visibility on retries.
Also as a matter of followup on the initiative here's the previous work from @cbeauchesne
| var reportChangedBeforeFinalStatus = report.addFileAttribute(sourceFile); | ||
| reportChangedBeforeFinalStatus |= report.normalizeStableTestNames(); | ||
| report.tagSyntheticFailures(); | ||
| report.tagRetriedTests(); |
There was a problem hiding this comment.
issue: This happens after normalizeStableTestNames();, so basically two different tests could end up with the same name, and one can get silently skipped.
E.g.
localhost:12345andfinal_status=fail=> after normalization :localhost:PORTand this one would incorrectly switch tofinal_status=skiplocalhost:23456endfinal_status=pass=> after normalization :localhost:PORT
There was a problem hiding this comment.
Great catch..
- Do we normalize localhost failures and then just look at the count in Test Optim.?
- What do you think of excluding the normalized cases from this operation?
I'm getting a tad concerned by the lack of visibility on retries.
Can you elaborate? It would be great to autodetect retries. Are these manually detected by us today? (cc @cbeauchesne) - I know that Flaky tests are labelled in Datadog and from there we can detect if they were retried (grouped by name)
There was a problem hiding this comment.
On catching retries proactively, I'm not sure if there are plans to auto-detect these retries by @DataDog/ci-app-libraries team.
I did some planning with Claude, and the suggestion was to:
- Spend time understanding how detection can be implemented with Develocity's testRetry plugin (Gradle TestListener, JUnit 5 TestExecutionListener, etc)
- Use
CiVisibilityMetricCollectorandCiVisibilityCountMetricsince they have the metadata/tags to detect retries - Document ownership
Claude Details
Data model — trivial
Add a RetryReason.develocity enum value in internal-api/.../telemetry/tag/RetryReason.java. The existing TEST_EVENT_FINISHED count metric and Tags.TEST_RETRY_REASON span tag pick it up automatically.
Detection — the real work
Identifying that a given test execution is a Develocity retry requires one of:
- A new instrumentation module hooking the plugin's retry executor class.
- Extending the existing Gradle build-system instrumentation in
agent-ci-visibilityto detect thetestRetryextension on theTesttask and propagate a sysprop into forked test JVMs. - A same-session re-execution heuristic — fallback only, if neither of the above is viable.
Wiring
The resulting signal lands in TestEventsHandlerImpl.java:263, and must not override existing atr / efd / attemptToFix reasons.
Open decisions
- Whether to also set
IsRetry— probably yes. - Whether to set
HasFailedAllRetries— probably no, since Develocity controls the retry budget out-of-process.
Validation
A smoke test under dd-smoke-tests/ with the plugin applied to a flaky test, plus a telemetry assertion that event_finished carries retry_reason:develocity_test_retry.
Ownership
RetryReason.java, TestEventsHandlerImpl.java, and anything under agent-ci-visibility/ are owned by @DataDog/ci-app-libraries — the change will require their review even if apm-lang-platform-java drives
it.
Recommendation
Do a short Phase-0 spike on the Develocity plugin's internals before committing to a detection approach.
There was a problem hiding this comment.
Pull request overview
This PR updates the .gitlab/collect-result JUnit XML post-processor so that when the same test (classname, name) appears multiple times (as produced by Develocity’s testRetry), only the final attempt retains its computed status while earlier attempts are tagged with dd_tags[test.final_status]=skip to avoid surfacing them as real failures in Test Optimization.
Changes:
- Adds
JUnitReport.tagRetriedTests()to tag non-final retry attempts asskip. - Wires
tagRetriedTests()intoResultCollectorbetweentagSyntheticFailures()andtagFinalStatuses()to prevent later fallback tagging from overriding theskipstatus.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
.gitlab/collect-result/ResultCollector.java |
Calls tagRetriedTests() during report post-processing, before final status tagging. |
.gitlab/collect-result/JUnitReport.java |
Implements tagRetriedTests() to identify duplicate (classname, name) testcases and tag earlier ones as skip. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for (var i = 0; i < all.size(); i++) { | ||
| var current = all.get(i); | ||
| var classname = current.getAttribute("classname"); | ||
| var name = current.getAttribute("name"); | ||
| for (var j = i + 1; j < all.size(); j++) { |
There was a problem hiding this comment.
Will apply based on outcome of above convo #11443 (review)
What Does This Do
Add
JUnitReport.tagRetriedTests()and wire it into theResultCollectorpost-processing pipeline betweentagSyntheticFailures()andtagFinalStatuses().The new method scans every
<testcase>element and, when a later sibling shares the same(classname, name), marks the earlier one withdd_tags[test.final_status]=skipvia the existingaddFinalStatusProperty(..., APPEND_TO_TESTCASE)helper.skip.Ordering matters —
tagRetriedTestsmust run beforetagFinalStatuses, otherwise the per-testcase fallback tagger would overwrite theskipwithfail.Motivation
When Develocity's testRetry plugin retries a failed test, the JUnit XML contains one
<testcase>per attempt, all sharing the same(classname, name). CI treats the test as green if the final attempt passes, but Test Optimization was tagging the earlier failed attempts asfinal_status=failand surfacing them as real failures.Example trace where a retried-then-passed test shows up as a failure
Additional Notes
Alternatives considered
Path B — generalize and delete the
initializationErrorbranch fromtagSyntheticFailures. The existinginitializationErrorhandling is a strict subset ofthe new logic: same group-and-skip-all-but-last pattern, just restricted to one literal name. Folding it into
tagRetriedTestswould remove a few lines, but wasrejected because:
addFinalStatusPropertyshort-circuits when afinal_statusproperty already exists, so the second tag against the same testcase isa no-op.
initializationErrorbranch carries source-pinned links to JUnit 4 and Gradle that justify why that specific literal is treated as a framework synthetic. Deleting it sheds context that was deliberately added in Stop skipping"test exception"test cases, they are not synthetic #11427.Scope
Touches only
.gitlab/collect-result/(the build-time JUnit XML post-processor that uploads results to Test Optimization). No agent / tracer code paths are affected — CI tooling, not user-visible.Jira ticket: APMLP-1297