ci: tiered PR builds + performance test infrastructure #34
Adds a separate performance test suite that runs on dedicated hardware via a self-hosted GitHub Actions runner, so the volume tests don't slow down or destabilize regular CI.

Test changes:
- Tag VeryLargeMessageVolumeTest, LargeVolumeInMemoryTests, and MultiInstanceHighVolumeTest with @Tag("performance")
- Failsafe now reads excluded.groups (defaults to "performance") and included.groups (defaults to empty), so the performance group is excluded by default. Override with -Dincluded.groups=performance to run only perf, or -Dexcluded.groups= to run everything.

Scripts:
- bin/performance-test.sh and bin/performance-test.cmd run the perf suite locally (.cmd because the runner is on Windows)

Workflow:
- .github/workflows/performance.yml targets a self-hosted runner labelled [self-hosted, windows, performance]
- Triggers: workflow_dispatch (manual) + weekly schedule
- Never runs on PRs (security: self-hosted runner + untrusted code = bad)

Documentation:
- docs/SELF_HOSTED_RUNNER.md walks through one-time runner setup on Windows with Docker Desktop and the WSL2 backend
- README template and AGENTS.md mention the performance suite and link to the runner setup doc
- Fix a stale link in README to the renamed VolumeTests file

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
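The include/exclude defaults in this commit can be modelled as a small selection predicate. This is a hypothetical illustration of the semantics, not Failsafe's or the JUnit Platform's code (`TagFilterSketch` and its methods are invented names), and it handles only plain comma-separated tag names: a test runs if it matches `included.groups` (empty means "everything") and carries none of the `excluded.groups` tags.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

final class TagFilterSketch {
    // Parses a comma-separated property value such as "performance" or "" into a set of tags.
    static Set<String> parse(String property) {
        return Arrays.stream(property.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
    }

    // Selected if the test matches the include list (empty = include all)
    // and carries none of the excluded tags.
    static boolean selected(Set<String> testTags, String includedGroups, String excludedGroups) {
        Set<String> included = parse(includedGroups);
        Set<String> excluded = parse(excludedGroups);
        boolean includeOk = included.isEmpty() || testTags.stream().anyMatch(included::contains);
        boolean excludeOk = testTags.stream().noneMatch(excluded::contains);
        return includeOk && excludeOk;
    }

    public static void main(String[] args) {
        Set<String> perf = Set.of("performance");
        System.out.println(selected(perf, "", "performance"));            // false: excluded by default
        System.out.println(selected(Set.of(), "", "performance"));        // true: untagged tests still run
        System.out.println(selected(perf, "performance", ""));            // true: perf-only run
        System.out.println(selected(perf, "performance", "performance")); // false: exclusion wins
    }
}
```

In this model exclusion wins over inclusion, so a perf-only run also needs `excluded.groups` to be empty; `-Dincluded.groups=performance` on its own would select nothing.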
Add a "License headers" section to AGENTS.md explaining how to skip the Mycila license-maven-plugin check. This avoids two recurring problems:
- The plugin's git-derived copyright years break inside git worktrees
- It auto-bumps years on every touched file, creating noise in git status that distracts from real changes

The bin/build.sh and bin/ci-build.sh scripts already pass the flag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PR builds now run in tiers, each gated on an earlier tier:
- Tier 1: Unit tests (no Docker, ~3 min)
- Tier 2: Integration tests (TestContainers, ~10 min)
- Tier 3: Kafka version matrix (2.8.1 + 3.9.1 only)
- Tier 4: Performance tests (@Tag("performance"))

Tiers 3 and 4 run in parallel after Tier 2 passes.

Push builds (master) still run the full Kafka matrix (2.8.1, 3.1.0, 3.7.0, 3.9.1) plus an experimental [3.9.1,5) range to catch Kafka 4.x compatibility issues.

Dropped 3.5.0 from the matrix (EOL Aug 2025). Updated the experimental range from [3.1.0,4) to [3.9.1,5) to cover 4.x.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Push events on feature branches were triggering the full Kafka matrix build intended for master. Restrict on.push to master only. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Retry failing tests up to 2 times to handle known flaky tests (queuedMessagesNotProcessedOrCommittedIfSubmittedDuringShutdown, JStreamParallelEoSStreamProcessorTest timing issues). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
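Surefire's rerun option re-executes a failing test up to N more times and reports it as flaky rather than failed if a rerun passes. The sketch below models only that classification, assuming a test is a `BooleanSupplier`; `RerunSketch` and `Outcome` are hypothetical names and this is not Surefire's implementation.

```java
import java.util.function.BooleanSupplier;

final class RerunSketch {
    enum Outcome { PASSED, FLAKY, FAILED }

    // Runs the test once, then re-runs it up to rerunCount more times only while it keeps failing.
    static Outcome run(BooleanSupplier test, int rerunCount) {
        if (test.getAsBoolean()) {
            return Outcome.PASSED;
        }
        for (int i = 0; i < rerunCount; i++) {
            if (test.getAsBoolean()) {
                return Outcome.FLAKY; // passed on a rerun
            }
        }
        return Outcome.FAILED;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical test that fails on its first attempt only.
        BooleanSupplier flakyTest = () -> ++calls[0] > 1;
        System.out.println(run(flakyTest, 2)); // FLAKY
    }
}
```

With a count of 2, a test that never passes is executed three times in total before it is reported as a failure.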
The unit/integration split added overhead without real benefit. Simplified to:
1. Build & Test: full verify (unit + integration), default Kafka
2. Kafka Matrix (2.8.1 + 3.9.1): after Build & Test
3. Performance Tests: after Build & Test

Matrix and performance fan out in parallel after the main build passes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use bin/ci-build.sh instead of raw mvnw to match existing CI setup (includes -Pci profile). Add -Dexcluded.groups=performance and -Dsurefire.rerunFailingTestsCount=2 to the script. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ensures -Pci profile is used for compilation consistency. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add [3.9.1,5) range to the PR kafka-matrix with continue-on-error so we get early visibility into 4.x compatibility without blocking PRs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude check review

Claude finished @astubbs's task in 1m 31s. Review complete.
Co-authored-by: Antony Stubbs <astubbs@users.noreply.github.com>
@claude check action run code review failure

Claude finished @astubbs's task in 1m 12s.
CI Action Run: Code Review Failure Analysis

Root Cause
Job: … What happened: Every time … This is expected/harmless (it's the action protecting against recursion), but it shows up as a red X on the PR.

Fix
Add a bot-exclusion guard:

jobs:
  claude-review:
    if: github.event.sender.type != 'Bot'

This skips the review job entirely when the …
Build and Test is still …
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The integration tests were hardcoded to cp-kafka:7.6.0 (Kafka 3.6) regardless of the -Dkafka.version override from the CI matrix. This caused the 2.8.1 client to talk to a 3.6 broker, producing BrokerPollSystem errors. The tests now derive the Confluent Platform (CP) Kafka Docker image version from the actual Apache Kafka (AK) client version at runtime (CP major = AK major + 4, CP minor = AK minor).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
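A minimal sketch of the version rule in this commit, assuming the client version is available as a string such as `"3.9.1"` (kafka-clients exposes one via `AppInfoParser.getVersion()`, but how the PR obtains it is not shown here). `CpKafkaImageSketch` is a hypothetical name, and the `.0` patch level and the `confluentinc/cp-kafka` image name are assumptions. The rule matches the previously pinned pair (Kafka 3.6 in CP 7.6); Confluent's 6.x line did not track Kafka minors this way (Kafka 2.8 shipped in CP 6.2), so the rule is safest for 3.x and later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CpKafkaImageSketch {
    // "major.minor" followed by an optional ".patch" or other suffix, e.g. "3.9.1".
    private static final Pattern AK_VERSION = Pattern.compile("(\\d+)\\.(\\d+)(\\..*)?");

    // Applies the commit's rule: CP major = AK major + 4, CP minor = AK minor.
    // The ".0" patch level and the image name are assumptions for illustration.
    static String imageFor(String kafkaClientVersion) {
        Matcher m = AK_VERSION.matcher(kafkaClientVersion);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised Kafka version: " + kafkaClientVersion);
        }
        int cpMajor = Integer.parseInt(m.group(1)) + 4;
        int cpMinor = Integer.parseInt(m.group(2));
        return "confluentinc/cp-kafka:" + cpMajor + "." + cpMinor + ".0";
    }

    public static void main(String[] args) {
        System.out.println(imageFor("3.6.0")); // confluentinc/cp-kafka:7.6.0
        System.out.println(imageFor("3.9.1")); // confluentinc/cp-kafka:7.9.0
    }
}
```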
…ssigned

With Kafka 2.x's eager rebalance protocol, poll() can return records for a partition before onPartitionsAssigned() has fired, leaving the epoch map empty. This caused a NullPointerException in EpochAndRecordsMap.

Fix: skip records for partitions with no epoch yet. These records are safe to skip: they haven't been committed, so Kafka will re-deliver them on the next poll after the assignment callback completes.

Includes a lifecycle test proving: skip on first poll → assignment fires → re-poll succeeds with correct epoch → work created.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
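The guard can be sketched with plain collections. `PolledRecord` and `groupByAssignedPartition` are hypothetical stand-ins, not the real `EpochAndRecordsMap` API; the sketch shows only the skip-when-no-epoch check and the lifecycle the commit's test exercises (skip, assignment, re-poll). It does not model re-delivery, which the commit attributes to Kafka.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NullEpochGuardSketch {
    // Hypothetical stand-in for a polled record: partition name plus offset.
    record PolledRecord(String partition, long offset) {}

    // Groups polled records by partition, skipping partitions that have no epoch yet
    // (the assignment callback had not been processed when the poll returned).
    static Map<String, List<PolledRecord>> groupByAssignedPartition(
            List<PolledRecord> polled, Map<String, Long> epochs) {
        Map<String, List<PolledRecord>> grouped = new LinkedHashMap<>();
        for (PolledRecord r : polled) {
            Long epoch = epochs.get(r.partition());
            if (epoch == null) {
                continue; // skip instead of dereferencing a missing epoch (the NPE case)
            }
            grouped.computeIfAbsent(r.partition(), p -> new ArrayList<>()).add(r);
        }
        return grouped;
    }

    public static void main(String[] args) {
        Map<String, Long> epochs = new HashMap<>();
        List<PolledRecord> batch = List.of(new PolledRecord("orders-0", 42));
        System.out.println(groupByAssignedPartition(batch, epochs).isEmpty()); // true: skipped
        epochs.put("orders-0", 1L); // assignment callback fires
        System.out.println(groupByAssignedPartition(batch, epochs).get("orders-0").size()); // 1
    }
}
```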
Kafka 2.8.x is end-of-life (2.8.0 was released in April 2021, and the line no longer receives security patches). The upstream project never ran integration tests against 2.x either. Keep the defensive null-epoch handling and dynamic broker version matching, as they're good practice regardless.

- PR matrix: 3.9.1 + experimental [3.9.1,5) for 4.x visibility
- Push matrix: 3.1.0, 3.7.0, 3.9.1 + experimental [3.9.1,5)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All three test suites now start simultaneously on PRs. If unit tests fail (~5 min), GitHub cancels integration and performance immediately. Kafka version matrix is gated on all tests passing. Replaces the sequential Build & Test → fan out approach. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The default Kafka version (3.9.1) is already tested by the three parallel test suites. The kafka-matrix was duplicating this work. Multi-version testing is covered by the push-to-master matrix. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Upload JaCoCo reports to Codecov from all test suites and push builds. Fail PRs if overall coverage drops by more than 1% from the base branch. Patch coverage is reported but informational only. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov requires a token when branch protection is enabled. Added token reference to both PR and push upload steps. Documented the secret and coverage setup in AGENTS.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## master #34 +/- ##
=========================================
Coverage ? 77.18%
Complexity ? 1154
=========================================
Files ? 82
Lines ? 4190
Branches ? 386
=========================================
Hits ? 3234
Misses ? 765
Partials ? 191
Flags with carried forward coverage won't be shown.
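The figures in this report also show how the 1% coverage gate from the Codecov commit is evaluated. This is a hedged sketch, assuming the threshold is measured in absolute percentage points against the base branch and that partially covered lines count as uncovered (consistent with 3234 / 4190 = 77.18% in the table). `CoverageGateSketch` is a hypothetical name, and the base values in `main` are hypothetical because the report shows no base coverage for this PR.

```java
final class CoverageGateSketch {
    // Line coverage in percent, counting partially covered lines as not covered.
    static double coveragePercent(int hits, int misses, int partials) {
        int lines = hits + misses + partials;
        if (lines == 0) {
            throw new IllegalArgumentException("no coverable lines");
        }
        return 100.0 * hits / lines;
    }

    // Passes unless head coverage is more than thresholdPoints below base coverage.
    static boolean passes(double basePercent, double headPercent, double thresholdPoints) {
        return headPercent >= basePercent - thresholdPoints;
    }

    public static void main(String[] args) {
        double head = coveragePercent(3234, 765, 191); // figures from the Codecov report
        System.out.printf("%.2f%%%n", head);           // 77.18%
        System.out.println(passes(78.00, head, 1.0));  // true: a drop of about 0.82 points
        System.out.println(passes(78.50, head, 1.0));  // false: a drop of about 1.32 points
    }
}
```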
Added prepare-agent-integration and report-integration executions to JaCoCo plugin so failsafe (integration test) coverage is captured. Updated Codecov upload to include jacoco-it reports. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Runs in parallel with test suites. Reports duplicate Java code blocks across all source including tests. No failure threshold yet — reporting only to establish a baseline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… testing

Three new parallel CI jobs for PRs:
- SpotBugs: static analysis for null derefs, concurrency bugs, resource leaks
- Dependency Vulnerabilities: OSS Index audit for known CVEs
- Mutation Testing (PIT): verifies test assertions are meaningful

All run in parallel with existing test suites. SpotBugs and PIT are reporting-only for now (no build failure). Dependency scan uses the existing ossindex plugin config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SpotBugs failed on vertx module due to Jabel cross-compilation. Restrict to parallel-consumer-core for now. OSS Index audit gets 401s from the API intermittently — make it continue-on-error. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Create bin/ci-unit-test.sh and bin/ci-integration-test.sh so all CI jobs use scripts with consistent -Pci flags instead of inline commands
- Add timeout-minutes to all jobs (30 for tests, 10 for SpotBugs, 5 for duplicate/dependency checks, 15 for PIT)
- Restore all modules for integration and performance tests (reverts the -pl restriction; slow tests should be investigated, not removed)
- Cherry-pick Mutiny release.target=9 fix for full module compilation
- Revert performance-test.sh -pl restriction
- Document new scripts in README

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- jscpd: posts summary table + duplicate block list as PR comment
- SpotBugs: uses spotbugs-github-action for inline PR annotations
- PIT: posts mutation score summary as PR comment with artifact link

All comments are updated in-place on subsequent runs (no spam).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
✅ Duplicate Code Report
Two engines run in parallel for cross-validation. Each has its own thresholds tuned to its baseline; the real safety net is the per-engine "max increase vs base" check.

✅ PMD CPD (Java-aware)
No new clones introduced by this PR.

✅ jscpd (language-agnostic)
No new clones introduced by this PR.

Powered by astubbs/duplicate-code-cross-check
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GitHub's own action is more reliable (no external API auth issues), posts a summary comment on the PR, and uses GitHub's advisory database. Fails on high/critical severity CVEs in new dependencies. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dependency Review
The following issues were found:
License Issues
.github/workflows/performance.yml
pom.xml
OpenSSF Scorecard
Scanned Files
…tect latch

The test stalled for 50+ minutes on CI because:
1. The 20s wait for 50 concurrent requests was too short for resource-constrained runners
2. If the wait failed, the WireMock threads were orphaned, blocking on an unreleased latch, each waiting 30s before timing out

Fixes:
- Increase wait-for-requests from 20s to 120s
- Add @Timeout(5 minutes) to the test method
- Move latch release into a finally block so threads are always unblocked, even on test failure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
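The finally-block fix can be illustrated without WireMock or JUnit. In this hypothetical sketch an `ExecutorService` stands in for the stub server's request threads, each blocking on a release latch for up to 30 seconds; because the release happens in `finally`, a failing check still leaves no orphaned threads. JUnit's `@Timeout` is not shown since the sketch uses only the JDK, and `LatchReleaseSketch` and its methods are invented names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class LatchReleaseSketch {
    // Submits `requests` simulated handler tasks that signal arrival and then block on `release`.
    // The release latch is counted down in `finally`, so blocked tasks are freed even on failure.
    static boolean waitForRequests(ExecutorService pool, int requests, long waitMillis, boolean failAfterWait) {
        CountDownLatch arrived = new CountDownLatch(requests);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < requests; i++) {
                pool.submit(() -> {
                    arrived.countDown();
                    release.await(30, TimeUnit.SECONDS); // without the finally, each task sits here for 30s
                    return null;
                });
            }
            boolean allArrived = arrived.await(waitMillis, TimeUnit.MILLISECONDS);
            if (failAfterWait) {
                throw new IllegalStateException("simulated test failure");
            }
            return allArrived;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    // True if every submitted task finished within the given number of seconds.
    static boolean drained(ExecutorService pool, long seconds) {
        try {
            return pool.awaitTermination(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(50);
        try {
            waitForRequests(pool, 50, 120_000, true);
        } catch (IllegalStateException expected) {
            System.out.println("test failed, threads released: " + drained(pool, 5));
        }
    }
}
```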
Add comprehensive Agent Rules section covering git safety, development discipline, code quality, test discipline, CI/automation, documentation, communication, rule sync, and working directory conventions. Update CI section to reflect current parallel build setup with SpotBugs, PIT mutation testing, jscpd duplicate detection, and dependency scanning. These rules were previously scattered across global Claude config and memory files. Consolidating into AGENTS.md ensures all contributors and agents working on this project follow the same standards. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
The Copilot Autofix commit added two top-level permissions blocks, which is invalid YAML. Consolidate into a single top-level block with contents:read, pull-requests:write, checks:write. Remove redundant per-job permissions blocks. Fix em dash characters. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run jscpd on both base and PR branches. PR comment now shows a comparison table with delta. Fails if:
- Total duplication exceeds 3% (absolute ceiling)
- PR introduces more than 2 new duplicate blocks vs base (tolerance: 2)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ones

- Extract duplicate comparison logic to bin/ci-duplicate-report.js
- Fail condition: >3% total OR >0.5% increase vs base
- Always post PR comment with comparison table and pass/fail status
- Annotate new clones directly on the PR diff as review comments
- List new clones separately in the PR comment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
0.5% was too generous: it would allow ~135 new duplicated lines. 0.1% allows ~27 lines, roughly one small duplicated block. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
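The two figures imply a codebase of about 27,000 lines (135 / 0.5% = 27 / 0.1% = 27,000); that size is inferred from the commit's numbers, not measured. A one-method helper (hypothetical name) makes the arithmetic explicit:

```java
final class DuplicationBudgetSketch {
    // Newly duplicated lines that a tolerance (in percentage points of total lines) permits.
    static long allowedNewDuplicatedLines(long totalLines, double tolerancePercent) {
        return Math.round(totalLines * tolerancePercent / 100.0);
    }

    public static void main(String[] args) {
        long totalLines = 27_000; // inferred from the commit message, not measured
        System.out.println(allowedNewDuplicatedLines(totalLines, 0.5)); // 135
        System.out.println(allowedNewDuplicatedLines(totalLines, 0.1)); // 27
    }
}
```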
- Match clones by content hash (md5 of fragment) instead of file:line keys, so line number shifts from refactoring don't create false positives
- Use consistent terminology: "clones" everywhere
- Remove the "all duplicate blocks" section from PR comment; run jscpd locally if you want the full list
- Show "No new clones introduced" when clean
- Tighten tolerance to 0.1%

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
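Matching by content hash can be sketched in Java with the JDK's `MessageDigest`. The real logic lives in `bin/ci-duplicate-report.js`; this is an illustrative equivalent rather than a port, and `Clone`, `contentKey`, and `newClones` are hypothetical names. Because the key ignores file and line, a clone that merely moves is not reported as new.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class CloneMatchingSketch {
    // Hypothetical clone record: where it was found plus the duplicated code fragment.
    record Clone(String file, int line, String fragment) {}

    // Hex MD5 of the fragment text, used as a location-independent identity for the clone.
    static String contentKey(Clone c) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(c.fragment().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    // Clones on the PR branch whose fragment hash does not appear on the base branch.
    static List<Clone> newClones(List<Clone> base, List<Clone> pr) {
        Set<String> baseKeys = base.stream().map(CloneMatchingSketch::contentKey).collect(Collectors.toSet());
        return pr.stream().filter(c -> !baseKeys.contains(contentKey(c))).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Clone> base = List.of(new Clone("A.java", 10, "int x = 1;"));
        List<Clone> pr = List.of(
                new Clone("A.java", 25, "int x = 1;"),  // same fragment, shifted lines
                new Clone("B.java", 5, "int y = 2;"));  // genuinely new
        System.out.println(newClones(base, pr).size()); // 1
    }
}
```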
Adds a parallel CI job that detects semantically similar files using gensim TF-IDF analysis. Complements the existing jscpd block detection with whole-file similarity scoring. Uses our fork with base-vs-PR comparison to report new similarities introduced by the PR. Fails if any file pair exceeds 80% similarity or increases by more than 10%. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
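For intuition on what whole-file similarity scoring measures, here is a plain TF-IDF plus cosine similarity sketch in Java. It is not gensim's implementation or the fork's: it uses natural-log IDF (the log base scales every weight equally, so it does not change cosine similarity) and treats each file as an already-tokenised list. `TfIdfSimilaritySketch` is a hypothetical name, and reading "increases by more than 10%" as 10 percentage points is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

final class TfIdfSimilaritySketch {
    // Builds a TF-IDF vector per document: raw term count times ln(N / document frequency).
    static List<Map<String, Double>> tfidf(List<List<String>> docs) {
        int n = docs.size();
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (List<String> doc : docs) {
            Map<String, Double> v = new HashMap<>();
            for (String term : doc) {
                v.merge(term, 1.0, Double::sum);
            }
            v.replaceAll((term, tf) -> tf * Math.log((double) n / df.get(term)));
            vectors.add(v);
        }
        return vectors;
    }

    // Cosine similarity between two sparse vectors; 0 if either vector is all zeros.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            normA += e.getValue() * e.getValue();
        }
        for (double x : b.values()) {
            normB += x * x;
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    // Gate from the commit: over 80% similar, or more than 10 points above the base branch.
    static boolean violates(double prSimilarity, double baseSimilarity) {
        return prSimilarity > 0.80 || prSimilarity - baseSimilarity > 0.10;
    }

    public static void main(String[] args) {
        List<List<String>> files = List.of(
                List.of("poll", "commit", "offset"),
                List.of("poll", "commit", "offset"),
                List.of("render", "html", "css"));
        List<Map<String, Double>> v = tfidf(files);
        System.out.println(cosine(v.get(0), v.get(1))); // identical token lists
        System.out.println(cosine(v.get(0), v.get(2))); // no shared terms
    }
}
```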
PMD CPD has deep Java syntax understanding, so it produces fewer false positives from imports, annotations, and literals. jscpd's language-agnostic token matching was flagging Java boilerplate as duplication.

Changes:
- Install PMD 7.9.0 as a step in the duplicate-detection job
- Run CPD with --language java --minimum-tokens 70 --format xml
- Rewrite ci-duplicate-report.js to parse CPD XML instead of jscpd JSON
- Keep all existing features: base-vs-PR comparison with content hash matching, PR comment with delta table, diff annotations on new clones, fail conditions (>3% total or >0.1% increase)
- Update AGENTS.md CI section to reflect PMD CPD

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
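Reading CPD's XML report needs only the JDK's DOM parser. This sketch assumes the commonly seen CPD layout (a `pmd-cpd` root containing `duplication` elements with `lines` and `tokens` attributes, `file` children carrying `path` and `line`, and a `codefragment` element); check it against the XML that PMD 7.9.0 actually emits. The PR's parser is JavaScript, so `CpdXmlSketch` is only an illustration with invented names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class CpdXmlSketch {
    // One CPD duplication: size, every "path:line" location, and the duplicated source text.
    record Duplication(int lines, int tokens, List<String> locations, String fragment) {}

    static List<Duplication> parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable CPD XML report", e);
        }
        List<Duplication> result = new ArrayList<>();
        NodeList dups = doc.getElementsByTagName("duplication");
        for (int i = 0; i < dups.getLength(); i++) {
            Element dup = (Element) dups.item(i);
            List<String> locations = new ArrayList<>();
            NodeList files = dup.getElementsByTagName("file");
            for (int j = 0; j < files.getLength(); j++) {
                Element file = (Element) files.item(j);
                locations.add(file.getAttribute("path") + ":" + file.getAttribute("line"));
            }
            NodeList fragments = dup.getElementsByTagName("codefragment");
            String fragment = fragments.getLength() > 0 ? fragments.item(0).getTextContent() : "";
            result.add(new Duplication(
                    Integer.parseInt(dup.getAttribute("lines")),
                    Integer.parseInt(dup.getAttribute("tokens")),
                    locations, fragment));
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<pmd-cpd><duplication lines=\"12\" tokens=\"80\">"
                + "<file line=\"10\" path=\"src/A.java\"/><file line=\"40\" path=\"src/B.java\"/>"
                + "<codefragment><![CDATA[int x = 1;]]></codefragment>"
                + "</duplication></pmd-cpd>";
        for (Duplication d : parse(xml)) {
            System.out.println(d.lines() + " lines at " + d.locations());
        }
    }
}
```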
PMD CPD is more aggressive at detecting Java duplication than jscpd (3.94% baseline vs 2.65% with jscpd). Raise the absolute ceiling to 5% so existing code passes. The +0.1% increase rule still prevents regressions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… report

Run both engines in the same job for cross-validation during evaluation. Post one unified PR comment with side-by-side results from each engine.

Per-engine thresholds tuned to baseline:
- PMD CPD: 5% max, +0.1% increase (baseline ~3.94%)
- jscpd: 4% max, +0.1% increase (baseline ~2.65%)

The "max increase vs base" rule is the real safety net regardless of absolute baseline. PR fails if either engine reports a regression. Diff annotations use PMD CPD's more accurate clone locations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
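The per-engine rule reduces to two comparisons per engine, with the PR failing if any engine regresses. In this sketch the thresholds and baselines come from the commit message, while the PR-side percentages in `main` are hypothetical; `PerEngineGateSketch`, `Thresholds`, and `Result` are invented names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PerEngineGateSketch {
    // Absolute ceiling and maximum increase over the base branch, both in percent.
    record Thresholds(double maxPercent, double maxIncreasePoints) {}

    // Duplication percentage measured on the base branch and on the PR branch.
    record Result(double basePercent, double prPercent) {}

    static boolean regressed(Thresholds t, Result r) {
        return r.prPercent() > t.maxPercent() || r.prPercent() - r.basePercent() > t.maxIncreasePoints();
    }

    // The PR fails if either engine reports a regression.
    static boolean prFails(Map<String, Thresholds> thresholds, Map<String, Result> results) {
        return results.entrySet().stream()
                .anyMatch(e -> regressed(thresholds.get(e.getKey()), e.getValue()));
    }

    public static void main(String[] args) {
        Map<String, Thresholds> thresholds = new LinkedHashMap<>();
        thresholds.put("pmd-cpd", new Thresholds(5.0, 0.1));
        thresholds.put("jscpd", new Thresholds(4.0, 0.1));

        Map<String, Result> results = new LinkedHashMap<>();
        results.put("pmd-cpd", new Result(3.94, 3.96)); // +0.02: within tolerance
        results.put("jscpd", new Result(2.65, 2.80));   // +0.15: regression
        System.out.println(prFails(thresholds, results)); // true
    }
}
```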
Scale emoji reactions based on magnitude:
- Big decrease: heart (celebrate!)
- Modest decrease: thumbs up
- Tiny decrease: slight smile
- Tiny increase: diagonal mouth (hmm)
- Moderate increase: monocle (investigating)
- Big increase: raised eyebrow (concerned)

Makes the report more readable at a glance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
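One way to implement the scaling is a sign-and-magnitude lookup on the duplication delta (PR minus base, in percentage points). The commit does not state the cut-offs, so the 0.1 and 0.5 boundaries, the enum names, and the extra `NO_CHANGE` case are all hypothetical.

```java
final class DeltaReactionSketch {
    enum Reaction { HEART, THUMBS_UP, SLIGHT_SMILE, NO_CHANGE, DIAGONAL_MOUTH, MONOCLE, RAISED_EYEBROW }

    // Hypothetical cut-offs, in percentage points of duplication.
    static final double TINY_LIMIT = 0.1;
    static final double BIG_LIMIT = 0.5;

    // Maps a duplication delta to a reaction; negative means less duplication than base.
    static Reaction forDelta(double deltaPoints) {
        if (deltaPoints == 0.0) {
            return Reaction.NO_CHANGE;
        }
        double size = Math.abs(deltaPoints);
        if (deltaPoints < 0) {
            if (size >= BIG_LIMIT) return Reaction.HEART;
            if (size >= TINY_LIMIT) return Reaction.THUMBS_UP;
            return Reaction.SLIGHT_SMILE;
        }
        if (size >= BIG_LIMIT) return Reaction.RAISED_EYEBROW;
        if (size >= TINY_LIMIT) return Reaction.MONOCLE;
        return Reaction.DIAGONAL_MOUTH;
    }

    public static void main(String[] args) {
        for (double d : new double[] {-0.8, -0.2, -0.05, 0.0, 0.05, 0.2, 0.8}) {
            System.out.println(d + " -> " + forDelta(d));
        }
    }
}
```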
Replace the inline duplicate-detection job with the extracted, reusable action. Removes ~55 lines of inline workflow steps and the local bin/ci-duplicate-report.js script in favor of a single action call. The action is at https://github.com/astubbs/duplicate-code-cross-check Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
depends on #25
- … needs: dependencies for fast feedback
- … @Tag("performance"), scripts, docs)
- … -Dlicense.skip usage in AGENTS.md

PR build tiers
Tiers 3 and 4 run in parallel after Tier 2 passes.

Push builds (master)
Full Kafka matrix: 2.8.1, 3.1.0, 3.7.0, 3.9.1 + experimental [3.9.1,5) range for 4.x compatibility.

Changes
- … [3.1.0,4) to [3.9.1,5) to catch Kafka 4.x
- … @Tag("performance") so they're excluded from regular CI
- … bin/performance-test.sh and bin/performance-test.cmd
- … docs/SELF_HOSTED_RUNNER.md for future dedicated runner setup

Test plan
🤖 Generated with Claude Code