Audit fixes: build reconciliation, integration tests, XML/YAML hardening, housekeeping #79
Merged
pratyush618 merged 11 commits into main on Apr 23, 2026
Maven now owns all 22 non-plugin modules. Gradle owns only the plugin, which must be Gradle-native for Gradle Plugin Portal publication via publishPlugins. This eliminates by construction the drift the audit found (Gradle silently skipping 8 modules): Gradle now builds only one thing.
- Narrow settings.gradle.kts to a single include
- Rewrite agenteval-gradle-plugin/build.gradle.kts to depend on the published Maven artifacts of the same version instead of sibling project(...) refs; add com.gradle.plugin-publish
- Replace the CI gradle job with a narrow gradle-plugin job that first mvn-installs the locally resolvable artifacts
- Narrow the Dependabot gradle ecosystem to /agenteval-gradle-plugin
- Tighten .gitignore with secret patterns (audit finding #12)
11 tests across three production classes. Covers null-arg contracts, text and tool-call capture from AiMessage responses, TokenUsage mapping, ChatLanguageModel stub forwarding with latency capture, and content retriever delegation + consume semantics.
12 tests across four production classes. Covers null-arg contracts on the builder and capture, ChatResponse text + TokenUsage mapping, ChatModel forwarding with latency capture, advisor metadata, Document capture from the qa_advisor_retrieved_documents context key, consume/clear semantics, and auto-configuration bean production. Adds mockito-core test dep for stubbing CallAdvisorChain.
Configure DocumentBuilderFactory to set disallow-doctype-decl, disable external entities and external DTDs, and turn off XInclude and entity-reference expansion. The reporter only writes XML today, so no external entity is ever parsed, but this prevents a regression if parsing is added later and establishes the template for any future DocumentBuilderFactory use. Addresses audit finding HIGH #3.
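For reference, the settings listed above can be sketched with the JDK's own javax.xml API. This is a minimal illustration under assumptions, not the reporter's actual code: the class and method names (HardenedXml, newHardenedFactory, parseOrNull) are hypothetical, and it assumes the JDK's built-in Xerces-based parser, which recognizes these feature URIs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper illustrating a hardened DocumentBuilderFactory; not the project's real class. */
public final class HardenedXml {
    private HardenedXml() {}

    /** DOCTYPEs rejected; external entities, external DTDs, XInclude and entity expansion off. */
    public static DocumentBuilderFactory newHardenedFactory() {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        try {
            // Primary defence: any document containing a DOCTYPE declaration is a fatal error.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            // Secondary defences, in case the DOCTYPE ban is ever relaxed.
            factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
            factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        } catch (ParserConfigurationException e) {
            // Fail closed: never fall back to an unhardened parser.
            throw new IllegalStateException("XML parser lacks a required security feature", e);
        }
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory;
    }

    /** Parses XML with the hardened factory; returns null if the parser rejects the input. */
    public static Document parseOrNull(String xml) {
        try {
            DocumentBuilder builder = newHardenedFactory().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing "[Fatal Error]" to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return null;
        }
    }
}
```

With this factory a plain document parses normally, while any input carrying a DOCTYPE (such as a classic external-entity payload) is rejected before its entity declarations are processed. Throwing IllegalStateException when a feature is unsupported keeps the code from silently degrading to an unhardened parser.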
Switch YAMLFactory construction to the builder with explicit LoaderOptions: disallow duplicate and recursive keys, cap alias expansion at 50, cap nesting depth at 50, cap code points at 3 MiB. SnakeYAML 2.0+ already uses SafeConstructor and blocks custom global tags, so this hardening is defense-in-depth against billion-laughs / deeply-nested / oversized payloads rather than gadget-chain RCE. Addresses audit finding HIGH #4.
- Document the 1.0.0 removal milestone on the PromptTemplate delegate and SemanticSimilarityMetric.cosineSimilarity so users can plan migrations
- Add a comment to agenteval-bom explaining why build-tooling modules are deliberately omitted (independent release cadences)
- Replace sk-test / sk-ant-test in judge provider tests with neutral strings so the fixtures don't look like real keys to secret scanners
- Reindent two JSON text-block fixtures to satisfy editorconfig's 4-space rule (content semantics unchanged)

Addresses audit findings MEDIUM #6, #7, #8.
Replace 50 ms sleeps that were forcing distinct filesystem mtimes between two sequential tag() calls with explicit Files.setLastModifiedTime calls. The test now deterministically proves the listVersions ordering instead of depending on CI clock resolution. Addresses audit finding MEDIUM #9.
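The idea can be sketched in plain Java NIO. The file names, the temp-directory layout and the listOldestFirst helper are hypothetical stand-ins for the project's tag() and listVersions(); the point is only that pinned timestamps make the ordering independent of clock resolution and of how much time passes between the two writes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical illustration; file names and helpers are not the project's real API. */
public final class PinnedMtimes {
    private PinnedMtimes() {}

    /** Writes two "version" files back to back and pins their mtimes instead of sleeping between writes. */
    public static List<String> writeTwoVersionsAndList() {
        try {
            // Temp directory is left in place for brevity.
            Path dir = Files.createTempDirectory("versions");
            // Names are chosen so that alphabetical order is the opposite of mtime order.
            Path older = Files.writeString(dir.resolve("z-first.json"), "{}");
            Path newer = Files.writeString(dir.resolve("a-second.json"), "{}");
            Instant base = Instant.parse("2026-01-01T00:00:00Z");
            Files.setLastModifiedTime(older, FileTime.from(base));
            Files.setLastModifiedTime(newer, FileTime.from(base.plusSeconds(60)));
            return listOldestFirst(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stand-in for a listVersions()-style query: file names sorted oldest first by mtime. */
    static List<String> listOldestFirst(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files
                    .sorted(Comparator.comparing(PinnedMtimes::mtime))
                    .map(p -> p.getFileName().toString())
                    .collect(Collectors.toList());
        }
    }

    private static FileTime mtime(Path p) {
        try {
            return Files.getLastModifiedTime(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the names sort the other way alphabetically, the expected order ["z-first.json", "a-second.json"] can only come out if the listing really uses the pinned timestamps.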
Replace regex class-family patterns with explicit Or/Class lists for the json/jsonl/yaml loader-writer classes and the datasets.version package. A new class added to any of these packages will now surface genuine EI_EXPOSE_REP[2] findings in spotbugs:check instead of being silently blanket-suppressed. Addresses audit finding LOW #10 — the broadest and most load-bearing regex patterns were the ones explicitly called out; other suppressions still use package patterns where the bug class is common across many record types in the package.
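Why the explicit list matters can be shown with a toy model in plain Java: a regex-style suppression silently covers any new class whose name fits the pattern, while an explicit list covers only the classes it names. The pattern, the package com.example.datasets.json and the class names below are hypothetical, and the code models only the matching behaviour, not SpotBugs' filter engine.

```java
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical model of two suppression styles; names and pattern are invented for illustration. */
public final class SuppressionScope {
    private SuppressionScope() {}

    // Regex style, analogous to a SpotBugs "~" class pattern: every matching name is suppressed.
    private static final Pattern FAMILY_PATTERN = Pattern.compile(".*Json.*");

    // Explicit-list style, analogous to <Or><Class name="..."/>...</Or>: only these names are suppressed.
    private static final Set<String> EXPLICIT_CLASSES = Set.of(
            "com.example.datasets.json.JsonDatasetLoader",
            "com.example.datasets.json.JsonDatasetWriter");

    public static boolean suppressedByPattern(String className) {
        return FAMILY_PATTERN.matcher(className).matches();
    }

    public static boolean suppressedByList(String className) {
        return EXPLICIT_CLASSES.contains(className);
    }
}
```

A newly added class such as the hypothetical com.example.datasets.json.JsonSchemaCache is covered by the pattern but not by the list, so under the explicit-list style its EI_EXPOSE_REP / EI_EXPOSE_REP2 findings would surface, which is the behaviour the commit describes.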
22 tests across three previously untested classes:
- LatencyInjectorTest: 8 tests covering millisecond addition, the zero case, empty-tool-calls identity, field preservation, and constructor bounds
- SchemaMutationInjectorTest: 10 tests covering each MutationType, the default constructor, null-result handling, escaping, and the empty case
- ResilienceEvaluatorTest: 4 tests covering judge delegation, rendered-prompt field substitution, and the null-response placeholder

Lifts agenteval-chaos coverage from 3/11 to 6/11 production classes. Addresses audit finding LOW #13.
Keep-a-Changelog format. The 0.1.0 entry retrospectively documents what shipped on 2026-03-29 (23 metrics, 7 judge providers, multi-model consensus, JUnit 5 integration, datasets, reporting, framework integrations, red teaming, build plugins, IntelliJ plugin). The [Unreleased] entry covers post-0.1.0 work: six new Tier 2 modules (contracts, statistics, chaos, replay, mutation, fingerprint), cost metrics, Dependabot bumps, the audit-remediation fixes landing in this PR (Gradle scope-down, langchain4j/spring-ai tests, XXE + YAML hardening, deprecation removal milestones, SpotBugs narrowing, chaos coverage), and the AUDIT.md report.
MINOR bump per semver: six new modules (contracts, statistics, chaos, replay, mutation, fingerprint), XXE and YAML hardening, new integration test coverage, two deprecations marked for 1.0.0 removal.
- Root + all 23 module poms (via mvnw versions:set)
- BOM pom
- README, INSTALL, advanced docs code samples
- agenteval-gradle-plugin/build.gradle.kts default property
- CHANGELOG: stamp [0.2.0] - 2026-04-24 and keep Unreleased open
- Normalize pom XML-attribute continuation indent from 9 spaces to 4 to satisfy editorconfig-checker (the hook flags every modified pom)
Summary
Addresses findings from the AUDIT.md report. 9 atomic commits; mvn verify green across all 23 modules.

Fixes
CRITICAL
- settings.gradle.kts was missing 8 modules Maven was building. Narrowed Gradle to agenteval-gradle-plugin only (required for publishPlugins → Gradle Plugin Portal); Maven now owns the other 22 modules exclusively. The two module lists can no longer drift because Gradle only builds one thing. (de3118f)
- New integration tests for agenteval-langchain4j (11 tests) and agenteval-spring-ai (12 tests). (a228fde, 4045e6f)

HIGH
- DocumentBuilderFactory now sets disallow-doctype-decl, disables external entities and DTDs, and turns off XInclude and entity expansion. Defense-in-depth pattern for any future DocumentBuilderFactory usage. (ce15e20)
- YAMLFactory is now built via YAMLFactory.builder().loaderOptions(...) with explicit caps: duplicate and recursive keys disallowed, ≤ 50 aliases, ≤ 50 nesting depth, ≤ 3 MiB code points. Belt and suspenders on top of SnakeYAML 2.x's default SafeConstructor. (10ba4c9)

MEDIUM
- The PromptTemplate delegate and SemanticSimilarityMetric.cosineSimilarity now document 1.0.0 removal. (1560a2e)
- Added a comment to agenteval-bom/pom.xml explaining why build-tooling modules are intentionally omitted. (1560a2e)
- Judge provider test keys sk-test / sk-ant-test → neutral fake-key-for-tests. (1560a2e)
- The listVersions ordering test now uses Files.setLastModifiedTime to deterministically order versions instead of relying on 50 ms sleeps between tags. (1bdd03f)

LOW
- Replaced the ~…Json.* / ~…Jsonl.* / ~…Yaml.* / ~…datasets.version\..* regex patterns in the SpotBugs suppressions with explicit <Or><Class .../></Or> lists. New classes in those packages now surface genuine findings. (f52773c)
- .gitignore now covers .env*, *.jks, *.keystore, *.p12, credentials.json. (bundled in de3118f)
- New tests for LatencyInjector, SchemaMutationInjector, ResilienceEvaluator. Chaos coverage went from 3/11 to 6/11 production classes. (aec4ba4)

Findings not fixed in this PR:
- System.out in console reporters: left as-is. These are intentional console reporters; routing through SLF4J would change user-facing output and is out of scope for an audit-fixes PR. Flagged for a separate RFC.
- .claude/memory/project_overview.md (not tracked in git).

Test plan
- ./mvnw verify -B → BUILD SUCCESS across all 23 modules (1:24)
- ./mvnw -pl agenteval-langchain4j -am test → 11/11 new tests pass
- ./mvnw -pl agenteval-spring-ai -am test → 12/12 new tests pass
- ./mvnw -pl agenteval-chaos -am test → 22/22 new chaos tests pass, existing tests still green
- ./mvnw -pl agenteval-datasets -am install → SpotBugs reports 0 bugs with narrowed suppressions