Add Tier 2 differentiators and documentation by pratyush618 · Pull Request #67 · ByteVeda/agenteval

pratyush618 · 2026-04-07T09:51:42Z

Summary

Cost-Normalized Metrics — CostNormalizedMetric, LatencyNormalizedMetric, CostEfficiencyAnalyzer with Pareto frontier (29 tests)
Regression Root Cause Analysis — RootCauseAnalyzer clusters regressed cases by failure pattern, ranks by impact (11 tests)
Deterministic Replay — Record agent+judge interactions, replay for $0 regression tests (32 tests)
Mutation Testing — 5 built-in prompt mutators, MutationSuite measures eval detection rate (22 tests)
Capability Fingerprinting — Profile agents across 8 dimensions with CapabilityProfiler (17 tests)
Documentation — Updated README module list, 6 new doc pages under docs/advanced/

67 files, ~5,900 lines, 111 new tests

Test plan

All 5 feature test suites pass
Pre-commit hooks pass (checkstyle, editorconfig)
Fix MDX <= parsing error in statistical-analysis doc
Full reactor build
Docs workflow passes

pratyush618 added 8 commits April 7, 2026 13:22

Add cost-normalized metrics for cost/latency-aware evaluation

4e7ee6d

CostNormalizedMetric, LatencyNormalizedMetric, CostEfficiencyAnalyzer, ParetoFrontier in agenteval-metrics/cost package, 29 tests.
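
As a rough illustration of the Pareto frontier idea mentioned in this commit, the sketch below keeps only the configurations that no other configuration beats on both score and cost. The class name `ParetoSketch`, the `Candidate` record and its fields are hypothetical and are not taken from the agenteval-metrics API; the actual `ParetoFrontier` class may be shaped quite differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a two-axis Pareto frontier; not the agenteval API. */
final class ParetoSketch {

    /** One evaluated configuration: higher score is better, lower cost is better. */
    record Candidate(String name, double score, double costUsd) {}

    /** True if a is at least as good as b on both axes and strictly better on one. */
    static boolean dominates(Candidate a, Candidate b) {
        boolean noWorse = a.score() >= b.score() && a.costUsd() <= b.costUsd();
        boolean strictlyBetter = a.score() > b.score() || a.costUsd() < b.costUsd();
        return noWorse && strictlyBetter;
    }

    /** Returns the candidates that no other candidate dominates, in input order. */
    static List<Candidate> frontier(List<Candidate> candidates) {
        List<Candidate> result = new ArrayList<>();
        for (Candidate c : candidates) {
            boolean dominated = candidates.stream().anyMatch(o -> dominates(o, c));
            if (!dominated) {
                result.add(c);
            }
        }
        return result;
    }
}
```

For example, given candidates with (score, cost) of (0.70, 0.01), (0.80, 0.05), (0.75, 0.08) and (0.90, 0.20), the third is dropped because the second scores higher at lower cost; the other three remain on the frontier. The pairwise check is quadratic, which is usually acceptable for a handful of agent configurations.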

Add regression root cause analysis

ce6930c

RootCauseAnalyzer clusters regressed cases by failure pattern, detects output/tool/cost/latency changes, ranks by impact, 11 tests.
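
One way to picture "cluster by failure pattern, rank by impact" is a group-by followed by a sort. The sketch below is an assumption-laden illustration: the `RegressedCase` and `Cluster` records are hypothetical, and "impact" is taken to mean the summed score drop within a cluster, which the PR description does not confirm.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of clustering regressions; not the RootCauseAnalyzer API. */
final class RootCauseSketch {

    /** A case whose score got worse, tagged with an (assumed) failure pattern label. */
    record RegressedCase(String caseId, String failurePattern, double scoreDrop) {}

    /** A group of regressed cases sharing one failure pattern. */
    record Cluster(String failurePattern, int caseCount, double totalScoreDrop) {}

    /** Groups cases by pattern and orders clusters by total score drop, largest first. */
    static List<Cluster> rankByImpact(List<RegressedCase> cases) {
        Map<String, List<RegressedCase>> byPattern =
            cases.stream().collect(Collectors.groupingBy(RegressedCase::failurePattern));
        return byPattern.entrySet().stream()
            .map(e -> new Cluster(
                e.getKey(),
                e.getValue().size(),
                e.getValue().stream().mapToDouble(RegressedCase::scoreDrop).sum()))
            .sorted(Comparator.comparingDouble(Cluster::totalScoreDrop).reversed())
            .toList();
    }
}
```

Other impact definitions (case count, cost delta, latency delta) would only change the comparator.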

Add agenteval-replay module for deterministic evaluation replay

a07e968

RecordingJudgeModel/AgentWrapper decorators, ReplayJudgeModel/AgentWrapper for $0 regression tests, RecordingStore persistence, ReplaySuite orchestrator, 32 tests.
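
The record-then-replay pattern described here can be sketched with two decorators around a judge interface. Everything below is hypothetical (the `Judge` interface, the in-memory map standing in for a persistent store, and the lookup key); it only illustrates why a replayed run makes no paid model calls, and is not the signature of `RecordingJudgeModel` or `ReplayJudgeModel`.

```java
import java.util.Map;

/** Hypothetical sketch of record/replay decorators; not the agenteval-replay API. */
final class ReplaySketch {

    /** Minimal stand-in for a judge model that scores an agent output. */
    interface Judge {
        double score(String input, String output);
    }

    /** Lookup key for a recorded interaction. */
    record Key(String input, String output) {}

    /** Delegates to the live judge and stores every result. */
    static final class RecordingJudge implements Judge {
        private final Judge delegate;
        private final Map<Key, Double> store;

        RecordingJudge(Judge delegate, Map<Key, Double> store) {
            this.delegate = delegate;
            this.store = store;
        }

        @Override
        public double score(String input, String output) {
            double result = delegate.score(input, output);
            store.put(new Key(input, output), result);
            return result;
        }
    }

    /** Serves scores from the store only; never calls a live model. */
    static final class ReplayJudge implements Judge {
        private final Map<Key, Double> store;

        ReplayJudge(Map<Key, Double> store) {
            this.store = store;
        }

        @Override
        public double score(String input, String output) {
            Double recorded = store.get(new Key(input, output));
            if (recorded == null) {
                throw new IllegalStateException("No recording for input: " + input);
            }
            return recorded;
        }
    }
}
```

Failing loudly on a missing recording is a deliberate choice in this sketch: silently falling back to the live judge would reintroduce cost and nondeterminism into a replayed run.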

Add agenteval-mutation module for prompt mutation testing

b214608

Sealed Mutator interface with 5 built-in mutators, PluggableMutator, MutationSuite orchestrator, AgentFactory, 22 tests.
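
To make the "detection rate" idea concrete, the following sketch applies a closed set of prompt mutators and counts how many mutated prompts the evaluation flags as failing. The two mutators, the `PromptMutator` name and the `evalPasses` predicate are hypothetical illustrations; the PR's five built-in mutators and the `MutationSuite` API are not listed in this description and may work differently.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

/** Hypothetical sketch of prompt mutation testing; not the agenteval-mutation API. */
final class MutationSketch {

    /** Closed set of mutators, mirroring the sealed-interface approach. */
    sealed interface PromptMutator permits DropLastSentence, UpperCase {
        String mutate(String prompt);
    }

    /** Removes everything after the last ". " separator, if there is one. */
    record DropLastSentence() implements PromptMutator {
        @Override
        public String mutate(String prompt) {
            int idx = prompt.lastIndexOf(". ");
            return idx < 0 ? prompt : prompt.substring(0, idx + 1);
        }
    }

    /** Upper-cases the prompt; a cosmetic change a robust eval may tolerate. */
    record UpperCase() implements PromptMutator {
        @Override
        public String mutate(String prompt) {
            return prompt.toUpperCase(Locale.ROOT);
        }
    }

    /** Fraction of mutated prompts for which the evaluation no longer passes. */
    static double detectionRate(
            String prompt, List<PromptMutator> mutators, Predicate<String> evalPasses) {
        if (mutators.isEmpty()) {
            return 0.0;
        }
        long detected = mutators.stream()
            .map(m -> m.mutate(prompt))
            .filter(mutated -> !evalPasses.test(mutated))
            .count();
        return (double) detected / mutators.size();
    }
}
```

In a real suite the predicate would run the agent with the mutated prompt and apply the evaluation metrics; here it is reduced to a check on the prompt text so the sketch stays self-contained.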

Add agenteval-fingerprint module for agent capability profiling

af7acf8

CapabilityDimension enum (8 dimensions), CapabilityProfiler orchestrator, CapabilityComparison, CapabilityReporter, 17 tests.
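
A capability profile can be thought of as a score per dimension, and a comparison as the per-dimension difference between two profiles. The sketch below assumes that shape. The three dimension names are placeholders chosen for illustration; the PR's `CapabilityDimension` enum has 8 dimensions whose names are not given in this description, and `CapabilityComparison` may compute something richer than a plain delta.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical sketch of profile comparison; not the agenteval-fingerprint API. */
final class FingerprintSketch {

    /** Placeholder dimensions for illustration only. */
    enum Dimension { REASONING, TOOL_USE, SAFETY }

    /** Per-dimension difference (candidate minus baseline); missing scores count as 0. */
    static Map<Dimension, Double> delta(
            Map<Dimension, Double> baseline, Map<Dimension, Double> candidate) {
        Map<Dimension, Double> out = new EnumMap<>(Dimension.class);
        for (Dimension d : Dimension.values()) {
            double diff = candidate.getOrDefault(d, 0.0) - baseline.getOrDefault(d, 0.0);
            out.put(d, diff);
        }
        return out;
    }
}
```

Using an `EnumMap` keeps the result ordered by enum declaration, which makes side-by-side reports stable across runs.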

Register replay, mutation, fingerprint modules in parent POM and BOM

cd67744

Add documentation for new modules

ba66066

Update README module structure, add 6 doc pages under docs/advanced for contract testing, chaos engineering, statistical analysis, deterministic replay, mutation testing, and capability fingerprinting.

Fix MDX parsing error in statistical-analysis doc

b149a2f

pratyush618 closed this Apr 7, 2026

pratyush618 deleted the feature/tier2-differentiators branch April 7, 2026 19:15

pratyush618 wants to merge 8 commits into main from feature/tier2-differentiators
