
Add Tier 2 differentiators and documentation #67

Closed
pratyush618 wants to merge 8 commits into main from feature/tier2-differentiators

Conversation

@pratyush618
Collaborator

Summary

  • Cost-Normalized Metrics: CostNormalizedMetric, LatencyNormalizedMetric, CostEfficiencyAnalyzer with Pareto frontier (29 tests)
  • Regression Root Cause Analysis: RootCauseAnalyzer clusters regressed cases by failure pattern, ranks by impact (11 tests)
  • Deterministic Replay: Record agent+judge interactions, replay for $0 regression tests (32 tests)
  • Mutation Testing: 5 built-in prompt mutators, MutationSuite measures eval detection rate (22 tests)
  • Capability Fingerprinting: Profile agents across 8 dimensions with CapabilityProfiler (17 tests)
  • Documentation: Updated README module list, 6 new doc pages under docs/advanced/

67 files, ~5,900 lines, 111 new tests

Test plan

  • All 5 feature test suites pass
  • Pre-commit hooks pass (checkstyle, editorconfig)
  • Fix MDX <= parsing error in statistical-analysis doc
  • Full reactor build
  • Docs workflow passes

CostNormalizedMetric, LatencyNormalizedMetric, CostEfficiencyAnalyzer,
ParetoFrontier in agenteval-metrics/cost package, 29 tests.

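To make the Pareto-frontier idea concrete, here is a minimal sketch of how non-dominated configurations could be selected on score and cost. It is not the PR's ParetoFrontier class, whose API is not shown here; `Candidate`, `ParetoSketch`, and their fields are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical data point: one agent configuration with its eval score and cost.
record Candidate(String name, double score, double costUsd) {}

final class ParetoSketch {
    // a dominates b if a is no worse on both axes and strictly better on at least one.
    static boolean dominates(Candidate a, Candidate b) {
        boolean noWorse = a.score() >= b.score() && a.costUsd() <= b.costUsd();
        boolean strictlyBetter = a.score() > b.score() || a.costUsd() < b.costUsd();
        return noWorse && strictlyBetter;
    }

    // Returns the candidates that no other candidate dominates, in input order (O(n^2)).
    static List<Candidate> frontier(List<Candidate> all) {
        List<Candidate> result = new ArrayList<>();
        for (Candidate c : all) {
            boolean dominated = false;
            for (Candidate other : all) {
                if (dominates(other, c)) {
                    dominated = true;
                    break;
                }
            }
            if (!dominated) {
                result.add(c);
            }
        }
        return result;
    }
}
```

A configuration that scores lower and costs more than some alternative drops out; everything left is a defensible trade-off between quality and spend.
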
RootCauseAnalyzer clusters regressed cases by failure pattern,
detects output/tool/cost/latency changes, ranks by impact, 11 tests.

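The cluster-then-rank step can be sketched as below. This is an assumption-laden illustration, not RootCauseAnalyzer itself: `RegressedCase`, `Cluster`, the pattern strings, and the choice of summed score drop as the "impact" measure are all hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical regressed case: the kind of change detected and how far the score fell.
record RegressedCase(String caseId, String pattern, double scoreDrop) {}

// Hypothetical cluster summary: one failure pattern with its size and total impact.
record Cluster(String pattern, int caseCount, double totalDrop) {}

final class RootCauseSketch {
    // Groups cases by failure pattern, then orders clusters by total score drop, largest first.
    static List<Cluster> rank(List<RegressedCase> cases) {
        Map<String, List<RegressedCase>> byPattern =
            cases.stream().collect(Collectors.groupingBy(RegressedCase::pattern));
        return byPattern.entrySet().stream()
            .map(e -> new Cluster(
                e.getKey(),
                e.getValue().size(),
                e.getValue().stream().mapToDouble(RegressedCase::scoreDrop).sum()))
            .sorted(Comparator.comparingDouble(Cluster::totalDrop).reversed())
            .toList();
    }
}
```
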
RecordingJudgeModel/AgentWrapper decorators, ReplayJudgeModel/AgentWrapper
for $0 regression tests, RecordingStore persistence, ReplaySuite
orchestrator, 32 tests.

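The record/replay decorator pattern described here might look roughly like the following. The `Judge` interface, its `evaluate` method, and the in-memory `Map` store are assumptions made for this sketch; the real RecordingJudgeModel, ReplayJudgeModel, and RecordingStore signatures are not shown in this PR text.

```java
import java.util.Map;

// Hypothetical minimal judge interface, standing in for the library's real one.
interface Judge {
    String evaluate(String prompt);
}

// Decorator: forwards to a live judge and records each prompt/response pair.
final class RecordingJudge implements Judge {
    private final Judge delegate;
    private final Map<String, String> store;

    RecordingJudge(Judge delegate, Map<String, String> store) {
        this.delegate = delegate;
        this.store = store;
    }

    @Override
    public String evaluate(String prompt) {
        String response = delegate.evaluate(prompt);
        store.put(prompt, response);
        return response;
    }
}

// Serves recorded responses without calling any model; fails loudly on an unseen prompt.
final class ReplayJudge implements Judge {
    private final Map<String, String> store;

    ReplayJudge(Map<String, String> store) {
        this.store = Map.copyOf(store); // snapshot so later recordings cannot change a replay
    }

    @Override
    public String evaluate(String prompt) {
        String recorded = store.get(prompt);
        if (recorded == null) {
            throw new IllegalStateException("No recording for prompt: " + prompt);
        }
        return recorded;
    }
}
```

Because the replay side never touches the live judge, a regression run over recorded interactions incurs no model cost, which is the "$0" property the commit refers to.
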
Sealed Mutator interface with 5 built-in mutators, PluggableMutator,
MutationSuite orchestrator, AgentFactory, 22 tests.

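One way to read "eval detection rate" is the fraction of mutated prompts that cause the eval suite to fail. The sketch below computes that under simplifying assumptions: mutators are plain `UnaryOperator<String>` functions and the eval is a `Predicate<String>` over the prompt, both hypothetical stand-ins for the PR's Mutator, AgentFactory, and MutationSuite types.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

final class MutationSketch {
    // Fraction of mutated prompts for which the eval fails, i.e. the mutation is detected.
    static double detectionRate(String basePrompt,
                                List<UnaryOperator<String>> mutators,
                                Predicate<String> evalPasses) {
        if (mutators.isEmpty()) {
            throw new IllegalArgumentException("at least one mutator is required");
        }
        long detected = mutators.stream()
            .map(m -> m.apply(basePrompt))               // build each mutant prompt
            .filter(mutated -> !evalPasses.test(mutated)) // keep mutants the eval catches
            .count();
        return (double) detected / mutators.size();
    }
}
```

A low rate suggests the eval suite would not notice a degraded prompt, which is the weakness mutation testing is meant to expose.
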
CapabilityDimension enum (8 dimensions), CapabilityProfiler orchestrator,
CapabilityComparison, CapabilityReporter, 17 tests.

Update README module structure, add 6 doc pages under docs/advanced
for contract testing, chaos engineering, statistical analysis,
deterministic replay, mutation testing, and capability fingerprinting.

@pratyush618 pratyush618 closed this Apr 7, 2026
@pratyush618 pratyush618 deleted the feature/tier2-differentiators branch April 7, 2026 19:15