feat: write evolution run reports by steezkelly · Pull Request #64 · NousResearch/hermes-agent-self-evolution

steezkelly · 2026-05-09T03:10:59Z

Summary

Implements issue #54 step 2: local, machine-readable promotion artifacts for evolution runs.

Adds:

evolution/core/run_report.py
build_run_report(...) for constructing sanitized report payloads
write_run_report(...) for writing reports/runs/<timestamp>-<target>.json
artifact diffs via skill.diff
wiring in evolve_skill.py so successful non-dry-run evolutions write and print a run report path
failure-path reporting when an evolved skill fails constraints, so failed variants still produce auditable artifacts
tests that generate reports/diffs without live LLM calls
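
The report-and-diff write path listed above can be sketched roughly as follows. This is a minimal illustration, not the code in evolution/core/run_report.py: the function names `build_report_sketch` and `write_report_sketch`, their parameters, the SHA-256 choice, the timestamp format, and the diff file name are all assumptions. Only the reports/runs/<timestamp>-<target>.json layout comes from the description.

```python
import difflib
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def build_report_sketch(target: str, baseline: str, optimized: str) -> dict:
    """Hypothetical builder: keeps hashes and sizes, never the raw text."""
    def describe(text: str) -> dict:
        data = text.encode("utf-8")
        return {"hash": hashlib.sha256(data).hexdigest(), "size": len(data)}

    return {
        "target": target,
        "baseline": describe(baseline),
        "optimized": describe(optimized),
    }


def write_report_sketch(report: dict, baseline: str, optimized: str, root: Path) -> Path:
    """Hypothetical writer: emits the JSON report plus a unified diff beside it."""
    run_dir = root / "reports" / "runs"
    run_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")  # assumed format
    report_path = run_dir / f"{stamp}-{report['target']}.json"
    diff_path = run_dir / f"{stamp}-{report['target']}.skill.diff"  # assumed name
    diff = difflib.unified_diff(
        baseline.splitlines(keepends=True),
        optimized.splitlines(keepends=True),
        fromfile="baseline",
        tofile="optimized",
    )
    diff_path.write_text("".join(diff), encoding="utf-8")
    payload = {**report, "output_path": str(report_path), "diff_path": str(diff_path)}
    report_path.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")
    return report_path


# Usage: write into a throwaway directory and print the report path.
demo_root = Path(tempfile.mkdtemp())
demo_report = build_report_sketch("demo-skill", "old body\n", "new body\n")
print(write_report_sketch(demo_report, "old body\n", "new body\n", demo_root))
```

Because the payload carries only hashes, sizes, and paths, the JSON file can be kept for audit while the diff remains a separate local review artifact.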

Report fields include:

baseline and optimized artifact hashes/sizes
compatibility aliases: evolved_hash / evolved_size
dataset source and split counts
optimizer/eval model names
constraint results and aggregate pass/fail
holdout score delta when available
null cost/latency estimate placeholders when unavailable
output and diff paths
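
For orientation, the fields above might be assembled along these lines. Apart from the `evolved_hash` / `evolved_size` aliases named in this PR, every key name, the function `assemble_fields_sketch`, and all sample values are hypothetical placeholders, so treat this as a shape sketch rather than the actual schema.

```python
import json
from typing import Optional


def assemble_fields_sketch(
    baseline: dict,
    optimized: dict,
    split_counts: dict,
    constraints: list,
    baseline_holdout: Optional[float] = None,
    optimized_holdout: Optional[float] = None,
) -> dict:
    """Assemble report fields; key names other than evolved_* are assumptions."""
    # The delta is only computed when both holdout scores are available.
    delta = None
    if baseline_holdout is not None and optimized_holdout is not None:
        delta = optimized_holdout - baseline_holdout
    return {
        "baseline_hash": baseline["hash"],
        "baseline_size": baseline["size"],
        "optimized_hash": optimized["hash"],
        "optimized_size": optimized["size"],
        # Compatibility aliases mirror the optimized artifact values.
        "evolved_hash": optimized["hash"],
        "evolved_size": optimized["size"],
        "dataset_split_counts": split_counts,
        "constraints": constraints,
        "constraints_passed": all(c["passed"] for c in constraints),
        "holdout_score_delta": delta,
        # Serialized as JSON null until real estimates exist.
        "cost_estimate": None,
        "latency_estimate": None,
    }


fields = assemble_fields_sketch(
    baseline={"hash": "aaa", "size": 120},      # made-up sample values
    optimized={"hash": "bbb", "size": 140},
    split_counts={"train": 8, "val": 2, "holdout": 2},
    constraints=[{"name": "max_size", "passed": True, "message": "ok"}],
    baseline_holdout=0.5,
    optimized_holdout=0.75,
)
print(json.dumps(fields, indent=2))
```

Keeping unavailable estimates as explicit nulls, rather than omitting the keys, lets downstream readers rely on a stable set of fields.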

Safety / privacy notes

The report stores artifact hashes, sizes, metrics, paths, and constraint messages. It does not persist raw session dumps or raw private tool outputs. The diff is the baseline-vs-optimized artifact diff already written for local review.
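
One way to enforce this kind of boundary is an allowlist applied before anything is serialized. The sketch below is an assumption about how such a filter could look, not a description of how run_report.py sanitizes payloads; `ALLOWED_KEYS` and `sanitize_sketch` are hypothetical names.

```python
# Hypothetical allowlist covering the categories the report is meant to store.
ALLOWED_KEYS = {"hashes", "sizes", "metrics", "paths", "constraint_messages"}


def sanitize_sketch(candidate: dict) -> dict:
    """Keep only allowlisted top-level keys; anything else is dropped."""
    return {key: value for key, value in candidate.items() if key in ALLOWED_KEYS}


raw = {
    "hashes": {"baseline": "aaa", "optimized": "bbb"},
    "metrics": {"holdout_score_delta": 0.25},
    "raw_session_dump": "<private transcript>",  # must never reach the report
}
print(sorted(sanitize_sketch(raw)))
```

An allowlist fails closed: a new field added upstream stays out of the report until someone deliberately permits it.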

No remote mutation or PR automation is introduced here.

Test Plan

RED first: pytest tests/core/test_run_report.py -q failed because evolution.core.run_report did not exist
pytest tests/core/test_run_report.py -q
pytest -q
runtime probe for successful and failed-score report writes
static added-line security scan
git diff --check

Result: 142 passed, 11 warnings (DSPy deprecation warnings only).
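
A probe of the kind listed in the test plan, covering both a passing and a failing variant without any LLM call, could look like the sketch below. The constraint, the helper names, and the output file names are illustrative assumptions and are not taken from tests/core/test_run_report.py.

```python
import json
import tempfile
from pathlib import Path


def check_max_size(text: str, limit: int) -> dict:
    """Hypothetical constraint: the artifact must fit within `limit` bytes."""
    size = len(text.encode("utf-8"))
    passed = size <= limit
    message = "ok" if passed else f"size {size} exceeds limit {limit}"
    return {"name": "max_size", "passed": passed, "message": message}


def probe_report_write(text: str, limit: int, out_dir: Path) -> Path:
    """Write a report whether or not the constraint passes."""
    result = check_max_size(text, limit)
    report = {"constraints": [result], "constraints_passed": result["passed"]}
    status = "passed" if result["passed"] else "failed"
    path = out_dir / f"probe-{status}.json"
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return path


def test_probe_writes_reports_for_pass_and_fail() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        ok = probe_report_write("short", 10, Path(tmp))
        bad = probe_report_write("x" * 50, 10, Path(tmp))
        assert json.loads(ok.read_text(encoding="utf-8"))["constraints_passed"] is True
        failed = json.loads(bad.read_text(encoding="utf-8"))
        # The failed variant still leaves an auditable artifact behind.
        assert failed["constraints_passed"] is False
        assert "exceeds limit" in failed["constraints"][0]["message"]


test_probe_writes_reports_for_pass_and_fail()
```

The point of the failing case is that the write happens before any early exit, so a rejected variant is still auditable.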

Partially addresses #54.

steezkelly · 2026-05-09T03:30:02Z

Closing this split PR in favor of consolidated PR #67. Local integration showed that reviewing and merging the stack one PR at a time adds overhead, notably the #61/#64 overlap in evolution/skills/evolve_skill.py. #67 preserves the combined local test evidence: targeted stack tests, 21 passed; full suite, 160 passed. GitHub checks were absent on the split PRs. Please review #67 instead.

feat: write evolution run reports

b6cf0e1

This was referenced May 9, 2026

feat: consolidate issue 54 ingestion and promotion gates #67

Closed

fix: declare reportlab dependency #60

Closed

fix: fail fast on invalid baseline skills #61

Closed

steezkelly mentioned this pull request May 9, 2026

feat: add benchmark gate for run reports #63

Closed

steezkelly closed this May 9, 2026

This was referenced May 9, 2026

feat: add local-first pr builder #65

Closed

feat: report session source availability #66

Closed

Implement all-agent session ingestion and promotion gates #54

Open

steezkelly wants to merge 1 commit into NousResearch:main from steezkelly:feat/54-run-report
