
feat: route self-evolution through Hermes Codex OAuth #92

Open
stephenschoettler wants to merge 10 commits into NousResearch:main from stephenschoettler:feat/codex-oauth-self-evolution

@stephenschoettler

Summary

  • route self-evolution DSPy model creation through a Hermes Codex OAuth adapter whenever the model string matches openai-codex/*
  • move the nightly controller/policy to the Babbage/default ownership path and to Codex-backed models
  • cap Codex GEPA cron budgets with an explicit max_full_evals, so the policy's iterations setting maps to a bounded run instead of GEPA's large hosted-model auto preset
  • reject byte-identical baseline/evolved artifacts as no-change even when judge scores improve, so no-op candidates are never reviewable or deployable
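
For reviewers, the prefix routing in the first bullet can be pictured roughly as below. This is a minimal sketch, not the code in this PR: `LMRoute`, `route_model`, and the backend labels are hypothetical names chosen for illustration, and the real adapter builds a DSPy LM rather than returning a small record.

```python
from dataclasses import dataclass

CODEX_PREFIX = "openai-codex/"  # prefix named in the summary above


@dataclass(frozen=True)
class LMRoute:
    """Hypothetical record describing where a model string is sent."""
    backend: str  # "codex-oauth" or "default" (illustrative labels)
    model: str    # model name handed to that backend


def route_model(model_string: str) -> LMRoute:
    """Send openai-codex/* strings to the OAuth adapter, everything else to the default path."""
    if model_string.startswith(CODEX_PREFIX):
        bare = model_string[len(CODEX_PREFIX):]
        if not bare:
            raise ValueError("missing model name after 'openai-codex/'")
        return LMRoute(backend="codex-oauth", model=bare)
    # Non-Codex strings are passed through unchanged.
    return LMRoute(backend="default", model=model_string)
```

Under this sketch, `route_model("openai-codex/gpt-5.4-mini")` selects the Codex OAuth path, while a string such as `openai/gpt-4.1` stays on the default path.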

Coordination with existing work

I found overlapping open work before opening this PR:

This PR is narrower/different in two ways: it uses Hermes Agent's existing openai-codex OAuth credential/runtime path directly, and it hardens the scheduled cron governance path so a successful score cannot mask a byte-identical no-op artifact.
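
The no-op gate mentioned above can be sketched as follows. This is an illustration under assumptions: the function name `no_op_gate` and its signature are hypothetical, and only the four result fields mirror what the forced run in the Validation section reports.

```python
from typing import Optional


def no_op_gate(baseline: bytes, evolved: bytes, score_delta: float) -> Optional[dict]:
    """Reject a candidate whose artifact is byte-identical to the baseline.

    score_delta is accepted but deliberately ignored: an improved judge
    score must not rescue an artifact that did not actually change.
    Returns a rejection record, or None when this gate does not fire and
    later gates (hypothetical here) should decide.
    """
    if baseline == evolved:
        return {
            "status": "no-change",
            "gate": "no-material-diff",
            "review": "rejected",
            "applied": "no",
        }
    return None
```

The key design choice is the ordering: the byte comparison runs before any score is consulted, so a noisy judge cannot promote a no-op candidate to review.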

Validation

  • python -m pytest tests/core/test_dspy_lm_codex.py tests/test_nightly_evolve_cron.py tests/skills/test_evolve_skill_budget.py -q
  • python -m pytest -q -> 165 passed
  • /usr/bin/python /home/w0lf/.hermes/scripts/nightly-self-evolution.py --profile babbage --skill goal-planning --dry-run
  • Codex smoke call through make_dspy_lm('openai-codex/gpt-5.4-mini') returned OK
  • Controlled forced run after budget cap completed in ~81s and produced an artifact
  • Follow-up controlled forced run after no-op gate reported status: no-change, gate: no-material-diff, review: rejected, applied: no
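
The budget cap exercised by the forced runs above could look roughly like this. It is a sketch under stated assumptions: the helper `gepa_budget_kwargs`, the one-to-one mapping from policy iterations to max_full_evals, the `hard_cap` ceiling, and the fallback preset name are all hypothetical; only the max_full_evals and auto parameter names come from the summary.

```python
def gepa_budget_kwargs(model_string: str, iterations: int,
                       hard_cap: int = 10, auto_preset: str = "light") -> dict:
    """Pick optimizer budget keyword arguments for a scheduled run.

    Codex-backed models get an explicit, bounded max_full_evals derived
    from the policy's iterations value. Other models keep an auto preset.
    hard_cap and auto_preset are illustrative defaults, not values from the PR.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    if model_string.startswith("openai-codex/"):
        # Assumed mapping: one policy iteration equals one full evaluation pass.
        return {"max_full_evals": min(iterations, hard_cap)}
    return {"auto": auto_preset}
```

The returned dict is meant to be passed to the optimizer constructor as keyword arguments, which keeps the cron path from falling back to the larger preset by accident.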

Safety

  • auto_apply remains false
  • no generated skill candidate was applied
  • direct pushes to main were avoided; this is on a named branch

- All dspy.LM() calls: num_retries=8, timeout=120
- LiteLLM backoff env vars: INITIAL_RETRY_DELAY=5, MAX_RETRY_DELAY=60
- Switch nightly from anthropic/sonnet to openai/gpt-4.1 (no rate limit conflicts)
- Robust JSON parsing in dataset_builder (handles trailing commas, unescaped newlines)
- Tested: gpt-4.1 optimizer yielded +10.6% improvement, o3 optimizer yielded 0%
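
The tolerant JSON parsing named in the commit message above can be approximated with the standard library alone. This is a sketch, not the dataset_builder code: `parse_lenient_json` is a hypothetical name, and the regex repair is a heuristic that only runs after strict parsing has already failed.

```python
import json
import re

# Matches a comma that is followed only by whitespace and a closing bracket.
_TRAILING_COMMA = re.compile(r",\s*([}\]])")


def parse_lenient_json(text: str):
    """Parse JSON, tolerating trailing commas and raw newlines inside strings."""
    try:
        # Well-formed input is returned untouched.
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Heuristic repair: drop trailing commas. This can also alter a string
    # that happens to contain ",}" or ",]", which is why it is a fallback only.
    repaired = _TRAILING_COMMA.sub(r"\1", text)
    # strict=False lets control characters such as raw newlines appear in strings.
    return json.loads(repaired, strict=False)
```

Input that is still malformed after the repair raises json.JSONDecodeError, so callers can decide whether to skip or retry the record.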