
fix: fail fast on invalid baseline skills #61

Closed
steezkelly wants to merge 1 commit into NousResearch:main from steezkelly:fix/33-baseline-constraint-gate


@steezkelly

Summary

Partially addresses #33 H3 by making baseline constraint failures a hard gate:

  • adds _require_constraints_pass(...) to fail fast on any required constraint failure
  • uses that gate for baseline skill validation before optimizer setup
  • validates the complete baseline skill (skill["raw"]) instead of body-only markdown so skill_structure sees frontmatter
  • adds regression tests for raw baseline validation and fail-fast constraint behavior
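
A minimal sketch of what a fail-fast gate of this kind could look like. It is not the code from this PR: `ConstraintResult`, `ConstraintGateError`, the `required` flag, and the function signature are all hypothetical names chosen for illustration, since the description only gives `_require_constraints_pass(...)`.

```python
from dataclasses import dataclass


@dataclass
class ConstraintResult:
    # Hypothetical result record; the project's real type may differ.
    name: str
    passed: bool
    message: str = ""
    required: bool = True


class ConstraintGateError(RuntimeError):
    """Raised when a required constraint fails (hypothetical exception type)."""


def require_constraints_pass(results, context="baseline skill"):
    """Raise on any failed required constraint instead of only logging a warning."""
    failures = [r for r in results if r.required and not r.passed]
    if failures:
        details = "; ".join(f"{r.name}: {r.message}" for r in failures)
        raise ConstraintGateError(
            f"{context} failed required constraints: {details}"
        )
    # Returning the results lets callers keep using them after the gate.
    return results
```

Calling a gate like this before optimizer setup means a malformed baseline stops the run immediately, rather than being compared against candidates.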

Root cause

The pipeline previously logged a warning when the baseline skill failed constraints, then continued optimization anyway. That made improvement metrics unreliable because the candidate was compared against a malformed baseline.

A direct hard-fail would have exposed a second issue: the old baseline validation used skill["body"], but skill_structure requires YAML frontmatter. This PR therefore validates the full raw skill file before enforcing the gate.
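
The body-versus-raw distinction can be illustrated with a simplified stand-in for a structure check. This is a sketch under stated assumptions: `has_frontmatter` is a hypothetical helper, not the project's `skill_structure` constraint, and the way `body` is derived here is only one plausible loader behavior.

```python
def has_frontmatter(text: str) -> bool:
    """Simplified stand-in: the text starts with a '---' delimited block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False
    # A closing '---' must appear somewhere after the opening one.
    return any(line.strip() == "---" for line in lines[1:])


raw = (
    "---\n"
    "name: example-skill\n"
    "description: demo\n"
    "---\n"
    "# Example\n"
    "\n"
    "Do the thing.\n"
)

# Assumed loader output: "raw" is the whole file, "body" is the
# markdown that follows the frontmatter block.
skill = {"raw": raw, "body": raw.split("---\n", 2)[2]}

raw_ok = has_frontmatter(skill["raw"])    # frontmatter is present
body_ok = has_frontmatter(skill["body"])  # frontmatter was stripped
```

Under these assumptions a well-formed skill passes when checked via `skill["raw"]` but fails when checked via `skill["body"]`, which is why a hard gate on the body-only text would reject valid baselines.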

Opposite-perspective review notes

I specifically checked the failure mode that could make this PR harmful:

Test Plan

  • RED first: pytest tests/skills/test_evolve_skill_constraint_gates.py -q failed because _require_constraints_pass and _validate_baseline_constraints did not exist
  • pytest tests/skills/test_evolve_skill_constraint_gates.py -q
  • pytest -q
  • static added-line security scan
  • git diff --check

Result: 142 passed, 11 warnings (DSPy deprecation warnings only).

Partially addresses #33 (H3: evolution proceeds despite baseline constraint violations).

@steezkelly
Author

Closing this split PR in favor of consolidated PR #67. Local integration surfaced review and merge overhead across the stack (notably the overlap between #61 and #64 in evolution/skills/evolve_skill.py). #67 preserves the combined local test evidence: targeted stack tests, 21 passed; full suite, 160 passed. GitHub checks were absent on the split PRs. Please review #67 instead.
