Add behavioral evals framework #12

Merged

HartBrook merged 2 commits into main

Jan 17, 2026

Owner

HartBrook commented Jan 17, 2026

Summary

Add stag eval command to run behavioral tests against CLAUDE.md configs
Include 25 starter evals covering security, code quality, documentation, and language-specific best practices
Integrate with Promptfoo for LLM-based test assertions
Support eval syncing from team repos via stag sync

What's Included

New Commands:

stag eval - Run evals against your merged config
stag eval list - List available evals
stag eval init - Install starter evals
stag eval info <name> - Show eval details

Features:

Filter by tag (--tag security), name, or test (--test uses-*)
Test specific config layers (--layer team)
Multiple output formats: table, JSON, GitHub Actions annotations
Debug mode with full Claude responses (--debug)
Dry run to preview tests without making API calls (--dry-run)
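
As a rough illustration of how the tag and test filters could behave, here is a minimal Python sketch. It is not staghorn's actual implementation: the `Eval` dataclass, the `select_tests` helper, and the test names are hypothetical, and it assumes `--test` patterns follow shell-style glob matching as `uses-*` suggests.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Eval:
    """Hypothetical in-memory shape of an eval; not staghorn's real model."""
    name: str
    tags: list[str] = field(default_factory=list)
    tests: list[str] = field(default_factory=list)


def select_tests(evals, tag=None, test_pattern=None):
    """Return (eval name, test name) pairs that pass the optional filters.

    tag          -- keep only evals carrying this tag (like --tag security)
    test_pattern -- keep only tests matching this glob (like --test uses-*)
    """
    selected = []
    for ev in evals:
        if tag is not None and tag not in ev.tags:
            continue
        for test in ev.tests:
            if test_pattern is None or fnmatchcase(test, test_pattern):
                selected.append((ev.name, test))
    return selected


# Illustrative data only; the test names below are made up for this sketch.
evals = [
    Eval("security-secrets", ["security"], ["uses-env-vars", "rejects-hardcoded-keys"]),
    Eval("lang-python", ["language"], ["uses-type-hints"]),
]

if __name__ == "__main__":
    print(select_tests(evals, tag="security"))
    print(select_tests(evals, test_pattern="uses-*"))
```

Applying the tag filter at the eval level and the glob at the test level keeps the two options independent, so they can be combined in a single run.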

Starter Evals (25 total):

Category	Evals
Security	secrets, injection, auth, OWASP top 10, validation
Quality	clarity, simplicity, naming, error handling
Review	bugs, tests, performance, maintainability
Docs	API documentation, code comments
Git	commit messages, sensitive files
Language	Python, Go, TypeScript, Rust
Baseline	helpful, focused, honest, minimal

Test plan

stag eval init installs starter evals to ~/.config/staghorn/evals/
stag eval list shows all available evals grouped by source
stag eval --dry-run previews tests without API calls
stag eval security-secrets runs a specific eval
stag eval --tag security filters by tag
stag eval --output json produces valid JSON
stag team validate validates evals in team repos
stag sync fetches evals from team repo's evals/ directory
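
The "produces valid JSON" item in the test plan could be checked with a small script along these lines. This is only a sketch: the helper names are hypothetical, and it assumes `stag` is on the PATH and writes its JSON report to stdout, neither of which is stated in this PR.

```python
import json
import subprocess


def is_valid_json(text: str) -> bool:
    """Return True if text parses as a JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def check_stag_json_output() -> bool:
    """Run `stag eval --output json` and check that stdout is valid JSON.

    Assumption: the JSON report goes to stdout. The exit code is ignored
    here because failing evals may legitimately exit non-zero.
    """
    result = subprocess.run(
        ["stag", "eval", "--output", "json"],
        capture_output=True,
        text=True,
        check=False,
    )
    return is_valid_json(result.stdout)


if __name__ == "__main__":
    print("valid JSON" if check_stag_json_output() else "invalid JSON")
```

Keeping the parsing check separate from the subprocess call lets the validation logic be exercised without API access or an installed `stag` binary.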

HartBrook added 2 commits

January 17, 2026 17:15

add evals framework

add changelog entry

f2b4450

HartBrook merged commit 77283ed into main

4 checks passed
