Tiny pseudo-evals for SKILL.md.
unvibe pokes a skill with scenario prompts and asks Claude what tools it
would call. It does not execute those tools. The result is a lightweight smoke
test for skill drift: useful, imperfect, and intentionally small.
Run it without installing anything:
uvx --from git+https://github.com/aaronfc/unvibe.git unvibe path/to/skill-dir
Or install it as a persistent tool:
uv tool install git+https://github.com/aaronfc/unvibe.git
unvibe path/to/skill-dir
From a local checkout, uv run unvibe ... (or the bin/unvibe wrapper) runs
the same command against your working tree.
unvibe path/to/skill-dir
unvibe path/to/skill-dir --scenario happy_path
unvibe path/to/skill-dir --verbose
unvibe path/to/skill-dir --parallel 5
unvibe --create path/to/skill-dir
Each skill directory must contain:
SKILL.md
EVALUATION.yaml
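
The two-file requirement above can be checked before a run with a few lines of Python. This is a minimal sketch, not unvibe's own validation: the names `REQUIRED_FILES` and `missing_files` are hypothetical and only illustrate the layout described above.

```python
from pathlib import Path

# File names taken from the list above; the helper itself is hypothetical.
REQUIRED_FILES = ("SKILL.md", "EVALUATION.yaml")


def missing_files(skill_dir):
    """Return the required file names that are absent from skill_dir."""
    root = Path(skill_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means the directory has both files; a skill with only SKILL.md is a candidate for --create.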
unvibe uses claude -p by default. Set CLAUDE_BIN to use a different
Claude executable.
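
One plausible way to resolve the executable is sketched below. It assumes the prompt is passed as an argument to `-p`; unvibe may pass it differently, and the function name `claude_command` is hypothetical.

```python
import os


def claude_command(prompt, env=None):
    """Build the argv for a Claude call, honouring CLAUDE_BIN if it is set.

    Assumption: the prompt is given as the argument to -p. This is an
    illustration of the override, not unvibe's actual code.
    """
    env = os.environ if env is None else env
    binary = env.get("CLAUDE_BIN") or "claude"  # fall back to the default
    return [binary, "-p", prompt]
```

Passing `env` explicitly keeps the sketch easy to test without touching the real environment.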
Use --create to generate a first-pass eval file from an existing SKILL.md:
bin/unvibe --create path/to/skill-dir
This writes path/to/skill-dir/EVALUATION.yaml. If that file already exists,
unvibe exits without changing it. Use --force to replace it:
unvibe --create path/to/skill-dir --force
The generated file is a starting point. Read it before trusting it.
version: 1
scenarios:
  - id: happy_path
    user_message: |
      Verify PR #123 with evals and update the PR description.
    must_include:
      - "ssh .*eval-runner\\.sh.*--label[= ]before"
      - "ssh .*eval-runner\\.sh.*--label[= ]after"
    must_not_include:
      - "gh pr merge"
    rubric:
      - "The plan preserves the before log before running the after pass."
Assertions:
must_include: case-insensitive Python regexes that must appear in the planned tool calls.
must_not_include: case-insensitive Python regexes that must not appear in the planned tool calls.
rubric: optional natural-language claims judged against the planned tool calls.
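
The two regex assertions can be pictured with a short sketch. It assumes the planned tool calls are flattened into one string and searched with `re.search`; the function name `evaluate` and the host name in the sample plan are hypothetical, and the real logic in unvibe.cli may differ.

```python
import re


def evaluate(plan_text, must_include=(), must_not_include=()):
    """Return a list of failure messages; an empty list means a pass."""
    failures = []
    for pattern in must_include:
        # Case-insensitive search anywhere in the flattened plan.
        if not re.search(pattern, plan_text, re.IGNORECASE):
            failures.append(f"missing: {pattern}")
    for pattern in must_not_include:
        if re.search(pattern, plan_text, re.IGNORECASE):
            failures.append(f"forbidden: {pattern}")
    return failures


# Hypothetical flattened plan; "build-host" is made up for illustration.
plan = (
    "ssh build-host ./eval-runner.sh --label before\n"
    "ssh build-host ./eval-runner.sh --label=after"
)
failures = evaluate(
    plan,
    must_include=[
        r"ssh .*eval-runner\.sh.*--label[= ]before",
        r"ssh .*eval-runner\.sh.*--label[= ]after",
    ],
    must_not_include=["gh pr merge"],
)
```

Note that `.` does not match a newline by default, so each pattern here has to match within a single line of the plan.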
Exit code is 0 when every scenario passes and 1 otherwise.
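
Because the exit code carries the result, a wrapper can run unvibe across several skill directories and fail if any one of them fails. The sketch below assumes unvibe is on PATH; `run_unvibe` and `check_all` are hypothetical helper names, not part of the tool.

```python
import subprocess
import sys


def run_unvibe(skill_dir):
    """Run `unvibe <skill_dir>` and return its exit code."""
    return subprocess.run(["unvibe", skill_dir]).returncode


def check_all(skill_dirs, runner=run_unvibe):
    """Return 0 if every directory passes, 1 otherwise."""
    failed = [d for d in skill_dirs if runner(d) != 0]
    for d in failed:
        print(f"unvibe failed for {d}", file=sys.stderr)
    return 1 if failed else 0
```

The `runner` parameter exists so the aggregation can be exercised without calling Claude.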
A runnable example lives in examples/sample-skill.
The pure functions in unvibe.cli (response parsing, spec validation, plan
flattening, pass/fail evaluation) have fast, offline unit tests. From a source
checkout:
uv run pytest
tests/smoke.sh builds and runs the packaged command against that example
using a stubbed CLAUDE_BIN, so it stays offline and deterministic:
tests/smoke.sh