unvibe

Tiny pseudo-evals for SKILL.md.

unvibe pokes a skill with scenario prompts and asks Claude what tools it would call. It does not execute those tools. The result is a lightweight smoke test for skill drift: useful, imperfect, and intentionally small.

Install

Run it without installing anything:

uvx --from git+https://github.com/aaronfc/unvibe.git unvibe path/to/skill-dir

Or install it as a persistent tool:

uv tool install git+https://github.com/aaronfc/unvibe.git
unvibe path/to/skill-dir

From a local checkout, uv run unvibe ... (or the bin/unvibe wrapper) runs the same command against your working tree.

Usage

unvibe path/to/skill-dir
unvibe path/to/skill-dir --scenario happy_path
unvibe path/to/skill-dir --verbose
unvibe path/to/skill-dir --parallel 5
unvibe --create path/to/skill-dir

Each skill directory must contain:

SKILL.md
EVALUATION.yaml

unvibe uses claude -p by default. Set CLAUDE_BIN to use a different Claude executable.
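
As a rough illustration of that override, here is a minimal sketch. It assumes CLAUDE_BIN only swaps the executable and that -p is still passed; resolve_claude_command is a hypothetical helper, not a function taken from unvibe.

```python
import os


def resolve_claude_command(env=None):
    """Return the argv prefix for the Claude call (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: an unset or empty CLAUDE_BIN falls back to plain `claude`.
    executable = env.get("CLAUDE_BIN") or "claude"
    # Assumption: the -p flag is kept regardless of which executable is used.
    return [executable, "-p"]


if __name__ == "__main__":
    print(resolve_claude_command({}))
    print(resolve_claude_command({"CLAUDE_BIN": "/opt/claude/bin/claude"}))
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.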

Creating EVALUATION.yaml

Use --create to generate a first-pass eval file from an existing SKILL.md:

unvibe --create path/to/skill-dir

This writes path/to/skill-dir/EVALUATION.yaml. If that file already exists, unvibe exits without changing it. Use --force to replace it:

unvibe --create path/to/skill-dir --force

The generated file is a starting point. Read it before trusting it.
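
The no-overwrite behaviour described above can be sketched as follows. This is an illustration only: write_evaluation is a hypothetical helper, and the real --create code path may be structured differently.

```python
import tempfile
from pathlib import Path


def write_evaluation(skill_dir, content, force=False):
    """Write EVALUATION.yaml; refuse to replace an existing file unless force is set.

    Hypothetical helper. Returns True if the file was written, False if skipped.
    """
    target = Path(skill_dir) / "EVALUATION.yaml"
    if target.exists() and not force:
        return False  # existing file is left untouched
    target.write_text(content, encoding="utf-8")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_evaluation(tmp, "version: 1\n"))              # first write
        print(write_evaluation(tmp, "version: 2\n"))              # skipped
        print(write_evaluation(tmp, "version: 2\n", force=True))  # replaced
```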

EVALUATION.yaml

version: 1
scenarios:
  - id: happy_path
    user_message: |
      Verify PR #123 with evals and update the PR description.
    must_include:
      - "ssh .*eval-runner\\.sh.*--label[= ]before"
      - "ssh .*eval-runner\\.sh.*--label[= ]after"
    must_not_include:
      - "gh pr merge"
    rubric:
      - "The plan preserves the before log before running the after pass."

Assertions:

must_include: case-insensitive Python regexes that must appear in the planned tool calls.
must_not_include: case-insensitive Python regexes that must not appear in the planned tool calls.
rubric: optional natural-language claims judged against the planned tool calls.

Exit code is 0 when every scenario passes and 1 otherwise.
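
To make the regex assertions and the exit code concrete, here is a minimal sketch. It assumes each planned tool call is available as one string and that a pattern only needs to match a single call; check_scenario, exit_code, and the sample host name are hypothetical and not taken from unvibe. The rubric is not covered, since it is judged rather than matched.

```python
import re


def check_scenario(planned_calls, must_include=(), must_not_include=()):
    """Return failure messages for one scenario; an empty list means it passed."""
    def found(pattern):
        # Case-insensitive search, one planned call at a time (assumption).
        return any(re.search(pattern, call, re.IGNORECASE) for call in planned_calls)

    failures = [f"missing: {p}" for p in must_include if not found(p)]
    failures += [f"forbidden: {p}" for p in must_not_include if found(p)]
    return failures


def exit_code(failures_per_scenario):
    """0 when every scenario has no failures, 1 otherwise."""
    return 0 if all(not f for f in failures_per_scenario) else 1


if __name__ == "__main__":
    # Hypothetical planned calls; the patterns mirror the YAML example
    # (the YAML "\\." is the regex "\.").
    calls = [
        "ssh build-host ./eval-runner.sh --label before",
        "ssh build-host ./eval-runner.sh --label=after",
    ]
    include = [
        r"ssh .*eval-runner\.sh.*--label[= ]before",
        r"ssh .*eval-runner\.sh.*--label[= ]after",
    ]
    failures = check_scenario(calls, include, ["gh pr merge"])
    print(failures, exit_code([failures]))
```

Matching per call rather than against one joined string is a design choice of this sketch: it stops a pattern like `ssh .*--label` from matching across two unrelated calls.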

Development

A runnable example lives in examples/sample-skill.

The pure functions in unvibe.cli (response parsing, spec validation, plan flattening, pass/fail evaluation) have fast, offline unit tests. From a source checkout:

uv run pytest

tests/smoke.sh builds the package and runs the installed command against examples/sample-skill using a stubbed CLAUDE_BIN, so it stays offline and deterministic:

tests/smoke.sh
