Add bakery Claude skill with inspect_ai evals #585

Open
bschwedler wants to merge 10 commits into main from feat/bakery-skill-on-main

@bschwedler

Contributor

Adds a Claude Code skill for the bakery CLI and an inspect_ai eval harness to validate it.

What's included

.claude/skills/bakery/SKILL.md — A skill that encodes 13 critical invariants for working in Posit container image repos: never editing rendered files, always using uv run, reading bakery.yaml before constructing filter flags, forwarding flags consistently to ci merge, and more. Available in all sibling repos that have images-shared as an additional working directory.

.claude/skills/bakery/evals/ — An inspect_ai eval harness with 13 dataset entries covering the skill's key invariants. Uses a behavior scorer that checks expected behaviors and forbidden behaviors against a grader model's assessment.

.github/workflows/bakery-skill-eval.yml — CI workflow that runs the evals when skill files change, uploads results as artifacts. Path-filtered to .claude/skills/bakery/** so it only fires on skill changes. Not wired into the required CI gate — evals are a signal, not a hard block.

bschwedler added 10 commits June 9, 2026 14:50
Encodes critical bakery invariants and common workflows for use
across the posit-dev image repos. Covers the template editing rule,
uv invocation, cross-repo change protocol, CI merge flag forwarding
(the UID collision mitigation), and reference tables for CLI flags
and template variables.

Symlinked to ~/.claude/skills/bakery/ for local use during iteration;
will move to posit-dev-skills marketplace when stable.

Running `bakery update files` without filter flags re-renders every
version, which is almost never correct. The skill now requires filter
flags and defaults to scoping renders to the most recent version.

Also adds guidance for the systematic-change case: render one version
to validate, then apply to the template and re-render.
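As a rough illustration of the "always scope the render" rule, the sketch below builds a `uv run bakery update files` command and refuses to produce one when no filter is given. The helper name `scoped_update_files` is hypothetical, the `flag value` argument form is an assumption, and the version string in the usage note is a made-up placeholder; only the command and the four `--image-*` flag names come from this PR.

```python
def scoped_update_files(image_name=None, image_version=None,
                        image_os=None, image_variant=None):
    """Build an argv list for a scoped `bakery update files` call.

    Hypothetical helper: it assumes each filter flag takes one value
    written as `--flag value`.
    """
    filters = {
        "--image-name": image_name,
        "--image-version": image_version,
        "--image-os": image_os,
        "--image-variant": image_variant,
    }
    args = []
    for flag, value in filters.items():
        if value:
            args += [flag, value]
    if not args:
        # An unfiltered call would re-render every version.
        raise ValueError("refusing to build `bakery update files` without filter flags")
    return ["uv", "run", "bakery", "update", "files", *args]
```

For example, `scoped_update_files(image_name="connect-content", image_version="2025.01.0")` yields a command scoped to a single (placeholder) version, which could then be passed to `subprocess.run`.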

Six test cases covering the critical invariants:
- Never edit rendered files (template editing rule)
- Always use uv run bakery
- Always scope bakery update files with filter flags
- Read sibling repo CLAUDE.md/bakery.yaml before cross-repo changes
- Forward filter flags to bakery ci merge
- Correct version creation flow

Adapted from the doc-reviewer eval pattern with a behavior_scorer
that checks expected behaviors (YES/NO) and forbidden antipatterns
(ABSENT/PRESENT) using a grader model.
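The pass/fail logic described above might look roughly like the following. This is a plain-Python sketch, not the scorer in this PR and not the inspect_ai API: the `BehaviorVerdicts` type and `score_behaviors` function are hypothetical, and the sketch assumes the grader model's verdicts have already been parsed into strings.

```python
from dataclasses import dataclass, field


@dataclass
class BehaviorVerdicts:
    """Grader verdicts for one sample (hypothetical structure)."""
    # expected behavior -> "YES" or "NO"
    expected: dict = field(default_factory=dict)
    # forbidden antipattern -> "ABSENT" or "PRESENT"
    forbidden: dict = field(default_factory=dict)


def score_behaviors(verdicts):
    """Pass only if every expected behavior is YES and every antipattern is ABSENT."""
    missed = [name for name, v in verdicts.expected.items()
              if v.strip().upper() != "YES"]
    violated = [name for name, v in verdicts.forbidden.items()
                if v.strip().upper() != "ABSENT"]
    return {
        "passed": not missed and not violated,
        "missed": missed,
        "violated": violated,
    }
```

Treating anything other than an exact YES or ABSENT as a failure is a deliberately conservative choice in this sketch, so a malformed grader reply cannot count as a pass.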

- Matrix versions are excluded by default: --matrix-versions must be
  set explicitly for connect-content/workbench-session, or builds
  silently produce zero targets
- --dev-stream is ignored (with only a warning) unless --dev-versions
  include/only is also set; always pair them
- --plan only works with --strategy bake (the default) and errors with
  --strategy build
Also adds bakery get tags as a lightweight preview command alongside
bakery build --plan, and clarifies --push --no-load as the CI pattern.
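Two of the flag pairings above lend themselves to a small pre-flight check. The sketch below is a hypothetical lint over an argument list, not part of bakery: `flag_value` and `lint_build_args` are made-up names, and the parsing assumes flags are written either as `--flag value` or `--flag=value`. It also checks `--dev-channel`, the replacement for `--dev-stream` named later in this PR.

```python
def flag_value(argv, name):
    """Return the flag's value, "" if present without one, or None if absent."""
    for i, tok in enumerate(argv):
        if tok == name:
            nxt = argv[i + 1] if i + 1 < len(argv) else ""
            return "" if nxt.startswith("--") else nxt
        if tok.startswith(name + "="):
            return tok.split("=", 1)[1]
    return None


def lint_build_args(argv):
    """Collect human-readable problems for known-bad flag combinations."""
    problems = []
    dev_versions = flag_value(argv, "--dev-versions")
    for selector in ("--dev-stream", "--dev-channel"):
        # The selector only takes effect alongside --dev-versions include/only.
        if flag_value(argv, selector) is not None and dev_versions not in ("include", "only"):
            problems.append(f"{selector} has no effect without --dev-versions include/only")
    if "--plan" in argv and flag_value(argv, "--strategy") == "build":
        problems.append("--plan only works with --strategy bake")
    return problems
```

An empty list means none of these particular pairings was violated; it says nothing about whether the command is otherwise valid.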

Based on PR #565 (dev-spec-dispatch):
- --dev-stream is deprecated; replace with --dev-channel
- Add --dev-spec / BAKERY_DEV_SPEC for CI dispatch builds that must
  pin an exact dev version (overrides CDN discovery for the channel)
- Note that channel conflict between --dev-spec and --dev-channel
  raises an error
- Forward --dev-spec alongside --dev-channel in the ci merge invariant

Four new cases:
- 7: --dev-channel is used, not deprecated --dev-stream
- 8: --dev-channel silently ignored without --dev-versions
- 9: --dev-spec / BAKERY_DEV_SPEC for dispatch pinning
- 10: --matrix-versions excluded by default (silent zero targets)

Also updates case 5 (ci-merge-flag-forwarding) to expect --dev-channel
and BAKERY_DEV_SPEC to be forwarded to the merge step, not just
--dev-versions.

CLI findings:
- bakery remove is irreversible (no dry-run, no confirmation)
- bakery create version marks new version as latest by default,
  unmarking all others
- bakery update version --clean deletes files before re-rendering
  (destructive default); use bakery update files for safe re-renders
- bakery dgoss run replaces deprecated bakery run dgoss

Workflow findings:
- --dev-versions in clean must match build or the wrong images are cleaned
- --temp-registry in merge must exactly match the build value
- clean.yml callers must guard with github.ref == 'refs/heads/main'
  to avoid fork PR failures

Also extends invariant 6 (flag forwarding) to cover the clean stage
and temp registry consistency.
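The forwarding invariant amounts to comparing flag values between the build step and each later step. A minimal sketch follows, with hypothetical helper names and the same assumed `--flag value` / `--flag=value` syntax; which flags to compare for a given stage is left to the caller, and the registry value in the usage note is a placeholder.

```python
def flag_value(argv, name):
    """Return the flag's value, "" if present without one, or None if absent."""
    for i, tok in enumerate(argv):
        if tok == name:
            nxt = argv[i + 1] if i + 1 < len(argv) else ""
            return "" if nxt.startswith("--") else nxt
        if tok.startswith(name + "="):
            return tok.split("=", 1)[1]
    return None


def forwarding_mismatches(build_argv, later_argv, flags):
    """Map each flag whose value differs to (build value, later-stage value)."""
    mismatches = {}
    for flag in flags:
        built = flag_value(build_argv, flag)
        later = flag_value(later_argv, flag)
        if built != later:
            mismatches[flag] = (built, later)
    return mismatches
```

For instance, comparing `["build", "--dev-versions", "only", "--temp-registry", "registry.example/tmp"]` against `["ci", "merge", "--temp-registry", "registry.example/tmp"]` for `--dev-versions` and `--temp-registry` reports that `--dev-versions` was dropped in the merge step.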

Add invariant #3 to the bakery skill directing the model to read
bakery.yaml before constructing filter-flag values. The skill already
encoded how to use --image-name/--image-version/--image-os/--image-variant
but never told the model where to get valid values — leaving it to guess.

Add three inspect_ai eval cases (IDs 11–13) that test whether a model
correctly reads bakery.yaml to discover image names, version names, and
OS name strings rather than inventing or inferring them from context.

Renumber the downstream invariants (#4–#11 → #5–#13) to accommodate
the new entry.

Runs inspect_ai evals against the bakery skill whenever files under
.claude/skills/bakery/ change. Path-filtered so it only fires on
skill changes, not on every PR. Results are uploaded as artifacts.

Not wired into the required CI gate — evals are a signal, not a
hard block, given their cost and non-determinism.
@bschwedler bschwedler requested a review from ianpittwood as a code owner June 9, 2026 19:55
@github-actions

github-actions Bot commented Jun 9, 2026


Test Results

1 634 tests  ±0   1 634 ✅ ±0   8m 14s ⏱️ +11s
    1 suites ±0       0 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 2b6eb36. ± Comparison against base commit 6bc441c.

@ianpittwood ianpittwood left a comment

Contributor

Seems like a great idea! Hopefully this helps fix some of the woes with multi-repo edits.
