Context
The periodic skill curator (crates/bot/src/learning_curator.rs, prompt CURATOR_SYSTEM_PROMPT in crates/right-codegen/src/agent_def.rs) ships an LLM consolidation pass: umbrella-merge near-duplicate rightx-* skills, demote narrow skills into an umbrella's references/, archive with absorbed_into. It is enabled by default and runs today.
What's missing: we have no measurement of how good those consolidation decisions are. We don't know whether it merges the right skills, over-merges, or rarely fires usefully. The marketing claim ("the curator decides two skills are duplicates and merges one into the other") is currently unproven in any deployment.
Blocked on
Curator outcome telemetry + dashboard observability (track "A"). We need run-level outcome data — what each curator pass merged/archived/demoted and why — before we can judge or tune quality. Do this issue after that lands.
Scope (once telemetry exists)
- Evaluate the consolidation pass against a real skill library: are umbrella / demote / archive decisions correct? false merges? missed duplicates? does it act at all?
- Tune
CURATOR_SYSTEM_PROMPT from observed behavior.
- Consider a lightweight eval harness / golden cases for consolidation decisions.
References
- Deferred Phase-2:
docs/superpowers/specs/2026-05-22-prefilter-classifier-and-curator-state-design.md §11 (outcome-driven prompt calibration).
- Curator design:
docs/superpowers/specs/2026-05-22-skill-learning-writer-curator-design.md.
Context
The periodic skill curator (
crates/bot/src/learning_curator.rs, promptCURATOR_SYSTEM_PROMPTincrates/right-codegen/src/agent_def.rs) ships an LLM consolidation pass: umbrella-merge near-duplicaterightx-*skills, demote narrow skills into an umbrella'sreferences/, archive withabsorbed_into. It is enabled by default and runs today.What's missing: we have no measurement of how good those consolidation decisions are. We don't know whether it merges the right skills, over-merges, or rarely fires usefully. The marketing claim ("the curator decides two skills are duplicates and merges one into the other") is currently unproven in any deployment.
Blocked on
Curator outcome telemetry + dashboard observability (track "A"). We need run-level outcome data — what each curator pass merged/archived/demoted and why — before we can judge or tune quality. Do this issue after that lands.
Scope (once telemetry exists)
CURATOR_SYSTEM_PROMPTfrom observed behavior.References
docs/superpowers/specs/2026-05-22-prefilter-classifier-and-curator-state-design.md§11 (outcome-driven prompt calibration).docs/superpowers/specs/2026-05-22-skill-learning-writer-curator-design.md.