sb-arnav · luw2007 · May 21, 2026 · May 22, 2026 · May 24, 2026 · May 26, 2026
diff --git a/.claude/nightly_skill-lifecycle-integration-spec.md b/.claude/nightly_skill-lifecycle-integration-spec.md
@@ -0,0 +1,149 @@
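+# SPEC: skill-lifecycle (NIGHTLY × skill-router/manage.py integration)
+
+## Background
+
+`manage.py` archives and recalls skills based on how often sessions use them:
+- **Archive**: a skill with zero activity in the last N days → moved to `_archive_skills/`
+- **Recall**: an archived skill still referenced ≥ min_hits times in sessions → moved back to active
+- **Protect**: skills marked as never-archive (`protected_skills.json`)
+- **Signal sources**: session transcripts under `~/.claude/projects/` plus `route_log.jsonl`
+
+None of NIGHTLY's 5 current strategies (rule-rewrite, hook-tighten, memory-add, skill-description-tighten, rule-reorder) enables or disables skills. This spec wires manage.py's archive/recall logic into the NIGHTLY loop as a 6th strategy.
+
+## Goal
+
+Add a `skill-lifecycle` strategy: each night, use session usage data to propose archiving or recalling **one** skill, then keep or revert the change based on replay-score validation.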
+
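+## Design
+
+### Strategy definition
+
+```
+| Strategy | When to use |
+| skill-lifecycle | manage.py --status shows either (a) an active skill with zero usage in 30d, or (b) an archived skill with ≥3 hits in 14d. Propose a single archive or recall. |
+```
+
+### Proposal structure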
+
+```json
+{
+  "run_id": "...",
+  "baseline_commit": "...",
+  "strategy": "skill-lifecycle",
+  "target_file": "~/.claude/skills/<name>/SKILL.md",
+  "action": "archive" | "recall",
+  "skill_name": "some-skill",
+  "skill_source": "claude" | "agents",
+  "change_summary": "Archive skill 'X' (0 hits in 30d) to reduce prompt noise",
+  "evidence": {
+    "days_analyzed": 30,
+    "hit_count": 0,
+    "total_sessions_scanned": 142
+  },
+  "motivating_corrections": [],
+  "proposed_at": "<iso8601>"
+}
+```
+
+### Execution flow
+
+```
+1. Preflight
+   python3 ~/.agents/skills/skill-router/scripts/manage.py --status --days 30
+   → get the active/archived overview + usage
+
+2. Propose
+   IF an unprotected skill has zero activity in 30d → pick the one with the largest token footprint → action=archive
+   ELIF manage.py --recall --days 14 --min-hits 3 returns recommendations → pick the one with the most hits → action=recall
+   ELSE → skip, pick another strategy
+
+3. Apply
+   IF archive: python3 manage.py --archive --days 30 (move only the target skill, not a batch)
+   IF recall:  python3 manage.py --recall --days 14 --min-hits 3 --apply (target only)
+   Note: manage.py currently operates in batch mode and must be extended with a single-target mode via --name <skill>
+
+4. Safety check
+   - skill is in ALWAYS_KEEP / protected_skills.json → exit 3
+   - skill is a symlink → exit 3 (symlink skills are left untouched)
+   - fewer than 5 active skills would remain after archive → exit 3 (prevents emptying the active set)
+
+5. Replay + Score
+   Standard NIGHTLY flow: replay the benchmark and compare against the baseline score
+
+6. Decide
+   Same keep/revert rules as the other strategies:
+   - score ≥ baseline → keep (archiving the skill reduced noise with a net gain)
+   - score < baseline - threshold → revert (the skill was an implicit dependency)
+
+7. Revert mechanism
+   IF revert:
+     for an archive: move_skill(name, source, 'archive', 'active')
+     for a recall: move_skill(name, source, 'active', 'archive')
+   + write (skill-lifecycle, skill_name) to the dead-letter
+```
+
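+### Required changes to manage.py
+
+| Change | Reason |
+|---|---|
+| Add a `--name <skill>` argument | NIGHTLY operates on one skill per run, not a batch |
+| `--json` output mode | The agent parses structured data rather than the Chinese print output |
+| Single-target support in `cmd_archive` / `cmd_recall` | Pairs with `--name` |
+| Distinct exit codes: 0=success, 1=no-op, 3=safety rejection | Aligns with the NIGHTLY safety_check protocol |
+
+### Changes to safety_check.py
+
+Permit move operations on skill directories (`plugins/` is currently off-limits):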
+
+```python
+# New allowlist rule
+if strategy == 'skill-lifecycle':
+    allowed_paths = [
+        '~/.claude/skills/',
+        '~/.claude/_archive_skills/',
+        '~/.agents/skills/',
+        '~/.agents/_archive_skills/',
+    ]
+    # Only moves between skill directories are allowed; no deletions or content edits
+```
+
+### Additional evaluation metrics
+
+In addition to the standard replay score, record:
+
+```json
+{
+  "prompt_token_delta": -1847,
+  "active_skill_count_before": 52,
+  "active_skill_count_after": 51
+}
+```
+
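+token_delta is an auxiliary signal: even when the replay score is flat, a significant token saving (>1000) can count as a positive result.
+
+### Integration with strategy_stats.py
+
+`skill-lifecycle` participates in effectiveness tracking as an independent strategy:
+- promising/neutral/avoid is computed from the usual kept/tried ratio
+- Sub-types (archive vs recall) are not tracked separately; they share one strategy bucket
+
+## Non-goals
+
+- Do not modify manage.py's core scan_usage logic (signal quality is skill-router's responsibility)
+- Do not archive/recall multiple skills at once (NIGHTLY principle: one change per run)
+- Do not touch `settings.json` (a skill's active/archive state is a filesystem-level operation and does not go through settings)
+- Do not automatically re-run `build_index.py` (the rebuild is triggered automatically the next time skill-router is used after an archive/recall)
+
+## Dependencies
+
+- `~/.agents/skills/skill-router/scripts/manage.py` is installed and executable
+- Python 3.10+ (already available)
+- session transcripts exist under `~/.claude/projects/`
+
+## Acceptance criteria
+
+1. `nightly --observation` can generate a proposal of type `skill-lifecycle`
+2. dry-run mode correctly identifies archive/recall candidates
+3. revert fully restores the skill's location (including symlink repair)
+4. the dead-letter prevents repeated operations on the same skill
+5. strategy_stats correctly tracks kept/tried for skill-lifecycle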
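Reviewer note on the spec above: the keep/revert rule in step 6, combined with the token_delta auxiliary signal, might look roughly like the sketch below. It is not code from this PR. The `Outcome` type, the `decide` function, the `"hold"` result for the band between `baseline - threshold` and `baseline` (which the spec leaves unspecified), and the default `threshold` of 0.02 are all assumptions made for illustration.

```python
from dataclasses import dataclass

# From the spec: a token saving above this counts as a positive signal.
TOKEN_SAVINGS_FLOOR = 1000


@dataclass
class Outcome:
    score: float             # replay score with the change applied
    baseline: float          # replay score at baseline_commit
    prompt_token_delta: int  # negative means prompt tokens were saved


def decide(outcome: Outcome, threshold: float = 0.02) -> str:
    """Return 'keep', 'revert', or 'hold' for one skill-lifecycle run."""
    if outcome.score >= outcome.baseline:
        return "keep"
    if outcome.score < outcome.baseline - threshold:
        return "revert"
    # Assumption: inside the noise band, a large token saving tips the
    # result to keep; otherwise the change is held for review.
    if -outcome.prompt_token_delta > TOKEN_SAVINGS_FLOOR:
        return "keep"
    return "hold"
```

With the example metrics from the spec (`prompt_token_delta: -1847`), a run that scores slightly below baseline but inside the band would be kept under this reading.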
diff --git a/.gitignore b/.gitignore
@@ -19,3 +19,4 @@ __pycache__/
 **/dead-letter.jsonl
 **/reports/
 **/logs/
+.omc/
diff --git a/agents/nightly-optimizer.md b/agents/nightly-optimizer.md
@@ -19,7 +19,7 @@ The substrate you're improving is `~/.claude/` itself. The eval suite is `~/.cla
 2. **`~/.claude/` must be a clean git repo at start.** If `git status` shows uncommitted changes, abort with a clear message — never destroy the user's in-flight work.
 3. **All state goes to disk immediately.** Every measurement, every decision. The conversation is not durable storage.
 4. **Always include regressions in the report.** Top 3 regressions are a guardrail against silent overfit.
-5. **Never touch `~/.claude/projects/`, `~/.claude/plugins/`, `~/.claude/statsig/`, or `~/.claude/ide/`** — those are session/cache state, not substrate.
+5. **Never touch `~/.claude/projects/`, `~/.claude/plugins/`, `~/.claude/statsig/`, or `~/.claude/ide/`** — those are session/cache state, not substrate. Exception: `skill-lifecycle` strategy moves skills between `~/.claude/skills/` ↔ `~/.claude/_archive_skills/` and `~/.agents/skills/` ↔ `~/.agents/_archive_skills/` via `skill_lifecycle.py`.
 6. **Budget cap: $3 of Haiku tokens.** If you've spent more, stop and log a partial result.
 7. **Wall-clock cap: 30 minutes total run time.** Record the run's start time. If 30 min elapses before the loop completes, stop immediately, revert any partially-applied change, and log `decision: "timeout"`. Don't try to "finish" past the cap — the next cron fire will start fresh.
 8. **Sanity floor on score: 0.5.** If the experiment scores below 0.5, the loop is broken (not the substrate). Revert, log `decision: "sanity-floor-rejected"`, and write a report that flags the failure. Three consecutive sanity-floor rejections → abort future runs until the user investigates.
@@ -68,6 +68,7 @@ Pick the highest-leverage change from this menu. Bias by:
 | **memory-add** | Two or more recent corrections share a `root_cause`, OR a `proposed_rule` is mechanical enough to live in a SKILL.md. Create a feedback memory or skill file. |
 | **skill-description-tighten** | A skill's description is generic enough that wrong skills trigger. Tighten. |
 | **rule-reorder** | An anti-pattern rule appears below a less-critical one in operating-mode docs. Move it up. |
+| **skill-lifecycle** | `skill_lifecycle.py --propose` returns a candidate. Archive a 30d-unused skill (reduce prompt noise) or recall an archived skill with ≥3 recent hits. |
 
 Write your proposal to `proposal.json` BEFORE applying — this is the audit trail.
 ```json
@@ -82,9 +83,21 @@ Write your proposal to `proposal.json` BEFORE applying — this is the audit tra
 }
 ```
 
+**skill-lifecycle specific proposal flow:**
+```bash
+python3 ~/.claude/plugins/nightly/src/skill_lifecycle.py --propose
+```
+If exit 0, the JSON output has `name`, `source`, `action`, `hits`, `size_bytes`. Use these to fill `proposal.json` with `target_file` = the skill's SKILL.md path. If exit 1, no candidate — pick another strategy.
+
 ### 3. Apply
 Edit the file(s). Stage the change with `git add -A` but do NOT commit yet. The commit only happens if the experiment is kept.
 
+**skill-lifecycle apply:** instead of editing files, run:
+```bash
+python3 ~/.claude/plugins/nightly/src/skill_lifecycle.py --apply --name <name> --action <archive|recall> --source <claude|agents>
+```
+Exit 0 = applied. Exit 3 = safety rejected (protected skill, minimum active count, or symlink); treat it as a safety_check failure.
+
 ### 3b. Safety check (mandatory)
 Run:
 ```
@@ -191,12 +204,13 @@ Before applying decision, read `~/.claude/nightly/dead-letter.jsonl` if it exist
 
 **Default — observation mode** (auto-commit marker file absent):
 - Regardless of decision (`keep`, `revert`, `held`), **always revert** the change with `git reset --hard <baseline_commit>`. NIGHTLY never mutates substrate without user review while in this mode.
+- **skill-lifecycle revert:** also run `python3 ~/.claude/plugins/nightly/src/skill_lifecycle.py --revert --name <name> --action <action> --source <source>` to move the skill back before `git reset`.
 - Write the proposal, diff, and score to `~/.claude/nightly/proposed/<run_id>.md` so the user can review and manually approve via `/nightly approve <run_id>` (which re-applies the change and commits with the correct author email).
 - Mark the experiment-log `decision: "proposed-<original_decision>"` (e.g. `proposed-kept`, `proposed-reverted`) so the audit trail shows what the loop WOULD have done.
 
 **Auto-commit mode** (user explicitly opted in by creating `~/.claude/nightly/auto-commit.yes`):
 - **Keep**: `cd ~/.claude && git commit -m "nightly <run_id>: <strategy> — score <baseline> → <new> (+<delta>)"`.
-- **Revert / hold**: `cd ~/.claude && git reset --hard <baseline_commit>`.
+- **Revert / hold**: `cd ~/.claude && git reset --hard <baseline_commit>`. For **skill-lifecycle**, also run `python3 ~/.claude/plugins/nightly/src/skill_lifecycle.py --revert --name <name> --action <action> --source <source>` before the git reset.
 
 **Why observation mode is the default:** v0.2 scoring uses six regex heuristics over historical replay. The signals are gameable (e.g. a CLAUDE.md edit that forbids "feels balanced" trivially scores higher without improving reasoning), the Δ ≥ +0.02 threshold is below noise without variance estimation, and ground truth is "what the historical assistant did", not "what should have happened". Until v0.3 adds LLM-as-judge + multi-trial variance + correction-weighted scoring, NIGHTLY should propose changes, not commit them.
 

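Reviewer note: both revert paths above depend on ordering, with the `skill_lifecycle.py --revert` call coming before `git reset --hard`. A minimal sketch of that ordering follows. The helper name `revert_commands` is hypothetical, and `git -C <dir>` is used here as an equivalent of the `cd ~/.claude && git ...` form in the document.

```python
from pathlib import Path

# Paths as written in the agent instructions.
SCRIPT = Path("~/.claude/plugins/nightly/src/skill_lifecycle.py")
SUBSTRATE = Path("~/.claude")


def revert_commands(strategy, baseline_commit, name=None, action=None, source=None):
    """Return the argv lists to run, in order, when a change is rolled back."""
    commands = []
    if strategy == "skill-lifecycle":
        if not (name and action and source):
            raise ValueError("skill-lifecycle revert needs name, action and source")
        # The filesystem move is undone first, before the git reset.
        commands.append([
            "python3", str(SCRIPT.expanduser()),
            "--revert", "--name", name, "--action", action, "--source", source,
        ])
    commands.append([
        "git", "-C", str(SUBSTRATE.expanduser()),
        "reset", "--hard", baseline_commit,
    ])
    return commands
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)` in sequence, so that a failed skill revert stops the run before the git reset.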
diff --git a/commands/nightly.md b/commands/nightly.md
@@ -1,5 +1,5 @@
 ---
-description: NIGHTLY autoresearch loop against ~/.claude/. Default = run one experiment in OBSERVATION mode (no auto-commit). Subcommands: status, diff, approve, reject, disapprove, list-proposals.
+description: "NIGHTLY autoresearch loop against ~/.claude/. Default = run one experiment in OBSERVATION mode (no auto-commit). Subcommands: status, diff, approve, reject, disapprove, list-proposals."
 ---
 
 Arguments: `$ARGUMENTS`
@@ -23,6 +23,8 @@ Recognized flags (pass-through):
 - `--dry-run` — skip benchmark replay, use corpus ground-truth as synthetic substitute
 - `--budget <usd>` — override default $3 cap
 - `--n <count>` — override default 10 replayable tasks
+- `--since <YYYY-MM-DD>` — only replay tasks with first_message_at >= this date
+- `--until <YYYY-MM-DD>` — only replay tasks with first_message_at <= this date (inclusive)
 
 ### `status`
 Print a one-screen status:

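Reviewer note: the new `--since` / `--until` flags describe an inclusive date window on `first_message_at`. One way to express that filter is sketched below, assuming `first_message_at` is an ISO 8601 timestamp that starts with `YYYY-MM-DD` and that the comparison is made on that calendar date. The function name `in_window` is hypothetical.

```python
from datetime import date
from typing import Optional


def in_window(first_message_at: str,
              since: Optional[str] = None,
              until: Optional[str] = None) -> bool:
    """True if the task's first message falls inside [since, until], inclusive."""
    # Assumption: the first 10 characters are the YYYY-MM-DD date part.
    day = date.fromisoformat(first_message_at[:10])
    if since is not None and day < date.fromisoformat(since):
        return False
    if until is not None and day > date.fromisoformat(until):
        return False
    return True
```

Comparing on the date part means a task started at 23:59 on the `--until` day is still included, which matches the "(inclusive)" wording.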
diff --git a/corrections.jsonl b/corrections.jsonl
diff --git a/docs/spec-skill-lifecycle.md b/docs/spec-skill-lifecycle.md
@@ -0,0 +1,158 @@
+# SPEC: `skill-lifecycle` Strategy
+
+> Add a 6th mutation strategy to the nightly optimizer that archives unused skills and recalls demanded archived skills, using session transcript analysis as the usage signal.
+
+## Motivation
+
+Skills loaded into the Claude Code system prompt consume tokens on every turn. An unused skill wastes prompt budget without providing value. Conversely, a skill archived too aggressively may be needed again — session transcripts reveal latent demand via keyword hits and Skill tool invocations.
+
+The existing 5 strategies (rule-rewrite, hook-tighten, memory-add, skill-description-tighten, rule-reorder) optimize substrate *content*. `skill-lifecycle` optimizes substrate *composition* — which skills are active at all.
+
+## Design
+
+### Usage Signal Detection
+
+The adapter scans `.jsonl` session transcripts from multiple sources:
+
+```
+~/.claude/projects/           # Claude Code sessions
+~/.codex/sessions/            # Codex sessions (if present)
+~/.agents/sessions/           # Other agent harnesses
+```
+
+Three signal types per skill:
+1. **Skill tool call**: `"tool": "Skill"` with `"input": {"skill": "<name>"}` in transcript
+2. **SKILL.md loaded**: `"Base directory for this skill:"` string in assistant messages
+3. **Keyword match**: skill name appears in assistant content
+
+### Adapter Script: `src/skill_lifecycle.py`
+
+Self-contained Python script with three modes:
+
+```
+python3 src/skill_lifecycle.py --propose
+python3 src/skill_lifecycle.py --apply --name X --action archive|recall --source claude|agents
+python3 src/skill_lifecycle.py --revert --name X --action archive|recall --source claude|agents
+```
+
+**Exit codes** (aligned with `safety_check.py` protocol):
+- `0` — action taken / candidate found
+- `1` — no candidate available / revert failed
+- `3` — safety rejected (protected skill, minimum active count, symlink)
+
+**Directory layout:**
+
+| Source | Active path | Archive path |
+|---|---|---|
+| `claude` | `~/.claude/skills/` | `~/.claude/_archive_skills/` |
+| `agents` | `~/.agents/skills/` | `~/.agents/_archive_skills/` |
+
+### `--propose` Logic
+
+1. Scan sessions from last 30 days, count per-skill usage
+2. **Archive candidates**: active skills with 0 hits in 30d, excluding protected set
+3. **Recall candidates**: archived skills with ≥3 hits in last 14d
+4. **Selection priority**: recall (highest hits) > archive (largest file size = most token savings)
+5. Output: single JSON object with `name`, `source`, `action`, `hits`, `size_bytes`
+
+### Safety Guards
+
+Built into `--apply`:
+- Skills in the hardcoded `ALWAYS_KEEP` set (`skill-router`, `context-mode`, `oh-my-claudecode`) are never archived
+- User-defined `protected_skills.json` entries are never archived
+- Symlink skills are never moved (they point to canonical locations)
+- Archive is rejected if total active skill count would drop below 5
+
+### Proposal Structure
+
+```json
+{
+  "run_id": "2026-05-27-0300",
+  "baseline_commit": "abc123",
+  "strategy": "skill-lifecycle",
+  "target_file": "~/.claude/skills/<name>/SKILL.md",
+  "action": "archive",
+  "skill_name": "some-skill",
+  "skill_source": "claude",
+  "change_summary": "Archive skill 'some-skill' (0 hits in 30d, 14KB) to reduce prompt noise",
+  "evidence": {
+    "days_analyzed": 30,
+    "hit_count": 0,
+    "size_bytes": 14497,
+    "archive_pool": 3,
+    "recall_pool": 0
+  },
+  "motivating_corrections": [],
+  "proposed_at": "2026-05-27T03:00:00Z"
+}
+```
+
+### Integration with Agent Workflow
+
+**Step 2 (Propose):**
+```bash
+python3 ~/.claude/plugins/nightly/src/skill_lifecycle.py --propose
+```
+If exit 0 → use output to fill `proposal.json`. If exit 1 → no candidate, pick another strategy.
+
+**Step 3 (Apply):**
+```bash
+python3 ~/.claude/plugins/nightly/src/skill_lifecycle.py \
+  --apply --name <name> --action <archive|recall> --source <claude|agents>
+```
+Exit 3 → treat as safety_check failure (revert, dead-letter, stop).
+
+**Step 7 (Revert — both observation mode and auto-commit revert):**
+```bash
+python3 ~/.claude/plugins/nightly/src/skill_lifecycle.py \
+  --revert --name <name> --action <archive|recall> --source <claude|agents>
+```
+Must run BEFORE `git reset --hard` since the filesystem move is not tracked by git.
+
+### Scoring Considerations
+
+Standard replay + mechanical scorer applies. Additional context for the report:
+
+```json
+{
+  "prompt_token_delta": -1847,
+  "active_skill_count_before": 52,
+  "active_skill_count_after": 51
+}
+```
+
+A token saving of more than 1000 with score parity (Δ within the noise floor) can be treated as a positive signal: less prompt noise without quality regression.
+
+### Strategy Stats Integration
+
+`skill-lifecycle` participates in `strategy_stats.py` effectiveness tracking as a single strategy bucket. Sub-types (archive vs recall) are not tracked separately — the sample size would be too small for meaningful signal.
+
+## Hard Rule 5 Exception
+
+The existing hard rule "Never touch `~/.claude/plugins/`" remains in force. `skill-lifecycle` operates only on `~/.claude/skills/`, `~/.agents/skills/`, and their `_archive_skills/` counterparts, which are substrate rather than plugin/cache state. The `safety_check.py` path allowlist should include:
+
+```
+~/.claude/skills/
+~/.claude/_archive_skills/
+~/.agents/skills/
+~/.agents/_archive_skills/
+```
+
+These paths are allowed only when `strategy == "skill-lifecycle"`, and only for directory moves (not content edits or deletions).
+
+## Out of Scope
+
+- Batch archive/recall (violates one-change-per-run principle)
+- Modifying `settings.json` (skill activation is filesystem-level, not settings-level)
+- Rebuilding search indexes after moves (handled lazily on next skill invocation)
+- Changing the usage detection heuristics (signal quality is independent of this strategy)
+
+## Acceptance Criteria
+
+1. `--propose` correctly identifies archive candidates (30d zero usage) and recall candidates (≥3 hits in 14d)
+2. `--apply` with a protected skill exits 3
+3. `--apply` with a non-existent skill exits 3
+4. `--revert` restores original filesystem state (including symlink repointing)
+5. Dead-letter blocks re-trying the same `(skill-lifecycle, skill_name)` pair
+6. `strategy_stats.py` tracks `skill-lifecycle` kept/tried ratios correctly
+7. Observation mode always reverts the filesystem move after scoring
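Reviewer note: the `--propose` selection order in this spec (recall first by highest hits, then archive by largest size, with protected skills excluded) can be sketched as below. This is an illustration under stated assumptions, not the contents of `src/skill_lifecycle.py`. The `Skill` record and the `propose` signature are hypothetical; the thresholds, the `ALWAYS_KEEP` names, and the output keys are taken from the spec.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

ALWAYS_KEEP = {"skill-router", "context-mode", "oh-my-claudecode"}
MIN_RECALL_HITS = 3  # archived skill needs >=3 hits in the last 14d


@dataclass
class Skill:
    name: str
    source: str      # "claude" or "agents"
    archived: bool
    hits_30d: int
    hits_14d: int
    size_bytes: int


def propose(skills: Iterable[Skill], protected: Iterable[str] = ()) -> Optional[dict]:
    """Pick a single candidate, or None when there is nothing to propose."""
    skills = list(skills)
    # Recall has priority: archived skills with enough recent hits.
    recall = [s for s in skills if s.archived and s.hits_14d >= MIN_RECALL_HITS]
    if recall:
        best = max(recall, key=lambda s: s.hits_14d)
        return {"name": best.name, "source": best.source, "action": "recall",
                "hits": best.hits_14d, "size_bytes": best.size_bytes}
    # Otherwise archive the largest unused, unprotected active skill.
    blocked = ALWAYS_KEEP | set(protected)
    archive = [s for s in skills
               if not s.archived and s.hits_30d == 0 and s.name not in blocked]
    if archive:
        best = max(archive, key=lambda s: s.size_bytes)
        return {"name": best.name, "source": best.source, "action": "archive",
                "hits": 0, "size_bytes": best.size_bytes}
    return None
```

A `None` result corresponds to exit 1 (no candidate), at which point the agent picks another strategy.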
diff --git a/plugin.json b/plugin.json
@@ -0,0 +1,7 @@
+{
+  "name": "nightly",
+  "version": "1.0.0",
+  "description": "Nightly self-improvement loop for Claude Code",
+  "commands": ["commands/nightly.md"],
+  "hooks": []
+}