Merged
18 commits
b7ef867
feat: [US-001][US-002] AgentLock direct notification delivery + Test …
peguesj Mar 30, 2026
7565f3e
feat: lily-ai-phx showcase — queryable tabs + mermaid diagrams
peguesj Mar 30, 2026
1df1004
feat(CCEMHelper): notification permission setup guide and status display
peguesj Mar 30, 2026
bc0803a
fix: add osascript fallback for notification delivery on unsigned apps
peguesj Mar 30, 2026
8a9baf4
feat(ccemhelper): graduated notification actions for AgentLock
peguesj Mar 30, 2026
cb2229d
feat: US-305 - UnixSocketTransport actor for native IPC
peguesj Apr 5, 2026
a4747d0
feat: US-325 - CCEMHelper grouped approval UI with focus management
peguesj Apr 7, 2026
3c6a9a4
feat(CLAUDE.md): add CP-64–CP-70 checkpoints for ccem_security_native…
peguesj Apr 11, 2026
515c12a
fix(ccemhelper): harden APMClient URL construction against corrupt Us…
peguesj Apr 11, 2026
d7550e2
fix(ccemhelper): permanently prevent stall from Docker daemon wedge a…
peguesj Apr 11, 2026
24453e4
feat(showcase): add YJ64 SwiftUI storyboard + /showcase storyboard co…
peguesj Apr 14, 2026
6ea270b
feat(hooks): max-payload telemetry upgrade v9.1.1 across all 9 APM hooks
peguesj Apr 19, 2026
1389e05
chore: sync showcase data, opsdoc, checkpoints, and gitignore
peguesj Apr 19, 2026
9913b42
feat(widgetization): Dashboard Widgetization Engine CP-93–CP-106
peguesj Apr 19, 2026
aee5721
chore: mark testmaxxing formation v2 stories done, update checkpoints…
peguesj Apr 23, 2026
345fcfc
feat(dashboard): collapsible sidebar/inspector, widget grid wiring, s…
peguesj Apr 23, 2026
46e65e2
feat(apm): dashboard UX overhaul — collapsible panels, widget grid, i…
peguesj Apr 23, 2026
77006c5
chore(apm-v4): bump submodule to 278ca19 — Wave 1 formation fix + des…
peguesj May 2, 2026
670 changes: 244 additions & 426 deletions .claude/CLAUDE.md

Large diffs are not rendered by default.

576 changes: 576 additions & 0 deletions .claude/checkpoints.md

Large diffs are not rendered by default.

82 changes: 82 additions & 0 deletions .claude/checkpoints.session-0414.md
@@ -0,0 +1,82 @@
# Session Checkpoint — 2026-04-14

## Session Identity
- **Branch**: `ralph/agentlock-notifications-v860`
- **Project**: CCEM (Plane ID: `a20e1d2e-3139-406e-ae03-dc6d1d8cb995`)
- **Version**: v9.0.0 + v9.1.0 (Security Native Types)
- **APM PID**: 56361 (port 3032, started via mise shims)
- **CCEMHelper PID**: was 42338/43277 (healthy, 6+ sockets to APM)

## Completed Work This Session

### Fixes (all committed or applied, all tracked in Plane)
1. **CCEM-384 (urgent)**: CCEMHelper permanent stall fix — `DockerSocketRepair.asyncStatus()` + `shellWithTimeout()` replaced blocking `waitUntilExit`; moved `monitor.start()` from `MenuBarView.task{}` to `CCEMHelperApp.init()` via `MainActor.assumeIsolated`. Commit `d7550e2`. Files: `DockerSocketRepair.swift`, `CCEMHelperApp.swift`, `MenuBarView.swift`
2. **CCEM-383 (high)**: Lottie animation glimpse-then-disappear — unique `ind` per layer in `lottieBase()`, split fallback/mount into absolutely-positioned siblings with `DOMLoaded` fade. File: `assets/js/hooks/getting_started_showcase.js`
3. **CCEM-385 (high)**: APM dep graph race condition — `_hierarchyDataReceived` flag in `dependency_graph.js` so `hierarchy_data` handler wins over competing `_fetchInitialData` and `agents_updated` writes.
4. **CCEM-386 (high)**: APM CodeReloader crash — created `apm/start-apm-server.sh` with explicit mise PATH, launchd plist at `~/Library/LaunchAgents/io.pegues.agent-j.labs.ccem.apm-server.plist`

### Operations (all tracked in Plane)
5. **CCEM-387 (low)**: OpsDoc generation — 25 sections, 7 categories, interactive HTML at `~/Developer/ccem/opsdoc/`, served at `:3080`
6. **CCEM-388 (low)**: Coalesce skill sync — 7 skills updated (ccem-apm, apm, apm-api-reference, apm-auth, apm-telemetry, apm-usage). Version drift v7→v9, stale paths apm-v5→apm-v4, 5 missing LiveViews added.
7. **CCEM-389 (low)**: Showcase + DocsMax — 5 showcase data files updated to v9.1.0 (features.json, narrative-content.json, skill-interaction.json, projects.json)

### UPM/PM Sync
8. **UPM sync**: 15 checkpoints (CP-56–CP-70) aligned across prd.json, Plane, CLAUDE.md. Zero drift.
9. **Plane PM backfill**: 13 checkpoint issues verified Done (seq 370–382). 7 new session issues created (CCEM-383–389). Total Plane issues: 389.
10. **Formation deploy**: formation-deploy-all-0414, 10 agents (Alpha-TDD + Bravo-Ops), compile clean, 854 tests (747 pass, 107 pre-existing failures).
11. **APM auth**: allow-all granted + UPM policy `ap-36c067eb` created via AutoApprovalStore (24h TTL, project=ccem, tools=Skill/Agent/Bash/file, risk≤high).

## Current Infrastructure State
- **APM**: Running on :3032 (PID 56361, beam.smp via `/opt/homebrew/Cellar/erlang/28.3.1/`)
- **CCEMHelper**: Running, healthy, 6+ ESTABLISHED sockets to APM
- **Disk**: ~275-412MB free (was ENOSPC during concurrent agents; freed via log truncation)
- **Docker**: BROKEN at VM layer — raw socket never appears. Needs manual intervention, `/docksock nuke --force`, or reinstall.
- **OpsDoc server**: http://localhost:3080/client/
- **Dashboard**: http://localhost:3032 (formation view at /formation?id=formation-deploy-all-0414)

## Key File Locations
| Resource | Path |
|----------|------|
| APM server | `~/Developer/ccem/apm-v4/` |
| APM start script | `~/Developer/ccem/apm/start-apm-server.sh` |
| APM launchd plist | `~/Library/LaunchAgents/io.pegues.agent-j.labs.ccem.apm-server.plist` |
| CCEMHelper | `~/Developer/ccem/CCEMHelper/` |
| Showcase data | `~/Developer/ccem/showcase/data/` |
| OpsDoc | `~/Developer/ccem/opsdoc/` |
| Checkpoints (historical) | `~/Developer/ccem/.claude/checkpoints.md` |
| prd.json | `~/Developer/ccem/.claude/ralph/prd.json` |
| Plane align script | `/Volumes/DDRV902/plane_align_selfrun.py` |

## Plane PM State
- **Project**: CCEM (a20e1d2e-3139-406e-ae03-dc6d1d8cb995)
- **Total issues**: 389
- **States**: Done=9bab16dd, In Progress=0d7e0c82, Todo=8904905c, Backlog=111ce4ff, Cancelled=80645a72
- **API**: `X-Api-Key: plane_api_73588ec6f1c34e09b389b8565b7b63c9`
- **Non-Done issues**: ~180+ from v6-v8 era need state reconciliation (Agent B from backfill plan was not deployed due to scope — these are older issues that may or may not be completed)

## Post-Compact Continuation Plan

### Phase 1: `/upm sync` → `/plane-pm align`
- Re-run UPM sync to verify post-backfill state
- Run `/plane-pm align` with FULL Plane API access (now unblocked — disk freed, APM running)
- Focus on the ~180 non-Done issues from v6-v8 era — cross-reference git log to identify which are actually completed
- Batch-transition completed issues to Done

### Phase 2: `/coalesce` full coalescence
- Full coalescence across ALL skills, not just the 7 updated earlier
- Include: commands, methodologies, agent specs, project configs
- Verify all cross-references are current

### Phase 3: Prepare for exploratory/refinement/refactoring work
- Reference base established via:
- **OpsDoc** at `~/Developer/ccem/opsdoc/` (25 sections, architecture, runbooks)
- **Showcase** at `~/Developer/ccem/showcase/data/` (features, narratives, skill interactions)
- **Wiki/docs** at `~/Developer/ccem/apm-v4/priv/docs/` (audit, feature inventory, versions)
- **OpenAPI** at `http://localhost:3032/api/v2/openapi.json` (103+ paths)
- These should be READ as reference before any refactoring decisions
- The 107 pre-existing test failures (PubSub/GenServer lifecycle) are candidates for refactoring
- The stale v6-v8 Plane issues may reveal unfinished work worth revisiting

## Auth Context
- UPM policy active: `ap-36c067ebdd9404d0` (expires 2026-04-15T17:34:15Z)
- Allow-all token: `atk_953ec1e8067f1d8067b67c21258e2089`
144 changes: 144 additions & 0 deletions .claude/hook-audit-handoff.md
@@ -0,0 +1,144 @@
# Hook Audit Handoff — CCEM APM Context
**Date**: 2026-04-18
**Source session**: LFG DevDrive v2 context (~/tools/@yj/lfg)
**Target**: /upm work in ~/Developer/ccem

## Plane Project

**Project ID**: `a20e1d2e-3139-406e-ae03-dc6d1d8cb995`
**Name**: CCEM - Claude Code Environment Manager
**Workspace**: lgtm
**Prefix**: CCEM
**URL**: https://plane.lgtm.build/lgtm/projects/a20e1d2e-3139-406e-ae03-dc6d1d8cb995/

Create one Plane issue per bug below. Suggested label: `hooks`.

---

## Bugs Found (10 total)

### CRITICAL

**Bug 1 — session_init.sh APM kill race**
File: `~/Developer/ccem/apm/hooks/session_init.sh`
Problem: `start_apm()` kills any process on port 3032 before verifying it's the BEAM. `is_apm_running()` PID check races with the kill — if APM is healthy, it still gets killed and restarted, dropping live sessions.
Fix: Before killing, run `curl -sf http://localhost:3032/ > /dev/null 2>&1` (returns HTML 200). Only kill if this fails. NOTE: `/api/health` returns 404 on APM v9.0.0 — do NOT use it. Use `GET /` or `GET /api/v2/sessions`.
Priority: CRITICAL
Plane issue title: `[HOOKS] session_init.sh kills healthy APM — add HTTP health check before port kill`
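
A minimal sketch of the health gate described in the Fix above. The helper names `apm_is_healthy` and `kill_if_unhealthy`, the `HEALTH_PROBE` override, and the 2-second timeout are illustrative assumptions; the real `start_apm()` body is not shown in this handoff.

```shell
#!/bin/bash
# Hypothetical helpers; adapt to the real start_apm() in session_init.sh.
APM_PROBE_URL="${APM_PROBE_URL:-http://localhost:3032/}"

# Succeeds only if APM answers the probe; -f makes curl fail on HTTP status >= 400.
apm_is_healthy() {
  curl -sf --max-time 2 "$APM_PROBE_URL" > /dev/null 2>&1
}

# Runs the kill command given as arguments only when the health probe fails.
# HEALTH_PROBE may name any command, which keeps the gate testable offline.
kill_if_unhealthy() {
  if "${HEALTH_PROBE:-apm_is_healthy}"; then
    echo "healthy: leaving process alone"
    return 0
  fi
  echo "unhealthy: running kill command"
  "$@"
}
```

The point of passing the kill command as arguments is that the existing port-kill logic can stay as it is and simply be wrapped by the gate.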

**Bug 2 — pre_tool_use.sh reads wrong JSON field**
File: `~/.claude/hooks/pre_tool_use.sh`
Problem: Reads `.tool_args` to get tool arguments. Claude Code sends `.tool_input`. All security checks (dangerous command detection, path boundary enforcement) are silently bypassed because `.tool_args` is always null on real traffic.
Fix: Replace all `.tool_args` references with `.tool_input` throughout the file.
Priority: CRITICAL
Plane issue title: `[HOOKS] pre_tool_use.sh reads .tool_args — should be .tool_input; all security checks inert`
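
As a rough illustration of the field fix, the sketch below reads the command from `.tool_input` and shows why the old field yields nothing. `extract_command` is a hypothetical name, and `.command` as the sub-field for Bash tool calls is an assumption to verify against real payloads.

```shell
#!/bin/bash
# Hypothetical helper: pull the shell command out of a PreToolUse payload on stdin.
# Prints nothing when .tool_input.command is absent.
extract_command() {
  jq -r '.tool_input.command // empty' 2>/dev/null
}
```

A payload that only carries `.tool_args` produces an empty string here, which is the same silent bypass the audit describes, just in the opposite direction.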

**Bug 3 — pre_compact.py calls nonexistent method**
File: `~/.claude/hooks/pre_compact.py`
Problem: Calls `MemoryManager.store_conversation_context()` which does not exist in the MemoryManager API. Raises AttributeError on every compaction → hook exits 1 every time → pre-compact hook is broken.
Fix: Replace `store_conversation_context()` calls with `store_learning()` and/or `record_event()` per the actual MemoryManager API.
Priority: CRITICAL
Plane issue title: `[HOOKS] pre_compact.py calls nonexistent MemoryManager.store_conversation_context() — exits 1 every compaction`

---

### HIGH

**Bug 4 — claude-flow post-edit hook: --format true is invalid**
File: `~/.claude/settings.json` (inline PostToolUse hook for Write/Edit events)
Problem: Uses `--format true` flag with `npx claude-flow@alpha` (which resolves to ruflo v3.5.80). `--format true` is not a valid flag for ruflo; the correct flag is `--format json`. Hook silently misbehaves.
Fix: Change `--format true` → `--format json` in the inline hook in settings.json.
Priority: HIGH
Plane issue title: `[HOOKS] claude-flow post-edit hook uses --format true; ruflo requires --format json`

**Bug 5 — apm_schema_sync.sh hits 404 endpoint, caching HTML as schema**
File: `~/Developer/ccem/apm/hooks/apm_schema_sync.sh`
Problem: Hits `/api/v2/schema` which returns 404 HTML on APM v9.0.0. Hook uses `curl -s` without `-f`, so it doesn't detect the HTTP error. Has been writing `<!DOCTYPE html>...` as schema cache since ~Apr 14.
Fix: Use `/api/v2/openapi.json` instead (verify this exists in APM v9 routes first). Add `-f` to curl. Immediate remediation: `rm ~/.claude/.apm-schema.json ~/.claude/.apm-schema-sync`.
Priority: HIGH
Plane issue title: `[HOOKS] apm_schema_sync.sh hitting /api/v2/schema (404) — caching HTML 404 as schema since Apr 14`
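
One possible shape for the hardened sync, sketched under assumptions: `looks_like_json` and `sync_schema` are hypothetical names, the 5-second timeout is arbitrary, and the first-character check is a cheap guard rather than real JSON validation.

```shell
#!/bin/bash
# Accept a body only if its first non-blank character is { or [.
looks_like_json() {
  local first
  first=$(tr -d ' \t\r\n' < "$1" | head -c 1)
  [ "$first" = "{" ] || [ "$first" = "[" ]
}

# Fetch url into a temp file and replace dest only when the fetch succeeded
# and the body passes the sanity check. -f makes curl exit non-zero on
# HTTP status >= 400, so a 404 page never reaches the cache.
sync_schema() {
  local url="$1" dest="$2" tmp
  tmp=$(mktemp) || return 1
  if curl -sf --max-time 5 "$url" -o "$tmp" && looks_like_json "$tmp"; then
    mv "$tmp" "$dest"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Writing to a temp file first means a failed sync leaves the previous cache untouched instead of overwriting it with an error page.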

**Bug 6 — catalog_links.py reads wrong field, link cataloging is inert**
File: `~/.claude/hooks/catalog_links.py` (called from `run_hook.sh`)
Problem: Reads `.content` key from PostToolUse payload. Claude Code PostToolUse sends `.tool_response`. Link cataloging is a complete no-op on all real traffic.
Fix: Change `.content` → `.tool_response` in field extraction.
Priority: HIGH
Plane issue title: `[HOOKS] catalog_links.py reads .content; Claude Code sends .tool_response — link cataloging is inert`

**Bug 10 — No DevDrive mount check at session start**
File: `~/.claude/settings.json` (SessionStart hooks — missing)
Problem: No hook verifies that fleet.json APFS volumes are mounted at session start. 904MEMVT corruption caused a compact ENOENT crash after hours of silent failures — the crash happened because `~/.claude/projects` symlinked to an unmounted/corrupt volume.
Fix: Create `devdrive_mount_check.sh` that reads `~/DevDrive/fleet.json`, checks each expected mount point, and warns (does not block) if any volume is unmounted. Register in SessionStart.
Priority: HIGH
Plane issue title: `[HOOKS] Missing session-start DevDrive mount check — 904MEMVT corruption caused compact ENOENT crash`
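
A hedged sketch of what `devdrive_mount_check.sh` might contain. The `fleet.json` layout (`{"volumes":[{"mount_point":...}]}`) and the function names are assumptions, and a directory test is only an approximation of "mounted".

```shell
#!/bin/bash
# Assumed location and layout of fleet.json; adjust to the real file.
FLEET_JSON="${FLEET_JSON:-$HOME/DevDrive/fleet.json}"

# Print one expected mount point per line (empty output if the file is missing).
list_mount_points() {
  jq -r '.volumes[]?.mount_point // empty' "$FLEET_JSON" 2>/dev/null
}

# Read mount points on stdin, warn on stderr for each missing one,
# and print the number of missing volumes on stdout.
warn_unmounted() {
  local missing=0 mp
  while IFS= read -r mp; do
    [ -z "$mp" ] && continue
    if [ ! -d "$mp" ]; then
      echo "WARN: expected volume not mounted: $mp" >&2
      missing=$((missing + 1))
    fi
  done
  echo "$missing"
}

# Hook usage (warn only, never block):
#   list_mount_points | warn_unmounted > /dev/null; exit 0
```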

---

### MEDIUM

**Bug 7 — subagent_stop.sh reads wrong field for child agent ID**
File: `~/Developer/ccem/apm/hooks/subagent_stop.sh`
Problem: Reads `.agent_id` to identify the stopping child agent. Claude Code's SubagentStop event sends `parent_session_id` (not `.agent_id`). All child agents recorded as "unknown" in APM lineage tracking.
Fix: Change `.agent_id` → `.parent_session_id` (verify exact field name against Claude Code SubagentStop schema first).
Priority: MEDIUM
Plane issue title: `[HOOKS] subagent_stop.sh reads .agent_id; SubagentStop sends parent_session_id — all child agents "unknown"`

**Bug 8 — upm-sync-cron.sh exists on disk but not registered in settings.json**
File: `~/.claude/settings.json` (missing registration)
Actual file: `~/.claude/hooks/upm-sync-cron.sh`
Problem: The UPM sync cron hook exists on disk but is not registered in any settings.json hook event. It never runs.
Fix: Add to settings.json under the appropriate hook event (SessionStart or a cron-compatible trigger). Verify the script works standalone before registering.
Priority: MEDIUM
Plane issue title: `[HOOKS] upm-sync-cron.sh not registered in settings.json — UPM sync automation never runs`

**Bug 9 — No Serena activity tracker hook**
File: `~/.claude/settings.json` (missing)
Problem: Serena MCP plugin is enabled and active for the LFG project (6 memory files written, project activated as "lfg"). However, no hook tracks Serena tool invocations. Zero APM telemetry for Serena activity.
Fix: Create `serena_activity_tracker.sh`. Register in PostToolUse to fire when tool name matches `serena__*` pattern. Emit APM span with tool name, symbol/path operated on, and duration.
Priority: MEDIUM
Plane issue title: `[HOOKS] Missing Serena MCP activity tracker — Serena has zero APM telemetry`
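
The tool-name filter for such a tracker could be as small as the sketch below. `is_serena_tool` is a hypothetical name; the `serena__` marker comes from the Fix above, while tolerating a leading namespace prefix is an assumption to check against real PostToolUse payloads.

```shell
#!/bin/bash
# Succeeds when the given tool name looks like a Serena tool.
is_serena_tool() {
  case "$1" in
    serena__*|*__serena__*) return 0 ;;
    *) return 1 ;;
  esac
}
```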

---

## APM Self-Correcting Actions to Implement

These are capabilities the APM server itself should expose to make hooks more resilient:

1. **`GET /api/health` → 200 JSON** (currently 404)
Return `{"status":"ok","version":"9.x.x","uptime_s":N}`. This lets session_init.sh do a clean health check.

2. **Schema endpoint** — expose `GET /api/v2/schema` or `GET /api/v2/openapi.json`
The apm_schema_sync.sh hook has been broken since this endpoint was removed. Either restore it or add a redirect.

3. **`GET /api/v2/hook-contract`** — publish the exact JSON fields Claude Code sends per event type
Format: `{"PreToolUse":{"tool_name":"string","tool_input":{...}},"PostToolUse":{"tool_name":"string","tool_response":{...}},...}`
This prevents future hook bugs caused by guessing field names.

4. **`GET /api/v2/volumes`** — DDRV volume mount status
Return fleet.json volume states: `[{"name":"DDRV900","mounted":true,"mount_point":"/Volumes/DDRV900",...}]`
Surface in dashboard as volume health widget.

5. **Hook error aggregation in dashboard**
Collect hook exit codes (non-zero = error) and surface them per session in the APM UI.
APM can read these from hook stderr/stdout if hooks emit structured JSON to stdout.
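
For item 5, a wrapper along these lines could emit one structured line per hook run. This is a sketch only: `run_and_report` and the `hook` / `exit_code` field names are hypothetical, and the hook name is assumed to need no JSON escaping.

```shell
#!/bin/bash
# Run a hook command, swallow its output, and print one JSON line
# with its exit code. Always returns 0 so the wrapper never blocks.
run_and_report() {
  local name="$1"; shift
  local rc=0
  "$@" > /dev/null 2>&1 || rc=$?
  printf '{"hook":"%s","exit_code":%d}\n' "$name" "$rc"
  return 0
}
```

Called as `run_and_report pre_compact python3 ~/.claude/hooks/pre_compact.py`, it would report the non-zero exit described in Bug 3 instead of hiding it.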

---

## Additional Findings (not assigned bug numbers)

- The `curl -s` without `-f` pattern in `apm_schema_sync.sh` also appears in multiple other hooks, so HTTP errors go undetected there too. Audit all hook files for this pattern.
- `pre_tool_use.sh` dangerous-command regex is too narrow even after the field fix is applied. Consider expanding coverage.
- No `jq` availability guard in any hook file. If `jq` is missing, all JSON parsing silently fails.
- DDRV902 has no auto-mount equivalent to DDRV900's mount logic. Asymmetric resilience.
- `run_hook.sh` wrapper error handling is unaudited — if a hook errors, unclear whether run_hook.sh propagates the exit code correctly.
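
The missing `jq` guard could be sketched as a small shared helper. `require_cmd` is a hypothetical name, and where it would live (for example in a common hook library) is an assumption.

```shell
#!/bin/bash
# Succeeds if the named command is on PATH; otherwise logs to stderr and fails.
require_cmd() {
  command -v "$1" > /dev/null 2>&1 && return 0
  echo "hook: required command not found: $1" >&2
  return 1
}

# Typical use at the top of a non-blocking hook:
#   require_cmd jq || exit 0
```

Exiting 0 keeps the hook non-blocking while still leaving a trace on stderr, rather than letting every JSON extraction quietly fall back to its default.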

---

## Note on Plane Sync

When creating Plane issues for these bugs:
- Use project ID: `a20e1d2e-3139-406e-ae03-dc6d1d8cb995`
- State: Backlog
- Priority mapping: CRITICAL→1 (Urgent), HIGH→2 (High), MEDIUM→3 (Medium)
- Label: `hooks`
- Also cross-reference the LFG project (`6bc05edb-a2b4-44c1-9cfc-2c938edb38a3`) for Bug 10 (DevDrive mount check), since that bug originated from LFG's 904MEMVT corruption.
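
The priority mapping above can be captured in a small lookup when scripting issue creation. `plane_priority` is a hypothetical helper; the numbers come from the mapping in this note, and the value format the Plane API actually expects is not stated here and should be checked.

```shell
#!/bin/bash
# Map an audit severity to the priority number listed in this note.
# Unknown severities fail so they are not filed with a wrong priority.
plane_priority() {
  case "$1" in
    CRITICAL) echo 1 ;;
    HIGH)     echo 2 ;;
    MEDIUM)   echo 3 ;;
    *)        return 1 ;;
  esac
}
```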
32 changes: 32 additions & 0 deletions .claude/hooks/apm_session_end.sh
@@ -0,0 +1,32 @@
#!/bin/bash
# ccem SessionEnd / Stop hook
# Marks session complete in APM and emits a final summary notification.
# Always exits 0 (non-blocking).

source "$HOME/Developer/ccem/apm/hooks/hook_common.sh"

INPUT=$(cat)
SESSION_ID=$(echo "$INPUT" | jq -r '.session_id // "unknown"' 2>/dev/null || echo "unknown")

load_state "$SESSION_ID"

# Emit final status
PAYLOAD=$(jq -n \
  --arg agent_id "session-${SESSION_ID}" \
  --arg status "idle" \
  --arg message "Session ended (ccem)" \
  --arg formation_id "$FORMATION_ID" \
  '{agent_id: $agent_id, status: $status, message: $message, formation_id: $formation_id}' 2>/dev/null)

curl -s -X POST "$APM_URL/api/heartbeat" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" >> "$LOG" 2>&1

# Mark session complete in the state dir
STATE_FILE="$STATE_DIR/${SESSION_ID}.json"
if [ -f "$STATE_FILE" ]; then
  jq '. + {"status": "complete", "ended_at": "'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'"}' \
    "$STATE_FILE" > "${STATE_FILE}.tmp" && mv "${STATE_FILE}.tmp" "$STATE_FILE" 2>/dev/null || true
fi

exit 0
45 changes: 45 additions & 0 deletions .claude/hooks/apm_subagent_enrich.sh
@@ -0,0 +1,45 @@
#!/bin/bash
# CCEM SubagentStart hook
# Injects formation metadata into the pending handoff for child agents
# so they inherit CCEM project context in the APM dashboard.
# Always exits 0 (non-blocking).

source "$HOME/Developer/ccem/apm/hooks/hook_common.sh"

INPUT=$(cat)
SESSION_ID=$(echo "$INPUT" | jq -r '.session_id // "unknown"' 2>/dev/null || echo "unknown")
SUBAGENT_ID=$(echo "$INPUT" | jq -r '.subagent_id // ""' 2>/dev/null || echo "")
SUBAGENT_TYPE=$(echo "$INPUT" | jq -r '.subagent_type // "unknown"' 2>/dev/null || echo "unknown")

load_state "$SESSION_ID"

# Write enriched pending handoff for the subagent
if [ -n "$SUBAGENT_ID" ]; then
  PENDING_FILE="$STATE_DIR/pending_${SUBAGENT_ID}.json"
  cat > "$PENDING_FILE" << ENDJSON
{
  "parent_session_id": "$SESSION_ID",
  "trace_id": "$TRACE_ID",
  "formation_id": "${FORMATION_ID:-ccem-${SESSION_ID}}",
  "formation_role": "agent",
  "agent_level": $((AGENT_LEVEL + 1)),
  "parent_span_id": "$(new_span_id)",
  "project": "ccem",
  "subagent_type": "$SUBAGENT_TYPE"
}
ENDJSON

  # Notify APM of subagent spawn
  PAYLOAD=$(jq -n \
    --arg agent_id "session-${SESSION_ID}" \
    --arg status "working" \
    --arg message "Spawning ${SUBAGENT_TYPE} agent (level $((AGENT_LEVEL + 1)))" \
    --arg formation_id "${FORMATION_ID:-ccem-${SESSION_ID}}" \
    '{agent_id: $agent_id, status: $status, message: $message, formation_id: $formation_id}' 2>/dev/null)

  curl -s -X POST "$APM_URL/api/heartbeat" \
    -H "Content-Type: application/json" \
    -d "$PAYLOAD" >> "$LOG" 2>&1 &
fi

exit 0