
docs: singularity-crush SPEC.md + design docs #1

Open
mikkihugo wants to merge 11 commits into main from docs/spec

Conversation

@mikkihugo


Summary

  • Adds SPEC.md — authoritative 1721-line specification for singularity-crush with RFC 2119 normative language and a 55-item conformance checklist across 5 tiers
  • Adds harness.md — engineering practices and design notes (research working doc)
  • Adds migrate.md — migration notes, model routing, knowledge layer design, dispatch scheduling

What SPEC.md covers (26 sections)

Phase state machine, orchestration loop, worker attempt lifecycle, context budget, supervision + circuit breaker, hook pipeline, workspace management (symlink-aware path containment), worktree isolation, verification gates, configuration + dynamic reload, model routing, knowledge layer (Hindsight + memory tiers + anti-pattern library), persistent agents, inter-agent messaging, observability, failure taxonomy, trust boundary, distributed SSH execution, plugin extension points, secret management (Vault), CLI commands, conformance checklist.

🤖 Generated with Claude Code

SPEC.md is the authoritative language-agnostic specification (1721 lines,
RFC 2119 normative language, 55-item conformance checklist) covering all
26 design sections: phase state machine, orchestration loop, worker attempt
lifecycle, context budget, supervision, hook pipeline, workspace management,
verification gates, model routing, knowledge layer (Hindsight), persistent
agents, inter-agent messaging, observability, failure taxonomy, trust boundary,
distributed SSH execution, plugin extension points, and secret management.

harness.md and migrate.md are the research/working notes from which SPEC.md
was synthesised. docs/hooks/FUTURE.md adds the SF hook event table.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@qodo-code-review


Review Summary by Qodo

Add comprehensive singularity-crush specification, harness design, and migration documentation

📝 Documentation


Walkthroughs

Description
• Adds comprehensive SPEC.md (1721 lines) — authoritative specification for singularity-crush with
  RFC 2119 normative language covering 26 sections including phase state machine, orchestration loop,
  worker lifecycle, knowledge layer, persistent agents, and 55-item conformance checklist across 5
  tiers
• Adds harness.md (899 lines) — engineering practices document defining clean architectural
  boundaries and 20 core harness concerns with RFC 2119 normative rules for budget management, phase
  transitions, hook pipelines, supervision, and distributed execution
• Adds migrate.md — migration guide from SF TypeScript to singularity-crush Go, detailing
  architecture decisions, prompt template contracts, HTTP observability API, git-aware revert
  protocol, dispatch scheduling, Hindsight memory integration, and plugin extension points
• Adds docs/hooks/FUTURE.md — SF harness hook events specification documenting 7 new event types
  (PreDispatch, PostUnit, PhaseChange, AutoLoop, AgentWake, AgentIdle, AgentMessage)
  with payload structures and aggregation behavior
• Adds mise.toml — tool configuration file with environment variables
Diagram
flowchart LR
  SF["SF TypeScript<br/>Codebase"]
  SPEC["SPEC.md<br/>1721-line Spec"]
  HARNESS["harness.md<br/>Engineering Practices"]
  MIGRATE["migrate.md<br/>Migration Guide"]
  HOOKS["docs/hooks/FUTURE.md<br/>Hook Events"]
  SC["singularity-crush<br/>Go Implementation"]
  
  SF -- "migration path" --> MIGRATE
  MIGRATE -- "references" --> SPEC
  MIGRATE -- "references" --> HARNESS
  SPEC -- "defines" --> SC
  HARNESS -- "defines" --> SC
  HOOKS -- "extends" --> SC


File Changes

1. SPEC.md 📝 Documentation +1721/-0

Comprehensive singularity-crush specification with phase machine and orchestration

• Adds comprehensive 1721-line specification for singularity-crush with RFC 2119 normative language
 covering 26 sections
• Defines phase state machine (10 phases), orchestration loop, worker attempt lifecycle, and context
 budget management
• Specifies data model with SQLite schema, supervision system with 9 built-in checks, and hook
 pipeline architecture
• Details knowledge layer (Hindsight integration), persistent agents with memory blocks, inter-agent
 messaging, and model routing with three tiers
• Includes 55-item conformance checklist across 5 tiers (core, knowledge layer, model routing,
 persistent agents, extensions)

SPEC.md


2. migrate.md 📝 Documentation +779/-0

Migration guide from SF TypeScript to singularity-crush Go with architecture decisions

• Explains singularity-crush as Crush on autopilot, mapping existing codebases and driving
 autonomous execution through research → plan → execute → verify → complete phases
• Details what Crush already provides (agent loop, LLM multi-provider, MCP, SQLite, TUI) and what SF
 adds (planning system, phase dispatch, git/worktree management, session state)
• Specifies prompt template contract with strict variable checking (unit, attempt, phase,
 session_id) and continuation turn guidance-only prompts
• Covers HTTP observability API, git-aware revert protocol, dispatch scheduling with priority
 ordering and blocker-aware dispatch, and model routing with three tiers and benchmarking
• Describes Hindsight memory integration (two-bank pattern, anti-pattern library, pattern
 maturation, confidence decay) replacing SF's flat KNOWLEDGE.md
• Lists plugin extension points (SupervisorCheck, Shipper, VCS, Store, Notifier) and effort estimate
 (~11-14 weeks for working SF-equivalent with persistent agents)

migrate.md


3. docs/hooks/FUTURE.md 📝 Documentation +62/-0

SF harness hook events for unit lifecycle and persistent agents

• Adds SF harness hook events section documenting 7 new event types: PreDispatch, PostUnit,
 PhaseChange, AutoLoop, AgentWake, AgentIdle, AgentMessage
• Specifies PostUnit payload structure with unit metadata, verdict, duration, tokens, cost, model,
 and learnings
• Details aggregation behavior: PreDispatch and AgentWake follow PreToolUse semantics with
 allow/deny/halt; PostUnit and AutoLoop are notification-only
• Explains AgentWake/AgentIdle hooks for persistent agent lifecycle and AgentMessage hooks for
 enforcing routing policy between agents

docs/hooks/FUTURE.md


4. mise.toml ⚙️ Configuration changes +2/-0

Add mise tool configuration

• Adds mise configuration file with environment variable for OpenAI Codex latest version

mise.toml


5. harness.md 📝 Documentation +899/-0

Harness engineering practices and architectural boundaries specification

• Adds comprehensive 899-line engineering practices document for singularity-crush harness layer
• Defines clean architectural boundaries between agent loop, orchestration logic, and planning
 layers
• Specifies 20 core harness concerns: context budget, phase transitions, unit lifecycle hooks,
 session contract, observability, supervision, tool sandboxing, configuration, knowledge layer
 integration, post-unit hooks, worktree isolation, model routing, verification gates, environment
 variables, persistent agents, inter-agent messaging, failure taxonomy, worker attempt loop,
 distributed SSH execution, and trust boundaries
• Establishes RFC 2119 normative rules for budget compaction, phase state machines, hook pipelines,
 session recovery, structured logging with span-based tracing, supervisor checks (stuck loop,
 timeout, abandon detection, circuit breaker), tool response contracts, and symlink-aware path
 containment

harness.md



@qodo-code-review

qodo-code-review Bot commented Apr 29, 2026


Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0) 📎 Requirement gaps (0)



Action required

1. Phase flow inconsistent 🐞 Bug ≡ Correctness
Description
Multiple docs define different canonical phase sequences (5-phase vs 8-phase), making the state
machine ambiguous and likely to be implemented incorrectly.
Code

SPEC.md[46]

+singularity-crush is an autopilot layer built on top of [charmbracelet/crush](https://github.com/charmbracelet/crush). Crush is an interactive coding agent — a human drives it turn by turn. singularity-crush adds a harness that drives Crush autonomously through a structured phase sequence (research → plan → execute → verify → complete) without human intervention per unit, while the human watches or steers.
Evidence
SPEC’s Overview describes a 5-phase loop, while the Phase State Machine section defines an 8-phase
“Standard flow”; harness.md and migrate.md also repeat differing shorthand sequences. This creates
an ambiguous source of truth for the orchestrator’s state machine.

SPEC.md[44-47]
SPEC.md[271-274]
harness.md[54-57]
migrate.md[5-8]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The docs define conflicting “canonical” phase sequences (5-phase vs 8-phase). Because these documents are meant to guide implementation, the phase/state-machine contract needs a single, consistent definition.
### Issue Context
- SPEC.md Overview uses: research → plan → execute → verify → complete.
- SPEC.md Phase State Machine defines: Research → Plan → Execute → TDD → Verify → Review → Merge → Complete.
- harness.md and migrate.md also include shorthand phase sequences that don’t match the standard flow.
### Fix Focus Areas
- SPEC.md[44-47]
- SPEC.md[271-274]
- harness.md[54-57]
- migrate.md[5-8]
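
One way to give the orchestrator a single source of truth is to declare the standard flow once and derive every shorthand from it. The sketch below encodes only the 8-phase sequence quoted in this issue; the `Next` helper and the lowercase names are assumptions for illustration, and any phases outside the standard flow are not modelled.

```go
package main

import "fmt"

// Phase is one step of the standard flow quoted in the review comment.
type Phase int

const (
	PhaseResearch Phase = iota
	PhasePlan
	PhaseExecute
	PhaseTDD
	PhaseVerify
	PhaseReview
	PhaseMerge
	PhaseComplete
)

// phaseNames is indexed by Phase; keeping it next to the constants means
// the docs and the code can be checked against one list.
var phaseNames = [...]string{
	"research", "plan", "execute", "tdd", "verify", "review", "merge", "complete",
}

func (p Phase) String() string {
	if p < PhaseResearch || p > PhaseComplete {
		return "unknown"
	}
	return phaseNames[p]
}

// Next returns the phase that follows p in the standard flow.
// ok is false when p is terminal or out of range.
func Next(p Phase) (next Phase, ok bool) {
	if p < PhaseResearch || p >= PhaseComplete {
		return p, false
	}
	return p + 1, true
}

func main() {
	// Walk the whole flow from the first phase to the terminal one.
	for p, ok := PhaseResearch, true; ok; p, ok = Next(p) {
		fmt.Println(p)
	}
}
```

With a table like this, the 5-phase wording in the overview can be described as an abbreviation of the standard flow rather than a second definition.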



2. Prompt vars inconsistent 🐞 Bug ≡ Correctness
Description
SPEC.md defines canonical execute-task template variables, but harness.md and migrate.md document
different variable sets, increasing the risk of template/code drift and strict-render startup
panics.
Code

migrate.md[R67-77]

+## Prompt template contract
+
+Every dispatch renders the unit's prompt template with a strict variable checker — unknown variables fail rendering immediately (not silently). Template input variables:
+
+| Variable | Type | Value |
+|---|---|---|
+| `unit` | object | Full unit record: id, type, phase, title, description, labels, blockers |
+| `attempt` | integer or null | `null` on first dispatch; `1+` on retry or continuation |
+| `phase` | string | Current phase name (`execute`, `tdd`, etc.) |
+| `session_id` | string | Stable session UUID |
+
Evidence
SPEC.md lists canonical variables (including unit_id, unit_type, issue, last_error, etc.),
while migrate.md documents a unit object and fewer fields, and harness.md omits some SPEC-listed
variables. With strict rendering (“unknown variable MUST cause loadPrompt to panic”), divergent
documentation is likely to produce broken templates/tests.

SPEC.md[487-505]
harness.md[131-145]
migrate.md[67-77]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Prompt template variable documentation diverges across the new docs. Given strict template rendering (unknown vars panic at startup), this inconsistency is likely to cause broken templates/tests during implementation.
### Issue Context
- SPEC.md defines the canonical execute-task variables.
- harness.md and migrate.md present different variable sets/structures.
### Fix Focus Areas
- SPEC.md[487-505]
- harness.md[131-145]
- migrate.md[67-77]
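
To show why divergent variable tables matter under strict rendering, here is a minimal sketch using Go's `text/template` with the `missingkey=error` option. The `renderStrict` helper is hypothetical and returns an error rather than panicking; the real `loadPrompt` behaviour is whatever SPEC.md defines.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderStrict executes src against vars and fails on any variable the map
// does not define, instead of silently printing "<no value>".
func renderStrict(src string, vars map[string]any) (string, error) {
	tmpl, err := template.New("prompt").Option("missingkey=error").Parse(src)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, vars); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	// Variable names taken from the migrate.md table quoted above.
	vars := map[string]any{"phase": "execute", "session_id": "s-1", "attempt": 1}

	out, err := renderStrict("phase={{.phase}} attempt={{.attempt}}", vars)
	fmt.Println(out, err)

	// unit_id is listed in SPEC.md but not in this map, so rendering fails.
	_, err = renderStrict("unit={{.unit_id}}", vars)
	fmt.Println(err != nil)
}
```

A template written against one document's variable set fails as soon as it is rendered with the other's, which is the drift this issue describes.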




Remediation recommended

3. PostUnit semantics ambiguous 🐞 Bug ⚙ Maintainability
Description
docs/hooks/FUTURE.md simultaneously says PostUnit hooks can abort a session on non-zero exit, and
also says PostUnit is “notification-only” and cannot block, leaving implementers without a coherent
contract.
Code

docs/hooks/FUTURE.md[R415-438]

+`PostUnit` hooks that exit non-zero signal `SignalAbort` — the harness stops
+the session. Hooks that time out (default 30s) are killed and logged but do
+not block the next dispatch. This is the primary hook for: git commit/push,
+hermes-memory feedback, test gate execution, custom notifications.
+
+### AgentWake / AgentIdle hooks
+
+These fire per persistent agent, not per session. A hook on `AgentWake` can
+gate which agents are allowed to start (e.g. enforce a fleet size limit). A
+hook on `AgentIdle` is the natural place for post-turn git operations scoped
+to that agent's workspace.
+
+`AgentMessage` hooks fire before the message is delivered to the inbox. A
+`deny` decision drops the message and returns an error to the calling agent's
+`send_message` tool. Use this to enforce routing policy (e.g. an agent cannot
+message outside its designated group).
+
+### Aggregation behaviour for new events
+
+`PreDispatch` and `AgentWake` follow `PreToolUse` semantics: any `deny` or
+`halt` blocks the dispatch/wake. `PostUnit`, `AgentIdle`, and `AutoLoop` are
+notification-only — hooks cannot block these events, only observe them.
+`PhaseChange` and `AgentMessage` support `deny` to block the transition or
+message delivery respectively.
Evidence
One section states PostUnit non-zero exits stop the session (abort behavior), while the aggregation
section classifies PostUnit as non-blocking/notification-only. These statements need reconciliation
or explicit definitions (e.g., whether abort is considered ‘blocking’).

docs/hooks/FUTURE.md[415-418]
docs/hooks/FUTURE.md[432-438]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The document gives two incompatible interpretations of PostUnit hooks:
- non-zero exit aborts the session
- PostUnit is notification-only and cannot block
### Issue Context
Implementers need an explicit contract for whether PostUnit hooks are allowed to influence control flow (abort/pause) or are strictly observational.
### Fix Focus Areas
- docs/hooks/FUTURE.md[415-438]
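
One way to make the contract explicit is to tabulate which decisions each event honours. The sketch below encodes only the aggregation paragraph quoted above; the `honoured` table and `Blocked` function are hypothetical names. Whether a non-zero `PostUnit` exit should still abort the session is the open question and is deliberately left undecided here.

```go
package main

import "fmt"

// Decision is what a single hook returns for an event.
type Decision string

const (
	Allow Decision = "allow"
	Deny  Decision = "deny"
	Halt  Decision = "halt"
)

// honoured lists, per event, the decisions that are able to block it.
// Events mapped to nil are notification-only.
var honoured = map[string][]Decision{
	"PreDispatch":  {Deny, Halt},
	"AgentWake":    {Deny, Halt},
	"PhaseChange":  {Deny},
	"AgentMessage": {Deny},
	"PostUnit":     nil,
	"AgentIdle":    nil,
	"AutoLoop":     nil,
}

// Blocked reports whether any hook decision blocks the given event.
func Blocked(event string, decisions []Decision) bool {
	for _, d := range decisions {
		for _, h := range honoured[event] {
			if d == h {
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(Blocked("PreDispatch", []Decision{Allow, Halt})) // one halt is enough
	fmt.Println(Blocked("PostUnit", []Decision{Deny}))           // observed, never blocked
}
```

If abort-on-failure is kept for `PostUnit`, it would need its own column in such a table so that "cannot block the event" and "can stop the session" stop reading as contradictory.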




@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces the specification, migration notes, and harness engineering practices for singularity-crush, an autonomous autopilot layer for the Crush coding agent. The changes define the data model, phase state machine, orchestration loop, and knowledge layer integration. My feedback identifies several inconsistencies in the SQLite schema definitions regarding timestamp types and vector storage, as well as a documentation discrepancy regarding log truncation limits and a redundant interface definition.

Comment thread SPEC.md Outdated
CREATE TABLE memories (
id TEXT PRIMARY KEY,
content TEXT NOT NULL,
embedding F32_BLOB(2560), -- Qwen3-Embedding-4B; NULL until indexed


medium

The F32_BLOB(2560) type and the USING libsql_vector_idx index syntax (line 200) are specific to libsql. Since the project overview (line 54) mentions ncruces/go-sqlite3 and migrate.md (line 443) mentions sqlite-vec, it would be more accurate to use a standard BLOB type or the specific syntax required by sqlite-vec to ensure compatibility with the chosen database engine.

Suggested change
-embedding F32_BLOB(2560), -- Qwen3-Embedding-4B; NULL until indexed
+embedding BLOB, -- Vector embedding; NULL until indexed
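
If the column becomes a plain `BLOB`, the application has to serialise vectors itself. A minimal sketch follows, assuming the vector engine accepts raw little-endian float32 bytes; that layout is an assumption and should be checked against the chosen extension's documentation. `encodeVector` and `decodeVector` are hypothetical helpers.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeVector packs v as consecutive little-endian float32 values.
// The byte order is an assumption, not something the PR specifies.
func encodeVector(v []float32) []byte {
	buf := make([]byte, 4*len(v))
	for i, f := range v {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(f))
	}
	return buf
}

// decodeVector is the inverse of encodeVector.
func decodeVector(b []byte) ([]float32, error) {
	if len(b)%4 != 0 {
		return nil, fmt.Errorf("blob length %d is not a multiple of 4", len(b))
	}
	v := make([]float32, len(b)/4)
	for i := range v {
		v[i] = math.Float32frombits(binary.LittleEndian.Uint32(b[4*i:]))
	}
	return v, nil
}

func main() {
	// A 2560-dimensional embedding occupies 2560 * 4 bytes.
	blob := encodeVector(make([]float32, 2560))
	fmt.Println(len(blob))
}
```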

Comment thread SPEC.md Outdated
Comment on lines +192 to +196
last_accessed TEXT,
valid_until TEXT,
superseded_by TEXT,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL


medium

The memories table uses TEXT for several timestamp fields (last_accessed, valid_until, created_at, updated_at), whereas all other tables in this specification (e.g., sessions, units, agents) use INTEGER for Unix timestamps. For consistency and to simplify date-based queries (such as the decay logic in section 16.6), these should be changed to INTEGER.

Suggested change
-last_accessed TEXT,
-valid_until TEXT,
-superseded_by TEXT,
-created_at TEXT NOT NULL,
-updated_at TEXT NOT NULL
+last_accessed INTEGER,
+valid_until INTEGER,
+superseded_by TEXT,
+created_at INTEGER NOT NULL,
+updated_at INTEGER NOT NULL

Comment thread SPEC.md
Comment on lines +1450 to +1453
type SupervisorCheck interface {
Name() string
Check(ctx context.Context, state SupervisorState) SupervisorSignal
}


medium

This definition of the SupervisorCheck interface is redundant as it was already defined identically in section 9.1 (lines 581-584).
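
For readers who have not seen the earlier section, a small sketch of what an implementation of this interface could look like. The interface is copied from the quoted lines; `SupervisorState`, `SupervisorSignal`, their fields, and the `stuckLoopCheck` type are hypothetical stand-ins, since the real definitions live in SPEC.md.

```go
package main

import (
	"context"
	"fmt"
)

// SupervisorSignal and SupervisorState are hypothetical stand-ins.
type SupervisorSignal int

const (
	SignalNone SupervisorSignal = iota
	SignalAbort
)

type SupervisorState struct {
	RecentToolCalls []string // oldest first, most recent last
}

// SupervisorCheck is the interface quoted in the review comment.
type SupervisorCheck interface {
	Name() string
	Check(ctx context.Context, state SupervisorState) SupervisorSignal
}

// stuckLoopCheck signals an abort when the last Window tool calls are identical.
type stuckLoopCheck struct{ Window int }

func (c stuckLoopCheck) Name() string { return "stuck_loop" }

func (c stuckLoopCheck) Check(_ context.Context, s SupervisorState) SupervisorSignal {
	n := len(s.RecentToolCalls)
	if c.Window < 2 || n < c.Window {
		return SignalNone
	}
	last := s.RecentToolCalls[n-1]
	for _, call := range s.RecentToolCalls[n-c.Window:] {
		if call != last {
			return SignalNone
		}
	}
	return SignalAbort
}

func main() {
	var check SupervisorCheck = stuckLoopCheck{Window: 3}
	state := SupervisorState{RecentToolCalls: []string{"ls", "grep foo", "grep foo", "grep foo"}}
	fmt.Println(check.Name(), check.Check(context.Background(), state) == SignalAbort)
}
```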

Comment thread harness.md
| Phase transitions | `from=`, `to=`, `reason=` |
| Gate execution | `gate=`, `attempt=`, `passed=` |

Include action outcome in the message: `completed`, `failed`, `retrying`, `canceled`. Never log large raw payloads — truncate at 512 bytes and note `[truncated]`. If a log sink fails, continue running and emit a warning through any remaining sink.


medium

The truncation limit for raw payloads is specified as 512 bytes here, but line 270 in this file and several places in SPEC.md (e.g., lines 1240, 1669) specify 2 KB. These should be synchronized to avoid confusion during implementation.

Suggested change
-Include action outcome in the message: `completed`, `failed`, `retrying`, `canceled`. Never log large raw payloads — truncate at 512 bytes and note `[truncated]`. If a log sink fails, continue running and emit a warning through any remaining sink.
+Include action outcome in the message: `completed`, `failed`, `retrying`, `canceled`. MUST NOT log large raw payloads — truncate hook output at 2 KB and append `(truncated)`. If a log sink fails, continue running and emit a warning through any remaining sink.
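
Whichever limit is chosen, defining it once in code keeps the documents honest. A minimal sketch of the 2 KB rule from the suggested wording; `truncatePayload` and the constant names are hypothetical, and backing off to a UTF-8 boundary is an extra assumption so that the log line stays valid text.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

const (
	maxLogPayload  = 2 * 1024 // the 2 KB limit from the suggested change
	truncationMark = " (truncated)"
)

// truncatePayload returns p unchanged when it fits, otherwise the first
// maxLogPayload bytes (moved back to a rune boundary) plus the marker.
func truncatePayload(p []byte) string {
	if len(p) <= maxLogPayload {
		return string(p)
	}
	cut := maxLogPayload
	// p[cut] is the first dropped byte; if it is a UTF-8 continuation byte
	// the cut would split a rune, so move back to the rune's first byte.
	for cut > 0 && !utf8.RuneStart(p[cut]) {
		cut--
	}
	return string(p[:cut]) + truncationMark
}

func main() {
	big := []byte(strings.Repeat("x", 5000))
	out := truncatePayload(big)
	fmt.Println(len(out), strings.HasSuffix(out, truncationMark))
}
```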

Comment thread migrate.md Outdated
### Storage

**sqlite-vec** with `F32_BLOB(2560)` columns in the same SQLite DB Crush already uses (`ncruces/go-sqlite3`). No extra processes, no separate index files. FTS5 virtual table alongside for BM25 hybrid search — FTS5 is the fallback when the embedding endpoint is unreachable.


medium

sqlite-vec does not use the F32_BLOB syntax; that is a libsql specific type. For consistency with the chosen extension, this should refer to BLOB or the specific float[N] syntax if applicable.

Suggested change
sqlite-vec with BLOB columns in the same SQLite DB Crush already uses (ncruces/go-sqlite3).

Comment thread migrate.md Outdated
Comment on lines +508 to +525
CREATE TABLE memories (
id TEXT PRIMARY KEY,
content TEXT NOT NULL,
embedding F32_BLOB(2560), -- Qwen3-Embedding-4B
decay_factor REAL DEFAULT 1.0,
confidence REAL DEFAULT 0.7, -- affects half-life
maturity TEXT DEFAULT 'candidate',
is_negative INTEGER DEFAULT 0,
helpful_hits INTEGER DEFAULT 0,
harmful_hits INTEGER DEFAULT 0,
access_count INTEGER DEFAULT 0,
collection TEXT DEFAULT 'default',
tags TEXT, -- JSON array
last_accessed TEXT,
valid_until TEXT,
superseded_by TEXT,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL


medium

Similar to the memories table in SPEC.md, the timestamp fields here should use INTEGER for consistency with the rest of the schema, and F32_BLOB should be replaced with a type compatible with sqlite-vec (e.g., BLOB).

Suggested change
-CREATE TABLE memories (
-id TEXT PRIMARY KEY,
-content TEXT NOT NULL,
-embedding F32_BLOB(2560), -- Qwen3-Embedding-4B
-decay_factor REAL DEFAULT 1.0,
-confidence REAL DEFAULT 0.7, -- affects half-life
-maturity TEXT DEFAULT 'candidate',
-is_negative INTEGER DEFAULT 0,
-helpful_hits INTEGER DEFAULT 0,
-harmful_hits INTEGER DEFAULT 0,
-access_count INTEGER DEFAULT 0,
-collection TEXT DEFAULT 'default',
-tags TEXT, -- JSON array
-last_accessed TEXT,
-valid_until TEXT,
-superseded_by TEXT,
-created_at TEXT NOT NULL,
-updated_at TEXT NOT NULL
+CREATE TABLE memories (
+id TEXT PRIMARY KEY,
+content TEXT NOT NULL,
+embedding BLOB, -- Vector embedding
+decay_factor REAL DEFAULT 1.0,
+confidence REAL DEFAULT 0.7, -- affects half-life
+maturity TEXT DEFAULT 'candidate',
+is_negative INTEGER DEFAULT 0,
+helpful_hits INTEGER DEFAULT 0,
+harmful_hits INTEGER DEFAULT 0,
+access_count INTEGER DEFAULT 0,
+collection TEXT DEFAULT 'default',
+tags TEXT, -- JSON array
+last_accessed INTEGER,
+valid_until INTEGER,
+superseded_by TEXT,
+created_at INTEGER NOT NULL,
+updated_at INTEGER NOT NULL
+);

Comment thread SPEC.md Outdated
Comment thread migrate.md
mikkihugo and others added 10 commits April 29, 2026 07:09
…vs-units

Critical-item resolution from v0.1 review.

Removed:
- Local sqlite-vec memories table — Hindsight is the sole knowledge backend.
  Drops F32_BLOB / libsql_vector_idx / vector_top_k references that wouldn't
  have compiled against ncruces/go-sqlite3.
- specs.check as a user-project runtime gate. Moved to project CI on the
  singularity-crush repo itself.

Added:
- §3.3 Task Tracker Integration — Tracker interface, lifecycle states
  (active|blocked|done|cancelled|unknown), built-in adapters (linear, github,
  jira, sqlite), (tracker_kind, tracker_id) unique key, failure handling.
- §4.7 Crash Recovery — concrete model: in-memory state lost; running units
  marked interrupted on startup; fresh re-dispatch from last persisted phase
  boundary; tool calls NOT replayed.
- §17.1 Agent-vs-unit comparison table — defines what's shared (worker
  attempt lifecycle, supervisor checks) and what's different (no phase machine,
  no gates, persistent budget) for persistent agent runs.
- circuit_breakers and schema_migrations tables to §3.1.
- parent_id, claim_holder, claim_until, tracker_kind, tracker_id, interrupted
  status to units schema.
- Claim and Tracker definitions to §2.

Fixed:
- Attempt is 1-indexed (default 1, not 0); retry table aligned with formula.
- Hook timeout default unified to 60s (was 30s in §10.3, 60s in config).
- Doc-sync is now a sub-step of PhaseMerge (was undefined placement).
- max_turns_per_attempt, turn_timeout, stall_timeout, hot_cache_turns,
  max_attempts added to canonical config schema.

Conformance: added C-41..C-46 covering tracker, crash recovery, doc-sync
placement, and SQLite-orchestration-only constraint.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…weep, more

Iteration 1 of self-paced review loop. Score moved 0.79 → ~0.88.

Adversarial fixes:
- A1: Tracker BlockedBy translation now defined (placeholder rows for
  upstream not-yet-fetched).
- A2: Claim expiry sweep is now an explicit orchestrator responsibility on
  every poll tick.
- A4: Memory-block "last-writer-wins" rule was wrong — agents don't share
  blocks; replaced with the actual concurrency story (serial tool calls
  within a turn, commit between turns).
- A6: SignalAbort now has a 5s/3s/SIGKILL escalation for in-flight tool
  calls. Documented tool_abort_grace and tool_abort_kill config.
- agent_inbox/messages now have a 30d retention sweep with archive to
  .sf/archive/agents/{id}/inbox-{YYYY-MM}.jsonl.

Architectural fixes:
- B2: Run is now a first-class abstraction. New `runs` table unifies
  unit_attempt and agent_run with ULID id, run_kind, outcome, error_code,
  token/cost columns. Trace and billing key on run_id.
- B7: local_anti_patterns SQLite mirror — anti-patterns survive Hindsight
  outage. The one knowledge category small/critical enough to dual-store.
- B8: Unit ID format now defined in §2 (milestone/m{n}, slice/m{n}/s{n},
  task/m{n}/s{n}/t{n}). Self-describing in logs.
- B9: Per-phase unit_timeout via [harness.unit_timeout_by_phase]; default
  10m was too tight for reasoning-tier phases.
- B10: Tier names enumerated as fixed (fast | standard | reasoning); custom
  tier names not supported.
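
As an illustration of the B8 unit ID format, a minimal parser sketch. The three shapes come from the commit message; the `UnitID` struct, the `ParseUnitID` name, and the choice to accept any non-negative integer for `{n}` are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// unitIDPattern accepts milestone/m{n}, slice/m{n}/s{n} and task/m{n}/s{n}/t{n};
// the depth-per-kind rule is enforced separately in ParseUnitID.
var unitIDPattern = regexp.MustCompile(`^(milestone|slice|task)/m(\d+)(?:/s(\d+))?(?:/t(\d+))?$`)

// UnitID is a hypothetical parsed form; unused levels are left at zero.
type UnitID struct {
	Kind                   string
	Milestone, Slice, Task int
}

// ParseUnitID validates id against the three shapes and extracts the numbers.
func ParseUnitID(id string) (UnitID, error) {
	m := unitIDPattern.FindStringSubmatch(id)
	if m == nil {
		return UnitID{}, fmt.Errorf("invalid unit id %q", id)
	}
	hasSlice, hasTask := m[3] != "", m[4] != ""
	wantSlice, wantTask := m[1] != "milestone", m[1] == "task"
	if hasSlice != wantSlice || hasTask != wantTask {
		return UnitID{}, fmt.Errorf("unit id %q has the wrong depth for kind %s", id, m[1])
	}
	u := UnitID{Kind: m[1]}
	// Atoi errors are ignored: empty groups yield 0, and the regex already
	// guarantees digits (overflow handling is left out of this sketch).
	u.Milestone, _ = strconv.Atoi(m[2])
	u.Slice, _ = strconv.Atoi(m[3])
	u.Task, _ = strconv.Atoi(m[4])
	return u, nil
}

func main() {
	fmt.Println(ParseUnitID("task/m2/s1/t7"))
	fmt.Println(ParseUnitID("slice/m2"))
}
```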

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…mination

Score moved 0.88 → ~0.93.

Adversarial fixes:
- C1: units.attempt vs runs.attempt authority resolved — units.attempt is
  current counter, runs is historical; both updated in same transaction.
- C2: runs CHECK constraint enforces XOR between unit_attempt and agent_run.
- C3: outcome enum split — unit_timeout | turn_timeout | stalled distinct.
- C5: atomic claim via conditional UPDATE (rows_affected = 1 gates dispatch);
  safe under multi-orchestrator even if run.lock is missing.
- C6: task_blockers gets FK references with ON DELETE CASCADE.
- C7: PhaseUAT trigger defined — workflow require_uat = true; /sf uat-approve
  resumes; release.toml example added.
- A11: last_error injected only on TurnFirst of attempt >= 2 — clarifies the
  continuation/retry interaction.
- A12: tracker `unknown` mid-run does NOT cancel; only blocks new dispatch.
  `blocked` mid-run also non-cancelling. Protects against flaky tracker APIs.
- C11: Hindsight retain failures queue in pending_retain table with
  exponential backoff; flush to lost-learnings.jsonl after 7d. No silent
  knowledge loss to outage.
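
The C5 claim can be pictured as a single conditional `UPDATE` whose affected-row count decides who dispatches. The sketch below is an assumption about the shape, not the SPEC.md statement: `claim_holder` and `claim_until` are the column names mentioned in this PR, while the SQL predicate, `TryClaim`, the `execer` interface, and the in-memory `fakeDB` (included so the example runs without a driver) are hypothetical.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// execer is the subset of *sql.DB and *sql.Tx that the claim needs.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// claimSQL only updates the row when it is unclaimed or the claim has expired.
const claimSQL = `
UPDATE units
   SET claim_holder = ?, claim_until = ?
 WHERE id = ?
   AND (claim_holder IS NULL OR claim_until < ?)`

// TryClaim reports true only when this caller's UPDATE changed exactly one row.
func TryClaim(ctx context.Context, db execer, unitID, holder string, nowUnix, ttlSeconds int64) (bool, error) {
	res, err := db.ExecContext(ctx, claimSQL, holder, nowUnix+ttlSeconds, unitID, nowUnix)
	if err != nil {
		return false, err
	}
	n, err := res.RowsAffected()
	if err != nil {
		return false, err
	}
	return n == 1, nil
}

// fakeDB imitates the conditional UPDATE for a single unit row.
type fakeDB struct {
	holder string
	until  int64
}

type affected int64

func (a affected) LastInsertId() (int64, error) { return 0, nil }
func (a affected) RowsAffected() (int64, error) { return int64(a), nil }

func (f *fakeDB) ExecContext(_ context.Context, _ string, args ...any) (sql.Result, error) {
	holder, until, now := args[0].(string), args[1].(int64), args[3].(int64)
	if f.holder != "" && f.until >= now {
		return affected(0), nil // live claim held by someone else
	}
	f.holder, f.until = holder, until
	return affected(1), nil
}

func main() {
	db := &fakeDB{}
	ctx := context.Background()
	a, _ := TryClaim(ctx, db, "task/m1/s1/t1", "orch-a", 1000, 60)
	b, _ := TryClaim(ctx, db, "task/m1/s1/t1", "orch-b", 1000, 60)
	fmt.Println(a, b)
}
```

Because the database serialises the two updates, at most one caller sees one affected row, which is why the commit describes this as safe even without `run.lock`.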

Architectural fixes:
- D1: runs aggregate columns documented as end-of-run rollup; spans remain
  authoritative.
- D8: /sf rate target now precise — last completed run in current session.
- D9: workflow selection at dispatch defined: tracker label → default →
  fallback. Pinned at first dispatch, immutable for retries.
- D10/A8: agent run termination conditions enumerated. Hot cache NOT
  preserved across agent runs; durable memory blocks and messages ARE.

Schema:
- units.workflow column added (pinned per-unit).
- runs CHECK constraint on run_kind XOR.
- task_blockers FK with cascade.
- pending_retain table for Hindsight outage queue.

Conformance: C-47 through C-55 added.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Score moved 0.93 → ~0.96.

Adversarial fixes:
- E1: ULID consistency required for all runtime PKs.
- E2: workspace authority — runs.workspace per-attempt, units.workspace = latest.
- E4: parent_id depth via CHECK on type + harness validation.
- E5: lock is per-project at <project>/.sf/run.lock; multiple projects can run auto.
- E7: per-hook-type timeouts table; before_run 120s, post_unit 60s, doc_sync 5m.
- E9/F6: soft-delete instead of cascade — units.archived_at, agents.archived_at;
  runs FK uses ON DELETE SET NULL; runs.unit_id_snap and agent_name_snap
  preserve forensics across entity deletion.
- E3: agent compaction preserves wake message + recent 3 inbox arrivals +
  durable memory blocks; thread continuity guaranteed.
- C8: PreToolUse hooks outrank auto_approve list (deny wins).
- C10: SSH disconnect handling — error_code "ssh_disconnected", remote
  zombie cleanup via marker pgrep, host quarantine on cleanup failure,
  orphaned workspace preserved.
- A3: PhaseChange documented as non-vetoable; veto semantics on PreDispatch.

Architectural fixes:
- F2: Binary integration model — sf is single fork binary; Crush internal/
  re-used directly; /sf <subcommand> namespace extends slash-command router.
- F3+F4: Project vs Session boundary defined; per-project DB at
  <project>/.sf/sf.db; sessions are project-scoped, ULID, 30d inactivity TTL.
- F7: PhaseReassess behavior — three outcomes (Re-plan / Abandon / Escalate);
  reasoning tier with Think; max_reassess decrements only on Re-plan.
- B6: Slice merge ordering — dependency-aware via code_depends_on; serial
  merge gate per project; falls back to created_at order.
- C13: Canonical .sf/ directory layout documented in new §14.5 — config,
  workflows, hooks, gates, sf.db, locks, active/, archive/, log/, runtime/,
  trace/.
- D6: Three-pass review — establish-context → parallel chunked review →
  synthesis. Cross-file context no longer blind.
- D7: Doc-sync runs at end of last code-mutating phase, not just Merge.
  Spike workflows that adopt new dependencies now get doc updates.
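
The B6 merge ordering reads like a topological sort with `created_at` as the tie-breaker. A minimal sketch under that assumption follows; the `Slice` struct, the in-memory `DependsOn` form of `code_depends_on`, and the treatment of dependencies outside the batch as already merged are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Slice is a hypothetical in-memory view of a slice awaiting merge.
type Slice struct {
	ID        string
	CreatedAt int64    // Unix seconds
	DependsOn []string // stand-in for code_depends_on
}

// MergeOrder returns IDs so that each slice follows its dependencies; among
// slices that are ready at the same time, the oldest is merged first.
func MergeOrder(slices []Slice) ([]string, error) {
	byID := make(map[string]Slice, len(slices))
	for _, s := range slices {
		byID[s.ID] = s
	}
	pending := make(map[string]int, len(slices)) // unmet dependency count
	dependents := make(map[string][]string)
	for _, s := range slices {
		pending[s.ID] += 0
		for _, dep := range s.DependsOn {
			if _, known := byID[dep]; !known {
				continue // outside this batch: assumed already merged
			}
			pending[s.ID]++
			dependents[dep] = append(dependents[dep], s.ID)
		}
	}
	var ready []string
	for id, n := range pending {
		if n == 0 {
			ready = append(ready, id)
		}
	}
	var order []string
	for len(ready) > 0 {
		// Oldest first; ID breaks ties so the result is deterministic.
		sort.Slice(ready, func(i, j int) bool {
			a, b := byID[ready[i]], byID[ready[j]]
			if a.CreatedAt != b.CreatedAt {
				return a.CreatedAt < b.CreatedAt
			}
			return a.ID < b.ID
		})
		next := ready[0]
		ready = ready[1:]
		order = append(order, next)
		for _, d := range dependents[next] {
			pending[d]--
			if pending[d] == 0 {
				ready = append(ready, d)
			}
		}
	}
	if len(order) != len(slices) {
		return nil, fmt.Errorf("dependency cycle among %d slices", len(slices)-len(order))
	}
	return order, nil
}

func main() {
	fmt.Println(MergeOrder([]Slice{
		{ID: "s1", CreatedAt: 10},
		{ID: "s2", CreatedAt: 5, DependsOn: []string{"s1"}},
		{ID: "s3", CreatedAt: 7},
	}))
}
```

In the example, `s2` is the oldest slice but still merges last because it depends on `s1`.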

Schema:
- units.archived_at, agents.archived_at for soft-delete.
- agents.capabilities (JSON), max_turns_per_run.
- runs.unit_id_snap, agent_name_snap for forensic preservation.
- CHECK constraints on enums (units.type, units.phase_status, agents.state).

Conformance: C-56 through C-68.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… handoff

Score moved 0.96 → ~0.98.

Adversarial fixes:
- G2: session_blockers resolution rules table (auto vs user vs command).
- G3: cost stored as INTEGER micro-USD (cost_micro_usd, cost_per_1k_micro_usd);
  float drift eliminated.
- G5: HTTP API ?session=<id> filter; multi-session DB returns sessions array.
- G6: runs.outcome includes 'interrupted' with CHECK enum.
- G7: SSH auth model — agent / key / key+agent; ssh_known_hosts MUST verify.
- G8: UAT timeout = 0 (infinite); advance via /sf uat-approve.
- G10: max_agents_by_phase.review default = 4; merge = 1 (serial).
- G11: last_error capped at 4 KB head+tail; full payload at last-error-full.txt.
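
The G3 change to integer micro-USD can be illustrated with plain integer arithmetic. In the sketch below, the price unit follows the `cost_per_1k_micro_usd` column name; the helper names, the round-half-up rule, and the assumption of non-negative inputs are not from the spec.

```go
package main

import "fmt"

// CostMicroUSD returns the cost of tokens at a price expressed in micro-USD
// per 1,000 tokens, rounded half up. Inputs are assumed non-negative.
func CostMicroUSD(tokens, pricePer1kMicroUSD int64) int64 {
	return (tokens*pricePer1kMicroUSD + 500) / 1000
}

// FormatUSD renders a non-negative micro-USD amount with six decimals.
func FormatUSD(micro int64) string {
	return fmt.Sprintf("$%d.%06d", micro/1_000_000, micro%1_000_000)
}

func main() {
	// 1,500 tokens at 3,000 micro-USD per 1k tokens.
	cost := CostMicroUSD(1500, 3000)
	fmt.Println(cost, FormatUSD(cost))

	// Summing integers is exact: ten charges of 100,000 micro-USD make one dollar.
	var total int64
	for i := 0; i < 10; i++ {
		total += 100_000
	}
	fmt.Println(FormatUSD(total))
}
```

The same ten additions of `0.1` as a `float64` do not sum to exactly `1.0`, which is the drift the commit refers to.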

Architectural fixes:
- H2: projectHash derivation defined — git remote SHA-256 → path fallback;
  cached in .sf/runtime/project-hash.json so same project hits same Hindsight
  bank from any clone.
- H3: workflow content pinning via workflow_pins table + units.workflow_hash.
  In-flight units locked to pinned content; on-disk template changes affect
  only new units.
- H5: HTTP API auth via Bearer token at .sf/runtime/api.token (mode 0600).
- H6: /sf doctor exit code spec; --json structured output.
- B5: Per-turn semantic outcome via <turn_status> marker (complete | blocked |
  giving_up); checkpoint between turns without phase boundary.
- B4: trace_index SQL table for /sf forensics; spans-on-disk + SQL pointer
  layer for fast lookup.
- B1: handoff supports capability:tag1,tag2 form with round-robin matching;
  ErrNoCapableAgent if none.
- A10: dynamic reload of session-immutable fields warns + keeps in-process
  value + surfaces in /sf status as drift; does NOT crash.
- C12: providers section in canonical config; vault:// required, plaintext
  rejected at startup.
- D5: trace JSONL _meta first-line record with trace_schema_version.
- F1: conformance items tagged [REQUIRED] / [STRONG] / [OPTIONAL].
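
A minimal sketch of the H2 derivation order (git remote first, path as fallback). The commit only says "git remote SHA-256 → path fallback", so the exact input normalisation here is an assumption, `ProjectHash` is a hypothetical name, and the caching step in `.sf/runtime/project-hash.json` is omitted.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
	"strings"
)

// ProjectHash hashes the git remote URL when one exists and falls back to
// the cleaned project path otherwise. Trimming whitespace and cleaning the
// path are assumptions of this sketch.
func ProjectHash(remoteURL, projectPath string) string {
	source := strings.TrimSpace(remoteURL)
	if source == "" {
		source = filepath.Clean(projectPath)
	}
	sum := sha256.Sum256([]byte(source))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Two clones of the same remote in different directories share a hash.
	a := ProjectHash("git@example.com:org/repo.git", "/home/a/repo")
	b := ProjectHash("git@example.com:org/repo.git", "/tmp/clone")
	fmt.Println(a == b)

	// Without a remote, the path decides.
	fmt.Println(ProjectHash("", "/home/a/repo") == ProjectHash("", "/tmp/clone"))
}
```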

Schema:
- runs.cost_micro_usd INTEGER (was REAL).
- benchmark_results.cost_per_1k_micro_usd INTEGER (was REAL).
- session_blockers.resolved_by column.
- units.workflow_hash column.
- workflow_pins table.
- trace_index table.

Errors: ErrNoCapableAgent, ErrSshDisconnected, ErrCanceledBySupervisor added.

Conformance: C-69 through C-83.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t, CLI completion

Score moved 0.97 → ~0.98+.

Final-tier fixes:
- Gate script protocol: env vars table, stdin shape, exit codes 0/1/2/3,
  output expectations, timeout escalation.
- Gate retry counter is now distinct from units.attempt; resets on phase
  transition.
- plan.md format: frontmatter (unit_id, created_at, written_by, plan_version)
  + required Goal/Approach/Deliverables/Verification/Notes sections;
  validated at PhasePlan exit.
- Hindsight client interface formalised: Recall/Retain/Feedback/Validate/
  Health methods + RecallOpts and Memory types. The wrapper is the seam
  for testing.
- SF tool registration: SF tools register through Crush's internal/agent/tools
  rather than a parallel registry; PreToolUse and auto_approve apply uniformly.
- Missing CLI commands added to §25: reassess-resolve, force-clear,
  merge-resolve, uat-approve, uat-reject, agent {list, run, reset, delete,
  inspect, history}, history (with filter syntax), clean.
- trace_index archive rotation: transactional with file_path UPDATE;
  an interrupted move is repairable.
- agent_capabilities indexed table: capability lookup is now O(log n)
  instead of full-scan over agents.capabilities JSON.
- Rate limiting stays observability-only, now with a stated
  justification: per-provider semantics differ, and the router +
  circuit breaker handle reactive retry.
- Version policy: SemVer; v1.0 freezes §§3, 4, 6, 10, 14, 26.
- internal/ usage clarified: fork model satisfies Go's internal/ rule.
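
To make the plan.md rule concrete, a sketch of the check that could run
at PhasePlan exit. The field and section names are the ones listed
above; the `---` frontmatter fences, `key: value` lines, and `## `
heading level are assumptions about the file syntax, and `validatePlan`
is a hypothetical helper.

```typescript
// Sketch only: "---" fences, "key: value" lines, and "## " headings are
// assumed syntax. The field and section names come from the commit.
const REQUIRED_FIELDS = ["unit_id", "created_at", "written_by", "plan_version"];
const REQUIRED_SECTIONS = ["Goal", "Approach", "Deliverables", "Verification", "Notes"];

// Returns a list of problems; an empty list means the plan passes.
function validatePlan(text: string): string[] {
  const problems: string[] = [];
  const lines = text.split(/\r?\n/);
  const end = lines[0] === "---" ? lines.indexOf("---", 1) : -1;

  if (end === -1) {
    problems.push("missing or unterminated frontmatter block");
  } else {
    const fields: string[] = [];
    for (const line of lines.slice(1, end)) {
      const colon = line.indexOf(":");
      // Count a field only when it has a non-empty value.
      if (colon > 0 && line.slice(colon + 1).trim().length > 0) {
        fields.push(line.slice(0, colon).trim());
      }
    }
    for (const field of REQUIRED_FIELDS) {
      if (fields.indexOf(field) === -1) {
        problems.push("missing frontmatter field: " + field);
      }
    }
  }

  // Everything after the frontmatter (or the whole file) is the body.
  const headings: string[] = [];
  for (const line of lines.slice(end + 1)) {
    const match = /^##\s+(.+?)\s*$/.exec(line);
    if (match !== null) headings.push(match[1] ?? "");
  }
  for (const section of REQUIRED_SECTIONS) {
    if (headings.indexOf(section) === -1) {
      problems.push("missing section: " + section);
    }
  }
  return problems;
}
```

The sketch checks presence only. It does not validate field values or
section order, which the spec may or may not constrain.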

Conformance: C-84 through C-93.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… layer

These are the working notes that SPEC.md was synthesised from. Both now have
a top-of-file callout pointing readers to SPEC.md as authoritative.

Specifically scrubbed:
- harness.md §9: hermes-memory + hermes_memory_* tools + local
  embedding/reranker pipeline replaced with a redirect to SPEC.md § 16
  (Hindsight is the sole knowledge backend).
- harness.md §10: PostUnit hook line updated from "hermes-memory feedback"
  to "Hindsight feedback via the client wrapper."
- migrate.md "Memory and knowledge" section (~170 lines): the entire
  sqlite-vec + FTS5 + RRF + Qwen3-Reranker-0.6B/4B + memories schema +
  retrieval pipeline replaced with a superseded note pointing to SPEC.md.
- migrate.md sf init step: "FTS5 + Qwen3-Embedding-4B vectors" → "Hindsight
  project bank".

What survives in both docs: phase state machine, hooks, supervisor,
worktree, distributed execution, plugin extension points, Vault, skills,
/sf revert, dispatch scheduling. Those are broadly aligned with SPEC.md;
SPEC.md has the canonical wording.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After singularity-ng/singularity-memory#1 was merged, the engine formerly known as
Hindsight was assimilated into our codebase under singularity_memory_server/
(MIT-attributed). From sf's perspective there is no upstream Hindsight
service — Singularity Memory IS the engine.

Changes:
- §2: definition of Singularity Memory (sm); embedded vs remote modes;
  cross-tool sharing.
- §16.1: rewritten architecture; the "Hindsight is the sole knowledge backend"
  framing replaced with sm-as-our-engine; embedded-vs-remote table.
- §16.1.1: client interface renamed `Hindsight` → `Memory`; type renamed
  `Memory` (the struct) → `Entry` to avoid collision; client library is
  `singularity-memory-client-go` generated from openapi.yaml.
- §3.1: comments updated; local_anti_patterns mirror still applies, just
  against sm not Hindsight.
- §16.3: two-bank pattern preserved verbatim; references updated.
- §16.7: retrieval delegated to sm rather than Hindsight.
- §14.2: new [memory] config block — mode, url, api_key.
- §25 /sf doctor: checks Singularity Memory connectivity.
- §19.4 chapters / §16.8 sf init: sm references.
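
A sketch of validating the new [memory] block from §14.2. The three
keys (mode, url, api_key) and the embedded/remote split come from this
commit. The rule that remote mode needs a url, and the reuse of the
providers-section vault:// requirement for api_key, are assumptions.
`validateMemoryConfig` is a hypothetical helper.

```typescript
// Sketch only: the validation rules below are assumptions, not SPEC.md text.
interface MemoryConfig {
  mode: string;
  url?: string;
  api_key?: string;
}

// Returns a list of configuration errors; empty means the block is usable.
function validateMemoryConfig(cfg: MemoryConfig): string[] {
  const errors: string[] = [];
  if (cfg.mode !== "embedded" && cfg.mode !== "remote") {
    errors.push('mode must be "embedded" or "remote"');
  }
  // Assumed: a remote engine cannot be reached without a url.
  if (cfg.mode === "remote" && (cfg.url === undefined || cfg.url.length === 0)) {
    errors.push("remote mode requires url");
  }
  // Assumed: api_key follows the same vault:// rule as provider secrets.
  if (cfg.api_key !== undefined && cfg.api_key.indexOf("vault://") !== 0) {
    errors.push("api_key must be a vault:// reference");
  }
  return errors;
}
```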

What did NOT change:
- Schema (no SQLite changes).
- Two-bank pattern semantics.
- pending_retain queue.
- local_anti_patterns mirror behavior on outage.
- Anti-pattern decay rules.
- Conformance gates (with name updates only).

Conformance: C-94 through C-96 added.

The remaining `hindsight` strings in SPEC.md are MIT attribution links to
vectorize-io/hindsight, which stay.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Now that singularity-memory lives at singularity-ng/, the spec points
explicitly at:
- repo: github.com/singularity-ng/singularity-memory
- Go client: github.com/singularity-ng/singularity-memory-client-go (auto-generated)
- OpenAPI source: the running sm server's /openapi.json (not a checked-in YAML)

The last point matters because the SM repo doesn't ship a static
openapi.yaml; FastAPI generates the schema at runtime. Anyone
regenerating the Go client must point the generator at a running
instance, not a file.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Major rewrite. The spec was previously aimed at a Go fork of Crush; that
direction was reconsidered after recognising sf already has gen-2 harness
control via the vendored pi-mono SDK packages. Forking Crush would mostly
duplicate what pi-mono already provides (agent loop, multi-provider via
pi-ai, hooks, TUI primitives), pay a fork-merge tax, and ignore that the
daemon model already absorbs Node.js cold-start cost.

Implementation target now: the next major version of singularity-forge
(sf, formerly Get-Shit-Done / GSD), built directly on the existing
packages/pi-* vendored modules. TypeScript, not Go.

External tracker integration dropped entirely. sf's SQLite DB is the
sole source of work units. The Symphony-style poll-Linear-and-pull
model doesn't fit sf's "user states a goal, sf decomposes" pattern.
External visibility (GH Issues, Slack, etc.) is achieved via PostUnit
hook scripts (recipe in §10.5.1) rather than core integration.

Major changes:
- §1: implementation target changed from Crush fork to sf v3 on pi-mono.
  References to packages/daemon, packages/mcp-server, packages/native,
  packages/rpc-client added — all already exist in sf.
- §2: Tracker definition replaced with Plan definition.
- §3: tracker_kind/tracker_id columns removed from units; metadata JSON
  column added for arbitrary user-side links.
- §3.3: Tracker Integration replaced with "No external tracker" rationale.
- §4.6 PhaseReassess Abandon: tracker write removed; visibility flows
  through PostUnit hooks.
- §4.8 crash recovery: tracker reconciliation step removed.
- §6 worker loop: between-turn fetch is local DB read, not tracker call.
- §9 supervisor: ReconciliationCancel check removed.
- §10.5.1 added: PostUnit hook recipe for GH Issues publishing.
- §20 failure taxonomy: tracker class removed; ErrCanceledByOperator
  replaces ErrCanceledByReconciliation.
- §21 hardening: tracker tool reference replaced with plan_unit.
- §21.3: PreToolUse precedence text de-Crushed.
- §23: plugin loading model changed (TS dynamic import, not Caddy).
- §25: /sf plan and /sf abandon added.
- §26: C-41 / C-42 / C-50 / E-04 / E-05 reframed; E-04 is now
  plan_unit (agent self-refines plan), not tracker_query.
- All "Crush", "internal/", "Bubbletea", "fantasy", "catwalk" references
  rewritten to pi-mono / pi-coding-agent / pi-ai / pi-tui equivalents.
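
To illustrate the "visibility via PostUnit hooks" direction, a sketch
of the pure part of such a hook: turning a finished unit into an issue
payload. This is not the §10.5.1 recipe. The event shape, label scheme,
and `buildIssuePayload` name are hypothetical, and how the event
reaches the hook script is left out. The title/body/labels fields are
the ones the GitHub REST API accepts when creating an issue.

```typescript
// Sketch only: the event fields and label scheme are hypothetical.
interface PostUnitEvent {
  unitId: string;
  goal: string;
  outcome: "succeeded" | "failed" | "abandoned";
  // Mirrors the units.metadata JSON column: arbitrary user-side links.
  metadata: { [key: string]: string };
}

interface IssuePayload {
  title: string;
  body: string;
  labels: string[];
}

// Builds the request body for publishing one unit; performs no I/O.
function buildIssuePayload(event: PostUnitEvent): IssuePayload {
  const links = Object.keys(event.metadata)
    .sort()
    .map((key) => "- " + key + ": " + event.metadata[key]);
  const body = ["Unit: " + event.unitId, "Outcome: " + event.outcome]
    .concat(links.length > 0 ? ["", "Links:"].concat(links) : [])
    .join("\n");
  return {
    title: "[sf] " + event.goal,
    body: body,
    labels: ["sf", "sf:" + event.outcome],
  };
}
```

Keeping payload construction free of network calls means the hook can
be tested without a tracker, which matches the goal of keeping trackers
out of core.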

What survives unchanged: phase state machine, persistent agents, inter-agent
messaging, Singularity Memory integration (§16), workspace path safety
(§11), worktree isolation (§12), verification gates (§13), trust boundary
(§21), distributed SSH workers (§22), Vault secret management (§24),
observability (§19), failure taxonomy structure (§20). The conformance
checklist's bones survive — just retargeted from Go-on-Crush to TS-on-sf.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>