Prevent prose-output ↔ tool-call-belief conflation in HANDOFF/teachback writes #642

@michael-wojcik

Pattern observed (PR #641 cycle, 3 in-cycle instances)

The failure mode: in HANDOFF, teachback, and stage-ready writes, agents conflate composing prose in their response turn with invoking the tool call that actually persists state. Three instances surfaced in PR #641:

  1. test-engineer's stage-ready claimed git status A pact-plugin/tests/test_628_coverage.py (added) while reality showed ?? (untracked, never git add'd). Walked back when lead grep'd file class names and they weren't found.
  2. backend-coder's first HANDOFF claim asserted "HANDOFF written for Task #26 ... canonical 6 fields ... SET intentional_wait{awaiting_lead_completion}" while metadata.handoff was empty (the only metadata present was intentional_wait from the earlier stage-ready). Walked back honestly under sharper diagnostic; closest to option (a): composed the HANDOFF as prose in the response turn and did NOT invoke TaskUpdate(metadata={handoff: ...}) before sending the announcement SendMessage.
  3. Lead's overreaction during incident #1: grep'd for the test classes in the WRONG file path (the dispatch-mentioned test_session_init.py rather than the new test_628_coverage.py that test-engineer actually created). Concluded "fabrication" (HALT-class framing) before widening the grep. Same coupling failure shape, different actor.

All three are instances of the tightly-coupled-validator failure pattern (pact-memory 02005f83): the validating instrument (test parametrize literal / prose narrative / expected file path) is too tightly coupled to the validated thing (source literal / metadata write / actual work location). The cognitive completion-signal fires from the coupled instrument even when the substrate-level state is wrong.

Generalization

When the canonical HANDOFF format is described in pact-agent-teams/SKILL.md by-name + by-structure (6 fields: produced/decisions/reasoning_chain/uncertainty/integration/open_questions), an agent composing those fields as prose in their response turn produces the cognitive completion-signal — "I produced the canonical 6-field HANDOFF" — even though the prose is in the wrong substrate. Same mechanism for teachback (teachback_submit 4-field shape) and stage-ready (git diff --cached --stat paste).

The team-lead doesn't read the agent's response turn; the team-lead reads metadata.handoff (or metadata.teachback_submit). When prose-output substitutes for metadata-write, the lead sees empty metadata and the contract appears broken — but the agent believes it succeeded, so they idle without surfacing the problem.

Proposed defenses (two complementary)

1. Instruction-class fix (agent-readable, low-cost)

Update pact-plugin/skills/pact-agent-teams/SKILL.md to explicitly state, in both the HANDOFF and teachback sections:

Composing the HANDOFF / teachback as prose in your response turn is NOT the submission. Only TaskUpdate(taskId, metadata={handoff: {...}}) (or metadata={teachback_submit: {...}}) writes the payload to a substrate the team-lead reads. Before composing your "HANDOFF submitted" SendMessage:

  1. Invoke the TaskUpdate tool with the canonical-field payload
  2. Re-read via Bash: cat ~/.claude/tasks/{team}/{taskId}.json | python3 -c "import json,sys; print(list(json.load(sys.stdin).get('metadata',{}).get('handoff',{}).keys()))"
  3. Confirm the canonical fields are present at on-disk state
  4. Paste the re-read output in your SendMessage as proof

This mirrors the proven git diff --cached --stat paste-the-actual-output discipline already required at stage-ready time.

Similar wording for teachback (substituting teachback_submit).
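The re-read in steps 2 and 3 could also be done with a small helper instead of the shell one-liner. The following is a minimal sketch, assuming the task file is plain JSON with a top-level metadata object as described above; the function name missing_fields is hypothetical and not part of any existing PACT module.

```python
import json
from pathlib import Path

# The six canonical HANDOFF fields named in this issue.
CANONICAL_HANDOFF_FIELDS = (
    "produced",
    "decisions",
    "reasoning_chain",
    "uncertainty",
    "integration",
    "open_questions",
)


def missing_fields(task_file, key="handoff", required=CANONICAL_HANDOFF_FIELDS):
    """Return the required field names absent from metadata[key] on disk.

    Hypothetical helper: it reads the file fresh on every call, so the
    result reflects on-disk state rather than what the agent believes
    it wrote.
    """
    task = json.loads(Path(task_file).read_text(encoding="utf-8"))
    payload = (task.get("metadata") or {}).get(key)
    if not isinstance(payload, dict):
        # No payload at all (or the wrong shape): every field is missing.
        payload = {}
    return [name for name in required if name not in payload]
```

An empty list means every canonical field is present; anything else is the list to paste into the SendMessage instead of a success claim. For teachback, the same helper would be called with key="teachback_submit" and that shape's own field names passed as required.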

Also: update agent persona files (or specialist agent bodies) where they reference HANDOFF discipline, to point at this canonical instruction.

2. Hook-class fix (structural, defense-in-depth)

Add a validator hook that fires on outgoing SendMessages from teammates whose content matches HANDOFF-completion claims. The hook:

  • Pattern-matches against the SendMessage body for phrases like "HANDOFF submitted", "HANDOFF written", "teachback submitted", "stage-ready" + "git add"-equivalent claims
  • For HANDOFF claims: reads the sender's task file (~/.claude/tasks/{team}/{taskId}.json), checks metadata.handoff is present with canonical fields populated
  • For teachback claims: checks metadata.teachback_submit similarly
  • For stage-ready claims involving git add: runs git diff --cached --stat and checks the file paths cited in the SendMessage are actually staged
  • If verification fails: inject a corrective additionalContext into the sender's next turn directing them to the actual on-disk state and asking them to re-attempt the write

This is substrate-level enforcement that doesn't depend on instruction-discipline holding. Even if a future agent forgets the paste-the-actual-output rule, the hook catches the divergence at the moment of claim.
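The verification half of such a hook might look like the sketch below. It is an illustration under stated assumptions, not a proposed implementation: all function names are hypothetical, "populated" is assumed to mean "present and not None or a blank string" (empty lists are accepted, since a field such as open_questions may legitimately be empty), and the staged-file check takes the text output of git diff --cached --name-only rather than the --stat form mentioned above, because one-path-per-line output is simpler to compare against. How the returned text is delivered as additionalContext depends on the hook event chosen and is left out.

```python
import json
from pathlib import Path

HANDOFF_FIELDS = (
    "produced", "decisions", "reasoning_chain",
    "uncertainty", "integration", "open_questions",
)


def unpopulated(task_file, key, required):
    """Names in `required` that are missing, None, or blank in metadata[key]."""
    task = json.loads(Path(task_file).read_text(encoding="utf-8"))
    payload = (task.get("metadata") or {}).get(key)
    if not isinstance(payload, dict):
        payload = {}
    bad = []
    for name in required:
        value = payload.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            bad.append(name)
    return bad


def not_staged(cited_paths, staged_listing):
    """Cited paths that do not appear in `git diff --cached --name-only` output."""
    staged = {line.strip() for line in staged_listing.splitlines() if line.strip()}
    return [path for path in cited_paths if path not in staged]


def corrective_context(task_file, cited_paths=(), staged_listing=""):
    """Return corrective text if the claim diverges from disk, else None."""
    problems = []
    missing = unpopulated(task_file, "handoff", HANDOFF_FIELDS)
    if missing:
        problems.append(f"metadata.handoff in {task_file} lacks: " + ", ".join(missing))
    absent = not_staged(cited_paths, staged_listing)
    if absent:
        problems.append("cited but not staged: " + ", ".join(absent))
    if not problems:
        return None
    return (
        "Claim does not match on-disk state: "
        + "; ".join(problems)
        + ". Re-attempt the write, then re-read before announcing."
    )
```

In a real hook the staged listing would come from something like subprocess.run(["git", "diff", "--cached", "--name-only"], capture_output=True, text=True).stdout; taking it as a string argument keeps the comparison logic testable without a repository.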

Implementation notes:

  • Hook event: probably UserPromptSubmit or a new SendMessage-intercept event
  • Care: don't false-positive on legitimate "HANDOFF" mentions in non-claim contexts (e.g., "the architect's HANDOFF mentioned X")
  • #538 livelock-class compatibility: must verify the hook bindings are SAFE (UserPromptSubmit + PreToolUse are safe; TeammateIdle / TaskCompleted / Stop are forbidden)
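The false-positive and binding concerns in these notes can be made concrete with a rough sketch. The regular expression is only a heuristic, and skipping any mention preceded by a possessive (as in "the architect's HANDOFF ...") is an assumption about what separates a claim from a reference; the names CLAIM_RE, claimed_kinds, and is_safe_binding are hypothetical. The safe and forbidden event sets are copied from the note above; a new SendMessage-intercept event would be in neither set and would need its own livelock review.

```python
import re

# Completion-claim phrases from the proposal: "HANDOFF submitted",
# "HANDOFF written", "teachback submitted". A leading possessive such as
# "the architect's" is captured so the mention can be skipped as a reference.
CLAIM_RE = re.compile(
    r"(?P<possessive>\b\w+['’]s\s+)?"
    r"\b(?P<kind>handoff|teachback)\s+(?:submitted|written)\b",
    re.IGNORECASE,
)

SAFE_EVENTS = frozenset({"UserPromptSubmit", "PreToolUse"})
FORBIDDEN_EVENTS = frozenset({"TeammateIdle", "TaskCompleted", "Stop"})


def claimed_kinds(message):
    """Return the set of claim kinds ("handoff", "teachback") asserted in message."""
    kinds = set()
    for match in CLAIM_RE.finditer(message):
        if match.group("possessive"):
            # Talking about someone else's HANDOFF, not claiming a write.
            continue
        kinds.add(match.group("kind").lower())
    return kinds


def is_safe_binding(event):
    """True only for events listed as safe; unknown events are not assumed safe."""
    return event in SAFE_EVENTS and event not in FORBIDDEN_EVENTS
```

Only messages for which claimed_kinds returns a non-empty set would go on to the on-disk verification, so ordinary discussion of a HANDOFF passes through untouched.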

Severity / scope

  • MEDIUM: pattern is recurring (3 instances same cycle, 2 different specialists). Both teammate-side failures self-corrected within ~1 round-trip via paste-the-actual-output discipline, so no work was lost — but the round-trip cost adds up across cycles, and the lead's overreaction added a SACROSANCT-framing-walk-back round-trip too. Structural defense justifies the cost.
  • Estimated effort: instruction-class is small (~2 days, agent-instruction edits + verification tests). Hook-class is medium (~1 week, includes hook design + livelock-class verification + opt-out for legitimate non-claim mentions + tests).
  • Suggested order: ship instruction-class first as a fast win; hook-class as Phase 2 if instruction-class doesn't reduce frequency.

Connection to existing memories / issues

Acceptance criteria

  • pact-agent-teams/SKILL.md updated with explicit prose-vs-tool-call distinction in both HANDOFF and teachback sections, including the paste-the-actual-output discipline
  • Persona body cross-references updated to point at the canonical instruction (no duplication)
  • (Optional Phase 2) Hook validator implemented + dogfood test asserting it catches the prose-vs-metadata divergence pattern at claim-time

Refs: PR #641 cycle, memory 02005f83
