Skip to content

feat(no_graph): auto-replay trace with full per-step state on failure#352

Merged
jp-fizzbee merged 1 commit into
mainfrom
user/jp/no-graph-auto-replay
May 18, 2026
Merged

feat(no_graph): auto-replay trace with full per-step state on failure#352
jp-fizzbee merged 1 commit into
mainfrom
user/jp/no-graph-auto-replay

Conversation

@jp-fizzbee
Copy link
Copy Markdown
Collaborator

When --experimental_no_graph reports a failure, the lightweight action- name trace it prints is enough to know WHAT failed but not the variable values at each step (no_graph drops Process state after expansion). Make the model checker re-exec itself with --trace-file pointing at the trace so the standard default-mode error report is printed below — converting the spartan no_graph trace into the same rich state-by- state output the user would have seen without --experimental_no_graph.

For the replay to faithfully reproduce the failure, the trace must contain EVERY non-thread-X link name that guided-mode's ShouldScheduleNode would match against — not just action transitions. Two pathTail-population sites that previously inherited the parent's path are switched to extend it with the link name:

  • processNode's intermediate-fork loop (non-yield branches): captures "Stmt:N" / "Any:var=value" / "For:value" / "checkpoint" — the choices made inside oneof / parallel-for / etc.

  • scheduleChannelMessages: captures "channel-X-message-Y".

The remaining inheritPath sites (YieldNode/YieldFork thread continuations producing "thread-X") stay as-is because trace.go auto-schedules thread-X links without consulting the trace.

Mechanism:

  • Trace recorded by no_graph is now an exact replay-able sequence of Inbound link names default mode would produce.
  • replayTraceWithFullState writes the trace to outDir/trace.txt (so the user has it on disk) and execs the current binary with --trace-file + the same .json. The replay run goes through the standard error-report code path that prints per-step state.
  • Called from both no_graph failure sites: early-deadlock detection and invariant-failure reporting.

Edge cases:

  • Replay run uses default mode (graph kept), bounded by trace length — cheap regardless of original state-space size.
  • Missing jsonFilename or empty path → skip silently.
  • Exec failure → print fallback "replay manually" hint.
  • For deadlock failures, the replay shows the state at the deadlock point but doesn't itself say "DEADLOCK detected" (guided trace doesn't run post-traversal deadlock detection). The no_graph output already labels it as DEADLOCK, so the user sees both.

Threading the jsonFilename through modelCheckSingleSpec required adding it as a parameter; the 4 call sites are updated with the appropriate json file (main / composition / refinement / impl).

Verified:

  • 94/94 reference suite passes (default-mode paths unchanged).
  • Simple invariant failure with role action: replay reproduces exact per-step state.
  • oneof block ("Stmt:N"): replay captures the failing choice branch and reproduces the failure state.
  • oneof expression ("Any:var=value"): same.
  • Deadlock: replay shows the deadlocked state.

When --experimental_no_graph reports a failure, the lightweight action-
name trace it prints is enough to know WHAT failed but not the variable
values at each step (no_graph drops Process state after expansion).
Make the model checker re-exec itself with --trace-file pointing at
the trace so the standard default-mode error report is printed below
— converting the spartan no_graph trace into the same rich state-by-
state output the user would have seen without --experimental_no_graph.

For the replay to faithfully reproduce the failure, the trace must
contain EVERY non-thread-X link name that guided-mode's
ShouldScheduleNode would match against — not just action transitions.
Two pathTail-population sites that previously inherited the parent's
path are switched to extend it with the link name:

  - processNode's intermediate-fork loop (non-yield branches):
    captures "Stmt:N" / "Any:var=value" / "For:value" / "checkpoint"
    — the choices made inside oneof / parallel-for / etc.

  - scheduleChannelMessages: captures "channel-X-message-Y".

The remaining inheritPath sites (YieldNode/YieldFork thread
continuations producing "thread-X") stay as-is because trace.go
auto-schedules thread-X links without consulting the trace.

Mechanism:
  - Trace recorded by no_graph is now an exact replay-able sequence
    of Inbound link names default mode would produce.
  - replayTraceWithFullState writes the trace to outDir/trace.txt
    (so the user has it on disk) and execs the current binary with
    --trace-file + the same .json. The replay run goes through the
    standard error-report code path that prints per-step state.
  - Called from both no_graph failure sites: early-deadlock detection
    and invariant-failure reporting.

Edge cases:
  - Replay run uses default mode (graph kept), bounded by trace
    length — cheap regardless of original state-space size.
  - Missing jsonFilename or empty path → skip silently.
  - Exec failure → print fallback "replay manually" hint.
  - For deadlock failures, the replay shows the state at the deadlock
    point but doesn't itself say "DEADLOCK detected" (guided trace
    doesn't run post-traversal deadlock detection). The no_graph
    output already labels it as DEADLOCK, so the user sees both.

Threading the jsonFilename through modelCheckSingleSpec required
adding it as a parameter; the 4 call sites are updated with the
appropriate json file (main / composition / refinement / impl).

Verified:
  - 94/94 reference suite passes (default-mode paths unchanged).
  - Simple invariant failure with role action: replay reproduces
    exact per-step state.
  - oneof block ("Stmt:N"): replay captures the failing choice
    branch and reproduces the failure state.
  - oneof expression ("Any:var=value"): same.
  - Deadlock: replay shows the deadlocked state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jp-fizzbee jp-fizzbee merged commit c845c5a into main May 18, 2026
1 check passed
@jp-fizzbee jp-fizzbee deleted the user/jp/no-graph-auto-replay branch May 18, 2026 02:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants