26 changes: 20 additions & 6 deletions README.md
@@ -496,12 +496,26 @@ opens a reverse-incremental search through history — each keystroke
narrows the match, Ctrl-R again walks to the next older one, Esc restores
the original line, Enter accepts.

Press `<` to rewind one instruction. Each explicit step (`s`, `S`, `n`,
`f`) records a full CPU + RAM snapshot beforehand, kept in a 256-entry
FIFO ring; the status bar shows `rwd:N` while non-empty. Free-run via
`r` does NOT snapshot — the 64 KiB-per-step cost would dominate at multi-MHz
throughput — so reverse-step covers single-stepping sessions, not whole
program executions.
Press `<` to rewind one instruction. Every step — explicit (`s`, `S`, `n`,
`f`) or free-run (`r`) — records a page-level copy-on-write delta beforehand,
kept in a 256-entry FIFO ring; the status bar shows `rwd:N` while non-empty.

For jumps deeper than that ring, **deep rewind** keeps periodic full-RAM
*keyframes* (one every 4096 steps) and reconstructs any earlier step by
restoring the nearest keyframe and replaying forward to the exact target:

| Command | Effect |
|--------------------|-------------------------------------------------------------|
| `:rewind N` | Step back N executed steps (keyframe replay for deep jumps) |
| `:rewind-budget MB`| Cap keyframe memory; sets the deep-rewind reach |

Reach (steps) = `budget / 64 KiB × 4096`. At the default 128 MiB cap that's
~8.4M steps; `:rewind-budget 256` reaches ~16.7M. The budget is a ceiling —
a short run holds only the keyframes it produced — and the status bar shows
`deep:<reach>@<budget>` once keyframes exist. A deep rewind replays at most
4096 instructions (sub-millisecond on the cycle-accurate core). Replay assumes
deterministic execution between keyframes; keyboard input already buffered
when a keyframe is taken is captured in the snapshot, so it replays correctly.
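
The reach arithmetic can be checked with a small standalone program. This is only a sketch: `reachSteps` and the two constants are illustrative names, not part of the emulator's API, and the 64 KiB keyframe size and 4096-step interval are taken from the text above.

```go
package main

import "fmt"

const (
	keyframeBytes    = 64 << 10 // one full-RAM keyframe (64 KiB), per the README
	keyframeInterval = 4096     // steps between keyframes, per the README
)

// reachSteps is a hypothetical helper: it returns how many steps back a
// keyframe budget of budgetMiB can reach, using reach = slots × interval.
func reachSteps(budgetMiB int) int {
	slots := (budgetMiB << 20) / keyframeBytes
	return slots * keyframeInterval
}

func main() {
	for _, mb := range []int{128, 256} {
		fmt.Printf("%d MiB -> %d steps\n", mb, reachSteps(mb))
	}
	// Prints:
	// 128 MiB -> 8388608 steps
	// 256 MiB -> 16777216 steps
}
```

These match the ~8.4M and ~16.7M figures quoted above. The sketch ignores the one-slot minimum applied to budgets smaller than a single keyframe, which cannot occur for whole-MiB budgets.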

---

146 changes: 146 additions & 0 deletions cpu/keyframe.go
@@ -0,0 +1,146 @@
package cpu

// Keyframe-based deep rewind (issue #392).
//
// The per-step SnapshotRing only reaches back as far as its capacity (a few
// hundred steps) — fine for "oops, step back one" but useless for "rewind to
// somewhere in the last few million steps". Storing a delta for every one of
// those steps is infeasible, so deep rewind instead keeps periodic *full*
// machine snapshots (keyframes) and reconstructs an arbitrary earlier state
// by restoring the nearest keyframe at or before the target step and
// replaying forward the handful of steps in between.
//
// reach (steps) = ring capacity (keyframes) × keyframe interval (steps)
// ring capacity = budget bytes / KeyframeBytes
//
// Memory is a *cap*, not a preallocation: a short run holds only as many
// keyframes as it produced. Forward-replay cost is bounded by the interval,
// so a larger interval trades replay latency for reach at a fixed budget.

// KeyframeBytes is the accounting size of one keyframe: a full 64 KiB RAM
// image. Register/peripheral state is negligible next to it, so the budget
// math treats every keyframe as this fixed size.
const KeyframeBytes = 0x10000

// Keyframe is a full machine snapshot tagged with the step index at which it
// was taken. Snap.Pages holds every page (a complete RAM image), so Restore
// reconstructs the exact state with no delta chain.
type Keyframe struct {
	Step uint64
	Snap Snapshot
}

// SnapshotFull captures a complete RAM image (all 256 pages) plus registers,
// suitable for use as a keyframe base. Unlike CPU.Snapshot — which records
// only a page delta for undoing a single step — this is self-contained:
// Restore needs nothing else. Peripherals are filled in by the caller, as
// with the delta path.
func (c *CPU) SnapshotFull(ram *RAM) Snapshot {
	s := c.Snapshot(ram)
	pages := make(map[byte][256]byte, 256)
	for p := 0; p < 256; p++ {
		var img [256]byte
		base := p << 8
		copy(img[:], ram.Data[base:base+256])
		pages[byte(p)] = img
	}
	s.Pages = pages
	return s
}

// KeyframeRing is a fixed-capacity FIFO of keyframes ordered by ascending
// step. Push appends the newest; when full it drops the oldest, so the ring
// always holds the most recent `cap` keyframes. Nil receiver methods are
// safe and behave as an empty, zero-capacity ring.
type KeyframeRing struct {
	buf  []Keyframe
	head int // next-write index
	size int
	cap  int
}

// NewKeyframeRing builds a ring sized to hold budgetBytes worth of keyframes.
// A budget too small for even one keyframe still yields a 1-slot ring so deep
// rewind degrades to "nearest keyframe" rather than disabling outright; a
// non-positive budget yields nil (feature off).
func NewKeyframeRing(budgetBytes int) *KeyframeRing {
	if budgetBytes <= 0 {
		return nil
	}
	c := budgetBytes / KeyframeBytes
	if c < 1 {
		c = 1
	}
	return &KeyframeRing{buf: make([]Keyframe, c), cap: c}
}

// Cap returns the ring's keyframe capacity (0 for a nil ring).
func (r *KeyframeRing) Cap() int {
	if r == nil {
		return 0
	}
	return r.cap
}

// Len returns the number of keyframes currently held.
func (r *KeyframeRing) Len() int {
	if r == nil {
		return 0
	}
	return r.size
}

// Bytes is the approximate resident size of the held keyframes.
func (r *KeyframeRing) Bytes() int {
	return r.Len() * KeyframeBytes
}

// Push appends a keyframe. Callers are responsible for pushing in ascending
// step order (the TUI does, since it captures during forward execution).
func (r *KeyframeRing) Push(kf Keyframe) {
	if r == nil || r.cap == 0 {
		return
	}
	r.buf[r.head] = kf
	r.head = (r.head + 1) % r.cap
	if r.size < r.cap {
		r.size++
	}
}

// Nearest returns the latest keyframe whose Step is <= target, and true. When
// the ring is empty or every held keyframe is newer than target (target fell
// off the back of the reach window), it returns false.
func (r *KeyframeRing) Nearest(target uint64) (Keyframe, bool) {
	if r == nil || r.size == 0 {
		return Keyframe{}, false
	}
	// Entries run oldest..newest starting at (head - size). Scan newest-first
	// and take the first with Step <= target.
	for i := 0; i < r.size; i++ {
		idx := (r.head - 1 - i + r.cap) % r.cap
		if r.buf[idx].Step <= target {
			return r.buf[idx], true
		}
	}
	return Keyframe{}, false
}

// Oldest returns the lowest step still reachable (the back of the window) and
// true, or (0,false) when empty. Used to report reach to the user.
func (r *KeyframeRing) Oldest() (uint64, bool) {
	if r == nil || r.size == 0 {
		return 0, false
	}
	idx := (r.head - r.size + r.cap) % r.cap
	return r.buf[idx].Step, true
}

// Reset drops all keyframes without freeing the backing buffer.
func (r *KeyframeRing) Reset() {
	if r == nil {
		return
	}
	r.head = 0
	r.size = 0
}
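
The restore-then-replay idea behind the ring can be illustrated in isolation. The following is a minimal sketch, not the emulator's code: `toyMachine`, `toyKeyframe`, `nearest`, `rewindTo`, and `run` are hypothetical stand-ins, with a single accumulator playing the role of CPU and RAM state and a plain slice playing the role of `KeyframeRing` (no eviction).

```go
package main

import "fmt"

// toyMachine stands in for CPU+RAM: its whole state is a step counter and
// one accumulator. It is a comparable struct, so states can be checked with ==.
type toyMachine struct {
	step uint64
	acc  uint64
}

// stepOnce advances one deterministic step; replay relies on this determinism.
func (m *toyMachine) stepOnce() {
	m.acc = m.acc*31 + m.step
	m.step++
}

// toyKeyframe is a full copy of the machine tagged with its step index.
type toyKeyframe struct {
	step uint64
	snap toyMachine
}

// nearest returns the latest keyframe at or before target.
// frames must be in ascending step order.
func nearest(frames []toyKeyframe, target uint64) (toyKeyframe, bool) {
	for i := len(frames) - 1; i >= 0; i-- {
		if frames[i].step <= target {
			return frames[i], true
		}
	}
	return toyKeyframe{}, false
}

// rewindTo restores the nearest keyframe and replays forward to target.
func rewindTo(m *toyMachine, frames []toyKeyframe, target uint64) bool {
	kf, ok := nearest(frames, target)
	if !ok {
		return false
	}
	*m = kf.snap
	for m.step < target {
		m.stepOnce()
	}
	return true
}

// run executes n steps from reset, capturing a keyframe every interval steps
// (including step 0, so sub-interval targets stay reachable).
func run(n, interval uint64) (toyMachine, []toyKeyframe) {
	var m toyMachine
	var frames []toyKeyframe
	for m.step < n {
		if m.step%interval == 0 {
			frames = append(frames, toyKeyframe{step: m.step, snap: m})
		}
		m.stepOnce()
	}
	return m, frames
}

func main() {
	m, frames := run(10000, 4096) // keyframes at steps 0, 4096, 8192
	want, _ := run(5000, 4096)    // reference state reached by running forward
	ok := rewindTo(&m, frames, 5000)
	fmt.Println(ok, m == want) // prints "true true"
}
```

Rewinding to step 5000 restores the step-4096 keyframe and replays 904 steps, so replay work never exceeds one interval.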
95 changes: 95 additions & 0 deletions cpu/keyframe_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
package cpu

import "testing"

func TestSnapshotFull_RoundTrip(t *testing.T) {
	ram := NewRAM()
	ram.EnableShadow()
	c := New(ram)
	for a := 0; a < 0x10000; a += 257 {
		ram.Data[a] = byte(a)
	}
	c.A, c.X, c.PC = 0x11, 0x22, 0x9000

	kf := c.SnapshotFull(ram)
	if len(kf.Pages) != 256 {
		t.Fatalf("SnapshotFull captured %d pages; want 256", len(kf.Pages))
	}
	// Mutate everything, then restore.
	for a := 0; a < 0x10000; a++ {
		ram.Data[a] = 0xEE
	}
	c.A, c.X, c.PC = 0, 0, 0
	c.Restore(kf, ram)
	if c.A != 0x11 || c.X != 0x22 || c.PC != 0x9000 {
		t.Errorf("regs not restored: A=%02X X=%02X PC=%04X", c.A, c.X, c.PC)
	}
	for a := 0; a < 0x10000; a += 257 {
		if ram.Data[a] != byte(a) {
			t.Fatalf("RAM[%04X] = %02X; want %02X", a, ram.Data[a], byte(a))
		}
	}
}

func TestKeyframeRing_CapFromBudget(t *testing.T) {
	if r := NewKeyframeRing(0); r != nil {
		t.Error("zero budget should yield nil ring")
	}
	// 64 MiB / 64 KiB = 1024.
	if r := NewKeyframeRing(64 << 20); r.Cap() != 1024 {
		t.Errorf("cap = %d; want 1024", r.Cap())
	}
	// Sub-keyframe budget still yields a 1-slot ring.
	if r := NewKeyframeRing(100); r.Cap() != 1 {
		t.Errorf("tiny budget cap = %d; want 1", r.Cap())
	}
}

func TestKeyframeRing_NearestAndEviction(t *testing.T) {
	r := NewKeyframeRing(3 * KeyframeBytes) // cap 3
	for _, step := range []uint64{0, 1000, 2000, 3000} {
		r.Push(Keyframe{Step: step})
	}
	// Cap 3 -> step 0 evicted; window is {1000,2000,3000}.
	if old, _ := r.Oldest(); old != 1000 {
		t.Errorf("oldest = %d; want 1000", old)
	}
	cases := []struct {
		target uint64
		step   uint64
		ok     bool
	}{
		{3500, 3000, true},
		{3000, 3000, true},
		{2999, 2000, true},
		{2000, 2000, true},
		{1000, 1000, true},
		{999, 0, false}, // older than the back of the window
	}
	for _, c := range cases {
		kf, ok := r.Nearest(c.target)
		if ok != c.ok || (ok && kf.Step != c.step) {
			t.Errorf("Nearest(%d) = (%d,%v); want (%d,%v)", c.target, kf.Step, ok, c.step, c.ok)
		}
	}
}

func TestKeyframeRing_Bytes(t *testing.T) {
	r := NewKeyframeRing(10 * KeyframeBytes)
	r.Push(Keyframe{Step: 0})
	r.Push(Keyframe{Step: 1})
	if got := r.Bytes(); got != 2*KeyframeBytes {
		t.Errorf("Bytes = %d; want %d", got, 2*KeyframeBytes)
	}
}

func TestKeyframeRing_NilSafe(t *testing.T) {
	var r *KeyframeRing
	r.Push(Keyframe{})
	if r.Len() != 0 || r.Cap() != 0 || r.Bytes() != 0 {
		t.Error("nil ring should report zero")
	}
	if _, ok := r.Nearest(5); ok {
		t.Error("nil ring Nearest should be false")
	}
}
1 change: 1 addition & 0 deletions docs/context.md
@@ -143,6 +143,7 @@ Bus chain: `CPU → tui.WBus → cpu.MMIO → cpu.RAM`
- #1, #2, #3, #7, #8 (cycle audit), #9 (65C02), #10 (IRQ/NMI), #11–#15

### Merged PRs of note
- Deep rewind via keyframes (issue #392, v1.3.0): the per-step `SnapshotRing` only reaches back its capacity (256 steps) — fine for "step back one", useless for "rewind into the last few million steps". Added keyframe-based deep rewind: `cpu.KeyframeRing` holds periodic full-RAM snapshots (`CPU.SnapshotFull` captures all 256 pages; one keyframe every `keyframeInterval`=4096 steps), and `:rewind N` reconstructs any earlier step by restoring the nearest keyframe ≤ target (`KeyframeRing.Nearest`) and replaying forward to the exact step (`rewindToStep` → `stepReplay` loop under a `replayingRewind` guard so replay doesn't re-capture keyframes). Small jumps still pop the fine ring exactly. `:rewind-budget MB` resizes the ring (cap = budget/64KiB); reach = cap × interval, shown in the status bar as `deep:<reach>@<budget>`. **Note — the issue's own numbers are mutually inconsistent**: full 64 KiB keyframes every 1k steps can't reach 10M under 256 MiB (that's ~4M). Used interval 4096 instead so 256 MiB reaches ~16.7M while forward-replay stays ≤4096 instructions (benchmarked **1.3 ms** incl. replay, vs the 100 ms acceptance). Memory is a *cap* not a reservation — the ring only fills to the run length; the old "fixed 256-entry ring" already sat at ≤16 MiB so the issue's "ring grows" framing was off. A step-0 keyframe is seeded on the first step so sub-interval targets are reachable. `StepCount` tracks position; `<` and reset keep it in sync. Determinism caveat: forward replay assumes deterministic execution between keyframes (buffered keyboard input is snapshotted, so it replays). Deltas-from-previous-keyframe compression is a future optimisation. No state-format change (StepCount/keyframes are ephemeral). `cpu` ring logic unit-tested apart from the TUI; deep-rewind exactness verified byte-for-byte against a RAM-mutating loop ROM.
- Trace replay — search / jump-to-cycle / diff (issue #391, v1.3.0): four navigation features on top of `-trace-replay` (issue #64's playback). (1) **`:find EXPR` / `:rfind EXPR`** — jump to the next/previous frame matching an expression over the frame's registers/flags, reusing the breakpoint-condition `expr` grammar against a scratch CPU loaded per frame (`framePredicate`). A bare `=` is normalised to `==` (`normalizeFindExpr`) so `:find PC=$8042` works as users type it; bare `:find` repeats the last expression to sweep matches. (2) **`:cycle N`** — `Replay.SeekCycle` binary-searches the monotonic cycle column (O(log N) on a 1M-frame trace). (3) **`-diff PATH`** — loads a second trace; `trace.Diff` walks both by index and returns the first `Frame.Equal` mismatch (or a length-mismatch divergence at the shorter trace's end) as `trace.Divergence{Index,Cycle,Found}`, computed eagerly in `WithReplayDiff` and surfaced in the status line. (4) **`d` / `D`** — `d` toggles a side-by-side diff overlay (`diffModal`, double-bordered like the help modal) centred on the primary cursor with mismatched frames in red + a `✗` gutter at the divergence; `D` jumps both cursors there. Pure-`trace` logic (SeekCycle/FindFunc/Diff/Frame.Equal) is unit-tested separately from the TUI wiring. No state-format change.
- Watch panel array expansion (issue #390, v1.3.0): `:watch` learns an `xN` (or `[N]`) array token — `:watch grid word x16` pins 16 consecutive LE words and renders them as indexed rows `grid[0..15]` (header `[16]`, first `maxWatchElemRows`=8 shown, rest collapsed to `… +N more`). Element width = the watch's `byte`/`word` kind; addresses are `Addr + i*Width`. `symbols.Table` now parses the cc65 `sym size=` field (`Size(addr)`) and seeds the count automatically when present — but **the issue's premise was false**: cc65 V2.18 `.dbg` carries *no* struct member layout, array bounds, or element types. C globals get bare `sym ... type=lab` records with no `size=`; even local `csym` records collapse every type to `type id=0 val="00"` (void). So struct-tree expansion is impossible from `.dbg` and the auto-seed rarely fires for data globals — `xN` is the workhorse. Scoped to array-only best-effort per that finding; struct overlays + DAP `variables` array children deferred (DAP has no globals scope yet). New `Watch.Count` is an optional v1 state field (omitempty, no schema bump). Tests: `symbols` size parse, `:watch xN`/`[N]` parsing + element addressing, panel render + truncation.
- Blargg `apu_test` 4/8 → 8/8 PASS — Mesen2 frame-counter substeps + DMC alignment (PRs #379-#382, nessy v0.10): wired Blargg's `apu_test.nes` (8 sub-tests: len_ctr, len_table, irq_flag, irq_timing, len_timing, irq_flag_timing, dmc_basics, dmc_rates) into the accuracy harness (#379) and closed every gap it surfaced over three follow-up PRs. (1) **6 internal frame-counter sub-steps** (#380) — Mesen2 `ApuFrameCounter.h:19` table encodes the user-visible 'step 3' of 4-step mode as 3 CPU cycles (29828, 29829, 29830) where IRQ asserts continuously and the half-frame tick fires at cycle 29829. chippy's 4-entry interval table from #377 fired the tick at 29828; replaced with `frameStepIntervalsNtsc4Step = [6]int{7456, 7458, 7457, 1, 1, 7457}` + 5-step analogue, switch in `advanceFrameStep` extended to 6 cases (step 3 = IRQ-only, step 4 = q+h+IRQ, step 5 = idle/reset for 4-step). Cleared 5-len_timing. (2) **DMC buffer-fill + enable-fetch + $4015 read** (#381) — three real-silicon DMC behaviors chippy was getting wrong: `maybeRefill` was silencing whenever `bufferEmpty=true` at the 8-bit boundary instead of only when `bytesRemaining=0` too; `setEnabled` didn't schedule the initial DMA fetch (Mesen `SetEnabled` does via `transferStartDelay`); $4015 read was clearing the DMC IRQ flag (per nesdev + Mesen `NesApu.cpp:101`, only frame-counter IRQ is cleared by $4015 read — DMC IRQ acks via $4015 write or $4010 bit-7 clear). dmcChannel now inits with `bufferEmpty=true`+`silenced=true`. Cleared 7-dmc_basics' 18 sub-tests. (3) **Mesen-aligned DMC Clock** (#382) — three compounding structural mismatches: chippy burned an extra 'reload-only' fire per byte (each byte = 9 fires instead of Mesen's 8), the timer reload was period+1 cycles between fires (429 vs Mesen's 428), and the fetch-schedule check only ran at byte boundaries. 
Replaced `clockShift`+`maybeRefill` with a unified `clock()` mirroring Mesen `DeltaModulationChannel::Run`'s inner body: always shift+decrement, reload at `bitsRemaining=0` boundary, schedule fetch on every clock when buffer-empty+bytes-pending. Initialise `bitsRemaining=8` (matches Mesen `Reset:36`). Cleared 8-dmc_rates' 16 rates × 2 boundary checks. **All four accuracy ROMs now PASS**: `ppu_vbl_nmi` 10/10, `instr_timing`, `cpu_interrupts_v2` 5/5, `apu_test` 8/8. No regression on nestest / Klaus / demo SHAs. The DMC restructure also fixes any ROM that uses delta samples — the rate timing was off by ~12% before. Refs #318 (rolling accuracy tracker).
1 change: 1 addition & 0 deletions internal/tui/complete.go
@@ -23,6 +23,7 @@ var defaultVerbs = func() []string {
"syms", "symbols",
"mem",
"find", "rfind", "cycle",
"rewind", "rewind-budget",
"trace",
"textsave",
"theme",