agnus: model the DDF sequencer start/stop flip-flops for FMODE=0 fetches #109

Merged
LinuxJedi merged 3 commits into main from fix/ddf-sequencer-fsm
Jul 4, 2026

@LinuxJedi LinuxJedi commented Jul 4, 2026

Summary

Replace the value-range DDF window with the Agnus DDF sequencer flop model
for FMODE=0 bitplane fetches: DDFSTRT/DDFSTOP are comparator EDGES driving
flip-flops, a stop request drains through one final fetch unit (which
applies the modulos per plane), the hardwired window ($18/$D8, HARDDIS
$E0) gates starts and forces stops, and the state carries across line
boundaries.

  • src/chipset/ddf_sequencer.rs: the free-standing flop model (OCS and
    ECS rule variants, transcribed from vAmiga 4.4's Sequencer).
  • src/bus/ddf_line.rs: per-line walk driving the slot arbiter and the
    DMA capture loop; register writes rebuild the line table at their
    hardware commit clocks (DDF writes +4 ccks; an old DDFSTOP still fires
    on its commit clock, an old DDFSTRT does not).
  • Wide-FMODE (AGA quantum > 1) fetches keep the existing value-window
    plan; vAmiga has no AGA counterpart to transcribe.
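
The delayed commit of DDF register writes described above can be sketched as a small pending-write queue. This is an illustration only, not the code in src/bus/ddf_line.rs: the names `DdfRegs`, `DdfReg`, `write` and `tick` are hypothetical, and the sketch does not model the rule that an old DDFSTOP still fires on its commit clock while an old DDFSTRT does not.

```rust
/// DDF writes reach the comparators four colour clocks after the write slot
/// (the figure quoted in this PR).
const DDF_COMMIT_DELAY: u64 = 4;

/// Hypothetical register selector for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DdfReg {
    Strt,
    Stop,
}

/// Hypothetical holder of the committed values plus in-flight writes.
#[derive(Debug, Default)]
pub struct DdfRegs {
    pub ddfstrt: u16,
    pub ddfstop: u16,
    /// (commit clock, register, value) for writes not yet visible.
    pending: Vec<(u64, DdfReg, u16)>,
}

impl DdfRegs {
    /// Record a write made at colour clock `now`; it becomes visible later.
    pub fn write(&mut self, now: u64, reg: DdfReg, value: u16) {
        self.pending.push((now + DDF_COMMIT_DELAY, reg, value));
    }

    /// Commit every write whose clock has arrived. Returns true when a
    /// register changed, which is the point at which a caller would rebuild
    /// its per-line table.
    pub fn tick(&mut self, now: u64) -> bool {
        let mut changed = false;
        let pending = std::mem::take(&mut self.pending);
        for (at, reg, value) in pending {
            if at <= now {
                match reg {
                    DdfReg::Strt => self.ddfstrt = value,
                    DdfReg::Stop => self.ddfstop = value,
                }
                changed = true;
            } else {
                self.pending.push((at, reg, value));
            }
        }
        changed
    }
}
```

A write made at clock 100 leaves the committed value untouched through clock 103 and becomes visible at clock 104.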

This gives Copperline the hardware's "old stop" behaviours a value range
cannot express: missed/invalid DDFSTOP runs continue to the hardware-stop
drain, a DDFSTRT match past $D8 starts a run that wraps through
horizontal blanking into the next line, an early-blanked DDFSTRT never
starts on OCS, and ECS's latched BPHSTART restarts runs at the hard
window.
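
To make the difference from a value range concrete, here is a minimal flip-flop sketch. It is not the module in src/chipset/ddf_sequencer.rs and not a transcription of vAmiga's Sequencer: the rule set is a hypothetical simplification (a single OCS-like variant, no ECS BPHSTART latch, no HARDDIS handling), and the 8-cck fetch unit and 227-cck line length are assumptions of the sketch. It only aims to reproduce the missed-stop drain, the gated early start, and the state carrying across lines.

```rust
/// Hardwired window bounds quoted in the PR text.
const HW_START: u16 = 0x18;
const HW_STOP: u16 = 0xD8;
/// Assumption: 8 colour clocks per lo-res fetch unit at FMODE=0.
const UNIT: u16 = 8;
/// Assumption: 227 colour clocks per PAL line ($00..=$E2).
const LINE_CCKS: u16 = 0xE3;

/// Hypothetical flop state; it persists from one line to the next.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct DdfFlops {
    start_enable: bool,   // set at HW_START, cleared at line start
    running: bool,        // a fetch run is in progress
    stop_requested: bool, // a stop edge was seen during this run
    last_unit: bool,      // the current unit is the final (drain) unit
    phase: u16,           // colour clock within the current fetch unit
}

impl DdfFlops {
    pub fn is_running(&self) -> bool {
        self.running
    }

    /// Walk one line; returns the hpos at which each fetch unit begins.
    pub fn walk_line(&mut self, ddfstrt: u16, ddfstop: u16) -> Vec<u16> {
        let mut unit_starts = Vec::new();
        for h in 0..LINE_CCKS {
            if h == 0 {
                self.start_enable = false;
            }
            if h == HW_START {
                self.start_enable = true;
            }
            // A stop edge only requests a stop; the run drains afterwards.
            if self.running && (h == ddfstop || h == HW_STOP) {
                self.stop_requested = true;
            }
            // A start edge is an equality match, gated by the enable flop.
            if !self.running && self.start_enable && h == ddfstrt {
                self.running = true;
                self.phase = 0;
            }
            if self.running {
                if self.phase == 0 {
                    unit_starts.push(h);
                    self.last_unit = self.stop_requested;
                }
                self.phase += 1;
                if self.phase == UNIT {
                    self.phase = 0;
                    if self.last_unit {
                        self.running = false;
                        self.stop_requested = false;
                        self.last_unit = false;
                    }
                }
            }
        }
        unit_starts
    }
}
```

Under these assumptions DDFSTRT=$38 with DDFSTOP=$D0 yields 20 fetch units, a DDFSTOP that never matches yields 21 (the extra one being the hardware-stop drain at $D8), DDFSTRT=$10 yields none, and DDFSTRT=$DC leaves the sequencer running at the end of the line.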

Hardware arbitration

The vAmigaTS Agnus/DDF/DDF/oldhwstop1-4 tests exercise exactly these
behaviours, and each has real A500 photos (OCS and ECS). The bottom
colour-swatch band in those tests is a cumulative hash of every preceding
row's fetched word count (a 5-plane interleaved image with modulos: every
row's word count moves the bitplane pointers), and the photos match
vAmiga's band exactly while mismatching the old Copperline model
completely - real hardware validates the flop model's per-row fetch
history in aggregate.
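
Why the swatch band acts as a cumulative check can be shown with a small pointer-progression sketch. The function name and signature are hypothetical, and chip-RAM address masking is ignored; the only rule taken from this PR is that each row advances the pointer by its fetched words and that the final fetch unit applies the modulo.

```rust
/// Advance one bitplane pointer across a sequence of rows.
/// `words_per_row` holds the number of 16-bit words fetched on each row;
/// `modulo` is the signed byte modulo added once at the end of each row.
pub fn pointer_after_rows(start: u32, words_per_row: &[u32], modulo: i16) -> u32 {
    words_per_row.iter().fold(start, |ptr, &words| {
        ptr.wrapping_add(words * 2)
            // Sign-extend the modulo so negative values subtract.
            .wrapping_add(modulo as i32 as u32)
    })
}
```

With a start of $1000 and zero modulo, three rows of 20 words end at $1078, while a single 21-word row among them ends at $107A, so every later row reads from a shifted address.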

Scores (vs vAmiga 4.4 refs, X_SHIFT=0)

| test | before | after |
|---|---|---|
| Agnus/DDF/DDF/oldhwstop1 | 51.5% | 0.31% |
| Agnus/DDF/DDF/oldhwstop2 | 50.2% | 2.7% |
| Agnus/DDF/DDF/oldhwstop3 | 59.8% | 16.7% |
| Agnus/DDF/DDF/oldhwstop4 | 53.4% | 14.2% |
| Agnus/DDF bucket mean (66 cases) | 6.9% | 4.2% |

Full Agnus+Denise sweep audit (776 cases vs the post-#107 baseline):
Agnus mean 12.06% -> 10.99%, Denise flat. Large collateral improvements
from the same flop semantics: the whole Agnus/Registers/oldDMACON family
(bplen1-5/oldbplen1-5 22-47% -> 0.5-13%, bplon1-4 24-28% -> 0-0.5%,
bplon1h-4h 26-31% -> 12-18%), Agnus/Registers/BPLPTR drop/dropcpu
(15-16% -> 3.5%), DMACON resmod2-4 (8-12% -> 0.4-2.2%), BPLCON0 block0-3,
and blitter timing0/6/10/15l. Remaining small deltas are the
slot-cadence interplay in the blitter-timing family (±2%, mixed
directions - the hires per-plane slot offsets are now the hardware ones,
so the residual sits in the blitter model) and the DDFTIM write-timing
family (parity on average; the +4-cck commit delay was sweep-calibrated,
3/4/5 all within 0.2%). One case regressed meaningfully: Copper
oldJump/jumpbpu2 (0% -> 7.4%) while its siblings improved - the
free-running copper-loop precession class, where any one-slot cadence
change re-phases the loop (a previous attempt to tune this class was
reverted; jumpbpu1 improved by the same mechanism).

diwv3/diwv4 pinned a subtlety: lines above the captured framebuffer must
not advance bitplane pointers (a DIWSTRT.V inside vertical blanking would
otherwise skew every visible row); the capture keeps the pre-FSM
semantics there, matching the vAmiga reference dumps.

The oldhwstop3/4 residuals are the render-side placement of the exotic
rows themselves (fetch runs that wrap through hblank render at a single
per-row origin today); the pointer progression - which dominates the
image - is already exact. Renderer run-origin support is the follow-up.

Verification

  • cargo test --release green (1284 tests: 18 value-window tests
    re-derived to flop semantics with rule comments; new FSM rule suite +
    bus-level table tests), cargo clippy, cargo fmt --check clean.
  • STATE_VERSION 15 (Bus gained the serialized sequencer flop state).
  • Performance: the per-cck table lookup replaces the memoized window math
    and cache probes; a 15s KS1.3 boot runs ~12% faster than main.
  • Demo screenshots byte-identical to current main across the whole gate
    set: Gen-X (110s/620s), Inside the Machine, Hamazing, KS1.3 boot,
    Second Nature, A1200 KS3.1 boot (hires FMODE=3), Zool (AGA) - the FSM
    reproduces the value model exactly on sane register patterns.
  • image_regression: 7 pass (ocs_bpu7_ham is the documented host-flaky
    perf test, fails on clean main too).

Update: renderer run-origin support (second commit)

The follow-up landed in this PR too: the capture records each row's fetch
run origin, and the renderer synthesizes the row's DDF geometry from it,
so rows whose fetch diverges from the register window paint what the DMA
fetched. Further movement (vs the first commit): oldhwstop3 16.7% -> 9.5%,
oldhwstop4 14.2% -> 7.5%, single4 16.0% -> 3.1%, single5 8.1% -> 2.4%,
hwstop2 5.2% -> 1.7%, hwstop4/5 10.0% -> 6.4%, hwstop6 7.3% -> 3.7%.
STATE_VERSION 16. Byte-identity re-verified (KS1.3, Inside the Machine,
Zool). Remaining in this family: hblank-wrapped runs still paint with the
register view, and arosddf1-4 (12.7%) is a separate ECS class.
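
The synthesis step can be sketched as follows for the lo-res FMODE=0 case. This is an assumption-laden illustration, not the renderer's code: `DdfWindow` and `synthesize_window` are hypothetical names, the real CapturedBitplaneRow layout is not shown in this PR, and the sketch assumes one word per plane per 8-cck fetch unit with the stop value sitting at the start of the last unit.

```rust
/// Assumption: 8 colour clocks per lo-res fetch unit at FMODE=0.
const LORES_UNIT: u16 = 8;

/// Hypothetical pair of DDF values as the renderer would consume them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DdfWindow {
    pub ddfstrt: u16,
    pub ddfstop: u16,
}

/// Derive a window from what the DMA fetched. Returns None when nothing was
/// fetched or when the registers already describe the fetch, so sane rows
/// keep using the register view unchanged.
pub fn synthesize_window(origin: u16, words: u16, regs: DdfWindow) -> Option<DdfWindow> {
    if words == 0 {
        return None;
    }
    let synth = DdfWindow {
        ddfstrt: origin,
        // The stop comparator value marks the start of the last unit.
        ddfstop: origin + (words - 1) * LORES_UNIT,
    };
    if synth == regs {
        None
    } else {
        Some(synth)
    }
}
```

For a row that started at $38 and fetched 21 words because DDFSTOP never matched, the sketch yields a synthesized window of $38/$D8; the standard $38/$D0 row with 20 words synthesizes nothing.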

Update: shifter reload-grid placement (third commit)

A second photo arbitration rode along: the arosddf1 A500 ECS photo shows
the DDFSTRT $3C lo-res picture at the $40 shifter reload slot relative to
the copper-anchored ruler dashes (both ruler ends agree) - FMODE=0
placement rounds UP to the reload grid, not down. arosddf1-3 12.8% ->
0.008%, arosddf4 12.7% -> 0.07%, ddf3/4/7/8 1.6% -> 0.1%. On-grid starts (every
previously calibrated case) unchanged; AROS/KS1.3/Zool byte-identical.
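
The rounding rule is a one-liner; the sketch below uses a hypothetical function name and takes the grid size as a parameter (8 colour clocks in lo-res, 4 in hi-res, 2 in SHRES at FMODE=0, per the third commit message). It assumes the grid is a power of two.

```rust
/// Round a fetch-unit start up to the next shifter reload slot.
/// `grid` must be a power of two.
pub fn reload_slot(unit_start: u16, grid: u16) -> u16 {
    debug_assert!(grid.is_power_of_two());
    (unit_start + grid - 1) & !(grid - 1)
}
```

`reload_slot(0x3C, 8)` gives `0x40`, the position seen in the arosddf1 photo, where floor rounding would give `0x38`; an on-grid start such as `0x38` is returned unchanged.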

LinuxJedi added 3 commits July 4, 2026 14:31
Replace the value-range DDF window with the hardware's flop model:
DDFSTRT/DDFSTOP comparator matches set/clear flip-flops, a stop request
drains through one final fetch unit (which applies the per-plane
modulos), the hardwired window ($18/$D8, HARDDIS $E0) gates starts and
forces stops, and the sequencer state carries across line boundaries.
The flop semantics are transcribed from vAmiga 4.4's Sequencer (OCS and
ECS variants) as a free-standing module (chipset/ddf_sequencer.rs) plus
a per-line walk (bus/ddf_line.rs) that drives the slot arbiter and the
DMA capture loop. Register writes rebuild the line table at their
hardware commit clocks: DDF writes reach the comparators four colour
clocks after the write slot, an old DDFSTOP still fires on its commit
clock while an old DDFSTRT does not, DMACON/BPLCON0 keep their 2/3-cck
delays as strobes. Wide-FMODE (AGA quantum > 1) fetches keep the
value-window plan.

Missed or invalid comparators now produce the hardware behaviours a
value range cannot express: a missed DDFSTOP runs to the hardware-stop
drain, a DDFSTRT match past $D8 wraps the run through horizontal
blanking into the next line, an early-blanked DDFSTRT ($10) never
starts on OCS, and ECS's latched BPHSTART restarts runs at the hard
window. Word addressing is unit-based so late-enabled planes keep their
word positions.

Ground truth: the vAmigaTS Agnus/DDF/DDF/oldhwstop1-4 A500 photos. The
bottom swatch band there is a cumulative hash of every preceding row's
fetched word count via the bitplane pointer progression, and the photos
match the flop model's output (vAmiga's render) exactly while
mismatching the old value-window model. Scores vs vAmiga 4.4 refs:
oldhwstop1 51.5%->0.3%, oldhwstop2 50.2%->2.7%, oldhwstop3 59.8%->16.7%,
oldhwstop4 53.4%->14.2% (the 3/4 residual is render-side placement of
the wrapped rows, a follow-up); Agnus/DDF bucket mean 6.9%->4.2%.

18 value-window unit tests re-derived to the flop semantics with rule
comments; new FSM rule suite and bus-level table tests. STATE_VERSION 15
(Bus gained the serialized sequencer flop state). A 15s KS1.3 boot runs
~12% faster than main (the per-cck table lookup replaces the window
math and plan-cache probes) and its screenshot is byte-identical.

Rows whose DMA fetch diverges from the register-derived DDF window (the
sequencer's missed-stop drains to the hardware stop, late starts) were
still painted with the register-derived geometry: word plans, word count,
and picture origin all disagreed with the words the capture actually
fetched. The capture now records the run's first fetch-unit boundary in
CapturedBitplaneRow (STATE_VERSION 16), and the renderer synthesizes the
row's DDFSTRT/DDFSTOP from the captured origin and word count, so every
register-derived derivation agrees with the DMA. Rows whose registers
already match (all sane screens) synthesize nothing and stay
byte-identical; runs wrapping through horizontal blanking (origin inside
the hardware-blanked area) keep the register view for now.

vAmigaTS Agnus/DDF (vs vAmiga 4.4 refs): oldhwstop3 16.7%->9.5%,
oldhwstop4 14.2%->7.5%, single4 16.0%->3.1%, single5 8.1%->2.4%,
hwstop2 5.2%->1.7%, hwstop4/5 10.0%->6.4%, hwstop6 7.3%->3.7%.
KS1.3 boot, Inside the Machine, and Zool screenshots stay byte-identical
to main.

Denise's shifter reloads on a fixed grid (8 colour clocks in lo-res, 4 in
hi-res, 2 in SHRES at FMODE=0); a fetch unit starting off that grid has
its data wait for the NEXT reload slot. The placement quantization
therefore rounds UP, not down. Hardware-verified on the arosddf1 A500 ECS
photo: the DDFSTRT $3C lo-res picture sits at the $40 reload slot
relative to the copper-anchored ruler dashes, with both ruler ends
agreeing on the position (framebuffer 252-259 measured against vAmiga's
254 and the old floor placement's 222). Every on-grid start - all
previously calibrated cases including the Kickstart insert-disk art and
the wide-FMODE gulp grids - is unchanged; wide FMODE keeps its
calibrated floor alignment.

vAmigaTS Agnus/DDF: arosddf1-3 12.8% -> 0.008%, arosddf4 12.7% -> 0.07%,
ddf3/ddf4/ddf7/ddf8 1.6% -> 0.1%. AROS boot, KS1.3 boot, and Zool
screenshots stay byte-identical to main.
@LinuxJedi LinuxJedi merged commit 6293186 into main Jul 4, 2026
8 checks passed
@LinuxJedi LinuxJedi deleted the fix/ddf-sequencer-fsm branch July 4, 2026 14:31
LinuxJedi added a commit that referenced this pull request Jul 4, 2026
Sprite N's two DMA slots sit at colour clocks $15+4N and $17+4N (the
HRM slot chart and vAmiga's DAS table). Copperline reserved a pair-shaped
band four clocks later ($19..$37, two sprites per 8-cck band) and
captured both sprites of a pair at one coarse position. Now each sprite
owns its own two odd slots in the arbiter, capture happens at each
sprite's first slot, and SPREN is sampled per slot (honouring the DMACON
commit delay).
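
The slot formula can be written down directly. This is a sketch with a hypothetical function name, not the arbiter's code; it only encodes the $15+4N and $17+4N positions stated above for sprites 0 to 7.

```rust
/// Colour clocks of sprite `n`'s two DMA slots, or None for n >= 8.
pub fn sprite_dma_slots(n: u8) -> Option<(u16, u16)> {
    if n >= 8 {
        return None;
    }
    let first = 0x15 + 4 * u16::from(n);
    Some((first, first + 2))
}
```

Sprite 0 gets `(0x15, 0x17)` and sprite 7 gets `(0x31, 0x33)`, so all sixteen slots fall on odd colour clocks.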

The corrected band phase-locks the copper's free-running loops against
sprite DMA: vAmigaTS Agnus/Copper/oldJump jumpbpu1 16.1% -> 0.016%,
jumpbpu2 7.4% -> 0.000%, jumpbpu3 5.9% -> 0.000%, jumpbpu4 1.5% -> 0.001%
(the precession residual documented in #109). The blitter timing family
net-improves (timing8/9/9f/12/15 -2.2, timing6/10/15l +1..2 - the
remaining spread sits in the blitter model). The sprena/sprdis DMACON
sweeps (29-43%) still need the true two-slot fetch split - the atomic
per-line sprite capture cannot express a word-level SPREN edge between a
sprite's two slots; unchanged here.

KS1.3, Inside the Machine, Hamazing, Gen-X, and Zool screenshots are
byte-identical to main. Second Nature's falling-leaves scene renders
identically with a small animation-phase drift (timing-sensitive demo;
chip-bus contention legitimately changed).