agnus: model the DDF sequencer start/stop flip-flops for FMODE=0 fetches #109
Merged
Replace the value-range DDF window with the hardware's flop model: DDFSTRT/DDFSTOP comparator matches set/clear flip-flops, a stop request drains through one final fetch unit (which applies the per-plane modulos), the hardwired window ($18/$D8, HARDDIS $E0) gates starts and forces stops, and the sequencer state carries across line boundaries.

The flop semantics are transcribed from vAmiga 4.4's Sequencer (OCS and ECS variants) as a free-standing module (chipset/ddf_sequencer.rs) plus a per-line walk (bus/ddf_line.rs) that drives the slot arbiter and the DMA capture loop. Register writes rebuild the line table at their hardware commit clocks: DDF writes reach the comparators four colour clocks after the write slot, an old DDFSTOP still fires on its commit clock while an old DDFSTRT does not, DMACON/BPLCON0 keep their 2/3-cck delays as strobes. Wide-FMODE (AGA quantum > 1) fetches keep the value-window plan.

Missed or invalid comparators now produce the hardware behaviours a value range cannot express: a missed DDFSTOP runs to the hardware-stop drain, a DDFSTRT match past $D8 wraps the run through horizontal blanking into the next line, an early-blanked DDFSTRT ($10) never starts on OCS, and ECS's latched BPHSTART restarts runs at the hard window. Word addressing is unit-based so late-enabled planes keep their word positions.

Ground truth: the vAmigaTS Agnus/DDF/DDF/oldhwstop1-4 A500 photos. The bottom swatch band there is a cumulative hash of every preceding row's fetched word count via the bitplane pointer progression, and the photos match the flop model's output (vAmiga's render) exactly while mismatching the old value-window model. Scores vs vAmiga 4.4 refs: oldhwstop1 51.5%->0.3%, oldhwstop2 50.2%->2.7%, oldhwstop3 59.8%->16.7%, oldhwstop4 53.4%->14.2% (the 3/4 residual is render-side placement of the wrapped rows, a follow-up); Agnus/DDF bucket mean 6.9%->4.2%.

18 value-window unit tests re-derived to the flop semantics with rule comments; new FSM rule suite and bus-level table tests. STATE_VERSION 15 (Bus gained the serialized sequencer flop state). A 15s KS1.3 boot runs ~12% faster than main (the per-cck table lookup replaces the window math and plan-cache probes) and its screenshot is byte-identical.
Rows whose DMA fetch diverges from the register-derived DDF window (the sequencer's missed-stop drains to the hardware stop, late starts) were still painted with the register-derived geometry: word plans, word count, and picture origin all disagreed with the words the capture actually fetched. The capture now records the run's first fetch-unit boundary in CapturedBitplaneRow (STATE_VERSION 16), and the renderer synthesizes the row's DDFSTRT/DDFSTOP from the captured origin and word count, so every register-derived derivation agrees with the DMA. Rows whose registers already match (all sane screens) synthesize nothing and stay byte-identical; runs wrapping through horizontal blanking (origin inside the hardware-blanked area) keep the register view for now. vAmigaTS Agnus/DDF (vs vAmiga 4.4 refs): oldhwstop3 16.7%->9.5%, oldhwstop4 14.2%->7.5%, single4 16.0%->3.1%, single5 8.1%->2.4%, hwstop2 5.2%->1.7%, hwstop4/5 10.0%->6.4%, hwstop6 7.3%->3.7%. KS1.3 boot, Inside the Machine, and Zool screenshots stay byte-identical to main.
Denise's shifter reloads on a fixed grid (8 colour clocks in lo-res, 4 in hi-res, 2 in SHRES at FMODE=0); a fetch unit starting off that grid has its data wait for the NEXT reload slot. The placement quantization therefore rounds UP, not down. Hardware-verified on the arosddf1 A500 ECS photo: the DDFSTRT $3C lo-res picture sits at the $40 reload slot relative to the copper-anchored ruler dashes, with both ruler ends agreeing on the position (framebuffer 252-259 measured against vAmiga's 254 and the old floor placement's 222). Every on-grid start - all previously calibrated cases including the Kickstart insert-disk art and the wide-FMODE gulp grids - is unchanged; wide FMODE keeps its calibrated floor alignment. vAmigaTS Agnus/DDF: arosddf1-3 12.8% -> 0.008%, arosddf4 12.7% -> 0.07%, ddf3/ddf4/ddf7/ddf8 1.6% -> 0.1%. AROS boot, KS1.3 boot, and Zool screenshots stay byte-identical to main.
LinuxJedi added a commit that referenced this pull request Jul 4, 2026
Sprite N's two DMA slots sit at colour clocks $15+4N and $17+4N (the HRM slot chart and vAmiga's DAS table). Copperline reserved a pair-shaped band four clocks later ($19..$37, two sprites per 8-cck band) and captured both sprites of a pair at one coarse position. Now each sprite owns its own two odd slots in the arbiter, capture happens at each sprite's first slot, and SPREN is sampled per slot (honouring the DMACON commit delay).

The corrected band phase-locks the copper's free-running loops against sprite DMA: vAmigaTS Agnus/Copper/oldJump jumpbpu1 16.1% -> 0.016%, jumpbpu2 7.4% -> 0.000%, jumpbpu3 5.9% -> 0.000%, jumpbpu4 1.5% -> 0.001% (the precession residual documented in #109). The blitter timing family net-improves (timing8/9/9f/12/15 -2.2, timing6/10/15l +1..2 - the remaining spread sits in the blitter model). The sprena/sprdis DMACON sweeps (29-43%) still need the true two-slot fetch split - the atomic per-line sprite capture cannot express a word-level SPREN edge between a sprite's two slots; unchanged here.

KS1.3, Inside the Machine, Hamazing, Gen-X, and Zool screenshots are byte-identical to main. Second Nature's falling-leaves scene renders identically with a small animation-phase drift (timing-sensitive demo; chip-bus contention legitimately changed).
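The slot formula in the commit message above can be written out as a small helper. This is only an illustrative sketch: `sprite_dma_slots` is a hypothetical name, not a function from Copperline or vAmiga, and it encodes nothing beyond the $15+4N / $17+4N rule quoted above.

```rust
/// Colour clocks of the two DMA slots owned by sprite `n`.
/// Hypothetical helper: it only encodes the $15 + 4N / $17 + 4N rule
/// quoted in the commit message, not the arbiter's real data structures.
pub fn sprite_dma_slots(n: u8) -> Option<(u16, u16)> {
    if n > 7 {
        return None; // the chipset has eight hardware sprites (0..=7)
    }
    let first = 0x15 + 4 * u16::from(n);
    Some((first, first + 2))
}

/// True when `cck` is one of the sixteen per-sprite slots.
pub fn is_sprite_slot(cck: u16) -> bool {
    (0..8u8).any(|n| {
        let (a, b) = sprite_dma_slots(n).expect("n is in range");
        cck == a || cck == b
    })
}
```

Under this rule sprite 0 owns $15 and $17 and sprite 7 owns $31 and $33, so every slot is odd and the whole band ends four clocks before the old $19..$37 reservation did.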
Summary
Replace the value-range DDF window with the Agnus DDF sequencer flop model
for FMODE=0 bitplane fetches: DDFSTRT/DDFSTOP are comparator EDGES driving
flip-flops, a stop request drains through one final fetch unit (which
applies the modulos per plane), the hardwired window ($18/$D8, HARDDIS
$E0) gates starts and forces stops, and the state carries across line
boundaries.
src/chipset/ddf_sequencer.rs: the free-standing flop model (OCS and ECS
rule variants, transcribed from vAmiga 4.4's Sequencer).
src/bus/ddf_line.rs: per-line walk driving the slot arbiter and the DMA
capture loop; register writes rebuild the line table at their
hardware commit clocks (DDF writes +4 ccks; an old DDFSTOP still fires
on its commit clock, an old DDFSTRT does not).
Wide-FMODE (AGA quantum > 1) fetches keep the value-window
plan; vAmiga has no AGA counterpart to transcribe.
This gives Copperline the hardware's "old stop" behaviours a value range
cannot express: missed/invalid DDFSTOP runs continue to the hardware-stop
drain, a DDFSTRT match past $D8 starts a run that wraps through
horizontal blanking into the next line, an early-blanked DDFSTRT never
starts on OCS, and ECS's latched BPHSTART restarts runs at the hard
window.
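To make the difference from a value range concrete, here is a deliberately reduced sketch of a start/stop flop walk. It is not the code in ddf_sequencer.rs and not vAmiga's rule set: `DdfFlops`, `units_in_line`, the 8-cck lo-res fetch unit, the 227-cck line length, and the rule that a stop request turns the next unit boundary into the final unit are all assumptions made for illustration. The ECS BPHSTART latch, HARDDIS, and register commit delays are left out.

```rust
// Hardwired window values quoted in the summary above.
const HARD_START: u16 = 0x18;
const HARD_STOP: u16 = 0xD8;
// Assumption: lo-res FMODE=0 fetch unit of 8 colour clocks.
const UNIT_CCKS: u16 = 8;
// Assumption: 227 colour clocks per line.
const LINE_CCKS: u16 = 0xE3;

/// Toy sequencer state; it persists across lines because the caller owns it.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct DdfFlops {
    running: bool,   // set by a DDFSTRT comparator match
    stop_req: bool,  // set by a DDFSTOP match or the hardware stop
    last_unit: bool, // the unit in flight is the draining one
    unit_pos: u16,   // colour clock inside the current fetch unit
}

impl DdfFlops {
    pub fn is_running(&self) -> bool {
        self.running
    }

    /// Advance one colour clock at position `h`.
    /// Returns true when a new fetch unit begins on this clock.
    pub fn step(&mut self, h: u16, ddfstrt: u16, ddfstop: u16) -> bool {
        // Start comparator: an exact match, ignored before the hard start.
        if !self.running && h == ddfstrt && h >= HARD_START {
            *self = DdfFlops { running: true, ..DdfFlops::default() };
        }
        if !self.running {
            return false;
        }
        // Stop comparators only request a stop; they do not end the run.
        if h == ddfstop || h == HARD_STOP {
            self.stop_req = true;
        }
        let unit_starts = self.unit_pos == 0;
        if unit_starts && self.stop_req {
            self.last_unit = true; // drain through this one final unit
            self.stop_req = false;
        }
        self.unit_pos += 1;
        if self.unit_pos == UNIT_CCKS {
            self.unit_pos = 0;
            if self.last_unit {
                self.running = false;
                self.last_unit = false;
            }
        }
        unit_starts
    }
}

/// Walk one line and count the fetch units that started on it.
pub fn units_in_line(flops: &mut DdfFlops, ddfstrt: u16, ddfstop: u16) -> u32 {
    (0..LINE_CCKS)
        .filter(|&h| flops.step(h, ddfstrt, ddfstop))
        .count() as u32
}
```

In this sketch DDFSTRT $38 with DDFSTOP $D0 starts 20 units, the same DDFSTRT with a DDFSTOP that never matches during the run starts 21 (the hardware stop at $D8 supplies the drain), DDFSTRT $10 starts none, and a start at $DC leaves the flop set at the end of the line. A value-range comparison has no state in which to express the last three cases.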
Hardware arbitration
The vAmigaTS Agnus/DDF/DDF/oldhwstop1-4 tests exercise exactly these
behaviours, and each has real A500 photos (OCS and ECS). The bottom
colour-swatch band in those tests is a cumulative hash of every preceding
row's fetched word count (a 5-plane interleaved image with modulos: every
row's word count moves the bitplane pointers), and the photos match
vAmiga's band exactly while mismatching the old Copperline model
completely - real hardware validates the flop model's per-row fetch
history in aggregate.
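Why the band acts as a cumulative check can be sketched with the pointer arithmetic alone. The helper names below are hypothetical and chip-RAM address masking is ignored; the only assumption is the usual rule that a row advances the pointer by two bytes per fetched word and then adds the modulo.

```rust
/// Advance one bitplane pointer across a row: two bytes per fetched word,
/// then the signed modulo (in bytes). Address masking is ignored here.
pub fn advance_row(ptr: u32, words: u32, modulo: i16) -> u32 {
    ptr.wrapping_add(words.wrapping_mul(2))
        .wrapping_add(modulo as i32 as u32) // sign-extend, then wrap
}

/// Pointer value after a sequence of rows with the given word counts.
pub fn pointer_after(start: u32, words_per_row: &[u32], modulo: i16) -> u32 {
    words_per_row
        .iter()
        .fold(start, |ptr, &words| advance_row(ptr, words, modulo))
}
```

A single row that fetches one word more than the model predicts shifts every later row by two bytes, so rows near the bottom of the frame only line up if the whole preceding fetch history was right.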
Scores (vs vAmiga 4.4 refs, X_SHIFT=0)
Full Agnus+Denise sweep audit (776 cases vs the post-#107 baseline):
Agnus mean 12.06% -> 10.99%, Denise flat. Large collateral improvements
from the same flop semantics: the whole Agnus/Registers/oldDMACON family
(bplen1-5/oldbplen1-5 22-47% -> 0.5-13%, bplon1-4 24-28% -> 0-0.5%,
bplon1h-4h 26-31% -> 12-18%), Agnus/Registers/BPLPTR drop/dropcpu
(15-16% -> 3.5%), DMACON resmod2-4 (8-12% -> 0.4-2.2%), BPLCON0 block0-3,
and blitter timing0/6/10/15l. Remaining small deltas are the
slot-cadence interplay in the blitter-timing family (+-2%, mixed
directions - the hires per-plane slot offsets are now the hardware ones,
so the residual sits in the blitter model) and the DDFTIM write-timing
family (parity on average; the +4-cck commit delay was sweep-calibrated,
3/4/5 all within 0.2%). One case regressed meaningfully: Copper
oldJump/jumpbpu2 (0% -> 7.4%) while its siblings improved - the
free-running copper-loop precession class, where any one-slot cadence
change re-phases the loop (a previous attempt to tune this class was
reverted; jumpbpu1 improved by the same mechanism).
diwv3/diwv4 pinned a subtlety: lines above the captured framebuffer must
not advance bitplane pointers (a DIWSTRT.V inside vertical blanking would
otherwise skew every visible row); the capture keeps the pre-FSM
semantics there, matching the vAmiga reference dumps.
The oldhwstop3/4 residuals are the render-side placement of the exotic
rows themselves (fetch runs that wrap through hblank render at a single
per-row origin today); the pointer progression - which dominates the
image - is already exact. Renderer run-origin support is the follow-up.
Verification
cargo test --release green (1284 tests: 18 value-window tests
re-derived to flop semantics with rule comments; new FSM rule suite +
bus-level table tests), cargo clippy, cargo fmt --check clean.
and cache probes; a 15s KS1.3 boot runs ~12% faster than main.
set: Gen-X (110s/620s), Inside the Machine, Hamazing, KS1.3 boot,
Second Nature, A1200 KS3.1 boot (hires FMODE=3), Zool (AGA) - the FSM
reproduces the value model exactly on sane register patterns.
perf test, fails on clean main too).
Update: renderer run-origin support (second commit)
The follow-up landed in this PR too: the capture records each row's fetch
run origin, and the renderer synthesizes the row's DDF geometry from it,
so rows whose fetch diverges from the register window paint what the DMA
fetched. Further movement (vs the first commit): oldhwstop3 16.7% -> 9.5%,
oldhwstop4 14.2% -> 7.5%, single4 16.0% -> 3.1%, single5 8.1% -> 2.4%,
hwstop2 5.2% -> 1.7%, hwstop4/5 10.0% -> 6.4%, hwstop6 7.3% -> 3.7%.
STATE_VERSION 16. Byte-identity re-verified (KS1.3, Inside the Machine,
Zool). Remaining in this family: hblank-wrapped runs still paint with the
register view, and arosddf1-4 (12.7%) is a separate ECS class.
Update: shifter reload-grid placement (third commit)
A second photo arbitration rode along: the arosddf1 A500 ECS photo shows
the DDFSTRT $3C lo-res picture at the $40 shifter reload slot relative to
the copper-anchored ruler dashes (both ruler ends agree) - FMODE=0
placement rounds UP to the reload grid, not down. arosddf1-3 12.8% ->
0.008%, arosddf4 -> 0.07%, ddf3/4/7/8 1.6% -> 0.1%. On-grid starts (every
previously calibrated case) unchanged; AROS/KS1.3/Zool byte-identical.
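The rounding rule can be sketched as follows. The grid sizes are the ones stated in the third commit message (8 colour clocks in lo-res, 4 in hi-res, 2 in SHRES at FMODE=0); the `Resolution` enum, the function names, and the assumption that the grid is anchored at multiples of its size are illustrative and not taken from the renderer.

```rust
/// Display resolution at FMODE=0 (hypothetical enum for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Resolution {
    LoRes,
    HiRes,
    SuperHiRes,
}

/// Shifter reload grid in colour clocks, as stated in the commit message.
pub fn reload_grid(res: Resolution) -> u16 {
    match res {
        Resolution::LoRes => 8,
        Resolution::HiRes => 4,
        Resolution::SuperHiRes => 2,
    }
}

/// First reload slot at or after `start_cck`.
/// Assumption: slots sit at multiples of the grid size.
pub fn reload_slot(start_cck: u16, res: Resolution) -> u16 {
    let grid = reload_grid(res);
    let rem = start_cck % grid;
    if rem == 0 {
        start_cck // already on the grid: placement is unchanged
    } else {
        start_cck + (grid - rem) // off-grid data waits for the NEXT slot
    }
}
```

For the arosddf1 case, a lo-res start at $3C lands on $40, whereas floor rounding would have given $38; a start at $38 is already on the grid and stays where it is.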