v1.1 -- OLED UI overhaul + HG_PIN_DAP XIP-cache-contention fix #4
Merged
…[GATE-PENDING] Candidate fix for the v1.1 retryable CMSIS-DAP framing-desync regression that the interleaved A/B soak proved on the bench (v1.0 ~0.2% vs v1.1 ~1.4-3.0% retryable, 0 stalls both, same bench/host/cable -- only the firmware image differing).
Root cause (root-cause workflow + ELF artifact diff): v1.1's heavier +0 OLED render loop (new 1242 B ssd1306_blit, +804 B screen_home, ghost + status bar = 8 blits/frame vs 0) churns a larger flash-instruction working set through the shared 16 KB XIP cache every frame, evicting the flash-resident CMSIS-DAP response-framing path. The extra QSPI cache-refill latency lands inside the USB-IN response window -> wrong-command-ID / short-transfer / IN-timeout desyncs. 0-stall because the USB device ISR is RAM-resident and keeps acking the bus; it is added LATENCY, not a lock.
Task PRIORITY cannot help -- the XIP cache + QSPI bus are hardware shared regardless of which task runs (this falsifies the design-doc claim in docs/hackagotchiUI_upgrade_v1.1.md that the UI "cache footprint can't threaten DAP timing").
Fix: take the DAP path out of the contended cache entirely. HG_PIN_DAP=ON selects a custom linker script (memmap_hackagotchi_pin.ld = verbatim SDK rp2040 memmap_default.ld with the 7 DAP/USB-vendor transaction objects added to the flash .text EXCLUDE_FILE), so their .text runs from SRAM (via the .data section, copied at boot).
Pinned objects: DAP.c sw_dp_pio.c probe.c tusb_edpt_handler.c usbd.c vendor_device.c dcd_rp2040.c -- the complete USB-vendor -> DAP -> SWD-PIO call tree incl. dap_thread.
NOT a switch to copy_to_ram: XIP-default stays for everything else. RESIDENCY CHANGE ONLY -- no FreeRTOS priority change; nothing new runs at/above the DAP path. Cost ~18.7 KB of the +139 KB free-SRAM headroom (RAM use ~101 KB / 256 KB).
Desk-verified: builds clean; analyze.sh PASS (0 source-file changes); nm confirms all 9 hot functions moved 0x10xx -> 0x20xx; HG_PIN_DAP=OFF is byte-identical to stock v1.1 (uf2 sha 4ada43a7).
HARDWARE SOAK PENDING -- bar before claiming the fix:
./tests/gates/gate1_soak.sh 500 # pinned <=0.4% / 0 stalls (vs v1.1 1.4-3.0%)
Signed-off-by: Pratheek Balakrishna <pratheekb96@gmail.com>
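The nm residency check described in this commit can be scripted. The following is a minimal Python sketch, not the project's tooling. It assumes nm prints "address type name" lines with 8-hex-digit addresses, and relies on the RP2040 address map (XIP flash window at 0x10000000, SRAM at 0x20000000). The helper name classify_symbols and the sample addresses are hypothetical and chosen only for illustration.

```python
import re

XIP_FLASH_TOP_BYTE = 0x10  # RP2040 XIP flash window starts at 0x10000000
SRAM_TOP_BYTE = 0x20       # RP2040 SRAM starts at 0x20000000

# Assumed nm line shape: "<8 hex digits> <type letter> <symbol name>"
NM_LINE = re.compile(r"^([0-9a-fA-F]{8})\s+(\S)\s+(\S+)$")


def classify_symbols(nm_output, wanted):
    """Map each wanted symbol to 'sram', 'flash', 'other', or 'missing'."""
    found = {}
    for line in nm_output.splitlines():
        match = NM_LINE.match(line.strip())
        if not match:
            continue  # skip undefined symbols and anything oddly shaped
        addr = int(match.group(1), 16)
        name = match.group(3)
        if name not in wanted:
            continue
        top = addr >> 24
        if top == SRAM_TOP_BYTE:
            found[name] = "sram"
        elif top == XIP_FLASH_TOP_BYTE:
            found[name] = "flash"
        else:
            found[name] = "other"
    return {name: found.get(name, "missing") for name in wanted}


# Hypothetical nm excerpt; these addresses are invented for the example.
SAMPLE = """\
20001a34 T DAP_ProcessCommand
20002c10 T tud_task_ext
10004f80 T ssd1306_blit
"""

result = classify_symbols(
    SAMPLE,
    ["DAP_ProcessCommand", "tud_task_ext", "ssd1306_blit", "dap_thread"],
)
for name, where in result.items():
    print(f"{name}: {where}")
```

Reporting 'missing' separately from 'flash' matters here: a symbol that was renamed or inlined would otherwise look like a pass when it was never checked at all.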
…PENDING]
A firmware-side WITNESS to the R1 "0 DAP transfer stalls" invariant: a monotonic
dap_xfers count + dap_idle_ms (time since the last DAP command), so every soak
can cross-check that transfers advanced by the expected amount and the probe was
live throughout. A soak whose counter never moves is a silent pass.
Wiring avoids an upstream source shadow: -Wl,--wrap=DAP_ExecuteCommand routes the
(stable CMSIS-DAP) execute call through __wrap_DAP_ExecuteCommand in the new owned
src/dap_health.c, which does the real work then records a non-blocking witness
(counter ++ and one time_us_32 read). Re-diff surface on a debugprobe bump (cf.
backlog #8 v2.3.1 spike) stays zero — no tusb_edpt_handler.c copy to maintain.
- src/dap_health.{c,h}: counters + getters; single-writer (the DAP task) / lock-
free readers (32-bit aligned word reads are atomic on Cortex-M0+; no lock on the
DAP path).
- cdc1_control.c: surface dap_xfers + dap_idle_ms in write_status (buffer 256->320
so the longer line can never silently truncate).
- host/hackagotchi_ctl.py: print a "probe dap_xfers=… dap_idle_ms=…" line.
RUNS AT DAP PRIORITY: the wrapper IS the DAP execute call. It is non-blocking, but
this is a DAP-PATH CHANGE and MUST be re-gated before merge:
ADVERSARIAL: build, flash, then on the bench:
./tests/gates/gate1_soak.sh 1000 # bar: 0 fails + 0 stalls
.venv/bin/python tests/m2/coexist_soak.py 300 # 0 stalls AND unchanged retryable rate
# cross-check: {"q":"status"} dap_xfers advances by ~the soak's transfer count
Desk-verified only so far: builds clean, analyze.sh PASS (dap_health.c +
cdc1_control.c 0/0), and the disassembly confirms dap_thread -> __wrap -> real.
On a BRANCH deliberately — do not merge to main until the soak is green.
Signed-off-by: Pratheek Balakrishna <pratheekb96@gmail.com>
(cherry picked from commit 5183048bb260d59b0ab1b23763856b0c2827db87)
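The dap_xfers cross-check from this commit message can be expressed as a small host-side helper. This is a sketch under stated assumptions: the status line is taken to contain "dap_xfers=<n> dap_idle_ms=<n>" as plain decimal fields, the counter is treated as an unsigned 32-bit value that may wrap, and the 10% tolerance is an arbitrary illustrative choice for "~the soak's transfer count". The names parse_status and witness_check are hypothetical.

```python
import re

U32 = 1 << 32  # assumed counter width: one 32-bit word


def parse_status(line):
    """Extract (dap_xfers, dap_idle_ms) from an assumed 'key=value' status line."""
    xfers = re.search(r"dap_xfers=(\d+)", line)
    idle = re.search(r"dap_idle_ms=(\d+)", line)
    if not (xfers and idle):
        raise ValueError("status line lacks dap_xfers/dap_idle_ms")
    return int(xfers.group(1)), int(idle.group(1))


def witness_check(before, after, expected, tolerance=0.10):
    """Compare the counter advance with the soak's expected transfer count."""
    advanced = (after - before) % U32  # modulo handles a 32-bit wrap
    if advanced == 0:
        return "SILENT_PASS"  # the counter never moved, so the soak proved nothing
    if abs(advanced - expected) <= tolerance * expected:
        return "OK"
    return "MISMATCH"


# Hypothetical readings taken before and after a soak of 500 transfers.
start, _ = parse_status("probe dap_xfers=1000 dap_idle_ms=3")
end, _ = parse_status("probe dap_xfers=1510 dap_idle_ms=1")
print(witness_check(start, end, expected=500))
```

A zero advance is reported as its own verdict instead of a generic mismatch, since that is exactly the silent-pass case the witness exists to catch.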
… image) When HG_PIN_DAP and the dap-health telemetry are both built, the --wrap routes every DAP_ExecuteCommand through __wrap_DAP_ExecuteCommand (src/dap_health.c). Add dap_health.c.o to the pin EXCLUDE_FILE list so the wrapper runs from SRAM too -- otherwise it reintroduces one small flash-resident hop on the otherwise-pinned DAP path. No-op when dap_health.c.o is absent (EXCLUDE_FILE of a missing object matches nothing).
Verified: combined image builds, analyze.sh PASS, nm shows __wrap_DAP_ExecuteCommand + DAP_ProcessCommand + tud_task_ext in SRAM.
Signed-off-by: Pratheek Balakrishna <pratheekb96@gmail.com>
The v1.1 DAP regression taught two transferable lessons; write them into the day-to-day contract and the bring-up playbook so the next render-heavy change doesn't relearn them:
- firmware-conventions.md §2: the XIP cache is PRIORITY-BLIND. Priority schedules CPU, not the shared 16 KB XIP cache / QSPI bus. A non-blocking +0 task (the OLED render loop) can evict the flash-resident DAP path -> retryable USB framing desyncs that stay 0-stall (pass the R1 hard bar) yet regress the strict Gate-1 retryable rate. Mitigation: HG_PIN_DAP (pin DAP/USB objects to SRAM, residency only). Watch the retryable rate, not just stalls.
- mcu-bringup-playbook.md §10: "0 stalls != no regression" + the interleaved A/B-against-the-shipped-image method (candidate last-in-time, power-cycle each, sha-matched gold .uf2; a power-cycle making it WORSE falsifies dirty-bench) + the dap_xfers transfer-counter cross-check against the silent pass.
(Companion user skills run-hil-gate/firmware-gate updated in parallel.)
Signed-off-by: Pratheek Balakrishna <pratheekb96@gmail.com>
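The interleaved A/B method named in the playbook entry can be sketched as a run plan. This is an illustration only, with several assumptions: "candidate last-in-time" is read as the candidate image taking the final soak slot, each slot is ordered flash, power-cycle, soak, and the gold image is accepted only if its SHA-256 matches a recorded digest. The names build_schedule and gold_matches are hypothetical, not part of the repo.

```python
import hashlib


def gold_matches(image_bytes, expected_sha256):
    """True when the gold image's SHA-256 equals the recorded hex digest."""
    return hashlib.sha256(image_bytes).hexdigest() == expected_sha256.lower()


def build_schedule(rounds):
    """Alternate gold and candidate soaks; the candidate always runs last."""
    steps = []
    for _ in range(rounds):
        for image in ("gold", "candidate"):
            steps.append(("flash", image))
            steps.append(("power_cycle", image))  # assumed: cold boot before each soak
            steps.append(("soak", image))
    return steps


schedule = build_schedule(rounds=2)
for action, image in schedule:
    print(f"{action:12s} {image}")
```

Interleaving puts both images on the same bench, host, and cable within minutes of each other, so a drifting bench affects both arms instead of only the one that happened to run later.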
The v1.1 OLED overhaul churns the 16 KB XIP instruction cache enough to evict the flash-resident CMSIS-DAP framing path -- a 0-stall retryable-desync regression (~1.4-3.0% vs v1.0's ~0.2%, proven by interleaved A/B). HG_PIN_DAP routes the 7 DAP/USB transaction objects to SRAM via memmap_hackagotchi_pin.ld so they fall out of the contended cache. Flip the default OFF->ON in both CMakeLists.txt and build_fork.sh so a plain ./build_fork.sh ships the fixed image; HG_PIN_DAP=OFF still reproduces the pre-fix image for an A/B soak.
DAP-path impact: residency change ONLY -- no FreeRTOS priority change, nothing new runs at or above the DAP path, no upstream edit. ~18.8 KB of the +139 KB XIP SRAM win. Pinned image soaks 0/500 (Gate 1 PASS).
Verified on the artifact: nm shows all 7 objects at 0x2000xxxx; analyze.sh PASS.
Signed-off-by: Pratheek Balakrishna <pratheekb96@gmail.com>
…ct falsified cache claims
- RELEASE_NOTES_v1.1.md: rewrite with a feature table, the honest "regression we caught and fixed" story, an upgrade-from-v1.0 path, and a verified-evidence block.
- CHANGELOG.md [1.1]: add the HG_PIN_DAP XIP-contention fix + DAP-health telemetry section; drop the "candidate/pending" line.
- release-readiness.md: retitle to v1.1, new release identity (ver 1.1.0, tag v1.1, artifact shas, HG_PIN_DAP=ON), and a new Section 0 documenting the finding -> A/B -> fix -> soak chain.
- hackagotchiUI_upgrade_v1.1.md: correct the three falsified cache-thrash claims (annotated [Corrected after v1.1 HIL], not silently rewritten) -- the light v1.0 UI never threatened the XIP cache, the heavy v1.1 UI does.
Honest reporting: the combined 1000-cycle soak's strict Gate-1 verdict was FAIL (fails=2, 0.2%, 0 stalls) on a non-idle host; the docs state that verbatim and ship on the strength of the A/B (~7-15x regression eliminated) + the pure-pin 0/500 PASS, not on that run. An idle-host 1000-cycle re-run is the one open item. Claims adversarially re-verified against the artifact + recorded soak outputs.
Signed-off-by: Pratheek Balakrishna <pratheekb96@gmail.com>
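The figures in the honest-reporting note follow from simple arithmetic, worked here as a short Python sketch. The soak numbers (2 fails in 1000 cycles, 0 fails in 500, rates of ~1.4-3.0% vs ~0.2%) come from the commit messages. Modelling the strict Gate-1 bar as "0 fails and 0 stalls" via max_rate_pct=0.0 is an assumption about how the gate scores a run, and the function names are hypothetical.

```python
def rate_pct(fails, cycles):
    """Retryable-failure rate as a percentage of soak cycles."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return 100.0 * fails / cycles


def gate_verdict(fails, stalls, cycles, max_rate_pct):
    """PASS only with zero stalls and a failure rate at or under the bar."""
    if stalls == 0 and rate_pct(fails, cycles) <= max_rate_pct:
        return "PASS"
    return "FAIL"


def regression_factor(regressed_pct, baseline_pct):
    """How many times worse the regressed rate is than the baseline."""
    return regressed_pct / baseline_pct


# Combined 1000-cycle soak: 2 fails, 0 stalls, scored against a 0-fail bar.
combined_rate = rate_pct(2, 1000)
strict_verdict = gate_verdict(2, 0, 1000, max_rate_pct=0.0)

# Pure-pin 500-cycle soak: 0 fails, 0 stalls.
pinned_verdict = gate_verdict(0, 0, 500, max_rate_pct=0.0)

# v1.1 unpinned (~1.4-3.0%) against the v1.0 baseline (~0.2%).
factor_low = round(regression_factor(1.4, 0.2), 1)
factor_high = round(regression_factor(3.0, 0.2), 1)

print(combined_rate, strict_verdict, pinned_verdict, factor_low, factor_high)
```

The rounding on the two factors is deliberate: dividing these decimal fractions in binary floating point does not always land exactly on a whole number.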
Cuts v1.1 (ver 1.1.0, tag v1.1): the OLED UI overhaul (cat + Spectre) plus the HG_PIN_DAP fix it turned out to need, plus DAP-health telemetry.
The story
The v1.1 render loop (blit engine + ghost compositing, ~8 blits/frame) churns the 16 KB XIP instruction cache enough to evict the flash-resident CMSIS-DAP framing path → a 0-stall retryable-desync regression (~1.4–3.0% vs v1.0's ~0.2%, proven by interleaved A/B against the shipped v1.0 image). Priority can't help: it schedules CPU, not the shared cache. Fix: HG_PIN_DAP pins the 7 DAP/USB transaction objects to SRAM (residency only; no priority change, no upstream edit), ON by default. Pinned image soaks 0/500.
What's in this PR
- build(dap): flip HG_PIN_DAP default OFF→ON (CMakeLists + build_fork.sh).
- …memmap_hackagotchi_pin.ld), dap_health --wrap witness, and dap_xfers / dap_idle_ms telemetry.
- docs(release): v1.1 release notes / CHANGELOG / release-readiness §0; corrected the 3 falsified cache-thrash claims in the design doc.
Verification (artifact-decoded, not source-trusted)
- analyze.sh PASS; ver=1.1.0; nm confirms all 7 DAP/USB objects in SRAM (0x2000xxxx).
- .uf2 sha256 b0826090…d683 · .elf 56c5c975…5bec.
- Combined 1000-cycle soak: strict Gate-1 verdict FAIL (fails=2, 0.2%, 0 stalls) on a non-idle host; we ship on the A/B + the 0/500, with an idle-host 1000-cycle re-run as the one open item. See docs/release-readiness.md §0.
After merge: tag v1.1 + GitHub release (.uf2 + .elf + NOTICE + LICENSE).
🤖 Generated with Claude Code