`CHANGELOG.md` (30 additions, 3 deletions)

## [1.1] - 2026-06-22

_Candidate: the firmware on this commit builds + passes the static-analysis gate; on-hardware
re-attestation and the GitHub release are pending (see [`docs/release-readiness.md`](docs/release-readiness.md)).
Drop this line when v1.1 is published._

The **OLED UI overhaul** (a cat and Spectre the ghost give the probe a face) shipped together with the
**`HG_PIN_DAP` reliability fix** that the overhaul turned out to need, plus DAP health telemetry. Still
**R1-clean: 0 DAP transfer stalls.**

### OLED UI overhaul — the cat + Spectre the GhostLabs ghost (M-UI-1..5)
A full dashboard glow-up where **every flicker of personality is a literal readout of a real
…

Footprint: text +3.3 KB, bss +164 B total. Static-analysis gate green; the blit host unit test and an
extended `screen_hil.py` attest the new surfaces.
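A clipped 1-bit blit like `ssd1306_blit()` is exactly the kind of pico-free core a host unit test can exercise. Below is a minimal per-pixel sketch, not the firmware's implementation: it assumes the standard SSD1306 page layout (128×64 pixels, 8 pages of 128 bytes, pixel row `y & 7` stored in one bit of each byte) and a hypothetical row-major, MSB-first sprite format. The names `fb_t`, `sprite_t`, `fb_blit()` and `fb_get()` are illustrative only. Clipping happens once on the sprite rectangle, so off-screen pixels are never addressed.

```c
#include <stdint.h>
#include <string.h>

#define FB_W 128
#define FB_H 64

/* SSD1306 page layout: pixel (x, y) is bit (y & 7) of byte [(y / 8) * FB_W + x]. */
typedef struct { uint8_t buf[FB_W * FB_H / 8]; } fb_t;

typedef enum { BLIT_OR, BLIT_ANDNOT, BLIT_XOR } blit_op_t;

/* Hypothetical sprite format: row-major, MSB-first, (w + 7) / 8 bytes per row. */
typedef struct { int w, h; const uint8_t *bits; } sprite_t;

void fb_clear(fb_t *fb) { memset(fb->buf, 0, sizeof fb->buf); }

int fb_get(const fb_t *fb, int x, int y) {
    return (fb->buf[(y >> 3) * FB_W + x] >> (y & 7)) & 1;
}

static int sprite_px(const sprite_t *s, int sx, int sy) {
    int stride = (s->w + 7) / 8;
    return (s->bits[sy * stride + sx / 8] >> (7 - sx % 8)) & 1;
}

/* Draw sprite s with its top-left corner at (x0, y0), clipped to the screen. */
void fb_blit(fb_t *fb, const sprite_t *s, int x0, int y0, blit_op_t op) {
    /* Visible part of the sprite, in sprite coordinates: [sx0, sx1) x [sy0, sy1). */
    int sx0 = x0 < 0 ? -x0 : 0;
    int sy0 = y0 < 0 ? -y0 : 0;
    int sx1 = x0 + s->w > FB_W ? FB_W - x0 : s->w;
    int sy1 = y0 + s->h > FB_H ? FB_H - y0 : s->h;
    for (int sy = sy0; sy < sy1; sy++) {
        for (int sx = sx0; sx < sx1; sx++) {
            if (!sprite_px(s, sx, sy)) continue;  /* clear sprite bits are transparent */
            int x = x0 + sx, y = y0 + sy;
            uint8_t mask = (uint8_t)(1u << (y & 7));
            uint8_t *b = &fb->buf[(y >> 3) * FB_W + x];
            switch (op) {
            case BLIT_OR:     *b |= mask; break;            /* set pixels */
            case BLIT_ANDNOT: *b &= (uint8_t)~mask; break;  /* erase pixels */
            case BLIT_XOR:    *b ^= mask; break;            /* invert pixels */
            }
        }
    }
}
```

A production blit would more likely shift whole sprite columns into page bytes instead of looping per pixel; the per-pixel form just keeps the clipping and the three raster ops easy to verify.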

### DAP reliability — XIP-cache-contention fix (`HG_PIN_DAP`, ON by default)
The heavier render loop above exposed a subtle **0-stall** regression, which we fixed before shipping:

- **The finding.** The firmware runs from flash XIP through a 16 KB instruction cache. The v1.1 render
loop (blit engine + ghost compositing, ~8 blits/frame) churns a large enough flash-instruction working
set to **evict the flash-resident CMSIS-DAP framing path** from that cache; the next transaction pays a
QSPI refill inside the USB-IN response window → **retryable** DAP framing desyncs (wrong command-ID /
short-transfer / IN-timeout). It stays **0-stall** (the RAM-resident USB ISR keeps acking the bus — added
*latency*, not a lock), so it passed the R1 hard bar yet regressed the strict Gate-1 *retryable* rate the
shipped v1.0 met (**~1.4–3.0% vs v1.0's ~0.2%**). Priority can't save it — when the DAP task runs, the
cache is already polluted. Proven by an **interleaved A/B against the shipped v1.0 image** on one bench
(candidate last-in-time → not bench drift). The *light* v1.0 UI never crossed this threshold; the v1.1 UI does.
- **The fix — `HG_PIN_DAP`.** A gated custom linker script (`memmap_hackagotchi_pin.ld`) drops the 7
DAP/USB transaction objects from flash `.text` (via `EXCLUDE_FILE`) so they run from **SRAM**, out of the
contended cache. **Residency change only — no FreeRTOS priority change, no upstream edit** — costing
~18.8 KB of the +139 KB XIP SRAM win. The pinned image soaks **0/500** (Gate 1 PASS, cleaner than v1.0's
own ~0.2%) — the clean reference. (A longer 1000-cycle run of the pin+telemetry image on a *non-idle*
host stayed **0-stall** but logged 2 retryable desyncs on one cycle — a strict-bar `FAIL` at ~0.2%, the
v1.0-class floor; an idle-host re-run for a clean 0/N is the one open item — see `release-readiness.md` §0.)
**ON by default from v1.1**; build `HG_PIN_DAP=OFF` to reproduce the pre-fix image for an A/B soak.
- **DAP health telemetry.** `{"q":"status"}` now reports `dap_xfers` (monotonic CMSIS-DAP commands
executed, via a `--wrap=DAP_ExecuteCommand` witness) and `dap_idle_ms` — a machine-checkable liveness
cross-check so a "clean" soak can't silently pass with a dead probe. The witness object is itself pinned.
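The point of `dap_xfers` is that a soak verdict can demand evidence of work, not merely the absence of errors. A minimal host-side sketch, assuming the host reads `{"q":"status"}` before and after a soak; `dap_health_t`, `soak_verdict()` and the `max_idle_ms` threshold are hypothetical, and `dap_idle_ms` is assumed to mean the time since the last executed command.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side copy of the two telemetry fields in the status reply. */
typedef struct {
    uint64_t dap_xfers;    /* monotonic count of CMSIS-DAP commands executed */
    uint32_t dap_idle_ms;  /* assumed: ms since the last executed command */
} dap_health_t;

typedef enum { SOAK_PASS, SOAK_FAIL_ERRORS, SOAK_FAIL_DEAD_PROBE } soak_verdict_t;

/* A soak passes only if it saw no errors AND the probe demonstrably did work:
   the counter advanced and the probe was active shortly before the last read. */
soak_verdict_t soak_verdict(dap_health_t before, dap_health_t after,
                            uint32_t stalls, uint32_t retryable,
                            uint32_t max_idle_ms) {
    if (stalls != 0 || retryable != 0) return SOAK_FAIL_ERRORS;
    bool advanced = after.dap_xfers > before.dap_xfers;
    bool recently_active = after.dap_idle_ms <= max_idle_ms;
    return (advanced && recently_active) ? SOAK_PASS : SOAK_FAIL_DEAD_PROBE;
}
```

Reporting a dead probe as its own verdict, separate from an error count, keeps "nothing failed" and "nothing ran" distinguishable in soak logs.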

See [`docs/firmware-conventions.md`](docs/firmware-conventions.md) §2 ("the XIP cache is priority-blind"),
[`docs/mcu-bringup-playbook.md`](docs/mcu-bringup-playbook.md) §10, and `docs/release-readiness.md`.

## [1.0] - 2026-06-21

First public release. A debug probe that is *also* a black-box flight recorder and a reactive,
…

`docs/RELEASE_NOTES_v1.1.md` (86 additions, 27 deletions)
## Hackagotchi probe firmware — v1.1 🛠️🐾👻

**The "give it a soul" release.** v1.0 shipped the hard part — a CMSIS-DAP debug probe, a UART-to-microSD
black-box recorder, and a reactive OLED dashboard, **all three on a single RP2040 core without ever
stalling a flash**. v1.1 gives that machine a *face*: a cat and **Spectre, the ghost**, where every
flicker of personality is a **literal readout of a real probe/recorder signal** — not decoration.

And because we hold ourselves to "prove it on hardware," the heavier graphics exposed a subtle, *zero-stall*
timing regression — so we caught it, root-caused it, and **fixed it before shipping** (the `HG_PIN_DAP`
story below). Still **R1-clean: 0 DAP transfer stalls.**

---

### 🆕 What's new since v1.0

| | Feature | Attested as |
|---|---|---|
| 👻 | **Spectre, the ghost** — its state *is* the target's soul: dozing (quiet) / live (talking) / pale (wedged) / glitch (SD fault) / exorcised, driven from real UART liveness | `g:<state>` |
| 🐱 | **Cat moods** — sleep / content / hunting / alert from live signals; flying-data particle speed scales with throughput | `cat:<mood>` |
| 🎨 | **A real graphics engine** — `ssd1306_blit()`, a clipped 1-bit sprite blit (OR / ANDNOT / XOR), host-unit-tested, fed by an ASCII-art sprite pipeline. Sprites are flash-resident (~0 RAM) | `blit_test.c` |
| 📊 | **Persistent status bar** on every screen (REC / SD glyphs + a ghost pip) | `BAR …` line |
| 💀 | **Resurrection tally** — wedge→recover + fault counts, edge-counted in the 50 Hz SD task (never misses a fast edge) | on UPTIME |
| 🕹️ | **Companion interaction over CDC1** (no button): `pet`, `summon`/`banish`, `exorcise` (auto-fired after a clean reflash), `ghost` (mute → pure instrument), `theme` (motion density) | CDC1 verbs |
| 🛡️ | **`HG_PIN_DAP`** — DAP/USB hot path pinned to SRAM (the XIP-cache fix below). **On by default.** | `dap_xfers` = live |
| 📈 | **DAP health telemetry** — `{"q":"status"}` now reports `dap_xfers` (transfers executed) + `dap_idle_ms` | `status` reply |
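The CDC1 companion verbs form a tiny line protocol. One way to dispatch them is a table lookup on the first whitespace-delimited token, sketched below under stated assumptions: one verb per line, an optional argument after whitespace, and hypothetical names (`cdc1_verb_t`, `cdc1_parse()`). The firmware's actual parser and argument syntax may differ.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    VERB_UNKNOWN, VERB_PET, VERB_SUMMON, VERB_BANISH,
    VERB_EXORCISE, VERB_GHOST, VERB_THEME
} cdc1_verb_t;

static const struct { const char *name; cdc1_verb_t verb; } k_verbs[] = {
    {"pet", VERB_PET},           {"summon", VERB_SUMMON}, {"banish", VERB_BANISH},
    {"exorcise", VERB_EXORCISE}, {"ghost", VERB_GHOST},   {"theme", VERB_THEME},
};

/* Match the first token of a CDC1 line against the verb table.
   If arg is non-NULL it receives the remainder of the line (possibly empty). */
cdc1_verb_t cdc1_parse(const char *line, const char **arg) {
    while (*line == ' ' || *line == '\t') line++;   /* skip leading blanks */
    size_t n = strcspn(line, " \t\r\n");            /* token length */
    const char *rest = line + n;
    while (*rest == ' ' || *rest == '\t') rest++;
    if (arg) *arg = rest;
    for (size_t i = 0; i < sizeof k_verbs / sizeof k_verbs[0]; i++) {
        /* exact match only: "petting" must not dispatch as "pet" */
        if (strlen(k_verbs[i].name) == n && strncmp(line, k_verbs[i].name, n) == 0)
            return k_verbs[i].verb;
    }
    return VERB_UNKNOWN;
}
```

A table keeps the verb set in one place, so adding a verb is a one-line change and unknown input falls through to a single rejection path.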

The whole character layer renders at the **lowest priority off snapshot-only reads** — it never touches
the DAP hot path. (v1.0's full feature set — probe, recorder, two CDCs, crash box, watchdog, SD explorer —
is all unchanged and carried forward.)
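Rendering "off snapshot-only reads" means the character layer derives its state from a copied struct instead of touching live probe or recorder state. A minimal sketch of Spectre's state selection in that style; the input fields, the 2 s quiet window and the precedence order (most severe condition wins) are assumptions, not the firmware's actual rules.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { G_DOZING, G_LIVE, G_PALE, G_GLITCH, G_EXORCISED } ghost_state_t;

/* Hypothetical snapshot, copied once per frame by the render task;
   the renderer reads only this copy, never live probe/recorder state. */
typedef struct {
    uint32_t ms_since_uart_rx;  /* time since the target last sent a UART byte */
    bool target_wedged;
    bool sd_fault;
    bool exorcise_pending;      /* e.g. set by the CDC1 "exorcise" verb */
} ghost_inputs_t;

#define GHOST_QUIET_MS 2000u    /* hypothetical "still talking" window */

ghost_state_t ghost_state(const ghost_inputs_t *in) {
    /* Precedence: most severe / most specific condition wins. */
    if (in->exorcise_pending) return G_EXORCISED;
    if (in->sd_fault)         return G_GLITCH;
    if (in->target_wedged)    return G_PALE;
    if (in->ms_since_uart_rx < GHOST_QUIET_MS) return G_LIVE;
    return G_DOZING;
}

/* Name used for a "g:<state>" attestation line. */
const char *ghost_attest(ghost_state_t g) {
    static const char *const names[] = {"dozing", "live", "pale", "glitch", "exorcised"};
    return names[g];
}
```

Because the function is pure over the snapshot, it can be unit-tested on the host and can never block the DAP path.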

---

### 🛡️ Under the hood: the regression we caught (and fixed)

A debugger you can't trust isn't a debugger. So this is worth a paragraph.

The firmware runs from **flash XIP** through a 16 KB instruction cache (that choice buys +139 KB of SRAM at
identical DAP throughput). The new render loop churns a big enough flash-instruction working set that it
**evicts the CMSIS-DAP framing path from that cache** — and the next probe transaction pays a flash-refill
delay right inside the USB response window. The result was **retryable** DAP desyncs at **~1.4–3.0%**, where
shipped v1.0 sat at **~0.2%**. Crucially it was **still 0-stall** (the RAM-resident USB ISR keeps the bus
acked — it's added *latency*, not a hang), so it slipped past the hard correctness bar while quietly
regressing a softer one. **Task priority doesn't help here: priority schedules the CPU, not the shared
cache.** We proved it was the firmware (not bench noise) with an **interleaved A/B against the actual
shipped v1.0 image** on one bench, candidate last-in-time.
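The gap between the two bars is easy to lose, so here it is as arithmetic. This sketch assumes the retryable rate is retryable errors per soak cycle (which reproduces the ~0.2% quoted for 2 desyncs in 1000 cycles) and that the strict bar means a clean 0/N; `soak_result_t` and the helper names are hypothetical. Note that `r1_pass()` looks only at stalls, which is how the pre-fix image could pass R1 while regressing the retryable rate.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t cycles;            /* flash cycles in the soak */
    uint32_t stalled_cycles;    /* cycles where the DAP link hard-stalled */
    uint32_t retryable_errors;  /* recoverable desyncs (wrong ID, short xfer, IN timeout) */
} soak_result_t;

/* R1 hard bar: zero stalls. Retryable errors are deliberately ignored. */
bool r1_pass(const soak_result_t *r) { return r->stalled_cycles == 0; }

/* Strict bar (assumed): zero stalls AND zero retryable errors. */
bool strict_pass(const soak_result_t *r) {
    return r->stalled_cycles == 0 && r->retryable_errors == 0;
}

/* Retryable errors as a percentage of cycles. */
double retryable_pct(const soak_result_t *r) {
    return r->cycles ? 100.0 * r->retryable_errors / r->cycles : 0.0;
}

/* How many times worse a candidate rate is than a baseline rate. */
double regression_factor(double candidate_pct, double baseline_pct) {
    return baseline_pct > 0.0 ? candidate_pct / baseline_pct : 0.0;
}
```

For example, `regression_factor(1.4, 0.2)` and `regression_factor(3.0, 0.2)` come out at about 7 and 15, the bounds of the ~7–15× regression cited under *Verified on this image*.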

**The fix (`HG_PIN_DAP`, on by default):** a linker variant pins just the 7 DAP/USB transaction objects into
**SRAM**, out of the contended cache — a *residency* change only (no priority change, no upstream edit,
~18.8 KB of the SRAM headroom). The pinned image soaks **0/500**, back at/below the v1.0 floor, 0 stalls
throughout. And to make sure a "clean" soak can never silently pass with a dead probe, the probe now
self-reports a monotonic `dap_xfers` counter you can read before/after.

→ Full write-up: `docs/firmware-conventions.md` §2, `docs/mcu-bringup-playbook.md` §10, `docs/release-readiness.md` §0.

---

### ⬆️ Flash it / upgrade from v1.0 (no toolchain needed)

1. Download **`hackagotchi_probe.uf2`** below.
2. Enter BOOTSEL — on a running v1.0 unit, **hands-free**: send `{"q":"bootsel"}` to the control port
(there's no button — GP27 is SWDIO). Otherwise BOOTSEL the XIAO at power-up.
3. `picotool load -x hackagotchi_probe.uf2` (or drag the `.uf2` onto the `RPI-RP2` drive).
4. Confirm — send `{"q":"status"}` to the control serial port → `{"fw":"Hackagotchi","ver":"1.1.0",…,"dap_xfers":…}`.

Settings persist across the upgrade. No re-wiring; same pin map as v1.0.
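The confirmation in step 4 can be scripted. This is a deliberately naive sketch that checks the compact reply by substring match; it assumes no whitespace inside the JSON and the field spellings shown in step 4, and the function names are hypothetical. A real host tool should use a proper JSON parser.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* True if the reply identifies the Hackagotchi firmware at version 1.1.0. */
bool status_is_v11(const char *reply) {
    return strstr(reply, "\"fw\":\"Hackagotchi\"") != NULL &&
           strstr(reply, "\"ver\":\"1.1.0\"") != NULL;
}

/* Extract the numeric dap_xfers field; returns false if it is absent or not a number. */
bool status_dap_xfers(const char *reply, unsigned long long *out) {
    const char *key = "\"dap_xfers\":";
    const char *p = strstr(reply, key);
    if (p == NULL) return false;
    p += strlen(key);
    char *end;
    unsigned long long v = strtoull(p, &end, 10);
    if (end == p) return false;  /* no digits after the key */
    *out = v;
    return true;
}
```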

---

### ✅ Verified on this image

- **Build + static-analysis gate** (`analyze.sh`) — PASS; the two pristine TUs stay 0-warning.
- **Artifact provenance** — `strings` → `ver 1.1.0`; `nm` confirms all 7 DAP/USB objects resident in SRAM
(`0x2000xxxx`), i.e. the fix is actually in the binary, not just the source.
- **DAP under sustained flash** — the pinned image soaks **0/500** (Gate 1, **0 stalls**) — the clean
reference. A longer 1000-cycle run on a *non-idle* host stayed **0-stall** but logged **2 retryable
(recoverable) desyncs on a single cycle** — a strict-bar miss at ~0.2% (the v1.0-class idle floor), with
a live `dap_xfers` witness (~2 M transfers serviced) confirming the probe never went dark. The fix is
proven by the A/B (the ~7–15× regression is gone) + the 0/500; an idle-host 1000-cycle re-run for a
clean 0/N is the one open item (`docs/release-readiness.md` §0).
- The v1.0 HIL suite (probe / bridge / recorder / crash box / watchdog / CDC / SD) carries forward,
unaffected by a lowest-priority (`+0`) UI layer and a residency change. See `docs/release-readiness.md`.
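The `nm` provenance check comes down to classifying each pinned symbol's address against the RP2040 memory map. A host-side sketch, assuming the addresses have already been pulled out of `nm` output; the ranges follow the RP2040 address map (XIP flash aliases at `0x10000000`..`0x13FFFFFF`, SRAM at `0x20000000`..`0x20041FFF`) but are redefined locally rather than taken from pico-sdk, and `classify_addr()` / `all_pinned()` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* RP2040 address ranges, redefined locally for a host-side tool. */
#define XIP_BASE_ADDR  0x10000000u  /* start of the XIP flash aliases */
#define XIP_END_ADDR   0x14000000u  /* exclusive end of the XIP aliases */
#define SRAM_BASE_ADDR 0x20000000u  /* striped SRAM0-3, then SRAM4/SRAM5 */
#define SRAM_END_ADDR  0x20042000u  /* exclusive end of SRAM */

typedef enum { RES_FLASH_XIP, RES_SRAM, RES_OTHER } residency_t;

residency_t classify_addr(uint32_t addr) {
    addr &= ~1u;  /* drop the Thumb bit that function symbol values carry */
    if (addr >= SRAM_BASE_ADDR && addr < SRAM_END_ADDR) return RES_SRAM;
    if (addr >= XIP_BASE_ADDR && addr < XIP_END_ADDR) return RES_FLASH_XIP;
    return RES_OTHER;
}

/* True only if every listed hot-path symbol address is SRAM-resident. */
bool all_pinned(const uint32_t *addrs, int n) {
    for (int i = 0; i < n; i++)
        if (classify_addr(addrs[i]) != RES_SRAM) return false;
    return true;
}
```

Checking the built ELF rather than the linker script is what makes this a provenance check: it fails if the `EXCLUDE_FILE` patterns silently stop matching an object.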

---

### 📦 Assets

`hackagotchi_probe.uf2` · `hackagotchi_probe.elf` (to symbolicate crash dumps) · `THIRD-PARTY-NOTICES.md` · `LICENSE`

### 📝 Notes

- The firmware reports its own version live (`{"q":"status"}` → `ver` = `1.1.0`).
- **License:** project GPL-3.0-or-later; the `firmware/c/` subtree MIT. All dependencies permissive (MIT / BSD-3 / Apache-2.0).
- ⚠️ Under an *artificial* continuous-max-SD soak, retryable (0-stall) DAP errors can still appear — ~0 in
real use (the target is halted during a real flash). Run soaks on an idle host. Some target boards
re-glitch their QSPI under sustained hammering (still 0 stalls) — power-cycle the target between long soaks.

📓 Full changelog: `CHANGELOG.md` · 🔧 build from source: `docs/c-firmware-build.md` · 🛠️ build a unit: `docs/build-a-hackagotchi.md`
`docs/firmware-conventions.md` (20 additions, 0 deletions)

…
wedge (so do **not** watchdog the dashboard task; watchdog TUD instead — it is never legitimately
starved because DAP is below it).

**The XIP cache is priority-blind (Finding, v1.1).** Priority arbitrates CPU *scheduling*, not the
shared **16 KB XIP instruction cache or the QSPI flash bus**. The image runs from flash XIP and the
entire CMSIS-DAP response-framing path (`DAP_ProcessCommand`, `SWD_Transfer`, `tud_task_ext`,
`usbd_edpt_xfer`, the vendor/CDC endpoint glue) is flash-resident — only the USB device ISR
(`dcd_rp2040_irq`) is in SRAM. A *non-blocking* +0 task that churns a large flash-instruction working
set every frame (v1.1's OLED render loop: new blit engine + sprites + ghost compositing, 8 blits/frame)
**evicts the DAP path's cache lines**; the next transaction pays a QSPI refill that lands inside the
USB-IN response window → **retryable** DAP framing desyncs (wrong command-ID / short-transfer /
IN-timeout). It stays **0-stall** (the RAM-resident ISR keeps acking the bus — added *latency*, not a
lock), so it PASSES the R1 hard bar yet **regresses the strict Gate-1 retryable rate the shipped image
met**. Preemption can't save it: when the DAP task runs, the cache is *already* polluted. Proven by
interleaved A/B on the bench (v1.1 ~1.4–3.0% retryable vs v1.0 ~0.2%, 0 stalls throughout, candidate
last-in-time → not bench drift). **Mitigation: `HG_PIN_DAP`** — a gated linker variant
(`memmap_hackagotchi_pin.ld`) drops the DAP/USB transaction objects from flash `.text` via
`EXCLUDE_FILE` so they run from SRAM, out of the contended cache (no upstream edit, **residency-only —
no priority change**, ~19 KB of the +139 KB free SRAM). The pinned image soaked **0/500**. Corollary:
when you add continuous flash-resident work, **watch the DAP *retryable* rate, not just stalls**, and
A/B it against the shipped image. The old assumption that "the DAP hot path stays warm in XIP without
`__not_in_flash_func`" holds only for a light UI; a heavy render loop crosses the threshold.
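Why priority cannot help is easiest to see in a toy cache model. The sketch below is loosely modeled on a 16 KB two-way set-associative cache with 8-byte lines; LRU replacement, the flash addresses and the working-set sizes are assumptions for illustration, not measurements of the firmware. It warms a DAP path, runs one render frame, then counts the misses the next DAP transaction takes. Nothing in it refers to task priority, because the cache only ever sees addresses.

```c
#include <stdint.h>
#include <string.h>

/* Toy model: 16 KB, 2-way set-associative, 8-byte lines -> 1024 sets.
   LRU replacement is a modeling assumption. */
#define LINE_BYTES 8u
#define NUM_SETS   1024u

typedef struct {
    uint32_t tag[NUM_SETS][2];
    uint8_t  valid[NUM_SETS][2];
    uint8_t  lru[NUM_SETS];  /* index of the way to evict next */
} xip_cache_t;

/* Returns 1 on a miss (a QSPI refill on real hardware), 0 on a hit. */
static int cache_access(xip_cache_t *c, uint32_t addr) {
    uint32_t line = addr / LINE_BYTES;
    uint32_t set = line % NUM_SETS, tag = line / NUM_SETS;
    for (int w = 0; w < 2; w++) {
        if (c->valid[set][w] && c->tag[set][w] == tag) {
            c->lru[set] = (uint8_t)(1 - w);  /* the other way is now older */
            return 0;
        }
    }
    int victim = c->lru[set];
    c->tag[set][victim] = tag;
    c->valid[set][victim] = 1;
    c->lru[set] = (uint8_t)(1 - victim);
    return 1;
}

/* Execute straight-line code over [base, base + len); returns the miss count. */
static uint32_t run_code(xip_cache_t *c, uint32_t base, uint32_t len) {
    uint32_t misses = 0;
    for (uint32_t a = base; a < base + len; a += LINE_BYTES)
        misses += (uint32_t)cache_access(c, a);
    return misses;
}

/* Misses the DAP path takes on the transaction right after one render frame. */
uint32_t dap_misses_after_render(uint32_t dap_bytes, uint32_t render_bytes) {
    static xip_cache_t c;
    memset(&c, 0, sizeof c);
    const uint32_t dap_base = 0x10010000u, render_base = 0x10020000u;
    run_code(&c, dap_base, dap_bytes);         /* DAP path is warm */
    run_code(&c, render_base, render_bytes);   /* one render frame */
    return run_code(&c, dap_base, dap_bytes);  /* next DAP transaction */
}
```

With a 4 KB DAP path, a 4 KB render frame leaves it fully resident (0 misses), while a 24 KB frame evicts all 512 of its lines (512 misses). With the DAP path pinned to SRAM it never enters this cache at all, so its miss count is zero regardless of render load.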

## 3. Bounded buffers, counted overflow — never silent loss

Every queue/ring/buffer has a fixed cap and a **counter** for what it had to drop, surfaced in the
…