diff --git a/solutions/LP-0017.md b/solutions/LP-0017.md
new file mode 100644
index 0000000..e48bf37
--- /dev/null
+++ b/solutions/LP-0017.md
@@ -0,0 +1,189 @@
+# Solution: LP-0017 — Whistleblower
+
+**Submitted by:** edenbd1
+
+## Summary
+
+A censorship-resistant document upload and indexing application, built end-to-end on the Logos stack. A Basecamp Qt6/QML plugin uploads a file to Logos Storage, broadcasts the CID over Logos Delivery, and optionally anchors it on-chain via a SPEL LEZ registry program. A permissionless batch-anchor CLI subscribes to the Delivery topic, deduplicates CIDs against the on-chain registry state, and commits batches of up to 50 CIDs in a single transaction.
+
+Everything has been deployed live on the **public LEZ testnet** at `https://testnet.lez.logos.co`. Six on-chain transactions (deploy, registry init, n=1 batch, n=50 batch, plus an account-init and a pinata claim) are recorded with hashes anyone can verify via the public sequencer's `getTransaction` JSON-RPC or the public block explorer at `https://explorer.testnet.lez.logos.co`. The registry program-derived account (`A9ewyji3THdFGqLAtAd9GkoPX9B9R6yb5LZCfWLxbAeH`) holds 51 anchored CIDs in 6583 bytes of Borsh-encoded state — exactly matching the theoretical layout `4 + 51 × 129` bytes.
+
+The whole pipeline was developed without any mock layers between the application and Logos infrastructure: `batch-anchor` refuses to start without a reachable nwaku endpoint, Codex uploads go through the real REST API, and the e2e CI workflow asserts the `RISC0_DEV_MODE=0` banner is present in stdout before passing.
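
The byte count quoted in the summary can be reproduced with a few lines of Rust. This is only a sketch of the arithmetic: the 129-byte entry size and the 4-byte Borsh length prefix come from the text, while the names `LEN_PREFIX_BYTES`, `ENTRY_BYTES`, and `registry_size_bytes` are hypothetical and are not taken from the repository.

```rust
/// Borsh writes a collection's length as a 4-byte little-endian u32 prefix.
const LEN_PREFIX_BYTES: usize = 4;

/// Per-entry size quoted in this write-up (key plus record). The
/// field-by-field breakdown is not modelled here.
const ENTRY_BYTES: usize = 129;

/// Expected size of the Borsh-encoded registry holding `entries` CIDs.
const fn registry_size_bytes(entries: usize) -> usize {
    LEN_PREFIX_BYTES + entries * ENTRY_BYTES
}

fn main() {
    // 51 anchored CIDs, as reported for the deployed registry account.
    println!("{} bytes", registry_size_bytes(51)); // prints "6583 bytes"
}
```

Comparing this figure with the account size returned by the sequencer is a quick sanity check that no entry was double-inserted or truncated.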
+
+## Repository
+
+- **Repo:** https://github.com/edenbd1/lp-0017-whistleblower (MIT OR Apache-2.0)
+- **Narrated demo video:** https://youtu.be/J7eCklx3gEg
+- **Per-criterion compliance map:** https://github.com/edenbd1/lp-0017-whistleblower/blob/main/docs/SPEC_COMPLIANCE.md
+- **Deployment evidence (6 public tx hashes):** https://github.com/edenbd1/lp-0017-whistleblower/blob/main/docs/DEPLOYMENT.md
+- **CU benchmarks (measured live on public testnet):** https://github.com/edenbd1/lp-0017-whistleblower/blob/main/docs/BENCHMARKS.md
+- **Basecamp `.lgx` plugin asset:** https://github.com/edenbd1/lp-0017-whistleblower/releases/tag/v0.1.0-rc1
+
+## Approach
+
+### Architecture
+
+```
+            Basecamp Qt6/QML plugin
+              (app/whistleblower)
+                       │
+         publishFile() │ anchor()
+                       ▼
+    ┌────────────────────────────────────┐
+    │      document-indexing module      │   agnostic SDK
+    │   (crates/indexing + crates/ffi)   │   any Logos app can drop in
+    └──┬───────────────┬──────────────┬──┘
+       ▼               ▼              ▼
+storage_module  delivery_module  LEZ registry
+    (Codex)         (nwaku)      (SPEL guest)
+
+           Permissionless batch CLI
+             (crates/batch-anchor)
+         subscribe → dedup → batch tx
+```
+
+### On-chain registry: LEZ program (justified)
+
+The brief allows either an LEZ program or a direct zone-SDK consensus inscription. I chose the **LEZ program approach** for two reasons documented in [ADR-004](https://github.com/edenbd1/lp-0017-whistleblower/blob/main/docs/decisions/004-lez-program-vs-zone-sdk.md):
+
+1. **Trust model.** Decentralised sequencers for zones are not yet shipped (acknowledged in the brief). A zone-SDK path would require a single designated actor to perform consensus inscription, introducing exactly the centralised dependency this prize is designed to avoid.
+2. **Permissionlessness.** A program holds the dedup state on chain, queryable by any client, mutable only via signed `index_batch` calls. Any party with a funded LEZ wallet can call `index_batch`. No designated relayer, no off-chain coordinator.
+
+The program is a single SPEL `#[lez_program]` ([source](https://github.com/edenbd1/lp-0017-whistleblower/blob/main/methods/guest/src/bin/whistleblower_registry.rs)) with two `#[instruction]`s: `init_registry` (claim a program-derived account) and `index_batch(cids, metadata_hashes, anchor_timestamps)` (parallel vectors, `MAX_BATCH = 50`, in-guest dedup via `BTreeMap::contains_key`). Storage layout — `Registry { entries: BTreeMap }` with `CidRecord = { metadata_hash, anchor_timestamp, anchored_by, version }` — is documented in [ADR-001](https://github.com/edenbd1/lp-0017-whistleblower/blob/main/docs/decisions/001-registry-layout.md). Borsh encoding is deterministic, so the on-chain byte count matches the theoretical model exactly (verified live: 51 entries × 129 bytes + 4-byte length prefix = 6583 bytes).
+
+### Document-indexing module (agnostic)
+
+`crates/indexing/` is a standalone trait crate defining three traits — `StorageClient`, `DeliveryClient`, `RegistryClient` — plus the canonical `Envelope` schema and a retry helper. There are **zero `whistleblower::` references** in the public API: `grep -rn whistleblower crates/indexing/src/` returns nothing. Any other Logos application that needs the upload → broadcast → anchor pipeline can drop this crate in as a single dependency.
+
+The envelope wire format is locked at `v=1` and documented in [ADR-002](https://github.com/edenbd1/lp-0017-whistleblower/blob/main/docs/decisions/002-envelope-schema.md). Required fields: `cid`, `metadata_hash: "v1:"`, `timestamp`. Optional: `title`, `description`, `content_type`, `size_bytes`, `tags`. `metadata_hash` is a SHA-256 over the **canonical** discovery-metadata JSON with a fixed key order, so any two implementations produce byte-identical hashes for identical inputs.
+
+### Permissionless batch CLI
+
+`crates/batch-anchor/` is a real CLI, not a mock. Its lifecycle:
+
+1. 
**Seed the dedup set from on-chain state.** At startup, query the registry account, decode the Borsh state, and populate the in-process `known` set. This is the resume cursor — the on-chain registry **is** the cursor; no local state file required.
+2. **Catch up via Delivery store-protocol.** The nwaku REST adapter (`delivery/nwaku.rs`) calls the `/store/v3/messages` endpoint with a 24-hour lookback window, decodes each envelope, and accumulates new CIDs.
+3. **Subscribe to live topic.** Maintains a long-poll subscription to `/whistleblower/1/document-broadcast/json`.
+4. **Drain & flush.** Loop: drain incoming envelopes, drop duplicates, flush in batches up to `MAX_BATCH = 50`. Each flush calls `spel send-transaction` with the registry program ID.
+
+If the process is killed mid-flight, step 1 picks up exactly where it left off. A re-broadcast CID is silently dropped at the buffer layer; in-guest dedup is the final safety net via `BTreeMap::contains_key`.
+
+### What I tried that didn't work
+
+- **wallet/spel storage-format mismatch.** The `wallet` binary built from main HEAD wrote a `storage.json` schema incompatible with `spel` v0.2.0-rc3. Initial deploys failed with unexplained `InvalidSignature` errors. Fix: rebuilt `wallet` from the `v0.2.0-rc3` tag (`git checkout v0.2.0-rc3`) to align all three binaries.
+- **risc0 ↔ ruint version skew.** The latest `ruint 1.18.0` requires rustc 1.90, but the risc0 toolchain pins 1.88. Fix: pinned `ruint = "=1.17.0"` in `methods/guest/Cargo.toml`.
+- **SPEL macro requires `serde`.** The `#[lez_program]` macro emits code that references `::serde`, but doesn't pull serde into scope. Fix: added the dep explicitly in `methods/guest/Cargo.toml`.
+- **`SpelError::custom` type ambiguity.** Several `.into()` calls on string errors produced E0283 "type annotations needed". Fix: replaced with `.to_string()` everywhere in the guest source.
+- **`spel` CLI `Vec` arg parsing.** The released CLI parses `--cids a,b,c` as a single comma-containing string, not a vector. 
Fix: switched to the spel fork at `edenbd1/spel` (branch `cli-vec-string`, commit `fbbffd3`), which uses flag repetition (`--cids a --cids b`). Filed as a draft issue in `docs/BUGS_FILED.md`.
+
+These are all real-world rough edges of building against a young toolchain; documenting them is part of the deliverable.
+
+### Why the Logos stack is the right fit
+
+- **Logos Storage** stores opaque bytes without binding to an uploader identity. The CID is a content hash, not a routing pointer, so any node holding the bytes can serve them, and substituting different content under an existing CID would require breaking SHA-256, which is computationally infeasible.
+- **Logos Delivery (nwaku)** propagates the metadata envelope peer-to-peer in real time. Any subscriber to the topic learns about a new document seconds after upload, without any hosted index. The store-protocol gives a 24h replay window for late-joining subscribers.
+- **LEZ** lets the registry be permissionless: anyone with a funded wallet can call `index_batch`, with zero-knowledge proofs of correct execution making the dedup invariant verifiable by third parties.
+
+A centralised alternative (S3 + Postgres + Kafka) would collapse all three of those properties into a single point of takedown, a single point of identity binding, and a single trust root for the dedup set. Whistleblower exists to demonstrate that the Logos stack already removes that single point — end-to-end, with no mocks.
+
+## Success Criteria Checklist
+
+### Functionality
+
+- [x] **Upload to Logos Storage, return CID** — CLI: `crates/batch-anchor/src/storage/codex.rs` (real Codex REST). Plugin: `app/whistleblower/src/backend.cpp::publish()` (via `storage_module.uploadUrl` through `LogosAPIClient`). Confirmed on public testnet: real CID anchored at index 0 of the registry.
+- [x] **Broadcast metadata envelope immediately after upload** — `Envelope` struct: `crates/indexing/src/envelope.rs::Envelope` carries `cid`, `metadata_hash: "v1:"`, `timestamp`, plus optional `title`, `description`, `content_type`, `size_bytes`, `tags`. 
Topic: `/whistleblower/1/document-broadcast/json`. "Immediately after": `backend.cpp::publish()` lines 161-176 — `send` runs inside the `storageUploadDone` event handler, so broadcast is the next thing that happens once the CID is in hand.
+- [x] **Optional on-chain anchor action distinct from upload** — Plugin: separate "Anchor on-chain" button in `qml/Main.qml`. Backend: `WhistleblowerBackend::anchorLast()` is a separate slot from `publish()`. CLI: `batch-anchor anchor` is a distinct subcommand from `batch-anchor publish`.
+- [x] **Permissionless batch CLI: subscribe + accumulate + batch tx + permissionless + idempotent** — `crates/batch-anchor/src/cmd/watch.rs`. Real nwaku REST subscribe + drain (`delivery/nwaku.rs`), zero mock-delivery shortcuts. Permissionless: no privileged-signer check anywhere in the program; any funded wallet can call `index_batch`. Idempotency: in-guest `Registry::try_insert` (silent skip on duplicate) **plus** on-chain-seeded `BatchBuffer::known` (skip before publishing).
+- [x] **On-chain registry chosen + justified, stores `(CID, metadata_hash, anchor_timestamp)`, queryable by CID, ≥10 CIDs/batch** — LEZ program chosen, justified in [ADR-004](https://github.com/edenbd1/lp-0017-whistleblower/blob/main/docs/decisions/004-lez-program-vs-zone-sdk.md). State: `Registry { entries: BTreeMap }`. Lookup: `batch-anchor lookup `. `MAX_BATCH = 50 ≥ 10`. **Confirmed live**: 50-CID `index_batch` tx `2af12289409c55e8cee1ac172c35da518c0576e83a2ffaac7c8a67978209d531` anchored 50 CIDs atomically.
+- [x] **Document-indexing module — extracted, agnostic, reusable** — `crates/indexing/`. `grep -rn whistleblower crates/indexing/src/` returns 0 hits. Public API documented in `lib.rs` doc headers. Three traits + envelope + retry helper.
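
The buffer-side half of the idempotency guarantee described in the batch CLI item above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `crates/batch-anchor`: the names `BatchBuffer`, `known`, `push`, and `MAX_BATCH` appear in the text, while the `pending` field and the `seed` and `take_batch` methods are hypothetical, and CIDs are modelled as plain strings.

```rust
use std::collections::HashSet;

/// Upper bound on CIDs per `index_batch` transaction, as stated in the text.
const MAX_BATCH: usize = 50;

/// Buffer-side dedup layer. Internals are assumptions made for this sketch.
#[derive(Default)]
struct BatchBuffer {
    /// CIDs that are already on chain or already queued.
    known: HashSet<String>,
    /// CIDs waiting for the next flush, in arrival order.
    pending: Vec<String>,
}

impl BatchBuffer {
    /// Seed the dedup set from the decoded on-chain registry (lifecycle step 1).
    fn seed(&mut self, on_chain: impl IntoIterator<Item = String>) {
        self.known.extend(on_chain);
    }

    /// Queue a CID. Returns false, and queues nothing, when it was seen before.
    fn push(&mut self, cid: &str) -> bool {
        if !self.known.insert(cid.to_owned()) {
            return false;
        }
        self.pending.push(cid.to_owned());
        true
    }

    /// Remove and return at most `MAX_BATCH` queued CIDs for one transaction.
    fn take_batch(&mut self) -> Vec<String> {
        let n = self.pending.len().min(MAX_BATCH);
        self.pending.drain(..n).collect()
    }
}

fn main() {
    let mut buf = BatchBuffer::default();
    buf.seed(vec!["cid-already-anchored".to_string()]);
    for cid in ["cid-already-anchored", "cid-new", "cid-new"] {
        println!("{cid}: queued = {}", buf.push(cid));
    }
    println!("next batch: {:?}", buf.take_batch());
}
```

Because the set is rebuilt from chain state on every start, a restarted process rejects the same CIDs the previous process had already anchored; the in-guest check remains the backstop for anything that slips past the buffer.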
+
+### Usability
+
+- [x] **Basecamp app GUI: local build instructions + downloadable assets + loadable in Basecamp** — `app/whistleblower/README.md` covers both framework (`LOGOS_MODULE_BUILDER_ROOT` + `cmake` + `lgx`) and standalone (`brew install qt` + `cmake`) build paths. Downloadable asset: `whistleblower-0.1.0-darwin-arm64.lgx` (489 KB) attached to GitHub release v0.1.0-rc1. Plugin built against the real `logos-cpp-sdk` (cloned + linked, not stubbed); `nm -gU` confirms genuine `LogosAPIClient`, `LogosAPI::*` metatype symbols.
+- [x] **Document-indexing module as library/SDK with README** — `crates/indexing/src/lib.rs` doc headers describe the three traits, the envelope schema, the retry helper, and integration steps. Cross-referenced from `app/whistleblower/README.md`.
+- [x] **IDL via SPEL** — `idl/whistleblower_registry.json` committed. Regenerated by `make idl` → `spel generate-idl methods/guest/src/bin/whistleblower_registry.rs`.
+
+### Reliability
+
+- [x] **Upload retries with exponential back-off, clear error on exhaustion** — `crates/batch-anchor/src/storage/codex.rs::post_bytes` wraps `post_once` in `indexing::with_retry` with the default 5-attempt 100ms → 1.6s exponential backoff. Retries on 5xx / 408 / 429; gives up cleanly on any other 4xx. Exhaustion surfaces as `IndexingError::Backend("upload retries exhausted: ...")`. Unit test `upload_exhausts_retries_against_unreachable_endpoint` pins the exhausted-error shape.
+- [x] **Delivery broadcast deduplicated** — Two layers: `BatchBuffer::push` returns false on duplicate (silent drop) before publishing; in-guest `Registry::try_insert` returns `Ok(false)` on duplicate as a final safety net. Re-broadcasting the same CID never grows the registry.
+- [x] **Batch tool resumes after interrupt** — `seed_from_chain` at startup populates the dedup set from the on-chain registry; `catch_up_from_store` then replays the last 24 hours of broadcasts via nwaku's store protocol. 
Kill-9-restart cannot double-anchor; the on-chain registry **is** the resume cursor.
+
+### Performance
+
+- [x] **CU cost for 1-CID and 50-CID batches on devnet/testnet** — Measured live 2026-05-22 on the public testnet:
+
+  | Operation | Time (real proof, n=1 stage) | Notes |
+  |---|---|---|
+  | `init_registry` | 3.30 ms | One-time setup |
+  | `index_batch` n=1 | 4.12 ms | Baseline |
+  | `index_batch` n=50 | 36.27 ms | **0.73 ms/CID amortised — 5.6× win over serial** |
+  | `index_batch` n=1 (51 prior entries) | 51.74 ms | State-dependent re-encode cost |
+
+  Full methodology in [`docs/BENCHMARKS.md`](https://github.com/edenbd1/lp-0017-whistleblower/blob/main/docs/BENCHMARKS.md).
+
+### Supportability
+
+- [x] **Registry deployed and tested on LEZ devnet/testnet** — Deployed on `https://testnet.lez.logos.co` 2026-05-23. Six public tx hashes recorded in [`docs/DEPLOYMENT.md`](https://github.com/edenbd1/lp-0017-whistleblower/blob/main/docs/DEPLOYMENT.md):
+  - account-init: `dd55dd1e5b754fb975f7b5e523bee1cc361aee78e56f904d1f152ff1747b97f0`
+  - pinata claim: `40b7966dd494645d7eaa2669ccbd734e254aecf6a359160508c7ff42707476b4`
+  - deploy: `9e499b12781422f445d0e425f0b7499d4c975d3f96e12c9c0c35afb3dba48c8a`
+  - init_registry: `ae57ff1bf480c949af23a1ae53592abbe3c44240632364fce0dc7624e0b131d9`
+  - index_batch n=1: `1257c61c3ddff0ec083ef4756a81b28bc058ba55a11b147ef41ba3275edef55b`
+  - index_batch n=50: `2af12289409c55e8cee1ac172c35da518c0576e83a2ffaac7c8a67978209d531`
+
+  Registry PDA `A9ewyji3THdFGqLAtAd9GkoPX9B9R6yb5LZCfWLxbAeH` holds 6583 bytes of Borsh-encoded state. Each tx independently queryable via `getTransaction` JSON-RPC on the public sequencer or the public block explorer `https://explorer.testnet.lez.logos.co`.
+
+- [x] **E2E tests against LEZ sequencer in CI** — `.github/workflows/e2e.yml` spawns nwaku + storage via docker-compose, installs risc0 + spel + wallet, brings up a localnet sequencer, deploys the guest, runs `cargo test --features live-lez --test e2e_anchor` (50-CID round-trip). Asserts the `RISC0_DEV_MODE=0` banner is in stdout. Schedule-triggered + `workflow_dispatch` (the push trigger is off during the CI-toolchain shakedown to avoid noise — schedule + dispatch are the canonical evidence path).
+- [x] **CI green on default branch** — `.github/workflows/ci.yml` (fmt + clippy + workspace tests). Verified green via `gh run list --branch main --workflow ci`.
+- [x] **README covers build + deployment addresses + Basecamp + batch tool + registry queries** — Root [`README.md`](https://github.com/edenbd1/lp-0017-whistleblower/blob/main/README.md) Quickstart + `app/whistleblower/README.md` + the `batch-anchor lookup` example under "Just the headless CLI". Deployment addresses recorded in [`docs/DEPLOYMENT.md`](https://github.com/edenbd1/lp-0017-whistleblower/blob/main/docs/DEPLOYMENT.md).
+- [x] **Reproducible demo script with `RISC0_DEV_MODE=0`** — [`scripts/demo.sh`](https://github.com/edenbd1/lp-0017-whistleblower/blob/main/scripts/demo.sh) line 17: `export RISC0_DEV_MODE=0`. First non-banner stdout line echoes the value (`▶ RISC0_DEV_MODE = 0`). The whole verify-and-prove flow runs in under 30 seconds against the public testnet from a clean clone using only `curl`, `python3`, `jq`, and `cargo`.
+- [x] **Recorded narrated video showing `RISC0_DEV_MODE=0` in terminal** — https://youtu.be/J7eCklx3gEg — architecture overview, live `demo.sh` run with the `RISC0_DEV_MODE=0` banner visible on screen, public explorer walkthrough of the deploy + n=50 batch txs, code-repo tour, and the `.lgx` release asset.
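
The upload retry policy listed under Reliability in the checklist above can be illustrated with a small, synchronous sketch. It is not the repository's `indexing::with_retry`: the signature, the `AttemptError` type, and the `String` error are assumptions made for illustration. Only the policy itself (doubling delay, retry on 5xx / 408 / 429 and on transport failures, stop on other statuses, an "upload retries exhausted" message) follows the text. The sketch sleeps only between attempts, so how its delays map onto the quoted 100ms → 1.6s schedule depends on how attempts are counted, which is not modelled here.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Failure of one upload attempt (hypothetical type for this sketch).
#[derive(Debug)]
enum AttemptError {
    /// The server answered with a non-success HTTP status.
    Status(u16),
    /// The request produced no response (connection refused, timeout).
    Transport,
}

/// Transient failures: transport errors, 408, 429 and any 5xx.
fn is_retryable(err: &AttemptError) -> bool {
    match err {
        AttemptError::Transport => true,
        AttemptError::Status(code) => matches!(*code, 408 | 429 | 500..=599),
    }
}

/// Run `op` up to `max_attempts` times, doubling the delay after each
/// retryable failure and returning early on a non-retryable one.
fn with_retry<T>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> Result<T, AttemptError>,
) -> Result<T, String> {
    let mut delay = base_delay;
    for attempt in 1..=max_attempts {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) if !is_retryable(&e) => return Err(format!("not retryable: {e:?}")),
            Err(e) if attempt == max_attempts => {
                return Err(format!("upload retries exhausted: {e:?}"));
            }
            Err(_) => {
                sleep(delay);
                delay *= 2;
            }
        }
    }
    Err("upload retries exhausted: no attempts were made".to_string())
}

fn main() {
    // Simulated upload: one transport failure, one 503, then success.
    let mut calls = 0;
    let result = with_retry(5, Duration::from_millis(1), || {
        calls += 1;
        match calls {
            1 => Err(AttemptError::Transport),
            2 => Err(AttemptError::Status(503)),
            _ => Ok("cid-placeholder"),
        }
    });
    println!("{result:?} after {calls} calls");
}
```

Separating the non-retryable branch from the exhaustion branch keeps the two failure messages distinct, which is what lets a test pin the exhausted-error shape.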
+
+## FURPS Self-Assessment
+
+### Functionality
+
+Supported operations: file upload to Logos Storage (Codex REST), envelope broadcast over Logos Delivery (nwaku REST), batched on-chain anchoring via a SPEL LEZ program (`init_registry` + `index_batch`), on-chain CID lookup, registry list-all, plugin-mode file picker + anchor button. All six functional criteria (F1–F6) confirmed live on the public testnet. No content moderation, no allowlist, no privileged signer — by design, per the brief's permissionlessness clause.
+
+### Usability
+
+- **Basecamp plugin** ships as a `.lgx` bundle (489 KB) on the GitHub release page; drop into `~/Library/Application Support/Logos/LogosBasecampDev/plugins/whistleblower/` and Basecamp's `PluginLoader` picks it up.
+- **CLI** has a `doctor` subcommand that checks the nwaku endpoint, Codex endpoint, and sequencer connectivity in one pass — runs the same diagnostics an evaluator would run by hand.
+- **Demo script** (`scripts/demo.sh`) is the canonical "convince yourself it works in 30 seconds" entry point. Six explicit steps, each labelled, output is human-readable.
+- **Docs** are layered: 1-line README pitch → 1-page SPEC_COMPLIANCE map → per-file ADRs for the four recorded decisions (registry layout, envelope schema, CI architecture, LEZ program vs zone SDK).
+
+### Reliability
+
+- **Upload retries** — 5-attempt exponential back-off with 4xx-vs-5xx discrimination (test-pinned).
+- **Dedup at two layers** — buffer-side `BatchBuffer::push` and in-guest `Registry::try_insert`. Either catches re-broadcasts.
+- **Resumability** — on-chain registry is the resume cursor; no local state file means there's nothing to corrupt or to lose.
+- **Real proofs** — `RISC0_DEV_MODE=0` enforced in `scripts/demo.sh` and asserted by the e2e CI workflow. Every on-chain tx hash linked in `docs/DEPLOYMENT.md` was generated under real-proof mode.
+
+### Performance
+
+- `index_batch` n=50 takes 36.27 ms, i.e. 0.73 ms/CID amortised: a **5.6× win** over 50 serial single-CID calls. 
+- `init_registry` is a one-time 3.30 ms operation.
+- State-dependent re-encode cost: n=1 jumps from 4.12 ms (empty registry) to 51.74 ms (51 prior entries). This is the Borsh re-encoding of the full registry on each write — a documented characteristic of the chosen state layout, not a bug. ADR-001 discusses the alternative (segmented Merkle layout) and the trade-off.
+
+### Supportability
+
+- **60 unit tests** across `registry-core` (11), `indexing` (16), `batch-anchor` (29), `ffi` (4), plus a `live-lez`-gated e2e integration test in `batch-anchor/tests/e2e_anchor.rs`.
+- **Three CI workflows**: `ci.yml` (fmt + clippy + tests, runs on every push), `e2e.yml` (spawns the full stack, runs the live-lez gated round-trip, asserts `RISC0_DEV_MODE=0`), `verify-deployment.yml` (nightly read-only against the deployed program).
+- **`docs/`** has 4 ADRs, the spec compliance map, the deployment record with explorer links, the CU benchmarks, the live-validation reproduction recipe, and the bug list filed against upstream Logos tooling.
+- **Structured logging** via `tracing` throughout the workspace. `RUST_LOG=debug` surfaces every retry, every nwaku poll, every tx submission.
+- **Codebase layout** — five crates with one job each (`registry-core` types, `indexing` traits, `batch-anchor` CLI, `ffi` JSON ABI, `methods/guest` SPEL program). One swap point per layer.
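
The amortisation figures in the Performance section above follow from two divisions, sketched here in Rust. The timings are the ones reported in this write-up; the function names are hypothetical. The serial baseline assumes every single-CID call costs the empty-registry 4.12 ms, which the state-dependent measurement suggests is optimistic for the serial case, so the real advantage of batching is probably larger than this ratio.

```rust
/// Amortised cost per CID for one batch, in milliseconds.
fn amortised_ms_per_cid(batch_ms: f64, batch_size: u32) -> f64 {
    batch_ms / f64::from(batch_size)
}

/// Ratio between `batch_size` serial single-CID calls and one batched call,
/// assuming each serial call costs `single_ms` regardless of registry size.
fn speedup_over_serial(single_ms: f64, batch_ms: f64, batch_size: u32) -> f64 {
    single_ms * f64::from(batch_size) / batch_ms
}

fn main() {
    let per_cid = amortised_ms_per_cid(36.27, 50);
    let speedup = speedup_over_serial(4.12, 36.27, 50);
    // Prints "0.73 ms/CID, 5.68x"; the text reports the ratio as 5.6x.
    println!("{per_cid:.2} ms/CID, {speedup:.2}x");
}
```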
+
+## Supporting Materials
+
+- **Narrated demo video** — https://youtu.be/J7eCklx3gEg (architecture → live demo → public explorer walkthrough → code tour → release asset, RISC0_DEV_MODE=0 visible on screen)
+- **Live public-testnet deployment** — https://github.com/edenbd1/lp-0017-whistleblower/blob/main/docs/DEPLOYMENT.md (6 tx hashes + explorer links)
+- **Per-criterion compliance map** — https://github.com/edenbd1/lp-0017-whistleblower/blob/main/docs/SPEC_COMPLIANCE.md (every criterion → code, test, or tx hash)
+- **CU benchmarks (measured live)** — https://github.com/edenbd1/lp-0017-whistleblower/blob/main/docs/BENCHMARKS.md
+- **ADRs** — [registry layout](https://github.com/edenbd1/lp-0017-whistleblower/blob/main/docs/decisions/001-registry-layout.md), [envelope schema](https://github.com/edenbd1/lp-0017-whistleblower/blob/main/docs/decisions/002-envelope-schema.md), [CI architecture](https://github.com/edenbd1/lp-0017-whistleblower/blob/main/docs/decisions/003-ci-with-sequencer.md), [LEZ program vs zone SDK choice](https://github.com/edenbd1/lp-0017-whistleblower/blob/main/docs/decisions/004-lez-program-vs-zone-sdk.md)
+- **Basecamp `.lgx` plugin** — https://github.com/edenbd1/lp-0017-whistleblower/releases/tag/v0.1.0-rc1 (489 KB, SHA-256 `55453853110b944c5f714b9687246e6e9f7b92b9099dace05c7ee4e3bf90bfd0`)
+- **Public block explorer** — https://explorer.testnet.lez.logos.co (each tx hash from `docs/DEPLOYMENT.md` opens directly)
+- **Logos-tooling issues filed** — https://github.com/edenbd1/lp-0017-whistleblower/blob/main/docs/BUGS_FILED.md (three issues prepared against upstream `spel`, `wallet`, and risc0 toolchain integration)
+
+## Terms & Conditions
+
+By submitting this solution, I confirm that I have read and agree to the [Terms & Conditions](../TERMS.md).