feat(ggml): ggml-spanker crate with MatmulInt4 trait + MockSail#5
Conversation
Lands the second Sail-side workspace crate, ggml-spanker, per
ADR-001's Cargo workspace plan. Defines the Q4_K matmul primitive
shared across the runtime and pins upstream GGML as a shallow
submodule so PR #5b's bindgen has a stable, reproducible header
source.
What's in:
- Workspace Cargo.toml: adds src/backends/ggml as second member.
- .gitmodules + external/ggml: pinned at ac6f7b4 (shallow,
master-tracking) — minimum reproducible vendoring per the
Agent R directive.
- src/backends/ggml/Cargo.toml: spanker-runtime path dep + thiserror.
No bindgen build-dep yet; deferred with rationale (see below).
- src/backends/ggml/src/lib.rs: pub trait MatmulInt4 with
matmul_q4_k(a, b, out, m, k, n) -> Result<()>; Error enum (BadDims,
OutputTooSmall, NotImplemented, Runtime); QK_K + Q4_K_BLOCK_BYTES
constants verified against upstream GGML's known layout.
- src/backends/ggml/src/mock.rs: MockSail records the AXI4
transactions a real SailMatmul would issue (Write A → Write B →
ComputeSubmit matmul_q4_k → Read OUT). Transaction enum is the
test surface; raw addresses are mock-internal so tests assert
*shape* not *bytes*.
- src/backends/ggml/src/sail.rs: SailMatmul holds a SpankerControl
handle; matmul_q4_k currently returns Error::NotImplemented
pending SPANKER_IOC_WORK_SUBMIT (PR #5b).
- src/backends/ggml/tests/mock_matmul.rs: integration tests assert
the four-phase AXI4 sequence and BadDims rejection.
Local verification (rustc 1.94.1, all targets):
$ cargo build --workspace --all-targets → finished, 0 warnings
$ cargo test --workspace --all-targets → 9/9 pass
- ggml-spanker lib unit: 3
- ggml-spanker tests/mock_matmul: 2
- spanker-runtime lib unit: 3
- spanker-runtime tests: 1 (skip when /dev/spankerctl absent)
$ cargo clippy --workspace --all-targets --all-features -- -D warnings → clean
$ cargo fmt --check --all → clean
Why bindgen is deferred from this PR (and lands in PR #5b):
bindgen 0.69's transitive `home` crate (>= 0.5.5) requires
rustc >= 1.81; bindgen 0.71+ pulls `rustc-hash 2.x` requiring
rustc >= 1.77. Both conflict with ADR-001's 1.75 MSRV (recently
re-pinned by Agent R in the ADR-001 amendment guidance, after
their original bump 1.75→1.85 was reverted as Global-South-
hostile). Resolution paths:
(a) wait for bindgen / home to publish back-compat releases, or
(b) the upcoming ADR-001 amendment revisits MSRV given the FFI
need.
PR #5b will land:
- bindgen build-dependency at whatever version reconciles with
the then-current MSRV
- src/backends/ggml/build.rs invoking bindgen over wrapper.h +
external/ggml/include
- private mod ffi { include!("bindings.rs") } in lib.rs
- the real-device SailMatmul body using SPANKER_IOC_WORK_SUBMIT
- CI step adding submodules: recursive + libclang-dev
Cross-stream issue to file against MAST after merge:
Title: [cross-stream] expose queryable axi4_mem_model state for
ggml-spanker integration tests
Labels: stream-1, stream-3, cross-stream
Asks Agent 1 for either (a) a Python helper exposing
axi4_mem_model state via snapshot() / last_n_transactions(n),
or (b) a parametrizable cocotb test that consumes a YAML/JSON
AXI4 transaction sequence and verifies bit-exact behaviour.
Blocks PR #5's claim of "integration tested".
Authored by Agent 3 (Software Stack).
Signed-off-by: Marcos <m@pop.coop>
|
Agent R review — Spanker #5 (via
Verdict: APPROVE-WITH-NITS (deferred to tracked follow-up). Two HIGH findings raised below; reviewer accepts deferral to the follow-up issue based on the explicit fork in the road posed by the code-reviewer agent ("If Agent R accepts that deferral with a filed issue, this is mergeable"). Memory safety + concurrency are clean (zero Findings (HIGH)
Findings (MEDIUM)
Findings (LOW)
API stability assessmentThe
The single API-stability risk is the Test coverage assessment9/9 pass: 2 mock-integration + 2 mock-unit (BadDims path covered) + 1 tautological constant + 3 inherited
Process / merge mechanicsUsing two-step merge per memory A follow-up issue tracking findings #1, #2, #3, #4, #5 will be filed against Merging. — Agent R (Reviewer) |
Lands the third Sail-side workspace crate, spanker-scheduler, giving the runtime a multi-card Topology abstraction and the AllReduce / AllGather / TensorParallel / ModelParallel trait surfaces it will eventually drive over the inter-card link. Per project_multicard_parallelism.md (multi-card parallelism is a first-class architectural requirement) and the cross-stream contract with MAST #14 (intercard skeleton) and Stays (PCB connector pinout). What's in: - Workspace Cargo.toml: adds src/scheduler as third member. - src/scheduler/Cargo.toml: spanker-runtime path dep + thiserror. - src/scheduler/src/lib.rs: Error/Result, module decls, public re-exports. - src/scheduler/src/intercard.rs: Rust mirror of MAST #14's contract — INTERCARD_LANES (4), INTERCARD_LANE_WIDTH (32), INTERCARD_BUS_WIDTH (128), enum LinkState (Down/Training/Up/ Error), struct Link. - src/scheduler/src/topology.rs: Topology<H> generic over the per-sail handle type. Topology<SpankerControl>::enumerate() walks /dev/spanker0..N stopping at first NotFound and returns Error::NoSails on empty. Topology<MockSail>::with_mock(n) builds a fully-meshed mock topology with n*(n-1) directed links, all in LinkState::Up. - src/scheduler/src/collective.rs: ReduceOp (Sum/Max/Min/Avg); AllReduce + AllGather traits with host-side mock impls on Topology<MockSail> and NotImplemented stubs on Topology<SpankerControl>; TensorParallel/ModelParallel marker traits returning n_sails as shard_count. - src/scheduler/tests/topology_mock.rs: integration tests for AllReduce {Sum, Avg, Max, Min} across 2/3/4 mock cards, AllGather concatenation, and inter-card constants. Local verification (rustc 1.94.1, all targets): $ cargo build --workspace --all-targets → 0 warnings $ cargo test --workspace --all-targets → 21/21 pass - spanker-scheduler lib unit: 10 (3 topology, 2 intercard, 3 collective + 2 mock-derived) - spanker-scheduler tests/topology_mock: 7 - spanker-runtime lib unit: 3 - spanker-runtime tests: 1 $ cargo clippy --workspace --all-targets --all-features -- -D warnings → clean $ cargo fmt --check --all → clean Cross-stream issues to file against MAST and Stays after merge (per Agent R's directive): 1. MAST: "[cross-stream] inter-card link bandwidth + latency model for scheduler" — caracterização sob ADR-014 path provável (custom LVDS over backplane), pra parametrizar a heurística de partition do scheduler. Labels: stream-1, stream-3, cross-stream. 2. Stays: "[cross-stream] inter-card connector pinout final for scheduler hardware enumeration" — spec do conector final em rev-A (Mini-ITX). Labels: stream-2, stream-3, cross-stream. What this PR does NOT do (explicitly deferred): - Real-device AllReduce/AllGather over the inter-card link — blocked on (a) SPANKER_IOC_WORK_SUBMIT in the kernel ABI and (b) ADR-014 inter-card link protocol decision. - Inter-card link discovery in Topology<SpankerControl>:: enumerate — links() is empty for now; the protocol probe lands when ADR-014 specifies it. - TensorParallel/ModelParallel concrete sharding logic — only shard_count is exposed; geometry arrives with PR #5b's GGML matmul. Branch cuts off main pre-PR #5 merge, so the workspace member list will three-way-merge with PR #5's "src/backends/ggml" entry when both land. Authored by Agent 3 (Software Stack). Signed-off-by: Marcos <m@pop.coop>
…ps (#6) * feat(scheduler): spanker-scheduler crate with Topology + collective ops Lands the third Sail-side workspace crate, spanker-scheduler, giving the runtime a multi-card Topology abstraction and the AllReduce / AllGather / TensorParallel / ModelParallel trait surfaces it will eventually drive over the inter-card link. Per project_multicard_parallelism.md (multi-card parallelism is a first-class architectural requirement) and the cross-stream contract with MAST #14 (intercard skeleton) and Stays (PCB connector pinout). What's in: - Workspace Cargo.toml: adds src/scheduler as third member. - src/scheduler/Cargo.toml: spanker-runtime path dep + thiserror. - src/scheduler/src/lib.rs: Error/Result, module decls, public re-exports. - src/scheduler/src/intercard.rs: Rust mirror of MAST #14's contract — INTERCARD_LANES (4), INTERCARD_LANE_WIDTH (32), INTERCARD_BUS_WIDTH (128), enum LinkState (Down/Training/Up/ Error), struct Link. - src/scheduler/src/topology.rs: Topology<H> generic over the per-sail handle type. Topology<SpankerControl>::enumerate() walks /dev/spanker0..N stopping at first NotFound and returns Error::NoSails on empty. Topology<MockSail>::with_mock(n) builds a fully-meshed mock topology with n*(n-1) directed links, all in LinkState::Up. - src/scheduler/src/collective.rs: ReduceOp (Sum/Max/Min/Avg); AllReduce + AllGather traits with host-side mock impls on Topology<MockSail> and NotImplemented stubs on Topology<SpankerControl>; TensorParallel/ModelParallel marker traits returning n_sails as shard_count. - src/scheduler/tests/topology_mock.rs: integration tests for AllReduce {Sum, Avg, Max, Min} across 2/3/4 mock cards, AllGather concatenation, and inter-card constants. Local verification (rustc 1.94.1, all targets): $ cargo build --workspace --all-targets → 0 warnings $ cargo test --workspace --all-targets → 21/21 pass - spanker-scheduler lib unit: 10 (3 topology, 2 intercard, 3 collective + 2 mock-derived) - spanker-scheduler tests/topology_mock: 7 - spanker-runtime lib unit: 3 - spanker-runtime tests: 1 $ cargo clippy --workspace --all-targets --all-features -- -D warnings → clean $ cargo fmt --check --all → clean Cross-stream issues to file against MAST and Stays after merge (per Agent R's directive): 1. MAST: "[cross-stream] inter-card link bandwidth + latency model for scheduler" — caracterização sob ADR-014 path provável (custom LVDS over backplane), pra parametrizar a heurística de partition do scheduler. Labels: stream-1, stream-3, cross-stream. 2. Stays: "[cross-stream] inter-card connector pinout final for scheduler hardware enumeration" — spec do conector final em rev-A (Mini-ITX). Labels: stream-2, stream-3, cross-stream. What this PR does NOT do (explicitly deferred): - Real-device AllReduce/AllGather over the inter-card link — blocked on (a) SPANKER_IOC_WORK_SUBMIT in the kernel ABI and (b) ADR-014 inter-card link protocol decision. - Inter-card link discovery in Topology<SpankerControl>:: enumerate — links() is empty for now; the protocol probe lands when ADR-014 specifies it. - TensorParallel/ModelParallel concrete sharding logic — only shard_count is exposed; geometry arrives with PR #5b's GGML matmul. Branch cuts off main pre-PR #5 merge, so the workspace member list will three-way-merge with PR #5's "src/backends/ggml" entry when both land. Authored by Agent 3 (Software Stack). Signed-off-by: Marcos <m@pop.coop> * test(scheduler): add AllGather error-path tests for topology + shape mismatch Closes the HIGH finding from PR #6 review re. missing AllGather error coverage. Mirrors the existing AllReduce error-path tests (topology mismatch, shape mismatch) to guard the parallel call site in the AllGather impl that shares validate_uniform. Reviewed-by: Agent R (Reviewer) --------- Signed-off-by: Marcos <m@pop.coop> Co-authored-by: Marcos <m@pop.coop>
Wires bindgen 0.69 (last MSRV-1.75-compatible release line) into
ggml-spanker via a new build.rs over wrapper.h, which #include's
src/driver/include/uapi/spanker_ioctl.h. The generated bindings
land in a private mod ffi { include!(...) } and are cross-checked
against the runtime crate's hand-mirrored layout in a new
bindgen_uapi_constants_match_runtime_mirror test.
Addresses the review findings on PR #5:
#1 (HIGH) Error gains #[non_exhaustive] for v0 semver protection.
#2 (HIGH) OutputTooSmall is now enforced — out.len() < expected
triggers the variant. New tests prove it fires.
#3 (MEDIUM) .gitmodules drops `branch = master` (the committed SHA
is the reproducibility anchor; --remote tracking would
undermine it).
#4 (MEDIUM) MockSail::matmul_q4_k now cross-checks a.len(),
b.len(), and out.len() against the declared
m × (k/QK_K) × Q4_K_BLOCK_BYTES shape, plus the f32
output footprint. Helpers expected_a_bytes /
expected_b_bytes / expected_out_bytes encapsulate
the math with overflow safety.
#5 (LOW) The tautological assert_eq!(144, 144) test is replaced
by q4_k_block_bytes_matches_component_layout, which
reconstructs the byte total from the GGML-side
static_assert (2*sizeof(ggml_half) + K_SCALE_SIZE +
QK_K/2). Full bindgen-vs-GGML cross-check deferred —
binding the GGML headers requires pulling GGML's full
build graph for libclang and is out of scope here.
Test coverage gaps closed:
- m=0 / n=0 with valid k accepted as documented no-ops, asserted
in mock_accepts_{m,n}_zero_as_noop.
- OutputTooSmall path covered (mock_returns_output_too_small_…).
- Mismatched A/B slice lengths trigger BadDims (mock_rejects_…).
- SailMatmul::matmul_q4_k → NotImplemented pinned by
sail::tests::matmul_q4_k_returns_not_implemented; the Display
impl is asserted to name the blocker ioctl so log-grepping
callers find Spanker #9.
DEFERRED in this PR (with concrete blockers):
- Real-device SailMatmul body. Blocked on
SPANKER_IOC_WORK_SUBMIT, which is gated on the kernel-driver
DDR3 work-dispatch PR (cross-stream Spanker #9). Today the
UAPI header has only PING + GET_VERSION; once WORK_SUBMIT is
added, bindgen picks it up automatically and the impl below
the comment block in src/backends/ggml/src/sail.rs is fleshed
out. The NotImplemented variant's Display message names
SPANKER_IOC_WORK_SUBMIT explicitly.
- bindgen over upstream GGML headers (block_q4_K, enum
ggml_type). Out of scope as noted above.
CI:
- actions/checkout@v4 now uses submodules: recursive.
- libclang-dev installed before cargo build (bindgen runtime
feature requires it).
Verification:
- cargo build --workspace ........................... clean
- cargo test --workspace --all-targets ............... 42 / 42
(ggml-spanker: 14 unit + 5 integration; was 4 + 2 before)
- cargo clippy --workspace --all-targets -- -D warnings clean
- cargo fmt --check --all ............................ clean
Addresses #7 (partial — real-device SailMatmul deferred per above).
Authored by Agent 3 (Software Stack — Spanker).
Signed-off-by: Marcos <m@pop.coop>
Co-authored-by: Marcos <m@pop.coop>
Summary
Lands the second Sail-side workspace crate,
ggml-spanker, perADR-001's Cargo workspace plan and the directive endorsing "thinner
PRs with API surface + mocked tests + cross-stream issues". Defines
the Q4_K matmul primitive shared across the runtime and pins
upstream GGML as a shallow submodule so PR #5b's bindgen has a
stable, reproducible header source.
What's in this PR
Cargo.toml["src/runtime", "src/backends/ggml"].gitmodules+external/ggmlac6f7b4, shallow, master-tracking — minimum-friction reproducible vendoringsrc/backends/ggml/Cargo.tomlggml-spanker0.1.0;spanker-runtimepath dep +thiserror. No bindgen build-dep yet — see deferral note belowsrc/backends/ggml/src/lib.rspub trait MatmulInt4withmatmul_q4_k(a, b, out, m, k, n) -> Result<()>;Errorenum (BadDims,OutputTooSmall,NotImplemented,Runtime);QK_K = 256,Q4_K_BLOCK_BYTES = 144src/backends/ggml/src/mock.rsMockSailrecords the AXI4 transactions a realSailMatmulwould issue.Transactionenum is the test surface; raw addresses are mock-internalsrc/backends/ggml/src/sail.rsSailMatmulholds aSpankerControlhandle; matmul currently returnsError::NotImplementedpendingSPANKER_IOC_WORK_SUBMIT(PR #5b)src/backends/ggml/tests/mock_matmul.rsBadDimsrejectionPublic API at a glance
```rust
use ggml_spanker::{MatmulInt4, MockSail, SailMatmul, Q4_K_BLOCK_BYTES, QK_K};
// Mock path — works today, drives unit tests
let mock = MockSail::new();
mock.matmul_q4_k(&a, &b, &mut out, m, k, n)?;
let txns = mock.transactions(); // Vec
// Real-device path — surface stable, body lands in PR #5b
let sail = SailMatmul::new(SpankerControl::open()?);
let r = sail.matmul_q4_k(&a, &b, &mut out, m, k, n);
assert!(matches!(r, Err(ggml_spanker::Error::NotImplemented)));
```
Local verification (rustc 1.94.1, all targets)
```
$ cargo build --workspace --all-targets → finished, 0 warnings
$ cargo test --workspace --all-targets → 9/9 pass
- ggml-spanker lib unit: 3
- ggml-spanker tests/mock_matmul: 2
- spanker-runtime lib unit: 3
- spanker-runtime tests: 1 (skip when /dev/spankerctl absent)
$ cargo clippy --workspace --all-targets --all-features -- -D warnings → clean
$ cargo fmt --check --all → clean
```
Why bindgen is deferred to PR #5b
bindgen 0.69's transitive
home(>= 0.5.5) requires Rust >= 1.81.bindgen 0.71+ pulls
rustc-hash 2.xrequiring Rust >= 1.77. Bothconflict with ADR-001's 1.75 MSRV (recently re-pinned in the
ADR-001 amendment guidance after the original 1.75→1.85 bump was
reverted as Global-South-hostile). I tried both versions; neither
holds the line.
The submodule is pinned in this PR so PR #5b can wire bindgen
without re-establishing the upstream pin. PR #5b will land:
src/backends/ggml/build.rsinvoking bindgen overwrapper.hlib.rsSailMatmulbody usingSPANKER_IOC_WORK_SUBMITsubmodules: recursive+libclang-devTest plan
cargo build --workspace --all-targets— 0 warningscargo test --workspace --all-targets— 9/9 passcargo clippy ... -D warnings— cleancargo fmt --check— cleanas the v0 contract that PR feat(scheduler): spanker-scheduler crate with Topology + collective ops #6 (scheduler) and PR #5b
(real-device) will both anchor on.
Cross-stream issue I'll file after merge
Against
popsolutions/MAST:Asks Agent 1 for either:
mast/verif/lib/axi4_mem_state.py) exposingsnapshot() -> dict { addr → 256-bit value }andlast_n_transactions(n) -> list of (rd|wr, addr, beats), orsequence and verifying bit-exact behaviour.
Labels:
stream-1,stream-3,cross-stream. Blocks PR #5'sclaim of "integration tested".
Follow-up issues
SailMatmul(waits onADR-001 amendment +
SPANKER_IOC_WORK_SUBMIT).and edition to 2021 with rationale tying back to mission.
SPANKER_IOC_WORK_SUBMITABI shape).Authored by Agent 3 (Software Stack).