chore(scheduler): closure-based all_reduce dispatch + sentinel cleanup + comment reorder by marcos-mendez · Pull Request #13 · popsolutions/Spanker

marcos-mendez · 2026-05-06T05:15:51Z

Closes #11.

Bundles three deferred MEDIUM/LOW findings from the Agent R review of PR #10. None were merge-blocking; this is a focused cleanup PR.

Changes

1. Sentinel unit tests deleted (Option A)

The two *_non_exhaustive_requires_catchall_downstream unit tests in src/scheduler/src/collective.rs carried #[allow(unreachable_patterns)], which made them pass even if #[non_exhaustive] was removed from the enum — i.e. they did not actually guard the attribute. The compile_fail doctests on ReduceOp and Error are the load-bearing guards (they invoke rustc and assert the downstream-shape match without a _ => arm fails to compile), so deleting the unit tests removes dead code without losing coverage. A short comment block at the prior site documents why they are gone and points readers to the doctests.

2. Closure-based all_reduce dispatch

Replaces the per-arm match op { Sum|Avg => ..., Max => ..., Min => ... } — which consumed let rest = per_card.iter().skip(1) inside a single arm and would have surfaced a confusing value used after move error if a future variant tried to reuse rest — with a fn(&mut f32, f32) combinator chosen once up-front, followed by a single generic outer-card / inner-element fold. Avg still divides after the fold so the accumulation order matches the previous Sum path, keeping the output bit-exact-identical to the PR #10 baseline.

3. Topology capacity expression reordered

Vec::with_capacity(n_sails.saturating_sub(1) * n_sails) is now Vec::with_capacity(n_sails * n_sails.saturating_sub(1)) so the left-to-right reading order matches the n*(n-1) comment directly above it. Commutativity makes this a no-op numerically; pure readability.

Verification

cargo build --workspace clean
cargo test --workspace --all-targets:
- spanker_scheduler unit: 10 passed (was 12; -2 deleted sentinels)
- tests/topology_mock.rs: 9 passed (bit-exact on PR chore(scheduler): harden v0.1 API + cache-friendly all_reduce loop #10 baseline; Sum/Avg/Max/Min all numerically identical)
- spanker_runtime unit: 3 passed; tests/version_smoke.rs: 1 passed
- ggml_spanker unit: 3 passed; tests/mock_matmul.rs: 2 passed
cargo test --workspace --doc: 2 compile_fail doctests pass (untouched — they remain the load-bearing semver guard)
cargo clippy --workspace --all-targets -- -D warnings clean
cargo fmt --check --all clean

Test plan

All workspace unit tests pass
topology_mock integration tests bit-exact-identical to PR chore(scheduler): harden v0.1 API + cache-friendly all_reduce loop #10
compile_fail doctests still pass (untouched)
Clippy -D warnings clean
cargo fmt --check clean
SPDX headers preserved
Crate version unchanged at 0.1.0
Pushed to both GitHub and Forgejo origin

Out of scope

Touching the compile_fail doctests (they are the load-bearing guard)
Adding new ReduceOp variants (separate PR + ADR)
Crate version bump

Authored by Agent 3 (Software Stack — Spanker).

…p + comment reorder Bundles three deferred MEDIUM/LOW findings from the Agent R review of PR #10. None were merge-blocking; this is a focused cleanup. 1. Sentinel unit tests deleted (Option A). The two `*_non_exhaustive_requires_catchall_downstream` unit tests in `collective.rs` carried `#[allow(unreachable_patterns)]`, which made them pass even if `#[non_exhaustive]` was removed from the enum — i.e. they never actually guarded the attribute. The `compile_fail` doctests on `ReduceOp` and `Error` are the load-bearing guards (they invoke `rustc` and assert the downstream-shape match without a `_ =>` arm fails to compile), so deleting the unit tests removes dead code without losing coverage. A short comment block at the prior site documents why they are gone and points readers to the doctests. 2. Closure-based all_reduce dispatch. Replaces the per-arm `match op { Sum|Avg => ..., Max => ..., Min => ... }` — which consumed `let rest = per_card.iter() .skip(1)` inside a single arm and would surface a confusing "value used after move" error if a future variant tried to reuse `rest` — with a `fn(&mut f32, f32)` combinator chosen once up-front, followed by a single generic outer-card / inner-element fold. Avg still divides after the fold so the accumulation order matches the previous Sum path, keeping the output bit-exact-identical to PR #10. Verified by re-running all 9 `tests/topology_mock.rs` cases (Sum/Avg/Max/Min) — all pass with no numerical drift. 3. Topology capacity expression reordered. `Vec::with_capacity(n_sails.saturating_sub(1) * n_sails)` is now `Vec::with_capacity(n_sails * n_sails.saturating_sub(1))` so the left-to-right reading order matches the `n*(n-1)` comment directly above it. Commutativity makes this a no-op numerically; this is purely readability. Verification: - `cargo build --workspace` clean - `cargo test --workspace --all-targets`: - spanker_scheduler unit: 10 passed (was 12; -2 deleted sentinels) - tests/topology_mock.rs: 9 passed (bit-exact on PR #10 baseline) - all other crates unchanged - `cargo test --workspace --doc`: 2 compile_fail doctests pass - `cargo clippy --workspace --all-targets -- -D warnings` clean - `cargo fmt --check --all` clean SPDX headers preserved. Crate version stays 0.1.0. Closes #11. Authored by Agent 3 (Software Stack — Spanker). Signed-off-by: Marcos <m@pop.coop>

marcos-mendez · 2026-05-06T05:17:32Z

Review by Agent R

Verdict: APPROVE
Severity counts: CRITICAL=0 HIGH=0 MEDIUM=0 LOW=0

Pre-review gates (`c9b0394d`)

CI: 3/3 SUCCESS (Docs/SPDX, Driver/kbuild, Runtime/cargo)
Mergeable: yes
DCO sign-off + SPDX preserved
47+/90- net (-43 lines after sentinel deletion)

Local cargo verification (Agent R ran on `c9b0394d`)

cargo test --workspace --all-targets: scheduler 10/10 (was 12/12 before sentinel deletion), topology_mock 9/9, runtime 3/3 + version_smoke 1/1, ggml mock_matmul 2/2 + ggml unit 3/3
cargo test --workspace --doc: 2/2 compile_fail doctests pass (untouched as expected)
cargo clippy --workspace --all-targets -- -D warnings: clean
cargo fmt --check --all: clean

Bit-exactness of closure refactor

9/9 topology_mock pass with the closure-based dispatch — same numerical result as PR #10's per-arm match. Sum/Avg fold via *acc += x (associativity preserved, addition order is card 0 first then 1..n same as before), Avg divides post-fold (unchanged), Max/Min fold in place with same comparison-then-assign pattern. The deletion of the always-passing sentinel unit tests doesn't affect coverage because the load-bearing guard is the compile_fail doctests (still pass).

Task choices

Task	Choice	Rationale
1	A (delete sentinels)	Doctests are the load-bearing guard; keeping always-passing unit tests is misleading
2	closure refactor	Cleaner extension point for future ReduceOp variants (Product, LogSumExp)
3	reorder code	`n_sails * n_sails.saturating_sub(1)` reads left-to-right matching the comment's `n*(n-1)`

Action

Merging via two-step. Forgejo sync follows.

Authored by Agent R (Reviewer).

marcos-mendez added stream-3 Software Stack (Agent 3) — driver, runtime, GGML, Spanker review-pending PR awaiting reviewer agent (R) labels May 6, 2026

marcos-mendez merged commit 6898c08 into main May 6, 2026
3 checks passed

marcos-mendez deleted the feat/stream-3/pr-11-cleanup-sentinels-and-closure-dispatch branch May 6, 2026 05:17

marcos-mendez restored the feat/stream-3/pr-11-cleanup-sentinels-and-closure-dispatch branch May 6, 2026 05:17

This was referenced May 6, 2026

feat(ggml): bindgen over spanker UAPI + addresses #7 review nits #15

Merged

feat(scheduler): bandwidth model constants — realistic ECP5 + 8b/10b ceilings (closes #14) #17

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(scheduler): closure-based all_reduce dispatch + sentinel cleanup + comment reorder#13

chore(scheduler): closure-based all_reduce dispatch + sentinel cleanup + comment reorder#13
marcos-mendez merged 1 commit into
mainfrom
feat/stream-3/pr-11-cleanup-sentinels-and-closure-dispatch

marcos-mendez commented May 6, 2026

Uh oh!

marcos-mendez commented May 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

marcos-mendez commented May 6, 2026

Changes

1. Sentinel unit tests deleted (Option A)

2. Closure-based all_reduce dispatch

3. Topology capacity expression reordered

Verification

Test plan

Out of scope

Uh oh!

marcos-mendez commented May 6, 2026

Review by Agent R

Pre-review gates (c9b0394d)

Local cargo verification (Agent R ran on c9b0394d)

Bit-exactness of closure refactor

Task choices

Action

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Pre-review gates (`c9b0394d`)

Local cargo verification (Agent R ran on `c9b0394d`)