Skip to content

chore(scheduler): harden v0.1 public API surface — non_exhaustive + cache-friendly all_reduce #8

@marcos-mendez

Description

@marcos-mendez

Context

Bundles three MEDIUM findings from the Agent R review of PR #6 (scheduler crate) that were deferred from merge. None blocked the merge but all three should land before any external consumer (kernel-userspace bridge, GGML backend) starts importing this crate, otherwise we eat semver pain.

Tasks

1. #[non_exhaustive] on public enums and structs (MEDIUM-2 + MEDIUM-4)

Add #[non_exhaustive] to:

  • ReduceOp (src/scheduler/src/collective.rs:24) — future variants like Product, LogSumExp, BitwiseOr are likely
  • LinkState (src/scheduler/src/intercard.rs:29) — protocol still TBD per ADR-014
  • Link struct (src/scheduler/src/intercard.rs:46) — bandwidth_gbps / latency_ns will be added when ADR-014 lands
  • Error enum (src/scheduler/src/lib.rs:43) — library Error enums almost always benefit from this

Rationale: avoids breaking-change semver bumps when these types grow. Cost: zero runtime; downstream match arms get a forced _ => catch-all.

2. Cache-friendly all_reduce_f32 loop order (MEDIUM-3)

In src/scheduler/src/collective.rs:114, the for i in 0..stride outer loop strides across n_sails separate Vec allocations per element. For a 4 GiB weight shard (n_sails=4) this is cache-hostile.

Suggested transposition:

  • Hoist the match op outside the per-element loop (compute closure or function pointer once per call)
  • Outer loop = card; inner loop = element (so each card's contiguous Vec is read sequentially before moving to the next)
  • Add a doc note on the AllReduce trait that callers provide per-card contiguous buffers — fixes the layout contract for the real-device impl too

This is mock-only today but the trait signature is the public contract, so getting the layout intent into the doc now avoids a rewrite in PR #6b.

3. (Bonus) Inline doc on graph capacity arithmetic (LOW-5)

src/scheduler/src/topology.rs:84: Vec::with_capacity(n_sails.saturating_sub(1) * n_sails) — add a one-line // n*(n-1) directed edges in a fully-meshed graph comment.

Acceptance

  • All three changes in one PR titled chore(scheduler): harden v0.1 public API surface
  • New unit test that asserts a match ReduceOp without _ => arm fails to compile (use #[deny(non_exhaustive_omitted_patterns)] or a doctest demonstrating semver-evolution)
  • cargo bench (if added) shows the loop transposition improves throughput on n_sails=4, stride=1M; if no bench yet, document expected cache behavior in the commit message

Refs

Metadata

Metadata

Assignees

No one assigned

    Labels

    stream-3Software Stack (Agent 3) — driver, runtime, GGML, Spanker

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions