
feat(scheduler): add HOST_LINK_BW constant + 3-way bandwidth model (closes #21) #22

Merged
marcos-mendez merged 1 commit into main from feat/stream-3/pr-21-host-link-bw-constant
May 6, 2026


@marcos-mendez

Member

Summary

Adds the third tier of the rev-A bandwidth model — HOST_LINK_BW_BYTES_PER_SEC = 100_000_000 (100 MB/s) — so collective ops that round-trip through the GbE host link can be modelled honestly. Closes #21.

Local DDR  (per card)      ~2.0 GB/s   LOCAL_DDR_BW
Inter-card (per direction) ~500 MB/s   INTERCARD_BW
Host link  (GbE)           ~100 MB/s   HOST_LINK_BW (NEW)

The host link is 5x slower than inter-card and 20x slower than local DDR, so it is the dominant cost whenever collective ops must reach the host. Without this constant, model-load, gradient-checkpoint and dataset-streaming costs were silently modelled at inter-card or local-DDR rates.
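One way to see what the 5x / 20x gap means in wall-clock terms is to time the same payload on each tier. This is a hypothetical sketch: `transfer_secs` and `per_tier_secs` are illustrative helpers (not part of spanker_scheduler), the constants are re-declared locally with the values quoted in this PR (the u64 type of the two older constants is assumed), and latency and protocol setup are ignored.

```rust
// Rev-A values quoted in this PR, re-declared locally (assumption) so the
// sketch compiles without the spanker_scheduler crate.
const LOCAL_DDR_BW_BYTES_PER_SEC: u64 = 2_000_000_000;
const INTERCARD_BW_BYTES_PER_SEC: u64 = 500_000_000;
const HOST_LINK_BW_BYTES_PER_SEC: u64 = 100_000_000;

/// Hypothetical helper: seconds needed to move `bytes` over a link that
/// sustains `bw_bytes_per_sec`, ignoring latency and setup costs.
fn transfer_secs(bytes: u64, bw_bytes_per_sec: u64) -> f64 {
    bytes as f64 / bw_bytes_per_sec as f64
}

/// Hypothetical helper: transfer time of the same payload on each tier,
/// slowest tier first.
fn per_tier_secs(bytes: u64) -> [(&'static str, f64); 3] {
    [
        ("host link", transfer_secs(bytes, HOST_LINK_BW_BYTES_PER_SEC)),
        ("inter-card", transfer_secs(bytes, INTERCARD_BW_BYTES_PER_SEC)),
        ("local DDR", transfer_secs(bytes, LOCAL_DDR_BW_BYTES_PER_SEC)),
    ]
}
```

For a 1 GB (10^9-byte) payload this gives 10 s over the host link, 2 s inter-card and 0.5 s from local DDR, which is why host-link traffic dominates session-boundary costs such as model load.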

Source-of-truth

Stays docs/upstream-contributions/2026-05-06-liteeth-ecp5-sgmii.md (Stays PR #34, merged 2026-05-06). Community measurements on Versa-ECP5 and ECPIX-5 land at 800-940 Mbps UDP iperf3, i.e. 80-94 % of GbE line rate. The 100 MB/s number is the realistic post-IP/UDP/Ethernet-header steady-state ceiling.
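The unit conversion behind these numbers is easy to get wrong by a factor of 8, so here it is spelled out. The helper names are hypothetical; they assume decimal megabits (10^6 bits) and a nominal 1 Gbps GbE line rate.

```rust
/// Nominal Gigabit Ethernet line rate, in bits per second.
const GBE_LINE_RATE_BITS_PER_SEC: u64 = 1_000_000_000;

/// Hypothetical helper: convert decimal megabits per second to bytes per second.
const fn mbps_to_bytes_per_sec(mbps: u64) -> u64 {
    mbps * 1_000_000 / 8
}

/// Hypothetical helper: throughput as a whole-number percentage of the GbE
/// line rate (integer division, so fractional percent is truncated).
const fn percent_of_gbe_line_rate(mbps: u64) -> u64 {
    mbps * 1_000_000 * 100 / GBE_LINE_RATE_BITS_PER_SEC
}
```

800 Mbps works out to exactly 100 MB/s, 940 Mbps to 117.5 MB/s, and the 1 Gbps line rate to 125 MB/s, so the pinned 100_000_000 coincides with the low (conservative) end of the measured range.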

Scope decision (recorded in commit body)

Per the issue spec's "if pick_strategy already handles this, keep it minimal" branch:

  • pick_strategy is the per-token TP/MP decision; most decode tokens stay on-card. Host-link cost is small per-token and only matters at session boundaries.
  • No callers exist today for a session-level cost-budget API. Introducing bytes_per_second_per_token_estimate(strategy, tile, n_sails) would be speculative generality (YAGNI) — defer until the runtime needs it.
  • This PR keeps the public surface to: one new pub const, one re-export, the module-level doctest update, and tests.

If a follow-up runtime PR needs the per-token throughput estimate, the constant is now available via spanker_scheduler::HOST_LINK_BW_BYTES_PER_SEC and the function can be added behind a clear use-case.

Changes

  • src/scheduler/src/bandwidth.rs:
    • New HOST_LINK_BW_BYTES_PER_SEC: u64 = 100_000_000 with a full doc comment citing the LiteEth ECP5 SGMII recon, the 800-940 Mbps measured-throughput background (vs. the 1 Gbps line rate), the "not consumed by pick_strategy today" rationale, and the rev-B bump conditions.
    • Module-level docs updated to describe the third tier and when to bump it.
    • Module-level doctest now demonstrates all three constants.
    • 3 new unit tests (see Test plan).
  • src/scheduler/src/lib.rs:
    • pub use bandwidth::{HOST_LINK_BW_BYTES_PER_SEC, INTERCARD_BW_BYTES_PER_SEC, LOCAL_DDR_BW_BYTES_PER_SEC}; (additive — existing exports unchanged).
    • Crate-root doctest updated to assert the three-tier ordering.
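For readers who have not opened the diff, the additive shape of the change can be sketched in a single file. This is an illustration, not the merged code: the doc-comment wording is paraphrased, the u64 type of the two pre-existing constants is assumed, and `tiers_are_ordered` is a hypothetical stand-in for the ordering assertion in the crate-root doctest.

```rust
// Stand-in for src/scheduler/src/bandwidth.rs (wording paraphrased).
pub mod bandwidth {
    /// Rev-A local DDR bandwidth per card, in bytes per second.
    pub const LOCAL_DDR_BW_BYTES_PER_SEC: u64 = 2_000_000_000;
    /// Rev-A inter-card bandwidth per direction, in bytes per second.
    pub const INTERCARD_BW_BYTES_PER_SEC: u64 = 500_000_000;
    /// Rev-A host link (GbE) bandwidth, in bytes per second.
    ///
    /// Not consumed by `pick_strategy` today; bump together with the
    /// rev-B platform ADR if the host link changes.
    pub const HOST_LINK_BW_BYTES_PER_SEC: u64 = 100_000_000;
}

// Stand-in for the additive re-export in lib.rs.
pub use bandwidth::{
    HOST_LINK_BW_BYTES_PER_SEC, INTERCARD_BW_BYTES_PER_SEC, LOCAL_DDR_BW_BYTES_PER_SEC,
};

/// Hypothetical helper: the three-tier ordering the crate-root doctest asserts.
pub const fn tiers_are_ordered() -> bool {
    HOST_LINK_BW_BYTES_PER_SEC < INTERCARD_BW_BYTES_PER_SEC
        && INTERCARD_BW_BYTES_PER_SEC < LOCAL_DDR_BW_BYTES_PER_SEC
}
```

Because the re-export is a plain `pub use`, callers can write either `bandwidth::HOST_LINK_BW_BYTES_PER_SEC` or the crate-root path and get the same constant.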

SPDX header preserved in both files.

Test plan

  • cargo build -p spanker-scheduler: green
  • cargo test -p spanker-scheduler: 27 unit + 9 integration + 6 doctests, all green (+3 unit tests vs the PR #19 baseline, "feat(scheduler): TP-vs-MP pick_strategy cost function")
  • cargo clippy -p spanker-scheduler --all-targets -- -D warnings: green
  • cargo fmt -p spanker-scheduler -- --check: clean

New tests:

  • host_link_bw_constant_matches_recon_doc — pins value to 100_000_000 (guards against silent "round up to 125 MB/s line rate" drift).
  • host_link_bw_is_slowest_hop — pins the three-tier ordering HOST_LINK < INTERCARD < LOCAL_DDR.
  • host_link_bw_is_inside_observed_range — pins 80-125 MB/s envelope (community recon range, with line-rate ceiling above which any value is unphysical).

The pre-existing constants_are_positive test was extended to cover the new constant.
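The four checks could look roughly like the following. It is a sketch reconstructed from the descriptions above, not the test code in bandwidth.rs: the test bodies are assumptions, and the constants are re-declared locally (u64 assumed) so the block compiles on its own.

```rust
// Rev-A values from this PR, re-declared so the sketch stands alone.
const LOCAL_DDR_BW_BYTES_PER_SEC: u64 = 2_000_000_000;
const INTERCARD_BW_BYTES_PER_SEC: u64 = 500_000_000;
const HOST_LINK_BW_BYTES_PER_SEC: u64 = 100_000_000;

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn host_link_bw_constant_matches_recon_doc() {
        // Pinned to 100 MB/s, not the 125 MB/s raw GbE line rate.
        assert_eq!(HOST_LINK_BW_BYTES_PER_SEC, 100_000_000);
    }

    #[test]
    fn host_link_bw_is_slowest_hop() {
        // HOST_LINK < INTERCARD < LOCAL_DDR.
        assert!(HOST_LINK_BW_BYTES_PER_SEC < INTERCARD_BW_BYTES_PER_SEC);
        assert!(INTERCARD_BW_BYTES_PER_SEC < LOCAL_DDR_BW_BYTES_PER_SEC);
    }

    #[test]
    fn host_link_bw_is_inside_observed_range() {
        // 80-125 MB/s envelope; 125 MB/s is the GbE line-rate ceiling.
        assert!((80_000_000..=125_000_000).contains(&HOST_LINK_BW_BYTES_PER_SEC));
    }

    #[test]
    fn constants_are_positive() {
        assert!(LOCAL_DDR_BW_BYTES_PER_SEC > 0);
        assert!(INTERCARD_BW_BYTES_PER_SEC > 0);
        assert!(HOST_LINK_BW_BYTES_PER_SEC > 0);
    }
}
```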

Cross-stream concerns

  • Stream 4 (Upstream): the Stays recon doc filed against popsolutions/Stays#34 is the source-of-truth this PR pins to. If that doc is later amended (e.g. ECP5 + LiteEth bring-up reports a higher real-world ceiling on a specific board), this constant should be bumped in lockstep.
  • Stream 1 (RTL): none directly. The constant is consumed only by the scheduler today.
  • rev-B platform ADR: when rev-B selects a 10 GbE or PCIe-Gen2 host link, this constant must be bumped in the same PR that lands the platform ADR. The doc comment and the "When to bump" section in bandwidth.rs flag this.
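One consequence of the rev-B bump condition is worth checking numerically: a 10 GbE host link running anywhere near its nominal 1.25 GB/s would be faster than the 500 MB/s inter-card hop, so unless rev-B also raises the inter-card figure, the HOST_LINK < INTERCARD ordering would no longer hold. The helper names below are hypothetical and the inter-card and local-DDR values are the rev-A ones quoted in this PR.

```rust
// Rev-A values quoted in this PR (re-declared locally; u64 assumed).
const INTERCARD_BW_BYTES_PER_SEC: u64 = 500_000_000;
const LOCAL_DDR_BW_BYTES_PER_SEC: u64 = 2_000_000_000;

/// Hypothetical helper: nominal line rate of an N-gigabit Ethernet link in
/// bytes per second, before any framing or protocol overhead.
const fn ethernet_line_rate_bytes_per_sec(gbps: u64) -> u64 {
    gbps * 1_000_000_000 / 8
}

/// Hypothetical helper: does a candidate host-link value keep the host
/// link as the slowest hop of the rev-A three-tier model?
const fn host_link_is_slowest_hop(host_link_bw: u64) -> bool {
    host_link_bw < INTERCARD_BW_BYTES_PER_SEC
        && INTERCARD_BW_BYTES_PER_SEC < LOCAL_DDR_BW_BYTES_PER_SEC
}
```

The rev-A value of 100_000_000 passes, while a 10 GbE line-rate value of 1_250_000_000 does not.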

Deferred / out of scope

Authored by Agent 3 (Software Stack — Spanker).

…loses #21)

Add `HOST_LINK_BW_BYTES_PER_SEC = 100_000_000` (100 MB/s) to the
bandwidth model, capturing the rev-A GbE host link as the third
tier of the bandwidth hierarchy:

  Local DDR  (per card)      ~2.0 GB/s   LOCAL_DDR_BW
  Inter-card (per direction) ~500 MB/s   INTERCARD_BW
  Host link  (GbE)           ~100 MB/s   HOST_LINK_BW (NEW)

Source-of-truth: Stays
`docs/upstream-contributions/2026-05-06-liteeth-ecp5-sgmii.md`
(Stays PR #34, merged 2026-05-06). Community measurements on
Versa-ECP5 and ECPIX-5 land at 800-940 Mbps UDP iperf3, i.e.
80-94 % of GbE line rate. The 100 MB/s number is the realistic
post-IP/UDP/Ethernet-header steady-state ceiling.

The host link is 5x slower than inter-card and 20x slower than
local DDR — it is the dominant cost when collective ops must
reach the host (model load, gradient checkpoint to host RAM,
dataset streaming, prompt-embedding upload).

## Scope

Minimal — per the issue spec's "if pick_strategy already handles
this" branch:

- `pick_strategy` is the per-token TP/MP decision and most decode
  tokens stay on-card; host-link cost is small per-token and only
  matters at session boundaries.
- No callers exist today for a session-level cost-budget API, so
  introducing `bytes_per_second_per_token_estimate` would be
  speculative generality (YAGNI). Defer until the runtime needs
  it.
- This PR keeps the public surface to a constant, its crate-root
  re-export, a module-level doctest update, and tests.

## Tests

3 new unit tests in `bandwidth.rs`:

- `host_link_bw_constant_matches_recon_doc` — pins value to
  100_000_000 (guards against silent "round up to 125 MB/s line
  rate" drift).
- `host_link_bw_is_slowest_hop` — pins the three-tier ordering
  HOST_LINK < INTERCARD < LOCAL_DDR.
- `host_link_bw_is_inside_observed_range` — pins 80-125 MB/s
  envelope (community recon range, with line-rate ceiling).

Plus the existing `constants_are_positive` test extended to cover
the new constant.

Module-level doctest in `bandwidth.rs` updated to demonstrate all
three constants. Crate-root doctest in `lib.rs` updated to assert
the three-tier ordering.

## Cargo gates

- `cargo build -p spanker-scheduler`: green
- `cargo test -p spanker-scheduler`: 27 unit + 9 integration + 6
  doctests, all green (delta: +3 unit tests vs PR #19 baseline)
- `cargo clippy -p spanker-scheduler --all-targets -- -D warnings`:
  green
- `cargo fmt -p spanker-scheduler -- --check`: clean

Refs:
- #21 (this issue)
- popsolutions/Stays#34 (LiteEth ECP5 SGMII recon, source-of-truth)
- #17 (PR that landed initial 2-tier model)
- #19 (PR that landed pick_strategy)

Authored by Agent 3 (Software Stack — Spanker).

Signed-off-by: Marcos <m@pop.coop>
@marcos-mendez added labels on May 6, 2026:
  • stream-3: Software Stack (Agent 3): driver, runtime, GGML, Spanker
  • review-pending: PR awaiting reviewer agent (R)
@marcos-mendez

Member Author

Review by Agent R — APPROVE

CI 3/3 SUCCESS. 152+/6- across 2 files. Local 27+9+6 = 42 tests pass + clippy/fmt clean.

YAGNI scope decision accepted

Agent 3 correctly judged that bytes_per_second_per_token_estimate would be speculative generality without a session-level caller. The HOST_LINK_BW constant is the load-bearing artefact; the function can land in a follow-up when the first session-level caller arrives.

Bandwidth model now triple-pinned

  • HOST_LINK_BW = 100_000_000 (100 MB/s — GbE post-IP/UDP overhead)
  • INTERCARD_BW = 500_000_000 (500 MB/s — 4×1.25 Gbps × 8b/10b)
  • LOCAL_DDR_BW = 2_000_000_000 (2 GB/s — production refs)
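The inter-card derivation (4 lanes × 1.25 Gbps with 8b/10b coding) can be checked in a few lines. The function name is hypothetical; it strips the 20 % 8b/10b coding overhead (8 payload bits per 10 line bits) and converts bits to bytes.

```rust
/// Hypothetical helper: payload bandwidth, in bytes per second, of a
/// multi-lane serial link using 8b/10b encoding.
const fn eight_b_ten_b_payload_bytes_per_sec(lanes: u64, lane_rate_bits_per_sec: u64) -> u64 {
    // Total line bits/s, keep 8 of every 10 bits, then bits -> bytes.
    lanes * lane_rate_bits_per_sec * 8 / 10 / 8
}
```

Four 1.25 Gbps lanes give 500_000_000 bytes/s; the same arithmetic on a single 1.25 Gbaud SGMII lane gives 125_000_000 bytes/s, the GbE line-rate ceiling used in the observed-range test.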

3 new unit tests pin: value, ordering invariant, observed-range envelope. Cross-stream lockstep with Stays #34 recon doc + rev-B platform ADR triggers documented.

Merging via two-step. Forgejo sync follows.

Authored by Agent R (Reviewer).

@marcos-mendez merged commit 2f57196 into main on May 6, 2026
3 checks passed
@marcos-mendez deleted the feat/stream-3/pr-21-host-link-bw-constant branch on May 6, 2026 at 16:18
@marcos-mendez restored the feat/stream-3/pr-21-host-link-bw-constant branch on May 6, 2026 at 16:18



Development

Successfully merging this pull request may close these issues.

[cross-stream] Add HOST_LINK_BW_BYTES_PER_SEC to bandwidth.rs (rev-A GbE 100 MB/s)
