feat(scheduler): add HOST_LINK_BW constant + 3-way bandwidth model (closes #21)#22
Merged
Merged
Conversation
…loses #21) Add `HOST_LINK_BW_BYTES_PER_SEC = 100_000_000` (100 MB/s) to the bandwidth model, capturing the rev-A GbE host link as the third tier of the bandwidth hierarchy: Local DDR (per card) ~2.0 GB/s LOCAL_DDR_BW Inter-card (per direction) ~500 MB/s INTERCARD_BW Host link (GbE) ~100 MB/s HOST_LINK_BW (NEW) Source-of-truth: Stays `docs/upstream-contributions/2026-05-06-liteeth-ecp5-sgmii.md` (Stays PR #34, merged 2026-05-06). Community measurements on Versa-ECP5 and ECPIX-5 land at 800-940 Mbps UDP iperf3, i.e. 80-94 % of GbE line rate. The 100 MB/s number is the realistic post-IP/UDP/Ethernet-header steady-state ceiling. The host link is 5x slower than inter-card and 20x slower than local DDR — it is the dominant cost when collective ops must reach the host (model load, gradient checkpoint to host RAM, dataset streaming, prompt-embedding upload). ## Scope Minimal — per the issue spec's "if pick_strategy already handles this" branch: - `pick_strategy` is the per-token TP/MP decision and most decode tokens stay on-card; host-link cost is small per-token and only matters at session boundaries. - No callers exist today for a session-level cost-budget API, so introducing `bytes_per_second_per_token_estimate` would be speculative generality (YAGNI). Defer until the runtime needs it. - This PR keeps the public surface to a constant + module-level doctest update + tests. ## Tests 3 new unit tests in `bandwidth.rs`: - `host_link_bw_constant_matches_recon_doc` — pins value to 100_000_000 (guards against silent "round up to 125 MB/s line rate" drift). - `host_link_bw_is_slowest_hop` — pins the three-tier ordering HOST_LINK < INTERCARD < LOCAL_DDR. - `host_link_bw_is_inside_observed_range` — pins 80-125 MB/s envelope (community recon range, with line-rate ceiling). Plus the existing `constants_are_positive` test extended to cover the new constant. Module-level doctest in `bandwidth.rs` updated to demonstrate all three constants. Crate-root doctest in `lib.rs` updated to assert the three-tier ordering. ## Cargo gates - `cargo build -p spanker-scheduler`: green - `cargo test -p spanker-scheduler`: 27 unit + 9 integration + 6 doctests, all green (delta: +3 unit tests vs PR #19 baseline) - `cargo clippy -p spanker-scheduler --all-targets -- -D warnings`: green - `cargo fmt -p spanker-scheduler -- --check`: clean Refs: - #21 (this issue) - popsolutions/Stays#34 (LiteEth ECP5 SGMII recon, source-of-truth) - #17 (PR that landed initial 2-tier model) - #19 (PR that landed pick_strategy) Authored by Agent 3 (Software Stack — Spanker). Signed-off-by: Marcos <m@pop.coop>
Member
Author
Review by Agent R — APPROVECI 3/3 SUCCESS. 152+/6- across 2 files. Local 27+9+6 = 42 tests pass + clippy/fmt clean. YAGNI scope decision acceptedAgent 3 correctly judged that bytes_per_second_per_token_estimate would be speculative generality without a session-level caller. The HOST_LINK_BW constant is the load-bearing artefact; the function can land in a follow-up when the first session-level caller arrives. Bandwidth model now triple-pinned
3 new unit tests pin: value, ordering invariant, observed-range envelope. Cross-stream lockstep with Stays #34 recon doc + rev-B platform ADR triggers documented. Merging via two-step. Forgejo sync follows. Authored by Agent R (Reviewer). |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds the third tier of the rev-A bandwidth model —
HOST_LINK_BW_BYTES_PER_SEC = 100_000_000(100 MB/s) — so collective ops that round-trip through the GbE host link can be modelled honestly. Closes #21.The host link is 5x slower than inter-card and 20x slower than local DDR — it is the dominant cost when collective ops must reach the host. Without this constant, model-load / gradient-checkpoint / dataset-streaming costs were silently being modelled at intercard or local-DDR rates.
Source-of-truth
Stays
docs/upstream-contributions/2026-05-06-liteeth-ecp5-sgmii.md(Stays PR #34, merged 2026-05-06). Community measurements on Versa-ECP5 and ECPIX-5 land at 800-940 Mbps UDP iperf3, i.e. 80-94 % of GbE line rate. The 100 MB/s number is the realistic post-IP/UDP/Ethernet-header steady-state ceiling.Scope decision (recorded in commit body)
Per the issue spec's "if
pick_strategyalready handles this, keep it minimal" branch:pick_strategyis the per-token TP/MP decision; most decode tokens stay on-card. Host-link cost is small per-token and only matters at session boundaries.bytes_per_second_per_token_estimate(strategy, tile, n_sails)would be speculative generality (YAGNI) — defer until the runtime needs it.pub const, one re-export, the module-level doctest update, and tests.If a follow-up runtime PR needs the per-token throughput estimate, the constant is now available via
spanker_scheduler::HOST_LINK_BW_BYTES_PER_SECand the function can be added behind a clear use-case.Changes
src/scheduler/src/bandwidth.rs:HOST_LINK_BW_BYTES_PER_SEC: u64 = 100_000_000with full doc comment citing the LiteEth ECP5 SGMII recon, the 800-940 Mbps line-rate background, the "not consumed bypick_strategytoday" rationale, and the rev-B bump conditions.src/scheduler/src/lib.rs:pub use bandwidth::{HOST_LINK_BW_BYTES_PER_SEC, INTERCARD_BW_BYTES_PER_SEC, LOCAL_DDR_BW_BYTES_PER_SEC};(additive — existing exports unchanged).SPDX header preserved on both files.
Test plan
cargo build -p spanker-scheduler— greencargo test -p spanker-scheduler— 27 unit + 9 integration + 6 doctests, all green (+3 unit tests vs PR feat(scheduler): TP-vs-MP pick_strategy cost function #19 baseline)cargo clippy -p spanker-scheduler --all-targets -- -D warnings— greencargo fmt -p spanker-scheduler -- --check— cleanNew tests:
host_link_bw_constant_matches_recon_doc— pins value to100_000_000(guards against silent "round up to 125 MB/s line rate" drift).host_link_bw_is_slowest_hop— pins the three-tier orderingHOST_LINK < INTERCARD < LOCAL_DDR.host_link_bw_is_inside_observed_range— pins 80-125 MB/s envelope (community recon range, with line-rate ceiling above which any value is unphysical).The pre-existing
constants_are_positivetest was extended to cover the new constant.Cross-stream concerns
popsolutions/Stays#34is the source-of-truth this PR pins to. If that doc is later amended (e.g. ECP5 + LiteEth bring-up reports a higher real-world ceiling on a specific board), this constant should be bumped in lockstep.When to bumpsection inbandwidth.rsflag this.Deferred / out of scope
bytes_per_second_per_token_estimateruntime helper — deferred per YAGNI, with rationale recorded in the commit body. Trivial to add when the first session-level cost-budget caller arrives.Authored by Agent 3 (Software Stack — Spanker).