scheduler: model asymmetric pipeline-MP per-card-boundary forward cost #20

@marcos-mendez

Context

PR #19 ships pick_strategy with a symmetric bandwidth-bound cost model: both TP and MP charge activation_bytes / intercard_bw (sharded by n_sails for MP). This is good enough for the rev-A capacity-planning datapoint (TinyLlama Q4_0 decode → MP) but it understates MP's cost for deep pipeline-parallel placements where every card-boundary forward step pays an intercard hop.

Proposal

A pipeline-MP placement across p partitions of the model has p - 1 card boundaries, so each forward step pays (p - 1) * activation_bytes / intercard_bw for the chained forwards. The current pick_strategy counts only the sharded activation transfer, not the chain.
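As a rough illustration of the chain term, here is a minimal sketch. The function name `pipeline_chain_cost_s` and the use of `f64` bytes and bytes-per-second are assumptions for this example, not the signature used in PR #19.

```rust
/// Seconds spent on chained intercard forwards for one forward step.
///
/// Hypothetical helper: `partitions` is p from the proposal. A pipeline with
/// p partitions has p - 1 card boundaries, each paying one activation hop.
pub fn pipeline_chain_cost_s(partitions: u32, activation_bytes: f64, intercard_bw: f64) -> f64 {
    assert!(intercard_bw > 0.0, "intercard bandwidth must be positive");
    // saturating_sub keeps p = 0 or p = 1 at zero boundaries instead of underflowing.
    let boundaries = partitions.saturating_sub(1) as f64;
    boundaries * activation_bytes / intercard_bw
}
```

With a single partition the chain term is zero, so the existing sharded-transfer cost is unchanged; with 8 partitions it is seven activation hops.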

Add either:

  • A Strategy::PipelineParallel { partitions: u32 } variant with the per-boundary cost, or
  • A more careful MP cost that scales with n_sails - 1 when the tile spans a pipeline boundary.
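The first option might look roughly like the sketch below. Only `Strategy::PipelineParallel { partitions: u32 }` comes from this issue; the other variant names, the `Workload` struct, and `transfer_cost_s` are hypothetical, and "sharded by n_sails" is read here as dividing the hop cost by `n_sails`, which may not match the actual `pick_strategy` code.

```rust
/// Hypothetical shape of the strategy enum; only PipelineParallel is named in the issue.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Strategy {
    TensorParallel,
    ModelParallel,
    PipelineParallel { partitions: u32 },
}

/// Hypothetical inputs to the cost model (bytes and bytes per second).
#[derive(Debug, Clone, Copy)]
pub struct Workload {
    pub activation_bytes: f64,
    pub intercard_bw: f64,
    pub n_sails: u32,
}

impl Strategy {
    /// Per-forward-step intercard transfer cost in seconds.
    pub fn transfer_cost_s(self, w: &Workload) -> f64 {
        // One full activation transfer across a single intercard link.
        let hop = w.activation_bytes / w.intercard_bw;
        match self {
            // Symmetric model described for PR #19: one hop.
            Strategy::TensorParallel => hop,
            // Assumption: "sharded by n_sails" means the hop is split n_sails ways.
            Strategy::ModelParallel => hop / (w.n_sails.max(1) as f64),
            // Proposed term: one hop per card boundary, p - 1 boundaries.
            Strategy::PipelineParallel { partitions } => {
                (partitions.saturating_sub(1) as f64) * hop
            }
        }
    }
}
```

Keeping the pipeline case as its own variant leaves the existing TP and MP arms untouched, which is one argument for the first option over folding the chain term into MP.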

Acceptance

  • TinyLlama decode datapoint still picks MP (the activation chain is short on small models).
  • A 70B-class model with 8-way pipeline at the same intercard bandwidth flips toward TP because the chain cost dominates.
  • New test in decision::tests pinning the boundary where the pipeline-cost penalty flips the answer.
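One way the boundary test could be structured is to scan partition counts and pin the first one at which the pipeline cost exceeds the TP cost. The helper `flip_boundary` and the toy cost closure below are assumptions for illustration; the real boundary depends on the actual terms in `pick_strategy`, which this sketch does not reproduce.

```rust
/// Smallest partition count in 2..=max_partitions whose pipeline cost strictly
/// exceeds `tp_cost`, or None if no such count exists in that range.
/// Hypothetical helper, generic over the cost function so it makes no claim
/// about the real model.
pub fn flip_boundary<F>(tp_cost: f64, pipeline_cost: F, max_partitions: u32) -> Option<u32>
where
    F: Fn(u32) -> f64,
{
    (2..=max_partitions).find(|&p| pipeline_cost(p) > tp_cost)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn pipeline_penalty_flips_at_pinned_boundary() {
        // Toy model: TP pays one hop, the pipeline pays (p - 1) hops.
        let hop = 1.0_f64;
        let boundary = flip_boundary(hop, |p| (p - 1) as f64 * hop, 8);
        // p = 2 ties with TP (1 hop each); p = 3 is the first strict loss.
        assert_eq!(boundary, Some(3));
    }
}
```

Pinning the exact partition count, rather than asserting only that 8-way flips, makes the test fail loudly if a later change to the cost model moves the boundary.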


Labels: stream-3 (Software Stack (Agent 3): driver, runtime, GGML, Spanker)
