scheduler: model asymmetric pipeline-MP per-card-boundary forward cost #20

## Context

PR #19 ships `pick_strategy` with a symmetric, bandwidth-bound cost model: both TP and MP charge `activation_bytes / intercard_bw`, with the MP transfer sharded by `n_sails`. This is good enough for the rev-A capacity-planning datapoint (TinyLlama Q4_0 decode → MP), but it understates MP's cost for deep pipeline-parallel placements, where every forward step that crosses a card boundary pays an intercard hop.

## Proposal

For pipeline-MP across `p` partitions of the model, each forward step pays `(p - 1) * activation_bytes / intercard_bw` for the chained forwards: one intercard hop per partition boundary. The current `pick_strategy` counts only the sharded activation transfer, not the chain.
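A minimal sketch of the proposed per-step MP cost, assuming the chain term is charged on top of the sharded transfer the current model already counts, and that `activation_bytes` is in bytes and `intercard_bw` in bytes per second. The helper names (`sharded_transfer_s`, `pipeline_chain_s`, `pipeline_mp_cost_s`) are hypothetical and are not the code in PR #19.

```rust
/// Sharded activation transfer that the current model charges for MP (seconds).
/// Hypothetical helper; units assumed to be bytes and bytes/second.
pub fn sharded_transfer_s(activation_bytes: f64, intercard_bw: f64, n_sails: u32) -> f64 {
    debug_assert!(n_sails > 0, "n_sails must be at least 1");
    activation_bytes / f64::from(n_sails) / intercard_bw
}

/// Extra cost of chaining the forward across `partitions` pipeline stages:
/// one full-activation intercard hop per boundary, i.e. `partitions - 1` hops.
pub fn pipeline_chain_s(activation_bytes: f64, intercard_bw: f64, partitions: u32) -> f64 {
    // saturating_sub keeps 0 partitions from underflowing; 0 and 1 both mean "no boundary".
    f64::from(partitions.saturating_sub(1)) * activation_bytes / intercard_bw
}

/// Proposed per-step pipeline-MP cost (assumption): sharded transfer plus the boundary chain.
pub fn pipeline_mp_cost_s(activation_bytes: f64, intercard_bw: f64, n_sails: u32, partitions: u32) -> f64 {
    sharded_transfer_s(activation_bytes, intercard_bw, n_sails)
        + pipeline_chain_s(activation_bytes, intercard_bw, partitions)
}
```

With illustrative numbers `activation_bytes = 1024`, `intercard_bw = 256` and `n_sails = 4`, the sharded term is 1 s; an 8-partition pipeline adds 7 hops of 4 s each, for 29 s in total, so the chain rather than the shard dominates.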

Add either:

- A `Strategy::PipelineParallel { partitions: u32 }` variant with the per-boundary cost, or
- A more careful MP cost that scales with `n_sails - 1` when the tile spans a pipeline boundary.
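The first option could look roughly like the following. It is a hypothetical sketch: the existing variant names `TensorParallel` and `ModelParallel`, the `transfer_cost_s` function and `pick_cheapest` are assumptions, and only the bandwidth-bound transfer term described in this issue is modelled (no compute or other costs).

```rust
/// Hypothetical strategy enum: two assumed existing variants plus the
/// proposed `PipelineParallel { partitions }`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Strategy {
    TensorParallel,
    ModelParallel,
    PipelineParallel { partitions: u32 },
}

/// Per-forward-step transfer cost in seconds (bytes and bytes/second assumed).
pub fn transfer_cost_s(strategy: Strategy, activation_bytes: f64, intercard_bw: f64, n_sails: u32) -> f64 {
    debug_assert!(n_sails > 0, "n_sails must be at least 1");
    let hop = activation_bytes / intercard_bw; // one full-activation intercard hop
    let sharded = hop / f64::from(n_sails); // MP's sharded transfer
    match strategy {
        Strategy::TensorParallel => hop,
        Strategy::ModelParallel => sharded,
        // Sharded transfer plus one hop per pipeline boundary.
        Strategy::PipelineParallel { partitions } => {
            sharded + f64::from(partitions.saturating_sub(1)) * hop
        }
    }
}

/// Returns the candidate with the lowest transfer cost (first one on ties),
/// or `None` for an empty candidate list.
pub fn pick_cheapest(candidates: &[Strategy], activation_bytes: f64, intercard_bw: f64, n_sails: u32) -> Option<Strategy> {
    let cost = |s: Strategy| transfer_cost_s(s, activation_bytes, intercard_bw, n_sails);
    candidates.iter().copied().min_by(|a, b| cost(*a).total_cmp(&cost(*b)))
}
```

Carrying `partitions` on the variant makes the chain length part of the decision's output, so a caller can see which placement was costed; the second option would instead derive the boundary count from `n_sails` inside the MP arm.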

## Acceptance

- TinyLlama decode datapoint still picks MP (the activation chain is short on small models).
- A 70B-class model with an 8-way pipeline at the same intercard bandwidth flips to TP because the chain cost (7 boundary hops per forward step) dominates.
- New test in `decision::tests` pinning the boundary where the pipeline-cost penalty flips the answer.
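One way to pin the boundary is to search for the smallest partition count at which the pipeline cost exceeds the TP cost. The sketch takes the TP cost and the base MP cost as inputs, so it does not depend on how the rest of the model computes them. `flip_partitions`, the test name and the numbers are hypothetical illustrations, not TinyLlama or 70B measurements.

```rust
/// Smallest `p` in `1..=max_partitions` at which pipeline-MP, costed as
/// `mp_base_cost_s + (p - 1) * hop_s`, is strictly more expensive than TP.
/// Returns `None` if MP is still no more expensive at `max_partitions`.
/// All inputs are per-forward-step costs in seconds (assumed units).
pub fn flip_partitions(tp_cost_s: f64, mp_base_cost_s: f64, hop_s: f64, max_partitions: u32) -> Option<u32> {
    (1..=max_partitions).find(|&p| {
        // p >= 1 in this range, so `p - 1` cannot underflow.
        let pipeline_cost_s = mp_base_cost_s + f64::from(p - 1) * hop_s;
        pipeline_cost_s > tp_cost_s
    })
}

#[cfg(test)]
mod tests {
    use super::*;

    // Illustrative numbers only: TP costs 10 s, MP's sharded transfer 1 s,
    // and each pipeline boundary adds a 4 s hop.
    #[test]
    fn pipeline_chain_flips_to_tp_at_four_partitions() {
        // 4 partitions: 1 + 3 * 4 = 13 s > 10 s, so TP wins.
        assert_eq!(flip_partitions(10.0, 1.0, 4.0, 8), Some(4));
        // Capped at 3 partitions: 1 + 2 * 4 = 9 s, so MP still wins.
        assert_eq!(flip_partitions(10.0, 1.0, 4.0, 3), None);
    }
}
```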

## Follow-up references

- Mentioned in PR #19 ("the asymmetric MP-pipeline cost requires more sophistication").
- Cross-link to the MAST cost-model issue once it is filed against the intercard link characterisation.
