Reuse Skippy forwarded decode frames (1/3) by i386 · Pull Request #800 · Mesh-LLM/mesh-llm

i386 · 2026-06-05T06:55:36Z

Summary

Skippy split decode now reuses the forwarded activation frame envelope and activation encode buffer across decode tokens.

This is stacked on #799. It keeps the existing writer and wire format unchanged, but removes repeated forwarded StageWireMessage construction and gives activation encoding a caller-owned buffer to refill.

What changed

Added an internal in-place activation encoder: encode_f32_activation_payload_with_state_flags_into(...).
Kept the existing Vec-returning activation encode API by delegating to the in-place helper.
Added ReusableForwardedStageMessage for the decode forwarding hot path.
Reused forwarded decode frame containers in normal split decode and multimodal split decode.
Preserved existing one-shot forwarded_stage_message_timed(...) for non-loop callsites.
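
To make the first two bullets concrete, here is a minimal sketch of an in-place encoder with a Vec-returning wrapper that delegates to it. The real signature of encode_f32_activation_payload_with_state_flags_into(...) is not shown in this description, so the names `encode_f32_into` and `encode_f32` are hypothetical, state flags are omitted, and the little-endian byte layout is an assumption.

```rust
/// Hypothetical in-place encoder (not the PR's actual signature).
/// Clears `out` and refills it with the little-endian bytes of `values`,
/// so the caller's allocation is kept between calls.
fn encode_f32_into(values: &[f32], out: &mut Vec<u8>) {
    out.clear();
    out.reserve(values.len() * std::mem::size_of::<f32>());
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
}

/// Hypothetical Vec-returning wrapper for one-shot callers.
/// It delegates to the in-place helper, so both paths emit the same bytes.
fn encode_f32(values: &[f32]) -> Vec<u8> {
    let mut out = Vec::new();
    encode_f32_into(values, &mut out);
    out
}
```

Because the wrapper only allocates a fresh buffer and forwards to the helper, existing callers would see no change in output, while loop callers can pass the same buffer on every token.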

Before

flowchart LR
    A["Native decode output"] --> B["Encode activation into new Vec"]
    B --> C["Build new forwarded StageWireMessage"]
    C --> D["Clone sampling/tokens/positions"]
    D --> E["write_stage_message_conditioned"]

After

flowchart LR
    A["Before decode loop"] --> B["Create reusable forwarded frame"]
    B --> C["Native decode output"]
    C --> D["Refill activation encode buffer"]
    D --> E["Update forwarded frame fields"]
    E --> F["write_stage_message_conditioned"]
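
The "After" flow can be sketched roughly as follows. `ReusableFrame` is a hypothetical stand-in: the actual fields of ReusableForwardedStageMessage and StageWireMessage are not listed in this description, and the write step is only indicated by a comment. The helper counts how often the payload buffer moves after the first token, which should be zero when the activation size is constant.

```rust
/// Hypothetical stand-in for the reusable forwarded frame.
#[derive(Default)]
struct ReusableFrame {
    tokens: Vec<i32>,
    positions: Vec<i32>,
    payload: Vec<u8>,
}

impl ReusableFrame {
    /// Overwrites the per-token fields while keeping every allocation.
    fn refill(&mut self, tokens: &[i32], positions: &[i32], activation: &[f32]) {
        self.tokens.clear();
        self.tokens.extend_from_slice(tokens);
        self.positions.clear();
        self.positions.extend_from_slice(positions);
        self.payload.clear();
        for v in activation {
            self.payload.extend_from_slice(&v.to_le_bytes());
        }
    }
}

/// Runs `steps` decode iterations with `hidden` activation values each and
/// returns how many times the payload buffer was reallocated after step 0.
fn run_decode_loop(steps: usize, hidden: usize) -> usize {
    let mut frame = ReusableFrame::default(); // created once, before the loop
    let mut activation = vec![0.0f32; hidden]; // stand-in for native decode output
    let mut last_ptr: *const u8 = std::ptr::null();
    let mut reallocations = 0;
    for step in 0..steps {
        activation.fill(step as f32);
        frame.refill(&[step as i32], &[step as i32], &activation);
        // A real loop would hand the frame to the stage writer here.
        if step > 0 && frame.payload.as_ptr() != last_ptr {
            reallocations += 1;
        }
        last_ptr = frame.payload.as_ptr();
    }
    reallocations
}
```

The buffers grow during the first token and are then only cleared and refilled, which is the allocation pattern the diagram describes.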

Performance Impact

This targets fixed CPU/allocation overhead on decode TPOT:

Reuses the activation encode buffer instead of allocating a fresh activation Vec every forwarded decode token.
Reuses token/position/raw-byte containers on the forwarded message.
Avoids repeatedly cloning stable sampling config in the two split decode loops unless it actually changes.
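
The last bullet, cloning the sampling config only when it changes, could look something like the sketch below. `SamplingConfig`, its fields, and `sync_sampling` are hypothetical; the PR text does not show the real type or how a change is detected, so an equality check is assumed here.

```rust
/// Hypothetical sampling config; the field names are illustrative only.
#[derive(Clone, Debug, PartialEq)]
struct SamplingConfig {
    temperature: f32,
    top_k: u32,
    stop: Vec<String>,
}

/// Updates the cached copy only when `current` differs from it.
/// Returns true when a clone was actually performed.
fn sync_sampling(cached: &mut Option<SamplingConfig>, current: &SamplingConfig) -> bool {
    if cached.as_ref() == Some(current) {
        return false; // stable config: skip the clone
    }
    *cached = Some(current.clone());
    true
}
```

In a decode loop where the config is stable, only the first call clones. The comparison is not free, but under this assumption it avoids re-allocating heap-backed fields such as `stop` on every token.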

Expected impact is still modest because GPU forward, network wait, activation wire bytes, and downstream compute remain unchanged. This should be more meaningful than #799 when activation encode allocation shows up in profiles, but it is still a cleanup-class improvement rather than the main 30 tok/s lever.

High-impact follow-up

The bigger decode lever remains overlap/pipelining: hide stage0 setup, sampler/direct-return handling, or downstream wait behind work already in flight. That can remove or hide milliseconds of TPOT; this PR removes repeated local allocation work.

Compatibility

No wire protocol, ABI, topology, sampling, or activation dtype changes. The emitted stage messages keep the same fields and activation bytes.

Validation

cargo fmt -p skippy-protocol -p skippy-server -- --check
cargo check -p skippy-server
cargo test -p skippy-protocol --lib — 34 passed
cargo test -p skippy-server --lib — 117 passed
cargo clippy -p skippy-protocol --all-targets -- -D warnings
cargo clippy -p skippy-server --all-targets -- -D warnings
cargo check -p mesh-llm
cargo clippy -p mesh-llm --all-targets -- -D warnings

ndizazzo · 2026-06-06T05:23:19Z

Tipping the stack

Reuse Skippy forwarded decode frames

fa1c0c4

ndizazzo assigned i386 Jun 6, 2026

ndizazzo changed the title ~~Reuse Skippy forwarded decode frames~~ Reuse Skippy forwarded decode frames (1/3) Jun 6, 2026

ndizazzo self-requested a review June 6, 2026 05:23

ndizazzo merged commit dafa489 into skippy-decode-frame-reuse Jun 6, 2026
28 checks passed

ndizazzo deleted the skippy-decode-forward-reuse branch June 6, 2026 05:23
