Reuse Skippy forwarded decode frames (1/3) #800
Merged
Summary
Skippy split decode now reuses the forwarded activation frame envelope and activation encode buffer across decode tokens.
This is stacked on #799. It keeps the existing writer and wire format unchanged, but removes repeated forwarded
`StageWireMessage` construction and gives activation encoding a caller-owned buffer to refill.
What changed
- Added `encode_f32_activation_payload_with_state_flags_into(...)`.
- Kept the `Vec`-returning activation encode API by delegating to the in-place helper.
- Added `ReusableForwardedStageMessage` for the decode forwarding hot path.
- Kept `forwarded_stage_message_timed(...)` for non-loop callsites.
Before
flowchart LR
    A["Native decode output"] --> B["Encode activation into new Vec"]
    B --> C["Build new forwarded StageWireMessage"]
    C --> D["Clone sampling/tokens/positions"]
    D --> E["write_stage_message_conditioned"]
After
flowchart LR
    A["Before decode loop"] --> B["Create reusable forwarded frame"]
    B --> C["Native decode output"]
    C --> D["Refill activation encode buffer"]
    D --> E["Update forwarded frame fields"]
    E --> F["write_stage_message_conditioned"]
Performance Impact
This targets fixed CPU/allocation overhead on decode TPOT:
- A new activation `Vec` allocated every forwarded decode token.
Expected impact is still modest because GPU forward, network wait, activation wire bytes, and downstream compute remain unchanged. This should be more meaningful than #799 when activation encode allocation shows up in profiles, but it is still a cleanup-class improvement rather than the main 30 tok/s lever.
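For readers who want the allocation pattern spelled out, here is a minimal sketch of a caller-owned encode buffer with a `Vec`-returning wrapper that delegates to it. It is not the PR's code: `encode_f32_into`, `encode_f32`, and `decode_loop_capacities` are hypothetical names, the little-endian `f32` layout is an assumption, and the state flags handled by the real helper are left out.

```rust
/// Hypothetical stand-in for the in-place encoder: clears `out` and refills it
/// with the bytes of `activations`, reusing whatever capacity `out` already has.
/// ASSUMPTION: little-endian f32 bytes; the real wire layout is not shown in the PR text.
pub fn encode_f32_into(activations: &[f32], out: &mut Vec<u8>) {
    out.clear(); // keeps the allocation, drops the old contents
    out.reserve(activations.len() * std::mem::size_of::<f32>());
    for value in activations {
        out.extend_from_slice(&value.to_le_bytes());
    }
}

/// Vec-returning convenience API for callers outside the decode loop.
/// It delegates to the in-place helper so both paths share one encoder.
pub fn encode_f32(activations: &[f32]) -> Vec<u8> {
    let mut out = Vec::new();
    encode_f32_into(activations, &mut out);
    out
}

/// Simulates a decode loop with a single reused buffer and records the
/// buffer capacity after each token, to show that it stops growing.
pub fn decode_loop_capacities(tokens: usize, hidden: usize) -> Vec<usize> {
    let mut activations = vec![0.0f32; hidden];
    let mut buffer = Vec::new(); // created once, before the loop
    let mut capacities = Vec::with_capacity(tokens);
    for step in 0..tokens {
        activations.fill(step as f32); // stand-in for native decode output
        encode_f32_into(&activations, &mut buffer);
        capacities.push(buffer.capacity());
    }
    capacities
}
```

With a fixed hidden size, the buffer is sized on the first token and only refilled afterwards, which is the per-token allocation this section describes removing.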
High-impact follow-up
The bigger decode lever remains overlap/pipelining: hide stage0 setup, sampler/direct-return handling, or downstream wait behind work already in flight. That can remove or hide milliseconds of TPOT; this PR removes repeated local allocation work.
Compatibility
No wire protocol, ABI, topology, sampling, or activation dtype changes. The emitted stage messages keep the same fields and activation bytes.
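The compatibility claim can be pictured as an equivalence check between the two paths. The sketch below is illustrative only: `Frame`, `build_frame`, `ReusableFrame`, and `paths_agree` are hypothetical stand-ins, and the three fields are an assumed simplification of what a forwarded stage message carries.

```rust
/// Hypothetical, simplified stand-in for a forwarded stage message.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Frame {
    pub token_ids: Vec<i32>,
    pub positions: Vec<i32>,
    pub activation: Vec<u8>,
}

/// "Before" path: build a fresh frame (and fresh Vecs) for every token.
pub fn build_frame(token: i32, position: i32, activation: &[u8]) -> Frame {
    Frame {
        token_ids: vec![token],
        positions: vec![position],
        activation: activation.to_vec(),
    }
}

/// "After" path: one frame created before the loop, updated in place.
pub struct ReusableFrame {
    frame: Frame,
}

impl ReusableFrame {
    pub fn new() -> Self {
        ReusableFrame { frame: Frame::default() }
    }

    /// Overwrites every per-token field so no stale data from the
    /// previous token can leak into the next emitted frame.
    pub fn update(&mut self, token: i32, position: i32, activation: &[u8]) -> &Frame {
        self.frame.token_ids.clear();
        self.frame.token_ids.push(token);
        self.frame.positions.clear();
        self.frame.positions.push(position);
        self.frame.activation.clear();
        self.frame.activation.extend_from_slice(activation);
        &self.frame
    }
}

/// Returns true if the reused frame equals a freshly built one at every step.
pub fn paths_agree(steps: &[(i32, i32, Vec<u8>)]) -> bool {
    let mut reusable = ReusableFrame::new();
    steps
        .iter()
        .all(|(t, p, a)| *reusable.update(*t, *p, a) == build_frame(*t, *p, a))
}
```

The property worth checking in a sketch like this is that a shorter activation following a longer one leaves no trailing bytes behind, since that is where an in-place refill could diverge from a rebuild.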
Validation
- `cargo fmt -p skippy-protocol -p skippy-server -- --check`
- `cargo check -p skippy-server`
- `cargo test -p skippy-protocol --lib`: 34 passed
- `cargo test -p skippy-server --lib`: 117 passed
- `cargo clippy -p skippy-protocol --all-targets -- -D warnings`
- `cargo clippy -p skippy-server --all-targets -- -D warnings`
- `cargo check -p mesh-llm`
- `cargo clippy -p mesh-llm --all-targets -- -D warnings`