Reuse Skippy decode wire messages (2/3)#799

Merged: ndizazzo merged 1 commit into skippy-decode-hotpath-cleanup from skippy-decode-frame-reuse on Jun 6, 2026

@i386 (Collaborator) commented Jun 5, 2026

Summary

Skippy split decode now reuses the per-token decode wire-message envelope in the frontend hot path instead of rebuilding the same StageWireMessage shape every token.

This is stacked on #798 and is intentionally scoped to allocation churn around decode message construction. It does not change topology, protocol, sampling, activation dtype, or native stage execution.

What changed

  • Added ReusableDecodeMessage, a small frontend wire-message helper that owns the stable decode envelope once.
  • Normal split decode mutates only decode_step, pos_start, current_token, and the one-token sideband each iteration.
  • Multimodal/split decode reuses the same message and token buffer, including the short exact-replay sideband checkpoint case.
  • Kept the existing one-shot embedded_decode_message helper for prefix-cache and repair paths.
  • Added a focused unit test for reusable decode message mutation and stable request/session/sampling fields.
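
The helper described above might look roughly like the following. This is a minimal sketch, not the code from this PR: the field layout of `StageWireMessage`, the `SamplingConfig` type, and the `set_step` method name are all assumptions made for illustration. The idea it shows is that the stable envelope (request, session, sampling) is built once, and only the per-token fields and the one-token sideband are overwritten each iteration.

```rust
// Hypothetical, simplified stand-ins for the real skippy-server types.
#[derive(Clone, Debug, PartialEq)]
pub struct SamplingConfig {
    pub temperature: f32,
    pub top_k: u32,
}

#[derive(Clone, Debug, PartialEq)]
pub struct StageWireMessage {
    // Stable for the whole decode loop.
    pub request_id: u64,
    pub session_id: u64,
    pub sampling: SamplingConfig,
    // Rewritten for every token.
    pub decode_step: u32,
    pub pos_start: u32,
    pub current_token: i32,
    pub tokens: Vec<i32>, // one-token sideband
}

/// Owns the decode envelope once and hands out a borrowed view per token.
pub struct ReusableDecodeMessage {
    message: StageWireMessage,
}

impl ReusableDecodeMessage {
    /// Builds the stable part of the envelope before the decode loop starts.
    pub fn new(request_id: u64, session_id: u64, sampling: SamplingConfig) -> Self {
        Self {
            message: StageWireMessage {
                request_id,
                session_id,
                sampling,
                decode_step: 0,
                pos_start: 0,
                current_token: 0,
                tokens: Vec::with_capacity(1),
            },
        }
    }

    /// Overwrites only the per-token fields and refills the sideband buffer.
    /// (Hypothetical method name.)
    pub fn set_step(&mut self, decode_step: u32, pos_start: u32, token: i32) -> &StageWireMessage {
        let m = &mut self.message;
        m.decode_step = decode_step;
        m.pos_start = pos_start;
        m.current_token = token;
        m.tokens.clear(); // keeps the existing allocation
        m.tokens.push(token);
        &self.message
    }
}
```

Returning a shared reference from `set_step` is one possible design: it lets the caller serialize or forward the message without being able to change the stable fields by accident.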

Before

flowchart LR
    A["Each decode token"] --> B["Allocate StageStateHeader"]
    B --> C["Clone sampling config"]
    C --> D["Allocate tokens Vec"]
    D --> E["Allocate empty positions/activation/raw Vecs"]
    E --> F["Run stage + forward activation"]

After

flowchart LR
    A["Before decode loop"] --> B["Create reusable DecodeEmbd message"]
    B --> C["Each decode token"]
    C --> D["Mutate step/position/current token"]
    D --> E["Refill existing token sideband buffer"]
    E --> F["Run stage + forward activation"]

Performance Impact

This targets small but repeated CPU-side work that contributes to time per output token (TPOT):

  • Removes per-token allocation of the decode message's stable empty vectors.
  • Avoids per-token sampling clone in the two split decode loops.
  • Reuses the sideband token buffer for exact-replay checkpoint frames instead of allocating a fresh Vec every eligible token.
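
The buffer-reuse point in the list above relies on a standard `Vec` property: `clear` drops the elements but keeps the allocation, so refilling within the existing capacity does not allocate. The sketch below illustrates that property in isolation; `refill_sideband` and `count_reallocations` are hypothetical names and are not taken from the PR.

```rust
/// Refills an existing sideband buffer in place.
/// `clear` keeps the allocation, so no new heap allocation happens as long
/// as `tokens.len()` fits in the buffer's current capacity.
pub fn refill_sideband(buf: &mut Vec<i32>, tokens: &[i32]) {
    buf.clear();
    buf.extend_from_slice(tokens);
}

/// Counts how many times the buffer's backing storage moved while replaying
/// a sequence of sideband frames. With enough initial capacity this is 0.
pub fn count_reallocations(initial_capacity: usize, frames: &[&[i32]]) -> usize {
    let mut buf: Vec<i32> = Vec::with_capacity(initial_capacity);
    let mut moves = 0;
    for frame in frames {
        let before = buf.as_ptr();
        refill_sideband(&mut buf, frame);
        if buf.as_ptr() != before {
            moves += 1;
        }
    }
    moves
}
```

By contrast, building a fresh `vec![token]` for each frame requests a new allocation on every eligible token, which is the churn this change removes.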

This should be a modest fixed-overhead improvement rather than a model-compute improvement. It is designed to compose with #798 and remain easy to benchmark independently once lab capacity is available.

Compatibility

No protocol or ABI changes. The emitted decode wire messages keep the same fields and token sideband semantics.
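
One way to check this kind of compatibility claim is to compare the reused message against a freshly built one at every step. The sketch below does that with a trimmed-down, hypothetical `DecodeMessage`; the real `StageWireMessage` has more fields, and the rule `pos_start = prompt_len + step` is an assumption used only to generate test inputs.

```rust
// Hypothetical, trimmed-down message; the real wire message has more fields.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct DecodeMessage {
    pub session_id: u64,
    pub decode_step: u32,
    pub pos_start: u32,
    pub current_token: i32,
    pub tokens: Vec<i32>,
}

/// One-shot path: builds a fresh message (and a fresh Vec) for every token.
pub fn one_shot(session_id: u64, decode_step: u32, pos_start: u32, token: i32) -> DecodeMessage {
    DecodeMessage {
        session_id,
        decode_step,
        pos_start,
        current_token: token,
        tokens: vec![token],
    }
}

/// Reuse path: overwrites the per-token fields of an existing message.
pub fn update_in_place(msg: &mut DecodeMessage, decode_step: u32, pos_start: u32, token: i32) {
    msg.decode_step = decode_step;
    msg.pos_start = pos_start;
    msg.current_token = token;
    msg.tokens.clear();
    msg.tokens.push(token);
}

/// Returns true if both paths produce identical messages for every token.
pub fn paths_agree(session_id: u64, prompt_len: u32, generated: &[i32]) -> bool {
    let mut reused = one_shot(session_id, 0, prompt_len, 0);
    generated.iter().enumerate().all(|(step, &token)| {
        let step = step as u32;
        // Assumed position rule, for illustration only.
        let pos = prompt_len + step;
        update_in_place(&mut reused, step, pos, token);
        reused == one_shot(session_id, step, pos, token)
    })
}
```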

Validation

  • cargo fmt -p skippy-server -- --check
  • cargo check -p skippy-server
  • cargo test -p skippy-server --lib — 116 passed
  • cargo clippy -p skippy-server --all-targets -- -D warnings
  • cargo check -p mesh-llm
  • cargo clippy -p mesh-llm --all-targets -- -D warnings

@ndizazzo force-pushed the skippy-decode-hotpath-cleanup branch from cd1a007 to aeae536 on June 6, 2026 05:07
@ndizazzo changed the title from "Reuse Skippy decode wire messages" to "Reuse Skippy decode wire messages (2/3)" on Jun 6, 2026
@ndizazzo force-pushed the skippy-decode-frame-reuse branch from dafa489 to 0bbf257 on June 6, 2026 05:29
@ndizazzo self-requested a review on June 6, 2026 05:29

@ndizazzo (Collaborator) left a comment

timberrrrr

@ndizazzo merged commit 4d4752b into skippy-decode-hotpath-cleanup on Jun 6, 2026 (28 checks passed)
@ndizazzo deleted the skippy-decode-frame-reuse branch on June 6, 2026 06:03