Reuse Skippy decode wire messages (2/3) by i386 · Pull Request #799 · Mesh-LLM/mesh-llm

i386 · 2026-06-05T06:39:27Z

Summary

Skippy split decode now reuses the per-token decode wire-message envelope in the frontend hot path instead of rebuilding the same StageWireMessage shape every token.

This is stacked on #798 and is intentionally scoped to allocation churn around decode message construction. It does not change topology, protocol, sampling, activation dtype, or native stage execution.

What changed

Added ReusableDecodeMessage, a small frontend wire-message helper that owns the stable decode envelope once.
Normal split decode mutates only decode_step, pos_start, current_token, and the one-token sideband each iteration.
Multimodal/split decode reuses the same message and token buffer, including the short exact-replay sideband checkpoint case.
Kept the existing one-shot embedded_decode_message helper for prefix-cache and repair paths.
Added a focused unit test for reusable decode message mutation and stable request/session/sampling fields.
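As a rough illustration of the helper described in the list above, here is a minimal sketch. It is not the PR's code: the field layout of `StageWireMessage` and `StageStateHeader`, the `SamplingConfig` type, and the `new`/`set_token` method names are all assumptions made for this example. Only the type names `ReusableDecodeMessage`, `StageWireMessage`, and `StageStateHeader`, and the mutated fields `decode_step`, `pos_start`, and `current_token`, come from the description.

```rust
// Hypothetical, simplified stand-ins for the real skippy-server types.
#[derive(Clone, Debug, PartialEq)]
struct SamplingConfig {
    temperature: f32,
    top_k: u32,
}

#[derive(Clone, Debug, PartialEq)]
struct StageStateHeader {
    decode_step: u64,
    pos_start: u32,
    current_token: i32,
}

#[derive(Clone, Debug, PartialEq)]
struct StageWireMessage {
    request_id: String,
    session_id: String,
    sampling: SamplingConfig,
    state: StageStateHeader,
    tokens: Vec<i32>,    // one-token sideband in normal decode
    positions: Vec<i32>, // stays empty for decode
    activation: Vec<u8>, // stays empty for decode
    raw: Vec<u8>,        // stays empty for decode
}

/// Owns the stable decode envelope; built once before the decode loop.
struct ReusableDecodeMessage {
    message: StageWireMessage,
}

impl ReusableDecodeMessage {
    fn new(request_id: &str, session_id: &str, sampling: &SamplingConfig) -> Self {
        Self {
            message: StageWireMessage {
                request_id: request_id.to_owned(),
                session_id: session_id.to_owned(),
                sampling: sampling.clone(), // cloned once, not once per token
                state: StageStateHeader { decode_step: 0, pos_start: 0, current_token: 0 },
                tokens: Vec::with_capacity(1),
                positions: Vec::new(),
                activation: Vec::new(),
                raw: Vec::new(),
            },
        }
    }

    /// Mutates only the per-token fields and refills the token sideband.
    fn set_token(&mut self, decode_step: u64, pos_start: u32, current_token: i32) -> &StageWireMessage {
        self.message.state.decode_step = decode_step;
        self.message.state.pos_start = pos_start;
        self.message.state.current_token = current_token;
        self.message.tokens.clear(); // keeps the existing allocation
        self.message.tokens.push(current_token);
        &self.message
    }
}

fn main() {
    let sampling = SamplingConfig { temperature: 0.7, top_k: 40 };
    let mut reusable = ReusableDecodeMessage::new("req-1", "sess-1", &sampling);
    for (step, token) in [101, 202, 303].iter().copied().enumerate() {
        let msg = reusable.set_token(step as u64, 16 + step as u32, token);
        println!("{:?}", msg);
    }
}
```

Returning a shared reference from `set_token` keeps the envelope owned by the helper, so a caller cannot accidentally move it out and force a rebuild on the next token.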

Before

flowchart LR
    A["Each decode token"] --> B["Allocate StageStateHeader"]
    B --> C["Clone sampling config"]
    C --> D["Allocate tokens Vec"]
    D --> E["Allocate empty positions/activation/raw Vecs"]
    E --> F["Run stage + forward activation"]

After

flowchart LR
    A["Before decode loop"] --> B["Create reusable DecodeEmbd message"]
    B --> C["Each decode token"]
    C --> D["Mutate step/position/current token"]
    D --> E["Refill existing token sideband buffer"]
    E --> F["Run stage + forward activation"]
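The two flowcharts can be contrasted in a small sketch of the decode loop. Everything here is hypothetical scaffolding: `DecodeMessage`, `Sampling`, `one_shot`, `update_in_place`, and the toy `run_stage` (which just returns the current token plus one) stand in for the real message, sampling config, and stage execution. The point is only that rebuilding the message per token and mutating one message in place feed the stage identical contents.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Sampling {
    temperature: f32,
}

#[derive(Clone, Debug, PartialEq)]
struct DecodeMessage {
    sampling: Sampling,
    decode_step: u64,
    pos_start: u32,
    current_token: i32,
    tokens: Vec<i32>,
}

/// "Before" shape: clone sampling and allocate a fresh tokens Vec per call.
fn one_shot(sampling: &Sampling, step: u64, pos: u32, token: i32) -> DecodeMessage {
    DecodeMessage {
        sampling: sampling.clone(),
        decode_step: step,
        pos_start: pos,
        current_token: token,
        tokens: vec![token],
    }
}

/// "After" shape: touch only the per-token fields of an existing message.
fn update_in_place(msg: &mut DecodeMessage, step: u64, pos: u32, token: i32) {
    msg.decode_step = step;
    msg.pos_start = pos;
    msg.current_token = token;
    msg.tokens.clear();
    msg.tokens.push(token);
}

/// Toy stand-in for running the stage and sampling the next token.
fn run_stage(msg: &DecodeMessage) -> i32 {
    msg.current_token + 1
}

fn decode_rebuilding(sampling: &Sampling, first: i32, prompt_len: u32, steps: u64) -> Vec<i32> {
    let mut out = Vec::new();
    let mut token = first;
    for step in 0..steps {
        let msg = one_shot(sampling, step, prompt_len + step as u32, token);
        token = run_stage(&msg);
        out.push(token);
    }
    out
}

fn decode_reusing(sampling: &Sampling, first: i32, prompt_len: u32, steps: u64) -> Vec<i32> {
    // Built once, before the loop.
    let mut msg = one_shot(sampling, 0, prompt_len, first);
    let mut out = Vec::new();
    let mut token = first;
    for step in 0..steps {
        update_in_place(&mut msg, step, prompt_len + step as u32, token);
        token = run_stage(&msg);
        out.push(token);
    }
    out
}

fn main() {
    let sampling = Sampling { temperature: 0.7 };
    let a = decode_rebuilding(&sampling, 5, 10, 4);
    let b = decode_reusing(&sampling, 5, 10, 4);
    assert_eq!(a, b);
    println!("{:?}", b);
}
```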

Performance Impact

This targets small but repeated CPU-side work in TPOT (time per output token):

Removes per-token allocation of the decode message's stable empty vectors.
Avoids per-token sampling clone in the two split decode loops.
Reuses the sideband token buffer for exact-replay checkpoint frames instead of allocating a fresh Vec every eligible token.

This should be a modest fixed-overhead improvement rather than a model-compute improvement. It is designed to compose with #798 and remain easy to benchmark independently once lab capacity is available.
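The buffer-reuse point rests on a standard `Vec` property: `clear` drops the elements but keeps the capacity, so a refill that fits within that capacity does not reallocate. The sketch below demonstrates this on a multi-token sideband. The helper name `refill_sideband` and the capacity of 8 are assumptions for illustration, not values taken from the PR.

```rust
/// Refill `buf` with `tokens` without giving up its allocation.
fn refill_sideband(buf: &mut Vec<i32>, tokens: &[i32]) {
    buf.clear(); // length becomes 0, capacity is retained
    buf.extend_from_slice(tokens); // reallocates only if tokens.len() exceeds capacity
}

fn main() {
    // Allocate once, before the decode loop.
    let mut sideband: Vec<i32> = Vec::with_capacity(8);
    let ptr_before = sideband.as_ptr();

    for step in 0..1000 {
        // Hypothetical short replay frame of three tokens.
        let frame = [step, step + 1, step + 2];
        refill_sideband(&mut sideband, &frame);
        // Same backing allocation on every iteration.
        assert_eq!(sideband.as_ptr(), ptr_before);
        assert_eq!(sideband.len(), 3);
    }
    println!("capacity after loop: {}", sideband.capacity());
}
```

A frame longer than the current capacity would still work, since `extend_from_slice` grows the buffer; it would simply cost one reallocation, after which the larger capacity is reused.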

Compatibility

No protocol or ABI changes. The emitted decode wire messages keep the same fields and token sideband semantics.

Validation

cargo fmt -p skippy-server -- --check
cargo check -p skippy-server
cargo test -p skippy-server --lib — 116 passed
cargo clippy -p skippy-server --all-targets -- -D warnings
cargo check -p mesh-llm
cargo clippy -p mesh-llm --all-targets -- -D warnings

i386 mentioned this pull request Jun 5, 2026

Reuse Skippy forwarded decode frames (1/3) #800

Merged

ndizazzo assigned i386 Jun 6, 2026

ndizazzo force-pushed the skippy-decode-hotpath-cleanup branch from cd1a007 to aeae536 June 6, 2026 05:07

ndizazzo changed the title ~~Reuse Skippy decode wire messages~~ Reuse Skippy decode wire messages (2/3) Jun 6, 2026

Reuse Skippy decode wire messages

0bbf257

ndizazzo force-pushed the skippy-decode-frame-reuse branch from dafa489 to 0bbf257 June 6, 2026 05:29

ndizazzo self-requested a review June 6, 2026 05:29

ndizazzo approved these changes Jun 6, 2026

ndizazzo merged commit 4d4752b into skippy-decode-hotpath-cleanup Jun 6, 2026
28 checks passed

ndizazzo deleted the skippy-decode-frame-reuse branch June 6, 2026 06:03
