Protect Skippy stage decode from KV pressure #758

Merged
ndizazzo merged 3 commits into main from codex/kv-tool-loop-runtime-closure
Jun 6, 2026

Conversation

@IvGolovach
Collaborator

@IvGolovach IvGolovach commented May 30, 2026

Summary

This protects Skippy stage decode/replay from resident-prefix KV pressure before the binary transport asks the native runtime to decode.

When a stage is about to decode with a resident prefix already occupying KV space, the server now computes a bounded resident-prefix eviction plan first. That gives the Rust-side cache a chance to free stale prefix records before decode work hits native KV admission, while keeping the newly landed native KV compaction from #764 as the lower-level fragmentation recovery path.
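As a rough illustration of what a bounded resident-prefix eviction plan could look like, here is a minimal sketch. It is not the code in `binary_transport::kv_eviction`; the `ResidentPrefix` and `EvictionPlan` types, the `plan_eviction` function, and the `max_evictions` bound are all hypothetical names chosen for this example, and least-recently-used ordering is an assumption.

```rust
/// One resident-prefix record as this sketch models it (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ResidentPrefix {
    pub id: u64,
    /// KV tokens currently held by this record.
    pub tokens: usize,
    /// Monotonic use counter; a smaller value means less recently used.
    pub last_used: u64,
}

/// Result of planning: which records to drop and whether that is enough.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EvictionPlan {
    pub evict_ids: Vec<u64>,
    pub freed_tokens: usize,
    pub satisfied: bool,
}

/// Plan evictions so that `decode_tokens` fit into `kv_capacity`.
/// The plan never names more than `max_evictions` records, which is the
/// "bounded" part: the planner stops even if the deficit is not covered.
pub fn plan_eviction(
    residents: &[ResidentPrefix],
    kv_capacity: usize,
    decode_tokens: usize,
    max_evictions: usize,
) -> EvictionPlan {
    let used: usize = residents.iter().map(|r| r.tokens).sum();
    let free = kv_capacity.saturating_sub(used);
    let mut deficit = decode_tokens.saturating_sub(free);

    // Oldest records first (assumed LRU policy).
    let mut by_age: Vec<&ResidentPrefix> = residents.iter().collect();
    by_age.sort_by_key(|r| r.last_used);

    let mut plan = EvictionPlan {
        evict_ids: Vec::new(),
        freed_tokens: 0,
        satisfied: deficit == 0,
    };
    for record in by_age {
        if deficit == 0 || plan.evict_ids.len() >= max_evictions {
            break;
        }
        plan.evict_ids.push(record.id);
        plan.freed_tokens += record.tokens;
        deficit = deficit.saturating_sub(record.tokens);
    }
    plan.satisfied = deficit == 0;
    plan
}
```

The caller can inspect `satisfied` to decide whether to proceed to decode or leave the remaining pressure to the lower-level recovery path.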

Why

Issue #652 is not a single allocator failure mode. #764 addresses native unified-KV fragmentation. This PR covers the adjacent Skippy server path where resident-prefix cache entries can still consume KV budget before binary stage decode/replay work starts.

The intent is to reduce avoidable llama_decode / slot-pressure failures under long Goose/Pi-style tool loops without changing protocol, schema, or the Skippy ABI.

Diff Scope

  • Adds binary_transport::kv_eviction to compute bounded resident-prefix eviction decisions before decode/replay work.
  • Wires the eviction decision into binary stage decode admission.
  • Rehomes proactive-eviction telemetry helpers from frontend utility code into kv_integration.
  • Adds regression coverage for binary decode eviction admission and bounded telemetry attributes.
  • Keeps mesh protocol, gossip, API schema, and Skippy ABI unchanged.
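To make "bounded telemetry attributes" concrete, the sketch below shows one way an eviction decision could be turned into a fixed, request-free attribute set. The attribute keys, the `EvictionOutcome` enum, and the `proactive_eviction_attrs` function are assumptions made for illustration and are not taken from `kv_integration`.

```rust
/// Hypothetical outcome labels; a closed enum keeps the label set bounded.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EvictionOutcome {
    NotNeeded,
    Evicted,
    Insufficient,
}

impl EvictionOutcome {
    fn as_str(self) -> &'static str {
        match self {
            EvictionOutcome::NotNeeded => "not_needed",
            EvictionOutcome::Evicted => "evicted",
            EvictionOutcome::Insufficient => "insufficient",
        }
    }
}

/// Build telemetry attributes for one proactive-eviction decision.
/// The key set is fixed and carries only counts and a closed label,
/// so cardinality does not grow with traffic and no request or
/// session identifier can leak into telemetry.
pub fn proactive_eviction_attrs(
    outcome: EvictionOutcome,
    evicted_records: usize,
    freed_tokens: usize,
) -> Vec<(&'static str, String)> {
    vec![
        // Key names are illustrative assumptions.
        ("kv.proactive_eviction.outcome", outcome.as_str().to_string()),
        ("kv.proactive_eviction.evicted_records", evicted_records.to_string()),
        ("kv.proactive_eviction.freed_tokens", freed_tokens.to_string()),
    ]
}
```

A regression test in this style would assert on the number of keys and on the absence of per-request fields rather than on exact values.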

Branch Integrity

  • Base branch: main
  • Validated base: f9bd75a97378be014f52c4e68c13e81d3399e65e
  • Ahead/behind: 0 behind / 1 ahead
  • Merge base: f9bd75a97378be014f52c4e68c13e81d3399e65e
  • Introduced commit: dcbfecfa14690fa6f250194cf239add52d2a022a Evict binary stage KV before decode

Diff Hygiene

Changed files:

  • crates/skippy-server/src/binary_transport.rs
  • crates/skippy-server/src/binary_transport/kv_eviction.rs
  • crates/skippy-server/src/binary_transport/tests.rs
  • crates/skippy-server/src/frontend.rs
  • crates/skippy-server/src/frontend/util.rs
  • crates/skippy-server/src/kv_integration/mod.rs

Proof:

  • git diff --check origin/main...HEAD: PASS, no output.
  • git diff --check: PASS, no output.
  • git diff --cached --check: PASS, no output.

Validation

Validation tier: Tier 3 - shared Skippy binary stage runtime path refreshed onto current main after PR #764; binary decode/replay now reserves resident-prefix KV capacity before decode work, complementing native KV compaction without protocol/schema/ABI changes.

  • git fetch --no-tags origin main:refs/remotes/origin/main codex/kv-tool-loop-runtime-closure:refs/remotes/origin/codex/kv-tool-loop-runtime-closure: PASS, origin/main at f9bd75a9.
  • git rebase origin/main: PASS, no conflicts.
  • git diff --check origin/main...HEAD: PASS, no output.
  • git diff --check: PASS, no output.
  • git diff --cached --check: PASS, no output.
  • scripts/prepare-llama.sh pinned: PASS, 82 patches applied; upstream 22cadc1944f4658214aee03abd08240358840a95, patched c9b0a02726a0608efa351bf648de9eef6909a565.
  • scripts/build-llama.sh: PASS, patched CPU stage ABI libraries built.
  • cargo fmt --all -- --check: PASS.
  • LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo test -p skippy-server binary_transport --lib -- --test-threads=1: PASS, 6 passed.
  • LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo test -p skippy-server proactive_eviction --lib -- --test-threads=1: PASS, 1 passed.
  • cargo test -p skippy-cache evict_lru_until_tokens --lib -- --test-threads=1: PASS, 2 passed.
  • LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo test -p skippy-server --lib -- --test-threads=1: PASS, 107 passed.
  • LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo check -p mesh-llm: PASS.
  • LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo clippy -p skippy-server --all-targets -- -D warnings: PASS.
  • python3 -m unittest scripts.tests.test_qa_kv_tool_loop_stability: PASS, 18 passed.

Required remote gates: PASS on refreshed head dcbfecfa14690fa6f250194cf239add52d2a022a; PR Builds and PR Quality Checks completed successfully.

Ledger: not applicable - not required for selected validation tier/change family.

Version: not applicable - no release/version sync required for this non-release Skippy runtime behavior change.

Not Run

  • Live overlapping Goose/Pi tool-loop smoke: no local direct-model Skippy endpoint was available. Deterministic binary transport, frontend telemetry, resident-prefix eviction, cache eviction, and QA harness unit tests cover changed branches.
  • just build: not required for selected validation tier; Rust-only server path changed and patched stage ABI plus shipped mesh cargo check cover changed runtime linkage.

Runtime Safety

  • No new blocking locks.
  • No new unbounded queues.
  • No mesh protocol or gossip invariant changed.
  • Proactive eviction is bounded by the resident-prefix planner and runs before binary decode/replay work, not inside the native decode loop.
  • No invariant regression introduced.

Rollback Plan

Rollback: revert this PR.

git revert <post_merge_commit_sha>

DB downgrade: not applicable.
Data repair: not applicable.
Operational caveats: none known.

Known Residual Risks

The remaining issue-level proof is a live #652-style direct-model Goose/Pi overlap certification on a loaded Skippy endpoint. This PR is merge-ready as a deterministic runtime hardening step, while that live certification remains the final behavioral confirmation once suitable local or remote model hardware is available.

@i386 i386 requested a review from ndizazzo May 31, 2026 05:29
@i386
Collaborator

i386 commented May 31, 2026

@michaelneale would be good for you to look at this one too

Collaborator

@i386 i386 left a comment

Approved but would like @michaelneale or @ndizazzo to double check as I have not done much with KV recently

@IvGolovach IvGolovach force-pushed the codex/kv-tool-loop-runtime-closure branch from 2c0b143 to dcbfecf Compare May 31, 2026 22:56
Collaborator

@ndizazzo ndizazzo left a comment

@IvGolovach Oh hmm.. I think there’s a small hole here: on a one-chunk PrefillFinalEmbd, we may not have a runtime session yet when this eviction runs.

Since the batch size check needs an active session, that can turn into session … is not active, get logged, and then we carry on without actually evicting anything. So the new guard may not kick in for a valid final-prefill.

Might be worth either moving this until after the session is created for PrefillFinalEmbd, or getting the batch target from somewhere that doesn’t require an active session. I’d also add a regression for the one-chunk final-prefill case so this doesn’t quietly slip back in.

Other than that, this could use a rebase to pick up CI runner changes because I removed Blacksmith... sorry!
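
The ordering concern in the review comment above can be sketched in isolation. This is a toy model, not the server code: `Runtime`, `SessionError`, `batch_target`, and both driver functions are hypothetical, and the only behavior taken from the thread is that the batch-size lookup needs an active session and that the resulting error was logged rather than propagated.

```rust
/// Error returned when a lookup needs a session that does not exist yet.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionError {
    Inactive,
}

/// Toy stand-in for the native runtime handle (hypothetical).
#[derive(Debug)]
pub struct Runtime {
    session_active: bool,
    batch_size: usize,
}

impl Runtime {
    pub fn new(batch_size: usize) -> Self {
        Runtime { session_active: false, batch_size }
    }

    pub fn activate_session(&mut self) {
        self.session_active = true;
    }

    /// The batch target is only known once a session is active.
    pub fn batch_target(&self) -> Result<usize, SessionError> {
        if self.session_active {
            Ok(self.batch_size)
        } else {
            Err(SessionError::Inactive)
        }
    }
}

/// Order flagged in review: the lookup runs before the session exists,
/// the error is swallowed (logged in the real path), and no eviction
/// target is produced for a one-chunk final prefill.
pub fn evict_then_activate(rt: &mut Runtime) -> Option<usize> {
    let target = rt.batch_target().ok();
    rt.activate_session();
    target
}

/// Suggested order: activate the session first so the guard has a target.
pub fn activate_then_evict(rt: &mut Runtime) -> Option<usize> {
    rt.activate_session();
    rt.batch_target().ok()
}
```

A regression for the one-chunk case would start from a fresh runtime with no session and assert that an eviction target is still produced.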

IvGolovach added a commit that referenced this pull request Jun 3, 2026
Validation
* Validation tier: Tier 2R - post-review correction for PR #758; branch rebased onto current main and one-chunk PrefillFinalEmbd now activates its runtime session before resident-prefix proactive eviction, preserving mesh protocol, schema, Skippy ABI, and release metadata.
* git fetch --no-tags origin main:refs/remotes/origin/main codex/kv-tool-loop-runtime-closure:refs/remotes/origin/codex/kv-tool-loop-runtime-closure: PASS, origin/main at ee67364.
* git rebase origin/main: PASS, no conflicts.
* git diff --check: PASS, no output.
* git diff --cached --check: PASS, no output.
* cargo fmt --all -- --check: PASS.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo test -p skippy-server binary_transport --lib -- --test-threads=1: PASS, 7 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo test -p skippy-server proactive_eviction --lib -- --test-threads=1: PASS, 2 passed.
* cargo test -p skippy-cache evict_lru_until_tokens --lib -- --test-threads=1: PASS, 2 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo test -p skippy-server --lib -- --test-threads=1: PASS, 108 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo check -p mesh-llm: PASS.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal /opt/homebrew/bin/cargo-clippy clippy -p skippy-server --all-targets -- -D warnings: PASS.
* Ledger: not applicable - not required for selected validation tier/change family.
* Version: not applicable - no release/version sync required for this non-release Skippy runtime correction.
* Not run: scripts/prepare-llama.sh pinned - not required for this post-review Rust-only server correction; no llama patch queue changed.
* Not run: scripts/build-llama.sh - not required for this post-review Rust-only server correction; no native patch queue or ABI source changed.
* Not run: python3 -m unittest scripts.tests.test_qa_kv_tool_loop_stability - not required for this narrow post-review lifecycle correction; existing deterministic binary transport, server, and cache tests cover the changed branch and mandatory PR CI is the final full proof.
* Not run: live overlapping Goose/Pi tool-loop smoke - no local direct-model Skippy endpoint was available; targeted binary transport, resident-prefix eviction, cache, shipped-binary check, and clippy cover the changed branches.
* Not run: just build - not required for selected validation tier; Rust-only server path changed and cargo check -p mesh-llm covers shipped binary compilation.

Rollback
* git revert HEAD
@IvGolovach IvGolovach force-pushed the codex/kv-tool-loop-runtime-closure branch from dcbfecf to 64118a7 Compare June 3, 2026 01:21
@michaelneale
Collaborator

conceptually I think ok - but really does need a live test somehow, should be able to fire it up even with a small model and slam it with goose or pi.
Only concern I have is if it goes too hard and cache ends up not really doing anything.

that linked issue - I thought that was solved a few weeks ago, so how confident are we in the test coverage that this is fixing what it thinks it is?

IvGolovach added a commit that referenced this pull request Jun 3, 2026
Validation
* Validation tier: Tier 2R - post-review PR #758 correction; local OpenAI generation now reserves active KV token budget before local prefill/decode so overlapping long prompts queue instead of exhausting the unified native KV pool.
* git fetch --no-tags origin main:refs/remotes/origin/main codex/kv-tool-loop-runtime-closure:refs/remotes/origin/codex/kv-tool-loop-runtime-closure: PASS, origin/main at 2d4be1b.
* git rebase origin/main: PASS, no conflicts.
* git diff --check origin/main...HEAD: PASS, no output.
* git diff --check: PASS, no output.
* git diff --cached --check: PASS, no output.
* cargo fmt --all -- --check: PASS.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/.config/superpowers/worktrees/mesh-llm/kv-tool-loop-runtime-closure/.deps/llama-build/build-stage-abi-metal cargo test -p skippy-server generation_token_budget --lib -- --test-threads=1: PASS, 4 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/.config/superpowers/worktrees/mesh-llm/kv-tool-loop-runtime-closure/.deps/llama-build/build-stage-abi-metal cargo test -p skippy-server --lib -- --test-threads=1: PASS, 112 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/.config/superpowers/worktrees/mesh-llm/kv-tool-loop-runtime-closure/.deps/llama-build/build-stage-abi-metal cargo check -p mesh-llm: PASS.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/.config/superpowers/worktrees/mesh-llm/kv-tool-loop-runtime-closure/.deps/llama-build/build-stage-abi-metal /opt/homebrew/bin/cargo-clippy clippy -p skippy-server --all-targets -- -D warnings: PASS.
* just release-build: PASS.
* MESH_LLM_NATIVE_RUNTIME_CACHE_DIR=/Users/Funtland/.config/superpowers/worktrees/mesh-llm/kv-tool-loop-runtime-closure/target/live-proof/pr758-final-native-runtime-cache scripts/ci-install-native-runtime.sh ./target/release/mesh-llm target/live-proof/pr758-final-native-runtime metal: PASS, installed meshllm-native-runtime-darwin-aarch64-metal.
* Live ctx=8192 resident-KV SmolLM2 harness: model-limited overall exit 1, but native_log_scan PASS with no fatal KV log patterns and no overlap 502/llama_decode failure reproduced on the final dynamic-runtime binary.
* Live ctx=32768 resident-KV SmolLM2 harness: model-limited overall exit 1, exact_prefix_cache PASS with prompt_tokens=4506 cached_tokens=4505 and native_log_scan PASS with no fatal KV log patterns on the final dynamic-runtime binary.
* Ledger: not applicable - not required for selected validation tier/change family.
* Version: not applicable - no release/version sync required for this post-review runtime admission correction.
* Not run: Goose/Pi live harness - goose and pi were not available in PATH; qa-kv-tool-loop-stability exercised the same OpenAI pressure shape with local release binary.

Rollback
* git revert HEAD
@IvGolovach IvGolovach force-pushed the codex/kv-tool-loop-runtime-closure branch from 64118a7 to 538af0c Compare June 3, 2026 04:07
@IvGolovach
Collaborator Author

Thanks, this was the right thing to force with a live run.

I refreshed the branch on current main and added a second guard around the local OpenAI path: after tokenization, local/embedded-stage0 generation now reserves an active KV token budget before prefill/decode. That means overlapping long prompts queue against the context-sized KV footprint instead of all entering the native runtime and competing for the same unified KV cells.

I also kept the one-chunk PrefillFinalEmbd regression in the test surface; the full skippy-server lib suite is green, including the one-chunk final-prefill case.

Live evidence on the final dynamic-native-runtime build:

  • ctx=8192 + resident KV + SmolLM2 + 3-way overlap pressure: native log scan is clean; no failed memory-slot / llama_decode failure reproduced. The full harness is still red because SmolLM2 does not reliably emit tool calls and the tiny ctx evicts cache to preserve decode room.
  • ctx=32768 + the same setup: exact-prefix cache is still alive, with prompt_tokens=4506 and cached_tokens=4505. So the fix is not just disabling cache; it preserves cache when the context has room and protects decode when it does not.

Goose/Pi were not available locally, so I used the repo qa-kv-tool-loop-stability harness against the release binary. That gives the same OpenAI overlap pressure shape, but I would still welcome a real Goose/Pi pass from a machine with a stronger tool-capable model.
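
For readers following the thread, the active KV token budget described in this comment can be approximated with a small reservation type. This is a sketch under assumptions: `KvTokenBudget`, `KvReservation`, and `try_reserve` are hypothetical names, and the non-blocking `try_reserve` stands in for whatever queueing the server actually uses, which the thread does not show.

```rust
use std::sync::{Arc, Mutex};

/// Shared budget of KV tokens that active generations may hold at once.
#[derive(Debug)]
pub struct KvTokenBudget {
    capacity: usize,
    in_use: Mutex<usize>,
}

/// RAII reservation: the tokens return to the budget when this is dropped.
pub struct KvReservation {
    budget: Arc<KvTokenBudget>,
    tokens: usize,
}

impl KvTokenBudget {
    pub fn new(capacity: usize) -> Arc<Self> {
        Arc::new(KvTokenBudget { capacity, in_use: Mutex::new(0) })
    }

    /// Reserve `tokens` if they fit; otherwise return `None` so the caller
    /// can keep the request queued instead of entering prefill/decode.
    pub fn try_reserve(self: &Arc<Self>, tokens: usize) -> Option<KvReservation> {
        let mut in_use = self.in_use.lock().unwrap();
        // `in_use` never exceeds `capacity`, so this subtraction cannot underflow.
        if tokens > self.capacity - *in_use {
            return None;
        }
        *in_use += tokens;
        Some(KvReservation { budget: Arc::clone(self), tokens })
    }

    pub fn in_use(&self) -> usize {
        *self.in_use.lock().unwrap()
    }
}

impl Drop for KvReservation {
    fn drop(&mut self) {
        let mut in_use = self.budget.in_use.lock().unwrap();
        *in_use -= self.tokens;
    }
}
```

With a budget sized to the context (8192 in the smaller live run), two overlapping 5000-token prompts cannot both hold a reservation; the second has to wait until the first reservation is dropped.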

@IvGolovach IvGolovach force-pushed the codex/kv-tool-loop-runtime-closure branch 2 times, most recently from 13d544a to af55d05 Compare June 3, 2026 06:12
IvGolovach added a commit that referenced this pull request Jun 4, 2026
Validation
* Validation tier: Tier 4 — CI/action retry hardening plus Tier 3 runtime KV decode/admission correction after semantic rebase onto current main ccefb8a.
* git fetch --no-tags https://github.com/Mesh-LLM/mesh-llm.git main:refs/remotes/meshorigin/main codex/kv-tool-loop-runtime-closure:refs/remotes/meshorigin/codex/kv-tool-loop-runtime-closure: PASS, PR head verified at af55d05 before rebase.
* git rebase refs/remotes/meshorigin/main: PASS, scripts/ci-hf-download-smoke.sh conflict resolved by preserving main's rate-limit skip wrapper and #758 retry defaults.
* git diff --check refs/remotes/meshorigin/main...HEAD: PASS
* git diff --check: PASS
* git diff --cached --check: PASS
* bash -n scripts/ci-hf-download-smoke.sh: PASS
* cargo fmt --all -- --check: PASS
* cargo test -p model-hf retry_config --lib -- --test-threads=1: PASS
* cargo test -p model-hf --lib -- --test-threads=1: PASS, 34 passed.
* cargo test -p skippy-server binary_transport::tests:: --lib -- --test-threads=1: PASS, 8 passed.
* cargo test -p skippy-server frontend::tests::proactive_eviction_attrs_are_bounded_and_request_free --lib -- --test-threads=1: PASS
* cargo test -p skippy-server frontend::admission::tests:: --lib -- --test-threads=1: PASS, 4 passed.
* cargo test -p skippy-server --lib -- --test-threads=1: PASS, 117 passed.
* cargo check -p mesh-llm: PASS
* cargo run -p xtask -- repo-consistency ci-crate-lists: PASS
* ruby -e 'require "yaml"; ... YAML.load_file(...)': PASS for restore-smoke-inputs/action.yml, hf-download-smoke.yml, pr_builds.yml, scripted-binary-smoke.yml, smoke.yml, sdk-smoke.yml.
* Ledger: not applicable — not required for selected validation tier/change family.
* Version: not applicable — no release/deploy/versioned artifact update required for this PR correction.
* Not run: live HF download smoke and two-node binary smoke locally — network/model-heavy remote CI gates validate the final pushed SHA.
* Not run: Goose/Pi live harness — not available locally; existing qa-kv-tool-loop evidence remains prior proof, and final remote CI plus targeted Skippy tests cover this rebase.

Rollback
* git revert HEAD
@IvGolovach IvGolovach force-pushed the codex/kv-tool-loop-runtime-closure branch from af55d05 to 926382b Compare June 4, 2026 21:44
* Not run: live overlapping Goose/Pi tool-loop smoke - no local direct-model Skippy endpoint was available; targeted binary transport, resident-prefix eviction, cache, shipped-binary check, and clippy cover the changed branches.
* Not run: just build - not required for selected validation tier; Rust-only server path changed and cargo check -p mesh-llm covers shipped binary compilation.

Rollback
* git revert HEAD
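
For readers unfamiliar with the path that the `evict_lru_until_tokens` and `proactive_eviction` test filters above exercise, the following is a minimal Rust sketch of bounded LRU eviction toward a token target. It is an illustration under assumptions, not the skippy-cache implementation: `ResidentPrefixCache`, `PrefixRecord`, the `max_evictions` bound, and the method signature are hypothetical names chosen for this example.

```rust
use std::collections::VecDeque;

/// Hypothetical resident-prefix record: an id plus the KV tokens it occupies.
#[derive(Debug, Clone, PartialEq)]
pub struct PrefixRecord {
    pub id: u64,
    pub tokens: usize,
}

/// Hypothetical cache. The front of the deque is the least recently used record.
#[derive(Debug, Default)]
pub struct ResidentPrefixCache {
    records: VecDeque<PrefixRecord>,
    resident_tokens: usize,
}

impl ResidentPrefixCache {
    /// Add a record as the most recently used entry.
    pub fn insert(&mut self, id: u64, tokens: usize) {
        self.records.push_back(PrefixRecord { id, tokens });
        self.resident_tokens += tokens;
    }

    /// Mark a record as recently used. Returns false if the id is unknown.
    pub fn touch(&mut self, id: u64) -> bool {
        let Some(pos) = self.records.iter().position(|r| r.id == id) else {
            return false;
        };
        if let Some(rec) = self.records.remove(pos) {
            self.records.push_back(rec);
        }
        true
    }

    /// Total KV tokens currently held by resident records.
    pub fn resident_tokens(&self) -> usize {
        self.resident_tokens
    }

    /// Evict least recently used records until at most `target_tokens`
    /// remain resident, or until `max_evictions` records have been dropped.
    /// The bound keeps one admission check from draining the whole cache.
    pub fn evict_lru_until_tokens(
        &mut self,
        target_tokens: usize,
        max_evictions: usize,
    ) -> Vec<u64> {
        let mut evicted = Vec::new();
        while self.resident_tokens > target_tokens && evicted.len() < max_evictions {
            match self.records.pop_front() {
                Some(rec) => {
                    self.resident_tokens -= rec.tokens;
                    evicted.push(rec.id);
                }
                None => break,
            }
        }
        evicted
    }
}
```

In this sketch a caller would run `evict_lru_until_tokens` before handing decode work to the native runtime, so that stale prefixes are released first and the native KV admission check sees the freed space.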
Validation
* Validation tier: Tier 2R - post-review PR #758 correction; local OpenAI generation now reserves active KV token budget before local prefill/decode so overlapping long prompts queue instead of exhausting the unified native KV pool.
* git fetch --no-tags origin main:refs/remotes/origin/main codex/kv-tool-loop-runtime-closure:refs/remotes/origin/codex/kv-tool-loop-runtime-closure: PASS, origin/main at 2d4be1b.
* git rebase origin/main: PASS, no conflicts.
* git diff --check origin/main...HEAD: PASS, no output.
* git diff --check: PASS, no output.
* git diff --cached --check: PASS, no output.
* cargo fmt --all -- --check: PASS.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/.config/superpowers/worktrees/mesh-llm/kv-tool-loop-runtime-closure/.deps/llama-build/build-stage-abi-metal cargo test -p skippy-server generation_token_budget --lib -- --test-threads=1: PASS, 4 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/.config/superpowers/worktrees/mesh-llm/kv-tool-loop-runtime-closure/.deps/llama-build/build-stage-abi-metal cargo test -p skippy-server --lib -- --test-threads=1: PASS, 112 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/.config/superpowers/worktrees/mesh-llm/kv-tool-loop-runtime-closure/.deps/llama-build/build-stage-abi-metal cargo check -p mesh-llm: PASS.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/.config/superpowers/worktrees/mesh-llm/kv-tool-loop-runtime-closure/.deps/llama-build/build-stage-abi-metal /opt/homebrew/bin/cargo-clippy clippy -p skippy-server --all-targets -- -D warnings: PASS.
* just release-build: PASS.
* MESH_LLM_NATIVE_RUNTIME_CACHE_DIR=/Users/Funtland/.config/superpowers/worktrees/mesh-llm/kv-tool-loop-runtime-closure/target/live-proof/pr758-final-native-runtime-cache scripts/ci-install-native-runtime.sh ./target/release/mesh-llm target/live-proof/pr758-final-native-runtime metal: PASS, installed meshllm-native-runtime-darwin-aarch64-metal.
* Live ctx=8192 resident-KV SmolLM2 harness: overall exit 1 (model-limited), but native_log_scan PASS with no fatal KV log patterns, and the overlap 502/llama_decode failure did not reproduce on the final dynamic-runtime binary.
* Live ctx=32768 resident-KV SmolLM2 harness: overall exit 1 (model-limited), but exact_prefix_cache PASS with prompt_tokens=4506 cached_tokens=4505, and native_log_scan PASS with no fatal KV log patterns on the final dynamic-runtime binary.
* Ledger: not applicable - not required for selected validation tier/change family.
* Version: not applicable - no release/version sync required for this post-review runtime admission correction.
* Not run: Goose/Pi live harness - goose and pi were not available in PATH; qa-kv-tool-loop-stability exercised the same OpenAI pressure shape with local release binary.

Rollback
* git revert HEAD
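
The admission behavior described in this revision (reserve KV token budget before prefill/decode so overlapping long prompts queue rather than exhaust the pool) can be sketched as a blocking token budget with an RAII reservation. This is a simplified illustration under assumptions, not the skippy-server code: `KvTokenBudget`, `Reservation`, and every method here are hypothetical, and the sketch uses blocking `std::sync` primitives where the real server may well use an async mechanism.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical shared budget of KV tokens for active generations.
pub struct KvTokenBudget {
    capacity: usize,
    used: Mutex<usize>,
    freed: Condvar,
}

/// Hypothetical RAII guard: tokens return to the budget when it is dropped.
pub struct Reservation {
    budget: Arc<KvTokenBudget>,
    tokens: usize,
}

impl KvTokenBudget {
    pub fn new(capacity: usize) -> Arc<Self> {
        Arc::new(Self { capacity, used: Mutex::new(0), freed: Condvar::new() })
    }

    /// Tokens currently reserved by live generations.
    pub fn in_use(&self) -> usize {
        *self.used.lock().unwrap_or_else(|e| e.into_inner())
    }

    /// Reserve without waiting. Returns None if the tokens do not fit now.
    pub fn try_reserve(self: &Arc<Self>, tokens: usize) -> Option<Reservation> {
        let mut used = self.used.lock().unwrap_or_else(|e| e.into_inner());
        if tokens > self.capacity.saturating_sub(*used) {
            return None;
        }
        *used += tokens;
        Some(Reservation { budget: Arc::clone(self), tokens })
    }

    /// Reserve, queueing behind earlier work until enough tokens are free.
    /// Returns None for a request that could never fit in the budget.
    pub fn reserve(self: &Arc<Self>, tokens: usize) -> Option<Reservation> {
        if tokens > self.capacity {
            return None;
        }
        let mut used = self.used.lock().unwrap_or_else(|e| e.into_inner());
        while *used + tokens > self.capacity {
            used = self.freed.wait(used).unwrap_or_else(|e| e.into_inner());
        }
        *used += tokens;
        Some(Reservation { budget: Arc::clone(self), tokens })
    }
}

impl Reservation {
    pub fn tokens(&self) -> usize {
        self.tokens
    }
}

impl Drop for Reservation {
    fn drop(&mut self) {
        // Return the tokens and wake any generation waiting for space.
        let mut used = self.budget.used.lock().unwrap_or_else(|e| e.into_inner());
        *used -= self.tokens;
        self.budget.freed.notify_all();
    }
}
```

With a budget of 8192 tokens, a 6000-token prompt that holds its reservation makes a concurrent 4000-token prompt wait in `reserve` until the first guard is dropped, which is the queueing shape this revision describes. Rejecting oversized requests up front avoids a wait that could never complete.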
Validation
* Validation tier: Tier 3 - Skippy KV admission/eviction runtime behavior plus HF download smoke retry hardening; touched binary transport, embedded OpenAI frontend, runtime state/KV integration, model-hf, and CI smoke script paths.
* git fetch --no-tags origin main:refs/remotes/origin/main: PASS, origin/main at 95101ce.
* git diff --check: PASS, no output.
* git diff origin/main...HEAD --check: PASS, no output.
* git diff --cached --check: PASS, no output.
* cargo fmt --all: PASS.
* cargo fmt --all -- --check: PASS.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo test -p skippy-server binary_proactive_eviction --lib -- --test-threads=1: PASS, 1 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo test -p skippy-server one_chunk_prefill_final_admits_session_before_proactive_eviction --lib -- --test-threads=1: PASS, 1 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo test -p skippy-server restore_prefill_decode --lib -- --test-threads=1: PASS, 2 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo test -p skippy-server generation_admission --lib -- --test-threads=1: PASS, 4 passed.
* cargo test -p model-hf retry_config --lib -- --test-threads=1: PASS, 2 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo test -p skippy-server --lib -- --test-threads=1: PASS, 122 passed.
* cargo test -p model-hf --lib -- --test-threads=1: PASS, 34 passed.
* bash -n scripts/ci-hf-download-smoke.sh: PASS.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo check -p mesh-llm: PASS.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal /opt/homebrew/bin/cargo-clippy clippy -p skippy-server --lib -- -D warnings: PASS.
* /opt/homebrew/bin/cargo-clippy clippy -p model-hf --lib -- -D warnings: PASS.
* Ledger: not applicable - not required for selected validation tier/change family.
* Version: not applicable - no release/version sync required for this non-release runtime/CI-smoke hardening change.
* Not run: live KV/tool-loop certification - no local running OpenAI-compatible Skippy endpoint/model was available in this worktree; unit coverage, shipped-binary check, clippy, and required remote CI are the merge gates for this rebase repair.

Rollback
* git revert HEAD
@IvGolovach IvGolovach force-pushed the codex/kv-tool-loop-runtime-closure branch from 926382b to 9a54dd8 Compare June 5, 2026 16:53
@ndizazzo ndizazzo self-requested a review June 6, 2026 04:19
@ndizazzo ndizazzo merged commit 365b7b2 into main Jun 6, 2026
28 checks passed
@ndizazzo ndizazzo deleted the codex/kv-tool-loop-runtime-closure branch June 6, 2026 04:31

4 participants