Protect Skippy stage decode from KV pressure #758
@michaelneale, it would be good for you to look at this one too.
i386 left a comment:

Approved, but I would like @michaelneale or @ndizazzo to double-check, as I have not done much with KV recently.
Branch updated from 2c0b143 to dcbfecf.
ndizazzo left a comment:
@IvGolovach Oh, hmm. I think there’s a small hole here: on a one-chunk PrefillFinalEmbd, we may not have a runtime session yet when this eviction runs.
Since the batch-size check needs an active session, that can turn into a "session … is not active" error, which gets logged, and then we carry on without actually evicting anything. So the new guard may not kick in for a valid final prefill.
Might be worth either moving this until after the session is created for PrefillFinalEmbd, or getting the batch target from somewhere that doesn’t require an active session. I’d also add a regression for the one-chunk final-prefill case so this doesn’t quietly slip back in.
Other than that, this could use a rebase to pick up CI runner changes because I removed Blacksmith... sorry!
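The ordering hole described above can be illustrated with a toy model. This is a minimal sketch, assuming only what the review states: the batch-size lookup fails for an inactive session, and proactive eviction logs that failure and carries on. `StageRuntime`, `batch_target`, `proactive_evict`, and both `prefill_final_*` methods are hypothetical names, not the skippy-server API.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a Skippy stage runtime (not the real API).
/// It models one rule: the batch-size lookup requires an active session.
pub struct StageRuntime {
    active_sessions: HashSet<u64>,
    batch_size: usize,
}

impl StageRuntime {
    pub fn new(batch_size: usize) -> Self {
        Self { active_sessions: HashSet::new(), batch_size }
    }

    pub fn activate_session(&mut self, session: u64) {
        self.active_sessions.insert(session);
    }

    /// Fails for inactive sessions, like the check described in the review.
    pub fn batch_target(&self, session: u64) -> Result<usize, String> {
        if self.active_sessions.contains(&session) {
            Ok(self.batch_size)
        } else {
            Err(format!("session {session} is not active"))
        }
    }

    /// Proactive eviction that logs a failed lookup and carries on.
    /// Returns the token target it evicted toward; `None` means nothing ran.
    pub fn proactive_evict(&self, session: u64) -> Option<usize> {
        match self.batch_target(session) {
            Ok(target) => Some(target),
            Err(err) => {
                eprintln!("proactive eviction skipped: {err}");
                None
            }
        }
    }

    /// Ordering flagged in the review: evict before the session exists.
    pub fn prefill_final_evict_first(&mut self, session: u64) -> Option<usize> {
        let evicted = self.proactive_evict(session);
        self.activate_session(session);
        evicted
    }

    /// Suggested ordering: activate the session first, then evict.
    pub fn prefill_final_activate_first(&mut self, session: u64) -> Option<usize> {
        self.activate_session(session);
        self.proactive_evict(session)
    }
}
```

With the evict-first ordering, a one-chunk final prefill on a fresh session silently skips eviction; activating first lets the guard run. A regression test for the real code would assert the same thing against the actual session lifecycle.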
Validation
* Validation tier: Tier 2R - post-review correction for PR #758; branch rebased onto current main and one-chunk PrefillFinalEmbd now activates its runtime session before resident-prefix proactive eviction, preserving mesh protocol, schema, Skippy ABI, and release metadata.
* git fetch --no-tags origin main:refs/remotes/origin/main codex/kv-tool-loop-runtime-closure:refs/remotes/origin/codex/kv-tool-loop-runtime-closure: PASS, origin/main at ee67364.
* git rebase origin/main: PASS, no conflicts.
* git diff --check: PASS, no output.
* git diff --cached --check: PASS, no output.
* cargo fmt --all -- --check: PASS.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo test -p skippy-server binary_transport --lib -- --test-threads=1: PASS, 7 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo test -p skippy-server proactive_eviction --lib -- --test-threads=1: PASS, 2 passed.
* cargo test -p skippy-cache evict_lru_until_tokens --lib -- --test-threads=1: PASS, 2 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo test -p skippy-server --lib -- --test-threads=1: PASS, 108 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo check -p mesh-llm: PASS.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal /opt/homebrew/bin/cargo-clippy clippy -p skippy-server --all-targets -- -D warnings: PASS.
* Ledger: not applicable - not required for selected validation tier/change family.
* Version: not applicable - no release/version sync required for this non-release Skippy runtime correction.
* Not run: scripts/prepare-llama.sh pinned - not required for this post-review Rust-only server correction; no llama patch queue changed.
* Not run: scripts/build-llama.sh - not required for this post-review Rust-only server correction; no native patch queue or ABI source changed.
* Not run: python3 -m unittest scripts.tests.test_qa_kv_tool_loop_stability - not required for this narrow post-review lifecycle correction; existing deterministic binary transport, server, and cache tests cover the changed branch and mandatory PR CI is the final full proof.
* Not run: live overlapping Goose/Pi tool-loop smoke - no local direct-model Skippy endpoint was available; targeted binary transport, resident-prefix eviction, cache, shipped-binary check, and clippy cover the changed branches.
* Not run: just build - not required for selected validation tier; Rust-only server path changed and cargo check -p mesh-llm covers shipped binary compilation.

Rollback
* git revert HEAD
Branch updated from dcbfecf to 64118a7.
Conceptually I think this is OK, but it really does need a live test somehow. We should be able to fire it up even with a small model and slam it with Goose or Pi. As for the linked issue, I thought that was solved a few weeks ago, so how confident are we that the test coverage shows this is fixing what it thinks it is?
Validation
* Validation tier: Tier 2R - post-review PR #758 correction; local OpenAI generation now reserves active KV token budget before local prefill/decode so overlapping long prompts queue instead of exhausting the unified native KV pool.
* git fetch --no-tags origin main:refs/remotes/origin/main codex/kv-tool-loop-runtime-closure:refs/remotes/origin/codex/kv-tool-loop-runtime-closure: PASS, origin/main at 2d4be1b.
* git rebase origin/main: PASS, no conflicts.
* git diff --check origin/main...HEAD: PASS, no output.
* git diff --check: PASS, no output.
* git diff --cached --check: PASS, no output.
* cargo fmt --all -- --check: PASS.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/.config/superpowers/worktrees/mesh-llm/kv-tool-loop-runtime-closure/.deps/llama-build/build-stage-abi-metal cargo test -p skippy-server generation_token_budget --lib -- --test-threads=1: PASS, 4 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/.config/superpowers/worktrees/mesh-llm/kv-tool-loop-runtime-closure/.deps/llama-build/build-stage-abi-metal cargo test -p skippy-server --lib -- --test-threads=1: PASS, 112 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/.config/superpowers/worktrees/mesh-llm/kv-tool-loop-runtime-closure/.deps/llama-build/build-stage-abi-metal cargo check -p mesh-llm: PASS.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/.config/superpowers/worktrees/mesh-llm/kv-tool-loop-runtime-closure/.deps/llama-build/build-stage-abi-metal /opt/homebrew/bin/cargo-clippy clippy -p skippy-server --all-targets -- -D warnings: PASS.
* just release-build: PASS.
* MESH_LLM_NATIVE_RUNTIME_CACHE_DIR=/Users/Funtland/.config/superpowers/worktrees/mesh-llm/kv-tool-loop-runtime-closure/target/live-proof/pr758-final-native-runtime-cache scripts/ci-install-native-runtime.sh ./target/release/mesh-llm target/live-proof/pr758-final-native-runtime metal: PASS, installed meshllm-native-runtime-darwin-aarch64-metal.
* Live ctx=8192 resident-KV SmolLM2 harness: model-limited overall exit 1, but native_log_scan PASS with no fatal KV log patterns and no overlap 502/llama_decode failure reproduced on the final dynamic-runtime binary.
* Live ctx=32768 resident-KV SmolLM2 harness: model-limited overall exit 1, exact_prefix_cache PASS with prompt_tokens=4506 cached_tokens=4505 and native_log_scan PASS with no fatal KV log patterns on the final dynamic-runtime binary.
* Ledger: not applicable - not required for selected validation tier/change family.
* Version: not applicable - no release/version sync required for this post-review runtime admission correction.
* Not run: Goose/Pi live harness - goose and pi were not available in PATH; qa-kv-tool-loop-stability exercised the same OpenAI pressure shape with local release binary.

Rollback
* git revert HEAD
Branch updated from 64118a7 to 538af0c.
Thanks, this was the right thing to force with a live run. I refreshed the branch on current main and added a second guard around the local OpenAI path: after tokenization, local/embedded-stage0 generation now reserves an active KV token budget before prefill/decode. That means overlapping long prompts queue against the context-sized KV footprint instead of all entering the native runtime and competing for the same unified KV cells. I also kept the one-chunk PrefillFinalEmbd regression in the test surface; the full skippy-server lib suite is green, including the one-chunk final-prefill case. Live evidence on the final dynamic-native-runtime build:
Goose/Pi were not available locally, so I used the repo's qa-kv-tool-loop-stability harness against the release binary. That gives the same OpenAI overlap pressure shape, but I would still welcome a real Goose/Pi pass from a machine with a stronger tool-capable model.
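The token-budget admission described in this comment (reserve KV tokens after tokenization, queue overlapping requests until the budget fits) can be sketched with a mutex and condition variable. This is a minimal sketch under stated assumptions: `KvTokenBudget`, `Reservation`, clamping oversized requests to the capacity, and passing a caller-computed token estimate are all hypothetical choices, not the skippy-server implementation or its `generation_token_budget` code.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical admission gate: the sum of reserved KV tokens across
/// in-flight generations never exceeds `capacity` (e.g. the context size).
pub struct KvTokenBudget {
    capacity: usize,
    in_use: Mutex<usize>,
    freed: Condvar,
}

/// RAII reservation: dropping it returns the tokens and wakes queued requests.
pub struct Reservation<'a> {
    budget: &'a KvTokenBudget,
    tokens: usize,
}

impl KvTokenBudget {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, in_use: Mutex::new(0), freed: Condvar::new() }
    }

    /// Blocks (queues) until `tokens` fit. Requests larger than the capacity
    /// are clamped so one oversized prompt cannot wait forever (assumption).
    pub fn reserve(&self, tokens: usize) -> Reservation<'_> {
        let tokens = tokens.min(self.capacity);
        let mut in_use = self.in_use.lock().unwrap();
        while *in_use + tokens > self.capacity {
            in_use = self.freed.wait(in_use).unwrap();
        }
        *in_use += tokens;
        Reservation { budget: self, tokens }
    }

    /// Non-blocking variant: `None` if the request would have to queue.
    pub fn try_reserve(&self, tokens: usize) -> Option<Reservation<'_>> {
        let tokens = tokens.min(self.capacity);
        let mut in_use = self.in_use.lock().unwrap();
        if *in_use + tokens > self.capacity {
            return None;
        }
        *in_use += tokens;
        Some(Reservation { budget: self, tokens })
    }

    pub fn in_use(&self) -> usize {
        *self.in_use.lock().unwrap()
    }
}

impl Drop for Reservation<'_> {
    fn drop(&mut self) {
        let mut in_use = self.budget.in_use.lock().unwrap();
        *in_use -= self.tokens;
        self.budget.freed.notify_all();
    }
}
```

Holding the reservation for the whole prefill/decode means a second long prompt waits in `reserve` instead of entering the native runtime and competing for the same unified KV cells; the condition is re-checked under the lock, so a release cannot be missed.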
Branch updated from 13d544a to af55d05.
Validation
* Validation tier: Tier 4 - CI/action retry hardening plus Tier 3 runtime KV decode/admission correction after semantic rebase onto current main ccefb8a.
* git fetch --no-tags https://github.com/Mesh-LLM/mesh-llm.git main:refs/remotes/meshorigin/main codex/kv-tool-loop-runtime-closure:refs/remotes/meshorigin/codex/kv-tool-loop-runtime-closure: PASS, PR head verified at af55d05 before rebase.
* git rebase refs/remotes/meshorigin/main: PASS, scripts/ci-hf-download-smoke.sh conflict resolved by preserving main's rate-limit skip wrapper and #758 retry defaults.
* git diff --check refs/remotes/meshorigin/main...HEAD: PASS.
* git diff --check: PASS.
* git diff --cached --check: PASS.
* bash -n scripts/ci-hf-download-smoke.sh: PASS.
* cargo fmt --all -- --check: PASS.
* cargo test -p model-hf retry_config --lib -- --test-threads=1: PASS.
* cargo test -p model-hf --lib -- --test-threads=1: PASS, 34 passed.
* cargo test -p skippy-server binary_transport::tests:: --lib -- --test-threads=1: PASS, 8 passed.
* cargo test -p skippy-server frontend::tests::proactive_eviction_attrs_are_bounded_and_request_free --lib -- --test-threads=1: PASS.
* cargo test -p skippy-server frontend::admission::tests:: --lib -- --test-threads=1: PASS, 4 passed.
* cargo test -p skippy-server --lib -- --test-threads=1: PASS, 117 passed.
* cargo check -p mesh-llm: PASS.
* cargo run -p xtask -- repo-consistency ci-crate-lists: PASS.
* ruby -e 'require "yaml"; ... YAML.load_file(...)': PASS for restore-smoke-inputs/action.yml, hf-download-smoke.yml, pr_builds.yml, scripted-binary-smoke.yml, smoke.yml, sdk-smoke.yml.
* Ledger: not applicable - not required for selected validation tier/change family.
* Version: not applicable - no release/deploy/versioned artifact update required for this PR correction.
* Not run: live HF download smoke and two-node binary smoke locally - network/model-heavy remote CI gates validate the final pushed SHA.
* Not run: Goose/Pi live harness - not available locally; existing qa-kv-tool-loop evidence remains prior proof, and final remote CI plus targeted Skippy tests cover this rebase.

Rollback
* git revert HEAD
Branch updated from af55d05 to 926382b.
Validation
* Validation tier: Tier 3 - Skippy KV admission/eviction runtime behavior plus HF download smoke retry hardening; touched binary transport, embedded OpenAI frontend, runtime state/KV integration, model-hf, and CI smoke script paths.
* git fetch --no-tags origin main:refs/remotes/origin/main: PASS, origin/main at 95101ce.
* git diff --check: PASS, no output.
* git diff origin/main...HEAD --check: PASS, no output.
* git diff --cached --check: PASS, no output.
* cargo fmt --all: PASS.
* cargo fmt --all -- --check: PASS.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo test -p skippy-server binary_proactive_eviction --lib -- --test-threads=1: PASS, 1 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo test -p skippy-server one_chunk_prefill_final_admits_session_before_proactive_eviction --lib -- --test-threads=1: PASS, 1 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo test -p skippy-server restore_prefill_decode --lib -- --test-threads=1: PASS, 2 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo test -p skippy-server generation_admission --lib -- --test-threads=1: PASS, 4 passed.
* cargo test -p model-hf retry_config --lib -- --test-threads=1: PASS, 2 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo test -p skippy-server --lib -- --test-threads=1: PASS, 122 passed.
* cargo test -p model-hf --lib -- --test-threads=1: PASS, 34 passed.
* bash -n scripts/ci-hf-download-smoke.sh: PASS.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo check -p mesh-llm: PASS.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal /opt/homebrew/bin/cargo-clippy clippy -p skippy-server --lib -- -D warnings: PASS.
* /opt/homebrew/bin/cargo-clippy clippy -p model-hf --lib -- -D warnings: PASS.
* Ledger: not applicable - not required for selected validation tier/change family.
* Version: not applicable - no release/version sync required for this non-release runtime/CI-smoke hardening change.
* Not run: live KV/tool-loop certification - no local running OpenAI-compatible Skippy endpoint/model was available in this worktree; unit coverage, shipped-binary check, clippy, and required remote CI are the merge gates for this rebase repair.

Rollback
* git revert HEAD
Force-pushed 926382b to 9a54dd8.
Summary
This protects Skippy stage decode/replay from resident-prefix KV pressure before the binary transport asks the native runtime to decode.
When a stage is about to decode with a resident prefix already occupying KV space, the server now computes a bounded resident-prefix eviction plan first. That gives the Rust-side cache a chance to free stale prefix records before decode work hits native KV admission, while keeping the newly landed native KV compaction from #764 as the lower-level fragmentation recovery path.
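A minimal sketch of what a bounded, least-recently-used eviction plan can look like follows, assuming each resident prefix reports its KV token count and a monotonic last-used tick. ResidentPrefix, EvictionPlan, plan_eviction, and the max_evictions bound are hypothetical names for illustration only; they are not the actual binary_transport::kv_eviction or skippy-cache API.

```rust
/// Hypothetical view of one resident prefix occupying stage KV space.
#[derive(Debug, Clone)]
struct ResidentPrefix {
    id: u64,
    tokens: usize,
    last_used: u64, // monotonic tick; a smaller value means older
}

/// Hypothetical result of planning evictions before a decode.
#[derive(Debug, PartialEq)]
struct EvictionPlan {
    evict: Vec<u64>,     // prefix ids to drop, oldest first
    freed_tokens: usize, // KV tokens released if the plan is applied
    satisfied: bool,     // true if the incoming batch now fits
}

/// Plans LRU evictions so that `incoming` tokens fit within `capacity`,
/// dropping at most `max_evictions` prefixes (the bound).
fn plan_eviction(
    resident: &[ResidentPrefix],
    capacity: usize,
    incoming: usize,
    max_evictions: usize,
) -> EvictionPlan {
    let used: usize = resident.iter().map(|p| p.tokens).sum();
    // Tokens that must be released before the batch can be admitted.
    let needed = (used + incoming).saturating_sub(capacity);

    // Oldest prefixes first (least recently used).
    let mut order: Vec<&ResidentPrefix> = resident.iter().collect();
    order.sort_by_key(|p| p.last_used);

    let mut evict = Vec::new();
    let mut freed_tokens = 0;
    for p in order {
        // Stop once enough space is free or the eviction bound is reached.
        if freed_tokens >= needed || evict.len() >= max_evictions {
            break;
        }
        evict.push(p.id);
        freed_tokens += p.tokens;
    }

    EvictionPlan { evict, freed_tokens, satisfied: freed_tokens >= needed }
}
```

With 900 resident tokens, a 1024-token capacity, and a 256-token batch, the sketch needs to free 132 tokens and drops only the oldest prefix. A false satisfied flag marks the case where the bounded plan cannot free enough space by itself; in this sketch that is where a caller would hand off to lower-level recovery instead of evicting without limit.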
Why
Issue #652 covers more than one allocator failure mode. #764 addresses native unified-KV fragmentation. This PR covers the adjacent Skippy server path where resident-prefix cache entries can still consume KV budget before binary stage decode/replay work starts.
The intent is to reduce avoidable llama_decode / slot-pressure failures under long Goose/Pi-style tool loops without changing protocol, schema, or the Skippy ABI.
Diff Scope
* binary_transport::kv_eviction to compute bounded resident-prefix eviction decisions before decode/replay work.
* kv_integration.
Branch Integrity
Base: main at f9bd75a97378be014f52c4e68c13e81d3399e65e (0 behind / 1 ahead)
Merge base: f9bd75a97378be014f52c4e68c13e81d3399e65e
Head: dcbfecfa14690fa6f250194cf239add52d2a022a (Evict binary stage KV before decode)
Diff Hygiene
Changed files:
* crates/skippy-server/src/binary_transport.rs
* crates/skippy-server/src/binary_transport/kv_eviction.rs
* crates/skippy-server/src/binary_transport/tests.rs
* crates/skippy-server/src/frontend.rs
* crates/skippy-server/src/frontend/util.rs
* crates/skippy-server/src/kv_integration/mod.rs
Proof:
* git diff --check origin/main...HEAD: PASS, no output.
* git diff --check: PASS, no output.
* git diff --cached --check: PASS, no output.
Validation
* Validation tier: Tier 3 - shared Skippy binary stage runtime path refreshed onto current main after PR #764; binary decode/replay now reserves resident-prefix KV capacity before decode work, complementing native KV compaction without protocol/schema/ABI changes.
* git fetch --no-tags origin main:refs/remotes/origin/main codex/kv-tool-loop-runtime-closure:refs/remotes/origin/codex/kv-tool-loop-runtime-closure: PASS, origin/main at f9bd75a9.
* git rebase origin/main: PASS, no conflicts.
* git diff --check origin/main...HEAD: PASS, no output.
* git diff --check: PASS, no output.
* git diff --cached --check: PASS, no output.
* scripts/prepare-llama.sh pinned: PASS, 82 patches applied; upstream 22cadc1944f4658214aee03abd08240358840a95, patched c9b0a02726a0608efa351bf648de9eef6909a565.
* scripts/build-llama.sh: PASS, patched CPU stage ABI libraries built.
* cargo fmt --all -- --check: PASS.
* LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo test -p skippy-server binary_transport --lib -- --test-threads=1: PASS, 6 passed.
* LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo test -p skippy-server proactive_eviction --lib -- --test-threads=1: PASS, 1 passed.
* cargo test -p skippy-cache evict_lru_until_tokens --lib -- --test-threads=1: PASS, 2 passed.
* LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo test -p skippy-server --lib -- --test-threads=1: PASS, 107 passed.
* LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo check -p mesh-llm: PASS.
* LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo clippy -p skippy-server --all-targets -- -D warnings: PASS.
* python3 -m unittest scripts.tests.test_qa_kv_tool_loop_stability: PASS, 18 passed.
* Required remote gates: PASS on refreshed head dcbfecfa14690fa6f250194cf239add52d2a022a; PR Builds and PR Quality Checks completed successfully.
* Ledger: not applicable - not required for selected validation tier/change family.
* Version: not applicable - no release/version sync required for this non-release Skippy runtime behavior change.
Not Run
* just build: not required for the selected validation tier; only the Rust server path changed, and the patched stage ABI build plus the shipped mesh-llm cargo check cover the changed runtime linkage.
Runtime Safety
Rollback Plan
Rollback: revert this PR.
DB downgrade: not applicable.
Data repair: not applicable.
Operational caveats: none known.
Known Residual Risks
The remaining issue-level proof is a live #652-style direct-model Goose/Pi overlap certification on a loaded Skippy endpoint. This PR is merge-ready as a deterministic runtime hardening step, while that live certification remains the final behavioral confirmation once suitable local or remote model hardware is available.