Expose bounded route diagnostics #804

Open: IvGolovach wants to merge 1 commit into Mesh-LLM:main from IvGolovach:codex/adaptive-routing-diagnostics-pr-ready

@IvGolovach
Collaborator

Summary

Expose bounded local route-decision diagnostics in /api/status and make gossiped throughput hints safer for routing decisions.

This PR adds a small operator-facing diagnostics window that explains recent model-routing choices: which targets were considered, which target was selected, and why other targets were rejected or deprioritized. It also prevents very low-sample gossiped throughput hints from reordering otherwise eligible targets.
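
The low-sample rule can be sketched roughly as follows. This is an illustration only, not the PR's code: `Candidate`, `trusted_hint`, `rank_by_trusted_throughput`, and the `MIN_HINT_SAMPLES` threshold are hypothetical names and values, and placing targets without a trusted hint after those with one is an assumed policy.

```rust
// Hypothetical sketch: names and the threshold are assumptions, not the PR's API.

/// Assumed minimum number of samples before a gossiped hint may affect ranking.
const MIN_HINT_SAMPLES: u32 = 5;

#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub target: String,
    /// Gossiped throughput hint in tokens per second, if any.
    pub throughput_hint: Option<f64>,
    /// Number of samples behind the hint.
    pub hint_samples: u32,
}

/// Returns the hint only when it is finite and backed by enough samples.
pub fn trusted_hint(c: &Candidate) -> Option<f64> {
    c.throughput_hint
        .filter(|h| h.is_finite() && c.hint_samples >= MIN_HINT_SAMPLES)
}

/// Orders candidates by trusted throughput, highest first.
/// Candidates without a trusted hint share one key, and the sort is stable,
/// so they keep their original relative order after the trusted ones.
pub fn rank_by_trusted_throughput(candidates: &mut [Candidate]) {
    candidates.sort_by(|a, b| {
        let ka = trusted_hint(a).unwrap_or(f64::NEG_INFINITY);
        let kb = trusted_hint(b).unwrap_or(f64::NEG_INFINITY);
        kb.total_cmp(&ka)
    });
}
```

With this sketch, a target that advertises a very high throughput from a single sample cannot jump ahead of targets whose hints rest on more samples.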

Why

When routing fails because of context fit, target health, affinity, or throughput ordering, the current status payload does not explain enough to debug the decision. This PR makes routing behavior easier to inspect without changing mesh protocol compatibility or exposing request contents.

Diff scope

  • Adds routing_metrics.recent_route_decisions to the local /api/status routing metrics payload.
  • Records bounded per-target route reason codes from existing route-model candidate data.
  • Includes context length, required token budget, throughput hint, sample count, selected target, and reason codes where available.
  • Redacts endpoint query strings and endpoint userinfo before storing diagnostics.
  • Ignores low-sample gossiped throughput hints for target ranking.
  • Keeps diagnostics local-only; no gossip, protobuf, or wire-format changes.
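
The endpoint redaction step listed above might look something like the sketch below. It uses only the standard library and hand-rolled string splitting; `redact_endpoint` is a hypothetical name, and the PR's actual implementation may parse URLs differently.

```rust
// Hypothetical sketch: `redact_endpoint` is not the PR's function name.

/// Strips the query string and any userinfo from an endpoint before it is
/// stored in diagnostics. Scheme, host, port, and path are kept.
pub fn redact_endpoint(endpoint: &str) -> String {
    // Drop everything from the first '?' onward (the query string).
    let end = endpoint.find('?').unwrap_or(endpoint.len());
    let base = &endpoint[..end];

    // Separate the scheme (including "://") from the rest, if present.
    let (scheme, rest) = match base.find("://") {
        Some(i) => (&base[..i + 3], &base[i + 3..]),
        None => ("", base),
    };

    // The authority runs up to the first '/'; the remainder is the path.
    let auth_end = rest.find('/').unwrap_or(rest.len());
    let (authority, path) = rest.split_at(auth_end);

    // Userinfo is everything up to the last '@' in the authority.
    let host = match authority.rfind('@') {
        Some(i) => &authority[i + 1..],
        None => authority,
    };

    format!("{scheme}{host}{path}")
}
```

For example, `redact_endpoint("http://user:pw@10.0.0.5:8080/v1/chat?token=abc")` yields `http://10.0.0.5:8080/v1/chat`, so credentials and tokens never reach the stored diagnostics.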

Compatibility and safety

  • No mesh protocol changes.
  • No protobuf/schema changes.
  • No request body content is stored.
  • Diagnostics are bounded in memory.
  • No new unbounded queues or blocking locks.
  • No invariant regression introduced.
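
One common way to keep such a diagnostics window bounded in memory is a fixed-capacity queue that evicts the oldest entry on overflow. The sketch below assumes that approach; `RecentDecisions` and its methods are hypothetical and are not taken from the PR.

```rust
use std::collections::VecDeque;

// Hypothetical sketch: the type and method names are assumptions.

/// Fixed-capacity window of the most recent items.
pub struct RecentDecisions<T> {
    cap: usize,
    items: VecDeque<T>,
}

impl<T> RecentDecisions<T> {
    pub fn new(cap: usize) -> Self {
        Self { cap, items: VecDeque::with_capacity(cap) }
    }

    /// Appends an item, evicting the oldest one when the window is full.
    pub fn push(&mut self, item: T) {
        if self.cap == 0 {
            return; // A zero-capacity window stores nothing.
        }
        if self.items.len() == self.cap {
            self.items.pop_front();
        }
        self.items.push_back(item);
    }

    /// Number of items currently held; never exceeds the capacity.
    pub fn len(&self) -> usize {
        self.items.len()
    }

    /// Copies the window, oldest first, for example to serialize into a
    /// status payload without holding a lock while serializing.
    pub fn snapshot(&self) -> Vec<T>
    where
        T: Clone,
    {
        self.items.iter().cloned().collect()
    }
}
```

Because `push` never grows the queue past `cap`, memory use stays constant regardless of request volume.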

Validation

Validation tier: Tier 3 - shared runtime routing/status behavior.

Local validation passed:

git fetch --no-tags origin main:refs/remotes/origin/main: PASS, origin/main at 95101ce99577077af392bc10ffb04ffc8ad34a32
git diff --check: PASS, no output
git diff --cached --check: PASS, no output
cargo fmt --all: PASS
cargo fmt --all -- --check: PASS
cargo test -p mesh-llm-host-runtime route_diagnostics --lib -- --test-threads=1: PASS, 1 passed
cargo test -p mesh-llm-host-runtime reorder_candidates --lib -- --test-threads=1: PASS, 9 passed
cargo test -p mesh-llm-host-runtime route_model_target_reason_codes --lib -- --test-threads=1: PASS, 2 passed
cargo test -p mesh-llm-host-runtime status_payload_exposes_bounded_route_decision_diagnostics --lib -- --test-threads=1: PASS, 1 passed
cargo test -p mesh-llm-host-runtime routing_metric --lib -- --test-threads=1: PASS, 10 passed
cargo test -p mesh-llm-host-runtime test_api_proxy_rejects_request_when_all_known_contexts_too_small --lib -- --test-threads=1: PASS, 1 passed
cargo test -p mesh-llm-host-runtime request_budget_tokens --lib -- --test-threads=1: PASS, 3 passed
cargo check -p mesh-llm: PASS
cargo clippy -p mesh-llm-host-runtime --all-targets -- -D warnings: PASS

Validation details
* Validation tier: Tier 3 - shared runtime routing/status behavior; target ordering, context-fit rejection diagnostics, routing metrics status, and operator diagnostics changed.
* git fetch --no-tags origin main:refs/remotes/origin/main: PASS, origin/main at 95101ce.
* git diff --check: PASS, no output.
* git diff --cached --check: PASS, no output.
* cargo fmt --all: PASS.
* cargo fmt --all -- --check: PASS.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo test -p mesh-llm-host-runtime route_diagnostics --lib -- --test-threads=1: PASS, 1 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo test -p mesh-llm-host-runtime reorder_candidates --lib -- --test-threads=1: PASS, 9 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo test -p mesh-llm-host-runtime route_model_target_reason_codes --lib -- --test-threads=1: PASS, 2 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo test -p mesh-llm-host-runtime status_payload_exposes_bounded_route_decision_diagnostics --lib -- --test-threads=1: PASS, 1 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo test -p mesh-llm-host-runtime routing_metric --lib -- --test-threads=1: PASS, 10 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo test -p mesh-llm-host-runtime test_api_proxy_rejects_request_when_all_known_contexts_too_small --lib -- --test-threads=1: PASS, 1 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo test -p mesh-llm-host-runtime request_budget_tokens --lib -- --test-threads=1: PASS, 3 passed.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal cargo check -p mesh-llm: PASS.
* LLAMA_STAGE_BUILD_DIR=/Users/Funtland/Downloads/mesh-llm/.deps/llama-build/build-stage-abi-metal /opt/homebrew/bin/cargo-clippy clippy -p mesh-llm-host-runtime --all-targets -- -D warnings: PASS.
* Ledger: not applicable - not required for selected validation tier/change family.
* Version: not applicable - no release/version sync required for this non-release runtime diagnostics change.
* Not run: live multi-node mesh startup, public-mesh smoke, and agent harness, because no local multi-node runtime mesh or model endpoint was available. The targeted route-ordering, diagnostics, status, and proxy tests, the shipped-binary check, and clippy cover the changed paths ahead of mandatory remote CI and reviewer validation.

Rollback
* git revert HEAD