Retry low-quality OpenAI responses before commit by IvGolovach · Pull Request #762 · Mesh-LLM/mesh-llm

IvGolovach · 2026-05-30T19:00:49Z

Summary

Users now get a cleaner OpenAI-compatible routing path when an upstream mesh target returns a technically successful but unusable non-streaming response. Before committing the response to the client, mesh-llm detects empty assistant output without tool calls, length-truncated responses, and repetitive responses, then tries the next eligible target when one is available.

Streaming responses keep the existing commit-on-first-byte behavior, and single-target routes still preserve availability by returning the target response instead of converting quality admission into a hard failure.
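
The non-streaming retry flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Quality`, `route_non_streaming`, and the `send` and `check` callbacks are hypothetical names, and returning the last rejected body once every target has been tried is an assumption made so that a single-target route still yields a response.

```rust
/// Hypothetical result of a quality check; not the type used in mesh-llm.
#[derive(Debug, Clone, PartialEq)]
pub enum Quality {
    Ok,
    Rejected(&'static str),
}

/// Try targets in order and return the first body that passes `check`.
/// If every target is rejected, fall back to the last rejected body rather
/// than failing (assumption), so a single-target route still gets an answer.
pub fn route_non_streaming<T, F, C>(targets: &[T], mut send: F, check: C) -> Option<String>
where
    F: FnMut(&T) -> Option<String>,
    C: Fn(&str) -> Quality,
{
    let mut fallback: Option<String> = None;
    for target in targets {
        // `None` models a transport failure; move on to the next target.
        let Some(body) = send(target) else { continue };
        if check(body.as_str()) == Quality::Ok {
            return Some(body);
        }
        // Nothing has been sent to the client yet, so trying again is safe.
        fallback = Some(body);
    }
    fallback
}
```

With one target the loop runs once and the rejected body is returned as-is, which mirrors the availability behavior described in the summary.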

Why

Agent harnesses can get stuck when a target returns a 200 response that is effectively unusable. This gives the router a deterministic, pre-commit quality gate for the non-streaming JSON path while avoiding unsafe retries after a stream has already started.

Post-Review Update

The review comment on Responses status: "incomplete" handling is addressed in the refreshed head. Responses payloads are now retried as length failures only when the response is incomplete and incomplete_details.reason is a length/max-token condition. Non-length incomplete reasons, such as content-filter style terminal responses, are preserved instead of being hidden behind another target retry.
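
As a rough sketch of the rule above, the check needs both conditions to hold before a Responses payload is treated as a retryable length failure. The struct and function names here are hypothetical, and the exact set of reason strings accepted as length/max-token conditions is an assumption; the real logic lives in `response_quality.rs`.

```rust
/// Minimal stand-in for the Responses fields that matter for this decision.
/// `incomplete_reason` models `incomplete_details.reason`.
pub struct ResponsesOutcome<'a> {
    pub status: &'a str,
    pub incomplete_reason: Option<&'a str>,
}

/// Retry as a length failure only when the response is incomplete AND the
/// reason is a length/max-token condition. The reason strings matched here
/// are assumed for illustration.
pub fn is_retryable_length_failure(outcome: &ResponsesOutcome<'_>) -> bool {
    outcome.status == "incomplete"
        && matches!(
            outcome.incomplete_reason,
            Some("max_output_tokens" | "max_tokens" | "length")
        )
}
```

Under this sketch, an incomplete response whose reason is `content_filter`, or one with no reason at all, is passed through to the client unchanged.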

Diff Scope

Adds OpenAI response quality classification in network/openai/response_quality.rs.
Wires quality admission into non-streaming OpenAI transport retries.
Requires Responses incomplete retries to have both incomplete status and a length/max-token reason.
Marks rejected targets as non-cooling rejected health observations and clears cached affinity so the next eligible target can be tried.
Keeps streaming retries guarded so responses are not retried after client-visible bytes are committed.
Adds regression coverage for empty content, truncation, repetition, Responses incomplete reason handling, retry behavior, and streaming no-retry behavior.

Compatibility

No mesh protocol, plugin protocol, Skippy ABI, schema, or release metadata changes.

Branch Integrity

Base branch: main
Validated base: f9bd75a97378be014f52c4e68c13e81d3399e65e
Head branch: codex/router-quality-gates
Head commit: 4bafc3eeb43a4eb141f2550f11bc5d344971e9dd
Ahead/behind after local rebase: 0 behind / 1 ahead

Diff Hygiene

git diff --check origin/main...HEAD: PASS, no output.
git diff --check: PASS, no output.
git diff --cached --check: PASS, no output.

Validation

Validation tier: Tier 3 - shared OpenAI routing/proxy behavior refreshed onto current main for PR #762; non-streaming JSON success responses get deterministic quality admission before client-visible commit, while Responses status: incomplete retries now require a length/max-token reason so non-length terminal responses are preserved.

git fetch --no-tags origin main:refs/remotes/origin/main codex/router-quality-gates:refs/remotes/origin/codex/router-quality-gates: PASS, origin/main at f9bd75a9.
git rebase origin/main: PASS after resolving one proxy test-module conflict by preserving both the landed context-fit admission test and this PR's response-quality retry tests.
git diff --check origin/main...HEAD: PASS, no output.
git diff --check: PASS, no output.
git diff --cached --check: PASS, no output.
cargo fmt --all -- --check: PASS.
LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo test -p mesh-llm-host-runtime response_quality --lib -- --test-threads=1: PASS, 7 passed.
LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo test -p mesh-llm-host-runtime test_api_proxy_retries --lib -- --test-threads=1: PASS, 3 passed.
LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo test -p mesh-llm-host-runtime openai::transport --lib -- --test-threads=1: PASS, 67 passed.
LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo check -p mesh-llm: PASS.
LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo clippy -p mesh-llm-host-runtime --all-targets -- -D warnings: PASS.

Required remote gates: PASS on refreshed head 4bafc3eeb43a4eb141f2550f11bc5d344971e9dd. The initial Linux tests (skippy-smoke) job aborted in the compiler/sccache step and passed after rerunning the failed GitHub Actions job with no code changes.

Ledger: not applicable - not required for selected validation tier/change family.

Version: not applicable - no release/version sync required for this non-release routing behavior change.

Not Run

just build: not required for selected validation tier; no UI assets or release bundle changed.
Live multi-node agent smoke: no local model/runtime mesh endpoint was available; deterministic proxy, transport, quality-detector, and stream-guard tests cover the changed paths.

Runtime Safety

The retry decision is limited to non-streaming JSON responses before client-visible commit. Streaming responses keep the existing first-byte commit guard. No new protocol state, blocking locks, unbounded queues, or invariant removals were introduced.
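
One way to picture the first-byte commit guard is a small flag that flips once anything has been forwarded to the client. This is an illustrative sketch only; `CommitGuard` and its methods are hypothetical names, not the existing guard in the transport code.

```rust
/// Tracks whether any bytes have been forwarded to the client.
#[derive(Debug, Default)]
pub struct CommitGuard {
    committed: bool,
}

impl CommitGuard {
    /// Record a forwarded chunk; the first non-empty chunk commits the response.
    pub fn on_chunk_forwarded(&mut self, chunk: &[u8]) {
        if !chunk.is_empty() {
            self.committed = true;
        }
    }

    /// Retrying on another target is only safe before commit.
    pub fn may_retry(&self) -> bool {
        !self.committed
    }
}
```

A non-streaming JSON response is buffered in full before anything is forwarded, so in this model the guard stays uncommitted while the quality check runs.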

Rollback

Revert this PR.

DB downgrade: not applicable.
Data repair: not applicable.
Operational caveats: none known.

Known Residual Risks

Final merge readiness still depends on mandatory GitHub CI/review for the pushed SHA. A live multi-node agent smoke would be useful as an additional confidence pass when a suitable mesh endpoint is available.

michaelneale · 2026-06-01T07:29:45Z

@IvGolovach this looks ok to me - is this a thing that has happened in your experience? I wasn't sure about that (and didn't see a related issue)

IvGolovach · 2026-06-01T16:34:54Z

Good question. I do not have a single filed issue for this exact shape, so I should not frame this as a direct fix for a specific report.

The reason I opened it is the agent-harness failure mode we have been tightening around: a mesh target can return HTTP 200 before the router has committed anything to the client, but the payload is still effectively unusable for the caller. I kept the admission rules intentionally narrow to avoid turning subjective answer quality into routing policy:

empty assistant output with no tool call payload
explicit length/max-token truncation
obvious repetitive-loop output

The important boundary is that this only applies to non-streaming JSON responses before client-visible commit, and only retries when another eligible target exists. Streaming still keeps the existing first-byte commit behavior, and single-target routing preserves availability by returning the response.
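
For illustration, the three admission rules could be expressed along these lines. Everything in this sketch is hypothetical: the type and function names, the use of a `finish_reason` of `length` as the truncation signal, and the repeated-line heuristic with its threshold are assumptions, not the detector in `response_quality.rs`.

```rust
#[derive(Debug, PartialEq)]
pub enum QualityIssue {
    EmptyOutput,
    LengthTruncated,
    RepetitiveLoop,
}

/// Minimal stand-in for a non-streaming chat completion result.
pub struct ChatOutcome<'a> {
    pub content: &'a str,
    pub has_tool_calls: bool,
    pub finish_reason: &'a str,
}

/// Illustrative threshold; not taken from the PR.
const MAX_REPEATED_LINES: usize = 8;

/// Length of the longest run of identical consecutive non-empty lines.
fn longest_repeated_line_run(content: &str) -> usize {
    let mut longest: usize = 0;
    let mut run: usize = 0;
    let mut previous: Option<&str> = None;
    for line in content.lines().map(str::trim).filter(|l| !l.is_empty()) {
        run = if previous == Some(line) { run + 1 } else { 1 };
        longest = longest.max(run);
        previous = Some(line);
    }
    longest
}

/// Returns the first matching issue, or `None` when the response is admitted.
pub fn classify(outcome: &ChatOutcome<'_>) -> Option<QualityIssue> {
    // Empty text is acceptable when the assistant answered with tool calls.
    if outcome.content.trim().is_empty() && !outcome.has_tool_calls {
        return Some(QualityIssue::EmptyOutput);
    }
    if outcome.finish_reason == "length" {
        return Some(QualityIssue::LengthTruncated);
    }
    if longest_repeated_line_run(outcome.content) >= MAX_REPEATED_LINES {
        return Some(QualityIssue::RepetitiveLoop);
    }
    None
}
```

Keeping each rule a simple, deterministic predicate is what keeps subjective answer quality out of routing policy in this sketch.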

So I would describe this as a defensive routing hardening rather than a fix for one specific issue. If that still feels too speculative for this PR, I am happy to either narrow the wording further or hold it until we have a live repro from an agent harness.

i386 reviewed May 31, 2026

Comment thread crates/mesh-llm-host-runtime/src/network/openai/response_quality.rs Outdated

ndizazzo assigned IvGolovach May 31, 2026

IvGolovach force-pushed the codex/router-quality-gates branch from d606bef to 4bafc3e Compare June 1, 2026 00:07

ndizazzo self-requested a review June 6, 2026 04:39

ndizazzo approved these changes Jun 6, 2026

ndizazzo merged commit c9047e7 into main Jun 6, 2026
48 of 49 checks passed

ndizazzo deleted the codex/router-quality-gates branch June 6, 2026 04:39
