Retry low-quality OpenAI responses before commit #762
Validation
* Validation tier: Tier 3 - shared OpenAI routing/proxy behavior refreshed onto current main for PR #762; non-streaming JSON success responses get deterministic quality admission before client-visible commit, while Responses `status: incomplete` retries now require a length/max-token reason so non-length terminal responses are preserved.
* git fetch --no-tags origin main:refs/remotes/origin/main codex/router-quality-gates:refs/remotes/origin/codex/router-quality-gates: PASS, origin/main at f9bd75a.
* git rebase origin/main: PASS after resolving one proxy test-module conflict by preserving both the landed context-fit admission test and this PR's response-quality retry tests.
* git diff --check origin/main...HEAD: PASS, no output.
* git diff --check: PASS, no output.
* git diff --cached --check: PASS, no output.
* cargo fmt --all -- --check: PASS.
* LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo test -p mesh-llm-host-runtime response_quality --lib -- --test-threads=1: PASS, 7 passed.
* LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo test -p mesh-llm-host-runtime test_api_proxy_retries --lib -- --test-threads=1: PASS, 3 passed.
* LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo test -p mesh-llm-host-runtime openai::transport --lib -- --test-threads=1: PASS, 67 passed.
* LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo check -p mesh-llm: PASS.
* LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo clippy -p mesh-llm-host-runtime --all-targets -- -D warnings: PASS.
* Ledger: not applicable - not required for selected validation tier/change family.
* Version: not applicable - no release/version sync required for this non-release routing behavior change.
* Not run: just build - not required for selected validation tier; no UI assets or release bundle changed.
* Not run: live multi-node agent smoke - no local model/runtime mesh endpoint was available; deterministic proxy, transport, quality-detector, and stream-guard tests cover the changed paths.
Rollback
* git revert HEAD
d606bef to 4bafc3e
@IvGolovach this looks ok to me - is this a thing that has happened in your experience? I wasn't sure about that (and didn't see a related issue)
Good question. I do not have a single filed issue for this exact shape, so I should not frame this as a direct fix for a specific report. The reason I opened it is the agent-harness failure mode we have been tightening around: a mesh target can return HTTP 200 before the router has committed anything to the client, but the payload is still effectively unusable for the caller. I kept the admission rules intentionally narrow to avoid turning subjective answer quality into routing policy: the gate only flags empty assistant output without tool calls, length-truncated responses, and repetitive responses.
The important boundary is that this only applies to non-streaming JSON responses before client-visible commit, and only retries when another eligible target exists. Streaming still keeps the existing first-byte commit behavior, and single-target routing preserves availability by returning the response.
So I would describe this as a defensive routing hardening rather than a fix for one specific issue. If that still feels too speculative for this PR, I am happy to either narrow the wording further or hold it until we have a live repro from an agent harness.
Summary
Users now get a cleaner OpenAI-compatible routing path when an upstream mesh target returns a technically successful but unusable non-streaming response. Before committing the response to the client, mesh-llm detects empty assistant output without tool calls, length-truncated responses, and repetitive responses, then tries the next eligible target when one is available.
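As a rough illustration of the three checks named above, here is a minimal sketch. It is not the code in this PR: `AssistantOutput`, `QualityIssue`, `assess`, and `looks_repetitive` are hypothetical names, and the repetition heuristic (one non-blank line making up more than half of at least eight lines) is an assumption chosen only to keep the example deterministic.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of a non-streaming assistant response.
#[derive(Debug, Clone, Default)]
pub struct AssistantOutput {
    pub content: String,
    pub tool_call_count: usize,
    /// Finish reason as reported by the target, e.g. "stop" or "length".
    pub finish_reason: Option<String>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QualityIssue {
    EmptyOutput,
    LengthTruncated,
    Repetitive,
}

/// Assumed heuristic: a single non-blank line accounts for more than half
/// of the output, and the output has at least eight non-blank lines.
fn looks_repetitive(content: &str) -> bool {
    let lines: Vec<&str> = content
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect();
    if lines.len() < 8 {
        return false;
    }
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for line in &lines {
        *counts.entry(*line).or_insert(0) += 1;
    }
    let most_common = counts.values().copied().max().unwrap_or(0);
    most_common * 2 > lines.len()
}

/// Returns the first quality issue found, or `None` if the output is admitted.
pub fn assess(output: &AssistantOutput) -> Option<QualityIssue> {
    // Empty text is acceptable when the assistant answered with tool calls.
    if output.content.trim().is_empty() && output.tool_call_count == 0 {
        return Some(QualityIssue::EmptyOutput);
    }
    if output.finish_reason.as_deref() == Some("length") {
        return Some(QualityIssue::LengthTruncated);
    }
    if looks_repetitive(&output.content) {
        return Some(QualityIssue::Repetitive);
    }
    None
}
```

Every check in the sketch is a fixed rule over the payload, so the same response always gets the same verdict and no judgment about answer quality is involved.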
Streaming responses keep the existing commit-on-first-byte behavior, and single-target routes still preserve availability by returning the target response instead of converting quality admission into a hard failure.
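The retry and fallback behavior described above could look roughly like the following sketch. `TargetResponse` and `route_with_quality_gate` are hypothetical names, and returning the last target's response when every target fails the gate is an assumption that extends the single-target rule; the PR text only states the single-target case.

```rust
/// Hypothetical response record: which target answered and what it returned.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TargetResponse {
    pub target: String,
    pub body: String,
}

/// Tries targets in order and returns the first response that passes
/// `is_usable`. When no further target is available, the current response
/// is returned as-is, so the gate never becomes a hard failure.
pub fn route_with_quality_gate<F, Q>(
    targets: &[&str],
    mut send: F,
    is_usable: Q,
) -> Option<TargetResponse>
where
    F: FnMut(&str) -> TargetResponse,
    Q: Fn(&TargetResponse) -> bool,
{
    for (index, target) in targets.iter().copied().enumerate() {
        let response = send(target);
        let is_last = index + 1 == targets.len();
        if is_usable(&response) || is_last {
            return Some(response);
        }
        // Low-quality response with another eligible target left:
        // discard it before anything is committed to the client.
    }
    // Only reached when `targets` is empty.
    None
}
```

With a single target the loop returns on its first iteration regardless of quality, which matches the availability-preserving behavior described above.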
Why
Agent harnesses can get stuck when a target returns a 200 response that is effectively unusable. This gives the router a deterministic, pre-commit quality gate for the non-streaming JSON path while avoiding unsafe retries after a stream has already started.
Post-Review Update
The review comment on Responses `status: "incomplete"` handling is addressed in the refreshed head. Responses payloads are now retried as length failures only when the response is incomplete and `incomplete_details.reason` is a length/max-token condition. Non-length incomplete reasons, such as content-filter style terminal responses, are preserved instead of being hidden behind another target retry.
Diff Scope
network/openai/response_quality.rs.
Compatibility
No mesh protocol, plugin protocol, Skippy ABI, schema, or release metadata changes.
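For reference, the Responses `status: incomplete` rule from the Post-Review Update can be sketched as below. This is an illustration, not the code in this PR: `ResponsesStatus`, `is_length_reason`, and `is_length_incomplete` are hypothetical names. `max_output_tokens` is the reason value documented for the OpenAI Responses API; the other accepted spellings are assumptions about what a compatible target might send.

```rust
/// Hypothetical subset of a Responses payload relevant to the retry decision.
#[derive(Debug, Clone, Default)]
pub struct ResponsesStatus {
    /// Top-level `status` field, e.g. "completed" or "incomplete".
    pub status: String,
    /// Value of `incomplete_details.reason`, when present.
    pub incomplete_reason: Option<String>,
}

/// True for reasons that indicate the output hit a token limit.
/// "max_tokens" and "length" are assumed aliases, not confirmed values.
fn is_length_reason(reason: &str) -> bool {
    matches!(reason, "max_output_tokens" | "max_tokens" | "length")
}

/// A payload counts as a length failure only when it is incomplete AND
/// the reason is a length condition. Other incomplete reasons, such as
/// "content_filter", and incomplete payloads with no reason are preserved.
pub fn is_length_incomplete(payload: &ResponsesStatus) -> bool {
    payload.status == "incomplete"
        && payload
            .incomplete_reason
            .as_deref()
            .is_some_and(is_length_reason)
}
```

Requiring both conditions means a terminal content-filter response reaches the client unchanged instead of being replaced by another target's answer.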
Branch Integrity
main: f9bd75a97378be014f52c4e68c13e81d3399e65e
codex/router-quality-gates: 4bafc3eeb43a4eb141f2550f11bc5d344971e9dd
0 behind / 1 ahead
Diff Hygiene
git diff --check origin/main...HEAD: PASS, no output.
git diff --check: PASS, no output.
git diff --cached --check: PASS, no output.
Validation
Validation tier: Tier 3 - shared OpenAI routing/proxy behavior refreshed onto current main for PR #762; non-streaming JSON success responses get deterministic quality admission before client-visible commit, while Responses `status: incomplete` retries now require a length/max-token reason so non-length terminal responses are preserved.
git fetch --no-tags origin main:refs/remotes/origin/main codex/router-quality-gates:refs/remotes/origin/codex/router-quality-gates: PASS, origin/main at f9bd75a9.
git rebase origin/main: PASS after resolving one proxy test-module conflict by preserving both the landed context-fit admission test and this PR's response-quality retry tests.
git diff --check origin/main...HEAD: PASS, no output.
git diff --check: PASS, no output.
git diff --cached --check: PASS, no output.
cargo fmt --all -- --check: PASS.
LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo test -p mesh-llm-host-runtime response_quality --lib -- --test-threads=1: PASS, 7 passed.
LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo test -p mesh-llm-host-runtime test_api_proxy_retries --lib -- --test-threads=1: PASS, 3 passed.
LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo test -p mesh-llm-host-runtime openai::transport --lib -- --test-threads=1: PASS, 67 passed.
LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo check -p mesh-llm: PASS.
LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo clippy -p mesh-llm-host-runtime --all-targets -- -D warnings: PASS.
Required remote gates: PASS on refreshed head 4bafc3eeb43a4eb141f2550f11bc5d344971e9dd; the initial Linux tests (skippy-smoke) compiler/sccache abort passed after rerunning the failed GitHub Actions job with no code changes.
Ledger: not applicable - not required for selected validation tier/change family.
Version: not applicable - no release/version sync required for this non-release routing behavior change.
Not Run
just build: not required for selected validation tier; no UI assets or release bundle changed.
Runtime Safety
The retry decision is limited to non-streaming JSON responses before client-visible commit. Streaming responses keep the existing first-byte commit guard. No new protocol state, blocking locks, unbounded queues, or invariant removals were introduced.
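The scope limits stated above amount to a small precondition check before any quality retry is considered. The sketch below is one way to express it; `CommitState`, `ResponseContext`, and `quality_retry_allowed` are hypothetical names, and treating any 2xx status with an `application/json` content type as a "JSON success response" is an assumption.

```rust
/// Whether any bytes of this response have been made visible to the client.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommitState {
    NotCommitted,
    Committed,
}

/// Hypothetical facts the router knows about an upstream response.
pub struct ResponseContext<'a> {
    pub streaming: bool,
    pub content_type: Option<&'a str>,
    pub http_status: u16,
    pub commit: CommitState,
}

/// Quality retries are only considered for non-streaming JSON success
/// responses that have not been committed to the client yet.
pub fn quality_retry_allowed(ctx: &ResponseContext<'_>) -> bool {
    let is_json = ctx.content_type.is_some_and(|value| {
        value
            .trim()
            .to_ascii_lowercase()
            .starts_with("application/json")
    });
    !ctx.streaming
        && is_json
        && (200..300).contains(&ctx.http_status)
        && ctx.commit == CommitState::NotCommitted
}
```

A streaming response fails the first condition outright, so it stays on the existing first-byte commit path and is never retried by the quality gate.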
Rollback
Revert this PR.
DB downgrade: not applicable.
Data repair: not applicable.
Operational caveats: none known.
Known Residual Risks
Final merge readiness still depends on mandatory GitHub CI/review for the pushed SHA. A live multi-node agent smoke would be useful as an additional confidence pass when a suitable mesh endpoint is available.