Retry low-quality OpenAI responses before commit #762

Merged: ndizazzo merged 1 commit into main from codex/router-quality-gates on Jun 6, 2026

@IvGolovach IvGolovach commented May 30, 2026

Summary

Users now get a cleaner OpenAI-compatible routing path when an upstream mesh target returns a technically successful but unusable non-streaming response. Before committing the response to the client, mesh-llm detects empty assistant output without tool calls, length-truncated responses, and repetitive responses, then tries the next eligible target when one is available.
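The three rejection reasons named above can be sketched as a small classifier. This is a minimal illustration, not the code in response_quality.rs: `AssistantOutput`, `QualityVerdict`, `classify`, and the "same line four times in a row" repetition heuristic are all hypothetical names and thresholds chosen for the example.

```rust
/// Hypothetical, simplified view of one non-streaming assistant reply.
#[derive(Debug, Clone, Default)]
pub struct AssistantOutput {
    pub content: String,
    pub tool_call_count: usize,
    /// For example Some("stop"), Some("length"), or Some("tool_calls").
    pub finish_reason: Option<String>,
}

/// Hypothetical verdicts mirroring the three rejection reasons in the summary.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QualityVerdict {
    Accept,
    Empty,
    Truncated,
    Repetitive,
}

/// Returns true when the same non-empty line appears `min_run` or more times
/// in a row. Illustrative heuristic only; the real detector may differ.
fn has_repeated_line_run(text: &str, min_run: usize) -> bool {
    let mut previous: Option<&str> = None;
    let mut run = 0usize;
    for line in text.lines().map(str::trim).filter(|l| !l.is_empty()) {
        if previous == Some(line) {
            run += 1;
        } else {
            previous = Some(line);
            run = 1;
        }
        if run >= min_run {
            return true;
        }
    }
    false
}

/// Classifies a reply before it is committed to the client.
pub fn classify(output: &AssistantOutput) -> QualityVerdict {
    // Empty text is only a problem when there is no tool call to act on.
    if output.content.trim().is_empty() && output.tool_call_count == 0 {
        return QualityVerdict::Empty;
    }
    if output.finish_reason.as_deref() == Some("length") {
        return QualityVerdict::Truncated;
    }
    // The threshold of 4 is an assumption made for this sketch.
    if has_repeated_line_run(&output.content, 4) {
        return QualityVerdict::Repetitive;
    }
    QualityVerdict::Accept
}
```

A reply with empty content but at least one tool call is accepted here, matching the "empty assistant output without tool calls" wording.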

Streaming responses keep the existing commit-on-first-byte behavior, and single-target routes still preserve availability by returning the target response instead of converting quality admission into a hard failure.
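As a rough sketch of the non-streaming retry behavior described above: targets are tried in order, and the response from the last remaining target is always returned so that quality admission never turns into a hard failure. `TargetResponse`, `route_non_streaming`, and the `send` / `is_usable` callbacks are hypothetical stand-ins for the router's real types, and returning the last response when every target is rejected is an inference from the description, not confirmed behavior.

```rust
/// Hypothetical response wrapper used only for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TargetResponse {
    pub target: String,
    pub body: String,
}

/// Tries each target in order and returns the first usable response.
/// The final target's response is returned even if it fails the quality
/// check, so a single-target route always yields that target's response.
/// Returns None only when `targets` is empty.
pub fn route_non_streaming<F, Q>(
    targets: &[&str],
    mut send: F,
    is_usable: Q,
) -> Option<TargetResponse>
where
    F: FnMut(&str) -> TargetResponse,
    Q: Fn(&TargetResponse) -> bool,
{
    for (index, &target) in targets.iter().enumerate() {
        let response = send(target);
        let is_last = index + 1 == targets.len();
        // Nothing has been written to the client yet, so moving on to the
        // next target is safe as long as one exists.
        if is_last || is_usable(&response) {
            return Some(response);
        }
    }
    None
}
```

With targets `["a", "b"]` where `a` returns an empty body, the sketch calls both and returns the response from `b`; with a single target it returns that target's response regardless of the verdict.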

Why

Agent harnesses can get stuck when a target returns a 200 response that is effectively unusable. This gives the router a deterministic, pre-commit quality gate for the non-streaming JSON path while avoiding unsafe retries after a stream has already started.

Post-Review Update

The review comment on Responses status: "incomplete" handling is addressed in the refreshed head. Responses payloads are now retried as length failures only when the response is incomplete and incomplete_details.reason is a length/max-token condition. Non-length incomplete reasons, such as content-filter style terminal responses, are preserved instead of being hidden behind another target retry.
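The two-part condition can be illustrated as follows. This is a sketch under assumptions: `ResponsesSummary` and `is_length_incomplete` are hypothetical names, and the reason strings matched here (`max_output_tokens`, `max_tokens`, `length`) are guesses at what a length/max-token condition looks like; the exact strings the PR matches are not shown in this description.

```rust
/// Hypothetical projection of the two Responses fields the rule depends on.
#[derive(Debug, Clone, Default)]
pub struct ResponsesSummary {
    /// Top-level `status`, for example "completed" or "incomplete".
    pub status: Option<String>,
    /// Value of `incomplete_details.reason`, when present.
    pub incomplete_reason: Option<String>,
}

/// Returns true only when the response is incomplete AND the reason is a
/// length/max-token condition. Any other incomplete reason is preserved.
pub fn is_length_incomplete(summary: &ResponsesSummary) -> bool {
    let incomplete = summary.status.as_deref() == Some("incomplete");
    // Assumed spellings of a length/max-token reason.
    let length_reason = matches!(
        summary.incomplete_reason.as_deref(),
        Some("max_output_tokens") | Some("max_tokens") | Some("length")
    );
    incomplete && length_reason
}
```

Under this rule an incomplete response with a content-filter style reason, or with no reason at all, is returned to the client rather than retried.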

Diff Scope

  • Adds OpenAI response quality classification in network/openai/response_quality.rs.
  • Wires quality admission into non-streaming OpenAI transport retries.
  • Requires Responses incomplete retries to have both incomplete status and a length/max-token reason.
  • Marks rejected targets as non-cooling rejected health observations and clears cached affinity so the next eligible target can be tried.
  • Keeps streaming retries guarded so responses are not retried after client-visible bytes are committed.
  • Adds regression coverage for empty content, truncation, repetition, Responses incomplete reason handling, retry behavior, and streaming no-retry behavior.
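One plausible reading of the "non-cooling rejected health observation" item is sketched below: a quality rejection is recorded against the target and any cached affinity pointing at it is dropped, but the target is not put into cooldown the way a transport failure would be. `Observation`, `RouterState`, and the map layout are hypothetical; whether the real router clears affinity per request or for every key pinned to the target is not stated here.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical health observation kinds for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Observation {
    Success,
    /// Transport-level failure: the target enters cooldown.
    Failure,
    /// Quality rejection: counted, but no cooldown is applied.
    Rejected,
}

/// Hypothetical router bookkeeping.
#[derive(Debug, Default)]
pub struct RouterState {
    pub rejected_counts: HashMap<String, u32>,
    pub cooling: HashSet<String>,
    /// Affinity key (for example a session id) mapped to its pinned target.
    pub affinity: HashMap<String, String>,
}

impl RouterState {
    pub fn observe(&mut self, target: &str, observation: Observation) {
        match observation {
            Observation::Success => {
                self.cooling.remove(target);
            }
            Observation::Failure => {
                self.cooling.insert(target.to_string());
            }
            Observation::Rejected => {
                *self.rejected_counts.entry(target.to_string()).or_insert(0) += 1;
                // Drop cached affinity so the next eligible target can be tried.
                self.affinity.retain(|_, pinned| pinned.as_str() != target);
            }
        }
    }

    pub fn is_cooling(&self, target: &str) -> bool {
        self.cooling.contains(target)
    }
}
```

Keeping rejections out of cooldown means one unusable answer does not remove an otherwise healthy target from rotation.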

Compatibility

No mesh protocol, plugin protocol, Skippy ABI, schema, or release metadata changes.

Branch Integrity

  • Base branch: main
  • Validated base: f9bd75a97378be014f52c4e68c13e81d3399e65e
  • Head branch: codex/router-quality-gates
  • Head commit: 4bafc3eeb43a4eb141f2550f11bc5d344971e9dd
  • Ahead/behind after local rebase: 0 behind / 1 ahead

Diff Hygiene

  • git diff --check origin/main...HEAD: PASS, no output.
  • git diff --check: PASS, no output.
  • git diff --cached --check: PASS, no output.

Validation

Validation tier: Tier 3 (shared OpenAI routing/proxy behavior), refreshed onto current main for PR #762. Non-streaming JSON success responses get deterministic quality admission before client-visible commit, and Responses status: incomplete retries now require a length/max-token reason so that non-length terminal responses are preserved.

  • git fetch --no-tags origin main:refs/remotes/origin/main codex/router-quality-gates:refs/remotes/origin/codex/router-quality-gates: PASS, origin/main at f9bd75a9.
  • git rebase origin/main: PASS after resolving one proxy test-module conflict by preserving both the landed context-fit admission test and this PR's response-quality retry tests.
  • git diff --check origin/main...HEAD: PASS, no output.
  • git diff --check: PASS, no output.
  • git diff --cached --check: PASS, no output.
  • cargo fmt --all -- --check: PASS.
  • LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo test -p mesh-llm-host-runtime response_quality --lib -- --test-threads=1: PASS, 7 passed.
  • LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo test -p mesh-llm-host-runtime test_api_proxy_retries --lib -- --test-threads=1: PASS, 3 passed.
  • LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo test -p mesh-llm-host-runtime openai::transport --lib -- --test-threads=1: PASS, 67 passed.
  • LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo check -p mesh-llm: PASS.
  • LLAMA_STAGE_BUILD_DIR=<stage-build-dir> cargo clippy -p mesh-llm-host-runtime --all-targets -- -D warnings: PASS.

Required remote gates: PASS on refreshed head 4bafc3eeb43a4eb141f2550f11bc5d344971e9dd. The initial Linux tests (skippy-smoke) job aborted with a compiler/sccache failure and passed after the failed GitHub Actions job was rerun with no code changes.

Ledger: not applicable - not required for selected validation tier/change family.

Version: not applicable - no release/version sync required for this non-release routing behavior change.

Not Run

  • just build: not required for selected validation tier; no UI assets or release bundle changed.
  • Live multi-node agent smoke: no local model/runtime mesh endpoint was available; deterministic proxy, transport, quality-detector, and stream-guard tests cover the changed paths.

Runtime Safety

The retry decision is limited to non-streaming JSON responses before client-visible commit. Streaming responses keep the existing first-byte commit guard. No new protocol state, blocking locks, unbounded queues, or invariant removals were introduced.
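The first-byte commit guard for streaming can be pictured roughly like this. `CommitGuard` and `forward_chunks` are hypothetical names, and treating a zero-length chunk as "not yet committed" is an assumption made for the sketch rather than documented behavior.

```rust
/// Hypothetical guard that tracks whether any bytes reached the client.
#[derive(Debug, Default)]
pub struct CommitGuard {
    committed: bool,
}

impl CommitGuard {
    /// Records that client-visible bytes have been written.
    pub fn mark_committed(&mut self) {
        self.committed = true;
    }

    /// A retry on another target is only safe before anything was sent.
    pub fn may_retry(&self) -> bool {
        !self.committed
    }
}

/// Forwards upstream chunks to the client buffer, committing on the first
/// non-empty chunk. `client` stands in for the real response writer.
pub fn forward_chunks(guard: &mut CommitGuard, chunks: &[&[u8]], client: &mut Vec<u8>) {
    for chunk in chunks {
        if chunk.is_empty() {
            continue; // Assumption: empty chunks are not client-visible.
        }
        guard.mark_committed();
        client.extend_from_slice(chunk);
    }
}
```

Once `may_retry()` returns false the stream has to be finished or failed on the same target, since the client may already have acted on the bytes it received.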

Rollback

Revert this PR.

DB downgrade: not applicable.
Data repair: not applicable.
Operational caveats: none known.

Known Residual Risks

Final merge readiness still depends on mandatory GitHub CI/review for the pushed SHA. A live multi-node agent smoke would be useful as an additional confidence pass when a suitable mesh endpoint is available.

Comment thread crates/mesh-llm-host-runtime/src/network/openai/response_quality.rs Outdated
@IvGolovach IvGolovach force-pushed the codex/router-quality-gates branch from d606bef to 4bafc3e on June 1, 2026 00:07
@michaelneale (Collaborator) commented:

@IvGolovach this looks ok to me - is this a thing that has happened in your experience? I wasn't sure about that (and didn't see a related issue)

@IvGolovach (Collaborator, Author) commented:

Good question. I do not have a single filed issue for this exact shape, so I should not frame this as a direct fix for a specific report.

The reason I opened it is the agent-harness failure mode we have been tightening around: a mesh target can return HTTP 200 before the router has committed anything to the client, but the payload is still effectively unusable for the caller. I kept the admission rules intentionally narrow to avoid turning subjective answer quality into routing policy:

  • empty assistant output with no tool call payload
  • explicit length/max-token truncation
  • obvious repetitive-loop output

The important boundary is that this only applies to non-streaming JSON responses before client-visible commit, and only retries when another eligible target exists. Streaming still keeps the existing first-byte commit behavior, and single-target routing preserves availability by returning the response.

So I would describe this as a defensive routing hardening rather than a fix for one specific issue. If that still feels too speculative for this PR, I am happy to either narrow the wording further or hold it until we have a live repro from an agent harness.

@ndizazzo ndizazzo self-requested a review June 6, 2026 04:39
@ndizazzo ndizazzo merged commit c9047e7 into main Jun 6, 2026
48 of 49 checks passed
@ndizazzo ndizazzo deleted the codex/router-quality-gates branch June 6, 2026 04:39