fix(inference): stop polluting schema-mode crop-describe responses#71

Merged
Liuhaai merged 2 commits into main from fix-inference-schema-mode
May 19, 2026
@Liuhaai Liuhaai commented May 19, 2026

Collaborator

Summary

/api/inference/crop-describe was silently mutating the wire response when callers passed response_format=json_schema. Two bugs combined to leave cortex's vehicle make field empty even when the VLM clearly recognized the make in its prose summary.

The bugs

1. _normalize_entity_item injected schema-forbidden fields

For every vehicle entity the normalizer ran:

label = norm.get("brand") or norm.get("make") or norm.get("description", "")
norm.setdefault("brand", label)
norm.setdefault("make", label)

When the model emitted {"id":"nv0","type":"suv","make":""}:

  • norm.get("make") returns "" (falsy) → falls through to description, which resolves to ""
  • setdefault("brand", "") adds a brand key the schema never declared
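
The fall-through can be reproduced in isolation. The sketch below wraps the three lines quoted above in a hypothetical helper, legacy_normalize_vehicle (the name and the defensive copy of the input are assumptions, not trio-core's actual code), and feeds it the model output from this example.

```python
def legacy_normalize_vehicle(item: dict) -> dict:
    """Hypothetical stand-in for the pre-fix vehicle branch of _normalize_entity_item."""
    norm = dict(item)  # copy so the caller's dict is left untouched (assumption)
    # "" is falsy, so an empty make falls through to description, which is absent here.
    label = norm.get("brand") or norm.get("make") or norm.get("description", "")
    norm.setdefault("brand", label)  # adds brand="" because the key was missing
    norm.setdefault("make", label)   # no-op: make already exists, even though it is ""
    return norm


model_output = {"id": "nv0", "type": "suv", "make": ""}
normalized = legacy_normalize_vehicle(model_output)
print(normalized)
# {'id': 'nv0', 'type': 'suv', 'make': '', 'brand': ''}
```

Note that setdefault only checks for key presence, not truthiness, which is why the empty make survives while brand is invented.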

Observed in prod (cam_1dd58540, frame obs_74a6bc82052b): the VLM summary said "A blue Audi SUV is parked near a payment kiosk…" yet the structured vehicles[] arrived at cortex as {"id":"nv0","type":"suv","make":"","brand":""}. The schema's additionalProperties: False contract was bypassed post-VLM.
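
To make the contract violation concrete, here is a minimal sketch of an undeclared-key check. The vehicle schema is hypothetical (the real property list is not shown in this PR), and undeclared_keys is an illustrative helper, not part of trio-core or cortex. It only handles the literal additionalProperties: False case.

```python
def undeclared_keys(item: dict, schema: dict) -> set:
    """Return keys that a schema with additionalProperties=False would reject."""
    # Only the strict boolean form is handled; a sub-schema value is not evaluated.
    if schema.get("additionalProperties", True) is not False:
        return set()
    return set(item) - set(schema.get("properties", {}))


# Hypothetical vehicle schema: property names are assumed from the example payloads.
vehicle_schema = {
    "type": "object",
    "properties": {"id": {}, "type": {}, "make": {}},
    "additionalProperties": False,
}

from_model = {"id": "nv0", "type": "suv", "make": ""}
received_by_cortex = {"id": "nv0", "type": "suv", "make": "", "brand": ""}

print(undeclared_keys(from_model, vehicle_schema))          # set()
print(undeclared_keys(received_by_cortex, vehicle_schema))  # {'brand'}
```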

2. Duplicate YOLO-context auto-prepend

Cortex already enumerates per-id detections in its scene_prompt (DETECTIONS:\n id=nv0 vehicle bbox=...). trio-core matched on the literal substring "YOLO detections" (which cortex's prompt does not contain) and prepended a second YOLO bbox table ahead of the caller's prompt.

Empirical A/B on qwen3-vl-flash with the Audi frame:

  • Cortex's prompt alone → make: "Audi" in 3/3 trials
  • Same prompt + trio-core's duplicate YOLO prefix → make: "Audi" in only 1/3 trials; the other 2/3 collapsed to make: "unknown"
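
The substring guard described above can be sketched as follows. This is an assumed reconstruction of the pre-fix behavior, not the actual trio-core code: legacy_build_prompt, the prefix layout, and the placeholder table text are hypothetical, and the bbox values are left elided as in the PR description.

```python
YOLO_MARKER = "YOLO detections"  # literal substring named in this PR


def legacy_build_prompt(scene_prompt: str, yolo_table: str) -> str:
    """Hypothetical pre-fix logic: prepend the table unless the marker is present."""
    if YOLO_MARKER in scene_prompt:
        return scene_prompt
    # Assumed prefix layout; the real formatting may differ.
    return f"{YOLO_MARKER}:\n{yolo_table}\n\n{scene_prompt}"


# Cortex's prompt uses "DETECTIONS:", so the marker check never matches.
cortex_prompt = "DETECTIONS:\n id=nv0 vehicle bbox=..."
yolo_table = "nv0 vehicle bbox=..."  # placeholder table row

final_prompt = legacy_build_prompt(cortex_prompt, yolo_table)
print(final_prompt.count("nv0"))  # 2: the same detection is now listed twice
```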

Fix

Gate both behaviors on req.response_format is None:

  • _normalize_entities(entities, schema_mode=True) returns the parsed dict unchanged — no field invention, no iteration over animals/keys the caller's schema didn't declare.
  • The YOLO-context auto-prepend is skipped when a schema is in effect.
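
A minimal sketch of the gate, under stated assumptions: CropDescribeRequest and build_prompt are hypothetical names, the legacy branch is reduced to the vehicle brand/make fill, and only the _normalize_entities(entities, schema_mode=True) call shape comes from this PR.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CropDescribeRequest:
    """Hypothetical request model; only the fields used here are sketched."""
    scene_prompt: str
    response_format: Optional[dict] = None


def _normalize_entities(entities: Any, schema_mode: bool = False) -> Any:
    if schema_mode:
        return entities  # passthrough: the caller's schema owns the shape
    # Legacy path, reduced to the vehicle brand/make fill for illustration.
    out = dict(entities)
    if "vehicles" in out:
        fixed = []
        for vehicle in out["vehicles"]:
            norm = dict(vehicle)
            label = norm.get("brand") or norm.get("make") or norm.get("description", "")
            norm.setdefault("brand", label)
            norm.setdefault("make", label)
            fixed.append(norm)
        out["vehicles"] = fixed
    return out


def build_prompt(req: CropDescribeRequest, yolo_table: str) -> str:
    schema_mode = req.response_format is not None
    if schema_mode or "YOLO detections" in req.scene_prompt:
        return req.scene_prompt  # schema callers get their prompt verbatim
    return f"YOLO detections:\n{yolo_table}\n\n{req.scene_prompt}"


parsed = {"vehicles": [{"id": "nv0", "type": "suv", "make": ""}]}
schema_req = CropDescribeRequest(
    scene_prompt="DETECTIONS:\n id=nv0 vehicle bbox=...",
    response_format={"type": "json_schema"},
)
legacy_req = CropDescribeRequest(scene_prompt="Describe the scene.")

schema_result = _normalize_entities(parsed, schema_mode=True)
legacy_result = _normalize_entities(parsed, schema_mode=False)
schema_prompt = build_prompt(schema_req, "nv0 vehicle bbox=...")
legacy_prompt = build_prompt(legacy_req, "nv0 vehicle bbox=...")
```

With this shape, schema_result is the same object that was parsed from the model, while legacy_result still gains the brand key and legacy_prompt still carries the YOLO prefix.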

The legacy free-text default scene_prompt path keeps the string-to-dict fallback and the YOLO hint prepend; existing tests cover that and a new test pins it down.

Test plan

  • pytest tests/test_inference_router.py — 15 passed
  • Full suite: 419 passed, 7 skipped (no regressions)
  • New regression tests:
    • test_normalize_entities_schema_mode_preserves_empty_make
    • test_normalize_entities_schema_mode_drops_extra_keys_in_passthrough
    • test_crop_describe_schema_mode_skips_yolo_context_prepend
    • test_crop_describe_default_prompt_still_gets_yolo_context (legacy path preserved)
  • Manual A/B against the prod Audi frame via DashScope confirms duplicate-context fix restores attribute extraction.

🤖 Generated with Claude Code

Liuhaai and others added 2 commits May 19, 2026 15:26
When cortex calls /crop-describe with a json_schema response_format,
trio-core was silently mutating the wire response in two ways that
caused production VLM outputs to lose attributes the model had actually
identified:

1. **`_normalize_entity_item` injected schema-forbidden fields.** For
   every vehicle entity the normalizer ran
   `setdefault("brand", label)` even when the schema didn't declare a
   `brand` key, and `label` fell back to `description` (empty) when
   `make` was empty. Result on prod: model returned
   `{"id":"nv0","type":"suv","make":""}`; cortex received
   `{"id":"nv0","type":"suv","make":"","brand":""}`. The strict
   json_schema contract was bypassed post-VLM.

2. **YOLO-context auto-prepend duplicated the DETECTIONS hint.** Cortex
   already enumerates per-id detections in its scene_prompt. trio-core
   matched on the literal substring `"YOLO detections"` (which cortex's
   prompt doesn't contain) and prepended a second bbox table ahead of
   the caller's prompt. Empirical A/B on qwen3-vl-flash with the
   prod Audi frame: 3/3 trials emit `make: "Audi"` with the clean
   prompt, but only 1/3 trials with the duplicate-context prefix.

Fix: gate both behaviors on `req.response_format is None`. Schema
callers get the wire response back unchanged plus their prompt verbatim;
the legacy free-text default scene_prompt path keeps the string-to-dict
fallback and the YOLO hint prepend (verified by the existing tests, and
by a new `test_crop_describe_default_prompt_still_gets_yolo_context`).

Tests:
- `test_normalize_entities_schema_mode_preserves_empty_make` — schema
  mode must not inject `brand` or rewrite empty `make`.
- `test_normalize_entities_schema_mode_drops_extra_keys_in_passthrough`
  — schema mode does not iterate `animals` (or any other key).
- `test_crop_describe_schema_mode_skips_yolo_context_prepend` —
  scene_prompt reaches the engine byte-identical.
- `test_crop_describe_default_prompt_still_gets_yolo_context` — the
  legacy callers keep their auto-hint behavior.

419 passed, 7 skipped (no regressions).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Liuhaai Liuhaai merged commit 24315ef into main May 19, 2026
7 checks passed
@Liuhaai Liuhaai deleted the fix-inference-schema-mode branch May 19, 2026 22:31