fix(inference): stop polluting schema-mode crop-describe responses #71
Merged
When cortex called /crop-describe with a json_schema response_format,
trio-core silently mutated the wire response in two ways that
caused production VLM outputs to lose attributes the model had actually
identified:
1. **`_normalize_entity_item` injected schema-forbidden fields.** For
every vehicle entity the normalizer ran
`setdefault("brand", label)` even when the schema didn't declare a
`brand` key, and `label` fell back to `description` (empty) when
`make` was empty. Result on prod: model returned
`{"id":"nv0","type":"suv","make":""}`; cortex received
`{"id":"nv0","type":"suv","make":"","brand":""}`. The strict
json_schema contract was bypassed post-VLM.
2. **YOLO-context auto-prepend duplicated the DETECTIONS hint.** Cortex
already enumerates per-id detections in its scene_prompt. trio-core
matched on the literal substring `"YOLO detections"` (which cortex's
prompt doesn't contain) and prepended a second bbox table ahead of
the caller's prompt. Empirical A/B on qwen3-vl-flash with the
prod Audi frame: 3/3 trials emit `make: "Audi"` with the clean
prompt, but only 1/3 trials with the duplicate-context prefix.
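The first bug can be reproduced in isolation. The sketch below is a hypothetical reconstruction, not trio-core's actual code: `legacy_normalize_vehicle`, `undeclared_keys`, and `VEHICLE_SCHEMA` are assumed names, and only the `setdefault("brand", label)` call and the `make`-to-`description` fallback come from the description above.

```python
# Hypothetical reconstruction of the legacy normalizer behavior described above.
# Function and schema names are illustrative; they are not trio-core's real code.

VEHICLE_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "type": {"type": "string"},
        "make": {"type": "string"},
    },
    "additionalProperties": False,
}


def legacy_normalize_vehicle(item: dict) -> dict:
    norm = dict(item)
    # An empty "make" is falsy, so the label falls through to "description",
    # which is also empty (here: absent) for this entity.
    label = norm.get("make") or norm.get("description", "")
    # setdefault adds "brand" whenever it is absent, whether or not the
    # caller's schema declares such a key.
    norm.setdefault("brand", label)
    return norm


def undeclared_keys(item: dict, schema: dict) -> set:
    """Keys in `item` that a strict schema (additionalProperties: false) forbids."""
    if schema.get("additionalProperties", True) is not False:
        return set()
    return set(item) - set(schema.get("properties", {}))


model_output = {"id": "nv0", "type": "suv", "make": ""}
wire = legacy_normalize_vehicle(model_output)
print(wire)                                   # {'id': 'nv0', 'type': 'suv', 'make': '', 'brand': ''}
print(undeclared_keys(wire, VEHICLE_SCHEMA))  # {'brand'}
```

The model's own output satisfies the strict schema; the violation only appears after normalization, which is why the contract was bypassed post-VLM.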
Fix: gate both behaviors on `req.response_format is None`. Schema
callers get the wire response back unchanged plus their prompt verbatim;
the legacy free-text default scene_prompt path keeps the string-to-dict
fallback and the YOLO hint prepend (verified by the existing tests, and
by a new `test_crop_describe_default_prompt_still_gets_yolo_context`).
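The gate can be sketched as follows. This is a minimal illustration under stated assumptions: `CropDescribeRequest`, `normalize_entities`, and `build_prompt` are hypothetical names, the legacy branch is a simplified stand-in for the real normalizer, and the behavior of the pre-existing `"YOLO detections"` substring check (skip the prepend when the hint is already present) is inferred from the description rather than taken from the code.

```python
# Minimal sketch of gating both behaviors on `response_format is None`.
# All names are illustrative; this is not trio-core's implementation.
from dataclasses import dataclass
from typing import Optional


@dataclass
class CropDescribeRequest:
    scene_prompt: str
    response_format: Optional[dict] = None  # e.g. {"type": "json_schema", ...}


def normalize_entities(entities: dict, schema_mode: bool = False) -> dict:
    if schema_mode:
        # Schema callers get the parsed payload back untouched.
        return entities
    # Legacy stand-in: keep defaulting "brand" for vehicle entities.
    out = dict(entities)
    out["vehicles"] = [
        {**v, "brand": v.get("brand", v.get("make") or v.get("description", ""))}
        for v in entities.get("vehicles", [])
    ]
    return out


def build_prompt(req: CropDescribeRequest, yolo_hint: str) -> str:
    if req.response_format is not None:
        # Schema callers get their prompt verbatim.
        return req.scene_prompt
    if "YOLO detections" in req.scene_prompt:
        # Assumed: the existing substring check avoids a double hint.
        return req.scene_prompt
    # Legacy free-text path keeps the auto-prepended hint.
    return f"{yolo_hint}\n\n{req.scene_prompt}"


payload = {"vehicles": [{"id": "nv0", "type": "suv", "make": ""}]}
schema_req = CropDescribeRequest(
    "DETECTIONS:\n id=nv0 vehicle bbox=...", {"type": "json_schema"}
)
legacy_req = CropDescribeRequest("Describe the scene.")
hint = "YOLO detections:\n id=nv0 vehicle"

schema_mode = schema_req.response_format is not None
assert normalize_entities(payload, schema_mode=schema_mode) is payload
assert build_prompt(schema_req, hint) == schema_req.scene_prompt
assert build_prompt(legacy_req, hint).startswith(hint)
```

Deriving `schema_mode` from the request in one place keeps the two gates from drifting apart.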
Tests:
- `test_normalize_entities_schema_mode_preserves_empty_make` — schema
mode must not inject `brand` or rewrite empty `make`.
- `test_normalize_entities_schema_mode_drops_extra_keys_in_passthrough`
— schema mode does not iterate `animals` (or any other key).
- `test_crop_describe_schema_mode_skips_yolo_context_prepend` —
scene_prompt reaches the engine byte-identical.
- `test_crop_describe_default_prompt_still_gets_yolo_context` — the
legacy callers keep their auto-hint behavior.
419 passed, 7 skipped (no regressions).
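For readers unfamiliar with the byte-identical check, one way to write such a test is with a stub engine that records the prompt it receives. The sketch below is an assumption about the shape of the tests, not a copy of them: `RecordingEngine` and the simplified `crop_describe` function are hypothetical stand-ins for the real router and engine.

```python
# Sketch of a prompt-passthrough test using a recording stub engine.
# `RecordingEngine` and `crop_describe` are hypothetical stand-ins.


class RecordingEngine:
    def __init__(self) -> None:
        self.prompts = []

    def generate(self, prompt: str) -> str:
        # Record exactly what the engine was asked, then return a dummy reply.
        self.prompts.append(prompt)
        return "{}"


def crop_describe(engine, scene_prompt, response_format=None, yolo_hint=""):
    # Simplified handler: prepend the hint only on the legacy free-text path.
    if response_format is not None:
        prompt = scene_prompt
    else:
        prompt = f"{yolo_hint}\n\n{scene_prompt}"
    return engine.generate(prompt)


def test_schema_mode_prompt_is_byte_identical():
    engine = RecordingEngine()
    prompt = "DETECTIONS:\n id=nv0 vehicle bbox=..."
    crop_describe(
        engine,
        prompt,
        response_format={"type": "json_schema"},
        yolo_hint="YOLO detections: ...",
    )
    # Compare encoded bytes so whitespace or normalization changes are caught.
    assert engine.prompts[0].encode("utf-8") == prompt.encode("utf-8")


def test_default_prompt_still_gets_hint():
    engine = RecordingEngine()
    crop_describe(engine, "Describe the scene.", yolo_hint="YOLO detections: ...")
    assert engine.prompts[0].startswith("YOLO detections: ...")
    assert engine.prompts[0].endswith("Describe the scene.")
```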
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
`/api/inference/crop-describe` was silently mutating the wire response when callers passed a `response_format=json_schema`. Two bugs combined to make cortex's vehicle `make` field empty even when the VLM clearly recognized the make in its prose summary.
The bugs
1. `_normalize_entity_item` injected schema-forbidden fields
For every vehicle entity the normalizer ran `setdefault("brand", label)`, where `label` is `make` with a fallback to `description`.
When the model emitted `{"id":"nv0","type":"suv","make":""}`:
- `norm.get("make")` returns `""` (falsy) → falls through to `description` → `""`
- `setdefault("brand", "")` adds a `brand` key the schema never declared
Observed in prod (cam_1dd58540, frame obs_74a6bc82052b): the VLM summary said "A blue Audi SUV is parked near a payment kiosk…" yet the structured `vehicles[]` arrived at cortex as `{"id":"nv0","type":"suv","make":"","brand":""}`. The schema's `additionalProperties: False` contract was bypassed post-VLM.
2. Duplicate YOLO-context auto-prepend
Cortex already enumerates per-id detections in its `scene_prompt` (`DETECTIONS:\n id=nv0 vehicle bbox=...`). trio-core matched on the literal substring `"YOLO detections"` (which cortex's prompt does not contain) and prepended a second YOLO bbox table ahead of the caller's prompt.
Empirical A/B on qwen3-vl-flash with the Audi frame:
- Clean prompt: `make: "Audi"` in 3/3 trials
- Duplicate-context prefix: `make: "Audi"` in only 1/3 trials; the other 2/3 collapsed to `make: "unknown"`
Fix
Gate both behaviors on `req.response_format is None`:
- `_normalize_entities(entities, schema_mode=True)` returns the parsed dict unchanged: no field invention, and no iteration over `animals` or any other key the caller's schema didn't declare.
- The legacy free-text default `scene_prompt` path keeps the string-to-dict fallback and the YOLO hint prepend; existing tests cover that and a new test pins it down.
Test plan
- `pytest tests/test_inference_router.py`: 15 passed
- `test_normalize_entities_schema_mode_preserves_empty_make`
- `test_normalize_entities_schema_mode_drops_extra_keys_in_passthrough`
- `test_crop_describe_schema_mode_skips_yolo_context_prepend`
- `test_crop_describe_default_prompt_still_gets_yolo_context` (legacy path preserved)
🤖 Generated with Claude Code