vision: [Attached image: <desc>] marker missing on user-role msg for text-only personas #970

@joelteply

Symptom

Text-only personas (e.g. CodeReview AI without a vision model) hallucinate an "absence of images or attachments" when an image IS attached to the user's message. Observed on PR #950 (Linux/CUDA, HEAD 056978c); flagged by Anvil at 2026-04-25 04:03Z.

Root cause (two-stage)

Stage 1 — TS side, PersonaResponseGenerator.ts:380-391

let description: string | undefined;
if (m.type === 'image') {
  try {
    const visionSvc = VisionDescriptionService.getInstance();
    if (visionSvc.descriptionStatus(base64) === 'cached') {
      const desc = await visionSvc.describeBase64(base64, m.mimeType ?? 'image/png', { maxLength: 200 });
      description = desc?.description;
    }
  } catch {
    // Best-effort; drop to undefined on any cache error
  }
}

description is only populated when VisionDescriptionService (VDS) reports 'cached'. On the first message with a fresh image (cache cold, pre-warm still in flight), the status is 'inflight', not 'cached', so description stays undefined. The signal payload then carries { itemType, base64, mimeType, description: undefined } to Rust.

Stage 2 — Rust side, cognition::respond / signal → ContentPart conversion

When the resolved persona model is text-only AND signal.media[i].description is undefined, the image is silently dropped from the user-role message. The model therefore sees the message as if no image existed and CONFIDENTLY narrates "I don't see any attachment". This is a fail-silent fallback, the exact pattern memory_two_ironclad_rules calls out as illegal.

Proposed fix

Stage 1 (TS)

Replace the cached-only check with a bounded await. VDS already deduplicates in-flight requests, so awaiting an image whose pre-warm is already running joins that request instead of starting a new one, and the wait is short:

if (m.type === 'image') {
  try {
    const visionSvc = VisionDescriptionService.getInstance();
    const status = visionSvc.descriptionStatus(base64);
    if (status === 'cached' || status === 'inflight') {
      // Bounded wait — pre-warm started at chat-send, usually ready by now.
      // 8s caps worst-case for a fresh first-image scenario.
      const desc = await Promise.race([
        visionSvc.describeBase64(base64, m.mimeType ?? 'image/png', { maxLength: 200 }),
        new Promise<null>((resolve) => setTimeout(() => resolve(null), 8000)),
      ]);
      description = desc?.description;
    }
  } catch {
    // Best-effort; drop to undefined on any cache error
  }
}
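
One detail the inline Promise.race above leaves open is that the 8 s timer keeps running after the description has already arrived. A minimal sketch of a reusable helper that clears the timer on either outcome is shown below. The names withTimeout, delayed and demo are hypothetical and not part of the existing codebase, and the demo uses stand-in promises rather than VisionDescriptionService.

```typescript
// Hypothetical helper (not in the codebase): resolves with the task's value,
// or with `fallback` if the task has not settled within `ms` milliseconds.
// The timer is cleared as soon as the task settles, so nothing lingers.
function withTimeout<T, F>(task: Promise<T>, ms: number, fallback: F): Promise<T | F> {
  return new Promise<T | F>((resolve, reject) => {
    const timer = setTimeout(() => resolve(fallback), ms);
    task.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Stand-in for a description request that completes after `ms` milliseconds.
function delayed(value: string, ms: number): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

// One task beats the 100 ms cap, the other does not and falls back to null.
async function demo(): Promise<Array<string | null>> {
  return Promise.all([
    withTimeout(delayed('a red bicycle', 10), 100, null),
    withTimeout(delayed('too late', 300), 100, null),
  ]);
}
```

Under these assumptions, the stage 1 call site would pass the describeBase64 promise, 8000 and null to the helper. A rejection from the task still propagates, so the surrounding try/catch keeps its best-effort behaviour.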

Stage 2 (Rust)

When converting signal.media[i] to ContentPart for a text-only model:

  • If description is Some(d) → inject [Attached image: {d}] as a text part on the user-role message.
  • If description is None → inject [Attached image: vision description unavailable — {mime}, {len} bytes] (FAIL LOUD per memory_two_ironclad_rules; never silently drop).

This makes the persona either see the image (vision-capable), see the description (text-only with VDS), or know an image was attached (text-only without VDS) — three deterministic outcomes, zero silent drops.
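
The three outcomes can be sketched as a single conversion function. The real change belongs in Rust; the sketch below is written in TypeScript only to match the snippets above. MediaItem, ContentPart, toContentPart and base64ByteLength are hypothetical names, and reading {len} as the decoded byte count is an assumption, since the issue does not say which length is meant.

```typescript
// Hypothetical shapes for illustration; the real signal and ContentPart
// types may differ.
interface MediaItem {
  base64: string;
  mimeType: string;
  description?: string;
}

type ContentPart =
  | { kind: 'text'; text: string }
  | { kind: 'image'; base64: string; mimeType: string };

// Decoded size of a base64 string (assumes no whitespace in the input).
function base64ByteLength(b64: string): number {
  const stripped = b64.replace(/=+$/, '');
  return Math.floor((stripped.length * 3) / 4);
}

function toContentPart(item: MediaItem, modelHasVision: boolean): ContentPart {
  // Outcome 1: vision-capable model receives the image itself.
  if (modelHasVision) {
    return { kind: 'image', base64: item.base64, mimeType: item.mimeType };
  }
  // Outcome 2: text-only model with a description available.
  if (item.description !== undefined) {
    return { kind: 'text', text: `[Attached image: ${item.description}]` };
  }
  // Outcome 3: text-only model, no description. Emit the loud marker
  // instead of dropping the image. \u2014 is the dash used in the
  // proposed marker text.
  const bytes = base64ByteLength(item.base64);
  return {
    kind: 'text',
    text: `[Attached image: vision description unavailable \u2014 ${item.mimeType}, ${bytes} bytes]`,
  };
}
```

Every branch returns a part, so there is no code path on which the image disappears from the user-role message.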

Acceptance

  • Empirical: send an image to a text-only persona on the first message after a fresh start. Persona MUST acknowledge the image (either via description or via "image attached but I cannot describe it"). Persona MUST NOT say "I don't see any attachment."
  • Telemetry: a counter for vds_description_unavailable_marker_emitted so we can see how often the fallback marker fires vs the real description.
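
Both acceptance items can be checked against the text parts of the user-role message. The sketch below assumes an in-memory counter map, since this issue does not specify the real telemetry sink. The names counters, increment and recordImageMarkers are hypothetical.

```typescript
// Hypothetical in-memory counters standing in for the real telemetry sink.
const counters = new Map<string, number>();

function increment(name: string): void {
  counters.set(name, (counters.get(name) ?? 0) + 1);
}

const MARKER_PREFIX = '[Attached image:';
const UNAVAILABLE_PREFIX = '[Attached image: vision description unavailable';

// Scans the text parts of a user-role message. Returns true if any image
// marker is present, and counts each fallback marker it encounters.
function recordImageMarkers(textParts: string[]): boolean {
  let sawMarker = false;
  for (const text of textParts) {
    if (!text.startsWith(MARKER_PREFIX)) continue;
    sawMarker = true;
    if (text.startsWith(UNAVAILABLE_PREFIX)) {
      increment('vds_description_unavailable_marker_emitted');
    }
  }
  return sawMarker;
}
```

A test for the empirical item could attach an image, capture the parts sent to a text-only model, and require that recordImageMarkers returns true.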

Files

  • src/system/user/server/modules/PersonaResponseGenerator.ts:380-391 — stage 1 TS fix.
  • workers/continuum-core/src/persona/respond.rs (or wherever signal.media → ContentPart conversion lives) — stage 2 Rust fix.

Severity

Persona-correctness regression. Not introduced by #950: it is pre-existing per Anvil's triage. Filed as a follow-up so it lands cleanly post-merge.
