Gemini API field names are wrong — screenshots never reach the model #1

@rogue-socket

Summary

Three @Json annotations in GeminiLlmClient.kt serialize to field names that the Gemini API does not recognize. The most critical are inline_data and mime_type, which should be inlineData and mimeType (Gemini uses camelCase). As a result, the screenshot is silently dropped and the model makes UI decisions blind.

Wrong fields

| Line | Code has | API expects | Impact |
| --- | --- | --- | --- |
| 335 | @Json(name = "inline_data") | inlineData | Screenshot silently dropped |
| 340 | @Json(name = "mime_type") | mimeType | Part of same broken image payload |
| 347 | @Json(name = "responseJsonSchema") | responseSchema | JSON schema enforcement disabled |

What happens at runtime

  1. The serialized JSON sends "inline_data": {...} but Gemini expects "inlineData": {...}. Gemini ignores unrecognized fields, so the image payload is discarded entirely. The model decides actions based on text alone.
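  2. The schema is sent as "responseJsonSchema" rather than "responseSchema", so it is ignored in the same way. responseMimeType: "application/json" is correctly named, so the model still returns JSON, but without schema enforcement it can return any shape, causing intermittent SchemaViolation exceptions.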
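
One way to guard against this class of bug is a unit test that asserts on the serialized key names. The following is a minimal sketch, assuming the request body is available as a JSON string. The names `renamedFields` and `findSuspectKeys` and the sample payloads are hypothetical and not taken from GeminiLlmClient.kt; the key mapping is the one listed under "Wrong fields".

```kotlin
// Field names this issue reports as wrong, mapped to the names it says the API expects.
val renamedFields: Map<String, String> = mapOf(
    "inline_data" to "inlineData",
    "mime_type" to "mimeType",
    "responseJsonSchema" to "responseSchema",
)

/**
 * Returns the suspect names that appear as object keys in [json].
 * This is a plain text scan for `"key":`, not a JSON parser, so it is only
 * meant for a quick regression assertion.
 */
fun findSuspectKeys(json: String): List<String> =
    renamedFields.keys.filter { key ->
        Regex("\"" + Regex.escape(key) + "\"\\s*:").containsMatchIn(json)
    }

fun main() {
    // Hypothetical payloads: the first uses the names currently serialized,
    // the second uses the names proposed in this issue.
    val current = """{"inline_data":{"mime_type":"image/png","data":"..."}}"""
    val proposed = """{"inlineData":{"mimeType":"image/png","data":"..."}}"""

    println(findSuspectKeys(current))   // prints [inline_data, mime_type]
    println(findSuspectKeys(proposed))  // prints []
}
```

In a real test the input would be the JSON produced by the client's own serializer, and the assertion would be that `findSuspectKeys` returns an empty list.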

Additional: system prompt placement

The system prompt is sent as a text part inside a "user" role message (lines 130-132) instead of using the API's dedicated top-level systemInstruction field. The request still works, but instruction-following quality may suffer.
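
The intended request shape can be sketched as follows, assuming the body is built from plain maps rather than the client's actual data classes. `toJson` and `buildRequest` are hypothetical helpers written for this illustration; only the field names systemInstruction, contents, role, parts, and text are meant to mirror the API.

```kotlin
/** Minimal JSON renderer for nested maps, lists, and strings. Not a full JSON encoder. */
fun toJson(value: Any?): String = when (value) {
    null -> "null"
    is String -> "\"" + value
        .replace("\\", "\\\\")
        .replace("\"", "\\\"")
        .replace("\n", "\\n") + "\""
    is Map<*, *> -> value.entries.joinToString(",", "{", "}") { e ->
        toJson(e.key.toString()) + ":" + toJson(e.value)
    }
    is List<*> -> value.joinToString(",", "[", "]") { toJson(it) }
    else -> value.toString()
}

/**
 * Builds a request body with the system prompt in the top-level
 * systemInstruction field instead of a "user" message part.
 */
fun buildRequest(systemPrompt: String, userText: String): Map<String, Any> = mapOf(
    "systemInstruction" to mapOf("parts" to listOf(mapOf("text" to systemPrompt))),
    "contents" to listOf(
        mapOf("role" to "user", "parts" to listOf(mapOf("text" to userText)))
    ),
)

fun main() {
    // Hypothetical prompt strings, for illustration only.
    println(toJson(buildRequest("You are a UI agent.", "Tap the search box.")))
}
```

With this layout the "user" message carries only the per-turn input, and the system prompt no longer shares a role with it.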

Verified against

Google's API discovery document at generativelanguage.googleapis.com/$discovery/rest?version=v1beta.

Files

  • action-executor/src/main/java/com/scroller/agent/executor/GeminiLlmClient.kt (lines 320-358)
