Summary
Three @Json annotations in GeminiLlmClient.kt serialize to field names the Gemini API does not recognize. The most critical: inline_data and mime_type should be inlineData and mimeType (Gemini uses camelCase). This means the screenshot is silently dropped and the model makes UI decisions blind.
Wrong fields
| Line | Code has | API expects | Impact |
|---|---|---|---|
| 335 | @Json(name = "inline_data") | inlineData | Screenshot silently dropped |
| 340 | @Json(name = "mime_type") | mimeType | Part of same broken image payload |
| 347 | @Json(name = "responseJsonSchema") | responseSchema | JSON schema enforcement disabled |
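The renames in the table could look roughly like the sketch below. It assumes the client uses Moshi's `@Json` annotation on Kotlin data classes. The class names (`InlineData`, `Part`, `GenerationConfig`) and the `Map`-based schema type are hypothetical stand-ins, since the actual declarations in GeminiLlmClient.kt are not reproduced in this report.

```kotlin
import com.squareup.moshi.Json

// Hypothetical stand-ins for the declarations around lines 335-347;
// only the @Json names are taken from the table above.

data class InlineData(
    // was: @Json(name = "mime_type")
    @Json(name = "mimeType") val mimeType: String,
    // Base64-encoded image bytes.
    @Json(name = "data") val data: String,
)

data class Part(
    @Json(name = "text") val text: String? = null,
    // was: @Json(name = "inline_data")
    @Json(name = "inlineData") val inlineData: InlineData? = null,
)

data class GenerationConfig(
    @Json(name = "responseMimeType") val responseMimeType: String,
    // was: @Json(name = "responseJsonSchema")
    // The Map representation of the schema is an assumption for illustration.
    @Json(name = "responseSchema") val responseSchema: Map<String, Any>? = null,
)
```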
What happens at runtime
- The serialized JSON sends "inline_data": {...} but Gemini expects "inlineData": {...}. Gemini ignores unrecognized fields, so the image payload is discarded entirely. The model decides actions based on text alone.
- responseMimeType: "application/json" is correctly named, so the model still returns JSON. Without schema enforcement, however, it can return any shape, causing intermittent SchemaViolation exceptions.
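Because the API discards unknown fields without an error, a mismatch like this is easiest to catch by inspecting the serialized request locally. The following is a minimal sketch of such a check. It assumes Moshi with the reflective `KotlinJsonAdapterFactory` is on the classpath, and `ImageData` and `ImagePart` are hypothetical classes, not the ones in GeminiLlmClient.kt.

```kotlin
import com.squareup.moshi.Json
import com.squareup.moshi.Moshi
import com.squareup.moshi.kotlin.reflect.KotlinJsonAdapterFactory

// Hypothetical minimal image payload using the camelCase names.
data class ImageData(
    @Json(name = "mimeType") val mimeType: String,
    @Json(name = "data") val data: String,
)

data class ImagePart(
    @Json(name = "inlineData") val inlineData: ImageData,
)

fun main() {
    val moshi = Moshi.Builder().add(KotlinJsonAdapterFactory()).build()
    val json = moshi.adapter(ImagePart::class.java)
        .toJson(ImagePart(ImageData("image/png", "iVBORw0KGgo=")))

    // Fail loudly if the snake_case names ever reappear in the payload.
    check("\"inlineData\"" in json && "\"mimeType\"" in json) {
        "expected camelCase field names, got: $json"
    }
    check("inline_data" !in json && "mime_type" !in json) {
        "snake_case field names leaked into payload: $json"
    }
    println(json)
}
```

The same assertions could be moved into a unit test so that a future rename of the annotations fails the build instead of silently blinding the model.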
Additional: system prompt placement
The system prompt is sent as a text part inside a "user" role message (lines 130-132) instead of using the API's dedicated top-level systemInstruction field. This works but degrades instruction-following quality.
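One possible shape for the request once the prompt moves to the top-level field is sketched below. The class names and the `buildRequest` helper are hypothetical; only the `systemInstruction`, `contents`, `role`, `parts`, and `text` field names are taken from the API.

```kotlin
import com.squareup.moshi.Json

// Hypothetical request model; not the classes in GeminiLlmClient.kt.
data class TextPart(
    @Json(name = "text") val text: String,
)

data class Content(
    @Json(name = "parts") val parts: List<TextPart>,
    // Left unset for the system instruction.
    @Json(name = "role") val role: String? = null,
)

data class GenerateContentRequest(
    @Json(name = "systemInstruction") val systemInstruction: Content? = null,
    @Json(name = "contents") val contents: List<Content>,
)

// The system prompt goes in its own top-level field instead of being
// prepended as a text part of the first "user" message.
fun buildRequest(systemPrompt: String, userText: String): GenerateContentRequest =
    GenerateContentRequest(
        systemInstruction = Content(parts = listOf(TextPart(systemPrompt))),
        contents = listOf(
            Content(parts = listOf(TextPart(userText)), role = "user"),
        ),
    )
```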
Verified against
Google's API discovery document at generativelanguage.googleapis.com/$discovery/rest?version=v1beta.
Files
action-executor/src/main/java/com/scroller/agent/executor/GeminiLlmClient.kt (lines 320-358)