Support Gemini Image models in RubyLLM.paint #750
The Gemini provider's image generation was hardcoded to the Imagen :predict endpoint, leaving the Gemini Image family (Nano Banana et al.) unreachable: RubyLLM.paint with gemini-2.5-flash-image, gemini-3.1-flash-image-preview, gemini-3-pro-image-preview, or nano-banana-pro-preview raised "is not supported for predict" even though those models are listed in the registry with image output.

Branch internally on imagen?(model). Imagen keeps its existing :predict/instances payload and predictions[].bytesBase64Encoded parsing unchanged. Everything else routes through :generateContent with contents/parts and parses candidates[].content.parts[].inlineData, matching the protocol Gemini chat already speaks. The fallthrough also covers nano-banana-pro-preview, which doesn't share the gemini- prefix.

Image-to-image editing via with: is supported on the Gemini Image branch by reusing Gemini::Media#format_attachment to build inline_data parts. validate_paint_inputs! is overridden as a no-op so the base class's blanket attachment rejection doesn't fire; the model-aware checks (mask: rejection, Imagen-with-with: rejection) live in render_image_payload after @model is assigned.

size: is translated through SIZE_TO_ASPECT_RATIO for the common DALL-E sizes; unknown sizes default to 1:1 with a debug log. Users override via params:, which deep-merges into the payload so nested generationConfig blocks aren't clobbered.

Tests cover both the original Nano Banana (gemini-2.5-flash-image, paint + edit) and Nano Banana 2 (gemini-3.1-flash-image-preview, the exact model from the bug report). Imagen, OpenAI, and OpenRouter image tests pass unchanged.
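The branching and deep-merge behaviour described in this commit message can be sketched roughly as follows. This is a minimal standalone illustration, not the code from the PR: the `GeminiImageRouting` module, the name-prefix check inside `imagen?`, the `sampleCount` parameter, and the exact `generationConfig` contents are all assumptions made for the example.

```ruby
# Hypothetical sketch of the routing described above; names and payload
# details are assumptions, not the provider's actual implementation.
module GeminiImageRouting
  module_function

  # Assumption: Imagen models can be recognised by their name prefix.
  def imagen?(model)
    model.to_s.start_with?('imagen')
  end

  # Imagen keeps :predict; every other image model falls through to
  # :generateContent (this also covers names without a gemini- prefix).
  def images_url(model)
    verb = imagen?(model) ? 'predict' : 'generateContent'
    "models/#{model}:#{verb}"
  end

  # Recursive merge: nested hashes are combined instead of replaced.
  def deep_merge(base, overrides)
    base.merge(overrides) do |_key, old, new|
      old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
    end
  end

  # Builds the request body for the selected branch, then applies the
  # caller's params on top.
  def render_payload(prompt, model:, aspect_ratio: '1:1', params: {})
    payload =
      if imagen?(model)
        { instances: [{ prompt: prompt }], parameters: { sampleCount: 1 } }
      else
        {
          contents: [{ role: 'user', parts: [{ text: prompt }] }],
          generationConfig: {
            responseModalities: ['IMAGE'],
            imageConfig: { aspectRatio: aspect_ratio }
          }
        }
      end
    deep_merge(payload, params)
  end
end
```

With this sketch, calling `GeminiImageRouting.render_payload('a red fox', model: 'gemini-2.5-flash-image', params: { generationConfig: { imageConfig: { imageSize: '2K' } } })` keeps `responseModalities` and `aspectRatio` in place and only adds the illustrative `imageSize` key, which is the point of deep-merging rather than using a plain `Hash#merge`.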
Codecov Report

❌ Patch coverage is

Additional details and impacted files

Coverage Diff

| | main | #750 | +/- |
|---|---|---|---|
| Coverage | 82.10% | 82.18% | +0.07% |
| Files | 137 | 137 | |
| Lines | 6344 | 6381 | +37 |
| Branches | 1122 | 1133 | +11 |
| Hits | 5209 | 5244 | +35 |
| Misses | 684 | 684 | |
| Partials | 451 | 453 | +2 |
I'll implement the missing test cases.
Codecov reported four uncovered lines in the Gemini Images branching: the Imagen `with:` rejection, the Imagen response-shape guard, the unknown-attachment-type rejection on the Gemini Image branch, and the unmapped-size default. None of these paths are reachable from the existing VCR-backed integration specs (Imagen rejects `with:` before hitting the wire; Gemini Image cassettes only exercise PNG inputs and the supported `1024x1024` size).

Add a focused unit spec at spec/ruby_llm/providers/gemini/images_spec.rb that extends a bare object with Gemini::Media + Gemini::Images (same pattern used in chat_spec.rb) and exercises each branch directly with stubbed attachments and Faraday::Response doubles. No new cassettes needed.

Brings lib/ruby_llm/providers/gemini/images.rb to 100% line coverage and lifts branch coverage from 66.67% to 79.17%.
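To illustrate why these branches can be covered without cassettes, here is a rough standalone sketch of response parsing with a shape guard, driven by a plain stand-in for the response object. Everything in it is hypothetical: the `ImageResponseSketch` module, its `Error` class, the `FakeResponse` struct (standing in for a Faraday::Response double), and the error messages are assumptions for the example, not the spec or provider code from this PR.

```ruby
# Hypothetical sketch: parsing both response shapes from an object that
# only needs to respond to #body, so no network or cassette is involved.
module ImageResponseSketch
  class Error < StandardError; end

  # Stand-in for a Faraday::Response double; only #body is used.
  FakeResponse = Struct.new(:body)

  module_function

  # Imagen branch: predictions[].bytesBase64Encoded, with a guard that
  # raises when the expected field is missing.
  def parse_imagen(response)
    prediction = response.body.fetch('predictions', []).first
    data = prediction && prediction['bytesBase64Encoded']
    raise Error, 'Unexpected response format from Imagen' unless data

    { mime_type: prediction['mimeType'] || 'image/png', data: data }
  end

  # Gemini Image branch: first inlineData part under
  # candidates[].content.parts[], plus usageMetadata passed through.
  def parse_gemini_image(response)
    parts = response.body.dig('candidates', 0, 'content', 'parts') || []
    inline = parts.filter_map { |part| part['inlineData'] }.first
    raise Error, 'No inlineData part in response' unless inline

    {
      mime_type: inline['mimeType'],
      data: inline['data'],
      usage: response.body['usageMetadata']
    }
  end
end
```

A unit spec in this style would build a `FakeResponse` with a deliberately malformed body (for example `{ 'predictions' => [{}] }`) and assert that the guard raises, which is a path a recorded happy-path cassette never reaches.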
FWIW: When using

    # https://ai.google.dev/gemini-api/docs/image-generation#go_3
    RubyLLM::Providers::Gemini::Images.send(:remove_const, :SIZE_TO_ASPECT_RATIO)
    RubyLLM::Providers::Gemini::Images.const_set(:SIZE_TO_ASPECT_RATIO, {
      "1:1" => "1:1",
      "1:4" => "1:4",
      "1:8" => "1:8",
      "2:3" => "2:3",
      "3:2" => "3:2",
      "3:4" => "3:4",
      "4:1" => "4:1",
      "4:3" => "4:3",
      "4:5" => "4:5",
      "5:4" => "5:4",
      "8:1" => "8:1",
      "9:16" => "9:16",
      "16:9" => "16:9",
      "21:9" => "21:9"
    }.freeze)
The previous SIZE_TO_ASPECT_RATIO map only handled five DALL-E-style WxH strings, forcing users who want aspect ratios outside that set (2:3, 4:5, 21:9, etc.) to monkey-patch the constant.

aspect_ratio_for now passes any X:Y string straight through to Gemini's aspectRatio field. The WxH map stays as a convenience for the default '1024x1024' that Image.paint sends, and unknown shapes still default to '1:1' with a debug log.

Avoids hardcoding the full list of supported ratios, so Gemini's documented set can grow without code changes here.

Reference: https://ai.google.dev/gemini-api/docs/image-generation
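A minimal sketch of the pass-through behaviour described above, under stated assumptions: the `AspectRatioSketch` module, the `RATIO_PATTERN` regex, and the injected `logger:` argument are hypothetical, and the ratio assigned to each non-square WxH key is a guess for illustration (the thread lists the five sizes but not what they map to).

```ruby
require 'logger'

# Hypothetical sketch of an X:Y pass-through with a WxH convenience map.
module AspectRatioSketch
  # The five WxH strings come from the PR description; the ratios on the
  # right-hand side (other than 1:1) are assumptions.
  SIZE_TO_ASPECT_RATIO = {
    '1024x1024' => '1:1',
    '1792x1024' => '16:9',
    '1024x1792' => '9:16',
    '1408x1024' => '4:3',
    '1024x1408' => '3:4'
  }.freeze

  # Assumption: anything shaped like "digits:digits" counts as a ratio.
  RATIO_PATTERN = /\A\d+:\d+\z/

  module_function

  def aspect_ratio_for(size, logger: Logger.new(nil))
    size = size.to_s
    # X:Y strings go straight through, so newly supported ratios need no
    # code change here.
    return size if size.match?(RATIO_PATTERN)

    # WxH strings use the map; anything else defaults to 1:1 with a debug log.
    SIZE_TO_ASPECT_RATIO.fetch(size) do
      logger.debug("Unmapped size #{size.inspect}; defaulting aspectRatio to 1:1")
      '1:1'
    end
  end
end
```

The sketch does not validate that a passed-through ratio is one Gemini accepts; an unsupported value such as `'7:3'` would be forwarded and left for the API to reject, which is the trade-off of not hardcoding the list.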
Thanks @nathancolgate, great catch, and fair feedback that the WxH-only map was too restrictive. Pushed 084bf50. This avoids hardcoding the supported set, so any future ratio Google adds works without a code change.
…ha-0db974 # Conflicts: # spec/ruby_llm/protocols/gemini/images_spec.rb
Closes #473.
What this does
Fixes `RubyLLM.paint` for the Gemini Image model family (`gemini-2.5-flash-image`, `gemini-3.1-flash-image-preview`, `gemini-3-pro-image-preview`, `nano-banana-pro-preview`), which was hardcoded to the Imagen `:predict` protocol and unreachable.

The Gemini provider now branches on `imagen?(model)`:

- Imagen: `:predict` with `instances`/`parameters` (byte-for-byte unchanged).
- Everything else: `:generateContent` with `contents`/`parts` and parses `candidates[].content.parts[].inlineData`, the same protocol Gemini chat uses.

Improvements over the previous Gemini image generation

- Image-to-image editing (`with:`): new capability. Before this PR the Gemini provider had no support for image references (`with:` was an unused method arg, rejected by the base `validate_paint_inputs!`). The Gemini Image branch now accepts one or more local files / URLs / `Attachment` instances via `with:`, reusing `Gemini::Media#format_attachment` to build `inline_data` parts. Imagen still rejects `with:`.
- `size:` is meaningful again on the Gemini Image branch. A small map translates the common DALL-E sizes (`1024x1024`, `1792x1024`, `1024x1792`, `1408x1024`, `1024x1408`) to Gemini `aspectRatio`. Unknown sizes default to `1:1` with a debug log. Imagen continues to ignore `size:`.
- `params:` deep-merges into the payload so users can override any nested `generationConfig`/`imageConfig` field without clobbering the rest.
- `usageMetadata` from Gemini Image responses is passed through to `Image#usage`.

The public signature `RubyLLM.paint(prompt, model:, with:, size:, params:)` is unchanged.

Reproduction (before this PR):
Type of change
Scope check
Quality check
- … `overcommit --install` and all hooks pass
- … `bundle exec rake vcr:record[gemini]`
- … `bundle exec rspec`
- … (`models.json`, `aliases.json`)

Tests added

Integration (VCR-backed, via `IMAGE_GENERATION_MODELS`):

- `gemini-2.5-flash-image` paint + image edit with `with:`
- `gemini-3.1-flash-image-preview` paint (the exact model from the bug report)

Unit-level (`spec/ruby_llm/providers/gemini/images_spec.rb`, no network):

- Imagen with `with:` raises `UnsupportedAttachmentError`
- Imagen response without `bytesBase64Encoded` raises `RubyLLM::Error`
- `:unknown` attachment type raises `UnsupportedAttachmentError`
- Unmapped `size:` defaults `aspectRatio` to `1:1`

Coverage: `lib/ruby_llm/providers/gemini/images.rb` at 100% line / 79% branch.

AI-generated code
API changes
Out of scope
- … `images.rb` at all and needs OAuth; separate PR.
- … (`:streamGenerateContent`).
- … `pricing.images`; `Image#total_cost` falls back to `output_price_per_million`).