feat(cloudflare): add vision support (Gemma 4, Llama 4 Scout, Llama 3.2 Vision)#43

Merged
stackbilt-admin merged 1 commit into main from feat/cf-vision-support on Apr 17, 2026

@stackbilt-admin
Member

Summary

  • CloudflareProvider previously had zero vision/image code — request.images sent via the CF path was silently dropped. This wires image input through to CF's OpenAI-compatible image_url content-part shape (base64 data URIs).
  • Adds three vision-capable models: @cf/google/gemma-4-26b-a4b-it (256K ctx + tools), @cf/meta/llama-4-scout-17b-16e-instruct (multimodal + tools), @cf/meta/llama-3.2-11b-vision-instruct.
  • Factory's analyzeImage() now dispatches to CF when configured; getDefaultVisionModel() falls back to Gemma 4 for CF-only deployments. A request that targets a non-vision model or passes an HTTP image URL throws a ConfigurationError whose message names the vision-capable alternatives.
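
For reviewers who want to picture the payload, here is a minimal sketch of the image_url content-part shape described above. The helper names match the ones listed under Changes, but the bodies, the ImageInput and ChatMessage types, and the choice to append images after the existing text are assumptions for illustration, not the code in src/providers/cloudflare.ts.

```typescript
// Hypothetical types: the library's real request and message types are not shown in this PR.
interface ImageInput {
  data: string;     // raw base64, or an already complete data: URI
  mimeType: string; // e.g. 'image/jpeg'
}

type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } };

interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string | ContentPart[];
}

// Pass data: URIs through unchanged; wrap raw base64 in a data: URI.
function buildImageDataUrl(image: ImageInput): string {
  if (/^data:/i.test(image.data)) return image.data;
  return `data:${image.mimeType};base64,${image.data}`;
}

// Return a copy of `messages` in which the last user message carries the images
// as image_url content parts (assumed ordering: existing text first, then images).
function attachImagesToLastUserMessage(
  messages: ChatMessage[],
  images: ImageInput[],
): ChatMessage[] {
  if (images.length === 0) return messages;

  let idx = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === 'user') {
      idx = i;
      break;
    }
  }
  if (idx === -1) return messages; // assumption: with no user message, nothing is attached

  const target = messages[idx];
  const existing: ContentPart[] =
    typeof target.content === 'string'
      ? [{ type: 'text', text: target.content }]
      : target.content;
  const imageParts = images.map(
    (image): ContentPart => ({
      type: 'image_url',
      image_url: { url: buildImageDataUrl(image) },
    }),
  );

  const out = messages.slice();
  out[idx] = { role: target.role, content: existing.concat(imageParts) };
  return out;
}
```

Returning a copy rather than mutating the input keeps the caller's request object reusable across providers; whether the real helper does the same is not stated in the PR.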

Bumps the version to v1.3.0: the change is additive, so it is a minor bump under strict semver.

Motivation

Reported by the foodfiles team — analyzeImage() routed to CF would silently drop the image. Options evaluated: call CF AI directly (strands the abstraction), fall back to Anthropic vision (works but costs per image), or properly add CF vision to the library. This PR is the proper fix.

Changes

  • src/providers/cloudflare.ts (+104/-3): supportsVision = true, three new model entries in getModelCapabilities(), attachImagesToLastUserMessage + buildImageDataUrl helpers, validation for non-vision model + HTTP URL.
  • src/factory.ts (+1): CF fallback in getDefaultVisionModel.
  • src/__tests__/cloudflare.test.ts (+93): 6 new tests — payload shape, multi-image, data-URL passthrough, HTTP-URL rejection, non-vision rejection, capability flags.
  • CHANGELOG.md, package.json: v1.3.0 entry.
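
The two rejection paths listed above (non-vision model, HTTP URL) can be pictured with the sketch below. The function name assertVisionRequest, the `url` field on the image input, the error wording, and the local ConfigurationError class are all hypothetical stand-ins; only the three model IDs come from this PR.

```typescript
// Stand-in for the library's ConfigurationError; its real constructor is not shown in the PR.
class ConfigurationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'ConfigurationError';
  }
}

// Model IDs taken from the PR summary.
const CF_VISION_MODELS: readonly string[] = [
  '@cf/google/gemma-4-26b-a4b-it',
  '@cf/meta/llama-4-scout-17b-16e-instruct',
  '@cf/meta/llama-3.2-11b-vision-instruct',
];

// Hypothetical input shape: `url` is an assumed field for remote images.
interface VisionImageInput {
  data?: string;
  url?: string;
  mimeType?: string;
}

// Throw before any request is sent if the model or an image source is unsupported.
function assertVisionRequest(model: string, images: VisionImageInput[]): void {
  if (images.length === 0) return;

  const alternatives = CF_VISION_MODELS.join(', ');
  if (CF_VISION_MODELS.indexOf(model) === -1) {
    throw new ConfigurationError(
      `Model ${model} does not accept image input. Vision-capable models: ${alternatives}`,
    );
  }

  for (const image of images) {
    const source = image.url ?? image.data ?? '';
    if (/^https?:\/\//i.test(source)) {
      throw new ConfigurationError(
        `HTTP image URLs are not supported here; pass base64 data or a data: URI. ` +
          `Vision-capable models: ${alternatives}`,
      );
    }
  }
}
```

Failing loudly at this point is what replaces the earlier silent drop: the caller learns immediately that the image would not have reached the model.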

Test plan

  • npm run typecheck — clean
  • npm test — 219/219 passing (14 cloudflare tests, 6 new)
  • Foodfiles integration: verify recipe extraction works end-to-end against @cf/google/gemma-4-26b-a4b-it once published

Consumer usage

factory.analyzeImage({
  image: { data: base64, mimeType: 'image/jpeg' },
  prompt: RECIPE_EXTRACTION_PROMPT,
  model: '@cf/google/gemma-4-26b-a4b-it',
});
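
When `model` is omitted, the summary says getDefaultVisionModel() falls back to Gemma 4 for CF-only deployments. A rough sketch of that selection follows; the ProviderConfig shape, the rule that another configured vision provider takes priority over CF, and the standalone function signature are assumptions, since the PR only states the CF-only fallback.

```typescript
// Hypothetical configuration shape; the factory's real config is not shown in this PR.
interface ProviderConfig {
  anthropic?: { defaultVisionModel: string };
  cloudflare?: { accountId: string };
}

// Default named in the PR for CF-only deployments.
const CF_DEFAULT_VISION_MODEL = '@cf/google/gemma-4-26b-a4b-it';

function getDefaultVisionModel(config: ProviderConfig): string | undefined {
  // Assumption: a previously supported vision provider keeps priority when configured.
  if (config.anthropic) return config.anthropic.defaultVisionModel;
  // Fallback described in the PR: Cloudflare is the only configured provider.
  if (config.cloudflare) return CF_DEFAULT_VISION_MODEL;
  return undefined;
}
```

Under these assumptions a CF-only consumer can drop the `model` field from the call above and still reach a vision-capable model.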

🤖 Generated with Claude Code

…ma 3.2 Vision

CloudflareProvider previously advertised no vision capability; any
request.images sent via the CF path was silently dropped. This wires
request.images through to CF's OpenAI-compatible image_url content-part
shape (base64 data URIs) and ships three vision-capable model entries.

Factory's analyzeImage() now dispatches to CF when configured, and
getDefaultVisionModel() falls back to @cf/google/gemma-4-26b-a4b-it for
CF-only deployments. Non-vision models and HTTP image URLs throw
ConfigurationError with the vision-capable alternatives named.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@stackbilt-admin stackbilt-admin merged commit bf713cd into main Apr 17, 2026
3 checks passed
@stackbilt-admin stackbilt-admin deleted the feat/cf-vision-support branch April 17, 2026 00:04