feat(cloudflare): add vision support (Gemma 4, Llama 4 Scout, Llama 3.2 Vision) by stackbilt-admin · Pull Request #43 · Stackbilt-dev/llm-providers

stackbilt-admin · 2026-04-16T23:57:12Z

Summary

CloudflareProvider previously had no vision/image code, so request.images sent via the Cloudflare (CF) path was silently dropped. This wires image input through to CF's OpenAI-compatible image_url content-part shape (base64 data URIs).
Adds three vision-capable models: @cf/google/gemma-4-26b-a4b-it (256K ctx + tools), @cf/meta/llama-4-scout-17b-16e-instruct (multimodal + tools), @cf/meta/llama-3.2-11b-vision-instruct.
Factory's analyzeImage() now dispatches to CF when configured; getDefaultVisionModel() falls back to Gemma 4 for CF-only deployments. Non-vision models and HTTP image URLs throw ConfigurationError with the vision-capable alternatives named.
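
For readers unfamiliar with the image_url content-part shape, the wiring described above might look roughly like the sketch below. Only the helper names attachImagesToLastUserMessage and buildImageDataUrl come from this PR; the ImageInput, ContentPart and ChatMessage types, the local ConfigurationError stand-in, the default MIME type, and the behaviour when no user message exists are all assumptions made for illustration, not the library's actual code.

```typescript
// Hypothetical shapes for illustration; the library's real types may differ.
type ImageInput = { data?: string; url?: string; mimeType?: string };
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } };
type ChatMessage = {
  role: 'system' | 'user' | 'assistant';
  content: string | ContentPart[];
};

// Local stand-in for the library's ConfigurationError.
class ConfigurationError extends Error {}

// Turn an image into a base64 data URI. Existing data URLs pass through
// unchanged; plain HTTP(S) URLs are rejected.
function buildImageDataUrl(image: ImageInput): string {
  const source = image.url ?? image.data;
  if (!source) {
    throw new ConfigurationError('Image needs base64 data or a data URL');
  }
  if (source.startsWith('data:')) return source;
  if (/^https?:\/\//i.test(source)) {
    throw new ConfigurationError('HTTP image URLs are not supported; pass base64 data instead');
  }
  // Assumption: default to image/jpeg when no MIME type is given.
  return `data:${image.mimeType ?? 'image/jpeg'};base64,${source}`;
}

// Return a copy of `messages` in which the last user message carries its
// original text plus one image_url part per image.
function attachImagesToLastUserMessage(
  messages: ChatMessage[],
  images: ImageInput[],
): ChatMessage[] {
  if (images.length === 0) return messages;
  let last = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i]?.role === 'user') {
      last = i;
      break;
    }
  }
  const target = messages[last];
  if (!target) {
    // Assumption: the real helper may handle this case differently.
    throw new ConfigurationError('No user message to attach images to');
  }
  const existing: ContentPart[] =
    typeof target.content === 'string'
      ? [{ type: 'text', text: target.content }]
      : target.content;
  const imageParts = images.map((image): ContentPart => ({
    type: 'image_url',
    image_url: { url: buildImageDataUrl(image) },
  }));
  const out = messages.slice();
  out[last] = { role: target.role, content: [...existing, ...imageParts] };
  return out;
}
```

The sketch copies the message array instead of mutating it, so the caller's request object is left untouched; whether the real helper does the same is not stated in this PR.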

Bumps to v1.3.0 (additive, strict semver).

Motivation

Reported by the foodfiles team — analyzeImage() routed to CF would silently drop the image. Options evaluated: call CF AI directly (strands the abstraction), fall back to Anthropic vision (works but costs per image), or properly add CF vision to the library. This PR is the proper fix.

Changes

src/providers/cloudflare.ts (+104/-3): supportsVision = true, three new model entries in getModelCapabilities(), attachImagesToLastUserMessage + buildImageDataUrl helpers, validation for non-vision model + HTTP URL.
src/factory.ts (+1): CF fallback in getDefaultVisionModel.
src/__tests__/cloudflare.test.ts (+93): 6 new tests — payload shape, multi-image, data-URL passthrough, HTTP-URL rejection, non-vision rejection, capability flags.
CHANGELOG.md, package.json: v1.3.0 entry.
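
As a rough illustration of the validation and fallback behaviour listed above, the following sketch rejects non-vision models with an error that names the vision-capable alternatives, and falls back to Gemma 4 when only CF is configured. The three model IDs are taken from this PR; CF_VISION_MODELS, assertVisionModel, pickDefaultVisionModel, the error wording, and the local ConfigurationError class are hypothetical names, not the library's API.

```typescript
// Model IDs from this PR; the constant name itself is hypothetical.
const CF_VISION_MODELS = [
  '@cf/google/gemma-4-26b-a4b-it',
  '@cf/meta/llama-4-scout-17b-16e-instruct',
  '@cf/meta/llama-3.2-11b-vision-instruct',
] as const;

// Local stand-in for the library's ConfigurationError.
class ConfigurationError extends Error {}

// Throw if `model` is not one of the vision-capable entries, naming the
// alternatives in the message so the caller can switch models.
function assertVisionModel(model: string): void {
  const known: readonly string[] = CF_VISION_MODELS;
  if (known.indexOf(model) !== -1) return;
  throw new ConfigurationError(
    `Model "${model}" does not support image input. ` +
      `Vision-capable alternatives: ${CF_VISION_MODELS.join(', ')}`,
  );
}

// Assumption: defaults from other providers win when present; a CF-only
// deployment falls back to Gemma 4.
function pickDefaultVisionModel(
  otherProviderDefault: string | undefined,
  cloudflareConfigured: boolean,
): string | undefined {
  if (otherProviderDefault !== undefined) return otherProviderDefault;
  return cloudflareConfigured ? CF_VISION_MODELS[0] : undefined;
}
```

Failing loudly here is the point of the change: the earlier behaviour dropped the image without any signal to the caller.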

Test plan

npm run typecheck — clean
npm test — 219/219 passing (14 cloudflare tests, 6 new)
Foodfiles integration: verify recipe extraction works end-to-end against @cf/google/gemma-4-26b-a4b-it once published

Consumer usage

factory.analyzeImage({
  image: { data: base64, mimeType: 'image/jpeg' },
  prompt: RECIPE_EXTRACTION_PROMPT,
  model: '@cf/google/gemma-4-26b-a4b-it',
});

🤖 Generated with Claude Code

…ma 3.2 Vision

CloudflareProvider previously advertised no vision capability; any request.images sent via the CF path was silently dropped. This wires request.images through to CF's OpenAI-compatible image_url content-part shape (base64 data URIs) and ships three vision-capable model entries.

Factory's analyzeImage() now dispatches to CF when configured, and getDefaultVisionModel() falls back to @cf/google/gemma-4-26b-a4b-it for CF-only deployments. Non-vision models and HTTP image URLs throw ConfigurationError with the vision-capable alternatives named.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

stackbilt-admin merged commit bf713cd into main Apr 17, 2026
3 checks passed

stackbilt-admin deleted the feat/cf-vision-support branch April 17, 2026 00:04

stackbilt-admin mentioned this pull request Apr 17, 2026

fix: retire claude-3-haiku-20240307 and gpt-4o (closes #44) #45

Merged

4 tasks

stackbilt-admin merged 1 commit into main from feat/cf-vision-support
