Pi: emit "input": ["text", "image"] for vision-capable models by davanstrien · Pull Request #2128 · huggingface/huggingface.js

davanstrien · 2026-04-27T10:00:51Z

Summary

For models with pipeline_tag === "image-text-to-text", the Pi "Use this model" snippet now writes "input": ["text", "image"] into the generated ~/.pi/agent/models.json. Without this field, Pi can't pass images to the model, so users previously had to discover the field in the docs to get vision working.
Behavior for text-only models is unchanged (existing test extended; new pi - vision test added).
Cross-ref: huggingface/hub-docs#2408 ("Add vision support note to Pi + llama.cpp guide"), the documentation PR that explains the same field. @gary149's review there suggested surfacing it directly in the snippet.
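The behavior described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in local-apps.ts: the `ModelInfo` interface, the `buildPiModelEntry` function name, and the example `modelName` value are assumptions made for the example. Only the `id` expression, the `pipeline_tag` check, and the `input` value come from this PR.

```typescript
// Hypothetical, simplified view of the model data the snippet generator receives.
interface ModelInfo {
	id: string;
	pipeline_tag?: string;
}

// Builds a single model entry for the generated models.json.
// The function name and signature are illustrative assumptions.
function buildPiModelEntry(model: ModelInfo, modelName: string, isMLX: boolean): Record<string, unknown> {
	const modelEntry: Record<string, unknown> = { id: isMLX ? model.id : modelName };
	// Vision-capable models additionally declare that they accept image input.
	if (model.pipeline_tag === "image-text-to-text") {
		modelEntry.input = ["text", "image"];
	}
	return modelEntry;
}

const visionEntry = buildPiModelEntry(
	{ id: "unsloth/Qwen3.6-35B-A3B-GGUF", pipeline_tag: "image-text-to-text" },
	"Qwen3.6-35B-A3B-GGUF", // hypothetical display name
	false
);
const textEntry = buildPiModelEntry(
	{ id: "example-org/example-text-model", pipeline_tag: "text-generation" }, // hypothetical model
	"example-text-model",
	false
);

console.log(JSON.stringify(visionEntry, null, 2));
console.log(JSON.stringify(textEntry, null, 2));
```

Under these assumptions, the text-only entry contains just `id`, which matches the claim that existing output is unchanged.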

Test plan

pnpm test in packages/tasks (17 tests pass, including new pi - vision)
pnpm check, pnpm lint:check, pnpm format:check all clean
Verified generated JSON output via local script: text models produce identical output to before; vision models gain the input field correctly nested

🤖 Generated with Claude Code

Note

Low Risk
Low risk: only adjusts Pi snippet JSON generation for models flagged as vision-capable and adds a focused unit test; no auth, persistence, or runtime execution paths change beyond emitted config text.

Overview
Updates the Pi “Use this model” snippet to include "input": ["text", "image"] in the generated models.json entry when the selected model has pipeline_tag === "image-text-to-text", enabling image inputs for vision-capable models.

Adds a new pi - vision test to assert the input field is emitted (and preserves existing behavior for text-only and MLX-backed Pi snippets).

Reviewed by Cursor Bugbot for commit bf73c01. Bugbot is set up for automated code reviews on this repo.

@gary149

When the model has pipeline_tag === "image-text-to-text", the Pi "Use this model" snippet now writes the required input field into the generated ~/.pi/agent/models.json, so users get a working config without having to read the docs to discover the flag. Pre-existing behavior is unchanged for text-only models.

Cross-ref: huggingface/hub-docs#2408 (and @gary149's review suggestion there to surface this in the snippet directly).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Per Cursor Bugbot review: a regression dropping "text" from ["text", "image"] would have gone undetected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
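The gap this commit closes can be illustrated with a small check that looks for both modalities instead of only "image". This is a sketch under stated assumptions: `hasFullVisionInput` and the sample entries are hypothetical, and the real test in local-apps.spec.ts asserts on the generated snippet text instead.

```typescript
// Returns true only when the entry's "input" array lists both "text" and "image".
// Hypothetical helper for illustration; it is not part of the repository.
function hasFullVisionInput(json: string): boolean {
	const parsed = JSON.parse(json) as { input?: unknown };
	return Array.isArray(parsed.input) && parsed.input.includes("text") && parsed.input.includes("image");
}

// Hypothetical generated entries: one correct, one with "text" dropped by a regression.
const correctEntry = JSON.stringify({ id: "example-vision-model", input: ["text", "image"] });
const regressedEntry = JSON.stringify({ id: "example-vision-model", input: ["image"] });

console.log(hasFullVisionInput(correctEntry)); // true
console.log(hasFullVisionInput(regressedEntry)); // false
```

A check that only looked for "image" would accept both entries, which is the undetected regression the review pointed out.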

julien-c

lgtm

cursor · 2026-05-01T13:00:40Z

 			};

 	// Step 2: Pi config — port and provider name differ
+	const modelEntry: Record<string, unknown> = { id: isMLX ? model.id : modelName };


Missing --jinja flag for vision models in server command

High Severity

The test at line 152 asserts that the llama-server command for vision models includes --jinja, but the implementation on line 481 of local-apps.ts never conditionally appends this flag. The isVision variable is computed but only used for the modelEntry.input field — it's never used to modify the server command. This means either the test will fail, or vision models won't get the --jinja flag they need for proper Jinja template handling with llama.cpp.

Additional Locations (1)

packages/tasks/src/local-apps.spec.ts#L151-L152

Reviewed by Cursor Bugbot for commit 671b44f.

@cursoragent IIUC jinja is default in llamacpp now?

Unable to authenticate your request. Please make sure to connect your GitHub account to Cursor. Go to Cursor

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit bf73c01.

cursor · 2026-05-06T12:18:59Z

+		};
+		const snippet = snippetFunc(model);
+
+		expect(snippet[0].content).toContain(`llama-server -hf unsloth/Qwen3.6-35B-A3B-GGUF:{{QUANT_TAG}} --jinja`);


Test expects --jinja flag that implementation never produces

High Severity

The new vision test asserts that snippet[0].content contains --jinja in the llama-server command, but the snippetPi implementation at line 481 builds the content as `llama-server -hf ${model.id}${getQuantTag(filepath)}` with no --jinja flag appended — and no other code path adds it. The --jinja string appears nowhere in local-apps.ts. This test will fail at runtime. Given the PR discussion confirming jinja is now default in llama.cpp, the test expectation is likely stale and the assertion needs to drop --jinja.

Additional Locations (1)

packages/tasks/src/local-apps.ts#L480-L481

Reviewed by Cursor Bugbot for commit bf73c01.
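The mismatch Bugbot describes can be reproduced in isolation. The sketch below assumes the command is built from the template quoted in the comment; `buildServerCommand` is a hypothetical stand-in, and the quant tag is passed in as a plain string because the real `getQuantTag` implementation is not shown in this thread.

```typescript
// Hypothetical stand-in for the command construction quoted in the review comment:
// `llama-server -hf ${model.id}${getQuantTag(filepath)}`
function buildServerCommand(modelId: string, quantTag: string): string {
	return `llama-server -hf ${modelId}${quantTag}`;
}

const command = buildServerCommand("unsloth/Qwen3.6-35B-A3B-GGUF", ":{{QUANT_TAG}}");

// Expectation as written in the new vision test (includes --jinja).
const staleExpectation = "llama-server -hf unsloth/Qwen3.6-35B-A3B-GGUF:{{QUANT_TAG}} --jinja";
// Expectation with --jinja dropped, as the review suggests.
const updatedExpectation = "llama-server -hf unsloth/Qwen3.6-35B-A3B-GGUF:{{QUANT_TAG}}";

console.log(command.includes(staleExpectation)); // false
console.log(command.includes(updatedExpectation)); // true
```

If the template really contains no `--jinja`, the test passes only with the second expectation. Whether to drop the flag from the test or add it to the command depends on the llama.cpp default raised in the thread, which this sketch does not verify.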

davanstrien requested review from SBrandeis, Wauplin, gary149, julien-c, ngxson and pcuenca as code owners April 27, 2026 10:00

cursor Bot reviewed Apr 27, 2026

Comment thread packages/tasks/src/local-apps.spec.ts

Pi vision test: also assert "text" in input array

c4a9eb0

julien-c approved these changes Apr 30, 2026

Merge branch 'main' into pi-snippet-vision-input

671b44f

cursor Bot reviewed May 1, 2026

gary149 approved these changes May 5, 2026

Merge branch 'main' into pi-snippet-vision-input

bf73c01

cursor Bot reviewed May 6, 2026

davanstrien wants to merge 4 commits into main from pi-snippet-vision-input

3 participants
