Pi: emit "input": ["text", "image"] for vision-capable models #2128
davanstrien wants to merge 4 commits into
When the model has pipeline_tag === "image-text-to-text", the Pi "Use this model" snippet now writes the required input field into the generated ~/.pi/agent/models.json, so users get a working config without having to read the docs to discover the flag. Pre-existing behavior is unchanged for text-only models.

Cross-ref: huggingface/hub-docs#2408 (and @gary149's review suggestion there to surface this in the snippet directly).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
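The behavior described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: `ModelLike`, `buildPiModelEntry`, and the example model names are hypothetical, and only the `pipeline_tag` check and the `input` value come from the PR text.

```typescript
// Hypothetical, trimmed-down model shape; the real type has more fields.
interface ModelLike {
	id: string;
	pipeline_tag?: string;
}

// Build one model entry for the generated models.json (illustrative helper).
function buildPiModelEntry(model: ModelLike, modelName: string, isMLX = false): Record<string, unknown> {
	const entry: Record<string, unknown> = { id: isMLX ? model.id : modelName };
	// Only vision-capable models get the extra field; text-only entries stay unchanged.
	if (model.pipeline_tag === "image-text-to-text") {
		entry.input = ["text", "image"];
	}
	return entry;
}

const visionEntry = buildPiModelEntry(
	{ id: "org/vision-model-GGUF", pipeline_tag: "image-text-to-text" },
	"org/vision-model-GGUF:Q4_K_M"
);
const textEntry = buildPiModelEntry(
	{ id: "org/text-model-GGUF", pipeline_tag: "text-generation" },
	"org/text-model-GGUF:Q4_K_M"
);

console.log(JSON.stringify(visionEntry)); // {"id":"org/vision-model-GGUF:Q4_K_M","input":["text","image"]}
console.log(JSON.stringify(textEntry)); // {"id":"org/text-model-GGUF:Q4_K_M"}
```

Keeping the field conditional, rather than always emitting `"input": ["text"]`, is what leaves the text-only output byte-for-byte identical to the previous snippet.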
Per Cursor Bugbot review: a regression dropping "text" from ["text", "image"] would have gone undetected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
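To illustrate the gap this commit closes, here is a small sketch comparing a loose substring check with an exact comparison of the array. The helper `sameStringArray` and the `regressed` object are hypothetical and exist only for this example; the actual test in the PR may be written differently.

```typescript
// Exact, order-sensitive comparison of an unknown value against an expected string array.
function sameStringArray(actual: unknown, expected: string[]): boolean {
	return (
		Array.isArray(actual) &&
		actual.length === expected.length &&
		actual.every((value, index) => value === expected[index])
	);
}

// Simulated regression: "text" was dropped from the input list.
const regressed = { id: "some-model", input: ["image"] };

// A loose check that only looks for "image" still passes on the regressed entry.
const looseCheckPasses = JSON.stringify(regressed).includes('"image"');

// An exact comparison against the full array catches the regression.
const strictCheckPasses = sameStringArray(regressed.input, ["text", "image"]);

console.log(looseCheckPasses); // true
console.log(strictCheckPasses); // false
```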
};

// Step 2: Pi config — port and provider name differ
const modelEntry: Record<string, unknown> = { id: isMLX ? model.id : modelName };
Missing --jinja flag for vision models in server command
High Severity
The test at line 152 asserts that the llama-server command for vision models includes --jinja, but the implementation on line 481 of local-apps.ts never conditionally appends this flag. The isVision variable is computed but only used for the modelEntry.input field — it's never used to modify the server command. This means either the test will fail, or vision models won't get the --jinja flag they need for proper Jinja template handling with llama.cpp.
Reviewed by Cursor Bugbot for commit 671b44f.
@cursoragent IIUC jinja is default in llamacpp now?
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit bf73c01.
};
const snippet = snippetFunc(model);

expect(snippet[0].content).toContain(`llama-server -hf unsloth/Qwen3.6-35B-A3B-GGUF:{{QUANT_TAG}} --jinja`);
Test expects --jinja flag that implementation never produces
High Severity
The new vision test asserts that snippet[0].content contains --jinja in the llama-server command, but the snippetPi implementation at line 481 builds the content as `llama-server -hf ${model.id}${getQuantTag(filepath)}` with no --jinja flag appended — and no other code path adds it. The --jinja string appears nowhere in local-apps.ts. This test will fail at runtime. Given the PR discussion confirming jinja is now default in llama.cpp, the test expectation is likely stale and the assertion needs to drop --jinja.
Reviewed by Cursor Bugbot for commit bf73c01.
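The mismatch described in this review can be reproduced in isolation. The sketch below uses a stand-in `buildServerCommand` function that mirrors the template quoted above; it is not the repository's `snippetPi`, and the assumption that the quant tag renders as `:{{QUANT_TAG}}` is taken from the test string rather than from the implementation.

```typescript
// Stand-in for the command template quoted in the review (illustrative only).
function buildServerCommand(modelId: string, quantTag: string): string {
	return `llama-server -hf ${modelId}${quantTag}`;
}

const command = buildServerCommand("unsloth/Qwen3.6-35B-A3B-GGUF", ":{{QUANT_TAG}}");

// Expectation as written in the new test: requires a trailing --jinja.
const staleExpectation = "llama-server -hf unsloth/Qwen3.6-35B-A3B-GGUF:{{QUANT_TAG}} --jinja";
// Expectation with --jinja dropped, as the review suggests.
const relaxedExpectation = "llama-server -hf unsloth/Qwen3.6-35B-A3B-GGUF:{{QUANT_TAG}}";

console.log(command.includes(staleExpectation)); // false
console.log(command.includes(relaxedExpectation)); // true
```

Under these assumptions, dropping `--jinja` from the assertion is enough to make the check pass without touching the command builder.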


Summary
- When the model has `pipeline_tag === "image-text-to-text"`, the Pi "Use this model" snippet now writes `"input": ["text", "image"]` into the generated `~/.pi/agent/models.json`. Without this field, Pi can't pass images to the model, so users had to discover the field manually from the docs to get vision working.
- Pre-existing behavior is unchanged for text-only models (`pi - vision` test added).

Test plan
- `pnpm test` in `packages/tasks` (17 tests pass, including new `pi - vision`)
- `pnpm check`, `pnpm lint:check`, `pnpm format:check` all clean
- `input` field correctly nested

🤖 Generated with Claude Code
Note
Low Risk
Low risk: only adjusts Pi snippet JSON generation for models flagged as vision-capable and adds a focused unit test; no auth, persistence, or runtime execution paths change beyond emitted config text.
Overview
Updates the Pi “Use this model” snippet to include `"input": ["text", "image"]` in the generated `models.json` entry when the selected model has `pipeline_tag === "image-text-to-text"`, enabling image inputs for vision-capable models.

Adds a new `pi - vision` test to assert the `input` field is emitted (and preserves existing behavior for text-only and MLX-backed Pi snippets).

Reviewed by Cursor Bugbot for commit bf73c01.