
Add local model support via Ollama #44

Open
lmassaron wants to merge 1 commit into huggingface:main from lmassaron:ollama-local

Conversation

@lmassaron

Summary

This PR adds support for running the agent with local models via Ollama. It enables users to use the intern without incurring API costs or requiring external API keys for inference.

Key Changes

  • Ollama Integration: Integrated local model support using Ollama's OpenAI-compatible endpoint for improved
    tool-calling reliability.
  • Readiness Checks: The agent now automatically verifies that ollama serve is running and that the requested
    model is available locally (see the sketch after this list).
  • Interactive Model Management:
    • Added /add-model <id> command to dynamically add new local models during a session.
    • Added an interactive prompt to pull missing models directly from the CLI.
  • Optional Authentication: Modified initialization to make the HF_TOKEN optional when using local models (prefixed
    with ollama/).
  • Robust Initialization: Improved the startup loop to handle local environment failures without hanging.
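
As a rough sketch of the readiness check described above (the function name, helper structure, and default base URL are illustrative assumptions, not the exact implementation), the agent can probe Ollama's /api/tags endpoint to confirm the server is up and the model is pulled:

import httpx

OLLAMA_BASE_URL = "http://localhost:11434"  # Ollama's default local address

async def ensure_ollama_readiness(model: str) -> bool:
    """Check that `ollama serve` is reachable and that `model` is pulled locally."""
    try:
        async with httpx.AsyncClient(timeout=5.0) as client:
            # GET /api/tags lists the models available on the local Ollama server
            resp = await client.get(f"{OLLAMA_BASE_URL}/api/tags")
            resp.raise_for_status()
            tags = resp.json().get("models", [])
    except httpx.HTTPError:
        return False  # server not running or unreachable
    local = {m["name"] for m in tags}
    # Ollama tag names usually carry a ":latest" suffix, so match the bare name too
    return model in local or f"{model}:latest" in local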

How to test

  1. Ensure Ollama is running (ollama serve).
  2. Run the agent: ml-intern.
  3. Switch to a local model: /model ollama/qwen3.6 (or use /add-model).

@lmassaron
Author

lmassaron commented Apr 24, 2026

Thank you @statikkkkk for the review. I have addressed the security concerns and the documentation feedback. I also took the opportunity to reapply these features on top of the latest upstream main to align with the recent architectural refactoring.

  1. SSRF Mitigation & Security
    I have implemented a dual-layered defense to ensure local inference remains strictly local and cannot be abused
    in hosted environments:
  • Hostname Validation: In agent/core/llm_params.py, I added strict validation for OLLAMA_API_BASE. It is now
    restricted exclusively to localhost or 127.0.0.1. Any other hostname raises a ValueError, preventing the
    agent from being used to probe internal network services (see the sketch after this list).
  • Backend Guard: In backend/routes/agent.py, I added a security guard to the /api/config/model endpoint. It explicitly rejects any model ID starting with ollama/. This makes Ollama support a CLI-only feature,
    ensuring it cannot be reached by remote users via the web API.
  2. Context Window Documentation
  • Default Fallback Note: Added a documentation note in agent/core/session.py clarifying that while
    ollama/qwen3.6 has a specific 128k override, all other Ollama models fall back to the project's standard
    200k default context window.
  3. Alignment with Latest Refactor
  • Architecture Update: I have adapted the Ollama logic to the new upstream structure. This includes
    integrating readiness checks into the new agent/core/model_switcher.py module and updating the asynchronous
    interactive loop in agent/main.py.
  • Feature Persistence: Verified that /add-model, automated model pulling, and server reachability checks are
    all fully functional within the new async architecture.
  4. Verification Results
    I have run a full regression test suite:
  • Headless/CLI: Confirmed local reasoning and tool-calling with ollama/qwen3.6.
  • Security: Verified via script that the backend API correctly rejects local model requests with a 400 Bad Request.
  • Robustness: Confirmed that the agent handles unreachable Ollama servers gracefully with clear error
    messages.
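
A minimal sketch of the hostname validation described in item 1 (the function and default values here are assumptions; the actual code lives in agent/core/llm_params.py):

import os
from urllib.parse import urlparse

def _validated_ollama_api_base() -> str:
    """Restrict OLLAMA_API_BASE to loopback addresses so local inference stays local."""
    # Ollama's OpenAI-compatible endpoint defaults to port 11434 under /v1
    base = os.environ.get("OLLAMA_API_BASE", "http://localhost:11434/v1")
    host = urlparse(base).hostname
    if host not in ("localhost", "127.0.0.1"):
        raise ValueError(
            f"OLLAMA_API_BASE must point at localhost or 127.0.0.1, got {host!r}"
        )
    return base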

The branch has been force-pushed with a clean, rebased history.

@akseljoonas
Collaborator

@claude please review this PR. Focus on correctness, security, LiteLLM/Bedrock routing impact, and whether it's safe to merge against current main. Keep it concise and prefer inline comments where it matters.

@claude

claude Bot commented Apr 24, 2026

Claude finished @akseljoonas's task in 2m 24s


PR Review: Add local model support via Ollama

  • Post initial comment
  • Read CLAUDE.md and repo structure
  • Read all changed files
  • Analyze security, correctness, and routing impact
  • Post final review

Verdict: NEEDS CHANGES — The SSRF mitigation in _resolve_llm_params is solid, but the backend guard has a meaningful gap, there are two correctness bugs in ollama_utils.py, and blocking I/O in an async function will stall the event loop during model pulls.


Security

[MEDIUM] Backend guard is incomplete — ollama/ models are still reachable via session creation

ollama/qwen3.6 is listed in AVAILABLE_MODELS (backend/routes/agent.py:68-74), so the valid_ids set in POST /api/session and POST /api/session/{id}/model includes it. The ollama/ rejection guard was added only to POST /api/config/model (line 234). A web user can bypass it:

# POST /api/session  — no ollama guard, ollama/qwen3.6 is in valid_ids
valid_ids = {m["id"] for m in AVAILABLE_MODELS}   # ← includes ollama/qwen3.6
if model and model not in valid_ids:              # passes if model == "ollama/qwen3.6"
    raise HTTPException(...)

Same gap in POST /api/session/{session_id}/model (lines 444-447). Either remove ollama/qwen3.6 from AVAILABLE_MODELS entirely (the cleanest fix, since it's CLI-only by design), or add the same guard to both session endpoints. Fix this →
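
A shared guard along these lines (a FastAPI-style sketch; the helper name and error text are assumptions) would close the gap in both session endpoints:

from fastapi import HTTPException

def reject_local_only_models(model: str | None) -> None:
    """Local Ollama models are CLI-only and must not be selectable via the web API."""
    if model and model.startswith("ollama/"):
        raise HTTPException(
            status_code=400,
            detail="Local Ollama models are only available from the CLI.",
        )

# Call this at the top of POST /api/session and POST /api/session/{session_id}/model,
# before the valid_ids membership check.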

[LOW] ollama_utils.py doesn't validate OLLAMA_API_BASE hostname

get_ollama_base_url() reads the env var without checking it's localhost. This means is_ollama_running(), is_model_available(), and pull_ollama_model() can all send requests to an arbitrary URL before _resolve_llm_params ever gets to apply its validation. In a pure CLI context the user controls their env, but it's inconsistent with the protection added in llm_params.py. Fix this →


Correctness

[HIGH] Blocking I/O in async pull_ollama_model will freeze the event loop

pull_ollama_model is async def but uses synchronous requests.post() with streaming (for line in response.iter_lines()). This blocks the event loop for the entire duration of a model pull (potentially minutes for large models). is_ollama_running() and is_model_available() have the same issue — they're called from ensure_ollama_readiness which is awaited directly. These should use asyncio.to_thread or be replaced with httpx.AsyncClient. Fix this →
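
A non-blocking variant could look roughly like this (a sketch assuming httpx and Ollama's documented /api/pull streaming API; the base-URL constant and progress printing are illustrative):

import json
import httpx

OLLAMA_BASE_URL = "http://localhost:11434"  # assumed default; validate before use

async def pull_ollama_model(model: str) -> None:
    """Stream a model pull from Ollama without blocking the event loop."""
    async with httpx.AsyncClient(timeout=None) as client:
        # POST /api/pull streams newline-delimited JSON status objects
        async with client.stream(
            "POST", f"{OLLAMA_BASE_URL}/api/pull", json={"model": model}
        ) as response:
            response.raise_for_status()
            async for line in response.aiter_lines():
                if not line:
                    continue
                data = json.loads(line)
                completed, total = data.get("completed"), data.get("total")
                # guard: some status messages carry total without completed
                if total and completed is not None:
                    print(f"{data.get('status', '')}: {completed / total:.0%}")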

[MEDIUM] None-divide crash in progress display (ollama_utils.py:58-65)

completed = data.get("completed")   # can be None
total = data.get("total")
if status != last_status:
    if total:
        percent = (completed / total) * 100   # ← TypeError if completed is None

Ollama emits total without completed in some status messages (e.g. manifest download). Guard it: if total and completed is not None. Fix this →

[LOW] Unrendered Rich markup in ensure_ollama_readiness (ollama_utils.py:82-83)

print(f"\n[bold red]Error:[/bold red] Ollama server is not reachable.")

print() doesn't process Rich markup — this will render literally as [bold red]Error:[/bold red] in the terminal. Use get_console().print(...) from terminal_display or strip the markup. Fix this →
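
The markup-aware fix is a one-line change; a standalone sketch using Rich directly (in the repo this would go through terminal_display's get_console() instead of a fresh Console):

from rich.console import Console

# Console.print understands Rich markup, unlike the built-in print()
Console().print("\n[bold red]Error:[/bold red] Ollama server is not reachable.")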


LiteLLM / Bedrock routing impact

No regressions. The ollama/ branch is inserted first in _resolve_llm_params and has a clean prefix that doesn't collide with bedrock/, anthropic/, openai/, or the HF fallback. The model is forwarded as openai/<name> with a custom api_base, which is the standard LiteLLM pattern. Bedrock routing is untouched.

The _MAX_TOKENS_MAP in session.py is checked before litellm.get_model_info, so the 128k override for qwen3.6 works correctly without interfering with the litellm catalog for other models.
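
For reference, the effective LiteLLM call for an ollama/-prefixed model would look roughly like this (a sketch; the exact parameter plumbing in _resolve_llm_params may differ):

import litellm

# "ollama/qwen3.6" is rewritten to LiteLLM's openai/ provider with a local api_base,
# which targets Ollama's OpenAI-compatible endpoint.
response = litellm.completion(
    model="openai/qwen3.6",
    api_base="http://localhost:11434/v1",
    api_key="ollama",  # Ollama ignores the key, but the OpenAI-style client expects one
    messages=[{"role": "user", "content": "Hello"}],
)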


Minor nits

  • headless_main imports PromptSession from prompt_toolkit inside the function (main.py:1107) but never uses it — the headless Ollama check is non-interactive. Dead import.
  • /add-model always prefixes ollama/ (main.py:765) — intentional, but the help text (/add-model <id>) doesn't mention this. Worth documenting.

Summary

The core feature is well-structured and the SSRF concern from the first review was thoughtfully addressed in _resolve_llm_params. The two blocking issues for merge are: (1) the incomplete backend guard that lets web users set an Ollama model via session creation, and (2) the blocking I/O in pull_ollama_model that will lock the async event loop. The completed-is-None crash in the progress display and the unrendered Rich markup are smaller but will produce observable bugs in practice.

@lmassaron
Author

lmassaron commented Apr 24, 2026

I have addressed all the security and correctness concerns in the latest
commits:

  1. Robust Backend Guarding
  • Listing Removal: I removed ollama/qwen3.6 from AVAILABLE_MODELS in the backend route entirely.
  • Entry-Point Guards: Added explicit startswith('ollama/') checks to POST /api/session, POST /api/session/restore-summary, and POST /api/session/{session_id}/model. These local models are now strictly unreachable via the web API.
  2. Async I/O & Correctness
  • Non-blocking Network: Refactored ollama_utils.py to use httpx.AsyncClient. All Ollama interactions (status
    checks, tag listing, and streaming pulls) are now fully asynchronous and won't block the event loop.
  • Progress Display Fix: Added a guard for None values in the pull progress logic to prevent TypeError crashes
    during progress calculation.
  • Rich Rendering: Switched to get_console().print() in the utility module so that Rich formatting is correctly
    rendered in the terminal.
  3. Security & Consistency
  • Hostname Validation: Added hostname checks to get_ollama_base_url() consistent with _resolve_llm_params, restricting it to loopback addresses.
  • CLI Consistency: Fixed a bug where the --model flag was ignored in interactive mode. It now correctly overrides
    the default model for both headless and interactive use.
  • Documentation: Updated the /help command text to explicitly mention the ollama/ prefix behavior.

I have verified these fixes with a regression test suite covering headless execution, interactive-mode overrides, and the backend guards (exercised via curl). The branch has been force-pushed with the updated history.

lmassaron force-pushed the ollama-local branch 3 times, most recently from a21980d to 48e4bb9 on April 27, 2026 at 15:37