Add local model support for OpenAI-compatible endpoints #99

Open
bhargav1000 wants to merge 3 commits into huggingface:main from bhargav1000:add-local-model-support
Conversation

@bhargav1000

@bhargav1000 bhargav1000 commented Apr 24, 2026

Summary

Adds local model support for OpenAI-compatible inference servers so the agent can run against local Ollama, vLLM, llama.cpp, or generic OpenAI-compatible endpoints.

  • New local model routing for ollama/<model>, vllm/<model>, llamacpp/<model>, and local://<model>.
  • Local model options are exposed in the web UI only when ENABLE_LOCAL_MODELS=true, so the feature stays dark by default.
  • Backend validation now accepts custom local model paths behind the same feature flag.
  • CLI /model now accepts local model paths and skips HF Router catalog validation for them.
  • The web model selector now loads model options from /api/config/model and includes a custom local model path input.
  • Fixes frontend Claude model path detection to match the backend Bedrock model ID.

Usage

# .env
ENABLE_LOCAL_MODELS=true

# Optional endpoint overrides
OLLAMA_BASE_URL=http://localhost:11434
VLLM_BASE_URL=http://localhost:8000
LLAMACPP_BASE_URL=http://localhost:8001
LOCAL_LLM_BASE_URL=http://localhost:8000

For Ollama:

ollama pull llama3.1
ollama serve
ml-intern

Then switch models in the CLI:

/model ollama/llama3.1

Other supported local paths:

/model vllm/Qwen3.5-2B
/model llamacpp/<model-id-from-/v1/models>
/model local://my-model

In the web UI, when ENABLE_LOCAL_MODELS=true, the model menu shows local presets and a custom local model path field for values like ollama/qwen2.5-coder or local://my-model.

Files changed

  • agent/core/llm_params.py — add LiteLLM params for Ollama, vLLM, llama.cpp, and generic local OpenAI-compatible endpoints.
  • agent/core/model_switcher.py — allow CLI /model to accept local model path prefixes.
  • backend/model_catalog.py — new model catalog and validation helpers, including feature-flagged local models.
  • backend/routes/agent.py — use the model catalog for session creation, restore, model switching, and /api/config/model.
  • frontend/src/components/Chat/ChatInput.tsx — load model options from the backend and add custom local model path input.
  • frontend/src/utils/model.ts — align Claude model path detection with backend Bedrock IDs.
  • README.md — document model selection and local model setup.
  • tests/unit/test_llm_params.py — unit tests for local routing params.
  • tests/unit/test_local_model_validation.py — unit tests for feature-flagged custom local validation.
  • tests/unit/test_user_quotas.py — use package import so backend.user_quotas resolves cleanly.

Test plan

  • uv --cache-dir /tmp/uv-cache run pytest — 17 passed.
  • npm run build — frontend TypeScript/Vite build passes.
  • Local routing params resolve for ollama/, vllm/, llamacpp/, and local://.
  • Local routing params do not include duplicate timeout kwargs.
  • Custom local model IDs are rejected when ENABLE_LOCAL_MODELS is unset.
  • Custom local model IDs are accepted when ENABLE_LOCAL_MODELS=true.
  • Invalid custom local model paths with empty suffixes or whitespace are rejected.
  • Web model selector builds successfully with backend-driven model options.
  • Manual smoke test with a real local inference server in a full CLI/web session.
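
The rejection rules exercised by these tests can be sketched as a small validator. This is a hypothetical reconstruction, not the code in backend/model_catalog.py; the real implementation may structure the checks differently:

```python
# Hypothetical validator mirroring the rules tested in
# tests/unit/test_local_model_validation.py: known prefix, non-empty
# suffix, and no embedded whitespace.
LOCAL_MODEL_PREFIXES = ("ollama/", "vllm/", "llamacpp/", "local://")

def is_custom_local_model_id(model_id: str) -> bool:
    if any(ch.isspace() for ch in model_id):
        return False  # reject embedded or surrounding whitespace
    for prefix in LOCAL_MODEL_PREFIXES:
        if model_id.startswith(prefix) and model_id[len(prefix):]:
            return True  # known prefix with a non-empty suffix
    return False
```

In the PR this check only runs when ENABLE_LOCAL_MODELS is set; otherwise custom local IDs are rejected outright.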

@akseljoonas
Collaborator

@claude please review this PR. Focus on correctness, security, LiteLLM/Bedrock routing impact, and whether it's safe to merge against current main. Keep it concise and prefer inline comments where it matters.

@claude

claude Bot commented Apr 24, 2026

Claude finished @akseljoonas's task in 2m 7s


PR Review: Add local model support for OpenAI-compatible endpoints

  • Read CLAUDE.md for repo guidelines
  • Read changed files and gather context
  • Review correctness and logic
  • Review security implications
  • Review LiteLLM/Bedrock routing impact
  • Assess merge safety against current main
  • Post final review

Summary

The implementation is largely correct and well-structured. Routing, feature-flagging, and validation are all sound. A few issues worth addressing before merging, in rough priority order:


Issues

1. Dead module-level AVAILABLE_MODELS in agent.py:42 — minor but confusing

# agent.py line 42
AVAILABLE_MODELS = get_available_models()   # captured at startup, never used

After this PR, AVAILABLE_MODELS is no longer referenced anywhere in agent.py — all three validation call sites now use is_valid_model_id(), and the /api/config/model endpoint calls get_available_models() directly. This stale module-level snapshot will confuse future contributors who might reach for it thinking it's the live list (it won't reflect runtime ENABLE_LOCAL_MODELS changes).


2. LOCAL_MODEL_PREFIXES duplicated in three places — DRY violation

The same tuple/list is defined independently in:

  • backend/model_catalog.py:6 (LOCAL_MODEL_PREFIXES)
  • agent/core/model_switcher.py:36 (_LOCAL_MODEL_PREFIXES)
  • frontend/src/components/Chat/ChatInput.tsx:90 (LOCAL_MODEL_PREFIXES)

Adding a new local backend (e.g. lmstudio/) requires updating three files. The backend pair at minimum should share a single definition.
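
A minimal shared definition for the backend pair could look like the following. The module name matches what the author later adopted (ml_intern/local_models.py), but the exact API here is illustrative:

```python
# ml_intern/local_models.py: single source of truth for local prefixes.
# (Sketch; the function name is an assumption, not taken from the PR.)
LOCAL_MODEL_PREFIXES: tuple[str, ...] = (
    "ollama/",
    "vllm/",
    "llamacpp/",
    "local://",
)

def is_local_model_id(model_id: str) -> bool:
    """True if model_id targets one of the local OpenAI-compatible backends."""
    return model_id.startswith(LOCAL_MODEL_PREFIXES)
```

backend/model_catalog.py and agent/core/model_switcher.py could both import from here; the frontend copy still needs syncing, or the list could be served via /api/config/model.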


3. _is_anthropic_model over-matches local model names — logic issue

# backend/routes/agent.py:45
def _is_anthropic_model(model_id: str) -> bool:
    return "anthropic" in model_id

A local model like local://my-anthropic-wrapper or ollama/anthropic-clone satisfies this check and will incorrectly hit the HF-org gate at _require_hf_for_anthropic. This is a pre-existing issue, but local model support materially increases the chance of it firing. The check should be anchored to the canonical prefixes:

def _is_anthropic_model(model_id: str) -> bool:
    return model_id.startswith(("anthropic/", "bedrock/")) and "anthropic" in model_id



4. Frontend silently swallows invalid custom model path errors — UX gap

// ChatInput.tsx:293-295
} catch { /* ignore */ }

When the user submits a bad custom local model path and the backend returns 400, nothing is shown. The "Use" button just does nothing. At minimum the response status should be checked and an error state surfaced. This only affects ENABLE_LOCAL_MODELS=true deployments.


5. Reasoning effort silently dropped for local models — behavioral surprise

_resolve_llm_params for ollama/, vllm/, llamacpp/, and local:// ignores reasoning_effort without raising UnsupportedEffortError or returning any signal. Users who have reasoning_effort set in their config and switch to a local model will silently lose it — no warning in the CLI or in the probe cascade. Bedrock has the same gap, but this PR extends the pattern. A brief log warning or UnsupportedEffortError on strict=True (matching how HF and Anthropic branches work) would make behavior consistent.


Correctness / Routing — looks good

  • Local routing branches in llm_params.py are inserted before the HF router fallthrough, so they don't interfere with existing anthropic/, bedrock/, or openai/ paths.
  • rstrip('/') on base URLs is correct; tested.
  • is_custom_local_model_id whitespace rejection is correct (any(char.isspace() for char in model_id) covers embedded spaces; the != strip() check is redundant but harmless).
  • Feature flag check in local_models_enabled() handles the common truthy strings correctly.
  • /api/config/model calls get_available_models() dynamically (not the stale snapshot), so the frontend model list is always live.
  • test_llm_params.py uses an unusual importlib load but is functional.

Security — acceptable

  • Local endpoints are read from server-side env vars only, not from user input — no SSRF risk from model IDs.
  • ENABLE_LOCAL_MODELS gates both the API validation (is_valid_model_id) and the frontend UI, so the feature is fully dark by default.
  • The customModelPath is validated server-side before use; the frontend field is just a text input with no filesystem access.

Merge verdict

Safe to merge after addressing issue #3 (_is_anthropic_model false-positive). Issues #1 and #4 are cleanup/UX and can be follow-up PRs. Issue #2 is technical debt. Issue #5 is low-priority behavioral parity.

@bhargav1000 bhargav1000 force-pushed the add-local-model-support branch from 251fcdc to 3e2e186 on April 24, 2026 at 17:09
@bhargav1000 bhargav1000 force-pushed the add-local-model-support branch from 3e2e186 to 1d79b90 on April 24, 2026 at 17:13
@bhargav1000
Author

bhargav1000 commented Apr 24, 2026

Review follow-up

Addressed the prior review suggestions:

  • Removed the stale AVAILABLE_MODELS snapshot.
  • Anchored Anthropic gating to anthropic/ and bedrock/ IDs to avoid local false positives.
  • Shared local model prefix helpers via ml_intern/local_models.py.
  • Added frontend error feedback for invalid custom local paths.
  • Made local reasoning-effort rejection explicit in strict probe mode.
  • Resolved the latest backend/routes/agent.py rebase conflict by keeping upstream's feedback endpoint.
  • Added uv dev dependency metadata so uv run pytest uses project test deps instead of a system pytest.

Usage

ENABLE_LOCAL_MODELS=true
OLLAMA_BASE_URL=http://localhost:11434
VLLM_BASE_URL=http://localhost:8000
LLAMACPP_BASE_URL=http://localhost:8001
LOCAL_LLM_BASE_URL=http://localhost:8000
/model ollama/llama3.1
/model vllm/Qwen3.5-2B
/model llamacpp/<model-id-from-/v1/models>
/model local://my-model

Local inference servers must already be running; the agent talks to their OpenAI-compatible HTTP APIs.
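
For reference, the kind of request the agent ends up sending can be sketched with the standard library. The endpoint path follows the OpenAI chat-completions convention that Ollama, vLLM, and llama.cpp all expose; this is not the agent's actual client code:

```python
# Sketch: constructing an OpenAI-compatible chat request against a local
# server. The agent goes through LiteLLM rather than raw urllib; this
# only shows the wire format involved.
import json
from urllib import request

def build_chat_request(base_url: str, model: str, prompt: str) -> request.Request:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With OLLAMA_BASE_URL=http://localhost:11434 and /model ollama/llama3.1, the resulting request targets http://localhost:11434/v1/chat/completions.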

Files changed

  • agent/core/llm_params.py, agent/core/model_switcher.py, ml_intern/local_models.py
  • backend/model_catalog.py, backend/routes/agent.py
  • frontend/src/components/Chat/ChatInput.tsx, frontend/src/utils/model.ts
  • README.md, pyproject.toml, uv.lock
  • tests/unit/test_llm_params.py, tests/unit/test_local_model_validation.py, tests/unit/test_user_quotas.py

Test plan

  • Branch current with upstream/main (0 behind, 3 ahead when checked).
  • uv --cache-dir /tmp/uv-cache run pytest — 65 passed.
  • npm run build from frontend/ — passed.
  • python -m py_compile agent/core/llm_params.py agent/core/model_switcher.py ml_intern/local_models.py backend/model_catalog.py backend/routes/agent.py
  • Manual smoke test with a real local inference server.

@bhargav1000
Author

@claude please acknowledge fixes
