Add local model support for OpenAI-compatible endpoints #99

bhargav1000 wants to merge 3 commits into huggingface:main
Conversation
@claude please review this PR. Focus on correctness, security, LiteLLM/Bedrock routing impact, and whether it's safe to merge against current main. Keep it concise and prefer inline comments where it matters.
Claude finished @akseljoonas's task in 2m 7s

**PR Review: Add local model support for OpenAI-compatible endpoints**
**Summary**

The implementation is largely correct and well-structured. Routing, feature-flagging, and validation are all sound. A few issues worth addressing before merging, in rough priority order:

**Issues**

**1. Dead module-level `AVAILABLE_MODELS`**

```python
# agent.py line 42
AVAILABLE_MODELS = get_available_models()  # captured at startup, never used
```

After this PR, `AVAILABLE_MODELS` is captured at startup but never read; it should be removed.

**2.** The same tuple/list of local model prefixes is defined independently in several places.
Adding a new local backend (e.g. another prefix) would require editing every copy in sync; consider a single shared constant.

**3. Anthropic model detection is too loose**

```python
# backend/routes/agent.py:45
def _is_anthropic_model(model_id: str) -> bool:
    return "anthropic" in model_id
```

A local model whose path happens to contain the substring "anthropic" would be misrouted. Suggested fix:

```python
def _is_anthropic_model(model_id: str) -> bool:
    return model_id.startswith(("anthropic/", "bedrock/")) and "anthropic" in model_id
```

**4. Frontend silently swallows invalid custom model path errors — UX gap**

```typescript
// ChatInput.tsx:293-295
} catch { /* ignore */ }
```

When the user submits a bad custom local model path and the backend returns 400, nothing is shown. The "Use" button just does nothing. At minimum the response status should be checked and an error state surfaced. This only affects the feature-flagged local model flow.

**5. Reasoning effort silently dropped for local models — behavioral surprise**
**Correctness / Routing** — looks good.

**Security** — acceptable.
**Merge verdict**

Safe to merge after addressing issue #3.
Force-pushed 251fcdc to 3e2e186.
Force-pushed 3e2e186 to 1d79b90.
**Review follow-up**

Addressed the prior review suggestions:
**Usage**

```
ENABLE_LOCAL_MODELS=true
OLLAMA_BASE_URL=http://localhost:11434
VLLM_BASE_URL=http://localhost:8000
LLAMACPP_BASE_URL=http://localhost:8001
LOCAL_LLM_BASE_URL=http://localhost:8000
```

Local inference servers must already be running; the agent talks to their OpenAI-compatible HTTP APIs.

**Files changed**
**Test plan**
@claude please acknowledge fixes
**Summary**
Adds local model support for OpenAI-compatible inference servers so the agent can run against local Ollama, vLLM, llama.cpp, or generic OpenAI-compatible endpoints.
- New model path prefixes: `ollama/<model>`, `vllm/<model>`, `llamacpp/<model>`, and `local://<model>`.
- Everything is gated behind `ENABLE_LOCAL_MODELS=true`, so the feature stays dark by default.
- The CLI `/model` command now accepts local model paths and skips HF Router catalog validation for them.
- The web UI loads model options from `/api/config/model` and includes a custom local model path input.

**Usage**
For Ollama, set `ENABLE_LOCAL_MODELS=true` and `OLLAMA_BASE_URL` (default `http://localhost:11434`), then switch models in the CLI with `/model ollama/<model>`. Other supported local paths use the `vllm/`, `llamacpp/`, and `local://` prefixes.
In the web UI, when `ENABLE_LOCAL_MODELS=true`, the model menu shows local presets and a custom local model path field for values like `ollama/qwen2.5-coder` or `local://my-model`.

**Files changed**
- `agent/core/llm_params.py` — add LiteLLM params for Ollama, vLLM, llama.cpp, and generic local OpenAI-compatible endpoints.
- `agent/core/model_switcher.py` — allow CLI `/model` to accept local model path prefixes.
- `backend/model_catalog.py` — new model catalog and validation helpers, including feature-flagged local models.
- `backend/routes/agent.py` — use the model catalog for session creation, restore, model switching, and `/api/config/model`.
- `frontend/src/components/Chat/ChatInput.tsx` — load model options from the backend and add custom local model path input.
- `frontend/src/utils/model.ts` — align Claude model path detection with backend Bedrock IDs.
- `README.md` — document model selection and local model setup.
- `tests/unit/test_llm_params.py` — unit tests for local routing params.
- `tests/unit/test_local_model_validation.py` — unit tests for feature-flagged custom local validation.
- `tests/unit/test_user_quotas.py` — use package import so `backend.user_quotas` resolves cleanly.
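A minimal sketch of the feature-flagged validation that `backend/model_catalog.py` is described as providing — function and constant names here are assumptions, not the PR's actual API:

```python
import os

_LOCAL_PREFIXES = ("ollama/", "vllm/", "llamacpp/", "local://")

def local_models_enabled() -> bool:
    # The feature stays dark unless ENABLE_LOCAL_MODELS=true is set.
    return os.environ.get("ENABLE_LOCAL_MODELS", "").lower() == "true"

def is_local_model(model_id: str) -> bool:
    return model_id.startswith(_LOCAL_PREFIXES)

def validate_model(model_id: str, catalog: set[str]) -> bool:
    # Local paths skip catalog validation but require the feature flag;
    # everything else must appear in the hosted catalog.
    if is_local_model(model_id):
        return local_models_enabled()
    return model_id in catalog
```

This keeps a single prefix tuple in one module, which is also what the review's issue 2 asks for.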
- `uv --cache-dir /tmp/uv-cache run pytest` — 17 passed.
- `npm run build` — frontend TypeScript/Vite build passes.
- Unit tests cover routing params for `ollama/`, `vllm/`, `llamacpp/`, and `local://`, including `timeout` kwargs.
- Verified local models are rejected when `ENABLE_LOCAL_MODELS` is unset.
- Verified local models are accepted with `ENABLE_LOCAL_MODELS=true`.
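For context on the `timeout` kwargs item above, the local routing params presumably resemble a dict along these lines — a sketch under assumed names and defaults, not the actual `llm_params.py` code:

```python
def build_local_llm_params(model_id: str, api_base: str, timeout: float = 600.0) -> dict:
    """Assemble kwargs for a litellm.completion() call against a local
    OpenAI-compatible server (names and defaults here are illustrative)."""
    return {
        # LiteLLM can target any OpenAI-compatible server by using the
        # "openai/" provider prefix together with an explicit api_base.
        # Naive prefix strip, good enough for this sketch.
        "model": "openai/" + model_id.split("/")[-1],
        "api_base": api_base,
        "api_key": "not-needed",  # most local servers ignore the key
        "timeout": timeout,       # the unit tests above check timeout kwargs
    }

params = build_local_llm_params("ollama/qwen2.5-coder", "http://localhost:11434")
```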