feat(backend): add multi-provider LLM support via LLM_PROFILE #3
Merged
Adds five new profiles alongside `deepinfra` (`gemini`, `nim`, `together`, `local`, `local_gemma`) and threads `base_url` through to `LitellmModel`, so OpenAI-compatible local servers (llama.cpp, vLLM, LM Studio) work via `LLM_PROFILE=local` or `local_gemma`. Switching providers is a single env-var flip; no other config changes required.

- `config.py`: six profiles in `_LLM_PROFILES`; `agent_api_base` setting; `resolved_api_base` property; startup log line for the resolved base URL.
- `agent.py`: `_create_model()` passes `base_url=resolved_api_base` when set.
- `docker-compose.local.yml`: `extra_hosts` maps `host.docker.internal` so the container can reach a local LLM on the host (Linux parity; Docker Desktop already maps this).
- README + `.env.example`: provider matrix and one-flip workflow.

Verified end-to-end with both `deepinfra` (cloud Gemma 4 31B) and `local_gemma` (AWQ Gemma via vLLM on :8002). All 56 backend tests pass.
Summary
Adds five new LLM provider profiles alongside `deepinfra`, so the agent backend can run on any of:

| Profile | Model (LiteLLM string) | Required env var(s) |
| --- | --- | --- |
| `gemini` | `gemini/gemini-3.1-flash-lite-preview` | `GEMINI_API_KEY` |
| `deepinfra` | `deepinfra/google/gemma-4-31B-it` | `DEEPINFRA_API_KEY` |
| `nim` | `nvidia_nim/google/gemma-4-31b-it` | `NVIDIA_NIM_API_KEY` |
| `together` | `together_ai/google/gemma-4-31B-it` | `TOGETHER_API_KEY` |
| `local` | `openai/qwen` (llama.cpp / vLLM / LM Studio on :8003) | `LOCAL_API_KEY` + `LOCAL_API_BASE` |
| `local_gemma` | `openai/cyankiwi/gemma-4-31B-it-AWQ-4bit` (Gemma 4 31B AWQ on :8002) | `LOCAL_API_KEY` + `LOCAL_API_BASE_GEMMA` |

Switching providers is a single env-var flip (`LLM_PROFILE=...`); no other config changes required. A hedged sketch of how these profiles might be encoded is shown below.
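For orientation, here is a minimal sketch of how such a profile table could be encoded in `config.py`. Only the names `_LLM_PROFILES` and `api_base_env` come from this PR; the other keys and the plain-dict shape are illustrative assumptions, not the actual implementation.

```python
# Illustrative sketch only: _LLM_PROFILES and api_base_env are named in this PR;
# the rest of the shape (field names, use of a plain dict) is assumed.
_LLM_PROFILES = {
    "gemini": {
        "model": "gemini/gemini-3.1-flash-lite-preview",
        "api_key_env": "GEMINI_API_KEY",
    },
    "deepinfra": {
        "model": "deepinfra/google/gemma-4-31B-it",
        "api_key_env": "DEEPINFRA_API_KEY",
    },
    "local_gemma": {
        "model": "openai/cyankiwi/gemma-4-31B-it-AWQ-4bit",
        "api_key_env": "LOCAL_API_KEY",
        # Only the local profiles define api_base_env; its value points the
        # model at an OpenAI-compatible server instead of a hosted provider.
        "api_base_env": "LOCAL_API_BASE_GEMMA",
    },
    # nim, together and local follow the same pattern.
}
```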
What's in the diff
- `backend/API/config.py`: six profiles in `_LLM_PROFILES`; new `agent_api_base` setting plus a `resolved_api_base` property; the resolved base URL is logged on startup.
- `backend/API/agent.py`: `_create_model()` passes `base_url=resolved_api_base` to `LitellmModel` when the active profile defines an `api_base_env`. Cloud profiles behave exactly as before (`base_url` is omitted). See the sketch after this list.
- `docker-compose.local.yml`: declares `host.docker.internal:host-gateway` on the backend service so containerised runs can reach a local LLM running on the host. Docker Desktop already maps this; the explicit entry gives Linux parity.
- `backend/README.md` + `backend/.env.example`: provider matrix and one-flip workflow documented.
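A rough sketch of the `_create_model()` wiring described above. The import path for `LitellmModel` and the `settings` attribute names other than `resolved_api_base` are assumptions and may not match the actual code.

```python
# Sketch, not the actual agent.py. Assumes a settings object exposing the field
# this PR names (resolved_api_base) plus assumed ones (model, api_key).
from agents.extensions.models.litellm_model import LitellmModel  # path may differ

def _create_model(settings) -> LitellmModel:
    kwargs = {
        "model": settings.model,      # e.g. "deepinfra/google/gemma-4-31B-it"
        "api_key": settings.api_key,  # resolved from the profile's api_key_env
    }
    # Cloud profiles define no api_base_env, so base_url is simply omitted and
    # behaviour is unchanged; local profiles point at the OpenAI-compatible server.
    if settings.resolved_api_base:
        kwargs["base_url"] = settings.resolved_api_base
    return LitellmModel(**kwargs)
```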
Verification

- `pytest`: all 56 backend tests pass.
- `LLM_PROFILE=deepinfra` end-to-end via Docker Compose (cloud Gemma 4 31B).
- `LLM_PROFILE=local_gemma` end-to-end via Docker Compose (AWQ Gemma served by vLLM on :8002, reached through `host.docker.internal`).
Test plan

1. Copy `backend/.env.example` to `backend/.env`, fill in `DEEPINFRA_API_KEY`, and run `docker-compose -f docker-compose.local.yml up --build`. Hit `/api/agent-status`; expect `"llm_profile": "deepinfra"` and `"litellm_model_initialized": true` (a scripted version of this check is sketched below).
2. Switch to `LLM_PROFILE=local` (or `local_gemma`) with the matching `LOCAL_API_BASE*`, restart the backend with `docker compose up -d` (no rebuild), and repeat; expect the same result over the local endpoint.
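For convenience, a minimal scripted version of the status check from step 1. It assumes the backend is exposed on `localhost:8000` (adjust host/port to your compose setup) and that `requests` is installed.

```python
# Smoke-check the agent status endpoint after switching LLM_PROFILE.
# The port (8000) and the expected profile name are assumptions; adjust as needed.
import requests

resp = requests.get("http://localhost:8000/api/agent-status", timeout=10)
resp.raise_for_status()
status = resp.json()

assert status["llm_profile"] == "deepinfra"          # or "local", "local_gemma", ...
assert status["litellm_model_initialized"] is True
print("agent-status OK:", status)
```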