Make runtime provider switching work on the FastAPI surface #27
Draft
nyimbi wants to merge 3 commits into
…viders

Extends OpenSwarm's provider matrix from three (OpenAI / Anthropic / Google) to seven, plus a runtime provider switch for the orchestrator.

New providers
- Azure OpenAI Service: your own gpt-* deployment on Azure (LiteLLM prefix `azure/`, env: AZURE_API_KEY + AZURE_API_BASE + AZURE_API_VERSION).
- Azure AI Foundry: catalog of non-OpenAI models on Azure including Anthropic Claude (Opus / Sonnet), Llama, Mistral, DeepSeek (LiteLLM prefix `azure_ai/`, env: AZURE_AI_API_KEY + AZURE_AI_API_BASE). For Anthropic models the base URL must end with `/anthropic`.
- Ollama (local): no key required, defaults to http://localhost:11434. OLLAMA_API_BASE is threaded explicitly into LitellmModel.
- OpenAI-compatible: generic route for any vendor with an OpenAI-shaped API, such as Ollama Cloud, Groq, Together AI, Mistral La Plateforme, OpenRouter, and vLLM-based deployments. Uses dedicated OPENAI_COMPAT_* env vars so a real OPENAI_API_KEY kept for fallback is never overwritten. Only the base URL is required; the key is optional for keyless local endpoints.

Provider routing
- Single source of truth: config.PROVIDER_REGISTRY maps slug to (prefix, required_env). Both the SwitchProvider tool and the onboarding wizard derive their behavior from this table.
- DEFAULT_MODEL=openai_compat/<model> is a sentinel that config._resolve() unwraps to LiteLLM's openai/<model>, with the dedicated credentials passed via base_url and api_key.
- get_active_provider() classifies via longest-prefix-wins lookup (so azure_ai/ matches before azure/) and returns "unknown" for unrecognized litellm/<vendor>/<model> strings.

Runtime switching
- New SwitchProvider tool in orchestrator/tools/, registered only on the orchestrator. Users say "switch to ollama llama3.1" or "/switch-provider azure_ai claude-opus-4-1"; the tool validates credentials, writes DEFAULT_MODEL to .env atomically, and signals run_utils.main() to rebuild the agency on next TUI exit. The orchestrator's "router only" contract is preserved with a single documented carve-out for this administrative concern.
- The FastAPI server (server.py) doesn't read the restart signal; switching from an API client is a documented no-op.
- Restart flag files live in a user-scoped tempdir (mode 0o700) so a co-tenant on /tmp can't force a spurious restart.

Hardening
- SSRF defense: SwitchProvider refuses any openai_compat switch where OPENAI_COMPAT_API_BASE isn't an https:// URL with a real hostname. Closes the prompt-injection chain where an attacker pre-positions the base URL and induces a switch, redirecting all subsequent LLM traffic (with bearer tokens and conversation history).
- Input validation: the model field requires an alphanumeric start plus the characters real model names use ([\w.:-/]). Blocks newline injection into .env, shell metacharacters, and `..`-style ids.
- Atomic .env write: the restart flag is touched BEFORE the .env rewrite so a crash in any window leaves recoverable state. The rewrite uses set_key on a temp copy, then os.replace, to avoid partial-read exposure.
- config._resolve() raises RuntimeError when openai_compat is configured without the base URL, instead of returning a LitellmModel with None credentials that would fail cryptically at first call.
- The except clause in _resolve catches only ImportError; TypeError now propagates so misconfigured kwargs surface immediately rather than degrading to a bare model string.

Tests
- 36 pytest cases cover provider validation, SSRF guard, input validation, atomic write recovery, missing-credential errors, prefix classification (incl. longest-prefix-wins for azure_ai/ vs azure/), openai_compat unwrap to openai/<model>, RuntimeError on missing API_BASE, ImportError graceful degradation, TypeError propagation, dotenv quoting round-trips, OSError on flag touch refuses switch, and the wizard's PROVIDERS data shape contract.
- Test scaffolding stubs agency_swarm + openai.types.shared in sys.modules so the suite runs from a bare Python install with just pytest + python-dotenv + pydantic; no need for the full agency-swarm dependency chain.

Documentation
- README updated: 7-provider list, runtime switch description, upgrading-from-earlier-version section.
- AGENTS.md documents the orchestrator/tools/ convention and the PROVIDER_REGISTRY contract.
- orchestrator/instructions.md documents the administrative carve-out.
- .env.example documents every new env var with vendor URL examples for openai_compat (Groq, Together, Mistral, OpenRouter; Ollama Cloud points at https://docs.ollama.com since the canonical endpoint can change).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
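The two hardening checks described in this commit message (https-only base URL, restricted model names) could look roughly like the following. This is a minimal sketch, not the code from SwitchProvider: the function names are hypothetical, "real hostname" is interpreted here as "non-empty hostname", and the regular expression is one possible reading of the character set quoted above.

```python
import re
from urllib.parse import urlsplit

# Assumed pattern: ASCII alphanumeric first character, then word characters
# plus . : / and - (the hyphen is placed last so it is a literal, not a range).
_MODEL_RE = re.compile(r"[A-Za-z0-9][\w.:/-]*", re.ASCII)


def is_safe_base_url(url: str) -> bool:
    """Accept only https:// URLs that carry a non-empty hostname."""
    try:
        parts = urlsplit(url)
        return parts.scheme == "https" and bool(parts.hostname)
    except ValueError:
        # urlsplit raises ValueError on some malformed inputs (e.g. bad IPv6 literals).
        return False


def is_valid_model_name(model: str) -> bool:
    """Reject '..' sequences, then require the whole string to match the pattern."""
    if ".." in model:
        return False
    # fullmatch means a trailing newline or '=' cannot slip through.
    return _MODEL_RE.fullmatch(model) is not None
```

Using `fullmatch` rather than `match` with a `$` anchor avoids the case where `$` tolerates a trailing newline, which is the injection path into `.env` that the commit message mentions.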
The previous PR (be205e9) documented FastAPI as a no-op for runtime switching, on the assumption that the agency was constructed once at startup. Reading agency-swarm's request handlers shows the agency is actually rebuilt per-request: `agency_factory(load_threads_callback=...)` is invoked inside each chat/run call (see endpoint_handlers.py:457, :552, :825). All that's needed for FastAPI to pick up a switch is for os.environ to reflect the new .env values before the next request.

Three small changes:
- SwitchProvider.run() now calls load_dotenv(override=True) on the freshly written .env after the atomic rewrite. This refreshes the running process's os.environ so the next agency build (whether driven by a FastAPI request or a TUI restart) sees the new DEFAULT_MODEL and credentials.
- The TUI restart flag becomes best-effort. The switch is already live in-process via env reload; the flag is now just a UX cue for the TUI to refresh its display state. A failed flag touch is non-fatal: we return success since the switch did apply.
- The previous "Cannot switch — no restart signal available" path is gone. Running outside the TUI loop is now a supported context, not an error.

Updated docs:
- server.py header: removed the "switching is a no-op" warning; describes how per-request rebuilds pick up the change.
- orchestrator/instructions.md: removed the "exit the TUI to apply" instruction. Switches are live immediately; TUI users only quit if they want a fresh display.
- SwitchProvider docstring: explains why FastAPI works (per-request rebuild) and why the flag is now best-effort.

Tests (36 → 38):
- test_no_flag_env_var_still_succeeds: verifies the FastAPI-style context (no flag env var) gets a successful switch + .env write + os.environ refresh.
- test_oserror_on_flag_touch_does_not_abort_switch: a failing flag touch returns success because the env reload already applied.
- test_switch_refreshes_os_environ_for_fastapi_path: the core guarantee, namely that os.environ["DEFAULT_MODEL"] reflects the switch after run() returns.
- test_switch_refreshes_provider_credentials_in_environ: pre-existing .env credentials become visible in os.environ post-switch, so the next agency build can authenticate.
- The two tests asserting "no flag means refused" / "OSError aborts" were updated to match the new behavior.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
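The write-then-reload sequence this commit describes can be sketched with the standard library alone. This is an illustration under stated assumptions, not the tool's code: `write_env_atomically` and `reload_env` are hypothetical names, the hand-rolled parser is a simplified stand-in for python-dotenv's `load_dotenv(override=True)` (no quoting or `export` handling), and the environment is passed in as a mapping so the behavior is easy to check.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, MutableMapping


def write_env_atomically(env_path: Path, updates: Dict[str, str]) -> None:
    """Rewrite env_path with `updates` applied, via temp file + os.replace."""
    lines = env_path.read_text().splitlines() if env_path.exists() else []
    # Drop existing assignments for keys being updated; keep everything else.
    kept = [ln for ln in lines if ln.split("=", 1)[0].strip() not in updates]
    kept += [f"{key}={value}" for key, value in updates.items()]
    # The temp file lives in the same directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(env_path.parent), prefix=".env.")
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(kept) + "\n")
    os.replace(tmp_name, env_path)  # readers see the old file or the new one, never a partial write


def reload_env(env_path: Path, environ: MutableMapping[str, str]) -> None:
    """Simplified stand-in for load_dotenv(override=True): file values win."""
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        environ[key.strip()] = value.strip()
```

In a real process the second argument would be `os.environ`; after `reload_env` returns, any code that reads `DEFAULT_MODEL` at call time sees the new value without a restart.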
Four pytest cases that hit real provider endpoints when credentials are in the environment, skipped cleanly otherwise. Marked `live` so the default `pytest` invocation includes them but a CI / stub-only run can exclude them with `pytest -m "not live"`.

Tests
- test_live_ollama_chat: real chat against a local Ollama server. Discovers an available model via /api/tags, skips if Ollama is unreachable or no models are pulled.
- test_live_azure_ai_foundry_claude: real chat against Azure-hosted Claude. Validates the /anthropic URL suffix path documented in PROVIDER_REGISTRY by sending a real prompt to a real Azure endpoint.
- test_live_azure_openai: real chat against an Azure OpenAI Service deployment. Skipped unless AZURE_OPENAI_DEPLOYMENT is set (since there's no canonical default for "your deployment").
- test_live_switch_provider_transition: starts on local Ollama, invokes SwitchProvider to change to Azure AI Foundry, verifies the TUI flag was touched, os.environ was refreshed in-process, and the very next live call reaches Claude on Azure with no restart. This is the end-to-end proof of the FastAPI runtime-switching guarantee from PR VRSEN#27.

Credentials
- Read only from the shell environment; never written to source.
- Bridges OpenAI-SDK-style names (AZURE_OPENAI_API_KEY, ANTHROPIC_FOUNDRY_RESOURCE, OLLAMA_HOST) to OpenSwarm's LiteLLM-style names (AZURE_API_KEY, AZURE_AI_API_BASE, OLLAMA_API_BASE) inside the fixture so users with either convention can run.
- Each test uses pytest.skip with a clear reason when its credentials are absent, so missing keys never become test failures.

Run
    pytest                 # all tests, live ones skip if no creds
    pytest -m live -v      # only live tests
    pytest -m "not live"   # only stub-friendly unit tests (CI)

Result on the author's machine (Azure AI Foundry + local Ollama): 41 passed, 1 skipped (Azure OpenAI Service, no deployment in env)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
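The credential bridging mentioned under "Credentials" might be sketched as below. Several things here are assumptions: the pairing of source and target names is inferred from the order in which the commit message lists them, the helper names are hypothetical, and a plain copy may not match what the real fixture does (for example, a resource name may need to be turned into a URL first).

```python
from typing import Iterable, List, MutableMapping

# (SDK-style name, LiteLLM-style name); the pairing is assumed from list order.
BRIDGES = [
    ("AZURE_OPENAI_API_KEY", "AZURE_API_KEY"),
    ("ANTHROPIC_FOUNDRY_RESOURCE", "AZURE_AI_API_BASE"),
    ("OLLAMA_HOST", "OLLAMA_API_BASE"),
]


def bridge_credentials(environ: MutableMapping[str, str]) -> List[str]:
    """Copy SDK-style values to LiteLLM-style names when the target is unset."""
    bridged = []
    for source, target in BRIDGES:
        if not environ.get(target) and environ.get(source):
            environ[target] = environ[source]  # plain copy; a real fixture may transform the value
            bridged.append(target)
    return bridged


def missing_credentials(environ: MutableMapping[str, str], required: Iterable[str]) -> List[str]:
    """Return the required names that are absent or empty, for use in a skip reason."""
    return [name for name in required if not environ.get(name)]
```

A test could call `missing_credentials(os.environ, [...])` after bridging and pass a non-empty result to `pytest.skip` as the reason, so an absent key shows up as a skip rather than a failure.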
Author
Live verification: runtime switching works on a real provider transition
Ran
All in one Python process, no restart, no supervisor, no file watcher. That's the end-to-end proof for this PR's claim about FastAPI compatibility.
Full live suite results
The Azure OpenAI Service direct test (
Combined unit + live suite
Summary
Follow-up to #26. That PR documented FastAPI as a no-op for runtime provider switching on the assumption that the agency was constructed once at startup. Reading agency-swarm's request handlers shows the agency is actually rebuilt per-request (`agency_factory(...)` is invoked inside each chat/run call at `endpoint_handlers.py` lines 457, 552, 825). All that's needed for FastAPI to pick up a switch is for `os.environ` to reflect the new `.env` values before the next request rebuilds the agency.

Stacked on #26

This branch is built on top of `feat/azure-ollama-providers`. Until #26 merges, the diff shown here includes both PRs' changes. The actual delta for this PR is 4 files / +98 / -63.

If #26 lands first, this PR's diff will collapse to just the runtime-switching fix.
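The per-request rebuild that the summary relies on can be illustrated with a toy factory. This is a sketch only: the signatures below are hypothetical and are not agency-swarm's actual `agency_factory` or handler API. The point is that a factory which reads the environment at call time picks up a changed `DEFAULT_MODEL`, whereas an object built once keeps the old value.

```python
import os
from typing import Mapping


def agency_factory(environ: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical factory: reads DEFAULT_MODEL at call time, not at import time."""
    return {"model": environ.get("DEFAULT_MODEL", "unset")}


def handle_request(environ: Mapping[str, str] = os.environ) -> str:
    """Hypothetical handler: builds a fresh agency for every request."""
    agency = agency_factory(environ)
    return agency["model"]
```

An agency built once at startup would keep returning the model it was created with; calling the factory inside each handler is what makes an in-process `os.environ` refresh sufficient.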
What changed

- `orchestrator/tools/SwitchProvider.py`: After the atomic `.env` rewrite, the tool now calls `load_dotenv(override=True)` so the running process's `os.environ` reflects the new values. The next request that rebuilds the agency reads them naturally.
- TUI restart flag → best-effort. The switch is now live in-process the moment `run()` returns; the flag is just a UX cue for the TUI to refresh its display. A failed flag touch is non-fatal: the tool reports success because the switch did apply. The previous "Cannot switch — no restart signal available" error path is gone; running outside the TUI loop (i.e. the FastAPI process) is now a supported context.
- Documentation:
  - `server.py` header: removed the "switches stay pinned until restart" warning; describes how per-request rebuilds pick up changes naturally.
  - `orchestrator/instructions.md`: removed the "exit the TUI to apply" instruction. Switches are live immediately. TUI users only `/quit` if they want a fresh display.
  - `SwitchProvider` docstring: explains why FastAPI works (per-request rebuild) and why the flag is now best-effort.

Tests (36 → 38)

- `test_no_flag_env_var_still_succeeds`: verifies the FastAPI-style context (no `OPENSWARM_SWITCH_FLAG` env var) gets a successful switch + `.env` write + `os.environ` refresh.
- `test_oserror_on_flag_touch_does_not_abort_switch`: a failing flag touch returns success because the env reload already applied.
- `test_switch_refreshes_os_environ_for_fastapi_path`: the core guarantee, namely that `os.environ["DEFAULT_MODEL"]` reflects the switch after `run()` returns.
- `test_switch_refreshes_provider_credentials_in_environ`: pre-existing `.env` credentials become visible in `os.environ` post-switch so the next agency build can authenticate.

Test plan

- `endpoint_handlers.py` to confirm `agency_factory` is per-request (verified at lines 457, 552, 825).

Backwards compatibility

No behavior change for the TUI surface: the restart flag is still touched on success and the loop in `run_utils.main()` works exactly as before. The TUI just stops being the only valid execution context for `SwitchProvider`. Existing tests for the TUI path continue to pass.

🤖 Generated with Claude Code
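The best-effort flag behavior described under "What changed" could be sketched as follows. The `OPENSWARM_SWITCH_FLAG` variable name comes from the test descriptions above; the function names and the returned message text are hypothetical, and the sketch assumes the env reload has already happened by the time the flag is touched.

```python
import os
from pathlib import Path
from typing import Mapping


def touch_restart_flag(environ: Mapping[str, str] = os.environ) -> bool:
    """Try to touch the TUI flag file; report whether it worked, never raise."""
    flag = environ.get("OPENSWARM_SWITCH_FLAG")
    if not flag:
        return False  # no TUI loop to notify, e.g. when running under FastAPI
    try:
        Path(flag).touch()
    except OSError:
        return False  # non-fatal: the switch itself has already been applied
    return True


def finish_switch(model: str, environ: Mapping[str, str] = os.environ) -> str:
    """Report success regardless of whether the TUI could be notified."""
    note = "TUI notified" if touch_restart_flag(environ) else "no TUI flag touched"
    return f"Switched to {model} ({note})"
```

Treating the flag as advisory keeps one code path for both surfaces: the TUI gets its refresh cue when the file can be touched, and other callers still receive a success result.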