feat(gemini): add Gemini model support via GEMINI_API_KEY #95
karthiksa wants to merge 4 commits into huggingface:main
Conversation
Adds end-to-end support for Google Gemini models using LiteLLM's native Gemini adapter. GEMINI_API_KEY is resolved automatically by LiteLLM for any model prefixed with "gemini/".

Changes:
- agent/core/llm_params.py: new `gemini/` provider branch that maps reasoning effort levels to thinking_config.thinking_budget token budgets (low=1024, medium=8192, high=24576). "minimal" normalises to "low"; "max"/"xhigh" raise UnsupportedEffortError so the probe cascade walks down to "high" without a wasted network call.
- agent/core/model_switcher.py: `gemini/` prefix bypasses the HF router catalog lookup (same pattern as anthropic/openai); Gemini 2.5 Pro and 2.5 Flash added to SUGGESTED_MODELS; help text updated.
- backend/routes/agent.py: gemini/gemini-2.5-pro added to AVAILABLE_MODELS as a free-tier model (no HF-org gate — billed via the caller's GEMINI_API_KEY, not the Space's ANTHROPIC_API_KEY).
- frontend/src/components/Chat/ChatInput.tsx: Gemini 2.5 Pro added to MODEL_OPTIONS with the Google favicon as avatar.
- README.md: GEMINI_API_KEY documented in the .env quick-start block.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
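The effort-to-budget mapping described above can be sketched as follows. This is a minimal illustration of the behaviour the PR describes, not the actual code in agent/core/llm_params.py; the function name `resolve_gemini_thinking_budget` is a hypothetical stand-in (only `UnsupportedEffortError` and the budget values come from the PR description).

```python
# Illustrative sketch of the gemini/ provider branch: effort level -> token
# budget for thinking_config.thinking_budget, per the PR description.

THINKING_BUDGETS = {"low": 1024, "medium": 8192, "high": 24576}


class UnsupportedEffortError(ValueError):
    """Raised for efforts Gemini can't serve, so the probe cascade can
    fall back to a lower level without a wasted network call."""


def resolve_gemini_thinking_budget(effort: str) -> int:
    if effort == "minimal":          # "minimal" normalises to "low"
        effort = "low"
    if effort in ("max", "xhigh"):   # fail fast; caller walks down to "high"
        raise UnsupportedEffortError(effort)
    try:
        return THINKING_BUDGETS[effort]
    except KeyError:
        raise UnsupportedEffortError(effort) from None
```

Raising instead of silently clamping keeps the fallback decision in the caller, which matches the "probe cascade walks down to high" behaviour the description mentions.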
Covers effort→thinking_budget mapping, edge cases (minimal→low normalisation, strict/non-strict invalid effort), and regression checks to ensure the Anthropic, OpenAI, and HF router paths remain unaffected. Includes an opt-in live smoke-test against the real Gemini API (--live flag).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…inking dict
The previous implementation passed `thinking={"type": "enabled", "budget_tokens": N}`
(Anthropic's format) to LiteLLM for Gemini models. Live testing revealed that
gemini-2.5-pro returned empty content because the Anthropic-style thinking param
was not consumed by LiteLLM's Gemini adapter — the entire token budget went to
reasoning, leaving none for text output.
LiteLLM's Gemini adapter already maps `reasoning_effort` → `thinking_config.thinkingBudget`
natively (confirmed in vertex_and_google_ai_studio_gemini.py). Switching to pass
`reasoning_effort` directly fixes the empty-response bug and simplifies the code.
Live smoke tests confirmed:
- gemini/gemini-2.5-flash with reasoning_effort=low → returns text content ✓
- gemini/gemini-2.5-pro with no effort params + max_tokens=512 → returns text content ✓
Unit tests updated to assert `reasoning_effort` key instead of `thinking` dict.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
@claude please review this PR. Focus on correctness, security, LiteLLM/Bedrock routing impact, and whether it's safe to merge against current main. Keep it concise and prefer inline comments where it matters.
Claude finished @akseljoonas's task in 1m 50s

PR Review: feat(gemini): add Gemini model support via GEMINI_API_KEY
Summary

The routing logic is correct and non-breaking. The Bedrock/Anthropic/OpenAI paths are untouched. However, there are two issues worth addressing before merging — one is a meaningful security/cost gap, and one is a minor inconsistency.

Issues

🔴 Missing quota gate for Gemini 2.5 Pro
```python
{
    "id": "gemini/gemini-2.5-pro",
    "provider": "gemini",
    "tier": "free",  # ← no quota / no org-membership gate
},
```

Gemini 2.5 Pro is comparable in cost to Claude Opus, but it has:
If the intended deployment never sets GEMINI_API_KEY, this may be acceptable. Either add an access guard (HF org membership check, at minimum), or explicitly document the intent ("Gemini only works when the operator sets GEMINI_API_KEY").

🟡
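The access guard the review suggests could look roughly like this. `model_allowed`, `GATED_PROVIDERS`, and the org-membership set are all assumed names for illustration; the actual backend/routes/agent.py API and the HF membership lookup are not shown in this PR.

```python
# Hypothetical sketch of an org-membership gate for expensive free-tier
# models, per the review's suggestion. Membership resolution (e.g. via the
# Hugging Face whoami endpoint) is assumed to happen upstream.

GATED_PROVIDERS = {"gemini"}  # providers billed on the operator's key


def model_allowed(model: dict, user_orgs: set[str],
                  required_org: str = "huggingface") -> bool:
    """Allow gated free-tier models only for members of required_org."""
    if model["provider"] in GATED_PROVIDERS and model["tier"] == "free":
        return required_org in user_orgs
    return True
```

The alternative the review offers — documenting that Gemini is operator-opt-in via GEMINI_API_KEY — avoids the guard entirely but leaves the cost exposure if the key is ever set on a public Space.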
Summary
- `gemini/<model>` routing in `_resolve_llm_params` — LiteLLM picks up `GEMINI_API_KEY` automatically; thinking-capable models (e.g. `gemini-2.5-pro`, `gemini-2.5-flash`) receive `thinking_config.thinking_budget` mapped from effort levels (low=1024, medium=8192, high=24576).
- `gemini/gemini-2.5-pro` and `gemini/gemini-2.5-flash` added to the model switcher suggested list and the backend `AVAILABLE_MODELS` list.
- `GEMINI_API_KEY` documented in `README.md`.

Test plan
- `uv run pytest tests/unit/test_llm_params_gemini.py -v` passes
- Set `GEMINI_API_KEY` and run `uv run python tests/unit/test_llm_params_gemini.py --live` for a live smoke-test
- Select `gemini/gemini-2.5-pro` in `/model` — no routing-info error

🤖 Generated with Claude Code