
feat(gemini): add Gemini model support via GEMINI_API_KEY#95

Open
karthiksa wants to merge 4 commits into huggingface:main from karthiksa:feat/gemini-model-support

Conversation

@karthiksa

Summary

  • Adds gemini/<model> routing in _resolve_llm_params — LiteLLM picks up GEMINI_API_KEY automatically; thinking-capable models (e.g. gemini-2.5-pro, gemini-2.5-flash) receive thinking_config.thinking_budget mapped from effort levels (low=1024, medium=8192, high=24576).
  • Registers gemini/gemini-2.5-pro and gemini/gemini-2.5-flash in the model switcher suggested list and the backend AVAILABLE_MODELS list.
  • Exposes Gemini 2.5 Pro in the frontend chat model picker.
  • Documents GEMINI_API_KEY in README.md.
  • Adds unit tests covering effort→budget mapping, edge cases (minimal→low normalisation, strict/non-strict invalid effort), and regression guards for Anthropic / OpenAI / HF router paths.
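The effort→budget mapping described above can be sketched in a few lines. The dict name matches the PR's `_GEMINI_THINKING_BUDGETS`; the two helper functions are illustrative, not the PR's exact code:

```python
# Sketch of the effort -> thinking budget mapping from the PR summary.
# _GEMINI_THINKING_BUDGETS is the name used in the PR; the helpers below
# are illustrative.
_GEMINI_THINKING_BUDGETS = {"low": 1024, "medium": 8192, "high": 24576}


def normalise_effort(effort: str) -> str:
    """Map effort aliases onto the supported Gemini levels."""
    if effort == "minimal":
        return "low"  # the PR normalises minimal -> low
    return effort


def thinking_budget(effort: str) -> int:
    """Return the thinking-token budget for a given effort level."""
    level = normalise_effort(effort)
    if level not in _GEMINI_THINKING_BUDGETS:
        raise ValueError(f"unsupported effort level: {effort!r}")
    return _GEMINI_THINKING_BUDGETS[level]
```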

Test plan

  • uv run pytest tests/unit/test_llm_params_gemini.py -v passes
  • Set GEMINI_API_KEY and run uv run python tests/unit/test_llm_params_gemini.py --live for a live smoke-test
  • Open the frontend model picker — "Gemini 2.5 Pro" appears in the list
  • Switch to gemini/gemini-2.5-pro in /model — no routing-info error

🤖 Generated with Claude Code

karthik-marala_inmobi and others added 4 commits April 23, 2026 22:09
Adds end-to-end support for Google Gemini models using LiteLLM's native
Gemini adapter. GEMINI_API_KEY is resolved automatically by LiteLLM for
any model prefixed with "gemini/".

Changes:
- agent/core/llm_params.py: new `gemini/` provider branch that maps
  reasoning effort levels to thinking_config.thinking_budget token
  budgets (low=1024, medium=8192, high=24576). "minimal" normalises to
  "low". "max"/"xhigh" raise UnsupportedEffortError so the probe cascade
  walks down to "high" without a wasted network call.
- agent/core/model_switcher.py: `gemini/` prefix bypasses the HF router
  catalog lookup (same pattern as anthropic/openai); Gemini 2.5 Pro and
  2.5 Flash added to SUGGESTED_MODELS; help text updated.
- backend/routes/agent.py: gemini/gemini-2.5-pro added to
  AVAILABLE_MODELS as a free-tier model (no HF-org gate — billed via
  the caller's GEMINI_API_KEY, not the Space's ANTHROPIC_API_KEY).
- frontend/src/components/Chat/ChatInput.tsx: Gemini 2.5 Pro added to
  MODEL_OPTIONS with Google favicon as avatar.
- README.md: GEMINI_API_KEY documented in the .env quick-start block.
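The effort-normalisation behaviour in the `gemini/` branch can be sketched as follows. `UnsupportedEffortError` is the name used in the commit message; the function shape is illustrative:

```python
class UnsupportedEffortError(ValueError):
    """Raised so the probe cascade falls back to a lower effort level
    without making a wasted network call."""


_GEMINI_EFFORTS = ("low", "medium", "high")


def resolve_gemini_effort(effort: str) -> str:
    """Normalise an effort level for the gemini/ provider branch."""
    if effort == "minimal":
        return "low"  # minimal normalises to low
    if effort not in _GEMINI_EFFORTS:
        # "max"/"xhigh" (and anything else unknown) fail fast so the
        # caller walks down to "high".
        raise UnsupportedEffortError(effort)
    return effort
```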

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Covers effort→thinking_budget mapping, edge cases (minimal→low normalisation,
strict/non-strict invalid effort), and regression checks to ensure Anthropic,
OpenAI, and HF router paths remain unaffected. Includes an opt-in live
smoke-test against the real Gemini API (--live flag).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…inking dict

The previous implementation passed `thinking={"type": "enabled", "budget_tokens": N}`
(Anthropic's format) to LiteLLM for Gemini models. Live testing revealed that
gemini-2.5-pro returned empty content because the Anthropic-style thinking param
was not consumed by LiteLLM's Gemini adapter — all token budget went to reasoning
with none left for text output.

LiteLLM's Gemini adapter already maps `reasoning_effort` → `thinking_config.thinkingBudget`
natively (confirmed in vertex_and_google_ai_studio_gemini.py). Switching to pass
`reasoning_effort` directly fixes the empty-response bug and simplifies the code.

Live smoke tests confirmed:
- gemini/gemini-2.5-flash with reasoning_effort=low → returns text content ✓
- gemini/gemini-2.5-pro with no effort params + max_tokens=512 → returns text content ✓

Unit tests updated to assert `reasoning_effort` key instead of `thinking` dict.
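The fix boils down to which key reaches LiteLLM. A minimal sketch of the corrected parameter building (the function name is illustrative; the key names are from the commit message):

```python
# Sketch of the fixed kwargs construction for gemini/ models.
# Passing reasoning_effort lets LiteLLM's Gemini adapter map it to
# thinking_config.thinkingBudget natively; the Anthropic-style
# thinking={"type": "enabled", "budget_tokens": N} dict is NOT used.
from typing import Optional


def build_gemini_params(model: str, effort: Optional[str]) -> dict:
    params = {"model": model}
    if effort is not None:
        params["reasoning_effort"] = effort  # consumed by the Gemini adapter
    return params
```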

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@akseljoonas
Collaborator

@claude please review this PR. Focus on correctness, security, LiteLLM/Bedrock routing impact, and whether it's safe to merge against current main. Keep it concise and prefer inline comments where it matters.

@claude

claude Bot commented Apr 24, 2026

Claude finished @akseljoonas's task in 1m 50s.


PR Review: feat(gemini): add Gemini model support via GEMINI_API_KEY

  • Read changed files
  • Review correctness
  • Review security / access control
  • Review LiteLLM/Bedrock routing impact
  • Assess merge safety
  • Post findings

Summary

The routing logic is correct and non-breaking. The Bedrock/Anthropic/OpenAI paths are untouched. However, there are two issues worth addressing before merging — one is a meaningful security/cost gap, and one is a minor inconsistency.


Issues

🔴 Missing quota gate for Gemini 2.5 Pro

backend/routes/agent.py lines 56–62

{
    "id": "gemini/gemini-2.5-pro",
    "provider": "gemini",
    "tier": "free",   # ← no quota / no org-membership gate
},

Gemini 2.5 Pro is comparable in cost to Claude Opus, but it has:

  • No _require_hf_for_gemini equivalent
  • No daily-quota check (_enforce_claude_quota only fires on "anthropic" in model_id)

If GEMINI_API_KEY is set server-side in the Space, every authenticated user gets unlimited Gemini 2.5 Pro — billed to the Space owner with no cap. The Anthropic path has both an org-membership gate and a per-user daily cap; Gemini needs at least one of those.

If the intended deployment never sets GEMINI_API_KEY in the Space env, this is moot and the model is effectively dead on arrival. But the model listing in the frontend picker will still appear clickable and fail silently, which is confusing.

Either add an access guard (HF org membership check, at minimum), or explicitly document the intent ("Gemini only works when the operator sets GEMINI_API_KEY in their own deployment — not enabled on the public Space") and hide it from the frontend picker when the key isn't present.
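A guard mirroring the Anthropic path's gates might look like the following. All names here (`enforce_gemini_access`, `GEMINI_DAILY_CAP`) are hypothetical; the repo's actual helpers are `_require_hf_for_gemini`-style and `_enforce_claude_quota`, which this sketch only imitates:

```python
# Hypothetical access guard for Gemini models, imitating the org-membership
# gate + daily cap that the review says the Anthropic path already has.
# GEMINI_DAILY_CAP and the function name are illustrative, not from the PR.
import os

GEMINI_DAILY_CAP = 50  # requests per user per day (illustrative value)


def enforce_gemini_access(usage_today: int, is_org_member: bool) -> None:
    if not os.environ.get("GEMINI_API_KEY"):
        # Key absent: surface a clear error instead of failing silently.
        raise RuntimeError("Gemini models are not enabled on this deployment")
    if not is_org_member:
        raise PermissionError("Gemini 2.5 Pro requires HF org membership")
    if usage_today >= GEMINI_DAILY_CAP:
        raise PermissionError("daily Gemini quota exceeded")
```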


🟡 gemini-2.5-flash is in CLI suggested list but not in AVAILABLE_MODELS

agent/core/model_switcher.py line 30 vs backend/routes/agent.py

# model_switcher.py — CLI suggested list
{"id": "gemini/gemini-2.5-flash", "label": "Gemini 2.5 Flash"},

# agent.py — backend AVAILABLE_MODELS — Flash is absent

CLI users who pick gemini/gemini-2.5-flash from /model can use it fine (the CLI doesn't gate against AVAILABLE_MODELS). But any web-frontend call to POST /api/session/{id}/model with Flash will return 400 Unknown model. Flash is also absent from the frontend model picker.

Either add Flash to AVAILABLE_MODELS (and apply the same quota/access considerations as Pro), or drop it from SUGGESTED_MODELS for now.


Minor Notes

frontend/src/components/Chat/ChatInput.tsx line 49

avatarUrl: 'https://www.gstatic.com/lamda/images/gemini_favicon_f069958c85030456e93de685481c559f160ea06.svg',

Hardcoded external asset from gstatic.com — hash in the filename is stable now, but if Google ever rotates it, the avatar silently breaks. Low risk, but worth noting if bundled SVGs are used elsewhere.

agent/core/llm_params.py line 81–82

# LiteLLM maps reasoning_effort → thinking_config.thinkingBudget natively for Gemini;
# we keep this dict for unit-test assertions and documentation only.
_GEMINI_THINKING_BUDGETS = {"low": 1024, "medium": 8192, "high": 24576}

The dict isn't used in any live code path, only imported in tests to assert ordering. That's fine as documentation, but the test_thinking_budgets_are_ordered test only checks relative ordering — it won't catch if LiteLLM's actual budget values differ. Not blocking.


Routing Safety ✅

The routing order in _resolve_llm_params is correct — gemini/ is checked explicitly before the HF fallback, so Gemini IDs will never accidentally route to the HF router. Bedrock, Anthropic, and OpenAI paths are unaffected; the new block is a clean insertion with no side effects on existing logic. Unit tests cover the other-provider regression cases.
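The routing order being described reduces to a prefix dispatch where the HF router is only the fallback. A stripped-down sketch (provider names from the review; the function itself is illustrative, not the repo's `_resolve_llm_params`):

```python
# Illustrative prefix dispatch: explicit providers are matched before the
# HF-router catalog fallback, so gemini/ IDs never reach the HF path.
def route_provider(model_id: str) -> str:
    if model_id.startswith("anthropic/"):
        return "anthropic"
    if model_id.startswith("openai/"):
        return "openai"
    if model_id.startswith("gemini/"):
        return "gemini"
    return "hf_router"  # fallback catalog lookup
```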


Verdict: The core implementation is correct. The cost/access-control gap on Gemini (no quota gate) is the main blocker — address it or document the deployment intent clearly before merging.
