
feat(gemini): add Gemini model support via GEMINI_API_KEY#95

Open
karthiksa wants to merge 4 commits into huggingface:main from karthiksa:feat/gemini-model-support

Conversation

@karthiksa

Summary

  • Adds gemini/<model> routing in _resolve_llm_params — LiteLLM picks up GEMINI_API_KEY automatically; thinking-capable models (e.g. gemini-2.5-pro, gemini-2.5-flash) receive thinking_config.thinking_budget mapped from effort levels (low=1024, medium=8192, high=24576).
  • Registers gemini/gemini-2.5-pro and gemini/gemini-2.5-flash in the model switcher suggested list and the backend AVAILABLE_MODELS list.
  • Exposes Gemini 2.5 Pro in the frontend chat model picker.
  • Documents GEMINI_API_KEY in README.md.
  • Adds unit tests covering effort→budget mapping, edge cases (minimal→low normalisation, strict/non-strict invalid effort), and regression guards for Anthropic / OpenAI / HF router paths.
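The effort→budget mapping described above can be sketched in a few lines. The dict name matches the PR's `_GEMINI_THINKING_BUDGETS`; the two helper functions are illustrative, not the PR's exact code:

```python
# Sketch of the effort -> thinking budget mapping from the PR summary.
# _GEMINI_THINKING_BUDGETS is the name used in the PR; the helpers below
# are illustrative.
_GEMINI_THINKING_BUDGETS = {"low": 1024, "medium": 8192, "high": 24576}


def normalise_effort(effort: str) -> str:
    """Map effort aliases onto the supported Gemini levels."""
    if effort == "minimal":
        return "low"  # the PR normalises minimal -> low
    return effort


def thinking_budget(effort: str) -> int:
    """Return the thinking-token budget for a given effort level."""
    level = normalise_effort(effort)
    if level not in _GEMINI_THINKING_BUDGETS:
        raise ValueError(f"unsupported effort level: {effort!r}")
    return _GEMINI_THINKING_BUDGETS[level]
```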

Test plan

  • uv run pytest tests/unit/test_llm_params_gemini.py -v passes
  • Set GEMINI_API_KEY and run uv run python tests/unit/test_llm_params_gemini.py --live for a live smoke-test
  • Open the frontend model picker — "Gemini 2.5 Pro" appears in the list
  • Switch to gemini/gemini-2.5-pro in /model — no routing-info error

🤖 Generated with Claude Code

karthik-marala_inmobi and others added 4 commits April 23, 2026 22:09
Adds end-to-end support for Google Gemini models using LiteLLM's native
Gemini adapter. GEMINI_API_KEY is resolved automatically by LiteLLM for
any model prefixed with "gemini/".

Changes:
- agent/core/llm_params.py: new `gemini/` provider branch that maps
  reasoning effort levels to thinking_config.thinking_budget token
  budgets (low=1024, medium=8192, high=24576). "minimal" normalises to
  "low". "max"/"xhigh" raise UnsupportedEffortError so the probe cascade
  walks down to "high" without a wasted network call.
- agent/core/model_switcher.py: `gemini/` prefix bypasses the HF router
  catalog lookup (same pattern as anthropic/openai); Gemini 2.5 Pro and
  2.5 Flash added to SUGGESTED_MODELS; help text updated.
- backend/routes/agent.py: gemini/gemini-2.5-pro added to
  AVAILABLE_MODELS as a free-tier model (no HF-org gate — billed via
  the caller's GEMINI_API_KEY, not the Space's ANTHROPIC_API_KEY).
- frontend/src/components/Chat/ChatInput.tsx: Gemini 2.5 Pro added to
  MODEL_OPTIONS with Google favicon as avatar.
- README.md: GEMINI_API_KEY documented in the .env quick-start block.
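The effort-normalisation behaviour in the `gemini/` branch can be sketched as follows. `UnsupportedEffortError` is the name used in the commit message; the function shape is illustrative:

```python
class UnsupportedEffortError(ValueError):
    """Raised so the probe cascade falls back to a lower effort level
    without making a wasted network call."""


_GEMINI_EFFORTS = ("low", "medium", "high")


def resolve_gemini_effort(effort: str) -> str:
    """Normalise an effort level for the gemini/ provider branch."""
    if effort == "minimal":
        return "low"  # minimal normalises to low
    if effort not in _GEMINI_EFFORTS:
        # "max"/"xhigh" (and anything else unknown) fail fast so the
        # caller walks down to "high".
        raise UnsupportedEffortError(effort)
    return effort
```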

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Covers effort→thinking_budget mapping, edge cases (minimal→low normalisation,
strict/non-strict invalid effort), and regression checks to ensure Anthropic,
OpenAI, and HF router paths remain unaffected. Includes an opt-in live
smoke-test against the real Gemini API (--live flag).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…inking dict

The previous implementation passed `thinking={"type": "enabled", "budget_tokens": N}`
(Anthropic's format) to LiteLLM for Gemini models. Live testing revealed that
gemini-2.5-pro returned empty content because the Anthropic-style thinking param
was not consumed by LiteLLM's Gemini adapter — all token budget went to reasoning
with none left for text output.

LiteLLM's Gemini adapter already maps `reasoning_effort` → `thinking_config.thinkingBudget`
natively (confirmed in vertex_and_google_ai_studio_gemini.py). Switching to pass
`reasoning_effort` directly fixes the empty-response bug and simplifies the code.

Live smoke tests confirmed:
- gemini/gemini-2.5-flash with reasoning_effort=low → returns text content ✓
- gemini/gemini-2.5-pro with no effort params + max_tokens=512 → returns text content ✓

Unit tests updated to assert `reasoning_effort` key instead of `thinking` dict.
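The fix boils down to which key reaches LiteLLM. A minimal sketch of the corrected parameter building (the function name is illustrative; the key names are from the commit message):

```python
# Sketch of the fixed kwargs construction for gemini/ models.
# Passing reasoning_effort lets LiteLLM's Gemini adapter map it to
# thinking_config.thinkingBudget natively; the Anthropic-style
# thinking={"type": "enabled", "budget_tokens": N} dict is NOT used.
from typing import Optional


def build_gemini_params(model: str, effort: Optional[str]) -> dict:
    params = {"model": model}
    if effort is not None:
        params["reasoning_effort"] = effort  # consumed by the Gemini adapter
    return params
```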

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@akseljoonas
Collaborator

@claude please review this PR. Focus on correctness, security, LiteLLM/Bedrock routing impact, and whether it's safe to merge against current main. Keep it concise and prefer inline comments where it matters.

@claude

claude Bot commented Apr 24, 2026

Claude finished @akseljoonas's task in 1m 50s.


PR Review: feat(gemini): add Gemini model support via GEMINI_API_KEY

  • Read changed files
  • Review correctness
  • Review security / access control
  • Review LiteLLM/Bedrock routing impact
  • Assess merge safety
  • Post findings

Summary

The routing logic is correct and non-breaking. The Bedrock/Anthropic/OpenAI paths are untouched. However, there are two issues worth addressing before merging — one is a meaningful security/cost gap, and one is a minor inconsistency.


Issues

🔴 Missing quota gate for Gemini 2.5 Pro

backend/routes/agent.py lines 56–62

{
    "id": "gemini/gemini-2.5-pro",
    "provider": "gemini",
    "tier": "free",   # ← no quota / no org-membership gate
},

Gemini 2.5 Pro is comparable in cost to Claude Opus, but it has:

  • No _require_hf_for_gemini equivalent
  • No daily-quota check (_enforce_claude_quota only fires on "anthropic" in model_id)

If GEMINI_API_KEY is set server-side in the Space, every authenticated user gets unlimited Gemini 2.5 Pro — billed to the Space owner with no cap. The Anthropic path has both an org-membership gate and a per-user daily cap; Gemini needs at least one of those.

If the intended deployment never sets GEMINI_API_KEY in the Space env, this is moot and the model is effectively dead on arrival. But the model listing in the frontend picker will still appear clickable and fail silently, which is confusing.

Either add an access guard (HF org membership check, at minimum), or explicitly document the intent ("Gemini only works when the operator sets GEMINI_API_KEY in their own deployment — not enabled on the public Space") and hide it from the frontend picker when the key isn't present.
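A guard mirroring the Anthropic path's gates might look like the following. All names here (`enforce_gemini_access`, `GEMINI_DAILY_CAP`) are hypothetical; the repo's actual helpers are `_require_hf_for_gemini`-style and `_enforce_claude_quota`, which this sketch only imitates:

```python
# Hypothetical access guard for Gemini models, imitating the org-membership
# gate + daily cap that the review says the Anthropic path already has.
# GEMINI_DAILY_CAP and the function name are illustrative, not from the PR.
import os

GEMINI_DAILY_CAP = 50  # requests per user per day (illustrative value)


def enforce_gemini_access(usage_today: int, is_org_member: bool) -> None:
    if not os.environ.get("GEMINI_API_KEY"):
        # Key absent: surface a clear error instead of failing silently.
        raise RuntimeError("Gemini models are not enabled on this deployment")
    if not is_org_member:
        raise PermissionError("Gemini 2.5 Pro requires HF org membership")
    if usage_today >= GEMINI_DAILY_CAP:
        raise PermissionError("daily Gemini quota exceeded")
```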


🟡 gemini-2.5-flash is in CLI suggested list but not in AVAILABLE_MODELS

agent/core/model_switcher.py line 30 vs backend/routes/agent.py

# model_switcher.py — CLI suggested list
{"id": "gemini/gemini-2.5-flash", "label": "Gemini 2.5 Flash"},

# agent.py — backend AVAILABLE_MODELS — Flash is absent

CLI users who pick gemini/gemini-2.5-flash from /model can use it fine (the CLI doesn't gate against AVAILABLE_MODELS). But any web-frontend call to POST /api/session/{id}/model with Flash will return 400 Unknown model. Flash is also absent from the frontend model picker.

Either add Flash to AVAILABLE_MODELS (and apply the same quota/access considerations as Pro), or drop it from SUGGESTED_MODELS for now.


Minor Notes

frontend/src/components/Chat/ChatInput.tsx line 49

avatarUrl: 'https://www.gstatic.com/lamda/images/gemini_favicon_f069958c85030456e93de685481c559f160ea06.svg',

Hardcoded external asset from gstatic.com — hash in the filename is stable now, but if Google ever rotates it, the avatar silently breaks. Low risk, but worth noting if bundled SVGs are used elsewhere.

agent/core/llm_params.py line 81–82

# LiteLLM maps reasoning_effort → thinking_config.thinkingBudget natively for Gemini;
# we keep this dict for unit-test assertions and documentation only.
_GEMINI_THINKING_BUDGETS = {"low": 1024, "medium": 8192, "high": 24576}

The dict isn't used in any live code path, only imported in tests to assert ordering. That's fine as documentation, but the test_thinking_budgets_are_ordered test only checks relative ordering — it won't catch if LiteLLM's actual budget values differ. Not blocking.


Routing Safety ✅

The routing order in _resolve_llm_params is correct — gemini/ is checked explicitly before the HF fallback, so Gemini IDs will never accidentally route to the HF router. Bedrock, Anthropic, and OpenAI paths are unaffected; the new block is a clean insertion with no side effects on existing logic. Unit tests cover the other-provider regression cases.
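The routing order being described reduces to a prefix dispatch where the HF router is only the fallback. A stripped-down sketch (provider names from the review; the function itself is illustrative, not the repo's `_resolve_llm_params`):

```python
# Illustrative prefix dispatch: explicit providers are matched before the
# HF-router catalog fallback, so gemini/ IDs never reach the HF path.
def route_provider(model_id: str) -> str:
    if model_id.startswith("anthropic/"):
        return "anthropic"
    if model_id.startswith("openai/"):
        return "openai"
    if model_id.startswith("gemini/"):
        return "gemini"
    return "hf_router"  # fallback catalog lookup
```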


Verdict: The core implementation is correct. The cost/access-control gap on Gemini (no quota gate) is the main blocker — address it or document the deployment intent clearly before merging.
