Switch from Bedrock to Anthropic endpoint as default. Include support for gpt-5.5 #118

Merged
lewtun merged 6 commits into main from fix-claude on Apr 25, 2026
Conversation

@lewtun
Member

@lewtun lewtun commented Apr 25, 2026

Summary

When running ml-intern with the default config, the CLI used bedrock/us.anthropic.claude-opus-4-6-v1. On machines without permission to invoke that Bedrock inference profile, even a trivial prompt failed with a litellm.APIConnectionError wrapping a Bedrock authorization error for bedrock:InvokeModelWithResponseStream.

This PR switches the default path away from Bedrock, adds direct OpenAI GPT-5 model support to the suggested model list and effort validation, and fixes a probe edge case that showed up while testing GPT-5.5.

What changed

  • switch the default model from Bedrock to direct Anthropic (anthropic/claude-opus-4-6)
  • update /model suggestions to surface direct Anthropic and OpenAI options
  • add direct OpenAI suggestions for openai/gpt-5.4 and openai/gpt-5.5
  • widen OpenAI effort validation so xhigh is accepted for GPT-5.4 / GPT-5.5
  • raise the effort probe token budget so GPT-5 models do not fail capability detection with a false output-limit error
  • document OPENAI_API_KEY usage in the README
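
The effort-validation widening above can be sketched as follows. `_OPENAI_EFFORTS` and the provider-prefix model strings (`anthropic/`, `openai/`) are taken from this PR; the validator helper itself is a hypothetical illustration, not the repository's actual implementation:

```python
# Hypothetical sketch of the widened effort validation described above.
# _OPENAI_EFFORTS is a name from this PR; validate_effort is an
# illustrative helper, not the real code in llm_params.py.

_ANTHROPIC_EFFORTS = {"low", "medium", "high", "xhigh", "max"}
_OPENAI_EFFORTS = {"low", "medium", "high", "xhigh"}  # "xhigh" newly accepted

def validate_effort(model: str, effort: str) -> bool:
    """Return True if `effort` is plausibly accepted for `model`'s provider."""
    if model.startswith("anthropic/"):
        return effort in _ANTHROPIC_EFFORTS
    if model.startswith("openai/"):
        return effort in _OPENAI_EFFORTS
    # Unknown providers: defer to the runtime cascade to discover support.
    return True

print(validate_effort("openai/gpt-5.5", "xhigh"))  # → True
print(validate_effort("openai/gpt-5.5", "max"))    # → False
```

Under this sketch, `max` stays Anthropic-only while `xhigh` is now forwarded to OpenAI models, matching the fallback behavior observed in testing below.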

Testing

Live smoke tests were run against:

  • anthropic/claude-opus-4-6
  • anthropic/claude-opus-4-7
  • openai/gpt-5.4
  • openai/gpt-5.5

Results:

  • Opus 4.6 and 4.7 both accepted max reasoning effort and returned a successful response in streaming and non-streaming modes
  • GPT-5.4 and GPT-5.5 both correctly fell back from max to xhigh and returned a successful response in streaming and non-streaming modes
  • unit tests: uv run pytest tests/unit/test_llm_params.py

cc @jagwar for viz

lewtun and others added 4 commits April 25, 2026 16:08
Co-authored-by: OpenAI Codex <codex@openai.com>
@claude

claude Bot commented Apr 25, 2026

PR Review in Progress

  • Gather context and read changed files
  • Analyze diff and review code
  • Post findings


@claude

claude Bot commented Apr 25, 2026

Claude finished @lewtun's task in 2m 14s


PR Review

No blocking issues — 1 P1
Verdict: ready to merge

What I checked

  • Default model change: bedrock/us.anthropic.claude-opus-4-6-v1 → anthropic/claude-opus-4-6 in configs/main_agent_config.json
  • _OPENAI_EFFORTS expansion to include xhigh and cascade correctness end-to-end
  • GPT-5.5 / GPT-5.4 routing through _resolve_llm_params (the openai/ branch, llm_params.py:165-175)
  • _PROBE_MAX_TOKENS bump from 16 → 64 and its effect on the probe cascade
  • pyproject.toml rename (hf-agentml-intern) — checked for residual references in CI, Dockerfile, and Python imports; none break
  • New tests in tests/unit/test_llm_params.py
  • uv.lock — skipped per policy

P1 — Stale user-facing string and comment now misrepresent xhigh as Anthropic-only

The PR adds "xhigh" to _OPENAI_EFFORTS (llm_params.py:75), correctly allowing it to be forwarded to OpenAI models. But two places still tell users (and future readers) that xhigh is Anthropic-only:

agent/main.py:774 — rendered directly in the terminal when a user runs /effort:

'max' and 'xhigh' are Anthropic-only; the cascade falls back to whatever the model actually accepts.

agent/core/effort_probe.py:35 — code comment above _EFFORT_CASCADE:

# try. First success wins.  ``max`` / ``xhigh`` are Anthropic-only; providers
# that don't accept them raise ``UnsupportedEffortError`` synchronously ...

After this PR xhigh is also offered to openai/ models, so both statements are factually wrong. The cascade still works correctly regardless (it falls back if the API rejects the level), but a user running /effort xhigh after switching to openai/gpt-5.5 will see help text that tells them it won't work when it actually will.


Notes

  • The _PROBE_MAX_TOKENS 16 → 64 bump is well-motivated: frontier reasoning models can exhaust 16 tokens before completing even a trivial reply, causing a false "output limit reached" misclassification in the probe.
  • Tests: test_openai_max_effort_is_still_rejected uses a bare try/except + raise AssertionError instead of pytest.raises() (P2 style nit, not flagging formally).
  • Package rename (hf-agentml-intern): the only residual hf-agent references are in scripts/build_kpis.py ASCII-art comments (HF Hub dataset names, not Python package references) and frontend/package.json (a separate npm package). Neither is broken by this change.
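
The P2 style nit above refers to the idiomatic `pytest.raises()` pattern. A minimal self-contained illustration, where `validate_effort_or_raise` is a hypothetical stand-in for the code under test, not the repository's actual validator:

```python
import pytest

class UnsupportedEffortError(Exception):
    pass

def validate_effort_or_raise(model: str, effort: str) -> None:
    # Hypothetical stand-in: "max" remains Anthropic-only after this PR.
    if model.startswith("openai/") and effort == "max":
        raise UnsupportedEffortError(effort)

def test_openai_max_effort_is_still_rejected():
    # pytest.raises replaces the bare try/except + raise AssertionError pattern.
    with pytest.raises(UnsupportedEffortError):
        validate_effort_or_raise("openai/gpt-5.5", "max")
```

`pytest.raises` also fails the test with a clear message when no exception is raised, which the manual try/except form has to reimplement by hand.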

@claude

claude Bot commented Apr 25, 2026

Claude finished @lewtun's task in 46s


Re-review (follow-up commit 623b5af)

No blocking issues
Verdict: ready to merge

P1 fix acknowledged — The stale xhigh messaging flagged in the previous review has been addressed in both locations:

  • agent/main.py:774 now reads 'max' is Anthropic-only; 'xhigh' is also supported by current OpenAI GPT-5 models.
  • agent/core/effort_probe.py:35 comment updated to match.

Both fixes are accurate and consistent with the _OPENAI_EFFORTS expansion in llm_params.py:75. No new P0 findings in this commit.


@lewtun lewtun merged commit 0545e40 into main Apr 25, 2026
1 check passed
@lewtun lewtun deleted the fix-claude branch April 25, 2026 16:37