Switch from Bedrock to Anthropic endpoint as default. Include support for gpt-5.5 #118

Merged
lewtun merged 6 commits into main from fix-claude on Apr 25, 2026
Conversation

@lewtun
Member

@lewtun lewtun commented Apr 25, 2026

Summary

When running ml-intern with the default config, the CLI used bedrock/us.anthropic.claude-opus-4-6-v1. On machines without permission to invoke that Bedrock inference profile, even a trivial prompt failed with a litellm.APIConnectionError wrapping a Bedrock authorization error for bedrock:InvokeModelWithResponseStream.

This PR switches the default path away from Bedrock, adds direct OpenAI GPT-5 model support to the suggested model list and effort validation, and fixes a probe edge case that showed up while testing GPT-5.5.

What changed

  • switch the default model from Bedrock to direct Anthropic (anthropic/claude-opus-4-6)
  • update /model suggestions to surface direct Anthropic and OpenAI options
  • add direct OpenAI suggestions for openai/gpt-5.4 and openai/gpt-5.5
  • widen OpenAI effort validation so xhigh is accepted for GPT-5.4 / GPT-5.5
  • raise the effort probe token budget so GPT-5 models do not fail capability detection with a false output-limit error
  • document OPENAI_API_KEY usage in the README
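
The effort-validation widening above can be sketched as follows. `_OPENAI_EFFORTS` and the provider-prefix model strings (`anthropic/`, `openai/`) are taken from this PR; the validator helper itself is a hypothetical illustration, not the repository's actual implementation:

```python
# Hypothetical sketch of the widened effort validation described above.
# _OPENAI_EFFORTS is a name from this PR; validate_effort is an
# illustrative helper, not the real code in llm_params.py.

_ANTHROPIC_EFFORTS = {"low", "medium", "high", "xhigh", "max"}
_OPENAI_EFFORTS = {"low", "medium", "high", "xhigh"}  # "xhigh" newly accepted

def validate_effort(model: str, effort: str) -> bool:
    """Return True if `effort` is plausibly accepted for `model`'s provider."""
    if model.startswith("anthropic/"):
        return effort in _ANTHROPIC_EFFORTS
    if model.startswith("openai/"):
        return effort in _OPENAI_EFFORTS
    # Unknown providers: defer to the runtime cascade to discover support.
    return True

print(validate_effort("openai/gpt-5.5", "xhigh"))  # → True
print(validate_effort("openai/gpt-5.5", "max"))    # → False
```

Under this sketch, `max` stays Anthropic-only while `xhigh` is now forwarded to OpenAI models, matching the fallback behavior observed in testing below.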

Testing

Live smoke tests were run against:

  • anthropic/claude-opus-4-6
  • anthropic/claude-opus-4-7
  • openai/gpt-5.4
  • openai/gpt-5.5

Results:

  • Opus 4.6 and 4.7 both accepted max reasoning effort and returned a successful response in streaming and non-streaming modes
  • GPT-5.4 and GPT-5.5 both correctly fell back from max to xhigh and returned a successful response in streaming and non-streaming modes
  • unit tests: uv run pytest tests/unit/test_llm_params.py

cc @jagwar for viz

lewtun and others added 4 commits April 25, 2026 16:08
Co-authored-by: OpenAI Codex <codex@openai.com>
@claude

claude Bot commented Apr 25, 2026

PR Review in Progress

  • Gather context and read changed files
  • Analyze diff and review code
  • Post findings


@claude

claude Bot commented Apr 25, 2026

Claude finished @lewtun's task in 2m 14s


PR Review

No blocking issues — 1 P1
Verdict: ready to merge

What I checked

  • Default model change: bedrock/us.anthropic.claude-opus-4-6-v1 → anthropic/claude-opus-4-6 in configs/main_agent_config.json
  • _OPENAI_EFFORTS expansion to include xhigh and cascade correctness end-to-end
  • GPT-5.5 / GPT-5.4 routing through _resolve_llm_params (the openai/ branch, llm_params.py:165-175)
  • _PROBE_MAX_TOKENS bump from 16 → 64 and its effect on the probe cascade
  • pyproject.toml rename (hf-agentml-intern) — checked for residual references in CI, Dockerfile, and Python imports; none break
  • New tests in tests/unit/test_llm_params.py
  • uv.lock — skipped per policy

P1 — Stale user-facing string and comment now misrepresent xhigh as Anthropic-only

The PR adds "xhigh" to _OPENAI_EFFORTS (llm_params.py:75), correctly allowing it to be forwarded to OpenAI models. But two places still tell users (and future readers) that xhigh is Anthropic-only:

agent/main.py:774 — rendered directly in the terminal when a user runs /effort:

'max' and 'xhigh' are Anthropic-only; the cascade falls back to whatever the model actually accepts.

agent/core/effort_probe.py:35 — code comment above _EFFORT_CASCADE:

# try. First success wins.  ``max`` / ``xhigh`` are Anthropic-only; providers
# that don't accept them raise ``UnsupportedEffortError`` synchronously ...

After this PR xhigh is also offered to openai/ models, so both statements are factually wrong. The cascade still works correctly regardless (it falls back if the API rejects the level), but a user running /effort xhigh after switching to openai/gpt-5.5 will see help text that tells them it won't work when it actually will.


Notes

  • The _PROBE_MAX_TOKENS 16 → 64 bump is well-motivated: frontier reasoning models can exhaust 16 tokens before completing even a trivial reply, causing a false "output limit reached" misclassification in the probe.
  • Tests: test_openai_max_effort_is_still_rejected uses a bare try/except + raise AssertionError instead of pytest.raises() (P2 style nit, not flagging formally).
  • Package rename (hf-agentml-intern): the only residual hf-agent references are in scripts/build_kpis.py ASCII-art comments (HF Hub dataset names, not Python package references) and frontend/package.json (a separate npm package). Neither is broken by this change.
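
The P2 style nit above refers to the idiomatic `pytest.raises()` pattern. A minimal self-contained illustration, where `validate_effort_or_raise` is a hypothetical stand-in for the code under test, not the repository's actual validator:

```python
import pytest

class UnsupportedEffortError(Exception):
    pass

def validate_effort_or_raise(model: str, effort: str) -> None:
    # Hypothetical stand-in: "max" remains Anthropic-only after this PR.
    if model.startswith("openai/") and effort == "max":
        raise UnsupportedEffortError(effort)

def test_openai_max_effort_is_still_rejected():
    # pytest.raises replaces the bare try/except + raise AssertionError pattern.
    with pytest.raises(UnsupportedEffortError):
        validate_effort_or_raise("openai/gpt-5.5", "max")
```

`pytest.raises` also fails the test with a clear message when no exception is raised, which the manual try/except form has to reimplement by hand.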

@claude

claude Bot commented Apr 25, 2026

Claude finished @lewtun's task in 46s


Re-review (follow-up commit 623b5af)

No blocking issues
Verdict: ready to merge

P1 fix acknowledged — The stale xhigh messaging flagged in the previous review has been addressed in both locations:

  • agent/main.py:774 now reads 'max' is Anthropic-only; 'xhigh' is also supported by current OpenAI GPT-5 models.
  • agent/core/effort_probe.py:35 comment updated to match.

Both fixes are accurate and consistent with the _OPENAI_EFFORTS expansion in llm_params.py:75. No new P0 findings in this commit.


@lewtun lewtun merged commit 0545e40 into main Apr 25, 2026
1 check passed
@lewtun lewtun deleted the fix-claude branch April 25, 2026 16:37