Skip to content

Don't ship 'strict': false on tool definitions when the tool didn't opt in#7782

Open
alvinttang wants to merge 1 commit into
microsoft:mainfrom
alvinttang:fix/openai-tool-strict-omit-default
Open

Don't ship 'strict': false on tool definitions when the tool didn't opt in#7782
alvinttang wants to merge 1 commit into
microsoft:mainfrom
alvinttang:fix/openai-tool-strict-omit-default

Conversation

@alvinttang
Copy link
Copy Markdown

Was poking at AgentChat 0.4.7+ with vLLM and got hit by #5814extra_forbidden — loc=('body','tools',0,'function','strict'). Same story other folks have been reporting with Qwen/DashScope and Mistral via LiteLLM.

convert_tools() in autogen-ext's OpenAI client always shipped strict in the function payload. The dead-looking guard if "strict" in tool_schema at line 264 was, well, dead — BaseTool.schema always sets strict=<bool>, so every request went out with the key. OpenAI accepts it. Almost nobody else does.

@ekzhu had agreed in the thread that strict should be optional. This patch just makes the serialization match: build the FunctionDefinition without strict, then add function_def["strict"] = True only when the tool actually opted in. FunctionDefinition is TypedDict(total=False) so absence is valid, and OpenAI's documented default for the missing key is False — same semantics, just doesn't trip the other servers' schema validators.

Reproduction in test

test_convert_tools_omits_strict_when_not_enabled builds a FunctionTool with no strict flag and asserts "strict" not in function_def. Companion test_convert_tools_includes_strict_when_enabled covers the opt-in case.

Without the fix:

AssertionError: convert_tools must not emit 'strict' on function definitions
Got: {'name': 'echo', 'description': '...', 'parameters': {...}, 'strict': False}

After the fix: 2/2 pass. Updated test_tool_calling's _expected_function_payload helper to strip strict from the expected dict so the existing assertions still hold. Full pytest tests/models/: 161 passed, 64 skipped. ruff check + ruff format clean. pyright delta: 0 (24 pre-existing errors unchanged).

Notes

  • Strict-tool tests (test_create_args_with_strict_tools) keep emitting strict=True and stay green.
  • count_tokens_openai doesn't read function["strict"], so the token-counting path is unaffected.
  • Behaviour for an OpenAI endpoint with a non-strict tool: identical (omitted == False).
  • Behaviour for OpenAI-compatible endpoints (vLLM/Qwen/Mistral): no longer rejected with extra_forbidden.

Refs #5814

…ible servers

OpenAI-compatible chat-completion servers (vLLM, Qwen-Tongyi/DashScope,
Mistral via LiteLLM, ...) reject any unknown field on the function tool
definition with:

    extra_forbidden — 'loc': ('body', 'tools', 0, 'function', 'strict')

``convert_tools`` was emitting ``strict: false`` for every tool that did
not explicitly opt in, breaking these endpoints. OpenAI's own API treats
``strict`` as optional with an implicit default of ``False`` when the key
is absent — so dropping the key for the non-strict case preserves OpenAI
semantics while restoring compatibility with everyone else.

Behaviour:
  * Tool with ``strict=False`` (default) -> ``strict`` key absent.
  * Tool with ``strict=True`` -> ``strict: true`` preserved.

Refs microsoft#5814
Copy link
Copy Markdown

@t0ugh-sys t0ugh-sys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Thank you for this well-scoped fix! The change is clean and addresses a real compatibility problem that many users have hit (vLLM, Qwen/DashScope, Mistral via LiteLLM). A few observations:

✅ What looks good

  1. Clear rationale and comment. The inline comment explains why strict is omitted rather than just what is done, and links the relevant issue (#5814). This is excellent for future maintainers.

  2. Minimal, focused diff. The production change is ~8 lines — only convert_tools() is touched. No unnecessary refactoring, no scope creep.

  3. Good test coverage. Two new dedicated tests cover both the omission (strict not present) and the opt-in (strict=True propagated) paths. The existing test_tool_calling assertions are updated cleanly with a helper to keep them DRY.

  4. Correct use of tool_schema.get("strict"). Using .get() instead of the previous if "strict" in tool_schema guard is both safer and correctly treats any falsy value (including explicit False) as "don't emit."

⚠️ Potential concerns

  1. Edge case: strict=0 or other falsy-but-intentional values. tool_schema.get("strict") will silently drop strict: 0 or strict: "". This is almost certainly fine in practice (nobody sets strict=0), but worth noting. If you wanted to be maximally explicit, if tool_schema.get("strict") is True would make the intent even clearer — though the current approach is pragmatic and correct for all realistic inputs.

  2. FunctionDefinition is a TypedDict(total=False) — mutation after construction. The code builds function_def without strict, then conditionally sets function_def["strict"] = True. This works because TypedDict(total=False) allows missing keys, and runtime TypedDict instances are just plain dicts in Python. However, static type checkers may not fully track the conditional addition. Since you mention pyright delta is 0, this seems fine in practice — just flagging it.

  3. No changelog entry. This is a user-facing behavioral change (even if the previous behavior was arguably a bug). Consider adding a changelog/note so users of downstream tools know strict: false will no longer appear in API payloads. This may matter for users who were (incorrectly) relying on the field's presence.

  4. The helper in test_tool_calling could be slightly more defensive. _expected_function_payload filters by k != "strict", which assumes strict is the only key that might be conditionally present. If more keys are added this way in the future, the helper would need updating. This is a minor nit — it's fine for now.

💡 Suggestion

Consider adding a brief comment near the .get("strict") check noting that BaseTool.schema always includes strict (even when False), so the .get() returning None only happens for raw dict schemas that don't include the field. This would help future readers understand why the condition works for both FunctionTool and raw schema paths.

Summary

This is a solid, well-tested fix for a real interoperability issue. The change is minimal, the tests are thorough, and the reasoning is sound. LGTM with minor suggestions above. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants