
Multi-turn tool call chat completions conversations#712

Open
jaredoconnell wants to merge 9 commits into vllm-project:main from
jaredoconnell:feat/multi-turn-tools-chat

Conversation

@jaredoconnell
Collaborator

Summary

This PR adds client-side chat completions conversations to the http backend.

Details

  • The design is data-driven. The only data passed to the backend is an API field not included in datasets (required vs auto) and the behavior for handling missing tool calls.
  • Allows external datasets or synthetic data. For synthetic data, it's a simple JSON result with a field populated by the same generation logic as any other synthetic data in GuideLLM.
  • Allows specifying tool calls as auto or required to the model, which is useful for testing various scenarios. Models behave differently depending on the value set; required is best for predictability.
  • Allows specifying how to handle missing tool calls: whether a missing call is acceptable or an error condition, and, if acceptable, whether to end the conversation early or continue.
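
For illustration, a multi-turn dataset row with per-turn tool columns might look roughly like this (field names are hypothetical sketches, not the exact schema used by GuideLLM):

```json
{
  "prompt_0": "What is the weather in Boston?",
  "tools_0": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}
      }
    }
  ],
  "prompt_1": "Thanks, and in Denver?"
}
```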

Test Plan

  • Run the tests
  • Follow the documentation to run vLLM with tool calls enabled

Related Issues


  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

Assisted-by: Cursor AI
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Adds variable size responses for synthetic data, and better handles edge cases for external datasets.
Also improves documentation.

Assisted-by: Cursor AI
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Assisted-by: Cursor AI
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Assisted-by: Cursor AI
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Extracts functionality to new static methods.

Assisted-by: Claude Code Sonnet 4.5
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Collaborator

@sjmonson sjmonson left a comment


Not finished reviewing; will add more comments in a bit. GitHub is acting up and not letting me add to this review for some reason.

Comment thread src/guidellm/backends/openai/http.py Outdated
Comment thread src/guidellm/benchmark/outputs/html.py Outdated
Comment on lines +164 to +169
     # Preserve the original turn index from the column name so
     # that sparse columns (e.g. tools_0, tools_3) stay aligned
     # with the turns they belong to.
     for original_turn, column_name in sorted(turn_columns):
         column_type = cast("GenerativeDatasetColumnType", column_type)
-        mappings[(column_type, turn)].append((index, column_name))
+        mappings[(column_type, original_turn)].append((index, column_name))
Collaborator


I'm fine with this change since I debated this behavior initially, but remove the comment and rename the var back to turn since original_turn is meaningless to someone who is not looking at the original code.

Comment thread src/guidellm/schemas/info.py Outdated
Comment on lines +171 to +179
stop_conversation: bool = Field(
    default=False,
    description=(
        "When True, the worker cancels all remaining turns in the "
        "conversation after this request completes successfully. "
        "Set by the backend when a turn's result means the conversation "
        "should not continue (e.g. expected tool call not produced)."
    ),
)
Collaborator


We already use exceptions to trigger early stops from the backend, so I prefer that route. "error_stop" can throw any exception, and "ignore_stop" might work by throwing asyncio.CancelledError (see a later comment for examples).
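
As a rough sketch of that route (the class and function names here are hypothetical, not GuideLLM's actual ones), the three missing-tool-call behaviors could map onto exceptions like this:

```python
import asyncio


class ConversationStopError(Exception):
    """Raised by the backend when a turn's result should end the conversation."""


def handle_missing_tool_call(behavior: str) -> None:
    # Map each configured behavior to a control-flow outcome.
    if behavior == "error_stop":
        # Any exception marks the turn as an error and cancels remaining turns.
        raise ConversationStopError("expected tool call not produced")
    if behavior == "ignore_stop":
        # CancelledError unwinds the remaining turns without marking an error.
        raise asyncio.CancelledError()
    # "ignore_continue": fall through and proceed to the next turn.
```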

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need these changes with above suggestion.

Comment thread src/guidellm/backends/openai/request_handlers.py Outdated
Comment thread src/guidellm/backends/openai/http.py Outdated
Comment thread src/guidellm/backends/openai/http.py Outdated
Comment thread src/guidellm/backends/openai/request_handlers.py Outdated
Comment on lines +583 to +594
# Tool calling requires the model to stop naturally after producing
# valid JSON; ignore_eos would force generation past that point and
# break the server's constrained decoding grammar.
body.pop("ignore_eos", None)
body.pop("stop", None)

# On tool-call turns, let the model finish valid JSON naturally;
# max_completion_tokens would truncate output mid-JSON and corrupt
# the arguments sent in conversation history on follow-up turns.
if data.expects_tool_call:
    body.pop("max_completion_tokens", None)
    body.pop("max_tokens", None)
Collaborator


This does not seem right. Either all of these key pops belong under if data.expects_tool_call, or none of them do. If you pop ignore_eos but not max_tokens, you end up with the same problem mentioned in the first comment.

Collaborator


It's also a very bad idea to remove these keys unless we absolutely have to, because it could destroy benchmark reproducibility. So if you haven't already, I would suggest you double-check the behavior.
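
One way to follow the suggestion above while keeping ordinary turns reproducible (a sketch; body and expects_tool_call stand in for the handler's real state) would be to gate every pop on the tool-call turn:

```python
def strip_generation_limits(body: dict, expects_tool_call: bool) -> dict:
    """Remove keys that would break constrained decoding, but only on
    tool-call turns, so non-tool turns keep their exact sampling
    parameters and benchmark reproducibility is preserved."""
    if expects_tool_call:
        for key in ("ignore_eos", "stop", "max_completion_tokens", "max_tokens"):
            body.pop(key, None)
    return body
```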

Comment thread src/guidellm/backends/openai/request_handlers.py Outdated
Comment thread src/guidellm/backends/openai/request_handlers.py Outdated
Comment thread src/guidellm/backends/openai/request_handlers.py Outdated
Comment thread src/guidellm/schemas/response.py Outdated
Comment thread src/guidellm/schemas/response.py Outdated
Comment thread src/guidellm/__main__.py Outdated
Comment on lines +411 to +433
# Tool calling configuration
@click.option(
    "--tool-choice",
    type=str,
    default=None,
    help=(
        'Tool choice mode: "required", "auto", or "none". '
        "Controls whether the model is forced to produce tool calls on "
        "tool-call turns. Overrides the per-request default (required) set "
        "when tools come from the dataset."
    ),
)
@click.option(
    "--tool-call-missing-behavior",
    type=click.Choice(["ignore_continue", "ignore_stop", "error_stop"]),
    default=None,
    help=(
        "What the worker does when a tool call is expected but the model "
        "does not produce one. ignore_continue: continue to next turn, "
        "ignore_stop: cancel remaining turns, error_stop: error and "
        "cancel remaining turns. Default: error_stop."
    ),
)
Collaborator


No new top-level args. Handle this in --backend-kwargs.
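
A hedged sketch of what that might look like on the command line (the JSON keys mirror the option names above and are illustrative, not a confirmed schema):

```bash
# Hypothetical: route tool-call options through --backend-kwargs as JSON
# rather than dedicated top-level flags.
guidellm benchmark \
  --target http://localhost:8000 \
  --backend-kwargs '{"tool_choice": "required", "tool_call_missing_behavior": "error_stop"}'
```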

Comment thread src/guidellm/__main__.py Outdated
Comment thread docs/guides/multiturn.md
Collaborator


This should be its own guide, not inserted into the generic Multiturn one.

Collaborator Author


Moved.

These are the diffs recommended in the comments. They are untested, and require some follow-up changes.

Co-authored-by: Samuel Monson <smonson@irbash.net>
Signed-off-by: Jared O'Connell <46976761+jaredoconnell@users.noreply.github.com>
@mergify
Contributor

mergify Bot commented May 5, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jaredoconnell.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label May 5, 2026
Moves documentation. Switches fully to exceptions to stop conversations early.
Also includes info gained from vLLM contributor.

Assisted-by: Cursor AI
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Comment thread docs/guides/multiturn.md
Collaborator Author


Moved.

]

# Short placeholder used when no tool_response_tokens size is configured.
DEFAULT_SYNTHETIC_TOOL_RESPONSE = '{"status": "ok"}'
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 416 to 459
@@ -434,6 +449,12 @@ async def _process_next_request( # noqa: C901
             logger.opt(exception=True).debug(
                 f"Backend exception for request {request_info.request_id}"
             )
+            # Cancel remaining conversation turns on backend error
+            for skip_req, skip_info in conversation:
+                skip_info.error = f"Cancelled: {request_info.error}"
+                skip_info.timings.resolve_end = time.time()
+                self._send_update("cancelled", None, skip_req, skip_info)
+            conversation.clear()
         finally:
             if request_info is not None:
Collaborator Author


What do you think of this design?


Development

Successfully merging this pull request may close these issues.

Support benchmarking of reasoning and tool calling Chat Completions requests

2 participants