feat: accumulate function_call events in streaming path #59

Merged: franciscojavierarceo merged 4 commits into vllm-project:main from ashwing:feat/function-call-accumulation on Jun 17, 2026

ashwing commented Jun 17, 2026

Collaborator

Summary

The streaming path's ResponseAccumulator handled text message events but had no handlers for function_call events, so they fell through the wildcard arm and were dropped. This PR adds the missing accumulation so that streaming responses include FunctionCall output items, matching the blocking JSON path.

Prerequisite for #51 (execute_loop needs FunctionCall items available in the streaming path to dispatch tools).

Before

A client sends a streaming request with tools:

{"model": "meta-llama/...", "stream": true, "input": "What's the weather?", "tools": [{"type": "function", "name": "get_weather", ...}]}

vLLM responds with function_call SSE events, but the accumulator ignores them. The client gets back:

{"status": "completed", "output": []}

After

The same request now produces:

{
  "status": "completed",
  "output": [{
    "type": "function_call",
    "id": "fc_1",
    "call_id": "call_abc",
    "name": "get_weather",
    "arguments": "{\"location\":\"Paris\"}",
    "status": "completed"
  }]
}

What's handled

  • OutputItemAdded with item_type == "function_call" — starts a new in-flight FunctionToolCall
  • FunctionCallArgumentsDelta — appends to argument buffer
  • FunctionCallArgumentsDone — finalizes with authoritative arguments, pushes to output
  • Multiple function calls in one response (parallel tool use)
  • Interleaved message → function_call → message ordering preserved
  • Stream disconnect mid-arguments — partial args retained via forced finalize
  • Orphaned deltas (no active function call) — safely cleared before next call
  • Coexists with reasoning accumulation (PR #57, "feat: support reasoning output items"): unified match with three-way type dispatch
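
The lifecycle in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in accumulator.rs: the `Event`, `OutputItem`, `InFlightCall`, and `Accumulator` types are simplified, hypothetical stand-ins, the "incomplete" status on forced finalize is an assumption, and orphaned deltas are simply dropped here rather than buffered and cleared.

```rust
/// Finalized output items, in the order the stream produced them.
#[derive(Debug, Clone, PartialEq)]
pub enum OutputItem {
    Message(String),
    FunctionCall { call_id: String, name: String, arguments: String, status: String },
}

/// Hypothetical, simplified stand-ins for the SSE events named in the PR.
pub enum Event {
    OutputItemAdded { item_type: String, call_id: String, name: String },
    OutputTextDone(String),
    FunctionCallArgumentsDelta(String),
    FunctionCallArgumentsDone(String),
    Other,
}

/// A function call that has started but has not yet seen its Done event.
struct InFlightCall {
    call_id: String,
    name: String,
    args: String,
}

#[derive(Default)]
pub struct Accumulator {
    output: Vec<OutputItem>,
    in_flight: Option<InFlightCall>,
}

impl Accumulator {
    pub fn process_event(&mut self, event: Event) {
        match event {
            Event::OutputItemAdded { item_type, call_id, name } if item_type == "function_call" => {
                // Close out any call that never saw its Done event, then start a new one.
                self.force_finalize();
                self.in_flight = Some(InFlightCall { call_id, name, args: String::new() });
            }
            Event::FunctionCallArgumentsDelta(delta) => {
                // Deltas with no active call (orphans) are ignored in this sketch.
                if let Some(call) = self.in_flight.as_mut() {
                    call.args.push_str(&delta);
                }
            }
            Event::FunctionCallArgumentsDone(arguments) => {
                // The Done payload is authoritative and replaces the buffered deltas.
                if let Some(call) = self.in_flight.take() {
                    self.output.push(OutputItem::FunctionCall {
                        call_id: call.call_id,
                        name: call.name,
                        arguments,
                        status: "completed".to_string(),
                    });
                }
            }
            Event::OutputTextDone(text) => self.output.push(OutputItem::Message(text)),
            // Wildcard arm: anything not handled above is ignored.
            _ => {}
        }
    }

    /// Push a still-open call with whatever arguments were buffered.
    /// The "incomplete" status is an assumption made for this sketch.
    fn force_finalize(&mut self) {
        if let Some(call) = self.in_flight.take() {
            self.output.push(OutputItem::FunctionCall {
                call_id: call.call_id,
                name: call.name,
                arguments: call.args,
                status: "incomplete".to_string(),
            });
        }
    }

    /// Called at end of stream (or on disconnect) to collect the output.
    pub fn finish(mut self) -> Vec<OutputItem> {
        self.force_finalize();
        self.output
    }
}
```

Because items are pushed in event order, a message → function_call → message stream keeps its ordering, and a disconnect between the last delta and the Done event still yields a FunctionCall item carrying the partial arguments.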

Test Plan

Unit tests (8 in accumulator.rs):

  • Full lifecycle, parallel tool use, interleaved with messages, orphaned deltas, forced finalize on disconnect

Cassette integration tests (12 in accumulator_cassette_test.rs):

  • tool_choice=auto streaming + non-streaming (parallel tool calls)
  • tool_choice=required streaming + non-streaming
  • tool_choice=named streaming + non-streaming
  • tool_choice=none streaming + non-streaming (zero function calls)
  • Reasoning streaming — Qwen3 (reasoning + message)
  • Reasoning streaming — GPT-oss (reasoning only, see note below)
  • Legacy Gemma4 function_call + text-only regression guard

Non-streaming tests exercise the from_json path against the same cassettes.

All tests pass; cargo fmt --check and cargo clippy -- -D warnings are both clean.

Note: GPT-oss streaming gap

While testing the reasoning cassettes, I found that GPT-oss doesn't emit response.output_item.added for the message item after reasoning; it jumps from reasoning_text.done → output_text.done → output_item.done. The streaming accumulator can't capture that message because no output_item.added creates the in-flight state. Qwen3 emits the event correctly. Not blocking: the full output is in the response.completed payload. Filed as #62 for follow-up.
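
To make the gap concrete, here is a small sketch of why a missing output_item.added loses the message. It assumes, as described above, that text events are only recorded when an in-flight item of the matching type exists. `SseEvent`, `Item`, and `accumulate` are hypothetical names invented for this illustration and do not mirror the real accumulator's types.

```rust
#[derive(Debug, PartialEq)]
pub enum Item {
    Reasoning(String),
    Message(String),
}

/// Hypothetical stand-ins for the response.* SSE events mentioned in the note.
pub enum SseEvent {
    OutputItemAdded { item_type: String },
    ReasoningTextDone(String),
    OutputTextDone(String),
    OutputItemDone,
}

/// Feed events through a tiny accumulator that requires in-flight state.
pub fn accumulate(events: Vec<SseEvent>) -> Vec<Item> {
    let mut output = Vec::new();
    let mut in_flight: Option<Item> = None;
    for event in events {
        match event {
            // Only output_item.added creates in-flight state.
            SseEvent::OutputItemAdded { item_type } => {
                in_flight = match item_type.as_str() {
                    "reasoning" => Some(Item::Reasoning(String::new())),
                    "message" => Some(Item::Message(String::new())),
                    _ => None,
                };
            }
            SseEvent::ReasoningTextDone(text) => {
                if let Some(Item::Reasoning(buf)) = in_flight.as_mut() {
                    *buf = text;
                }
            }
            // With no in-flight message item, this text has nowhere to go.
            SseEvent::OutputTextDone(text) => {
                if let Some(Item::Message(buf)) = in_flight.as_mut() {
                    *buf = text;
                }
            }
            SseEvent::OutputItemDone => {
                if let Some(item) = in_flight.take() {
                    output.push(item);
                }
            }
        }
    }
    output
}
```

Under these assumptions, a GPT-oss-style sequence (reasoning added, reasoning_text.done, output_text.done, output_item.done) yields only the reasoning item, while a Qwen3-style sequence that emits output_item.added for the message yields both items.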

ashwing commented Jun 17, 2026

Collaborator Author

@maralbahari This is the function_call accumulation PR we discussed in #51. Ready for review when you get a chance.

Covers the full streaming lifecycle: OutputItemAdded(function_call) → argument deltas → FunctionCallArgumentsDone → finalized OutputItem::FunctionCall in the response payload. Also handles parallel tool calls, interleaving with text messages, and stream disconnect with partial args.

Once this lands, #51's dispatch_tools will have real FunctionCall items to work with in the streaming path.

@maralbahari

Collaborator

@ashwing thank you for the quick PR.
I have added #60, which records cassettes against upstream vLLM covering the request payload scenarios for the different tool_choice options, so that we can test this comprehensively. Could you please try them out?

franciscojavierarceo pushed a commit that referenced this pull request Jun 17, 2026
## Summary
- Adds `--tools` and `--tool-choice` options to `record_cassette.py` to
inject tool definitions and tool_choice into requests
- Adds `tool_calls/tools.json` with 8 tool definitions (get_weather,
get_time, get_stock_price, search_web, translate_text, calculate,
send_email, read_file)
- Adds `record_tool_call_cassettes.sh` to record 8 cassettes covering
all four tool_choice modes (auto, none, required, named) × streaming and
non-streaming

The cassettes recorded in this PR help test #59.

## How to Record

**1. Start vLLM with tool-call support:**

```bash
vllm serve Qwen/Qwen3-30B-A3B-FP8 \
    --tool-call-parser hermes \
    --enable-auto-tool-choice \
    --port 5050
```

**2. Run the cassette recorder:**

```bash
VLLM_URL=http://0.0.0.0:5050 MODEL=Qwen/Qwen3-30B-A3B-FP8 bash tests/cassettes/record_tool_call_cassettes.sh
```

---------

Signed-off-by: maral <maralbahari.98@gmail.com>
ashwing added 4 commits June 17, 2026 13:39
The ResponseAccumulator's process_event previously dropped all
function_call SSE events via the wildcard arm, causing streaming
responses to lose tool-call output items. This adds handlers for
OutputItemAdded(function_call), FunctionCallArgumentsDelta, and
FunctionCallArgumentsDone — matching the blocking JSON path's
behavior and unblocking execute_loop for streaming tool dispatch.

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
…tion

Feeds real vLLM SSE recordings (gemma-4-26B) through the full
accumulator pipeline and verifies OutputItem::FunctionCall is
produced with correct name, arguments, call_id, and usage.
Also adds a text-only regression guard ensuring no function_call
items leak from text-only streams.

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Signed-off-by: Ashwin Giridharan <girida@amazon.com>
…lator

Add 10 new cassette tests covering:
- All tool_choice modes streaming (auto, required, named, none)
- All tool_choice modes non-streaming (validates from_json path)
- Reasoning streaming (Qwen3 and GPT-oss)

Validates the accumulator correctly handles real multi-tool streaming
responses and non-streaming JSON responses from multiple model families.

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
franciscojavierarceo merged commit e1b9f6d into vllm-project:main on Jun 17, 2026
3 checks passed
ashwing commented Jun 17, 2026

Collaborator Author

@maralbahari thanks! I rebased on top of #60 and added cassette tests that exercise all 4 tool_choice modes through the accumulator:

  • auto streaming + non-streaming (parallel tool calls — get_stock_price + search_web)
  • required streaming + non-streaming
  • named streaming + non-streaming
  • none streaming + non-streaming (asserts zero function calls)

Also added reasoning cassette tests (Qwen3 + GPT-oss) as a regression guard — found an interesting edge case with GPT-oss where it skips output_item.added for the message after reasoning. Filed #62 for follow-up.

12 cassette tests total, all passing.


3 participants