feat: accumulate function_call events in streaming path by ashwing · Pull Request #59 · vllm-project/agentic-api

ashwing · 2026-06-17T05:45:01Z

Summary

The streaming path's ResponseAccumulator handles text message events but didn't yet have handlers for function_call events — they fell through the wildcard arm. This adds the missing accumulation so streaming responses include FunctionCall output items, matching the blocking JSON path.

Prerequisite for #51 (execute_loop needs FunctionCall items available in the streaming path to dispatch tools).

Before

A client sends a streaming request with tools:

{"model": "meta-llama/...", "stream": true, "input": "What's the weather?", "tools": [{"type": "function", "name": "get_weather", ...}]}

vLLM responds with function_call SSE events, but the accumulator ignores them. The client gets back:

{"status": "completed", "output": []}

After

The same request now produces:

{
  "status": "completed",
  "output": [{
    "type": "function_call",
    "id": "fc_1",
    "call_id": "call_abc",
    "name": "get_weather",
    "arguments": "{\"location\":\"Paris\"}",
    "status": "completed"
  }]
}

What's handled

OutputItemAdded with item_type == "function_call" — starts a new in-flight FunctionToolCall
FunctionCallArgumentsDelta — appends to argument buffer
FunctionCallArgumentsDone — finalizes with authoritative arguments, pushes to output
Multiple function calls in one response (parallel tool use)
Interleaved message → function_call → message ordering preserved
Stream disconnect mid-arguments — partial args retained via forced finalize
Orphaned deltas (no active function call) — safely cleared before next call
Coexists with reasoning accumulation (PR #57, "feat: support reasoning output items"): unified match with three-way type dispatch
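The lifecycle in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in accumulator.rs: `SketchEvent`, `SketchFunctionCall`, `FunctionCallSketch`, and the `"incomplete"` status given to force-finalized calls are all assumptions made for the example, and orphaned deltas are simply dropped here rather than buffered and cleared.

```rust
/// Hypothetical, simplified stand-ins for the SSE events named above.
/// The real event types carry more fields and may be shaped differently.
enum SketchEvent {
    OutputItemAdded { item_type: String, id: String, call_id: String, name: String },
    FunctionCallArgumentsDelta { delta: String },
    FunctionCallArgumentsDone { arguments: String },
}

/// Hypothetical output item mirroring the JSON shown in the "After" section.
#[derive(Debug)]
struct SketchFunctionCall {
    id: String,
    call_id: String,
    name: String,
    arguments: String,
    status: String,
}

#[derive(Default)]
struct FunctionCallSketch {
    /// The call currently receiving argument deltas, if any.
    in_flight: Option<SketchFunctionCall>,
    /// Finalized calls, in the order they were started.
    output: Vec<SketchFunctionCall>,
}

impl FunctionCallSketch {
    fn process_event(&mut self, event: SketchEvent) {
        match event {
            SketchEvent::OutputItemAdded { item_type, id, call_id, name }
                if item_type == "function_call" =>
            {
                // A new call starts: flush any earlier call that never saw Done.
                self.force_finalize();
                self.in_flight = Some(SketchFunctionCall {
                    id,
                    call_id,
                    name,
                    arguments: String::new(),
                    status: "in_progress".to_string(),
                });
            }
            SketchEvent::FunctionCallArgumentsDelta { delta } => {
                // Orphaned deltas (no active call) are ignored in this sketch.
                if let Some(call) = self.in_flight.as_mut() {
                    call.arguments.push_str(&delta);
                }
            }
            SketchEvent::FunctionCallArgumentsDone { arguments } => {
                // The Done payload is authoritative and replaces the buffer.
                if let Some(mut call) = self.in_flight.take() {
                    call.arguments = arguments;
                    call.status = "completed".to_string();
                    self.output.push(call);
                }
            }
            // Other item types (message, reasoning) are out of scope here.
            _ => {}
        }
    }

    /// Called on stream end or disconnect: keep whatever partial arguments
    /// arrived. The "incomplete" status is an assumption of this sketch.
    fn force_finalize(&mut self) {
        if let Some(mut call) = self.in_flight.take() {
            call.status = "incomplete".to_string();
            self.output.push(call);
        }
    }
}
```

Feeding two `OutputItemAdded` → delta → `Done` sequences back to back yields two entries in `output`, which is how parallel tool calls fall out of this shape. Calling `force_finalize` after a stream cut off mid-arguments keeps the partial argument string instead of discarding the call.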

Test Plan

Unit tests (8 in accumulator.rs):

Full lifecycle, parallel tool use, interleaved with messages, orphaned deltas, forced finalize on disconnect

Cassette integration tests (12 in accumulator_cassette_test.rs):

tool_choice=auto streaming + non-streaming (parallel tool calls)
tool_choice=required streaming + non-streaming
tool_choice=named streaming + non-streaming
tool_choice=none streaming + non-streaming (zero function calls)
Reasoning streaming — Qwen3 (reasoning + message)
Reasoning streaming — GPT-oss (reasoning only, see note below)
Legacy Gemma4 function_call + text-only regression guard

Non-streaming tests exercise the from_json path against the same cassettes.

All tests pass, cargo fmt --check + cargo clippy -- -D warnings clean.

Note: GPT-oss streaming gap

While testing the reasoning cassettes, I found that GPT-oss doesn't emit response.output_item.added for the message item after reasoning — it jumps from reasoning_text.done → output_text.done → output_item.done. The streaming accumulator can't capture that message because no output_item.added creates the in-flight state. Qwen3 emits the event correctly. Not blocking — the full output is in the response.completed payload. Filed as #62 for follow-up.
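To make the gap concrete, here is a small sketch of why a message is lost when no output_item.added arrives first. The types `MsgEvent` and `MessageSketch` and the helper `run_message_sketch` are hypothetical and much simpler than the real accumulator; they only model the three event types mentioned in the note.

```rust
/// Hypothetical message-side events, named after the SSE event types above.
enum MsgEvent {
    /// response.output_item.added for a message item
    OutputItemAdded,
    /// output_text.done carrying the final text
    OutputTextDone(String),
    /// output_item.done
    OutputItemDone,
}

#[derive(Default)]
struct MessageSketch {
    /// In-flight message text; only created by OutputItemAdded.
    in_flight: Option<String>,
    /// Finalized message texts.
    output: Vec<String>,
}

impl MessageSketch {
    fn process_event(&mut self, event: MsgEvent) {
        match event {
            MsgEvent::OutputItemAdded => self.in_flight = Some(String::new()),
            MsgEvent::OutputTextDone(text) => {
                // Without a preceding OutputItemAdded there is no state to fill.
                if let Some(buf) = self.in_flight.as_mut() {
                    *buf = text;
                }
            }
            MsgEvent::OutputItemDone => {
                // Nothing in flight means nothing is pushed to the output.
                if let Some(text) = self.in_flight.take() {
                    self.output.push(text);
                }
            }
        }
    }
}

/// Runs a full event sequence through a fresh sketch accumulator.
fn run_message_sketch(events: Vec<MsgEvent>) -> Vec<String> {
    let mut acc = MessageSketch::default();
    for event in events {
        acc.process_event(event);
    }
    acc.output
}
```

With the Qwen3-style sequence (added, then text done, then item done) the sketch returns the message text. With the GPT-oss-style sequence that skips the added event it returns an empty list, which matches the behavior described in the note.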

ashwing · 2026-06-17T06:09:51Z

@maralbahari This is the function_call accumulation PR we discussed in #51. Ready for review when you get a chance.

Covers the full streaming lifecycle: OutputItemAdded(function_call) → argument deltas → FunctionCallArgumentsDone → finalized OutputItem::FunctionCall in the response payload. Also handles parallel tool calls, interleaving with text messages, and stream disconnect with partial args.

Once this lands, #51's dispatch_tools will have real FunctionCall items to work with in the streaming path.

maralbahari · 2026-06-17T06:44:49Z

@ashwing thank you for the quick PR.
I have added #60, which records cassettes against upstream vLLM for the request payloads covering each function_call tool_choice option, so that we can test this comprehensively. Could you please try them out?

## Summary

- Adds `--tools` and `--tool-choice` options to `record_cassette.py` to inject tool definitions and tool_choice into requests
- Adds `tool_calls/tools.json` with 8 tool definitions (get_weather, get_time, get_stock_price, search_web, translate_text, calculate, send_email, read_file)
- Adds `record_tool_call_cassettes.sh` to record 8 cassettes covering all four tool_choice modes (auto, none, required, named) × streaming and non-streaming

The cassettes recorded in this PR help test #59.

## How to Record

**1. Start vLLM with tool-call support:**

```bash
vllm serve Qwen/Qwen3-30B-A3B-FP8 \
  --tool-call-parser hermes \
  --enable-auto-tool-choice \
  --port 5050
```

**2. Run the cassette recorder:**

```bash
VLLM_URL=http://0.0.0.0:5050 MODEL=Qwen/Qwen3-30B-A3B-FP8 bash tests/cassettes/record_tool_call_cassettes.sh
```

---------

Signed-off-by: maral <maralbahari.98@gmail.com>

The ResponseAccumulator's process_event previously dropped all function_call SSE events via the wildcard arm, causing streaming responses to lose tool-call output items. This adds handlers for OutputItemAdded(function_call), FunctionCallArgumentsDelta, and FunctionCallArgumentsDone, matching the blocking JSON path's behavior and unblocking execute_loop for streaming tool dispatch.

Signed-off-by: Ashwin Giridharan <girida@amazon.com>

…tion

Feeds real vLLM SSE recordings (gemma-4-26B) through the full accumulator pipeline and verifies OutputItem::FunctionCall is produced with correct name, arguments, call_id, and usage. Also adds a text-only regression guard ensuring no function_call items leak from text-only streams.

Signed-off-by: Ashwin Giridharan <girida@amazon.com>

Signed-off-by: Ashwin Giridharan <girida@amazon.com>

…lator

Add 10 new cassette tests covering:

- All tool_choice modes streaming (auto, required, named, none)
- All tool_choice modes non-streaming (validates from_json path)
- Reasoning streaming (Qwen3 and GPT-oss)

Validates the accumulator correctly handles real multi-tool streaming responses and non-streaming JSON responses from multiple model families.

Signed-off-by: Ashwin Giridharan <girida@amazon.com>

ashwing · 2026-06-17T21:28:22Z

@maralbahari thanks! I rebased on top of #60 and added cassette tests that exercise all 4 tool_choice modes through the accumulator:

auto streaming + non-streaming (parallel tool calls — get_stock_price + search_web)
required streaming + non-streaming
named streaming + non-streaming
none streaming + non-streaming (asserts zero function calls)

Also added reasoning cassette tests (Qwen3 + GPT-oss) as a regression guard — found an interesting edge case with GPT-oss where it skips output_item.added for the message after reasoning. Filed #62 for follow-up.

12 cassette tests total, all passing.

ashwing marked this pull request as ready for review June 17, 2026 06:09

ashwing requested review from bbrowning, franciscojavierarceo, jiahuei, leseb, maralbahari, noobHappylife, qandrew and tjtanaa as code owners June 17, 2026 06:09

ashwing mentioned this pull request Jun 17, 2026

feat: add tool dispatch layer — ToolContext, traits, and LoopDecision #51

Closed

maralbahari mentioned this pull request Jun 17, 2026

test support: add function_call cassettes for vLLM response mode #60

Merged

ashwing added 4 commits June 17, 2026 13:39

fix: clippy too_many_lines and fmt for merged state machine

b044b28

Signed-off-by: Ashwin Giridharan <girida@amazon.com>

ashwing force-pushed the feat/function-call-accumulation branch from 35241da to 569a1f5 Compare June 17, 2026 21:02

ashwing mentioned this pull request Jun 17, 2026

GPT-oss streaming: message not captured when output_item.added is missing #62

Closed

franciscojavierarceo approved these changes Jun 17, 2026

franciscojavierarceo merged commit e1b9f6d into vllm-project:main Jun 17, 2026
3 checks passed

franciscojavierarceo merged 4 commits into vllm-project:main from ashwing:feat/function-call-accumulation
