bug: intermittent Empty completion with BigModel glm-5-turbo on long Sage runs #2874

@chindris-mihai-alexandru

Description

Summary

I hit repeated retries with:

Retryable(Empty completion received - no content, tool calls, or valid finish reason)

when using Forge against BigModel (open.bigmodel.cn) with glm-5-turbo during a long Sage session.

This seems to happen after a successful upstream connection/response event, but before Forge can build a non-empty final completion.

Environment

  • Forge: local build (0.1.0-dev)
  • OS: macOS
  • Provider: big_model
  • Model: glm-5-turbo
  • Endpoint: https://open.bigmodel.cn/api/paas/v4/chat/completions

What happened

During a deep analysis run, earlier turns completed normally. Then, at message_count: 82, Forge started retrying the same turn and failed 8 times with the empty completion error.

Relevant log excerpt (sanitized):

{"timestamp":" 297.233908375s","level":"INFO","fields":{"message":"Connecting Upstream","url":"https://open.bigmodel.cn/api/paas/v4/chat/completions","model":"glm-5-turbo","message_count":"82","message_cache_count":"81"}}
{"timestamp":" 300.078292042s","level":"DEBUG","fields":{"message":"Received completion from Upstream"}}
{"timestamp":" 300.078443209s","level":"ERROR","fields":{"message":"Retry attempt due to error","error":"Retryable(Empty completion received - no content, tool calls, or valid finish reason)","model":"glm-5-turbo"}}

Then the same pattern repeated multiple times for the same turn.
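To quantify the repetition, a small script can count empty-completion retries per turn from the JSON log lines. This is a minimal sketch, assuming the log format shown in the excerpt above (one JSON object per line with a `fields` map); the function name `count_empty_retries` and the sample lines are illustrative, not part of Forge.

```python
import json
from collections import Counter

# Substring of the error text quoted in the log excerpt.
EMPTY_MARKER = "Empty completion received"

def count_empty_retries(log_lines):
    """Count empty-completion retries, keyed by the message_count of the
    most recent "Connecting Upstream" event (hypothetical helper)."""
    retries = Counter()
    current_turn = None
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or non-JSON lines
        if not isinstance(record, dict):
            continue
        fields = record.get("fields") or {}
        if fields.get("message") == "Connecting Upstream":
            current_turn = fields.get("message_count")
        elif EMPTY_MARKER in str(fields.get("error", "")):
            retries[current_turn] += 1
    return retries

# Illustrative sample: two attempts at the same turn, shortened from the excerpt.
sample_log = [
    '{"level":"INFO","fields":{"message":"Connecting Upstream","message_count":"82"}}',
    '{"level":"ERROR","fields":{"message":"Retry attempt due to error","error":"Retryable(Empty completion received - no content, tool calls, or valid finish reason)"}}',
    '{"level":"INFO","fields":{"message":"Connecting Upstream","message_count":"82"}}',
    '{"level":"ERROR","fields":{"message":"Retry attempt due to error","error":"Retryable(Empty completion received - no content, tool calls, or valid finish reason)"}}',
]

print(dict(count_empty_retries(sample_log)))
```

Keying by message_count makes it easy to confirm that all failures belong to one turn rather than being spread across the session.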

Raw SSE capture / replay

I captured the request and replayed the same payload shape (82 messages) directly against BigModel with curl. Every replay ended with a valid SSE terminator:

  • some runs ended with finish_reason: "stop" + [DONE]
  • some runs ended with finish_reason: "tool_calls" + [DONE]
  • no raw replay run reproduced an empty terminal completion

Example tail (stop):

data: {"choices":[{"index":0,"finish_reason":"stop","delta":{"role":"assistant","content":""}}],"usage":{...}}
data: [DONE]

Example tail (tool_calls):

data: {"choices":[{"index":0,"finish_reason":"tool_calls","delta":{"role":"assistant","content":""}}],"usage":{...}}
data: [DONE]
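The tails above can be checked mechanically. The following is a minimal sketch, assuming an OpenAI-style SSE body saved from a curl replay; `terminal_finish_reason` is a hypothetical helper, and the sample omits the `usage` object because its contents are elided in the capture.

```python
import json

def terminal_finish_reason(sse_text):
    """Return (last non-null finish_reason, whether [DONE] was seen)."""
    finish_reason = None
    saw_done = False
    for raw in sse_text.splitlines():
        if not raw.startswith("data:"):
            continue  # ignore blank lines, comments, and other SSE fields
        payload = raw[len("data:"):].strip()
        if payload == "[DONE]":
            saw_done = True
            continue
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError:
            continue  # a malformed chunk is skipped in this sketch
        if not isinstance(chunk, dict):
            continue
        for choice in chunk.get("choices") or []:
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return finish_reason, saw_done

# Illustrative tail modeled on the "stop" example.
sample_tail = (
    'data: {"choices":[{"index":0,"finish_reason":"stop",'
    '"delta":{"role":"assistant","content":""}}]}\n'
    "data: [DONE]\n"
)

print(terminal_finish_reason(sample_tail))
```

A replay that reproduced the failure would be expected to return `None` as the finish reason here, which none of the raw replays did.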

Why I think this may be Forge-side (or at least needs better diagnostics)

The error is thrown when the aggregated stream result has:

  • empty content
  • no tool calls
  • no finish reason
  • no thought signature

Because the upstream connection succeeded and a completion event was logged, this is hard to debug without logging the raw response chunks at failure time.
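The four conditions listed above can be expressed as a single emptiness check over an aggregated stream. This is a sketch for illustration, not Forge's actual implementation: `Aggregate`, `aggregate`, and the `thought_signature` delta key are assumed names, and tool-call fragments are collected without being merged.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Aggregate:
    """Hypothetical aggregated view of a streamed completion."""
    content: str = ""
    tool_calls: list = field(default_factory=list)
    finish_reason: Optional[str] = None
    thought_signature: Optional[str] = None

    def is_empty(self):
        # Empty only when all four conditions from the list above hold.
        return not (self.content or self.tool_calls
                    or self.finish_reason or self.thought_signature)

def aggregate(chunks):
    """Fold parsed OpenAI-style chunks (dicts) into one Aggregate."""
    agg = Aggregate()
    for chunk in chunks:
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            agg.content += delta.get("content") or ""
            agg.tool_calls.extend(delta.get("tool_calls") or [])
            # Assumed key; the real field name may differ per provider.
            agg.thought_signature = delta.get("thought_signature") or agg.thought_signature
            if choice.get("finish_reason"):
                agg.finish_reason = choice["finish_reason"]
    return agg

# Terminal chunk modeled on the "stop" tail from the raw replay.
stop_tail = [{"choices": [{"index": 0, "finish_reason": "stop",
                           "delta": {"role": "assistant", "content": ""}}]}]

print(aggregate(stop_tail).is_empty())
print(aggregate([]).is_empty())
```

Under this sketch, a stream that ends with either replayed tail is not empty, because the terminal chunk carries a finish reason even though its content is an empty string. Only a stream that yields no usable chunks at all satisfies every condition.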

Web/docs checks

  • No obvious duplicate issue found in this repo for the exact error text.
  • The Z.ai docs note that an abnormally terminated SSE stream should report the reason via finish_reason.

Request

Could maintainers please:

  1. Investigate this edge case in OpenAI-compatible stream aggregation for BigModel/GLM-5, and
  2. Add optional debug logging/dump of the terminal parsed chunk state when EmptyCompletion is raised (to distinguish an empty provider stream from a local parse/aggregation gap).
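One possible shape for the requested diagnostics is sketched below, under the assumption that the consumer sees raw SSE lines. It keeps the last few raw lines in a ring buffer and attaches counters to the error. All names here (`consume`, the `EmptyCompletion` class, the diagnostic keys) are illustrative and not taken from Forge.

```python
import json
import logging
from collections import deque

logger = logging.getLogger("stream_debug")

class EmptyCompletion(Exception):
    """Illustrative error that carries diagnostics about the stream."""
    def __init__(self, diagnostics):
        super().__init__("Empty completion received")
        self.diagnostics = diagnostics

def consume(raw_lines, keep_last=5):
    tail = deque(maxlen=keep_last)  # last raw lines, kept for the dump
    content, finish_reason, tool_fragments = "", None, 0
    stats = {"lines_seen": 0, "chunks_parsed": 0,
             "non_data_lines": 0, "bad_json_lines": 0}
    for raw in raw_lines:
        if not raw.strip():
            continue  # blank separator lines are not counted
        stats["lines_seen"] += 1
        tail.append(raw)
        if not raw.startswith("data:"):
            stats["non_data_lines"] += 1
            continue
        payload = raw[len("data:"):].strip()
        if payload == "[DONE]":
            continue
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError:
            chunk = None
        if not isinstance(chunk, dict):
            stats["bad_json_lines"] += 1
            continue
        stats["chunks_parsed"] += 1
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            content += delta.get("content") or ""
            tool_fragments += len(delta.get("tool_calls") or [])
            finish_reason = choice.get("finish_reason") or finish_reason
    if not (content or tool_fragments or finish_reason):
        diagnostics = dict(stats, tail=list(tail))
        logger.debug("empty completion diagnostics: %s", json.dumps(diagnostics))
        raise EmptyCompletion(diagnostics)
    return {"content": content, "finish_reason": finish_reason,
            "tool_call_fragments": tool_fragments}
```

With counters like these, `lines_seen == 0` would point at an empty provider stream, while a non-zero `bad_json_lines` or chunks that parsed but contributed nothing would point at a local parse or aggregation gap.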

Artifacts

Sanitized artifacts are in this gist:

https://gist.github.com/chindris-mihai-alexandru/43f711cbb45e214d3307025d25a28425

(contains core Forge log excerpt, SSE tails, replay summary, and duplicate-search notes)
