message_not_in_streaming_state after undocumented timeouts hit #1859

@beaugunderson

Reproducible in:

$ pip freeze | grep slack
slack-sdk==3.41.0

$ python --version
Python 3.14.2

Reproduces in production on Python 3.14 on Linux.

The Slack SDK version

slack-sdk==3.41.0

Python runtime version

Python 3.14.2

OS info

macOS 26.4.1 (Darwin 25.4.0) for local repro; Linux (Aptible) in production. Behavior is server-side, so the host OS shouldn't matter.

Steps to reproduce:

Any long-running AI-agent bot that calls chat.appendStream over a window of more than a few minutes reliably reproduces this. Minimal case:

import time

from slack_sdk import WebClient

# token, CHANNEL, THREAD_TS, TEAM_ID and USER_ID are placeholders for your workspace.
client = WebClient(token=...)
stream = client.chat_stream(
    channel=CHANNEL,
    thread_ts=THREAD_TS,
    recipient_team_id=TEAM_ID,
    recipient_user_id=USER_ID,
)
for i in range(60):
    stream.append(markdown_text=f"tick {i}\n")
    time.sleep(15)  # 60 iterations x 15 s = 15 minutes total
stream.stop()

  1. Call chat_stream(...) and capture the ChatStream helper.
  2. Call stream.append(...) periodically (even with active traffic — this is not just an idle issue).
  3. At some point before the loop finishes, stream.append() raises SlackApiError with {'ok': False, 'error': 'message_not_in_streaming_state'}, and stream.stop() then raises the same error, leaving the message frozen in Slack's UI as a gray "Something went wrong" pill indefinitely.

Expected result:

One or more of (ranked by usefulness):

  1. Document the empirical TTL / idle behavior on the chat.startStream / chat.appendStream / chat.stopStream reference pages and/or the ChatStream module docstring. Even a "streams may be closed by the server after an undocumented window; recommend rotating every N minutes" note would save teams days of debugging.
  2. Expose ChatStream.started_at / ChatStream.age (or equivalent) so callers can implement proactive rotation without subclassing.
  3. Add an optional auto_rotate_after: timedelta | None = None parameter that stops the current stream and starts a fresh one under the hood before the server kills it.
  4. Classify message_not_in_streaming_state as a distinct, typed exception (e.g. StreamExpiredError) rather than a generic SlackApiError, so apps can branch on it without string-matching the error code.
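
For illustration, here is a rough sketch of what (2) and (3) could look like from the caller's side today. `RotatingStream`, `rotate_after_s`, and the injected `clock` are hypothetical names, not part of slack_sdk; the wrapper only assumes the wrapped object exposes `append(...)` and `stop(...)` the way the ChatStream helper does. The 240-second default mirrors the ~4 minute rotation reported in hrygo/hotplex-legacy#237 and is a guess, not a documented limit. Age is measured from helper creation, which only approximates when the server-side stream actually starts, and each rotation begins a new Slack message.

```python
import time
from typing import Any, Callable, Optional


class RotatingStream:
    """Hypothetical wrapper that restarts a stream before an assumed server TTL."""

    def __init__(
        self,
        start_stream: Callable[[], Any],
        rotate_after_s: float = 240.0,  # assumption: rotate at ~4 minutes
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._start_stream = start_stream
        self._rotate_after_s = rotate_after_s
        self._clock = clock
        self._stream: Optional[Any] = None
        self._started_at: Optional[float] = None

    @property
    def age(self) -> Optional[float]:
        """Seconds since the current stream helper was created, or None."""
        if self._started_at is None:
            return None
        return self._clock() - self._started_at

    def append(self, **kwargs: Any) -> Any:
        # Close the current stream first if it has outlived the rotation window.
        if self._stream is not None and self.age >= self._rotate_after_s:
            self._stream.stop()
            self._stream = None
        # Lazily (re)start a stream when none is active.
        if self._stream is None:
            self._stream = self._start_stream()
            self._started_at = self._clock()
        return self._stream.append(**kwargs)

    def stop(self, **kwargs: Any) -> Any:
        if self._stream is None:
            return None
        stream, self._stream = self._stream, None
        self._started_at = None
        return stream.stop(**kwargs)
```

With the repro above, this would be used as `RotatingStream(lambda: client.chat_stream(channel=CHANNEL, thread_ts=THREAD_TS, recipient_team_id=TEAM_ID, recipient_user_id=USER_ID))` in place of the bare helper.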

Actual result:

chat.appendStream raises:

slack_sdk.errors.SlackApiError: The request to the Slack API failed.
(url: https://slack.com/api/chat.appendStream)
The server responded with: {'ok': False, 'error': 'message_not_in_streaming_state'}

chat.stopStream called on the same message afterward returns the same error, so the orphaned streaming message cannot be cleanly closed — Slack's UI then renders it as a permanent "Something went wrong" pill.
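
Until the SDK offers something typed, callers end up string-matching the error code. A minimal sketch of that branching is below. `StreamExpiredError`, `slack_error_code`, and `raise_if_stream_expired` are hypothetical names that do not exist in slack_sdk; the only assumption about the SDK is that SlackApiError carries a `response` that can be indexed with `"error"`, as the output above suggests.

```python
from typing import Optional

STREAM_EXPIRED_CODE = "message_not_in_streaming_state"


class StreamExpiredError(Exception):
    """Hypothetical typed error; not part of slack_sdk."""


def slack_error_code(exc: BaseException) -> Optional[str]:
    """Best-effort extraction of the Slack error code from an exception."""
    response = getattr(exc, "response", None)
    if response is None:
        return None
    try:
        return response["error"]
    except (KeyError, TypeError, ValueError):
        return None


def raise_if_stream_expired(exc: BaseException) -> None:
    """Re-raise as StreamExpiredError when the stream was closed server-side."""
    if slack_error_code(exc) == STREAM_EXPIRED_CODE:
        raise StreamExpiredError(STREAM_EXPIRED_CODE) from exc
```

In an `except SlackApiError as e:` block, calling `raise_if_stream_expired(e)` before a bare `raise` lets upstream code catch the expiry case separately from other API failures.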

Real traceback from our production logs (Slack bot investigating a user question):

Traceback (most recent call last):
  File "/app/main.py", line 920, in _send_keepalive
    self._append_with_retry(chunks)
  File "/app/main.py", line 943, in _append_with_retry
    self._client.chat_appendStream(
        channel=self._channel,
        ts=ts,
        chunks=chunks,
    )
  File "/app/.venv/lib/python3.14/site-packages/slack_sdk/web/client.py", line 2654, in chat_appendStream
    return self.api_call("chat.appendStream", json=kwargs)
  ...
slack_sdk.errors.SlackApiError: The request to the Slack API failed.
(url: https://slack.com/api/chat.appendStream)
The server responded with: {'ok': False, 'error': 'message_not_in_streaming_state'}

This is a widespread production issue. Several AI-agent bots are independently working around it:

  • AuraHQ-ai/aura#421 — observed "undocumented ~30s idle timeout", 38 occurrences in 6 days. Workaround: 20s keepalive during tool calls.
  • AuraHQ-ai/aura#177 — unhandled message_not_in_streaming_state crashed response pipeline.
  • AuraHQ-ai/aura#702 — ~96 errors in 2 weeks, 67% silent data loss with a naïve post-message fallback. Workaround: capture the frozen stream's ts and chat.update it with the finalized content.
  • hrygo/hotplex-legacy#237 — observed ~5 minute wallclock TTL even with active traffic. Workaround: proactive rotation at ~4 min.
  • hrygo/hotplex-legacy#336 — P1 report after a 70+ turn session.
  • getsentry/junior#202 — Sentry's internal Slack bot tracking the same error.
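
As a rough illustration of the AuraHQ-ai/aura#702 workaround (fall back to chat.update on the frozen message's ts), something like the following seems to be the general shape. `finalize_stream` is a hypothetical helper, `client` is assumed to be a WebClient (only its `chat_update(channel=..., ts=..., text=...)` method is used), and `stop_stream` is whatever callable closes the stream in your app. Whether chat.update reliably repairs a frozen streaming message is taken from that report, not from any documentation.

```python
from typing import Any, Callable

STREAM_EXPIRED_CODE = "message_not_in_streaming_state"


def finalize_stream(
    client: Any,
    stop_stream: Callable[[], Any],
    channel: str,
    ts: str,
    final_text: str,
) -> str:
    """Try to stop the stream; if it already expired, overwrite the message."""
    try:
        stop_stream()
        return "stopped"
    except Exception as exc:
        # Assumption: the exception exposes the Slack payload as exc.response.
        response = getattr(exc, "response", None)
        try:
            code = response["error"] if response is not None else None
        except (KeyError, TypeError, ValueError):
            code = None
        if code != STREAM_EXPIRED_CODE:
            raise  # unrelated failure: let the caller see it
    # The stream is gone server-side, so replace the frozen message in place.
    client.chat_update(channel=channel, ts=ts, text=final_text)
    return "updated"
```

The return value is only there so callers can log which path was taken.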

Observed empirical timeouts span roughly 30 seconds of idle to ~5 minutes wallclock across these reports — they diverge enough that both an idle timer and a hard lifetime appear to be in play on the server side.
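
If both timers really exist, a client has to guard against each one separately. The sketch below shows one way to combine them into a single decision. `StreamTimers` and `next_action` are hypothetical names, and the 20 s and 240 s defaults are borrowed from the workarounds in the reports above, not from any documented server limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamTimers:
    keepalive_after_s: float = 20.0  # assumption: stay under the ~30 s idle timeout
    rotate_after_s: float = 240.0    # assumption: stay under the ~5 min lifetime


def next_action(
    now: float,
    started_at: float,
    last_append_at: float,
    timers: StreamTimers = StreamTimers(),
) -> str:
    """Return "rotate", "keepalive", or "none" for the current stream state."""
    # The hard lifetime wins: a keepalive cannot extend a stream past it.
    if now - started_at >= timers.rotate_after_s:
        return "rotate"
    # Otherwise, send a keepalive append once the stream has been idle too long.
    if now - last_append_at >= timers.keepalive_after_s:
        return "keepalive"
    return "none"
```

A bot loop would call this with `time.monotonic()` values before each tool call or sleep and act on the result.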

I've checked: the chat.appendStream, chat.startStream, chat.stopStream and chat_stream module docs, plus the changelog from Oct 2025 through Apr 2026 — none document a stream lifetime or error recovery path.

I understand the underlying timeout is a platform/server-side behavior; I'm filing here because (a) the SDK is where the ChatStream helper lives, (b) this is the right place to at least document the behavior and surface a rotation primitive, and (c) slackapi/python-slack-sdk is the repo teams actually find when debugging message_not_in_streaming_state. If this needs to go to platform feedback instead, happy to cross-post — let me know.
