feat(agent): support live streaming output with Markdown rendering#56
Conversation
Code Review
This pull request introduces live streaming for agent responses, utilizing rich.live to display text and thinking blocks in real-time. The core logic is refactored into a new _stream_response function that handles various content block deltas. Review feedback identifies a formatting issue where rendering Markdown for every thinking delta causes incorrect vertical alignment. Additionally, a critical bug was found where returning MessageDeltaUsage instead of the final aggregated message results in missing input_tokens and potential runtime errors during token tracking. Improvements to type hints for usage objects were also suggested.
```python
def _flush_thinking(text: str) -> None:
    """Print thinking text immediately so it appears live."""
    console.print(Markdown(text), end="", style=LIGHT_HINT_STYLE_RICH)
```
Using Markdown(text) for every thinking delta is incorrect for streaming. Markdown is a block-level element in rich and will append a newline after every call, causing the thinking output to render vertically (one delta per line). Additionally, it is inefficient to create a new Markdown object for every token. Since thinking blocks are intended to be streamed inline with dim styling, you should print the raw text directly.
Suggested change:
```diff
- console.print(Markdown(text), end="", style=LIGHT_HINT_STYLE_RICH)
+ console.print(text, end="", style=LIGHT_HINT_STYLE_RICH)
```
```python
_stop_live()

return _StreamedResponse(content_blocks, usage, stop_reason)
```
The current implementation of _stream_response has a critical bug: it returns a MessageDeltaUsage object (captured from the message_delta event) which lacks input_tokens. This causes an AttributeError in agent_loop at line 239.
Instead of manually accumulating usage and content, you should use stream.get_final_message() after the loop. This ensures you get the fully aggregated message with complete usage statistics (including cache hits/misses) and correctly parsed tool inputs.
Suggested change:
```diff
  _stop_live()
- return _StreamedResponse(content_blocks, usage, stop_reason)
+ final_msg = stream.get_final_message()
+ return _StreamedResponse(final_msg.content, final_msg.usage, final_msg.stop_reason)
```
```python
usage: anthropic.types.Usage
| anthropic.types.message_delta_usage.MessageDeltaUsage
| None,
```
The MessageDeltaUsage type should be removed from the type hint. If the response is built correctly using the final state of the stream, the usage will always be a full anthropic.types.Usage object. Including MessageDeltaUsage is misleading as that type lacks critical fields like input_tokens which are required by the token tracker.
Suggested change:
```python
usage: anthropic.types.Usage | None,
```

```python
elif event.type == "message_delta":
    stop_reason = event.delta.stop_reason or stop_reason
    usage = event.usage
```

```python
_stop_live()
```
🔴 Input and cache token tracking always reports 0 because message_start event usage is never captured
The _stream_response function only captures usage from message_delta events (line 177), but the Anthropic streaming API reports input_tokens, cache_creation_input_tokens, and cache_read_input_tokens in the message_start event, not in message_delta. The MessageDeltaUsage object returned by message_delta has all of these fields defaulting to None (verified against anthropic SDK v0.94.0).

As a result, input_tokens is always None → 0 (via the `or 0` at line 241), and cache tokens are always None → 0 (via the `getattr(..., 0) or 0` at lines 237-238).

The old code used stream.get_final_message(), which internally combined usage from both message_start and message_delta into a complete Usage object. The new manual event loop needs to also handle message_start events to capture the input token counts.
MessageDeltaUsage defaults

`MessageDeltaUsage(output_tokens=100)` yields:

- input_tokens: None
- cache_creation_input_tokens: None
- cache_read_input_tokens: None

These are all Optional[int] fields that default to None, unlike Usage, where input_tokens is required.
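A quick way to see those defaults (a sketch reflecting the behavior the review reports against anthropic SDK v0.94.0; the import path follows the type hint quoted earlier):

```python
# Demonstrates the MessageDeltaUsage defaults described above.
from anthropic.types.message_delta_usage import MessageDeltaUsage

usage = MessageDeltaUsage(output_tokens=100)
print(usage.input_tokens)                 # None
print(usage.cache_creation_input_tokens)  # None
print(usage.cache_read_input_tokens)      # None
```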
Prompt for agents
The _stream_response function in src/mini_agent/agent/agent.py needs to also handle message_start events to capture input token usage. Currently only message_delta events are handled (lines 175-179), which provides a MessageDeltaUsage object that has input_tokens=None, cache_creation_input_tokens=None, and cache_read_input_tokens=None.
The fix should:
1. Add handling for event.type == 'message_start' in the event loop (around line 107).
2. From the message_start event, extract event.message.usage which is an anthropic.types.Usage object containing the actual input_tokens and cache token counts.
3. Store this initial usage separately (e.g. as initial_usage).
4. When building the final _StreamedResponse, combine the input token info from message_start with the output_tokens from message_delta. One approach: store both usages and merge them, or just track input_tokens/cache tokens from message_start and output_tokens from message_delta separately, then construct a single Usage object to return.
The old code used stream.get_final_message() which internally handled this merging. The new manual event processing must replicate that behavior.
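A minimal sketch of that merge, following the steps above (`stream_kwargs` and the model id are illustrative placeholders, and the content_block_* handling is elided):

```python
# Sketch only: capture Usage from message_start, take output_tokens from
# message_delta, then merge into one Usage object. Kwargs are placeholders.
import anthropic

client = anthropic.Anthropic()
stream_kwargs = {
    "model": "claude-sonnet-4-5",  # placeholder model id
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "hello"}],
}

initial_usage: anthropic.types.Usage | None = None
output_tokens = 0

with client.messages.stream(**stream_kwargs) as stream:
    for event in stream:
        if event.type == "message_start":
            # Full Usage: input_tokens and the cache token counts live here.
            initial_usage = event.message.usage
        elif event.type == "message_delta":
            # MessageDeltaUsage: only output_tokens is populated.
            output_tokens = event.usage.output_tokens
        # ... content_block_* handling elided ...

# Input-side counts come from message_start, output_tokens from message_delta.
usage = (
    initial_usage.model_copy(update={"output_tokens": output_tokens})
    if initial_usage is not None
    else None
)
```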
3 issues found across 1 file
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/mini_agent/agent/agent.py">
<violation number="1" location="src/mini_agent/agent/agent.py:61">
P2: `Markdown(text)` is a block-level renderable in Rich — each call produces a self-contained block with trailing newlines. Using it for every tiny thinking delta means each token is rendered as a separate Markdown paragraph, causing vertical stacking instead of inline streaming. Since thinking output is meant to stream inline with dim styling, print the raw text directly instead.</violation>
<violation number="2" location="src/mini_agent/agent/agent.py:104">
P1: The `Live` display is not cleaned up if an exception occurs during streaming. Wrap the streaming loop in `try/finally` to ensure `_stop_live()` is always called, preventing terminal corruption on errors or `KeyboardInterrupt`.</violation>
<violation number="3" location="src/mini_agent/agent/agent.py:177">
P1: `message_delta` events carry a `MessageDeltaUsage` object that only contains `output_tokens`. The `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens` fields are reported in the `message_start` event's `Usage` object, which this event loop never handles. As a result, `usage.input_tokens` downstream will either raise `AttributeError` or silently report 0, breaking all input/cache token tracking.
Add a handler for `message_start` events to capture the initial `Usage` (e.g., `event.message.usage`), then merge the input-side counts from `message_start` with the `output_tokens` from `message_delta` when constructing `_StreamedResponse`. The previous `stream.get_final_message()` call did this merge internally.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
```python
    )
    live_display.start()

    with client.messages.stream(**stream_kwargs) as stream:
```
P1: The Live display is not cleaned up if an exception occurs during streaming. Wrap the streaming loop in try/finally to ensure _stop_live() is always called, preventing terminal corruption on errors or KeyboardInterrupt.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/mini_agent/agent/agent.py, line 104:
<comment>The `Live` display is not cleaned up if an exception occurs during streaming. Wrap the streaming loop in `try/finally` to ensure `_stop_live()` is always called, preventing terminal corruption on errors or `KeyboardInterrupt`.</comment>
<file context>
@@ -35,6 +38,149 @@
+    )
+    live_display.start()
+
+    with client.messages.stream(**stream_kwargs) as stream:
+        for event in stream:
+            if event.type == "content_block_start":
</file context>
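One way to apply that fix, sketched against the hunk above (`live_display`, `_stop_live`, and `stream_kwargs` come from the PR; everything else is assumed):

```python
# Sketch: ensure the Live display is torn down on every exit path.
live_display.start()
try:
    with client.messages.stream(**stream_kwargs) as stream:
        for event in stream:
            ...  # existing event handling, unchanged
finally:
    # Runs on success, exceptions, and KeyboardInterrupt alike, so the
    # terminal is never left with a dangling Live display.
    _stop_live()
```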
Force-pushed from df7e053 to 23a23cf
Stream AI responses token-by-token while keeping the SDK-built Message object. The _display_stream_events() function handles only visual output (Live Markdown for text, inline flush for thinking), and stream.get_final_message() still provides the proper SDK-built response for tool execution — no manual block reconstruction.

- Add _display_stream_events() to drive live display from stream events
- Use stream.get_final_message() for the correctly-built Message object
- Stop the Thinking spinner before streaming so it doesn't fight Live
- Guard against None input_tokens from MessageDeltaUsage

Co-authored-by: kowyo-bot <258374017+kowyo-bot@users.noreply.github.com>
Force-pushed from 23a23cf to 82535a5
Each thinking delta was wrapped in Markdown(), which caused Rich to emit unwanted trailing newlines per chunk. Print raw strings instead and use console.print() (not bare print()) for the final newline after the thinking block ends, so Rich's line tracking stays consistent.

Co-authored-by: kowyo-bot <258374017+kowyo-bot@users.noreply.github.com>
The newline emitted by console.print() at content_block_stop can be buffered and lost when Rich's Live display starts immediately after for the next text block. Explicitly flush stdout after each block-stop newline.

Co-authored-by: kowyo-bot <258374017+kowyo-bot@users.noreply.github.com>
…etween thinking and text

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Force-pushed from 1bf0f6f to fdfa7ba
Description
Adds live streaming output so users see AI responses token-by-token as they arrive, rather than waiting for the full response. Markdown rendering is preserved during streaming via `rich.live.Live`.

How it works

- `rich.live.Live` + `Markdown` — code fences, headings, bold, and lists all appear live as the model writes them

Implementation

- `_stream_response()` iterates Anthropic `RawStreamEvent` items and drives the live display (see the sketch after this description)
- A `_StreamedResponse` thin wrapper replaces the `Message` object previously obtained from `stream.get_final_message()`
- `_flush_thinking()` prints thinking deltas with `sys.stdout.flush()` for immediacy
- The `Thinking…` spinner is stopped before streaming begins so it doesn't fight with Live

Before/After

- Before: `stream.get_final_message()` blocks for the entire response
- After: `for event in stream:` emits per-token

Type of change

Test

- `ruff check` and `ruff format` pass
- `commitizen check` passes
- Tests cover `_StreamedResponse` and stream kwarg building for all effort levels
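For illustration, a minimal sketch of the event loop the Implementation section outlines (a hand-rolled stand-in rather than the PR's exact code; the names and kwargs are placeholders):

```python
# Illustrative sketch: drive a Rich Live display from Anthropic stream
# events, then return the SDK-built message. Names here are placeholders.
import anthropic
from rich.console import Console
from rich.live import Live
from rich.markdown import Markdown

console = Console()
client = anthropic.Anthropic()

def stream_response(**stream_kwargs) -> anthropic.types.Message:
    text = ""
    live = Live(console=console, refresh_per_second=8)
    started = False
    try:
        with client.messages.stream(**stream_kwargs) as stream:
            for event in stream:
                if event.type == "content_block_delta":
                    if event.delta.type == "thinking_delta":
                        # Thinking streams inline as raw dim text, no Markdown.
                        console.print(event.delta.thinking, end="", style="dim")
                    elif event.delta.type == "text_delta":
                        if not started:
                            live.start()
                            started = True
                        text += event.delta.text
                        live.update(Markdown(text))
            # The SDK-built Message keeps complete usage and parsed tool inputs.
            return stream.get_final_message()
    finally:
        if started:
            live.stop()
```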