Add verbose flow logging for agent tool calls and results#243
Conversation
`assembly -v live` only showed the httpx request lines because the deepagents
brain answers each spoken turn with a single blocking `graph.invoke(...)` that
runs the whole tool loop internally — so a turn that stalled mid-tool ("Let me
search…" then nothing) was undebuggable.
When verbose logging is active (`debuglog.active()`) and the graph can stream,
`brain._run_graph` now drives it as incremental state snapshots and logs each
tool call (name + args), tool result (truncated), and interim assistant line as
it lands, reusing the coding agent's `events.message_events` vocabulary. The
non-verbose path and the invoke-only test fakes keep the plain `invoke` branch.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01T3YQmYYiZDEZkDvWqXYk6D
| if isinstance(event, ToolCall): | ||
| _FLOW_LOG.info("tool call %s args=%s", event.name, event.args) | ||
| elif isinstance(event, ToolResult): | ||
| _FLOW_LOG.info("tool result %s -> %s", event.name, _clip(event.content)) |
There was a problem hiding this comment.
tool result content is logged verbatim (event.content via _clip). Sanitize, escape, or redact sensitive fields before logging.
Details
✨ AI Reasoning
The change logs tool results at INFO level: _FLOW_LOG.info("tool result %s -> %s", event.name, _clip(event.content)). Even though _clip truncates length, it does not sanitize or escape CR/LF, templates, or sensitive content. Logging raw external content risks personal-data leakage and log injection.
🔧 How do I fix it?
Keep sensitive data such as emails, passwords, and tokens out of logs. When logging values tied to a user, prefer a safe identifier like a user ID over the raw input, and strip line breaks from any user-provided text you do log.
Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info
There was a problem hiding this comment.
Addressed the actionable part in eb60cff: _clip now collapses all whitespace (CR/LF included) to single spaces before truncating, so untrusted tool output can't forge extra [aai_cli.…] log lines.
On the redaction part: this is opt-in -v debug output to the user's own stderr on a single-user local CLI, deliberately showing the tool content the user asked to see — and registered secrets (API key / session JWT) are already masked across every record by the debuglog redacting formatter. So beyond the newline-injection fix, redacting the content further would defeat the feature without closing a real leak.
Generated by Claude Code
Untrusted tool output (a fetched page, a search payload) is logged in the verbose flow log; embedded CR/LF could forge fake `[aai_cli.…]` log lines. `_clip` now collapses all whitespace to single spaces before truncating, so each result stays on one line and can't inject log lines. (Secrets remain masked separately by the debuglog formatter.) Addresses the Aikido review note on PR #243. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01T3YQmYYiZDEZkDvWqXYk6D
Summary
Adds structured logging of the agent's tool-call loop when running in verbose mode (
-v), making it possible to debug stalled spoken turns by seeing which tools the agent reached for and what they returned.Changes
brain.py:_run_graph()to conditionally stream the graph's state snapshots in verbose mode instead of invoking it once, enabling incremental logging_log_flow()to extract and log tool calls, tool results, and interim assistant messages from state snapshots using the coding agent's event vocabulary_clip()to truncate long tool outputs (capped at 500 chars) so a single large result doesn't bury the rest of the flow logbuild_completer()via_run_graph()instead of directinvoke()aai_cli.agent_cascade.brainlogger so plain-vsurfaces themtest_agent_cascade_brain.py:_StreamingGraphtest fixture that yields scripted state snapshots to verify streaming behaviortest_run_graph_streams_and_logs_flow_when_verbose()to verify the loop streams withstream_mode="values"and logs tool calls with args, tool results (clipped), and interim assistant linestest_run_graph_invokes_when_not_verbose()to verify non-verbose mode uses plaininvoke()test_run_graph_invokes_when_graph_cannot_stream()to verify fallback toinvoke()when graph lacksstream()methodtest_log_flow_ignores_non_list_messages()for defensive handlingtest_clip_passes_short_text_and_truncates_long_text()to verify truncation behaviorToolMessageimport to module level (used in existing test)Implementation Details
debuglog.active()returnsTrue(verbose mode) AND the graph has astream()method, ensuring backward compatibility with test fakes and non-streaming graphs_log_flow()reuses the coding agent'smessage_events()vocabulary (AssistantText, ToolCall, ToolResult) so the flow log speaks the same language as the TUIseencounter) ensures each snapshot only logs newly-added messages, avoiding duplicates across growing state snapshotshttps://claude.ai/code/session_01T3YQmYYiZDEZkDvWqXYk6D