Summary
When using LiteLLM models with ADK's planning features (e.g., `PlanReActPlanner`) in streaming mode, planning and reasoning content appears twice in responses when tool calls are made:
- First during streaming, as individual text chunks (lines 1288-1296)
- Again in the aggregated tool-call message, with `content=text` (line 1352)

This violates OpenAI/LiteLLM conventions and creates unnecessary duplication in conversation history.
Environment
- ADK Version: 1.19.0
- Affected File: `lite_llm.py`
- Python Version: 3.11+
- Models Affected: All non-Gemini models accessed via LiteLLM (Claude, GPT, etc.) when using planning workflows
- Feature: Streaming responses with tool calls
Expected Behavior
According to OpenAI/LiteLLM API specifications:
- When a message contains only tool calls (no user-facing answer text), the `content` field should be `None`
- Planning/reasoning text like `<PLANNING>I need to search...</PLANNING>` is internal reasoning, not the final answer
- Tool-call messages should follow this structure:

```python
{
    "role": "assistant",
    "content": None,  # No content for tool-only messages
    "tool_calls": [...],
}
```
Actual Behavior
The aggregated response at lines 1348-1359 sets `content=text`, including all accumulated planning/reasoning text:

```python
aggregated_llm_response_with_tool_call = (
    _message_to_generate_content_response(
        ChatCompletionAssistantMessage(
            role="assistant",
            content=text,  # Includes planning text, causing duplication
            tool_calls=tool_calls,
        ),
        model_version=part.model,
        thought_parts=list(reasoning_parts)
        if reasoning_parts
        else None,
    )
)
```
Result: the planning text appears twice:
- During streaming (lines 1288-1296): `<PLANNING>I need to search...</PLANNING>` streamed chunk-by-chunk
- In the aggregated message (line 1352): the same text included in the `content` field
Impact
1. Content Duplication
- The frontend receives the same planning text twice
- Requires additional filtering logic in application code
- Poor user experience if not handled
2. API Convention Violation
- OpenAI/Claude/GPT APIs expect `content=None` for tool-only messages
- The current implementation sends `content=<planning_text>`, which is semantically incorrect
- Tool-call messages should not contain answer text in `content`
3. Conversation History Bloat
- Planning text is unnecessarily stored in the message `content` field
- It is already preserved separately in `thought_parts` (line 1357)
- Increases storage and memory overhead
4. Semantic Confusion
- `content=text` implies "the model generated answer text AND called tools"
- Reality: the model only generated internal reasoning before calling tools
- Misrepresents the actual interaction flow (see the sketch after this list)
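
To make the convention violation concrete, here is a hedged sketch of the stored assistant message today versus what the OpenAI/LiteLLM convention calls for. The planning text and tool-call payload are illustrative stand-ins, not taken from real logs:

```python
# What the aggregated tool-call turn currently looks like (illustrative):
current_message = {
    "role": "assistant",
    "content": "<PLANNING>I need to search for weather</PLANNING>",  # duplicated reasoning
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "search", "arguments": '{"query": "Boston weather"}'},
    }],
}

# What the OpenAI/LiteLLM convention expects for a tool-only turn:
expected_message = {
    "role": "assistant",
    "content": None,  # reasoning belongs in thought_parts, not in content
    "tool_calls": current_message["tool_calls"],
}
```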
Steps to Reproduce
1. Create an agent with a LiteLLM model:

```python
from google.adk.agents import Agent
from google.adk.models import LiteLlm
from google.adk.planners import PlanReActPlanner

agent = Agent(
    model=LiteLlm(model="vertex_ai/claude-3-5-sonnet-v2@20241022"),
    planner=PlanReActPlanner(),
    tools=[search_tool, ...],
)
```

2. Enable streaming and send a query requiring tools:

```python
async for response in agent.run_streaming("What's the weather in Boston?"):
    print(response.content)
```

3. Observe in the logs:
- Planning text like `<PLANNING>I need to search for weather</PLANNING>` streamed as chunks
- The same planning text appears again in the aggregated response's `content` field when tool calls are made

4. Check the conversation history:
- The tool-call message has `content="<PLANNING>..."` instead of `content=None`
Root Cause
Lines 1268-1303: all text chunks (including planning) are accumulated into the `text` variable:

```python
text = ""
...
elif isinstance(chunk, TextChunk):
    text += chunk.text  # Accumulates planning/reasoning text
    yield _message_to_generate_content_response(...)  # Already streamed to user
```

Line 1352: the accumulated text is included again in the aggregated message:

```python
content=text,  # Duplicates already-streamed planning text
```

Line 1357: the planning text is already preserved separately:

```python
thought_parts=list(reasoning_parts) if reasoning_parts else None,
```
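
The duplication pattern can be reproduced in isolation. The following is a minimal, self-contained sketch of the accumulate-then-aggregate shape of the streaming loop; `TextChunk` and the message dicts here are simplified stand-ins for the actual ADK/LiteLLM types:

```python
from dataclasses import dataclass

@dataclass
class TextChunk:
    text: str

def stream_with_aggregation(chunks, tool_calls):
    """Mimics the streaming loop: yield each chunk, then an aggregated message."""
    text = ""
    for chunk in chunks:
        text += chunk.text
        yield {"role": "assistant", "content": chunk.text}  # streamed to the user
    # The aggregated tool-call message re-includes the accumulated text:
    yield {"role": "assistant", "content": text, "tool_calls": tool_calls}

planning = [TextChunk("<PLANNING>"), TextChunk("I need to search"), TextChunk("</PLANNING>")]
messages = list(stream_with_aggregation(planning, tool_calls=[{"name": "search"}]))

streamed_text = "".join(m["content"] for m in messages[:-1])
assert messages[-1]["content"] == streamed_text  # same planning text delivered twice
```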
Proposed Fix
Change line 1352 to set `content=None` for tool-only messages:

```python
aggregated_llm_response_with_tool_call = (
    _message_to_generate_content_response(
        ChatCompletionAssistantMessage(
            role="assistant",
            content=None,  # ✅ FIX: No duplication, follows OpenAI/LiteLLM spec
            tool_calls=tool_calls,
        ),
        model_version=part.model,
        thought_parts=list(reasoning_parts)
        if reasoning_parts
        else None,
    )
)
```
Comparison with Non-Streaming
The non-streaming path (around line 770) correctly handles this by:
- Creating a single response with complete tool-call information
- Leaving no opportunity for duplication (there is no incremental streaming)

The streaming path (lines 1268-1400) has the duplication issue because:
- Text chunks are yielded immediately during streaming
- The same text is then included again in the final aggregated message

The fix brings the streaming behavior in line with the non-streaming path and with API conventions.
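
A regression check for the fix could look roughly like the following. This is a hedged pytest-style sketch against the simplified message shape used above, not the actual ADK test harness; `aggregate_tool_call_message` is a hypothetical stand-in for the aggregation at lines 1348-1359:

```python
def aggregate_tool_call_message(text, tool_calls, reasoning_parts):
    """Stand-in for the aggregation site with the fix applied.

    `text` is deliberately not placed in `content`; reasoning is kept
    only in `thought_parts`.
    """
    return {
        "role": "assistant",
        "content": None,  # tool-only message: no answer text
        "tool_calls": tool_calls,
        "thought_parts": list(reasoning_parts) if reasoning_parts else None,
    }

def test_tool_only_message_has_no_content():
    message = aggregate_tool_call_message(
        text="<PLANNING>I need to search</PLANNING>",
        tool_calls=[{"name": "search"}],
        reasoning_parts=["<PLANNING>I need to search</PLANNING>"],
    )
    assert message["content"] is None  # no duplicated planning text
    assert message["thought_parts"]    # reasoning still preserved separately
```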
Additional Context
- This issue specifically affects planning workflows where models generate reasoning text before calling tools
- It does not affect simple tool-call scenarios without planning text
- The `thought_parts` parameter already exists to preserve reasoning separately from message content
- Frontend applications using ADK planning currently need to implement workarounds to deduplicate content (as sketched below)
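
For completeness, here is a rough sketch of the kind of client-side workaround applications resort to today: suppress the aggregated message's content when it merely repeats text that was already streamed. The dict-based event shape is a simplified assumption, not the exact ADK response API:

```python
def deduplicate_stream(events):
    """Drop aggregated content that was already delivered as streamed chunks."""
    streamed = []
    for event in events:
        if event.get("tool_calls") and event.get("content"):
            # Tool-call message: suppress content if it repeats streamed text.
            if event["content"] == "".join(streamed):
                event = {**event, "content": None}
        elif event.get("content"):
            streamed.append(event["content"])
        yield event
```

With the proposed fix, this filtering layer becomes unnecessary.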
Recommended Fix (Summary)
```python
# Lines 1348-1359
aggregated_llm_response_with_tool_call = (
    _message_to_generate_content_response(
        ChatCompletionAssistantMessage(
            role="assistant",
            content=None,  # ✅ FIX: Avoid duplication, follow OpenAI spec
            tool_calls=tool_calls,
        ),
        model_version=part.model,
        thought_parts=list(reasoning_parts)
        if reasoning_parts
        else None,
    )
)
```
This single-line change eliminates content duplication, aligns with API standards, and maintains semantic correctness for tool-call messages in streaming responses.