Summary
altk-evolve guideline generation currently parses OpenAI Agents SDK-style assistant function calls, but does not fully support OpenAI Chat Completions tool-calling transcript format.
Evidence (current main)
Parser: altk_evolve/llm/guidelines/guidelines.py (parse_openai_agents_trajectory)
Current behavior:
- Assistant tool calls are parsed only when
message.role == "assistant" and message.content is a list with items of {"type":"function_call", ...}.
- Top-level Chat Completions field
assistant.tool_calls is not parsed.
role == "tool" messages (tool outputs keyed by tool_call_id) are not parsed into observations.
- If first user message content is non-string, parser raises (
First user message was not a task instruction.), which is strict for modern message variants.
save_trajectory in altk_evolve/frontend/mcp/mcp_server.py passes the loaded messages directly to generate_guidelines(messages), so parser coverage determines what reaches guideline synthesis.
Why this matters
Many agents produce standard OpenAI Chat Completions transcripts:
- assistant:
{role:"assistant", content:"", tool_calls:[...]}
- tool:
{role:"tool", tool_call_id:"...", content:"..."}
With current parsing, these trajectories lose tool-action/observation information in guideline generation.
Requested change
Update parse_openai_agents_trajectory to support both formats:
- OpenAI Chat Completions format
- Parse
assistant.tool_calls into action steps.
- Parse
role="tool" messages into observation steps and associate by tool_call_id where possible.
- OpenAI Agents SDK format (existing)
- Keep support for
assistant.content list blocks with type="function_call".
- Compatibility and normalization
- Accept mixed transcripts safely.
- Be tolerant of dict/list/string content variants without hard-failing the whole trajectory.
- Keep unknown/unsupported blocks non-fatal (skip + log), instead of raising for every unexpected content item.
Suggested acceptance criteria
tests/unit/test_guidelines.py adds parser coverage for:
- assistant top-level
tool_calls (Chat Completions)
role="tool" observation messages
- mixed agents-style + chat-completions-style transcripts
- non-string first user message handling fallback
generate_guidelines trajectory summary includes:
- action lines for tool invocations
- observation lines for tool outputs/errors
- No regression for existing agents-style test cases.
Notes
This keeps altk-evolve general across agents and avoids requiring agent-specific side channels.
Summary
altk-evolveguideline generation currently parses OpenAI Agents SDK-style assistant function calls, but does not fully support OpenAI Chat Completions tool-calling transcript format.Evidence (current
main)Parser:
altk_evolve/llm/guidelines/guidelines.py(parse_openai_agents_trajectory)Current behavior:
message.role == "assistant"andmessage.contentis a list with items of{"type":"function_call", ...}.assistant.tool_callsis not parsed.role == "tool"messages (tool outputs keyed bytool_call_id) are not parsed into observations.First user message was not a task instruction.), which is strict for modern message variants.save_trajectoryinaltk_evolve/frontend/mcp/mcp_server.pypasses the loaded messages directly togenerate_guidelines(messages), so parser coverage determines what reaches guideline synthesis.Why this matters
Many agents produce standard OpenAI Chat Completions transcripts:
{role:"assistant", content:"", tool_calls:[...]}{role:"tool", tool_call_id:"...", content:"..."}With current parsing, these trajectories lose tool-action/observation information in guideline generation.
Requested change
Update
parse_openai_agents_trajectoryto support both formats:assistant.tool_callsinto action steps.role="tool"messages into observation steps and associate bytool_call_idwhere possible.assistant.contentlist blocks withtype="function_call".Suggested acceptance criteria
tests/unit/test_guidelines.pyadds parser coverage for:tool_calls(Chat Completions)role="tool"observation messagesgenerate_guidelinestrajectory summary includes:Notes
This keeps
altk-evolvegeneral across agents and avoids requiring agent-specific side channels.