Skip to content

Support OpenAI Chat Completions tool_calls/role=tool in guideline trajectory parsing #223

@gaodan-fang

Description

@gaodan-fang

Summary

altk-evolve guideline generation currently parses OpenAI Agents SDK-style assistant function calls, but does not fully support OpenAI Chat Completions tool-calling transcript format.

Evidence (current main)

Parser: altk_evolve/llm/guidelines/guidelines.py (parse_openai_agents_trajectory)

Current behavior:

  1. Assistant tool calls are parsed only when message.role == "assistant" and message.content is a list with items of {"type":"function_call", ...}.
  2. Top-level Chat Completions field assistant.tool_calls is not parsed.
  3. role == "tool" messages (tool outputs keyed by tool_call_id) are not parsed into observations.
  4. If first user message content is non-string, parser raises (First user message was not a task instruction.), which is strict for modern message variants.

save_trajectory in altk_evolve/frontend/mcp/mcp_server.py passes the loaded messages directly to generate_guidelines(messages), so parser coverage determines what reaches guideline synthesis.

Why this matters

Many agents produce standard OpenAI Chat Completions transcripts:

  • assistant: {role:"assistant", content:"", tool_calls:[...]}
  • tool: {role:"tool", tool_call_id:"...", content:"..."}

With current parsing, these trajectories lose tool-action/observation information in guideline generation.

Requested change

Update parse_openai_agents_trajectory to support both formats:

  1. OpenAI Chat Completions format
  • Parse assistant.tool_calls into action steps.
  • Parse role="tool" messages into observation steps and associate by tool_call_id where possible.
  1. OpenAI Agents SDK format (existing)
  • Keep support for assistant.content list blocks with type="function_call".
  1. Compatibility and normalization
  • Accept mixed transcripts safely.
  • Be tolerant of dict/list/string content variants without hard-failing the whole trajectory.
  • Keep unknown/unsupported blocks non-fatal (skip + log), instead of raising for every unexpected content item.

Suggested acceptance criteria

  1. tests/unit/test_guidelines.py adds parser coverage for:
  • assistant top-level tool_calls (Chat Completions)
  • role="tool" observation messages
  • mixed agents-style + chat-completions-style transcripts
  • non-string first user message handling fallback
  1. generate_guidelines trajectory summary includes:
  • action lines for tool invocations
  • observation lines for tool outputs/errors
  1. No regression for existing agents-style test cases.

Notes

This keeps altk-evolve general across agents and avoids requiring agent-specific side channels.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions