Bug: server-side compaction is not emitted on Responses tool-call-only turns #3075

@y-melamed

Description

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

I am using the Responses API through openai-python with:

  • context_management=[{"type": "compaction", "compact_threshold": 1000}]
  • store=False
  • gpt-5.4
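
For reference, the request configuration above can be expressed as plain keyword arguments (values are from my setup; the `context_management` schema is as I pass it to `client.responses.create`):

```python
# Request settings used throughout this report. These are the kwargs passed
# to client.responses.create / client.responses.parse in the repro below.
request_kwargs = {
    "model": "gpt-5.4",
    "store": False,
    "context_management": [
        {"type": "compaction", "compact_threshold": 1000}
    ],
}
```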

I see different behavior depending on the output type of the turn:

  • For a long plain request, response.output contains:
    • message
    • compaction
  • For a long request that returns only function_call, response.output contains only:
    • function_call
  • If I continue the tool loop and the next turn is again only function_call, there is still no compaction item.
  • Only when the model finally returns an assistant message does response.output include:
    • message
    • compaction

This means that in tool-heavy agent loops with several consecutive tool-call turns, context can continue growing without any emitted compaction item, and the loop can eventually hit
context_length_exceeded before compaction appears.
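
The condition can be checked client-side per turn. A minimal sketch, operating on output items as plain dicts (in the SDK these are typed objects, so a real check would use `getattr(item, "type", None)` instead of `item.get("type")`; `has_compaction` and `compaction_missing` are hypothetical helper names):

```python
def has_compaction(response_output: list[dict]) -> bool:
    """Return True if any output item of the turn is a compaction item."""
    return any(item.get("type") == "compaction" for item in response_output)

def compaction_missing(response_output: list[dict],
                       input_tokens: int,
                       threshold: int) -> bool:
    """True when the turn was over the compaction threshold
    but no compaction item was emitted."""
    return input_tokens > threshold and not has_compaction(response_output)

# Matches the observed R1 turn below: 5084 input tokens, function_call only.
print(compaction_missing([{"type": "function_call"}], 5084, 1000))  # True
```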

I reproduced this through openai-python using both client.responses.create(...) and client.responses.parse(...).

If this is expected backend/API behavior rather than a Python SDK issue, please let me know and I can move the report.

To Reproduce

  1. Create a long input that is clearly above the compaction threshold.
  2. Enable server-side compaction with a very low threshold, for example:
    context_management=[{"type": "compaction", "compact_threshold": 1000}]
  3. Force the first turn to produce a function_call.
  4. Send the corresponding function_call_output.
  5. If the model produces another function_call, observe that there is still no compaction item in response.output.
  6. Observe that compaction only appears once the model finally emits an assistant message.

Observed output from my repro:

R1 input_tokens 5084
R1 output_types ['function_call']

R2 input_tokens 5119
R2 output_types ['function_call']

R3 input_tokens 5154
R3 output_types ['message', 'compaction']

For comparison, a plain long request with the same threshold produces compaction immediately:

CREATE input_tokens 5007
CREATE output_types ['message', 'compaction']

Code snippets

import asyncio
from openai import AsyncOpenAI
from azure.identity.aio import DefaultAzureCredential, get_bearer_token_provider

AZURE_ENDPOINT = "https://<your-resource>.openai.azure.com/openai/v1/"
MODEL = "gpt-5.4"

async def main():
    cred = DefaultAzureCredential()
    token_provider = get_bearer_token_provider(
        cred,
        "https://cognitiveservices.azure.com/.default"
    )

    client = AsyncOpenAI(
        base_url=AZURE_ENDPOINT,
        api_key=token_provider,
    )

    long_text = "context " * 5000

    tools = [{
        "type": "function",
        "name": "echo_tool",
        "description": "Echo a short string",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {"type": "string"}
            },
            "required": ["text"],
            "additionalProperties": False
        }
    }]

    cm = [{"type": "compaction", "compact_threshold": 1000}]

    conversation = [{
        "role": "user",
        "content": (
            long_text +
            "\n\nCall echo_tool twice in sequence. "
            "First with text=first. After I return the tool result, "
            "call echo_tool again with text=second. "
            "Only after the second tool result, answer DONE."
        )
    }]

    for step in range(1, 5):
        response = await client.responses.create(
            model=MODEL,
            input=conversation,
            tools=tools,
            store=False,
            context_management=cm,
        )

        print(f"R{step} input_tokens:", response.usage.input_tokens)
        print(f"R{step} output_types:", [getattr(i, 'type', None) for i in response.output])

        conversation.extend(response.output)

        function_calls = [i for i in response.output if getattr(i, "type", None) == "function_call"]
        if function_calls:
            for idx, fc in enumerate(function_calls, start=1):
                conversation.append({
                    "type": "function_call_output",
                    "call_id": fc.call_id,
                    "output": f"tool-result-{step}-{idx}",
                })
        else:
            break

    await client.close()
    await cred.close()

asyncio.run(main())
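
As an interim workaround while no compaction item arrives on tool-call-only turns, I prune the oldest resolved tool-call pairs from the conversation once it grows past a budget. This is a hypothetical client-side sketch, not SDK or API behavior; `prune_tool_turns` and `max_items` are names of my own:

```python
def prune_tool_turns(conversation: list[dict], max_items: int) -> list[dict]:
    """Drop the oldest function_call / function_call_output items beyond a
    budget, keeping user and assistant message items intact."""
    if len(conversation) <= max_items:
        return conversation
    excess = len(conversation) - max_items
    pruned = []
    for item in conversation:
        # Only tool-call items are eligible for pruning; messages carry
        # the actual conversational state and are always kept.
        if excess > 0 and item.get("type") in ("function_call", "function_call_output"):
            excess -= 1
            continue
        pruned.append(item)
    return pruned
```

A cruder alternative is to fall back to a fresh request with a client-side summary, but that gives up the fidelity that server-side compaction is supposed to provide.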

OS

Windows

Python version

3.11.5

Library version

openai 2.21.0
