Confirm this is an issue with the Python library and not an underlying OpenAI API
Describe the bug
I am using the Responses API through openai-python with:
- `context_management=[{"type": "compaction", "compact_threshold": 1000}]`
- `store=False`
- model `gpt-5.4`
I see different behavior depending on the output type of the turn:
- For a long plain request, `response.output` contains: `message`, `compaction`.
- For a long request that returns only a `function_call`, `response.output` contains only: `function_call`.
- If I continue the tool loop and the next turn is again only a `function_call`, there is still no `compaction`.
- Only when the model finally returns an assistant `message` does `response.output` include: `message`, `compaction`.

This means that in tool-heavy agent loops with several consecutive tool-call turns, the context can keep growing without any emitted `compaction` item, and the loop can eventually hit `context_length_exceeded` before compaction ever appears.
I reproduced this through openai-python using both client.responses.create(...) and client.responses.parse(...).
If this is expected backend/API behavior rather than a Python SDK issue, please let me know and I can move the report.
To Reproduce
- Create a long input that is clearly above the compaction threshold.
- Enable server-side compaction with a very low threshold, for example: `context_management=[{"type": "compaction", "compact_threshold": 1000}]`
- Force the first turn to produce a `function_call`.
- Send the corresponding `function_call_output`.
- If the model produces another `function_call`, observe that there is still no `compaction` item in `response.output`.
- Observe that `compaction` only appears once the model finally emits an assistant `message`.
Observed output from my repro:
R1 input_tokens 5084
R1 output_types ['function_call']
R2 input_tokens 5119
R2 output_types ['function_call']
R3 input_tokens 5154
R3 output_types ['message', 'compaction']
For comparison, a plain long request with the same threshold produces compaction immediately:
CREATE input_tokens 5007
CREATE output_types ['message', 'compaction']
Code snippets
```python
import asyncio

from openai import AsyncOpenAI
from azure.identity.aio import DefaultAzureCredential, get_bearer_token_provider

AZURE_ENDPOINT = "https://<your-resource>.openai.azure.com/openai/v1/"
MODEL = "gpt-5.4"


async def main():
    cred = DefaultAzureCredential()
    token_provider = get_bearer_token_provider(
        cred, "https://cognitiveservices.azure.com/.default"
    )
    client = AsyncOpenAI(
        base_url=AZURE_ENDPOINT,
        api_key=token_provider,
    )

    long_text = "context " * 5000
    tools = [{
        "type": "function",
        "name": "echo_tool",
        "description": "Echo a short string",
        "parameters": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
            "additionalProperties": False,
        },
    }]
    cm = [{"type": "compaction", "compact_threshold": 1000}]
    conversation = [{
        "role": "user",
        "content": (
            long_text +
            "\n\nCall echo_tool twice in sequence. "
            "First with text=first. After I return the tool result, "
            "call echo_tool again with text=second. "
            "Only after the second tool result, answer DONE."
        ),
    }]

    for step in range(1, 5):
        response = await client.responses.create(
            model=MODEL,
            input=conversation,
            tools=tools,
            store=False,
            context_management=cm,
        )
        print(f"R{step} input_tokens:", response.usage.input_tokens)
        print(f"R{step} output_types:", [getattr(i, "type", None) for i in response.output])

        conversation.extend(response.output)
        function_calls = [
            i for i in response.output if getattr(i, "type", None) == "function_call"
        ]
        if function_calls:
            for idx, fc in enumerate(function_calls, start=1):
                conversation.append({
                    "type": "function_call_output",
                    "call_id": fc.call_id,
                    "output": f"tool-result-{step}-{idx}",
                })
        else:
            break

    await client.close()
    await cred.close()


asyncio.run(main())
```
OS
Windows
Python version
3.11.5
Library version
openai 2.21.0