Bug: server-side compaction is not emitted on Responses tool-call-only turns #3075

@y-melamed

Description

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

I am using the Responses API through openai-python with:

  • context_management=[{"type": "compaction", "compact_threshold": 1000}]
  • store=False
  • gpt-5.4
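
For reference, the request configuration above can be expressed as plain keyword arguments (values are from my setup; the `context_management` schema is as I pass it to `client.responses.create`):

```python
# Request settings used throughout this report. These are the kwargs passed
# to client.responses.create / client.responses.parse in the repro below.
request_kwargs = {
    "model": "gpt-5.4",
    "store": False,
    "context_management": [
        {"type": "compaction", "compact_threshold": 1000}
    ],
}
```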

I see different behavior depending on the output type of the turn:

  • For a long plain request, response.output contains:
    • message
    • compaction
  • For a long request that returns only function_call, response.output contains only:
    • function_call
  • If I continue the tool loop and the next turn is again only function_call, there is still no compaction item.
  • Only when the model finally returns an assistant message does response.output include:
    • message
    • compaction

This means that in tool-heavy agent loops with several consecutive tool-call turns, context can continue growing without any emitted compaction item, and the loop can eventually hit
context_length_exceeded before compaction appears.
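
The condition can be checked client-side per turn. A minimal sketch, operating on output items as plain dicts (in the SDK these are typed objects, so a real check would use `getattr(item, "type", None)` instead of `item.get("type")`; `has_compaction` and `compaction_missing` are hypothetical helper names):

```python
def has_compaction(response_output: list[dict]) -> bool:
    """Return True if any output item of the turn is a compaction item."""
    return any(item.get("type") == "compaction" for item in response_output)

def compaction_missing(response_output: list[dict],
                       input_tokens: int,
                       threshold: int) -> bool:
    """True when the turn was over the compaction threshold
    but no compaction item was emitted."""
    return input_tokens > threshold and not has_compaction(response_output)

# Matches the observed R1 turn below: 5084 input tokens, function_call only.
print(compaction_missing([{"type": "function_call"}], 5084, 1000))  # True
```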

I reproduced this through openai-python using both client.responses.create(...) and client.responses.parse(...).

If this is expected backend/API behavior rather than a Python SDK issue, please let me know and I can move the report.

To Reproduce

  1. Create a long input that is clearly above the compaction threshold.
  2. Enable server-side compaction with a very low threshold, for example:
    context_management=[{"type": "compaction", "compact_threshold": 1000}]
  3. Force the first turn to produce a function_call.
  4. Send the corresponding function_call_output.
  5. If the model produces another function_call, observe that there is still no compaction item in response.output.
  6. Observe that compaction only appears once the model finally emits an assistant message.

Observed output from my repro:

R1 input_tokens 5084
R1 output_types ['function_call']

R2 input_tokens 5119
R2 output_types ['function_call']

R3 input_tokens 5154
R3 output_types ['message', 'compaction']

For comparison, a plain long request with the same threshold produces compaction immediately:

CREATE input_tokens 5007
CREATE output_types ['message', 'compaction']

Code snippets

import asyncio
from openai import AsyncOpenAI
from azure.identity.aio import DefaultAzureCredential, get_bearer_token_provider

AZURE_ENDPOINT = "https://<your-resource>.openai.azure.com/openai/v1/"
MODEL = "gpt-5.4"

async def main():
    cred = DefaultAzureCredential()
    token_provider = get_bearer_token_provider(
        cred,
        "https://cognitiveservices.azure.com/.default"
    )

    client = AsyncOpenAI(
        base_url=AZURE_ENDPOINT,
        api_key=token_provider,
    )

    long_text = "context " * 5000

    tools = [{
        "type": "function",
        "name": "echo_tool",
        "description": "Echo a short string",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {"type": "string"}
            },
            "required": ["text"],
            "additionalProperties": False
        }
    }]

    cm = [{"type": "compaction", "compact_threshold": 1000}]

    conversation = [{
        "role": "user",
        "content": (
            long_text +
            "\n\nCall echo_tool twice in sequence. "
            "First with text=first. After I return the tool result, "
            "call echo_tool again with text=second. "
            "Only after the second tool result, answer DONE."
        )
    }]

    for step in range(1, 5):
        response = await client.responses.create(
            model=MODEL,
            input=conversation,
            tools=tools,
            store=False,
            context_management=cm,
        )

        print(f"R{step} input_tokens:", response.usage.input_tokens)
        print(f"R{step} output_types:", [getattr(i, 'type', None) for i in response.output])

        conversation.extend(response.output)

        function_calls = [i for i in response.output if getattr(i, "type", None) == "function_call"]
        if function_calls:
            for idx, fc in enumerate(function_calls, start=1):
                conversation.append({
                    "type": "function_call_output",
                    "call_id": fc.call_id,
                    "output": f"tool-result-{step}-{idx}",
                })
        else:
            break

    await client.close()
    await cred.close()

asyncio.run(main())
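
As an interim workaround while no compaction item arrives on tool-call-only turns, I prune the oldest resolved tool-call pairs from the conversation once it grows past a budget. This is a hypothetical client-side sketch, not SDK or API behavior; `prune_tool_turns` and `max_items` are names of my own:

```python
def prune_tool_turns(conversation: list[dict], max_items: int) -> list[dict]:
    """Drop the oldest function_call / function_call_output items beyond a
    budget, keeping user and assistant message items intact."""
    if len(conversation) <= max_items:
        return conversation
    excess = len(conversation) - max_items
    pruned = []
    for item in conversation:
        # Only tool-call items are eligible for pruning; messages carry
        # the actual conversational state and are always kept.
        if excess > 0 and item.get("type") in ("function_call", "function_call_output"):
            excess -= 1
            continue
        pruned.append(item)
    return pruned
```

A cruder alternative is to fall back to a fresh request with a client-side summary, but that gives up the fidelity that server-side compaction is supposed to provide.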

OS

Windows

Python version

3.11.5

Library version

openai 2.21.0
