When EventsCompactionConfig is configured on an App, the compaction mechanism correctly summarizes old events and appends compaction events to the session. However, Runner._get_or_create_session still calls get_session without any GetSessionConfig, loading the full event history on every invocation.
This means compaction only reduces the LLM context window — it does nothing to reduce the session loading overhead. As conversations grow longer, get_session latency degrades linearly, even though compaction has already produced summaries that could make the old raw events unnecessary for the Runner.
The architectural disconnect

EventsCompactionConfig (configured on App)
  → Compacts events for LLM context ✅
  → Does NOT reduce get_session load ❌

Runner._get_or_create_session (runners.py L375)
  → Calls get_session() WITHOUT GetSessionConfig
  → Loads ALL events every time
  → Latency grows linearly with conversation length

The GetSessionConfig class already supports num_recent_events and after_timestamp filtering, and all session service implementations (InMemorySessionService, DatabaseSessionService, VertexAiSessionService) honor these parameters. But the Runner never uses them.
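To make the two filters concrete, here is a minimal, self-contained sketch of how a filtered load differs from an unfiltered one. The `Event` and `GetSessionConfig` classes below are simplified stand-ins written for this example, not the ADK classes, and the filtering semantics (inclusive `after_timestamp`, timestamp filter applied before the recent-event cap) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Simplified stand-in for a session event (only the fields needed here)."""
    id: int
    timestamp: float


@dataclass
class GetSessionConfig:
    """Stand-in mirroring the two filters named above."""
    num_recent_events: Optional[int] = None
    after_timestamp: Optional[float] = None


def filter_events(events: list[Event], config: Optional[GetSessionConfig]) -> list[Event]:
    """Return the events a load would keep under the assumed filter semantics."""
    if config is None:
        return list(events)  # no config: the full history is returned
    kept = list(events)
    if config.after_timestamp is not None:
        kept = [e for e in kept if e.timestamp >= config.after_timestamp]
    if config.num_recent_events is not None:
        n = config.num_recent_events
        kept = kept[-n:] if n > 0 else []
    return kept


# 4,800 events, one per time unit, as in the 10-turn scenario in this report.
events = [Event(id=i, timestamp=float(i)) for i in range(4800)]
full = filter_events(events, None)
recent = filter_events(events, GetSessionConfig(num_recent_events=100))
since = filter_events(events, GetSessionConfig(after_timestamp=4700.0))
print(len(full), len(recent), len(since))
```

With no config the whole history comes back; either filter bounds the result regardless of how long the conversation has run.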
To Reproduce
1. Configure an App with EventsCompactionConfig
2. Use DatabaseSessionService (PostgreSQL) or any persistent session service
3. Have a multi-turn conversation (10+ rounds)
4. Observe that get_session becomes progressively slower on each new invocation
Real-world numbers
In our production setup with a 6-stage sequential pipeline agent (Science Navigator), each conversation turn generates ~480 events (streaming chunks, tool calls, tool responses, text finals). After 10 turns:
Compaction events exist in the session but the Runner still loads all 4,800 raw events
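The growth can be checked with a few lines of arithmetic. This sketch uses the approximate per-turn figure from this report; the cap of 500 recent events is a hypothetical value chosen only for illustration.

```python
EVENTS_PER_TURN = 480  # approximate figure reported above


def events_in_session(turns_completed: int) -> int:
    """Raw events stored after the given number of completed turns."""
    return EVENTS_PER_TURN * turns_completed


def events_loaded(turns_completed: int, num_recent_events=None) -> int:
    """Events returned by one load, with or without a recent-event cap."""
    total = events_in_session(turns_completed)
    if num_recent_events is None:
        return total  # unfiltered load grows with every turn
    return min(total, num_recent_events)


for turns in (1, 5, 10):
    unbounded = events_loaded(turns)
    bounded = events_loaded(turns, num_recent_events=500)  # hypothetical cap
    print(turns, unbounded, bounded)
```

The unfiltered count grows linearly with the number of turns, while a capped load stays constant once the cap is reached.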
Expected behavior
When EventsCompactionConfig is configured, the Runner should be aware that compacted summaries exist and avoid loading the full event history. Possible approaches:
1. Runner should pass GetSessionConfig to get_session: after compaction has run, the Runner knows which events are summarized. It should only load the compaction event(s) plus recent un-compacted events.
2. EventsCompactionConfig should inform session loading: the compaction config could derive appropriate GetSessionConfig parameters (e.g., only load events after the last compaction timestamp).
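The idea of loading only what follows the last compaction can be sketched as below. The `Compaction` and `Event` classes and their field names are hypothetical stand-ins for this example, and the sketch assumes the latest summary covers everything up to its end timestamp; if summaries only cover a sliding window, earlier summary events would need to be kept as well.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Compaction:
    """Hypothetical compaction metadata: the time range a summary covers."""
    start_timestamp: float
    end_timestamp: float


@dataclass
class Event:
    """Stand-in event; `compaction` is set only on summary events."""
    timestamp: float
    compaction: Optional[Compaction] = None


def last_compaction_end(events: list[Event]) -> Optional[float]:
    """End of the range covered by the most recently appended summary, if any."""
    for event in reversed(events):
        if event.compaction is not None:
            return event.compaction.end_timestamp
    return None


def events_needed(events: list[Event]) -> list[Event]:
    """Keep events newer than the summarized range (this includes the summary itself)."""
    cutoff = last_compaction_end(events)
    if cutoff is None:
        return list(events)  # nothing compacted yet: load everything
    return [e for e in events if e.timestamp > cutoff]


# Six raw events, a summary covering t=1..4 appended at t=6.5, then two more raw events.
events = [Event(timestamp=float(t)) for t in range(1, 7)]
events.append(Event(timestamp=6.5, compaction=Compaction(1.0, 4.0)))
events += [Event(timestamp=7.0), Event(timestamp=8.0)]

needed = events_needed(events)
print(len(events), len(needed))
```

The cutoff cannot be derived before the session is loaded, so in practice it would have to be recorded when compaction runs (post-invocation) and then passed as something like after_timestamp on the next load.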
The root cause is that compaction and session loading are designed as two independent systems:
- Compaction (in apps/compaction.py) runs post-invocation and appends summary events
- Session loading (in runners.py) runs pre-invocation and loads everything
There is no feedback loop between them. Even after compaction has run successfully, the next invocation still loads all original events + the compaction event.
PR #3662 would help as a workaround (letting users manually set num_recent_events), but the ideal fix is for the Runner to automatically leverage compaction metadata to optimize session loading — making EventsCompactionConfig a true end-to-end optimization rather than just an LLM context optimization.
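Until the Runner passes a config itself, one conceivable stopgap is a wrapper around the session service that supplies a default config whenever the caller provides none. The sketch below is self-contained and uses toy stand-in classes; `DefaultConfigSessionService` is a hypothetical name, and the `get_session` signature is simplified compared with a real session service.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GetSessionConfig:
    """Stand-in for the config class discussed in this report."""
    num_recent_events: Optional[int] = None


@dataclass
class Session:
    id: str
    events: list = field(default_factory=list)


class FakeSessionService:
    """Toy in-memory service standing in for a real session service."""

    def __init__(self, sessions):
        self._sessions = sessions

    async def get_session(self, *, session_id, config=None):
        stored = self._sessions[session_id]
        events = list(stored.events)
        if config is not None and config.num_recent_events is not None:
            n = config.num_recent_events
            events = events[-n:] if n > 0 else []
        return Session(id=stored.id, events=events)


class DefaultConfigSessionService:
    """Hypothetical wrapper: applies a default config when the caller passes none."""

    def __init__(self, inner, default_config):
        self._inner = inner
        self._default_config = default_config

    async def get_session(self, *, session_id, config=None):
        effective = config if config is not None else self._default_config
        return await self._inner.get_session(session_id=session_id, config=effective)


async def main():
    inner = FakeSessionService({"s1": Session("s1", list(range(4800)))})
    wrapped = DefaultConfigSessionService(inner, GetSessionConfig(num_recent_events=200))
    full = await inner.get_session(session_id="s1")       # call without a config
    trimmed = await wrapped.get_session(session_id="s1")  # same call through the wrapper
    return len(full.events), len(trimmed.events)


result = asyncio.run(main())
print(result)
```

A fixed cap like this is blind to compaction: it can cut off summary events or keep raw events that are already summarized, which is why a compaction-aware load inside the Runner remains the preferable fix.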
For the production setup described above, get_session takes 70+ seconds after 10 turns (via HttpSessionService → PostgreSQL DatabaseSessionService).
Beyond the approaches listed under Expected behavior, a minimal fix would be to expose GetSessionConfig via RunConfig. As proposed in issue #3562 (Allow limiting num. of Session events fetched when calling Runner.run_async) and PR #3662 (feat(runners): Add get_session_config property to RunConfig), this would let users control how many events are loaded. Ideally, though, the framework should do this automatically when compaction is configured.
Environment
- DatabaseSessionService (PostgreSQL, accessed via HTTP microservice)
- SequentialAgent with ParallelAgent sub-stages
Related issues
- VertexAiSessionService.get_session becomes super slow with many events
- Allow limiting num. of Session events fetched when calling Runner.run_async (#3562)
- feat(runners): Add get_session_config property to RunConfig (#3662, still open)