feat(sessions): TTL enforcement and crash-safe cleanup #31

Merged
Danelegend merged 2 commits into main from dane/ll-session-ttl on May 6, 2026

@Danelegend
Contributor

Sessions now carry an explicit TTL and are guaranteed to be torn down at or before that deadline even if the API service crashes.

  • CreateSessionRequest gains ttl_seconds (default 900s, max 24h); CreateSessionResponse adds the absolute expires_at timestamp.
  • Docker session containers are launched with --rm and sleep <ttl> so the container self-destructs at the deadline regardless of whether the API service is alive. The TTL also goes onto the container as the code-interpreter.expires-at label for the reaper.
  • Kubernetes session pods set activeDeadlineSeconds=ttl so kubelet stops the executor container at the deadline; the deadline timestamp is also stored as a pod annotation so the reaper knows when to delete.
  • A new background reaper task (in main.py's lifespan) runs once at startup — handling crash recovery for any sessions whose TTL elapsed while the service was down — and then every 30s thereafter.
  • Both backends implement reap_expired_sessions (label-filtered list followed by per-session deletion) so the reaper is backend-agnostic.
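
As a rough illustration of the first two bullets, the sketch below validates a TTL against the stated bounds, derives the absolute expiry, and builds a `docker run` argument list that uses `--rm`, the `code-interpreter.expires-at` label, and `sleep <ttl>`. The helper names (`resolve_ttl`, `build_docker_args`), the 1-second lower bound, and the ISO 8601 label format are assumptions made for illustration; the PR's actual implementation may differ.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_TTL_SECONDS = 900        # default stated in the PR description
MAX_TTL_SECONDS = 24 * 60 * 60   # 24h cap stated in the PR description
EXPIRES_AT_LABEL = "code-interpreter.expires-at"


def resolve_ttl(ttl_seconds=None):
    """Return a validated TTL. The lower bound of 1s is an assumption."""
    if ttl_seconds is None:
        return DEFAULT_TTL_SECONDS
    if not 1 <= ttl_seconds <= MAX_TTL_SECONDS:
        raise ValueError(f"ttl_seconds must be between 1 and {MAX_TTL_SECONDS}")
    return ttl_seconds


def build_docker_args(image, ttl_seconds, now):
    """Build (argv, expires_at) for a self-expiring session container."""
    expires_at = now + timedelta(seconds=ttl_seconds)
    argv = [
        "docker", "run", "--detach",
        "--rm",  # container is removed once its main process exits
        "--label", f"{EXPIRES_AT_LABEL}={expires_at.isoformat()}",
        image,
        "sleep", str(ttl_seconds),  # main process exits at the deadline
    ]
    return argv, expires_at
```

Because the deadline is enforced by the container's own main process, nothing in this path depends on the API service still running when the TTL elapses.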

Tests cover TTL bounds at the route layer, the active_deadline / expires-at metadata on each backend, and reaper behavior under happy path, missing annotation, invalid annotation, list failure, and partial delete-failure scenarios.
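
The reaper scenarios listed above can be sketched in a backend-agnostic way. This is a minimal illustration, assuming each backend can list session IDs together with their raw expiry metadata and can delete a session by ID. The function `reap_expired`, its parameters, and the policy of skipping sessions with missing or unparsable metadata are hypothetical and may not match what `reap_expired_sessions` does in the PR.

```python
from datetime import datetime, timezone


def reap_expired(list_sessions, delete_session, now):
    """Delete sessions whose expiry has passed; return the IDs deleted.

    list_sessions() -> iterable of (session_id, raw_expires_at or None)
    delete_session(session_id) -> None, may raise on failure
    """
    try:
        sessions = list(list_sessions())
    except Exception:
        return []  # list failure: give up this pass, retry on the next one

    reaped = []
    for session_id, raw in sessions:
        if raw is None:
            continue  # missing metadata: skipped here (assumed policy)
        try:
            expires_at = datetime.fromisoformat(raw)
        except ValueError:
            continue  # invalid metadata: skipped here (assumed policy)
        if expires_at.tzinfo is None:
            # Treat naive timestamps as UTC (assumption).
            expires_at = expires_at.replace(tzinfo=timezone.utc)
        if expires_at > now:
            continue  # still within its TTL
        try:
            delete_session(session_id)
        except Exception:
            continue  # partial failure: keep going with the other sessions
        reaped.append(session_id)
    return reaped
```

A lifespan task could call a function like this once at startup and then again every 30 seconds, which is the schedule the description gives for the reaper.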

Comment thread code-interpreter/app/services/executor_docker.py Outdated
@Danelegend force-pushed the dane/ll-session-create-delete branch from 9de943a to e7b077a on May 6, 2026 20:36
Base automatically changed from dane/ll-session-create-delete to main May 6, 2026 20:37
Sessions now carry an explicit TTL and are guaranteed to be torn down at
or before that deadline even if the API service crashes.

* CreateSessionRequest gains ``ttl_seconds`` (default 900s, max 24h);
  CreateSessionResponse adds the absolute ``expires_at`` timestamp.
* Docker session containers are launched with ``--rm`` and ``sleep <ttl>``
  so the container self-destructs at the deadline regardless of whether
  the API service is alive. The TTL also goes onto the container as the
  ``code-interpreter.expires-at`` label for the reaper.
* Kubernetes session pods set ``activeDeadlineSeconds=ttl`` so kubelet
  stops the executor container at the deadline; the deadline timestamp
  is also stored as a pod annotation so the reaper knows when to delete.
* A new background reaper task (in main.py's lifespan) runs once at
  startup — handling crash recovery for any sessions whose TTL elapsed
  while the service was down — and then every 30s thereafter.
* Both backends implement ``reap_expired_sessions`` (label-filtered
  list followed by per-session deletion) so the reaper is backend-agnostic.

Tests cover TTL bounds at the route layer, the active_deadline /
expires-at metadata on each backend, and reaper behavior under happy
path, missing annotation, invalid annotation, list failure, and partial
delete-failure scenarios.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Danelegend force-pushed the dane/ll-session-ttl branch from 3e45ea8 to a9cbd51 on May 6, 2026 20:39
Replaces a string-concatenation expression with an f-string so the
Go-template format string can be read in one go. The doubled braces
(``{{{{`` / ``}}}}``) escape to literal ``{{`` / ``}}`` inside the
f-string, matching Docker's template syntax. Output is byte-identical
to the previous concatenation.
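
To illustrate the escaping rule, here is a small sketch comparing the two styles. The template fields and the `label` variable are illustrative assumptions; the PR's actual format string is not shown on this page.

```python
label = "code-interpreter.expires-at"

# Old style: string concatenation around the interpolated label.
concatenated = '{{.ID}} {{.Label "' + label + '"}}'

# New style: one f-string. Each {{{{ renders as a literal {{ and each
# }}}} as a literal }}, while {label} is interpolated as usual.
formatted = f'{{{{.ID}}}} {{{{.Label "{label}"}}}}'

# Both expressions produce the same Go-template string.
assert formatted == concatenated
```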

Addresses review feedback on PR #31.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Danelegend merged commit 74eb494 into main on May 6, 2026
3 checks passed
@Danelegend deleted the dane/ll-session-ttl branch on May 6, 2026 20:48