feat(sessions): TTL enforcement and crash-safe cleanup #31
Merged
evan-onyx
approved these changes
May 5, 2026
9de943a to e7b077a
Sessions now carry an explicit TTL and are guaranteed to be torn down at or before that deadline even if the API service crashes.

* CreateSessionRequest gains ``ttl_seconds`` (default 900s, max 24h); CreateSessionResponse adds the absolute ``expires_at`` timestamp.
* Docker session containers are launched with ``--rm`` and ``sleep <ttl>`` so the container self-destructs at the deadline regardless of whether the API service is alive. The TTL also goes onto the container as the ``code-interpreter.expires-at`` label for the reaper.
* Kubernetes session pods set ``activeDeadlineSeconds=ttl`` so kubelet stops the executor container at the deadline; the deadline timestamp is also stored as a pod annotation so the reaper knows when to delete.
* A new background reaper task (in main.py's lifespan) runs once at startup, handling crash recovery for any sessions whose TTL elapsed while the service was down, and then every 30s thereafter.
* Both backends implement ``reap_expired_sessions`` (label-filtered list followed by per-session deletion) so the reaper is backend-agnostic.

Tests cover TTL bounds at the route layer, the active_deadline / expires-at metadata on each backend, and reaper behavior under happy path, missing annotation, invalid annotation, list failure, and partial delete-failure scenarios.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
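For readers without the diff open, the TTL bounds and reaper cadence described in this commit could look roughly like the sketch below. It is not the PR's code: `compute_expires_at`, `is_expired`, `reaper_loop`, and the `SessionBackend` protocol are hypothetical names, and the 1-second lower bound, the ISO 8601 timestamp format, the async signature of ``reap_expired_sessions``, and the choice to skip sessions with missing or invalid metadata are all assumptions. Only the 900s default, the 24h cap, and the 30s interval come from the description.

```python
import asyncio
from datetime import datetime, timedelta, timezone
from typing import Optional, Protocol

DEFAULT_TTL_SECONDS = 900        # default from the PR description
MAX_TTL_SECONDS = 24 * 60 * 60   # 24h cap from the PR description
MIN_TTL_SECONDS = 1              # assumption: the lower bound is not stated
REAP_INTERVAL_SECONDS = 30.0     # reaper period from the PR description


def compute_expires_at(ttl_seconds: int = DEFAULT_TTL_SECONDS,
                       now: Optional[datetime] = None) -> datetime:
    """Validate the TTL and return the absolute UTC deadline."""
    if not MIN_TTL_SECONDS <= ttl_seconds <= MAX_TTL_SECONDS:
        raise ValueError(
            f"ttl_seconds must be within [{MIN_TTL_SECONDS}, {MAX_TTL_SECONDS}]"
        )
    return (now or datetime.now(timezone.utc)) + timedelta(seconds=ttl_seconds)


def is_expired(raw_expires_at: Optional[str], now: datetime) -> bool:
    """Return True only when the stored deadline parses and has passed.

    Assumption: missing, unparseable, or timezone-naive values are skipped
    rather than reaped; the PR may handle these cases differently.
    """
    if not raw_expires_at:
        return False
    try:
        deadline = datetime.fromisoformat(raw_expires_at)
    except ValueError:
        return False
    if deadline.tzinfo is None:
        return False
    return deadline <= now


class SessionBackend(Protocol):
    """Hypothetical interface; assumes the backend method is a coroutine."""

    async def reap_expired_sessions(self) -> int: ...


async def reaper_loop(backend: SessionBackend, stop: asyncio.Event,
                      interval: float = REAP_INTERVAL_SECONDS) -> None:
    """Reap once immediately (crash recovery), then once per interval."""
    while not stop.is_set():
        try:
            await backend.reap_expired_sessions()
        except Exception:
            pass  # a failed pass must not stop the loop; real code would log it
        try:
            # Sleep for one interval, waking early if shutdown is requested.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass  # interval elapsed without a stop request; run the next pass
```

Running the first pass before the first sleep is what gives the startup crash-recovery behavior: any session whose deadline passed while the service was down is picked up immediately instead of after the first 30s wait.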
3e45ea8 to a9cbd51
Replaces a string-concatenation expression with an f-string so the
Go-template format string can be read in one go. The doubled braces
(``{{{{`` / ``}}}}``) escape to literal ``{{`` / ``}}`` inside the
f-string, matching Docker's template syntax. Output is byte-identical
to the previous concatenation.
Addresses review feedback on PR #31.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
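A minimal illustration of the brace escaping this commit relies on. The actual format string from the PR is not reproduced here; the template below (container ID plus the ``code-interpreter.expires-at`` label, using the ``.ID`` and ``.Label`` placeholders that ``docker ps --format`` documents) and both function names are hypothetical.

```python
EXPIRES_AT_LABEL = "code-interpreter.expires-at"  # label name from the PR description


def format_by_concatenation(label: str) -> str:
    """Old style: the Go-template braces are literal, the label is spliced in."""
    return "{{.ID}}\t{{.Label \"" + label + "\"}}"


def format_by_fstring(label: str) -> str:
    """New style: each literal brace is doubled, so {{{{ renders as {{ and }}}} as }}."""
    return f'{{{{.ID}}}}\t{{{{.Label "{label}"}}}}'


if __name__ == "__main__":
    # Both spellings produce the same Go template.
    print(format_by_fstring(EXPIRES_AT_LABEL) == format_by_concatenation(EXPIRES_AT_LABEL))
```

Only the single-brace ``{label}`` field is interpolated; every other brace pair in the f-string collapses to one literal brace, which is why the rendered template matches the concatenated version.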