2 changes: 1 addition & 1 deletion docs/open-api-docs.yaml
@@ -2,7 +2,7 @@ openapi: 3.0.3
info:
title: The Agent's user-facing API
description: The user-facing parts of The Agent's API service (excluding system-level endpoints, chat completion, maintenance endpoints, etc.)
version: 5.8.1
version: 5.9.0
license:
name: MIT
url: https://opensource.org/licenses/MIT
59 changes: 59 additions & 0 deletions openspec/changes/archive/2026-04-14-unified-cleanup/design.md
@@ -0,0 +1,59 @@
## Context

The backend currently has one cleanup endpoint (`POST /task/clear-expired-cache`) that calls `ToolsCacheCRUD.delete_expired()`. Other data — messages, attachments, usage records, price alerts, unaccepted sponsorships — has no expiry. All CRUD classes use individual `delete()` methods; only `ToolsCacheCRUD` has a bulk `delete_expired()`.

The `chat_message_attachments` table has an FK to `chat_messages(chat_id, message_id)` with **no ON DELETE CASCADE**, meaning attachments must be deleted before their parent messages.

Users are never deleted because their profiles carry phone number verification ("real human" proof) from Telegram/WhatsApp onboarding.

## Goals / Non-Goals

**Goals:**

- Single `POST /task/cleanup` endpoint replaces `/task/clear-expired-cache`
- Configurable retention periods via environment variables
- Bulk deletes for performance (SQLAlchemy `query.filter(...).delete()`)
- Per-phase error isolation — one phase failing doesn't block others
- Attachment-then-message ordering to respect FK constraints

**Non-Goals:**

- User deletion (identity anchors)
- Notifications on cleanup (decided against — too noisy)
- Background/async processing (start synchronous, revisit if needed)
- Purchase record cleanup (financial audit trail, kept forever)
- Chat config cleanup (tied to users, stay forever)

## Decisions

### Unified cleanup service with ordered phases

A single `CleanupService` class runs 5 phases in a fixed order. Phases 1a/1b (attachments + messages) are wrapped in one try/except block because they have an FK dependency. Phases 2-5 are each independently wrapped.

**Why ordered phases instead of parallel**: Simplicity. No threading complexity, no partial-failure coordination. The total volume per CRON run should be small if run regularly. If performance becomes an issue, backgrounding can be added later without changing the service interface.
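The sequential, isolated execution described above can be sketched as follows. This is a minimal illustration, not the actual `CleanupService`: the helper names `run_phase` and `run_cleanup` are hypothetical, and the lambdas stand in for the real CRUD calls.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def run_phase(name: str, phase: Callable[[], int]) -> int:
    """Run one cleanup phase; on failure, log it and report 0 deletions."""
    try:
        return phase()
    except Exception:
        logger.exception("Cleanup phase %r failed", name)
        return 0


def run_cleanup(phases: list[tuple[str, Callable[[], int]]]) -> dict[str, int]:
    """Run phases one after another in the given order, isolating failures."""
    return {name: run_phase(name, phase) for name, phase in phases}


def failing_phase() -> int:
    # Hypothetical stand-in for a CRUD call that hits a database error.
    raise RuntimeError("simulated database error")


summary = run_cleanup([
    ("cache_entries_cleared", lambda: 12),
    ("usage_records_deleted", failing_phase),
    ("price_alerts_deleted", lambda: 3),
])
print(summary)
```

Because each phase is only a callable that returns a count, moving the whole run to a background worker later would not need to change this interface.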

### Retention periods shared between messages and usage records

Both messages and usage records use `cleanup_message_retention_days` (default 30). This keeps the config surface small. Usage records aren't financial records (purchase_records are), and 30 days matches message retention.

**Alternative considered**: Separate `cleanup_usage_retention_days`. Rejected because usage records are internal activity logs, not regulated financial data, and keeping them no longer than messages is in line with GDPR's data minimization principle. Purchase records (kept forever) are the financial audit trail.

### Price alert staleness at 360 days

`last_price_time` is updated only on alert creation and when the threshold is crossed (trigger). It is NOT updated on routine CRON price checks. This makes it a reliable "last relevant activity" indicator. 360 days gives wide-threshold alerts (e.g., "notify me if BTC drops 50%") nearly a full year to trigger before cleanup.
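As a rough illustration of the staleness rule, the cutoff is the current time minus the configured number of days, and an alert is stale when its `last_price_time` falls strictly before that cutoff. The helper names below are hypothetical, and the use of timezone-aware UTC datetimes is an assumption; the diff does not show how the service computes `cutoff`.

```python
from datetime import datetime, timedelta, timezone


def staleness_cutoff(now: datetime, staleness_days: int) -> datetime:
    """Return the instant before which an alert counts as stale."""
    return now - timedelta(days = staleness_days)


def is_stale(last_price_time: datetime, now: datetime, staleness_days: int = 360) -> bool:
    # Strict comparison, mirroring `last_price_time < cutoff` in the CRUD method.
    return last_price_time < staleness_cutoff(now, staleness_days)


now = datetime(2026, 4, 14, tzinfo = timezone.utc)
cutoff = staleness_cutoff(now, 360)
print(cutoff.date())
```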

### Bulk delete via SQLAlchemy filter + delete

New CRUD methods use `query.filter(...).delete()` returning the count of deleted rows. This matches the existing pattern in `ToolsCacheCRUD.delete_expired()`. No need to load rows into memory.

For the attachment cleanup, we use a subquery: delete attachments whose `(chat_id, message_id)` matches messages with `sent_at` older than the cutoff. This avoids a two-step load-then-delete.
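The subquery delete and the FK ordering can be tried out with the standard-library `sqlite3` module. This is only a sketch of the SQL shape under simplified assumptions: the table and column names follow the design text, but the column types, the sample rows, and the use of SQLite instead of Postgres are illustrative choices, not the project's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE chat_messages (
        chat_id TEXT NOT NULL,
        message_id TEXT NOT NULL,
        sent_at TEXT NOT NULL,
        PRIMARY KEY (chat_id, message_id)
    );
    CREATE TABLE chat_message_attachments (
        id TEXT PRIMARY KEY,
        chat_id TEXT NOT NULL,
        message_id TEXT NOT NULL,
        FOREIGN KEY (chat_id, message_id)
            REFERENCES chat_messages (chat_id, message_id)  -- no ON DELETE CASCADE
    );
    INSERT INTO chat_messages VALUES
        ('c1', 'm1', '2026-01-01'),
        ('c1', 'm2', '2026-04-10');
    INSERT INTO chat_message_attachments VALUES
        ('a1', 'c1', 'm1'),
        ('a2', 'c1', 'm2');
""")
cutoff = "2026-03-15"

# Deleting the parent rows first is rejected because attachments still reference them.
try:
    conn.execute("DELETE FROM chat_messages WHERE sent_at < ?", (cutoff,))
    parent_first_failed = False
except sqlite3.IntegrityError:
    parent_first_failed = True

# Attachments first, selected by a subquery on the parent's sent_at
# (row-value IN needs SQLite 3.15 or newer).
attachments_deleted = conn.execute(
    """
    DELETE FROM chat_message_attachments
    WHERE (chat_id, message_id) IN (
        SELECT chat_id, message_id FROM chat_messages WHERE sent_at < ?
    )
    """,
    (cutoff,),
).rowcount
messages_deleted = conn.execute(
    "DELETE FROM chat_messages WHERE sent_at < ?", (cutoff,)
).rowcount
conn.commit()
print(parent_first_failed, attachments_deleted, messages_deleted)
```

Neither delete loads rows into Python; the row counts come straight from the statements.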

### No DB migration needed

All cleanup is row deletion based on existing date columns. No new columns, no schema changes. The new config values are environment variables with defaults.

## Risks / Trade-offs

- **Bulk delete lock contention**: On Postgres, a large `DELETE` holds row-level locks on every affected row until its transaction commits, which can slow concurrent writers to the same rows. → Mitigation: if run regularly (daily), each batch is small. A large backlog on the first run may cause brief lock contention, which is acceptable for a CRON-triggered maintenance task.
- **Attachment subquery complexity**: Joining attachments to messages for bulk delete is more complex than simple date filtering. → Mitigation: straightforward subquery on `sent_at`; Postgres handles this efficiently with the existing index.
- **Usage records at 30 days**: Users lose visibility into spending after 30 days. → Mitigation: the backoffice usage stats endpoint aggregates on the fly; users who want a longer history can export it, or the retention period can be increased via config.
42 changes: 42 additions & 0 deletions openspec/changes/archive/2026-04-14-unified-cleanup/proposal.md
@@ -0,0 +1,42 @@
## Why

The backend has a single cleanup endpoint (`/task/clear-expired-cache`) that only clears expired tool cache entries. Meanwhile, messages, attachments, usage records, price alerts, and unaccepted sponsorships accumulate indefinitely. This wastes storage, keeps stale data around, and means multiple CRON jobs would be needed as cleanup concerns grow.

A unified `/task/cleanup` endpoint consolidates all data retention into one place, called by a single external CRON job.

## What Changes

- Replace `/task/clear-expired-cache` with `POST /task/cleanup` (same API key auth)
- Add a cleanup service that runs 5 phases in order:
   1. Delete attachments + messages older than 30 days (atomic; attachments first due to FK, no cascade)
   2. Clear expired cache (reuse existing `ToolsCacheCRUD.delete_expired()`)
   3. Delete usage records older than 30 days
   4. Delete price alerts not created/triggered in 360 days
   5. Delete unaccepted sponsorships older than 30 days
- Add 3 new config values: `cleanup_message_retention_days`, `cleanup_price_alert_staleness_days`, `cleanup_sponsorship_staleness_days`
- Add bulk-delete methods to CRUDs that don't already have them
- No notifications on any cleanup action
- No background processing — runs synchronously
- Users are never deleted (identity anchors with phone number verification)

## Capabilities

### New Capabilities

- `cleanup-service`: Unified data cleanup service with configurable retention periods, bulk deletes, and per-phase error isolation

### Modified Capabilities

## Impact

- `src/main.py` — replace `/task/clear-expired-cache` with `/task/cleanup`
- `src/util/config.py` — add 3 new config values
- `src/db/crud/chat_message.py` — add `delete_older_than()`
- `src/db/crud/chat_message_attachment.py` — add `delete_by_old_messages()`
- `src/db/crud/usage_record_repo.py` or new CRUD — add `delete_older_than()`
- `src/db/crud/price_alert.py` — add `delete_stale()`
- `src/db/crud/sponsorship.py` — add `delete_unaccepted_older_than()`
- New cleanup service file under `src/features/`
- New test file(s) for the cleanup service and new CRUD methods
- No DB migrations (no schema changes)
- No API contract changes beyond the endpoint rename
@@ -0,0 +1,75 @@
## ADDED Requirements

### Requirement: Unified cleanup endpoint
The system SHALL expose `POST /task/cleanup` authenticated via API key. It SHALL replace the existing `POST /task/clear-expired-cache` endpoint. The response SHALL be a JSON object with counts of deleted items per phase.

#### Scenario: Successful cleanup run
- **WHEN** `POST /task/cleanup` is called with a valid API key
- **THEN** the system SHALL execute all cleanup phases and return a JSON response with keys: `messages_deleted`, `attachments_deleted`, `cache_entries_cleared`, `usage_records_deleted`, `price_alerts_deleted`, `sponsorships_deleted`

#### Scenario: Unauthorized cleanup attempt
- **WHEN** `POST /task/cleanup` is called without a valid API key
- **THEN** the system SHALL return HTTP 401

### Requirement: Message and attachment cleanup (Phase 1)
The system SHALL delete chat message attachments and then chat messages where `sent_at` is older than `cleanup_message_retention_days` (default 30). Attachments SHALL be deleted before messages to satisfy FK constraints. Both deletions SHALL be wrapped in a single error handling block — if attachment deletion fails, message deletion SHALL be skipped.

#### Scenario: Old messages and their attachments are deleted
- **WHEN** cleanup runs and messages exist with `sent_at` older than 30 days
- **THEN** attachments belonging to those messages SHALL be deleted first, then the messages themselves

#### Scenario: Attachment deletion fails
- **WHEN** attachment deletion raises an exception
- **THEN** message deletion SHALL be skipped, remaining phases SHALL still execute
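A small sketch of the Phase 1 behaviour, assuming the two deletes are passed in as callables. The function name `run_message_phase` and the simulated failure are hypothetical; the real service calls the CRUD methods directly.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def run_message_phase(
    delete_attachments: Callable[[], int],
    delete_messages: Callable[[], int],
) -> dict[str, int]:
    """Phase 1: attachments first, then messages, under one try/except."""
    counts = {"attachments_deleted": 0, "messages_deleted": 0}
    try:
        counts["attachments_deleted"] = delete_attachments()
        # Only reached when the attachment delete succeeded.
        counts["messages_deleted"] = delete_messages()
    except Exception:
        logger.exception("Message cleanup phase failed")
    return counts


calls: list[str] = []


def failing_attachment_delete() -> int:
    calls.append("attachments")
    raise RuntimeError("simulated database error")


def message_delete() -> int:
    calls.append("messages")
    return 7


failed_run = run_message_phase(failing_attachment_delete, message_delete)
print(failed_run, calls)
```

Since the CRUD methods in this change each call `commit()` themselves, the single block appears to group error handling rather than wrap both deletes in one database transaction: if the message delete fails, the attachment delete has already been committed.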

### Requirement: Expired cache cleanup (Phase 2)
The system SHALL delete tool cache entries where `expires_at < now()`. This SHALL reuse the existing `ToolsCacheCRUD.delete_expired()` method.

#### Scenario: Expired cache cleared
- **WHEN** cleanup runs and expired cache entries exist
- **THEN** those entries SHALL be deleted

### Requirement: Usage record cleanup (Phase 3)
The system SHALL delete usage records where `timestamp` is older than `cleanup_message_retention_days` (default 30). This shares the message retention config value.

#### Scenario: Old usage records are deleted
- **WHEN** cleanup runs and usage records exist with `timestamp` older than 30 days
- **THEN** those records SHALL be deleted

### Requirement: Price alert cleanup (Phase 4)
The system SHALL delete price alerts where `last_price_time` is older than `cleanup_price_alert_staleness_days` (default 360).

#### Scenario: Stale price alerts are deleted
- **WHEN** cleanup runs and price alerts have `last_price_time` older than 360 days
- **THEN** those alerts SHALL be deleted

### Requirement: Unaccepted sponsorship cleanup (Phase 5)
The system SHALL delete sponsorships where `accepted_at IS NULL` and `sponsored_at` is older than `cleanup_sponsorship_staleness_days` (default 30).

#### Scenario: Stale unaccepted sponsorships are deleted
- **WHEN** cleanup runs and sponsorships exist where `accepted_at` is NULL and `sponsored_at` is older than 30 days
- **THEN** those sponsorships SHALL be deleted

### Requirement: Per-phase error isolation
Phases 2, 3, 4, and 5 SHALL each be independently wrapped in error handling. A failure in any one phase SHALL NOT prevent execution of subsequent phases. Phase 1 (attachments + messages) is a single atomic block.

#### Scenario: One independent phase fails
- **WHEN** phase 3 (usage records) raises an exception
- **THEN** phases 4 and 5 SHALL still execute, and the response SHALL include 0 for the failed phase's count

### Requirement: Configurable retention periods
The system SHALL support 3 new environment-variable-backed config values with the following defaults:
- `CLEANUP_MESSAGE_RETENTION_DAYS` = 30 (used for messages, attachments, and usage records)
- `CLEANUP_PRICE_ALERT_STALENESS_DAYS` = 360
- `CLEANUP_SPONSORSHIP_STALENESS_DAYS` = 30

#### Scenario: Custom retention via environment variable
- **WHEN** `CLEANUP_MESSAGE_RETENTION_DAYS` is set to 60
- **THEN** messages, attachments, and usage records older than 60 days SHALL be deleted
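One way to read these three values with their defaults is sketched below. The `CleanupConfig` dataclass and `load_cleanup_config` function are hypothetical stand-ins; the project's actual `Config` class in `src/util/config.py` is not shown in this diff and may be structured differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen = True)
class CleanupConfig:
    cleanup_message_retention_days: int
    cleanup_price_alert_staleness_days: int
    cleanup_sponsorship_staleness_days: int


def load_cleanup_config(env: Mapping[str, str] = os.environ) -> CleanupConfig:
    """Read retention settings from the environment, falling back to the spec defaults."""
    return CleanupConfig(
        cleanup_message_retention_days = int(env.get("CLEANUP_MESSAGE_RETENTION_DAYS", "30")),
        cleanup_price_alert_staleness_days = int(env.get("CLEANUP_PRICE_ALERT_STALENESS_DAYS", "360")),
        cleanup_sponsorship_staleness_days = int(env.get("CLEANUP_SPONSORSHIP_STALENESS_DAYS", "30")),
    )


defaults = load_cleanup_config({})
custom = load_cleanup_config({"CLEANUP_MESSAGE_RETENTION_DAYS": "60"})
print(defaults, custom)
```

Passing the environment as a mapping keeps the loader easy to test without touching the real process environment.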

### Requirement: Bulk deletes
All cleanup phases SHALL use bulk `DELETE WHERE` queries (not row-by-row loading and deletion). This follows the existing pattern in `ToolsCacheCRUD.delete_expired()`.

#### Scenario: Bulk message deletion
- **WHEN** 1000 messages are older than the retention period
- **THEN** they SHALL be deleted in a single SQL statement, not 1000 individual deletes
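The bulk-delete scenario can be checked on a toy table with the standard-library `sqlite3` module, using its trace callback to count the statements actually sent. The single-column-plus-id schema and the sample data are assumptions made for the demonstration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chat_messages (message_id INTEGER PRIMARY KEY, sent_at TEXT NOT NULL)"
)
# 1000 old rows and 5 recent ones.
conn.executemany(
    "INSERT INTO chat_messages (sent_at) VALUES (?)",
    [("2026-01-01",)] * 1000 + [("2026-04-10",)] * 5,
)

statements: list[str] = []
conn.set_trace_callback(statements.append)  # record every SQL statement executed
deleted = conn.execute(
    "DELETE FROM chat_messages WHERE sent_at < ?", ("2026-03-15",)
).rowcount
conn.set_trace_callback(None)
conn.commit()

delete_statements = [s for s in statements if s.lstrip().upper().startswith("DELETE")]
remaining = conn.execute("SELECT COUNT(*) FROM chat_messages").fetchone()[0]
print(deleted, len(delete_statements), remaining)
```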
37 changes: 37 additions & 0 deletions openspec/changes/archive/2026-04-14-unified-cleanup/tasks.md
@@ -0,0 +1,37 @@
## 1. Add config values

- [x] 1.1 Add `cleanup_message_retention_days` (env `CLEANUP_MESSAGE_RETENTION_DAYS`, default 30) to `Config`
- [x] 1.2 Add `cleanup_price_alert_staleness_days` (env `CLEANUP_PRICE_ALERT_STALENESS_DAYS`, default 360) to `Config`
- [x] 1.3 Add `cleanup_sponsorship_staleness_days` (env `CLEANUP_SPONSORSHIP_STALENESS_DAYS`, default 30) to `Config`
- [x] 1.4 Update config test with new values

## 2. Add bulk-delete CRUD methods

- [x] 2.1 `ChatMessageAttachmentCRUD.delete_by_old_messages(cutoff: datetime) -> int` — delete attachments via subquery joining messages on `sent_at < cutoff`
- [x] 2.2 `ChatMessageCRUD.delete_older_than(cutoff: datetime) -> int` — bulk delete messages with `sent_at < cutoff`
- [x] 2.3 Add `delete_older_than(cutoff: datetime) -> int` to usage record repo — bulk delete usage records with `timestamp < cutoff`
- [x] 2.4 `PriceAlertCRUD.delete_stale(cutoff: datetime) -> int` — bulk delete alerts with `last_price_time < cutoff`
- [x] 2.5 `SponsorshipCRUD.delete_unaccepted_older_than(cutoff: datetime) -> int` — bulk delete where `accepted_at IS NULL AND sponsored_at < cutoff`
- [x] 2.6 Write tests for each new CRUD method

## 3. Create cleanup service

- [x] 3.1 Create `src/features/cleanup/cleanup_service.py` with `CleanupService` class
- [x] 3.2 Implement phase 1: attachments + messages (atomic block)
- [x] 3.3 Implement phase 2: expired cache (reuse `ToolsCacheCRUD.delete_expired()`)
- [x] 3.4 Implement phase 3: old usage records
- [x] 3.5 Implement phase 4: stale price alerts
- [x] 3.6 Implement phase 5: unaccepted sponsorships
- [x] 3.7 Return summary dict with per-phase counts
- [x] 3.8 Write tests for `CleanupService`

## 4. Wire up endpoint

- [x] 4.1 Replace `POST /task/clear-expired-cache` with `POST /task/cleanup` in `main.py`
- [x] 4.2 Endpoint calls `CleanupService`, returns summary JSON
- [x] 4.3 Verify API key auth works on the new endpoint

## 5. Verify

- [x] 5.1 Run full test suite — no regressions
- [x] 5.2 Run pre-commit linting
63 changes: 0 additions & 63 deletions openspec/specs/binary-search-resize/spec.md

This file was deleted.

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "the-agent"
version = "5.8.1"
version = "5.9.0"

[tool.setuptools]
package-dir = {"" = "src"}
8 changes: 8 additions & 0 deletions src/db/crud/chat_message.py
@@ -1,3 +1,4 @@
from datetime import datetime
from uuid import UUID

from sqlalchemy import desc
@@ -64,3 +65,10 @@ def delete(self, chat_id: UUID, message_id: str) -> ChatMessageDB | None:
self._db.delete(chat_message)
self._db.commit()
return chat_message

    def delete_older_than(self, cutoff: datetime) -> int:
        # Bulk delete: one DELETE statement, no rows loaded into the session.
        deleted = self._db.query(ChatMessageDB).filter(
            ChatMessageDB.sent_at < cutoff,
        ).delete(synchronize_session = False)
        self._db.commit()
        return deleted
13 changes: 13 additions & 0 deletions src/db/crud/chat_message_attachment.py
@@ -1,7 +1,10 @@
from datetime import datetime
from uuid import UUID

from sqlalchemy import select, tuple_
from sqlalchemy.orm import Session

from db.model.chat_message import ChatMessageDB
from db.model.chat_message_attachment import ChatMessageAttachmentDB
from db.schema.chat_message_attachment import ChatMessageAttachmentSave
from util.functions import generate_short_uuid
@@ -67,3 +70,13 @@ def delete(self, attachment_id: str) -> ChatMessageAttachmentDB | None:
self._db.delete(attachment)
self._db.commit()
return attachment

    def delete_by_old_messages(self, cutoff: datetime) -> int:
        # Must run before the message delete: the FK to chat_messages has no ON DELETE CASCADE.
        old_message_pairs = select(ChatMessageDB.chat_id, ChatMessageDB.message_id).where(
            ChatMessageDB.sent_at < cutoff,
        )
        deleted = self._db.query(ChatMessageAttachmentDB).filter(
            tuple_(ChatMessageAttachmentDB.chat_id, ChatMessageAttachmentDB.message_id).in_(old_message_pairs),
        ).delete(synchronize_session = False)
        self._db.commit()
        return deleted
8 changes: 8 additions & 0 deletions src/db/crud/price_alert.py
@@ -1,3 +1,4 @@
from datetime import datetime
from uuid import UUID

from sqlalchemy.orm import Session
@@ -58,3 +59,10 @@ def delete(self, chat_id: UUID, base_currency: str, desired_currency: str) -> Pr
self._db.delete(price_alert)
self._db.commit()
return price_alert

    def delete_stale(self, cutoff: datetime) -> int:
        # last_price_time changes only on creation and trigger, so it marks the last relevant activity.
        deleted = self._db.query(PriceAlertDB).filter(
            PriceAlertDB.last_price_time < cutoff,
        ).delete(synchronize_session = False)
        self._db.commit()
        return deleted
9 changes: 9 additions & 0 deletions src/db/crud/sponsorship.py
@@ -1,3 +1,4 @@
from datetime import datetime
from uuid import UUID

from sqlalchemy.orm import Session
@@ -70,3 +71,11 @@ def delete_all_by_receiver(self, receiver_id: UUID) -> int:
).delete(synchronize_session = False) # optimizes by assuming no other session will use these objects
self._db.commit()
return result

    def delete_unaccepted_older_than(self, cutoff: datetime) -> int:
        # Only sponsorships that were never accepted are eligible for cleanup.
        deleted = self._db.query(SponsorshipDB).filter(
            SponsorshipDB.accepted_at.is_(None),
            SponsorshipDB.sponsored_at < cutoff,
        ).delete(synchronize_session = False)
        self._db.commit()
        return deleted