Tech debt: EdgeLearner cursors are ephemeral; losing them on restart causes full re-consumption #2210

@mrveiss

Description

Problem

`EdgeLearner._cursors` is an in-memory `dict[str, str]` that tracks the last processed Redis stream entry ID per stream key. On process restart, all cursors are lost and every stream is re-consumed from `0-0`.

With the stream TTL now at 30 days (#2102), a restart means re-processing up to 30 days of feedback events. The `on_retrieval` operations (EMA weight update, access count increment) are not fully idempotent — re-applying EMA decay corrupts edge weights.
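To illustrate why replay is harmful, here is a minimal sketch of a textbook EMA update. The actual update rule in `on_retrieval` is not shown in this issue, so the function `apply_ema`, the smoothing factor `alpha`, and the sample signal values are all assumptions made for illustration only.

```python
from typing import Iterable


def apply_ema(weight: float, signals: Iterable[float], alpha: float = 0.2) -> float:
    """Fold each feedback signal into the weight with a standard EMA step.

    Hypothetical stand-in for the real edge-weight update; alpha is assumed.
    """
    for signal in signals:
        weight = (1.0 - alpha) * weight + alpha * signal
    return weight


# The same two feedback events, applied once and then replayed after a restart.
events = [1.0, 0.0]
once = apply_ema(0.5, events)
replayed = apply_ema(once, events)

# The replayed weight differs from the weight after a single pass,
# so consuming the same stream entries twice is not a no-op.
print(once, replayed)
```

Because each step depends on the current weight, replaying entries that were already applied moves the weight again instead of leaving it unchanged.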

Discovered During

Code review of PR #2179 (issue #2102)

Fix

Persist cursors to Redis using a simple hash:
```text
HSET rag:cursors:edge_learner <stream_key> <last_entry_id>
```

Read the hash on startup and write the cursor after each successfully processed batch.
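The read-on-startup, write-after-batch flow could look roughly like the sketch below. It is not the project's implementation: `CursorStore`, `HashClient`, and `InMemoryHashClient` are hypothetical names. The client is assumed to expose `hgetall(name)` and `hset(name, key, value)` as redis-py does, and to return `str` values (for redis-py, a client created with `decode_responses=True`). The in-memory client exists only so the sketch runs without a Redis server.

```python
from typing import Dict, Protocol

CURSOR_HASH = "rag:cursors:edge_learner"  # hash key proposed in this issue
START_ID = "0-0"  # used when no cursor has been persisted for a stream


class HashClient(Protocol):
    """Subset of the Redis hash API this sketch relies on."""

    def hgetall(self, name: str) -> Dict[str, str]: ...
    def hset(self, name: str, key: str, value: str) -> int: ...


class InMemoryHashClient:
    """Hypothetical stand-in for a Redis client (illustration only)."""

    def __init__(self) -> None:
        self._hashes: Dict[str, Dict[str, str]] = {}

    def hgetall(self, name: str) -> Dict[str, str]:
        return dict(self._hashes.get(name, {}))

    def hset(self, name: str, key: str, value: str) -> int:
        bucket = self._hashes.setdefault(name, {})
        is_new = key not in bucket
        bucket[key] = value
        return int(is_new)


class CursorStore:
    """Loads cursors once at startup and persists each update."""

    def __init__(self, client: HashClient) -> None:
        self._client = client
        # Read on startup: restore every previously persisted cursor.
        self._cursors: Dict[str, str] = dict(client.hgetall(CURSOR_HASH))

    def get(self, stream_key: str) -> str:
        return self._cursors.get(stream_key, START_ID)

    def commit(self, stream_key: str, last_entry_id: str) -> None:
        # Write after a successful batch: persist first, then update memory.
        self._client.hset(CURSOR_HASH, stream_key, last_entry_id)
        self._cursors[stream_key] = last_entry_id
```

In this sketch `commit` would be called only after a batch has been fully applied. The cursor write and the weight updates are still separate steps, so a crash between them could replay that one batch; this narrows the replay window rather than making the updates idempotent.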

Impact

Medium — edge weight corruption after restart. Low frequency (restarts are infrequent) but high impact on RAG quality when it occurs.
