feat: write-time dedup gate (ADD/UPDATE/SUPERSEDE/NOOP) #87
jescalan wants to merge 3 commits into verygoodplugins:main
Conversation
Add LLM-based deduplication at write time, inspired by Helixir's decision engine. Before storing a new memory, the system:

1. Generates an embedding for the new content
2. Searches Qdrant for semantically similar existing memories (>0.70)
3. If candidates found, asks a fast LLM (gpt-4o-mini) to classify:
   - ADD: genuinely new, store normally
   - UPDATE: merge into existing memory (updates both FalkorDB + Qdrant)
   - SUPERSEDE: delete outdated memory, store new one
   - NOOP: already known, skip entirely

Disabled by default. Enable with MEMORY_DEDUP_ENABLED=true. Configurable model (MEMORY_DEDUP_MODEL) and similarity threshold (MEMORY_DEDUP_SIMILARITY_THRESHOLD). The gate adds ~1-2s per write (one embedding + one LLM call) but eliminates the need for post-hoc dedup cleanup passes.
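As a rough sketch of that decision path (the helper callables below are placeholders for illustration, not the actual internals of the new module):

```python
from typing import Any, Callable, Sequence

def check_dedup(
    new_content: str,
    embed: Callable[[str], Sequence[float]],
    search_similar: Callable[[Sequence[float], float], list[dict[str, Any]]],
    classify: Callable[[str, list[dict[str, Any]]], dict[str, Any]],
    threshold: float = 0.70,
) -> dict[str, Any]:
    """Decide what to do with a new memory: ADD, UPDATE, SUPERSEDE, or NOOP."""
    vector = embed(new_content)                      # 1. embed the new content
    candidates = search_similar(vector, threshold)   # 2. Qdrant similarity search above the cutoff
    if not candidates:
        # Nothing similar exists, so the write proceeds as a normal ADD.
        return {"action": "ADD", "reason": "no similar memories found"}
    # 3. A fast LLM (e.g. gpt-4o-mini) compares the new content against the candidates
    #    and returns the action plus an optional target_id / merged_content.
    return classify(new_content, candidates)
```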
📝 Summary by CodeRabbit
Walkthrough

This change introduces a write-time deduplication feature for memory storage. It adds configuration options to control the feature, implements a dedup module that uses embeddings and LLM-based classification to identify and handle semantically similar memories, and integrates dedup logic into the memory API to process results.
Sequence Diagram

```mermaid
sequenceDiagram
participant Client
participant Memory API
participant Dedup Module
participant Embeddings
participant Qdrant
participant OpenAI
participant FalkorDB
Client->>Memory API: Write memory
Memory API->>Dedup Module: check_dedup(new_content, ...)
Dedup Module->>Embeddings: Generate embedding for new content
Embeddings-->>Dedup Module: embedding
Dedup Module->>Qdrant: Search for similar memories
Qdrant-->>Dedup Module: candidate memories
Dedup Module->>OpenAI: Classify action (ADD/UPDATE/SUPERSEDE/NOOP)
OpenAI-->>Dedup Module: action + optional target_id
alt NOOP
Dedup Module-->>Memory API: action=NOOP, reason
Memory API-->>Client: skipped
else UPDATE
Dedup Module-->>Memory API: action=UPDATE, merged_content, target_id
Memory API->>FalkorDB: Update existing memory
Memory API->>Qdrant: Re-embed and update vector
Memory API-->>Client: updated
else SUPERSEDE
Dedup Module-->>Memory API: action=SUPERSEDE, target_id
Memory API->>FalkorDB: Delete old memory
Memory API->>Qdrant: Remove old vector
Memory API->>Memory API: Continue ADD path
Memory API->>FalkorDB: Store new memory
Memory API->>Qdrant: Add new vector
Memory API-->>Client: added (superseded_id)
else ADD
Dedup Module-->>Memory API: action=ADD
Memory API->>FalkorDB: Store new memory
Memory API->>Qdrant: Add new vector
Memory API-->>Client: added
end
```
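Reading the diagram as code, the store endpoint's dispatch might look roughly like the following; the injected callables stand in for the real FalkorDB/Qdrant operations and are not the actual automem/api/memory.py code:

```python
from typing import Any, Callable

def apply_dedup_decision(
    decision: dict[str, Any],
    store_new: Callable[[], str],                 # writes to FalkorDB + Qdrant, returns the new memory id
    update_existing: Callable[[str, str], None],  # merges content into an existing memory and re-embeds it
    delete_old: Callable[[str], None],            # removes a superseded memory from both stores
) -> dict[str, Any]:
    action = decision["action"]
    if action == "NOOP":
        # Already known: skip the write entirely.
        return {"status": "skipped", "reason": decision.get("reason")}
    if action == "UPDATE":
        update_existing(decision["target_id"], decision["merged_content"])
        return {"status": "updated", "memory_id": decision["target_id"]}
    if action == "SUPERSEDE":
        # Drop the outdated memory, then continue down the normal ADD path.
        delete_old(decision["target_id"])
        return {"status": "added", "memory_id": store_new(), "superseded_id": decision["target_id"]}
    # ADD: genuinely new content.
    return {"status": "added", "memory_id": store_new()}
```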
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 5
🧹 Nitpick comments (3)
automem/config.py (1)
102-111: Configuration additions look correct; note threshold discrepancy with PR description.

The boolean coercion for `MEMORY_DEDUP_ENABLED` follows the established pattern in this file. The default of `"false"` correctly evaluates to `False`.

One discrepancy to be aware of: the PR description states a threshold of ≥ 0.85, but the code defaults to 0.70. This is a very permissive cosine-similarity cutoff — many loosely related memories will become candidates, triggering an LLM call on each write. If that's intentional (letting the LLM decide), document it; otherwise, consider raising the default closer to the 0.85 mentioned in the PR description to reduce unnecessary LLM invocations and latency.

Also, the CI pipeline reports a Black formatting warning on this block. Please run Black to fix the formatting.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automem/config.py` around lines 102 - 111, The current defaults set MEMORY_DEDUP_SIMILARITY_THRESHOLD to 0.70 which conflicts with the PR description (≥0.85) and may cause excessive LLM calls; update the default to "0.85" (i.e., change the env default passed into float(os.getenv(...)) for MEMORY_DEDUP_SIMILARITY_THRESHOLD) or add a comment documenting that 0.70 is intentionally permissive, and then run Black on this block to fix the CI formatting warning; verify the related symbols MEMORY_DEDUP_ENABLED and MEMORY_DEDUP_MODEL remain unchanged.
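For reference, a plausible shape for that config block, following the os.getenv pattern the comment describes (the variable names come from the PR; the exact coercion logic and defaults here are assumptions):

```python
import os

# Write-time dedup gate; disabled unless MEMORY_DEDUP_ENABLED is truthy.
MEMORY_DEDUP_ENABLED = os.getenv("MEMORY_DEDUP_ENABLED", "false").strip().lower() in ("1", "true", "yes")
MEMORY_DEDUP_MODEL = os.getenv("MEMORY_DEDUP_MODEL", "gpt-4o-mini")
# 0.70 is the permissive default discussed above; raising it toward 0.85 would cut LLM calls.
MEMORY_DEDUP_SIMILARITY_THRESHOLD = float(os.getenv("MEMORY_DEDUP_SIMILARITY_THRESHOLD", "0.70"))
```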
automem/api/memory.py (1)

241-254: Consider omitting `candidates` from the NOOP response in production.

The `candidates` list includes full content and IDs of existing memories. This is helpful for debugging but could expose sensitive stored content to API callers. Consider gating this behind a `debug` flag or stripping content from candidates in the response.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automem/api/memory.py` around lines 241 - 254, In the NOOP branch where dedup_result is returned (when dedup_result["action"] == "NOOP") avoid returning full dedup_result["candidates"] to callers in production; change the response logic in that block (the jsonify return) to either omit the "candidates" key or replace each candidate's full content with a safe summary/only IDs unless a debug flag is enabled (e.g., check a debug/config flag passed into the handler or an environment variable like DEBUG_MODE), so production responses do not expose stored memory content.automem/dedup.py (1)
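One way to apply that suggestion, sketched with a hypothetical DEBUG_MODE environment flag; the response field names mirror the review comment rather than the actual handler:

```python
import os
from typing import Any

def noop_response(dedup_result: dict[str, Any]) -> dict[str, Any]:
    """Build the NOOP reply without leaking stored memory content in production."""
    debug = os.getenv("DEBUG_MODE", "false").lower() == "true"
    candidates = dedup_result.get("candidates", [])
    if not debug:
        # Keep only identifiers and similarity scores; drop the stored content.
        candidates = [{"id": c.get("id"), "score": c.get("score")} for c in candidates]
    return {
        "status": "skipped",
        "action": "NOOP",
        "reason": dedup_result.get("reason"),
        "candidates": candidates,
    }
```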
automem/dedup.py (1)

28-57: Prompt injection surface: `new_content` is user-controlled input interpolated into the LLM prompt.

A malicious authenticated user could craft memory content to manipulate the LLM into returning a SUPERSEDE/UPDATE action targeting a specific memory. This is largely mitigated if `target_id` is validated against candidate IDs (as suggested in the memory.py review). As an additional hardening measure, consider using a system message to separate instructions from user data:

messages=[
    {"role": "system", "content": DEDUP_SYSTEM_PROMPT},
    {"role": "user", "content": f"NEW MEMORY:\n{new_content}\n\nEXISTING MEMORIES:\n{existing_text}"},
]

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automem/dedup.py` around lines 28 - 57, The dedup prompt embeds user-controlled new_content directly into DEDUP_PROMPT which creates a prompt-injection surface; extract the instructional text into a new DEDUP_SYSTEM_PROMPT constant and send the user data separately in the user message (e.g., build messages=[{"role":"system","content":DEDUP_SYSTEM_PROMPT}, {"role":"user","content":f"NEW MEMORY:\\n{new_content}\\n\\nEXISTING MEMORIES:\\n{existing_memories}"}]) so the LLM treats the rules as system instructions and new_content as data; keep the original DEDUP_PROMPT text intact when creating DEDUP_SYSTEM_PROMPT and update any call sites that reference DEDUP_PROMPT to use the new messages format and continue validating target_id values as done in memory.py.
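Spelled out, the hardened call might look like the sketch below; DEDUP_SYSTEM_PROMPT here is a stand-in for the rules currently embedded in DEDUP_PROMPT, and the JSON contract is abbreviated:

```python
from openai import OpenAI

DEDUP_SYSTEM_PROMPT = (
    "You are a deduplication gate for a memory store. Compare the NEW MEMORY to the "
    "EXISTING MEMORIES and reply with JSON containing: action (ADD, UPDATE, SUPERSEDE, "
    "or NOOP), an optional target_id, and a short reason."
)

def classify_dedup(client: OpenAI, model: str, new_content: str, existing_text: str) -> str:
    # Instructions stay in the system message; user-controlled content is data only.
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": DEDUP_SYSTEM_PROMPT},
            {"role": "user", "content": f"NEW MEMORY:\n{new_content}\n\nEXISTING MEMORIES:\n{existing_text}"},
        ],
    )
    return response.choices[0].message.content or ""
```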
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@automem/api/memory.py`:
- Around line 256-260: The LLM-supplied dedup_result["target_id"] must be
validated against the candidate IDs from the similarity search before performing
UPDATE or SUPERSEDE; in the dedup handling path (where dedup_result, target_id,
merged_content are used) check that target_id is present in the list/set of
candidate IDs returned by the similarity search (or returned by check_dedup) and
only rewrite memory_id / perform the merge if it matches; if it does not match,
discard the target_id (treat as CREATE or fallback to no-op), log a warning
including the invalid target_id and context, and ensure check_dedup (or the
caller) enforces the same guard to avoid operating on unrelated memories.
- Around line 331-346: The SUPERSEDE branch currently uses graph.query("MATCH
(m:Memory {id: $id}) DELETE m", ...) which will fail if the Memory node has
relationships; change it to use DETACH DELETE (i.e., "MATCH (m:Memory {id: $id})
DETACH DELETE m") and preserve the try/except behavior around graph.query. For
the Qdrant deletion via qdrant_cl.delete in this block, make the points_selector
construction defensive like the delete endpoint: attempt to construct
http_models.PointIdsList(points=[old_id]) and pass that, and if that
raises/doesn't apply, fall back to {"points": [old_id]} before calling
qdrant_cl.delete(collection_name=collection_name, points_selector=...) while
keeping the existing exception handling and logger.warning referencing old_id;
use get_qdrant_client, qdrant_cl, collection_name, and old_id to locate the code
to update.
- Around line 256-329: The UPDATE branch currently overwrites Qdrant payload
fields from the incoming request and redundantly enqueues an embedding while
also generating one synchronously; fix it by (1) removing the
enqueue_embedding(target_id, merged) call and (2) before calling
get_qdrant_client()/qdrant_cl.upsert(), fetch the existing point/payload for
target_id (via the qdrant client’s get/point API or equivalent), merge payload
fields so that tags, tag_prefixes, type, type_confidence, metadata and any
fields not present in the incoming request are preserved (only replace fields
that were explicitly provided), then generate_real_embedding(merged) and upsert
the merged payload and new vector; use the same symbols shown (dedup_result,
target_id, merged, graph.query, get_qdrant_client, generate_real_embedding,
qdrant_cl.upsert) to locate and update the code.
In `@automem/dedup.py`:
- Around line 94-106: Remove the unused import of Filter from
qdrant_client.models (and the trailing "# noqa: F401") in the dedup flow: the
import on the try block in dedup.py is dead code and should be deleted so only
the qdrant_client.search call, MAX_CANDIDATES, similarity_threshold, logger
warning, and return result remain; ensure no other references to Filter exist in
the file before committing.
- Around line 146-148: Guard against response.choices[0].message.content being
None before calling .strip(): check the value (e.g., content =
response.choices[0].message.content) and if it's None, log a clear warning
(including any helpful context like the model/response id) and set decision to
the default (e.g., {"decision": "ADD"}) instead of calling json.loads; only call
raw = content.strip() and decision = json.loads(raw) when content is not None.
Update the logic near the variables response, raw, and decision in dedup.py so
the json.loads call is skipped for None content and a clear log message is
emitted.
---
Nitpick comments:
In `@automem/api/memory.py`:
- Around line 241-254: In the NOOP branch where dedup_result is returned (when
dedup_result["action"] == "NOOP") avoid returning full
dedup_result["candidates"] to callers in production; change the response logic
in that block (the jsonify return) to either omit the "candidates" key or
replace each candidate's full content with a safe summary/only IDs unless a
debug flag is enabled (e.g., check a debug/config flag passed into the handler
or an environment variable like DEBUG_MODE), so production responses do not
expose stored memory content.
In `@automem/config.py`:
- Around line 102-111: The current defaults set
MEMORY_DEDUP_SIMILARITY_THRESHOLD to 0.70 which conflicts with the PR
description (≥0.85) and may cause excessive LLM calls; update the default to
"0.85" (i.e., change the env default passed into float(os.getenv(...)) for
MEMORY_DEDUP_SIMILARITY_THRESHOLD) or add a comment documenting that 0.70 is
intentionally permissive, and then run Black on this block to fix the CI
formatting warning; verify the related symbols MEMORY_DEDUP_ENABLED and
MEMORY_DEDUP_MODEL remain unchanged.
In `@automem/dedup.py`:
- Around line 28-57: The dedup prompt embeds user-controlled new_content
directly into DEDUP_PROMPT which creates a prompt-injection surface; extract the
instructional text into a new DEDUP_SYSTEM_PROMPT constant and send the user
data separately in the user message (e.g., build
messages=[{"role":"system","content":DEDUP_SYSTEM_PROMPT},
{"role":"user","content":f"NEW MEMORY:\\n{new_content}\\n\\nEXISTING
MEMORIES:\\n{existing_memories}"}]) so the LLM treats the rules as system
instructions and new_content as data; keep the original DEDUP_PROMPT text intact
when creating DEDUP_SYSTEM_PROMPT and update any call sites that reference
DEDUP_PROMPT to use the new messages format and continue validating target_id
values as done in memory.py.
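A minimal sketch of the target_id guard referenced above, assuming check_dedup exposes the candidate list alongside the LLM's decision (the field names are assumptions):

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)

def validated_target_id(dedup_result: dict[str, Any]) -> Optional[str]:
    """Return the LLM-supplied target_id only if it matches a real similarity candidate."""
    target_id = dedup_result.get("target_id")
    candidate_ids = {c.get("id") for c in dedup_result.get("candidates", [])}
    if target_id and target_id in candidate_ids:
        return target_id
    # A hallucinated or missing target means the write falls back to the plain ADD path.
    logger.warning("Dedup target_id %r not among candidates; ignoring it", target_id)
    return None
```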
- Validate target_id against candidate IDs to prevent LLM hallucination
- Remove redundant enqueue_embedding call (sync path handles it)
- Preserve existing tags/metadata on UPDATE (fetch before overwrite)
- Fix SUPERSEDE: DELETE → DETACH DELETE for nodes with relationships (see the sketch after this list)
- Use defensive Qdrant PointIdsList selector (matches delete endpoint)
- Remove unused Filter import
- Guard against None LLM response content
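A sketch of the SUPERSEDE cleanup after those fixes, with `graph`, `qdrant_cl`, and `collection_name` assumed to be the handler's existing FalkorDB and Qdrant objects:

```python
import logging

logger = logging.getLogger(__name__)

def delete_superseded(graph, qdrant_cl, collection_name: str, old_id: str) -> None:
    """Remove an outdated memory from both stores before storing its replacement."""
    try:
        # DETACH DELETE removes the node even when it still has relationships.
        graph.query("MATCH (m:Memory {id: $id}) DETACH DELETE m", {"id": old_id})
    except Exception:
        logger.warning("Failed to delete superseded memory %s from graph", old_id)
    try:
        try:
            from qdrant_client import models as http_models
            selector = http_models.PointIdsList(points=[old_id])
        except Exception:
            selector = {"points": [old_id]}  # fallback shape, matching the delete endpoint
        qdrant_cl.delete(collection_name=collection_name, points_selector=selector)
    except Exception:
        logger.warning("Failed to delete superseded vector %s from Qdrant", old_id)
```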
## Summary

- Adds evaluation results from the initial experiment round to `benchmarks/EXPERIMENT_LOG.md`
- #73 min_score threshold: neutral (needs #78 for score differentiation first)
- PR #80 enhanced recall: blocked by merge conflicts with main, needs rebase
- PR #87 write-time dedup: neutral on recall (expected — dedup is a write-path change)

## Test plan

- [x] No code changes, documentation only

Made with [Cursor](https://cursor.com)
🤖 I have created a release *beep* *boop*

---

## [0.13.0](v0.12.0...v0.13.0) (2026-03-02)

### Features

* **bench:** benchmark testing infrastructure for rapid iteration ([#97](#97)) ([80a6f93](80a6f93))
* **recall:** add min_score threshold and adaptive floor filtering ([#73](#73)) ([#101](#101)) ([8df3c08](8df3c08))
* **viewer:** add standalone graph-viewer runtime files ([5bcb6db](5bcb6db))
* **viewer:** consolidate stable core and split-ready compatibility ([#94](#94)) ([958da72](958da72))
* **viewer:** externalize visualizer with /viewer compatibility routes ([29bafcf](29bafcf))
* **viewer:** merge visualizer stable core branch ([96b27bf](96b27bf))

### Bug Fixes

* FalkorDB data not persisting across restarts ([3bbc834](3bbc834))
* FalkorDB data not persisting across restarts ([#99](#99)) ([8490d36](8490d36))
* **mcp-sse:** sync tool schemas for SSE/MCP parity ([#104](#104)) ([d99b86d](d99b86d))

### Documentation

* **bench:** add PR [#73](#73), [#80](#80), and [#87](#87) experiment results ([#103](#103)) ([8533fac](8533fac))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Looks like this was landed separately - shall I close the PR here?
@jescalan Not yet. We've got a lot of good ideas pending in the PR and Issues list, so I wanted to set up baseline benchmark scores across LoCoMo, LongMemEval, and an internal benchmark system. Then we can A/B test against PRs, or tweak defaults, to see what gets the best result.

#103 adds that test suite, and I ran preliminary benchmarks against this PR, but the scores are currently skewed due to the issues with the decay rate described in #78, and there's the start of a fix in #105 which looks promising... I'm running that in production now, for the next 5 days, to confirm it fixes the decay issue. Then I'll re-run benchmarks to get a baseline, re-run them against this and the other open PRs that affect storage / recall quality / general noise, and tweak and merge based on the results. You can track all that in EXPERIMENT_LOG.md, which will carry between the tests and get updated with each result.

So... it's a big set of changes that should significantly improve AutoMem's performance and recall quality, and also help us build out the docs with hard numbers on things like "why did we choose Voyage 1024d as the default embedding model?" -> "Here's why, here's when and why you might want to select alternatives X, Y, or Z, and what to expect." But it's a big-ish project 😅, and I want to make sure it all remains backwards compatible, especially with older MCP clients, and that any changes we make are not just "that sounds sensible," but ones we can back up with hard numbers.

Keep the PR open ✅. I will post back here with test results once the underlying issue with the decay curve is worked out. Thanks for contributing! 🧡 I'll be updating the testing docs with the new tools in the next day or two. If you want to jump in and try some A/B tests against experiments, it will be a lot faster (and more fun 🤓) after this round of updates.
Very reasonable, sounds good! Appreciate the detailed reply. Let me know if there's anything else I can help out with here 🚀 |
Summary
Adds a write-time deduplication gate that classifies incoming memories before storage using vector similarity search + LLM classification.
How it works
When a new memory is written to the `/memory` endpoint, the gate generates an embedding and searches Qdrant for similar existing memories (cosine similarity ≥ 0.85).

Files changed
- `automem/dedup.py` — Core dedup logic: similarity search, LLM classification, action execution
- `automem/api/memory.py` — Integration into the store endpoint (pre-write gate)
- `automem/config.py` — Config flags: `MEMORY_DEDUP_ENABLED`, `MEMORY_DEDUP_MODEL`, similarity threshold

Configuration
Testing
Tested in production with verified ADD, UPDATE, SUPERSEDE, and NOOP behaviors.
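For anyone trying this locally, a rough smoke test under the assumption of a local instance with a JSON `content` field on the store endpoint (the exact route shape, port, and any auth header are assumptions about the deployment):

```python
import requests

BASE = "http://localhost:8001"  # assumed local AutoMem instance

# Store the same fact twice; with MEMORY_DEDUP_ENABLED=true the second write
# should come back as NOOP (or UPDATE if the gate decides to merge).
payload = {"content": "Jane prefers dark roast coffee in the mornings"}
first = requests.post(f"{BASE}/memory", json=payload, timeout=30)
second = requests.post(f"{BASE}/memory", json=payload, timeout=30)

print(first.status_code, first.json())
print(second.status_code, second.json())  # expect a dedup action rather than a fresh ADD
```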