feat: write-time dedup gate (ADD/UPDATE/SUPERSEDE/NOOP)#87

Open
jescalan wants to merge 3 commits into verygoodplugins:main from jescalan:feat/write-time-dedup

Conversation

@jescalan
Contributor

Summary

Adds a write-time deduplication gate that classifies incoming memories before storage using vector similarity search + LLM classification.

How it works

  1. When a new memory arrives at the /memory endpoint, the gate generates an embedding and searches Qdrant for similar existing memories (cosine similarity ≥ 0.85)
  2. If candidates are found, an LLM classifies the action:
    • ADD — genuinely new information, write normally
    • UPDATE — new memory extends/refines an existing one → merge content, update in place
    • SUPERSEDE — new memory replaces outdated info → delete old, write new
    • NOOP — near-duplicate of existing memory → skip write entirely
  3. If no candidates are found, the gate defaults to ADD
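The gate flow above can be sketched as follows. This is an illustrative outline, not the actual automem implementation: `embed`, `search_similar`, and `classify` are hypothetical stand-ins for the real embedding, Qdrant, and LLM calls.

```python
from typing import Callable

SIMILARITY_THRESHOLD = 0.85  # cosine cutoff from the PR description

def dedup_gate(
    new_content: str,
    embed: Callable[[str], list],
    search_similar: Callable[[list, float], list],
    classify: Callable[[str, list], dict],
) -> dict:
    """Classify an incoming memory as ADD/UPDATE/SUPERSEDE/NOOP."""
    vector = embed(new_content)
    candidates = search_similar(vector, SIMILARITY_THRESHOLD)
    if not candidates:
        # No similar memories exist: default to ADD, no LLM call needed.
        return {"action": "ADD", "candidates": []}
    # Candidates found: let the LLM decide the action.
    decision = classify(new_content, candidates)
    decision["candidates"] = candidates
    return decision
```

Only the candidate path pays for an LLM call; the empty-candidate path short-circuits to ADD.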

Files changed

  • automem/dedup.py — Core dedup logic: similarity search, LLM classification, action execution
  • automem/api/memory.py — Integration into the store endpoint (pre-write gate)
  • automem/config.py — Config flags: MEMORY_DEDUP_ENABLED, MEMORY_DEDUP_MODEL, similarity threshold

Configuration

MEMORY_DEDUP_ENABLED=true
MEMORY_DEDUP_MODEL=gpt-4o-mini  # or any OpenAI-compatible model
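A sketch of how these flags might be read in automem/config.py (the actual file may differ; the parsing pattern and defaults here follow the descriptions in this PR):

```python
import os

# Feature flag: disabled by default, enabled with MEMORY_DEDUP_ENABLED=true
MEMORY_DEDUP_ENABLED = os.getenv("MEMORY_DEDUP_ENABLED", "false").lower() in ("1", "true", "yes")

# Any OpenAI-compatible model name
MEMORY_DEDUP_MODEL = os.getenv("MEMORY_DEDUP_MODEL", "gpt-4o-mini")

# Cosine-similarity cutoff for candidate retrieval
MEMORY_DEDUP_SIMILARITY_THRESHOLD = float(os.getenv("MEMORY_DEDUP_SIMILARITY_THRESHOLD", "0.70"))
```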

Testing

Tested in production with verified ADD, UPDATE, SUPERSEDE, and NOOP behaviors.

Add LLM-based deduplication at write time, inspired by Helixir's
decision engine. Before storing a new memory, the system:

1. Generates an embedding for the new content
2. Searches Qdrant for semantically similar existing memories (>0.70)
3. If candidates found, asks a fast LLM (gpt-4o-mini) to classify:
   - ADD: genuinely new, store normally
   - UPDATE: merge into existing memory (updates both FalkorDB + Qdrant)
   - SUPERSEDE: delete outdated memory, store new one
   - NOOP: already known, skip entirely

Disabled by default. Enable with MEMORY_DEDUP_ENABLED=true.
Configurable model (MEMORY_DEDUP_MODEL) and similarity threshold
(MEMORY_DEDUP_SIMILARITY_THRESHOLD).

The gate adds ~1-2s per write (one embedding + one LLM call) but
eliminates the need for post-hoc dedup cleanup passes.
@coderabbitai
Contributor

coderabbitai Bot commented Feb 18, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added intelligent memory deduplication that detects semantically similar memories before storing new content
    • System automatically decides to add, update, supersede, or skip storage based on similarity analysis
  • Configuration

    • New settings to enable/disable deduplication
    • Configurable deduplication model and similarity threshold for fine-tuning behavior

Walkthrough

This change introduces a write-time deduplication feature for memory storage. It adds configuration options to control the feature, implements a dedup module that uses embeddings and LLM-based classification to identify and handle semantically similar memories, and integrates dedup logic into the memory API to process results.

Changes

  • Deduplication Configuration — automem/config.py: adds three configuration constants: MEMORY_DEDUP_ENABLED (bool, default false), MEMORY_DEDUP_MODEL (string, default "gpt-4o-mini"), and MEMORY_DEDUP_SIMILARITY_THRESHOLD (float, default 0.70).
  • Deduplication Engine — automem/dedup.py: new module implementing write-time dedup logic. Generates embeddings, searches Qdrant for similar memories, formats candidates, and queries OpenAI to classify the action (ADD/UPDATE/SUPERSEDE/NOOP). Includes error handling with fallback to ADD on failures.
  • Memory API Integration — automem/api/memory.py: integrates the dedup gate into the memory write path. Handles NOOP (return early with skipped status), UPDATE (merge and re-embed content), SUPERSEDE (delete old memory), and ADD (continue the normal store path). The response includes the dedup action and superseded ID when applicable.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Memory API
    participant Dedup Module
    participant Embeddings
    participant Qdrant
    participant OpenAI
    participant FalkorDB

    Client->>Memory API: Write memory
    Memory API->>Dedup Module: check_dedup(new_content, ...)
    
    Dedup Module->>Embeddings: Generate embedding for new content
    Embeddings-->>Dedup Module: embedding
    
    Dedup Module->>Qdrant: Search for similar memories
    Qdrant-->>Dedup Module: candidate memories
    
    Dedup Module->>OpenAI: Classify action (ADD/UPDATE/SUPERSEDE/NOOP)
    OpenAI-->>Dedup Module: action + optional target_id
    
    alt NOOP
        Dedup Module-->>Memory API: action=NOOP, reason
        Memory API-->>Client: skipped
    else UPDATE
        Dedup Module-->>Memory API: action=UPDATE, merged_content, target_id
        Memory API->>FalkorDB: Update existing memory
        Memory API->>Qdrant: Re-embed and update vector
        Memory API-->>Client: updated
    else SUPERSEDE
        Dedup Module-->>Memory API: action=SUPERSEDE, target_id
        Memory API->>FalkorDB: Delete old memory
        Memory API->>Qdrant: Remove old vector
        Memory API->>Memory API: Continue ADD path
        Memory API->>FalkorDB: Store new memory
        Memory API->>Qdrant: Add new vector
        Memory API-->>Client: added (superseded_id)
    else ADD
        Dedup Module-->>Memory API: action=ADD
        Memory API->>FalkorDB: Store new memory
        Memory API->>Qdrant: Add new vector
        Memory API-->>Client: added
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 33.33%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed: the title clearly and specifically describes the main change: a write-time deduplication gate with the four action types (ADD/UPDATE/SUPERSEDE/NOOP) that are central to the changeset.
  • Description check — ✅ Passed: the description comprehensively explains the dedup gate feature, how it works, which files are changed, configuration options, and testing status.


Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (3)
automem/config.py (1)

102-111: Configuration additions look correct; note threshold discrepancy with PR description.

The boolean coercion for MEMORY_DEDUP_ENABLED follows the established pattern in this file. The default of "false" correctly evaluates to False.

One discrepancy to be aware of: the PR description states a threshold of ≥ 0.85, but the code defaults to 0.70. This is a very permissive cosine-similarity cutoff — many loosely related memories will become candidates, triggering an LLM call on each write. If that's intentional (letting the LLM decide), document it; otherwise, consider raising the default closer to the 0.85 mentioned in the PR description to reduce unnecessary LLM invocations and latency.

Also, the CI pipeline reports a Black formatting warning on this block. Please run Black to fix the formatting.
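To see why 0.70 is a permissive cutoff relative to 0.85, compare cosine similarities of a near-duplicate pair versus a loosely related pair. The vectors below are toy two-dimensional values, not real embeddings:

```python
import math

def cosine(a, b):
    # Standard cosine similarity: dot product over the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

near_duplicate = cosine([1.0, 0.1], [1.0, 0.15])   # nearly identical direction
loosely_related = cosine([1.0, 0.8], [0.2, 1.0])   # same general region only

assert near_duplicate > 0.85          # passes both thresholds
assert 0.70 < loosely_related < 0.85  # candidate at 0.70, excluded at 0.85
```

At 0.70, loosely related pairs like the second one become candidates and trigger an LLM call; at 0.85 they are filtered out before the LLM is involved.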

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@automem/config.py` around lines 102 - 111, The current defaults set
MEMORY_DEDUP_SIMILARITY_THRESHOLD to 0.70 which conflicts with the PR
description (≥0.85) and may cause excessive LLM calls; update the default to
"0.85" (i.e., change the env default passed into float(os.getenv(...)) for
MEMORY_DEDUP_SIMILARITY_THRESHOLD) or add a comment documenting that 0.70 is
intentionally permissive, and then run Black on this block to fix the CI
formatting warning; verify the related symbols MEMORY_DEDUP_ENABLED and
MEMORY_DEDUP_MODEL remain unchanged.
automem/api/memory.py (1)

241-254: Consider omitting candidates from the NOOP response in production.

The candidates list includes full content and IDs of existing memories. This is helpful for debugging but could expose sensitive stored content to API callers. Consider gating this behind a debug flag or stripping content from candidates in the response.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@automem/api/memory.py` around lines 241 - 254, In the NOOP branch where
dedup_result is returned (when dedup_result["action"] == "NOOP") avoid returning
full dedup_result["candidates"] to callers in production; change the response
logic in that block (the jsonify return) to either omit the "candidates" key or
replace each candidate's full content with a safe summary/only IDs unless a
debug flag is enabled (e.g., check a debug/config flag passed into the handler
or an environment variable like DEBUG_MODE), so production responses do not
expose stored memory content.
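One possible shape for that redaction, sketched with illustrative names (the real handler and debug flag may differ): keep only IDs and scores unless a debug flag is set.

```python
def redact_candidates(candidates: list, debug: bool = False) -> list:
    """Strip stored memory content from dedup candidates before returning them to API callers."""
    if debug:
        # Debug mode: return candidates unmodified, including content.
        return candidates
    # Production: expose only IDs and similarity scores, never stored content.
    return [{"id": c["id"], "score": c.get("score")} for c in candidates]
```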
automem/dedup.py (1)

28-57: Prompt injection surface: new_content is user-controlled input interpolated into the LLM prompt.

A malicious authenticated user could craft memory content to manipulate the LLM into returning a SUPERSEDE/UPDATE action targeting a specific memory. This is largely mitigated if target_id is validated against candidate IDs (as suggested in the memory.py review). As an additional hardening measure, consider using a system message to separate instructions from user data:

messages=[
    {"role": "system", "content": DEDUP_SYSTEM_PROMPT},
    {"role": "user", "content": f"NEW MEMORY:\n{new_content}\n\nEXISTING MEMORIES:\n{existing_text}"},
]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@automem/dedup.py` around lines 28 - 57, The dedup prompt embeds
user-controlled new_content directly into DEDUP_PROMPT which creates a
prompt-injection surface; extract the instructional text into a new
DEDUP_SYSTEM_PROMPT constant and send the user data separately in the user
message (e.g., build messages=[{"role":"system","content":DEDUP_SYSTEM_PROMPT},
{"role":"user","content":f"NEW MEMORY:\\n{new_content}\\n\\nEXISTING
MEMORIES:\\n{existing_memories}"}]) so the LLM treats the rules as system
instructions and new_content as data; keep the original DEDUP_PROMPT text intact
when creating DEDUP_SYSTEM_PROMPT and update any call sites that reference
DEDUP_PROMPT to use the new messages format and continue validating target_id
values as done in memory.py.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@automem/api/memory.py`:
- Around line 256-260: The LLM-supplied dedup_result["target_id"] must be
validated against the candidate IDs from the similarity search before performing
UPDATE or SUPERSEDE; in the dedup handling path (where dedup_result, target_id,
merged_content are used) check that target_id is present in the list/set of
candidate IDs returned by the similarity search (or returned by check_dedup) and
only rewrite memory_id / perform the merge if it matches; if it does not match,
discard the target_id (treat as CREATE or fallback to no-op), log a warning
including the invalid target_id and context, and ensure check_dedup (or the
caller) enforces the same guard to avoid operating on unrelated memories.
- Around line 331-346: The SUPERSEDE branch currently uses graph.query("MATCH
(m:Memory {id: $id}) DELETE m", ...) which will fail if the Memory node has
relationships; change it to use DETACH DELETE (i.e., "MATCH (m:Memory {id: $id})
DETACH DELETE m") and preserve the try/except behavior around graph.query. For
the Qdrant deletion via qdrant_cl.delete in this block, make the points_selector
construction defensive like the delete endpoint: attempt to construct
http_models.PointIdsList(points=[old_id]) and pass that, and if that
raises/doesn't apply, fall back to {"points": [old_id]} before calling
qdrant_cl.delete(collection_name=collection_name, points_selector=...) while
keeping the existing exception handling and logger.warning referencing old_id;
use get_qdrant_client, qdrant_cl, collection_name, and old_id to locate the code
to update.
- Around line 256-329: The UPDATE branch currently overwrites Qdrant payload
fields from the incoming request and redundantly enqueues an embedding while
also generating one synchronously; fix it by (1) removing the
enqueue_embedding(target_id, merged) call and (2) before calling
get_qdrant_client()/qdrant_cl.upsert(), fetch the existing point/payload for
target_id (via the qdrant client’s get/point API or equivalent), merge payload
fields so that tags, tag_prefixes, type, type_confidence, metadata and any
fields not present in the incoming request are preserved (only replace fields
that were explicitly provided), then generate_real_embedding(merged) and upsert
the merged payload and new vector; use the same symbols shown (dedup_result,
target_id, merged, graph.query, get_qdrant_client, generate_real_embedding,
qdrant_cl.upsert) to locate and update the code.

In `@automem/dedup.py`:
- Around line 94-106: Remove the unused import of Filter from
qdrant_client.models (and the trailing "# noqa: F401") in the dedup flow: the
import on the try block in dedup.py is dead code and should be deleted so only
the qdrant_client.search call, MAX_CANDIDATES, similarity_threshold, logger
warning, and return result remain; ensure no other references to Filter exist in
the file before committing.
- Around line 146-148: Guard against response.choices[0].message.content being
None before calling .strip(): check the value (e.g., content =
response.choices[0].message.content) and if it's None, log a clear warning
(including any helpful context like the model/response id) and set decision to
the default (e.g., {"decision": "ADD"}) instead of calling json.loads; only call
raw = content.strip() and decision = json.loads(raw) when content is not None.
Update the logic near the variables response, raw, and decision in dedup.py so
the json.loads call is skipped for None content and a clear log message is
emitted.
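The None-content guard above could look like this. It is a sketch under the assumption that the parsed decision uses an "action" key; the real dedup.py may name things differently:

```python
import json
import logging
from typing import Optional

logger = logging.getLogger(__name__)

def parse_decision(content: Optional[str]) -> dict:
    """Parse the LLM's JSON decision, falling back to ADD on empty or malformed output."""
    if content is None:
        logger.warning("LLM returned no content; defaulting to ADD")
        return {"action": "ADD"}
    try:
        return json.loads(content.strip())
    except json.JSONDecodeError:
        logger.warning("Unparseable LLM response; defaulting to ADD")
        return {"action": "ADD"}
```

Both failure modes degrade to ADD, matching the module's existing fallback behavior: a bad classification never blocks a write.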


- Validate target_id against candidate IDs to prevent LLM hallucination
- Remove redundant enqueue_embedding call (sync path handles it)
- Preserve existing tags/metadata on UPDATE (fetch before overwrite)
- Fix SUPERSEDE: DELETE → DETACH DELETE for nodes with relationships
- Use defensive Qdrant PointIdsList selector (matches delete endpoint)
- Remove unused Filter import
- Guard against None LLM response content
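The first fix in the list above — validating the LLM-supplied target against the candidate set — could be sketched like this (illustrative names; the actual memory.py wiring may differ):

```python
import logging

logger = logging.getLogger(__name__)

def validate_target(decision: dict, candidates: list) -> dict:
    """Reject UPDATE/SUPERSEDE decisions whose target_id is not among the retrieved candidates."""
    candidate_ids = {c["id"] for c in candidates}
    target = decision.get("target_id")
    if decision["action"] in ("UPDATE", "SUPERSEDE") and target not in candidate_ids:
        # The LLM hallucinated or was manipulated into naming an unrelated
        # memory; fall back to ADD rather than mutating it.
        logger.warning("LLM returned unknown target_id %r; falling back to ADD", target)
        return {"action": "ADD"}
    return decision
```

Because the gate only ever mutates memories that actually surfaced in the similarity search, this also blunts the prompt-injection concern raised in the dedup.py review.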
jack-arturo added a commit that referenced this pull request Mar 2, 2026
Records evaluation outcomes from initial experiment round:
- #73 min_score threshold: neutral (needs #78 for score differentiation)
- PR #80 enhanced recall: blocked by merge conflicts, needs rebase
- PR #87 write-time dedup: neutral on recall as expected

Made-with: Cursor
jack-arturo added a commit that referenced this pull request Mar 2, 2026
## Summary

- Adds evaluation results from the initial experiment round to
`benchmarks/EXPERIMENT_LOG.md`
- #73 min_score threshold: neutral (needs #78 for score differentiation
first)
- PR #80 enhanced recall: blocked by merge conflicts with main, needs
rebase
- PR #87 write-time dedup: neutral on recall (expected — dedup is a
write-path change)

## Test plan

- [x] No code changes, documentation only

Made with [Cursor](https://cursor.com)
jack-arturo added a commit that referenced this pull request Mar 2, 2026
🤖 I have created a release *beep* *boop*
---


## [0.13.0](v0.12.0...v0.13.0) (2026-03-02)


### Features

* **bench:** benchmark testing infrastructure for rapid iteration ([#97](#97)) ([80a6f93](80a6f93))
* **recall:** add min_score threshold and adaptive floor filtering ([#73](#73)) ([#101](#101)) ([8df3c08](8df3c08))
* **viewer:** add standalone graph-viewer runtime files ([5bcb6db](5bcb6db))
* **viewer:** consolidate stable core and split-ready compatibility ([#94](#94)) ([958da72](958da72))
* **viewer:** externalize visualizer with /viewer compatibility routes ([29bafcf](29bafcf))
* **viewer:** merge visualizer stable core branch ([96b27bf](96b27bf))


### Bug Fixes

* FalkorDB data not persisting across restarts ([3bbc834](3bbc834))
* FalkorDB data not persisting across restarts ([#99](#99)) ([8490d36](8490d36))
* **mcp-sse:** sync tool schemas for SSE/MCP parity ([#104](#104)) ([d99b86d](d99b86d))


### Documentation

* **bench:** add PR [#73](#73), [#80](#80), and [#87](#87) experiment results ([#103](#103)) ([8533fac](8533fac))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@jescalan
Contributor Author

jescalan commented Mar 2, 2026

Looks like this was landed separately - shall I close the PR here?

@jack-arturo
Member

@jescalan Not yet. We've got a lot of good ideas pending in the PR and Issues list, so I wanted to set up a baseline benchmark score across LoCoMo, LongMemEval, and an internal benchmark system. Then we can A/B test against PRs, or tweak defaults, to see what gets the best result.

#103 adds that test suite, and I ran preliminary benchmarks against this PR, but the scores are currently skewed due to the issues with the decay rate described in #78, and there's the start of a fix in #105 which looks promising... I'm running that in production now, for the next 5 days, to confirm it fixes the decay issue.

Then will re-run benchmarks to get a baseline, re-run them against this and the other open PRs that affect storage / recall quality / general noise, and tweak and merge based on the results.

You can track all that in EXPERIMENT_LOG.md, which will carry between the tests and get updated with each result.

So... it's a big set of changes that should significantly improve AutoMem's performance and recall quality, and also help us build out the docs with hard numbers on things like: "why did we choose Voyage 1024d as the default embedding model?" -> "Here's why, and here are when and why you might want to select alternatives X, Y, or Z, and what to expect."

But it's a big-ish project 😅, and I want to make sure it all remains backwards compatible, especially with older MCP clients, and any changes we make are not just "that sounds sensible," but we can back them up with hard numbers.

Keep the PR open ✅. I will post back here with test results once the underlying issue with the decay curve is worked out.

Thanks for contributing! 🧡 I'll be updating the testing docs with the new tools in the next day or two. If you want to jump in and try some A/B tests against experiments, it will be a lot faster (and more fun 🤓), after this round of updates.

@jescalan
Contributor Author

jescalan commented Mar 3, 2026

Very reasonable, sounds good! Appreciate the detailed reply. Let me know if there's anything else I can help out with here 🚀
