fix(hermes): mirror builtin memory writes through on_memory_write#207
Open
ferminquant wants to merge 1 commit into
Open
fix(hermes): mirror builtin memory writes through on_memory_write#207ferminquant wants to merge 1 commit into
ferminquant wants to merge 1 commit into
Conversation
The provider's on_memory_write hook was a no-op with a TODO. The intent — feed Hermes's builtin MEMORY.md / USER.md writes into the L1 index so dedup and L3 persona building can see them — was scaffolded but never wired. The two memory systems ran in parallel with no cross-pollination: a user preference the builtin layer recorded was invisible to the L1 store, so a later conversation about the same preference would not be deduped and would not show up in the L3 persona the agent sees. This change routes the builtin write through client.capture() as a synthetic turn. The L1 extractor sees the content, the dedup layer can match against it, and L3 picks it up on the next persona refresh. The (memory_write action=... target=...) prefix lets a future LLM-side filter recognize and skip these entries during scene extraction if the natural-turn framing becomes a problem. The hook is fire-and-forget: failures are swallowed at the debug log level. The builtin write already succeeded in Hermes's own store; a mirror failure is a soft-loss of cross-system visibility, not a write failure, and the next sync_turn() in the same session will eventually re-surface the same content through normal conversation capture. Refs TencentCloud#205 (sub-bullet 1 only; sub-bullet 2 is out of scope) Signed-off-by: Fermin Quant <ferminquant@users.noreply.github.com>
Collaborator
|
Thanks for the PR! We'll take a look at the on_memory_write hook implementation. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
The provider's
on_memory_writehook was a no-op with a TODO. The intent — feed Hermes's builtinMEMORY.md/USER.mdwrites into the L1 index so dedup and L3 persona building can see them — was scaffolded but never wired. The two memory systems ran in parallel with no cross-pollination.This routes the builtin write through
client.capture()as a synthetic turn. The L1 extractor sees the content, the dedup layer can match against it, and L3 picks it up on the next persona refresh.Refs #205 (sub-bullet 1 only; sub-bullet 2 is out of scope)
Change Type
Root Cause
plugin.yamladvertises two hooks:The provider's
on_session_endis implemented (it callsclient.end_session()to flush the L1/L2/L3 pipeline on session end). The provider'son_memory_writewas:The TODO comment is the giveaway — unfinished work, not a deliberate "do nothing." The hook was scaffolded and abandoned. Effect:
USER.mdis invisible to the TencentDB L1 index.So paying the L1/L2/L3 compute while the built-in memory layer is also writing means the two systems run in parallel with no cross-pollination — worst of both worlds. Same issue reporter flagged this as sub-bullet 1 of #205.
Changes
hermes-plugin/memory/memory_tencentdb/__init__.py— replace thepassbody with a fire-and-forget call toclient.capture(). The write is encoded as a synthetic turn with a(memory_write action=… target=…)prefix onuser_content, which gives the L1 extractor enough signal to treat it as a memory write and a future LLM-side filter a marker to recognize and skip these entries during scene extraction.hermes-plugin/memory/memory_tencentdb/tests/test_on_memory_write_mirror.py— 5 new tests:test_writes_are_forwarded_to_capture— happy path: action, target, and content all reachclient.capture()in the right shape, with session/user context threaded throughtest_no_capture_when_gateway_unavailable— Gateway down → no call, no exceptiontest_no_capture_when_client_is_none— supervisor never started → no call, no exceptiontest_capture_failure_is_swallowed— capture raises → caller never sees the exceptiontest_pre_fix_hook_did_nothing— negative control: the test would have failed onmainbecause apassbody never callsclient.capture()No new public API surface. No client-side change (the existing
client.capture()is the right primitive). No gateway-side change. No new dependencies.Defensive shape
The hook is fire-and-forget for two reasons:
MEMORY.md/USER.mdby the time the hook fires. We are mirroring, not replicating. A mirror failure is a soft-loss of cross-system visibility, not a write failure.on_memory_writeraises, Hermes's host wiring may surface an error to the user, or retry, or roll back the builtin write. None of those are right — the user only ever asked Hermes to remember, and Hermes did remember.The fallback is automatic: the same content will re-surface through normal
sync_turn()capture the next time it comes up in conversation, and the L1 extractor will dedup or extend as appropriate.Self-test Checklist
test_gateway_shutdown_leak.pywas not run in this branch — pre-existing failures onmainare unrelated to this hook (they concern supervisor teardown, not the L1 mirror path). Same scope discipline as #206.Additional Notes
On the
(memory_write …)prefix: The L1 extractor runs on the natural-turn framing by default. A synthetic entry with no marker could pollute the L1 store with what looks like a real conversation. The prefix gives downstream code a stable string to grep on if the natural-turn framing ever needs to be filtered out — e.g. if the L2 scene extractor starts grouping synthetic turns into scene blocks in unwanted ways. None of that filtering is implemented today; the marker is just future-proofing.On skipping the L3 inference cost: This PR does not call
/recallfromon_memory_write. That is intentional. The L1 dedup / extraction pipeline runs on a schedule (pipeline.everyNConversations, default every 5 turns), and L3 runs every 50 new L1 memories. The write goes into L1 immediately on capture; L3 picks it up on its own schedule. Calling/recallhere would burn an LLM call per write with no benefit, since L3 has its own trigger.On the test pattern: The test file mirrors the sys.path injection used by
test_l3_persona_injection.pyandtest_memory_tencentdb_recovery.py— it works under a hermes-agent checkout and skips cleanly in a plugin-only repo. Five focused tests, all mock-based, no real network or process spawning.