fix(hermes): inject L3 persona into system_prompt_block #206
Open
ferminquant wants to merge 1 commit into
The Hermes provider's system_prompt_block() returned a static string
describing the memory layers, ignoring the L3 persona that the Gateway
had already generated and surfaced via the core's auto-recall
pipeline. L3 compute was paid but the agent never saw the result, so
every conversation started from cold on the persona dimension.
This change wires the existing data path end-to-end:
* The Gateway's /recall response now includes recalledL3_persona,
populated from the auto-recall pipeline (already populated on the
core side, just not exposed over the wire).
* The Hermes provider's system_prompt_block() fetches it via the
existing /recall client, caches for 60s keyed on the provider
instance (which Hermes re-creates per session), and appends it
to the system prompt as a '## User Persona' block when present.
* On the very first call, a failed /recall leaves the cache empty
and the next call retries — so a Gateway that is down at startup
does not poison the cache. Once a fetch succeeds, subsequent
failures within the TTL fall back to the cached value rather
than retrying on every turn.
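
The cache behaviour described in the bullets above can be sketched roughly as follows. This is an illustration, not the code in this PR: `PersonaCache`, its `fetch` callable, and the injectable `clock` are hypothetical names, and advancing the timestamp after a warm-cache failure is my assumption about how "do not retry on every turn" is achieved.

```python
import time
from typing import Callable, Optional


class PersonaCache:
    """Illustrative TTL cache with asymmetric failure handling (hypothetical names)."""

    def __init__(
        self,
        fetch: Callable[[], Optional[str]],
        ttl_secs: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_secs
        self._clock = clock
        self._value: Optional[str] = None
        self._ts: Optional[float] = None  # None means "no successful fetch yet"

    def get(self) -> Optional[str]:
        now = self._clock()
        if self._ts is not None and now - self._ts < self._ttl:
            return self._value  # still fresh: no network call
        try:
            value = self._fetch()
        except Exception:
            if self._ts is None:
                # Cold cache: leave ts unset so the very next call retries.
                return None
            # Warm cache (assumption): serve the stale value and wait one TTL
            # before trying the Gateway again.
            self._ts = now
            return self._value
        self._value = value
        self._ts = now
        return value
```

Injecting the clock keeps the sketch testable without sleeping; a real provider would more likely read the time directly.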
No new external dependencies. No new public API surface beyond the
one new optional field on the Gateway /recall response. Older
Gateways that predate the field are tolerated: clients read the
field with .get() and treat its absence as 'no persona available'.
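
A minimal sketch of the tolerant read described above, assuming the /recall response has already been decoded into a dict. `extract_persona` is a hypothetical helper name, and treating a blank string the same as a missing field is an assumption on my part.

```python
from typing import Any, Mapping, Optional


def extract_persona(recall_response: Mapping[str, Any]) -> Optional[str]:
    """Return the persona text, or None when the field is absent, null, or blank."""
    # .get() returns None for older Gateways that never send the field.
    persona = recall_response.get("recalledL3_persona")
    if not isinstance(persona, str) or not persona.strip():
        return None
    return persona
```

With this shape, a response from an older Gateway such as `{"memory_count": 3}` simply yields `None`, and the caller falls back to the static prompt block.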
Refs TencentCloud#205 (partial — fixes the main bullet only; sub-bullets 1 and 2 are tracked separately and out of scope for this PR)
Signed-off-by: Fermin Quant <ferminquant@users.noreply.github.com>
Collaborator
Thanks for the contribution! We'll review the L3 persona injection approach.
Description
The Hermes provider's system_prompt_block() returned a static string and ignored the L3 persona that the Gateway had already generated and exposed via the core's auto-recall pipeline. L3 compute was paid but the agent never saw the result.
This wires the existing data path end-to-end: the Gateway's /recall response now carries recalledL3_persona, and the Hermes provider fetches it (with a short TTL cache) and appends it to the system prompt as a '## User Persona' block when present.
Refs #205 (partial: fixes the main bullet only; sub-bullets 1 and 2 are tracked separately and intentionally out of scope for this PR)
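
For reviewers who want the shape of the injection without opening the diff, here is a rough sketch. `build_system_prompt_block` is a hypothetical stand-in for the real method, and the exact spacing around the heading is an assumption.

```python
from typing import Optional


def build_system_prompt_block(static_block: str, persona: Optional[str]) -> str:
    """Append a '## User Persona' section only when a non-empty persona exists."""
    if not persona or not persona.strip():
        # No persona available: keep the original static description unchanged.
        return static_block
    return f"{static_block}\n\n## User Persona\n{persona.strip()}"
```

When no persona is available the output is identical to the pre-fix static block, so the change is invisible to sessions without L3 data.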
Change Type
Root Cause
Two missing links in the L3 delivery chain:
* src/gateway/server.ts:handleRecall built the RecallResponse from appendSystemContext and the L1 count, but never copied result.recalledL3Persona (which the core had already populated in src/core/hooks/auto-recall.ts:238) into the wire response. The field existed in the core's RecallResult type and was being thrown away at the HTTP boundary.
* hermes-plugin/memory/memory_tencentdb/__init__.py:system_prompt_block returned a hard-coded string describing the four layers. Even if the Gateway had returned the persona, the provider would have ignored it. Verified by reading the function and by adding the negative-control test (TestNegativeControl::test_pre_fix_static_block_does_not_contain_persona), which would have failed on main before this fix.
Changes
* src/gateway/types.ts: add optional recalledL3_persona?: string | null to RecallResponse. Documented as undefined-tolerant so older clients keep working.
* src/gateway/server.ts: populate the new field in handleRecall from result.recalledL3Persona ?? null. One-line change.
* hermes-plugin/memory/memory_tencentdb/__init__.py: system_prompt_block() now fetches /recall (empty query, user_id-keyed), caches the result for 60s, and appends a '## User Persona' section when non-empty. Cache key is implicit on the provider instance, which Hermes re-creates per session, so a session switch cannot serve stale persona to the wrong scope.
  * __init__ gains self._persona_cache: Dict[str, Any].
  * New helpers _get_cached_persona() and _fetch_persona(). Both swallow exceptions so that system_prompt_block() never raises on a persona failure.
* hermes-plugin/memory/memory_tencentdb/tests/test_l3_persona_injection.py: 9 new tests, including the negative control for pre-fix main.
Cache Semantics
* Cold cache, fetch fails: ts is not advanced, so the next call retries.
* Warm cache, fetch fails: the stale cached value is served instead of retrying on every turn.
The asymmetry on failure (don't advance ts on the first failure, but do fall back to a stale value on later failures) is intentional: a fresh-empty cache with a sick Gateway should not be stuck for 60 seconds before retrying; a warm cache with a sick Gateway should not retry on every turn.
Self-test Checklist
Pre-existing failures: test_gateway_shutdown_leak.py fails on main with 'NoneType' object has no attribute 'client' from __init__.py:785 (now line 793 after this PR). Confirmed by stashing this branch's changes and running the test on main: same failure, same line. Not in scope for #205.
Additional Notes
On the field name recalledL3_persona: I deliberately used snake_case (Python-style) on the wire, matching the existing Python client's reading style and the rest of the RecallResponse keys (memory_count, session_key). The core's internal field is camelCase (recalledL3Persona); server.ts is the boundary that translates.
On query="" in _fetch_persona: The persona is keyed on user_id and the L3 pipeline runs on its own schedule, so the query field is irrelevant for the persona portion of the response. The Gateway's /recall handler accepts an empty query and will return whatever persona content the L3 pipeline has produced. If you'd rather have a no-op-only endpoint, the cleanest follow-up is a dedicated GET /persona route; happy to do that as a follow-up PR if you want, but it's not required for this fix.
On the 60s TTL: Arbitrary; the L3 pipeline regenerates every 50 new L1 memories by default, so 60s is a comfortable lower bound. Configurable via self._PERSONA_CACHE_TTL_SECS if you want a different default. It is left as a class attribute rather than a config-schema change to keep this PR narrow.
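
As a sketch of what "configurable via a class attribute" means in practice: the class names below are hypothetical stand-ins for the real provider, and only the attribute name `_PERSONA_CACHE_TTL_SECS` comes from this PR.

```python
class ProviderSketch:
    """Stand-in for the Hermes provider; only the TTL attribute is modelled."""

    _PERSONA_CACHE_TTL_SECS = 60.0  # default named in this PR

    def persona_ttl(self) -> float:
        # Reading through self lets subclasses or instances override the default.
        return self._PERSONA_CACHE_TTL_SECS


class ShortTtlProvider(ProviderSketch):
    """Hypothetical subclass that refreshes the persona more aggressively."""

    _PERSONA_CACHE_TTL_SECS = 5.0
```

Because the value is read through `self`, a deployment can also set the attribute on a single instance without touching the config schema.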