
fix(hermes): inject L3 persona into system_prompt_block#206

Open: ferminquant wants to merge 1 commit into TencentCloud:main from ferminquant:fix/205-hermes-l3-persona-system-prompt

ferminquant commented Jun 13, 2026


Description

The Hermes provider's system_prompt_block() returned a static string and ignored the L3 persona that the Gateway had already generated and exposed via the core's auto-recall pipeline. The L3 compute was paid for, but the agent never saw the result.

This wires the existing data path end-to-end: the Gateway's /recall response now carries recalledL3_persona, and the Hermes provider fetches it (with a short TTL cache) and appends it to the system prompt as a ## User Persona block when present.
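For reviewers, a minimal sketch of the prompt composition described above. The helper name compose_system_prompt, the placeholder static text, and the blank-line separator are illustrative assumptions, not the provider's actual code; only the `## User Persona` heading and the "absent or empty means no persona" rule come from this PR.

```python
from typing import Optional

# Placeholder for the hard-coded text the provider returned before this fix.
STATIC_BLOCK = "<static description of the four memory layers>"


def compose_system_prompt(static_block: str, persona: Optional[str]) -> str:
    """Return the static block, plus a '## User Persona' section if a persona exists."""
    if not persona:
        # None (field absent or null on the wire) and "" both mean "no persona".
        return static_block
    # The blank-line separator between the two parts is an illustrative choice.
    return f"{static_block}\n\n## User Persona\n{persona}"
```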

Refs #205 (partial — fixes the main bullet only; sub-bullets 1 and 2 are tracked separately and intentionally out of scope for this PR)

Change Type

  • Bug fix
  • New feature
  • Documentation update
  • Code optimization

Root Cause

Two missing links in the L3 delivery chain:

  1. Gateway side: src/gateway/server.ts:handleRecall built the RecallResponse from appendSystemContext and the L1 count, but never copied result.recalledL3Persona (which the core had already populated in src/core/hooks/auto-recall.ts:238) into the wire response. The field existed in the core's RecallResult type and was thrown away at the HTTP boundary.
  2. Provider side: hermes-plugin/memory/memory_tencentdb/__init__.py:system_prompt_block returned a hard-coded string describing the four layers, so even if the Gateway had returned the persona, the provider would have ignored it. Verified by reading the function and by adding a negative-control test (TestNegativeControl::test_pre_fix_static_block_does_not_contain_persona), which would have failed on main before this fix.
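The Gateway-side fix is a one-line TypeScript change in handleRecall. To keep all examples in one language, the sketch below mirrors the same boundary translation in Python: the core's camelCase recalledL3Persona is copied onto the recalledL3_persona wire key, and a missing value becomes None (the Python analogue of `?? null`). The dict shapes and the l1_count core key are hypothetical; only the two persona field names and memory_count come from this PR.

```python
from typing import Any, Dict


def build_recall_response_before_fix(result: Dict[str, Any]) -> Dict[str, Any]:
    # Pre-fix behavior: the persona is computed by the core but dropped here.
    return {"memory_count": result.get("l1_count", 0)}  # l1_count is hypothetical


def build_recall_response(result: Dict[str, Any]) -> Dict[str, Any]:
    """Python mirror of the handleRecall boundary (the real code is TypeScript)."""
    response = build_recall_response_before_fix(result)
    # The fix: copy the core's camelCase field onto the wire key.
    # dict.get() yields None when the key is missing, like `?? null`.
    response["recalledL3_persona"] = result.get("recalledL3Persona")
    return response
```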

Changes

  • src/gateway/types.ts — add optional recalledL3_persona?: string | null to RecallResponse. Documented as undefined-tolerant so older clients keep working.
  • src/gateway/server.ts — populate the new field in handleRecall from result.recalledL3Persona ?? null. One-line change.
  • hermes-plugin/memory/memory_tencentdb/__init__.py: system_prompt_block() now fetches /recall (empty query, user_id-keyed), caches the result for 60s, and appends a ## User Persona section when the persona is non-empty. The cache lives on the provider instance, which Hermes re-creates per session, so a session switch cannot serve a stale persona to the wrong scope.
    • __init__ gains self._persona_cache: Dict[str, Any].
    • New helpers _get_cached_persona() and _fetch_persona(). Both swallow exceptions, so system_prompt_block() never raises.
  • hermes-plugin/memory/memory_tencentdb/tests/test_l3_persona_injection.py — 9 new tests:
    • 4 for the core behavior (static fallback, persona injected, empty-string treated as absent, null treated as absent)
    • 3 for the cache (TTL hit, TTL expiry, and a failed first fetch that leaves the cache empty)
    • 1 for "Gateway died mid-session" (fall back to cached value when a later fetch fails)
    • 1 negative control (the regression test that would have failed on main)
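A hedged sketch of the fetch half of the provider change. It assumes a hypothetical recall(query=..., user_id=...) callable that returns the decoded /recall JSON as a dict; the real client's name and signature may differ. It shows the two properties the tests rely on: exceptions are swallowed, and an absent field (older Gateway), null, or empty string all normalize to "no persona". The (ok, persona) return shape is an assumption that lets a cache tell a failed fetch apart from a successful fetch with no persona.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical client signature: recall(query=..., user_id=...) -> decoded JSON dict.
RecallFn = Callable[..., Dict[str, Any]]


def fetch_persona(recall: RecallFn, user_id: str) -> Tuple[bool, Optional[str]]:
    """Return (ok, persona); ok is False when the /recall call failed."""
    try:
        # The query is irrelevant for the persona, so an empty one is sent.
        response = recall(query="", user_id=user_id)
        persona = response.get("recalledL3_persona")
    except Exception:
        # Swallowed so the caller (system_prompt_block) never raises.
        return False, None
    # Absent, None and "" all collapse to None ("no persona available").
    return True, persona or None
```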

Cache Semantics

| Scenario | Behavior |
| --- | --- |
| First call, Gateway up, persona present | Fetch, cache for 60s, return persona |
| First call, Gateway up, no persona yet | Fetch, cache the empty result for 60s, return static block |
| First call, Gateway down | Return static block; do not advance ts, so the next call retries |
| Subsequent call within TTL, Gateway up | Return cached value, no network call |
| Subsequent call within TTL, Gateway down | Return cached value, no network call |
| Subsequent call after TTL, Gateway down | Return previous cached value; do not advance ts |

The asymmetry on failure (don't advance ts on the first failure, but do fall back to a stale value on later failures) is intentional: a fresh-empty cache with a sick Gateway should not be stuck for 60 seconds before retrying; a warm cache with a sick Gateway should not retry on every turn.
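The rules in the Cache Semantics table can be captured as a small state machine. This is an illustrative sketch, not the provider's code: the PersonaCache class, the injectable clock, and the (ok, persona) fetch contract are assumptions made so the behavior can be exercised without a Gateway.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical fetch contract: returns (ok, persona); ok is False on failure.
FetchFn = Callable[[], Tuple[bool, Optional[str]]]


class PersonaCache:
    """Sketch of the TTL rules from the Cache Semantics table."""

    TTL_SECS = 60.0

    def __init__(self, fetch: FetchFn, clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._clock = clock
        self._ts: Optional[float] = None   # time of the last successful fetch
        self._value: Optional[str] = None  # last persona seen (None = none yet)

    def get(self) -> Optional[str]:
        now = self._clock()
        if self._ts is not None and now - self._ts < self.TTL_SECS:
            return self._value  # within TTL: no network call at all
        ok, persona = self._fetch()
        if ok:
            # Success, even with no persona: cache the result for the full TTL.
            self._value, self._ts = persona, now
        # On failure ts is not advanced, so the next call retries; the previous
        # value (None on a cold cache, i.e. static block) is returned meanwhile.
        return self._value
```

Because ts only advances on a successful fetch, a cold cache retries on the very next call, and once the TTL has expired a Gateway that stays down is retried on each call while the last known persona keeps being served.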

Self-test Checklist

  • Verified locally
  • No existing features affected

```
hermes-plugin/memory/memory_tencentdb/tests/test_l3_persona_injection.py .......  9 passed
hermes-plugin/memory/memory_tencentdb/tests/test_memory_tencentdb_recovery.py ......  17 passed
hermes-plugin/memory/memory_tencentdb/tests/test_gateway_shutdown_leak.py ...  2 failed, 16 passed (pre-existing on main, not caused by this PR)
```

Pre-existing failures: test_gateway_shutdown_leak.py fails on main with 'NoneType' object has no attribute 'client' from __init__.py:785 (now line 793 after this PR). Confirmed by stashing this branch's changes and running the test on main — same failure, same line. Not in scope for #205.

Additional Notes

On the field name recalledL3_persona: I deliberately used snake_case (Python-style) on the wire, matching the existing Python client's reading style and the rest of the RecallResponse keys (memory_count, session_key). The core's internal field is camelCase (recalledL3Persona) — server.ts is the boundary that translates.

On query="" in _fetch_persona: The persona is keyed on user_id and the L3 pipeline runs on its own schedule, so the query field is irrelevant to the persona portion of the response. The Gateway's /recall handler accepts an empty query and returns whatever persona content the L3 pipeline has produced. If you would rather have a persona-only endpoint, the cleanest option is a dedicated GET /persona route; I'm happy to add that in a follow-up PR, but it isn't required for this fix.

On the 60s TTL: The value is arbitrary; the L3 pipeline regenerates every 50 new L1 memories by default, so 60s is a comfortable lower bound. It is configurable via self._PERSONA_CACHE_TTL_SECS if you want a different default, and is left as a class attribute rather than a config-schema change to keep this PR narrow.
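Since the TTL is a plain class attribute read through self, it can be changed per subclass or per instance without touching the config schema. A minimal sketch; the class names here are hypothetical stand-ins for the Hermes provider, and only the attribute name _PERSONA_CACHE_TTL_SECS and its 60s default come from this PR.

```python
class PersonaProvider:
    """Hypothetical stand-in for the Hermes provider; only the TTL knob is shown."""

    _PERSONA_CACHE_TTL_SECS = 60  # default from this PR

    def persona_ttl(self) -> int:
        # Read through self, so subclass and instance overrides both take effect.
        return self._PERSONA_CACHE_TTL_SECS


class SlowRefreshProvider(PersonaProvider):
    # Subclass override: cache the persona for five minutes instead.
    _PERSONA_CACHE_TTL_SECS = 300


provider = PersonaProvider()
provider._PERSONA_CACHE_TTL_SECS = 10  # instance override, e.g. inside a test
```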

The Hermes provider's system_prompt_block() returned a static string
describing the memory layers, ignoring the L3 persona that the Gateway
had already generated and surfaced via the core's auto-recall
pipeline. L3 compute was paid for, but the agent never saw the result,
so every conversation started cold on the persona dimension.

This change wires the existing data path end-to-end:

  * The Gateway's /recall response now includes recalledL3_persona,
    populated from the auto-recall pipeline (already populated on the
    core side, just not exposed over the wire).
  * The Hermes provider's system_prompt_block() fetches it via the
    existing /recall client, caches for 60s keyed on the provider
    instance (which Hermes re-creates per session), and appends it
    to the system prompt as a '## User Persona' block when present.
  * On the very first call, a failed /recall leaves the cache empty
    and the next call retries — so a Gateway that is down at startup
    does not poison the cache. Once a fetch succeeds, subsequent
    failures within the TTL fall back to the cached value rather
    than retrying on every turn.

No new external dependencies. No new public API surface beyond the
one new optional field on the Gateway /recall response. Older
Gateways that predate the field are tolerated: clients read the
field with .get() and treat its absence as 'no persona available'.

Refs TencentCloud#205 (partial — fixes the main bullet only; sub-bullets 1 and 2 are tracked separately and out of scope for this PR)

Signed-off-by: Fermin Quant <ferminquant@users.noreply.github.com>
@Maxwell-Code07 (Collaborator)

Thanks for the contribution! We'll review the L3 persona injection approach.
