Summary
When TencentDB-Agent-Memory is used as a Hermes memory provider, the gateway supervisor exhibits two related lifecycle problems:
- Repeated EADDRINUSE: 127.0.0.1:8420 failures when multiple supervisors / threads race to (re)spawn the gateway, producing log spam and a recovery storm.
- The TencentDB gateway subprocess tree can outlive a host gateway restart/termination event: in one later cleanup event, the already-spawned gateway tree continued listening on :8420 after the host gateway process had exited, leaving subsequent host starts to either hit EADDRINUSE or silently reuse a process no longer supervised by that host instance.
Together these produce sustained gateway churn even when nothing else has changed.
Environment
- OS: macOS (launchd-managed Hermes gateway)
- Host agent: Hermes agent (multi-profile setup)
- TencentDB-Agent-Memory gateway: local Node/tsx gateway invoked via npx tsx src/gateway/server.ts
- Bind address: 127.0.0.1:8420
- Hermes config: memory.provider: memory_tencentdb
Observed behavior
Symptom 1 — EADDRINUSE storm
Over a multi-day observation window, the host's agent.log contained:
WARNING memory-tencentdb: memory-tencentdb watchdog: Gateway unreachable; attempting to resurrect.
WARNING memory-tencentdb: memory-tencentdb Gateway appears down; attempting to resurrect.
INFO memory-tencentdb.supervisor: Starting memory-tencentdb Gateway: sh -c '... npx tsx src/gateway/server.ts'
ERROR memory-tencentdb.supervisor: memory-tencentdb Gateway process exited with code 1 during startup. ...
code: 'EADDRINUSE'
errno: -48
syscall: 'listen'
address: '127.0.0.1'
port: 8420
INFO memory-tencentdb.supervisor: memory-tencentdb Gateway already running at http://127.0.0.1:8420
INFO memory-tencentdb: memory-tencentdb Gateway recovery succeeded.
This sequence repeats. In one observation window:
- agent.log contains 55 occurrences of EADDRINUSE
- TencentDB stderr log contains 118 occurrences of EADDRINUSE
- agent.log contains 59 occurrences of Gateway appears down
The watchdog and recovery path eventually succeed, so the gateway is reachable in the steady state, but the storm recurs on every restart cycle.
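The log sequence above is consistent with several callers (watchdog, request path) each starting their own resurrect attempt at the same time. A minimal sketch of an in-process single-flight guard follows. The names singleFlight and resurrectGateway are hypothetical and are not part of the plugin; the real supervisor may be structured differently.

```typescript
// Hypothetical in-process single-flight guard; names are illustrative only.
type Starter<T> = () => Promise<T>;

function singleFlight<T>(start: Starter<T>): () => Promise<T> {
  let inFlight: Promise<T> | null = null;
  return () => {
    // Later callers share the attempt that is already running.
    if (inFlight !== null) return inFlight;
    const attempt = start().finally(() => {
      // Clear the slot so a future recovery can start a fresh attempt.
      inFlight = null;
    });
    inFlight = attempt;
    return attempt;
  };
}

// Usage: both the watchdog and the request path call the same guarded function.
let spawnCount = 0;
const resurrectGateway = singleFlight(async () => {
  spawnCount += 1; // stands in for the real spawn + wait-for-health step
  return "http://127.0.0.1:8420";
});

const fromWatchdog = resurrectGateway();
const fromRequestPath = resurrectGateway();
```

This only deduplicates callers inside one process. Separate host processes (for example, multiple profiles) would still need a cross-process lock.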
Symptom 2 — Gateway tree outliving host restart/termination
In one incident, the TencentDB gateway tree was first observed as a normal child tree of the Hermes host gateway:
$ ps -p 39754,39769,39770 -o pid,ppid,lstart,command
PID PPID STARTED COMMAND
39754 60654 Mon May 25 22:37:03 2026 npm exec tsx src/gateway/server.ts
39769 39754 Mon May 25 22:37:03 2026 node .../.bin/tsx src/gateway/server.ts
39770 39769 Mon May 25 22:37:04 2026 node ... src/gateway/server.ts
$ lsof -nP -iTCP:8420 -sTCP:LISTEN
node 39770 <user> 24u IPv4 TCP 127.0.0.1:8420 (LISTEN)
The host gateway (PID 60654) had spawned this TencentDB gateway tree at 22:37:03. At this point the tree was not yet orphaned; it was still correctly parented under the host gateway.
During a later launchctl kickstart -k / restart cleanup of the host gateway, the host process exited (ps -p 60654 returned no rows), while the same TencentDB gateway process family continued to run and kept :8420 occupied. In that later cleanup state, the surviving subprocesses were no longer owned by the host gateway instance and had to be cleaned up explicitly via:
pkill -TERM -f 'tdai-memory-openclaw-plugin.*src/gateway/server.ts'
A subsequent host gateway restart would either:
- Detect "Gateway already running at http://127.0.0.1:8420" via the supervisor's reuse-existing path (silent reuse, but no actual supervision)
- Or hit EADDRINUSE if it tried to spawn directly
This report intentionally separates the two observations: the 22:37 process listing establishes which tree the host spawned and which process owned the port; the later restart/termination observation is what motivates the "outlives host" lifecycle claim.
Expected behavior
The Hermes integration / gateway supervisor should:
- Pre-spawn health check: treat an already-healthy gateway on 127.0.0.1:8420 as reusable instead of spawning another process.
- Single-flight startup: use a startup lock, pidfile, or single-flight guard to prevent concurrent gateway spawns from multiple supervisor threads.
- Ownership tracking: distinguish between gateways spawned by this supervisor instance and externally-owned ones.
- Subprocess group cleanup: spawn the Node/tsx gateway with a clear ownership model and ensure host restart/termination takes down or deliberately detaches the entire subprocess tree (e.g. by sending a signal to the process group, or by using a parent process that monitors the host and propagates termination).
- Recall/capture non-blocking: ensure none of recall/capture/recovery can prevent the host agent from sending its main response (see related issue/enhancement to be filed separately).
- Clearer diagnostics: surface distinct log messages for:
  - gateway already running and healthy (no action)
  - gateway owned by this supervisor (action: nothing or graceful restart)
  - gateway owned by another process / orphan (action: warn, do not silently reuse)
  - gateway unhealthy (action: terminate + respawn)
  - gateway startup failed (action: backoff + retry with reason)
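The diagnostic states listed above could be derived from a handful of observations. The sketch below is one possible mapping, not the plugin's actual logic. The type names, the field names, the "not-running" state, and the order in which the checks are applied are all assumptions made for illustration.

```typescript
// Hypothetical state model for supervisor diagnostics; all names are illustrative.
type GatewayObservation = {
  portHeld: boolean;      // something is listening on 127.0.0.1:8420
  healthy: boolean;       // the health probe succeeded
  ownedByUs: boolean;     // the listener is a child this supervisor spawned
  startupFailed: boolean; // the last spawn attempt exited during startup
};

type Diagnosis = { state: string; action: string };

function diagnose(o: GatewayObservation): Diagnosis {
  if (o.startupFailed) {
    return { state: "startup-failed", action: "backoff and retry with reason" };
  }
  if (!o.portHeld) {
    return { state: "not-running", action: "spawn" }; // assumed extra state
  }
  if (!o.ownedByUs) {
    // Never terminate or silently adopt a process this supervisor did not start.
    return { state: "external-or-orphan", action: "warn, do not silently reuse" };
  }
  if (!o.healthy) {
    return { state: "unhealthy", action: "terminate and respawn" };
  }
  return { state: "owned-healthy", action: "none" };
}

// Usage: an orphan left by a previous host instance is healthy but not ours.
const orphan = diagnose({ portHeld: true, healthy: true, ownedByUs: false, startupFailed: false });
console.log(`gateway state=${orphan.state} action=${orphan.action}`);
```

Logging the state and the action as one line per decision would make the reuse path distinguishable from the spawn path in agent.log.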
Impact
In a real Hermes-agent deployment over ~7 days, this caused:
- Repeated gateway restart/recovery attempts (hundreds of log lines)
- A local gateway process continuing to listen on :8420 after the host gateway restart/termination cleanup path
- Memory provider appearing "active" (port held) even after host-side memory was supposed to be disabled; this part is primarily a Hermes config semantics problem and is only included here as impact context
- Suspected response pipeline stalls after model completion but before message delivery (time-correlation only, needs upstream reproduction to confirm causality)
The last two points may be host-integration specific, but the gateway lifecycle and EADDRINUSE behavior are directly observable TencentDB-Agent-Memory-side symptoms.
Suggested fix sketch
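// pseudocode in src/gateway/supervisor.ts
// (probeHealthy, existingHealthy, tryAcquireStartLock, waitForOther, findPortOwner,
// isOurChild, log, and logFd are placeholders, not existing APIs)
import { spawn } from "node:child_process";

async function ensureGatewayRunning() {
  // 1. Health probe first
  if (await probeHealthy("127.0.0.1:8420")) {
    return existingHealthy;
  }
  // 2. Single-flight lock (pidfile or atomic file lock)
  using lock = await tryAcquireStartLock();
  if (!lock) return await waitForOther();
  // Re-probe: another caller may have started the gateway before we got the lock
  if (await probeHealthy("127.0.0.1:8420")) {
    return existingHealthy;
  }
  // 3. If port is held but unhealthy, identify owner
  const ownerPid = await findPortOwner(8420);
  if (ownerPid && !isOurChild(ownerPid)) {
    log.warn(`port 8420 held by external PID ${ownerPid}; refusing to spawn`);
    return null;
  }
  // 4. Spawn with an explicit ownership model so host restart/termination cleans or detaches the tree intentionally
  const child = spawn("npx", ["tsx", "src/gateway/server.ts"], {
    // detached: true makes the child a process-group leader on POSIX, which is
    // what lets process.kill(-child.pid) reach npm exec, tsx, and node together.
    // A non-detached child does not die with the parent automatically.
    detached: true,
    stdio: ["ignore", logFd, logFd], // logFd: an already-open file descriptor
  });
  const killTree = () => {
    if (child.pid === undefined) return; // spawn failed, nothing to signal
    try { process.kill(-child.pid, "SIGTERM"); } catch {}
  };
  // Handlers cannot run on SIGKILL; covering that case needs a gateway-side parent watch.
  // Installing a SIGTERM handler also disables the default exit, so the host
  // must continue its normal shutdown after killTree runs.
  process.on("SIGTERM", killTree);
  process.on("SIGINT", killTree);
  process.on("exit", killTree);
  return child;
}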
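The single-flight lock in step 2 could be a pidfile created atomically. The sketch below assumes a POSIX host and uses only Node's fs, os, and path modules. The function names and the lock path are hypothetical. The stale-lock handling is best-effort: a reader that sees the file between creation and the PID write, or two processes removing a stale lock at the same moment, can still race.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Signal 0 checks whether a PID exists without delivering a signal.
function pidIsAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// Returns a release function on success, or null if a live process holds the lock.
function tryAcquireStartLock(lockPath: string): (() => void) | null {
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      // "wx" fails with EEXIST if the file already exists, so creation is atomic.
      const fd = fs.openSync(lockPath, "wx");
      fs.writeSync(fd, String(process.pid));
      fs.closeSync(fd);
      return () => fs.rmSync(lockPath, { force: true });
    } catch (err) {
      if ((err as NodeJS.ErrnoException).code !== "EEXIST") throw err;
      const holder = Number.parseInt(fs.readFileSync(lockPath, "utf8"), 10);
      if (Number.isInteger(holder) && holder > 0 && pidIsAlive(holder)) return null;
      // Holder is gone: remove the stale lock and retry once.
      fs.rmSync(lockPath, { force: true });
    }
  }
  return null;
}

// Usage: the second attempt fails while the first holder (this process) is alive.
const lockDir = fs.mkdtempSync(path.join(os.tmpdir(), "tdai-lock-"));
const lockPath = path.join(lockDir, "gateway-start.lock");
const release = tryAcquireStartLock(lockPath);
const second = tryAcquireStartLock(lockPath);
```

A caller that gets null would wait for the holder's gateway to become healthy instead of spawning its own.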
Related
- This issue focuses on gateway lifecycle. A separate issue will be filed for "recall path should be best-effort sidecar / non-blocking" since that is a distinct surface area.
- This issue does NOT include the host-side memory_enabled config naming confusion, which is upstream to the host (hermes-agent) and not to TencentDB-Agent-Memory.
- A related case of L1 over-generalization producing persona pollution is documented in #48.