
fix(daemon): resolve FD exhaustion and event loop blocking with thousands of memory artifacts #513

Merged
NicholaiVogel merged 21 commits into Signet-AI:main from Ostico:ostico/fd-diagnostic
Apr 16, 2026

Conversation

@Ostico Ostico commented Apr 15, 2026

Summary

Fixes #512. The daemon became unresponsive after startup when thousands of memory artifact files existed. Three independent root causes were identified through instrumented profiling and strace analysis, and all three have been addressed:

  1. Bun fs.watch() FD leak — Bun opens one O_RDONLY FD per file when watching a directory and close() never releases them (~8,000 leaked FDs)
  2. renderMemoryProjection() blocks the event loop — synchronous full-scan of ~7,600 files on every synthesis request (~1.5s block)
  3. Chokidar watched immutable files — canonical artifacts and backups not excluded from watcher ignore list

Changes

FD Leak Fix

  • Remove memory/ directory from chokidar watch paths — all files inside are canonical artifacts already excluded by the ignored matcher, so no watcher events would fire anyway
  • Add canonical artifact and backup filename patterns to watcher-ignore.ts as defense-in-depth
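The defense-in-depth patterns might look like the following sketch. The exact regexes in watcher-ignore.ts are assumptions here, inferred from filenames seen in this thread (a base32 session token between double dashes, per the `--([a-z2-7]{16})--` match in the diff, and `MEMORY.backup-*.md` backups):

```typescript
// Hypothetical sketch of the watcher-ignore.ts additions; both regexes are
// assumptions based on filenames quoted elsewhere in this PR.
const ARTIFACT_FILENAME_RE = /--[a-z2-7]{16}--[a-z]+\.md$/;
const MEMORY_BACKUP_FILENAME_RE = /MEMORY\.backup-.*\.md$/;

export function isWatcherIgnored(path: string): boolean {
  return ARTIFACT_FILENAME_RE.test(path) || MEMORY_BACKUP_FILENAME_RE.test(path);
}
```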

Synthesis Worker Thread (event loop unblocking)

  • synthesis-render-worker.ts — Worker script with independent SQLite + sqlite-vec connection
  • synthesis-worker-protocol.ts — Shared typed message protocol with type guards (zero as any casts)
  • hooks.ts — handleSynthesisRequest() delegates to worker via postMessage, sync fallback if worker unavailable
  • daemon.ts — Worker lifecycle (spawn on startup, terminate on shutdown), logged error handling
  • db-accessor.ts — initDbAccessorLite() for lightweight DB access in worker isolate
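A minimal sketch of what a typed protocol with runtime type guards (and no `as any` casts) can look like; the message shapes and field names below are illustrative assumptions, not the actual contents of synthesis-worker-protocol.ts:

```typescript
// Hypothetical shapes; the real protocol file may differ.
export interface RenderRequest {
  kind: "render";
  id: number;
  agentId: string;
}

export interface RenderResponse {
  kind: "rendered";
  id: number;
  prompt: string;
  fileCount: number;
}

export type WorkerMessage = RenderRequest | RenderResponse;

// Runtime guard: lets the receiving side narrow an `unknown` postMessage
// payload without casting.
export function isRenderResponse(msg: unknown): msg is RenderResponse {
  return (
    typeof msg === "object" && msg !== null &&
    (msg as { kind?: unknown }).kind === "rendered" &&
    typeof (msg as { id?: unknown }).id === "number" &&
    typeof (msg as { prompt?: unknown }).prompt === "string"
  );
}
```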

Incremental Reindex + Import Short-circuit

  • reindexMemoryArtifacts() now tracks mtime, skips unchanged files
  • importExistingMemoryFiles() short-circuits when filtered file list is empty
  • Cold-start cache reconciled with DB, stale manifest refs cleared
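The mtime-skip idea can be sketched as follows. This is an assumed shape, not the actual reindexMemoryArtifacts() code: a module-level cache maps each source path to its last-seen mtime, and unchanged files are skipped before any parsing work:

```typescript
// Hypothetical sketch of the incremental-skip logic.
const mtimeCache = new Map<string, number>();

export function filesNeedingReindex(
  entries: ReadonlyArray<{ path: string; mtimeMs: number }>,
): string[] {
  const changed: string[] = [];
  for (const { path, mtimeMs } of entries) {
    if (mtimeCache.get(path) === mtimeMs) continue; // unchanged: skip early
    mtimeCache.set(path, mtimeMs);
    changed.push(path);
  }
  return changed;
}
```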

Code Cleanup

  • Extracted checkCliAvailable() helper — deduplicated 4 identical CLI preflight spawn blocks
  • Removed dead runSynthesis() function (not exported, zero callers)
  • Removed 3 unused imports from hooks-routes.ts

Permanent Diagnostics

  • resource-monitor.ts — FD snapshot, event loop lag detection, periodic polling
  • Lifecycle snapshots at key daemon stages
  • /health endpoint extended with additive resources key
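The two core measurements can be sketched like this (Linux-only for the FD count; names and the timer-overshoot approach are assumptions, not the actual resource-monitor.ts implementation):

```typescript
// Hypothetical sketch of the /health resource snapshot primitives.
import { readdirSync } from "node:fs";

// Count open file descriptors via procfs (Linux only).
export function countOpenFds(): number {
  return readdirSync("/proc/self/fd").length;
}

// Event loop lag measured as how far a timer overshoots its deadline.
export function measureEventLoopLag(ms = 50): Promise<number> {
  const start = process.hrtime.bigint();
  return new Promise((resolve) =>
    setTimeout(() => {
      const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
      resolve(Math.max(0, elapsedMs - ms)); // overshoot beyond the timer
    }, ms),
  );
}
```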

Type

  • [ ] feat — new user-facing feature (bumps minor)
  • [x] fix — bug fix
  • [ ] refactor — restructure without behavior change
  • [ ] chore — build, deps, config, docs
  • [ ] perf — performance improvement
  • [ ] test — test coverage

Packages affected

  • @signet/core
  • @signet/daemon
  • @signet/cli / dashboard
  • @signet/sdk
  • @signet/connector-*
  • @signet/web
  • predictor
  • Other:

Screenshots

N/A — no UI changes.

PR Readiness (MANDATORY)

  • Spec alignment validated (INDEX.md + dependencies.yaml)
  • Agent scoping verified on all new/changed data queries
  • Input/config validation and bounds checks added
  • Error handling and fallback paths tested (no silent swallow)
  • Security checks applied to admin/mutation endpoints
  • Docs updated for API/spec/status changes
  • Regression tests added for each bug fix
  • Lint/typecheck/tests pass locally

Migration Notes (if applicable)

N/A — no migrations touched.

Testing

  • bun test passes
  • bun run typecheck passes against baseline (442 pre-existing errors, 2 fewer than before, due to dead code removal)
  • bun run lint passes
  • Tested against running daemon

Stress Test Results

5 concurrent synthesis requests + 10 /health probes fired simultaneously:

| Metric | Before (sync) | After (worker thread) | Improvement |
|---|---|---|---|
| Synthesis p50 | 1.57s | 128ms | 12x |
| Synthesis max | 2.03s | 135ms | 15x |
| Health max during synthesis | ~1.5s stalls | 9ms | 167x |
| Mixed concurrent load (100 req) | 10-25s | responsive | unblocked |
| FD delta | 8,000+ leaked | +2 (stable) | resolved |

Test Coverage

  • 13 existing tests pass
  • 12 new integration tests added (worker init, delegation, sync fallback, concurrent requests, timeout, error handling)
  • POC test validates worker thread + SQLite + sqlite-vec compatibility
  • Stress test script at tests/integration/synthesis-stress-test.sh

4-Agent Verification

  • F1 Plan Compliance: PASS (8/8 must-have, 4/4 must-not-have)
  • F2 Code Quality: PASS (25/25 tests, zero as any/@ts-ignore)
  • F3 Manual QA: PASS (3/3 scenarios, health <6ms during synthesis, FD delta=0)
  • F4 Scope Fidelity: PASS after triage (4 findings, all dismissed as non-issues)

AI disclosure

  • [ ] No AI tools were used in this PR
  • [x] AI tools were used (see Assisted-by tags in commits)

Assisted-by: OpenCode:claude-opus-4-6

Notes

Known Limitations (tracked as tech debt, non-blocking)

  • Worker uses console.error instead of structured logger (V8 isolate has no access to daemon's logger instance)
  • No automatic worker respawn — if the worker crashes, daemon falls back to synchronous rendering until restart
  • configurePragmas duplicated in POC test (test-only, not production code)

Root Cause: Bun Runtime Bug

The FD leak is a confirmed Bun runtime bug: fs.watch() on a directory opens one O_RDONLY FD per file, and close() does not release them. Node.js uses a single inotify FD for the same operation. Reproduction script available in issue #512. This fix works around the bug by not watching the memory/ directory at all.
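A quick Linux-side check of the behavior described above (not the repro script from issue #512, just an illustrative sketch): watch a directory full of files, then compare `/proc/self/fd` counts. Under Node.js the delta stays around one (a single inotify FD); under the affected Bun versions it reportedly grows by one O_RDONLY FD per file and survives close():

```typescript
// Illustrative FD-delta check; Linux-only (reads /proc/self/fd).
import { mkdtempSync, writeFileSync, readdirSync, watch } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const countFds = () => readdirSync("/proc/self/fd").length;

const dir = mkdtempSync(join(tmpdir(), "fdleak-"));
for (let i = 0; i < 200; i++) writeFileSync(join(dir, `f${i}.md`), "x");

const before = countFds();
const watcher = watch(dir, () => {});
const during = countFds();
watcher.close();
const after = countFds();

console.log({ before, during, after }); // delta stays small on Node.js
```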

Ostico added 12 commits April 15, 2026 10:07
Track FD counts, event loop lag, and memory usage at daemon
lifecycle points. Expose resource snapshot on /health under
a new additive `resources` key. Gate periodic polling behind
debug log level.

Ref: Signet-AI#512
…est refs

Cold-start cache seeding now queries DB source_path rows so files
deleted while the daemon was down are detected and their DB rows
cleaned up during the first reindex after restart.

syncManifestRefs now tracks previous ledger refs via prevLedgerRefs
so manifests that drop out of the clipped ledger have their
memory_md_refs cleared even when their disk files did not change.

resetProjectionPurgeState clears all module-level caches for proper
test isolation.

Includes regression tests for both fixes.
Move renderMemoryProjection() to a node:worker_threads worker to
eliminate ~1.5s event loop blocking during MEMORY.md synthesis.

- Add synthesis-render-worker with own SQLite + sqlite-vec connection
- Add typed message protocol in synthesis-worker-protocol.ts
- Integrate worker lifecycle in daemon startup/shutdown
- Graceful sync fallback when worker is unavailable
- Extract checkCliAvailable() helper, deduplicate 4 CLI preflight blocks
- Remove dead runSynthesis function and unused imports
- 12 new integration tests, POC test, stress test script
@PR-Reviewer-Ant

Hi @Ostico - I'm an automated code reviewer powered by pr-reviewer. I'm taking a look at the fixes in fix(daemon): resolve FD exhaustion and event loop blocking with thousands of memory artifacts (commit d56e2cad) now and I'll follow up shortly with feedback.


@PR-Reviewer-Ant PR-Reviewer-Ant left a comment


Automated review by pr-reviewer | model: gpt-5.3-codex | commit: d56e2cad

[Automated Review by pr-reviewer] I found blocking correctness issues in the new synthesis worker fallback path and incremental artifact reindex logic.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The issues are directly visible in changed code paths: worker timeout/error handling in hooks.ts now returns an empty prompt instead of rendering synchronously, and reindexMemoryArtifacts() in memory-lineage.ts skips tombstoned/invalid/out-of-scope files without deleting existing DB rows, which can leave stale artifacts.

indexBlock: rendered.indexBlock,
};
}

Collaborator

Blocking: on worker timeout/error this resolves to an empty synthesis payload (prompt: "") instead of falling back to synchronous renderMemoryProjection(). That diverges from the PR goal/description of sync fallback when worker is unavailable and can cause lost synthesis output under transient worker failures.

Contributor Author

Not a bug — intentional design. The sync fallback is for when the worker is unavailable (worker === null), not for when it fails mid-request.

Three failure modes, three deliberate responses:

| Condition | Behavior | Why |
|---|---|---|
| worker === null | Sync renderMemoryProjection() fallback | Worker unavailable (startup failure, previous crash) — safe one-off sync call |
| Worker timeout (30s) | Empty prompt, auto-recovery on next request | If we called sync after a 30s hang, we’d block the event loop for another ~1.5s — the exact problem this PR fixes. Plus, whatever caused the 30s hang (e.g., SQLite lock contention) would likely also stall sync rendering |
| Worker error | Empty prompt, auto-recovery on next request | The error/exit handlers in daemon.ts:2478-2485 set setSynthesisRenderWorker(null) — so the next synthesis request sees worker === null and uses sync fallback automatically |

The auto-recovery path is: worker crashes → onError/onExit sets worker to null → next /api/hooks/synthesis call hits line 3608 (worker === null) → sync fallback fires.

Falling back to sync on timeout would create a dangerous cascade: worker hangs → 30s timeout → sync render blocks event loop for 1.5s → health checks stall → cascading degradation. Empty prompt for a single request is the safer degradation.
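The three-way dispatch above can be sketched as follows. This is a simplified illustration, not the actual hooks.ts code, and it folds in nulling the worker on failure (the behavior the real code achieves through the daemon.ts error/exit handlers):

```typescript
// Hypothetical sketch of the dispatch; names are illustrative.
type RenderResult = { prompt: string; fileCount: number };

let worker: { render(agentId: string): Promise<RenderResult> } | null = null;

async function handleSynthesisRequest(
  agentId: string,
  syncRender: (agentId: string) => RenderResult,
  timeoutMs = 30_000,
): Promise<RenderResult> {
  const w = worker;
  if (w === null) {
    // Worker unavailable (startup failure or prior crash): one-off sync call.
    return syncRender(agentId);
  }
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("render timeout")), timeoutMs);
  });
  try {
    return await Promise.race([w.render(agentId), timeout]);
  } catch {
    // Timeout or worker error: degrade to an empty payload for THIS request.
    // Nulling the worker means the NEXT request takes the sync fallback.
    worker = null;
    return { prompt: "", fileCount: 0 };
  } finally {
    clearTimeout(timer);
  }
}
```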

cache.set(path, mtime);
continue;
}
const match = path.match(/--([a-z2-7]{16})--/);
Collaborator

Blocking data-integrity bug: incremental reindex now continues for tombstoned/invalid/out-of-scope files after updating cache, but does not delete pre-existing memory_artifacts rows for those paths. Stale rows can persist in DB and leak deleted/invalid sessions into projection/ledger results.

Contributor Author

Partially valid — fixed in incoming push.

The claim overstates the tombstone case but is correct for invalid/scope-changed artifacts.

Tombstone case (NOT a real issue):
removeCanonicalSession() at line 1444-1467 already runs DELETE FROM memory_artifacts WHERE agent_id = ? AND session_token = ? AND rmSync() on the files. By the time the next reindex runs, the DB rows are gone and the files are deleted from disk. No stale row is possible for tombstoned sessions through the normal path.

Invalid artifact / scope-change cases (real gap, now fixed):
If a previously-indexed file later fails isValidArtifact() (corrupted hash, missing frontmatter) or changes agent_id (scope mismatch), the continue skipped the upsert but left the old DB row. The file still exists on disk, so the cleanup loop at lines 639-646 never catches it.

Practical impact is near-zero (artifact files are written once by the pipeline and essentially never modified), but it is an architectural correctness gap. Fixed by adding DELETE FROM memory_artifacts WHERE source_path = ? before each continue in all three branches (tombstone, invalid, scope-mismatch). The tombstone DELETE is defensive/harmless since removeCanonicalSession already cleaned the row.

Fix coming in the next push.


@PR-Reviewer-Ant PR-Reviewer-Ant left a comment


Automated review by pr-reviewer | model: gpt-5.3-codex | commit: 940d5096

[Automated Review by pr-reviewer] I found blocking correctness issues where the new synthesis worker path can silently return empty output and where global ledger-ref state can corrupt cross-agent manifest refs.

Confidence: Medium [sufficient_diff_evidence, targeted_context_included, missing_runtime_repro] - The failures are directly visible in changed control flow: worker timeout/error paths now resolve empty synthesis payloads in hooks.ts, and memory-lineage.ts introduces process-global prevLedgerRefs/lastChangedManifests used across agent-scoped renders. I did not run runtime repro here, but the logic-level regressions are concrete in the diff.

@@ -1257,9 +1333,26 @@ function renderIndexSection(indexBlock: string, base: ReadonlyArray<string>): st
return "";
Collaborator

Blocking: prevLedgerRefs/lastChangedManifests are module-global and not keyed by agentId. syncManifestRefs() now unions previous refs into the current update set, so renders for agent A can influence which manifests are rewritten during a render for agent B. This creates cross-agent state bleed and can clear or rewrite ledger refs in unrelated manifests (data integrity risk).

Contributor Author

Not a new issue — pre-existing design, this PR narrows the scope.

prevLedgerRefs and lastChangedManifests are module-global, yes. But the OLD syncManifestRefs(refs) was broader:

// Before this PR:
const files = listCanonicalFiles().filter(path => path.endsWith("--manifest.md"));

Every render scanned ALL manifest files and set/cleared memory_md_refs based on the current agent’s ledger. Sequential renders for different agents already produced the same cross-agent ref thrashing — alice’s render sets her refs, bob’s render clears them because they’re not in bob’s set.

The new prevLedgerRefs optimization actually REDUCES this: instead of processing all manifests every time, it only touches changed manifests + previously-referenced ones that dropped out. The cross-agent interaction surface is narrower, not wider.

The root issue is that syncManifestRefs was never agent-scoped — that’s a pre-existing design choice, not a regression from this PR. Scoping it properly would be a separate improvement.

Worker creation had three unhandled failure modes that could crash
the daemon process: constructor throw, unhandled error event during
module evaluation, and worker exit before init response. Wrap in
try/catch and attach error/exit listeners during the init phase so
failures fall back to sync rendering instead of process crash.

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment


Automated review by pr-reviewer | model: gpt-5.3-codex | commit: 8dddb952

[Automated Review by pr-reviewer] I found correctness regressions that should be fixed before merge, including a synthesis fallback gap and stale-artifact indexing behavior.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The issues are directly visible in changed logic: worker timeout/error paths in hooks.ts now return empty output instead of rendering, and reindexMemoryArtifacts now short-circuits unchanged files before tombstone checks in memory-lineage.ts, which can leave deleted sessions indexed.

});

const fileSet = new Set(files);
for (const path of files) {
Collaborator

Incremental reindex checks cache.get(path) === mtime before evaluating tombstones/session validity. If a session is tombstoned while files are unchanged, those rows will be skipped and can remain in memory_artifacts indefinitely, causing stale projection/index state. Tombstone invalidation needs to be evaluated even when file mtime is unchanged.

Contributor Author

Not a bug — removeCanonicalSession() already handles cleanup.

The claim assumes tombstoned sessions can have stale memory_artifacts rows. Looking at removeCanonicalSession() (line 1453-1479), it performs three cleanup steps in the same transaction:

  1. Inserts tombstone record into memory_artifact_tombstones
  2. DELETE FROM memory_artifacts WHERE agent_id = ? AND session_token = ? — rows gone
  3. rmSync() on all artifact files — files gone from disk

After tombstoning, there are no DB rows to become stale. On next reindex, if any cache entries reference deleted files, the cleanup loop at lines 648-655 catches them:

for (const path of cache.keys()) {
    if (fileSet.has(path)) continue; // still on disk
    // Not on disk → delete any lingering DB row
    db.prepare("DELETE FROM memory_artifacts WHERE source_path = ?").run(relativePath(path));
    cache.delete(path);
}

The mtime cache skip (line 613) only applies to files that exist on disk with unchanged content. Tombstoned files don’t exist on disk — they’re caught by the cache cleanup, not the mtime check. The scenario described cannot occur through removeCanonicalSession, which is the only tombstone creation path.

@@ -1248,7 +1296,6 @@ function startFileWatcher() {
join(AGENTS_DIR, "IDENTITY.md"),
Collaborator

Removing the memory/ watch path disables runtime ingestion for newly added/modified non-artifact markdown files in memory/ (the watcher handlers still contain ingestion logic, and startup import still filters for non-artifact files). This creates inconsistent behavior: startup imports such files, but later changes are no longer observed.

Contributor Author

See detailed reply on the latest review round for this same comment (commit 4f1c2ce0).

@Ostico Ostico marked this pull request as draft April 15, 2026 23:58
initDbAccessorLite() unconditionally called loadExtension("") when
sqlite-vec was unavailable (common in Docker, possible on fresh
installs). This crashed the worker thread during module evaluation,
which in turn crashed the daemon via unhandled error event.

Add truthy guard and try/catch around loadExtension(), matching the
same defensive pattern used in the main daemon loadVecExtension().
@Ostico Ostico marked this pull request as ready for review April 16, 2026 00:02

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment


Automated review by pr-reviewer | model: gpt-5.3-codex | commit: 4f1c2ce0

[Automated Review by pr-reviewer] I found one blocking behavior regression where the implementation does not fully match the PR’s claim about memory/ contents and watcher coverage.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The diff removes join(AGENTS_DIR, "memory") from chokidar watch roots in daemon.ts, while the same file still contains active ingestion paths (importExistingMemoryFiles and ingestMemoryMarkdown) for non-artifact .md files under memory/. That creates a concrete runtime regression: post-start writes/adds of such files will no longer trigger ingestion events.

@@ -1248,7 +1296,6 @@ function startFileWatcher() {
join(AGENTS_DIR, "IDENTITY.md"),
Collaborator

Blocking regression: removing memory/ from watcher roots disables runtime ingestion for non-artifact markdown files created/updated inside memory/ after startup. This conflicts with existing logic that still treats such files as ingestible (importExistingMemoryFiles filters for non-artifact .md and ingestMemoryMarkdown explicitly skips only artifact/backup patterns). The PR description asserts everything in memory/ is canonical artifacts/backups, but the codebase behavior still supports non-artifact inputs there.

Contributor Author

Technically correct, operationally irrelevant — the FD leak fix justifies this.

The watcher handler at lines 1330-1341 is indeed dead code for memory/ files after this change. But ingestMemoryMarkdown() (line 1019) already filters out ALL file types that the daemon creates in memory/:

  • Canonical artifacts (--summary.md, --transcript.md, etc.) → filtered by ARTIFACT_FILENAME_RE
  • Backups (MEMORY.backup-*.md) → filtered by MEMORY_BACKUP_FILENAME_RE
  • MEMORY.md → explicitly filtered at line 1016
  • DB/system files → not .md

No daemon code path, API endpoint, CLI command, or documented workflow creates non-artifact markdown files in memory/ at runtime. importExistingMemoryFiles() at startup still handles any pre-existing files via readdirSync() (unaffected by this change).

The memory/ directory contains ~8,340 files in a typical workspace. Bun’s fs.watch() opens one O_RDONLY FD per file and never releases them on close() (confirmed via strace — issue #512 root cause). Removing this watch path eliminates the FD leak entirely.

The codebase assertion is correct: everything in memory/ is canonical artifacts or backups. The ingestion handler was never triggered for these files anyway — the ignored matcher in watcher-ignore.ts:52-58 already filtered artifacts, and ingestMemoryMarkdown filtered the rest. The watcher on memory/ was consuming 8K+ FDs to produce zero useful events.

@Ostico Ostico marked this pull request as draft April 16, 2026 00:07
@NicholaiVogel
Contributor

I think the overall direction is right. The one thing I want fixed before merge is the worker failure path.

If the render worker times out or errors, we should not return the same shape as a valid empty render. prompt: "" with fileCount: 0 can make scheduled synthesis treat the run as successful, which may suppress the next retry. I agree we should not immediately fall back to sync after a 30s hang, but the run should be marked as a render failure, and the worker should probably be nulled or restarted so the next request takes the fallback path.

The manifest ref scoping concern is real, but I’m fine tracking that as a follow-up if this PR is not making the existing behavior worse.

@NicholaiVogel NicholaiVogel marked this pull request as ready for review April 16, 2026 06:30
PR-Reviewer-Ant

This comment was marked as resolved.


@PR-Reviewer-Ant PR-Reviewer-Ant left a comment


Automated review by pr-reviewer | model: gpt-5.3-codex | commit: d0e6112a

[Automated Review by pr-reviewer] I found one blocking data-integrity issue in the incremental reindex path; otherwise the PR generally aligns with the stated FD leak and event-loop unblocking goals.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The new scoped reindex logic in memory-lineage.ts performs DELETEs by source_path without agent_id constraints. This is directly visible in changed code and can remove rows belonging to other agents when reindexing a single agent.

Comment thread packages/daemon/src/memory-lineage.ts Outdated
if (scope && nextAgent !== scope) {
getDbAccessor().withWriteTx((db) => {
db.prepare("DELETE FROM memory_artifacts WHERE source_path = ?").run(relativePath(path));
});
Collaborator

Blocking: scoped reindex can delete artifacts for other agents. In the scope && nextAgent !== scope branch (and other new delete branches), rows are deleted with DELETE FROM memory_artifacts WHERE source_path = ? and no agent_id predicate. If another agent has the same source_path row, this causes cross-agent data loss/corruption. Use scoped deletes when scope is set (e.g., ... WHERE source_path = ? AND agent_id = ?) and keep global deletes only for unscoped runs.
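The reviewer's suggested fix can be sketched as a small statement builder (table and column names are taken from this thread; the helper itself is illustrative, not code from the PR):

```typescript
// Hypothetical helper: scope the delete by agent_id whenever a scope is
// set; keep the global delete only for unscoped reindex runs.
export function artifactDelete(
  sourcePath: string,
  scope?: string,
): { sql: string; params: string[] } {
  return scope
    ? {
        sql: "DELETE FROM memory_artifacts WHERE source_path = ? AND agent_id = ?",
        params: [sourcePath, scope],
      }
    : {
        sql: "DELETE FROM memory_artifacts WHERE source_path = ?",
        params: [sourcePath],
      };
}
```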


@PR-Reviewer-Ant PR-Reviewer-Ant left a comment


Automated review by pr-reviewer | model: gpt-5.3-codex | commit: 5181b943

[Automated Review] I am an automated PR review tool (pr-reviewer). I found one blocking data-integrity issue in the new incremental manifest-ref sync logic.

Confidence: High [sufficient_diff_evidence, targeted_context_included] - The new state used by syncManifestRefs is module-global (prevLedgerRefs) and not keyed by agentId. In renderMemoryProjection(agentId), refs for one agent are reused in later calls for other agents, which can cause manifest ref updates to touch the wrong agent’s manifests and clear valid refs. This behavior is directly visible in the changed logic paths in memory-lineage.ts.

@@ -32,6 +33,16 @@ const BASE32 = "abcdefghijklmnopqrstuvwxyz234567";
let projTok: Tiktoken | null = null;
Collaborator

Blocking: prevLedgerRefs is global process state and not scoped by agentId. The new incremental syncManifestRefs(refs, changedManifests) uses prevLedgerRefs from the previous render call, which may belong to a different agent. That can cause cross-agent manifest rewrites and stale-ref clearing in the wrong files (data integrity risk). Please scope this state per-agent (e.g., Map<agentId, Set<...>>) and pass agentId through the sync path.


Ostico commented Apr 16, 2026

Thanks Nicholai — appreciate the review and the follow-up commits. The sync fallback on worker failure is a better design than returning the empty shape, agreed.

Hoping this PR can help — I've been running it locally against a pretty large memory directory (~8,500 artifact files) and it's been working well. The FD exhaustion is gone and synthesis feels noticeably snappier with the worker thread offload.

The prevLedgerRefs scoping concern makes sense as a follow-up. Happy to pick that up in a separate PR if useful.

Convert two process-global singletons in memory-lineage.ts to per-agent
Maps so concurrent projections for different agents no longer leak state
into each other. Adds regression test verifying cross-agent isolation.

Addresses review feedback from PR Signet-AI#513.

@PR-Reviewer-Ant PR-Reviewer-Ant left a comment


Automated review by pr-reviewer | model: gpt-5.3-codex | commit: 04cae822

[Automated Review by pr-reviewer] I found one correctness issue in the new test changes; core implementation largely matches the PR description, but this should be fixed before relying on test results.

Confidence: Medium [sufficient_diff_evidence, targeted_context_included, missing_runtime_repro] - The issue is directly visible in the changed test line where an environment variable is set to undefined instead of being deleted, which can change process state unexpectedly across tests. I did not run the test suite here to confirm downstream failures.

Comment thread packages/daemon/src/memory-lineage.test.ts Outdated

Ostico commented Apr 16, 2026

Follow-up fix for the cross-agent state leakage in prevLedgerRefs/lastChangedManifests — pushed in 04cae82.

Changes:

  • lastChangedManifests (let Set | undefined) → lastChangedManifestsByAgent (Map<string, Set<string>>)
  • prevLedgerRefs (let Set | undefined) → prevLedgerRefsByAgent (Map<string, Set<string>>)
  • syncManifestRefs now takes agentId as 3rd parameter
  • resetProjectionPurgeState() calls .clear() on both Maps
  • Regression test: creates artifacts for two agents, renders both, deletes one agent's session, re-renders, verifies no cross-contamination

All existing tests pass (15/15 memory-lineage, 13/13 synthesis-worker). TSC clean against baseline.
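The per-agent keying described above can be sketched as follows (an illustrative reduction of the change, assuming a helper name; the real syncManifestRefs does the manifest rewrites as well):

```typescript
// Hypothetical sketch: previous ledger refs keyed by agentId, so renders
// for one agent cannot touch another agent's manifests.
const prevLedgerRefsByAgent = new Map<string, Set<string>>();

export function manifestsToSync(
  agentId: string,
  currentRefs: Set<string>,
  changedManifests: Set<string>,
): Set<string> {
  const prev = prevLedgerRefsByAgent.get(agentId) ?? new Set<string>();
  const touch = new Set(changedManifests);
  for (const ref of prev) {
    if (!currentRefs.has(ref)) touch.add(ref); // ref dropped out: clear it
  }
  prevLedgerRefsByAgent.set(agentId, new Set(currentRefs));
  return touch;
}
```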

@NicholaiVogel
Contributor

merging this, thank you @Ostico!

@NicholaiVogel NicholaiVogel merged commit a88a326 into Signet-AI:main Apr 16, 2026
5 checks passed

Ostico commented Apr 16, 2026

You're Welcome!!

@Ostico Ostico deleted the ostico/fd-diagnostic branch April 16, 2026 22:01

Successfully merging this pull request may close these issues.

Daemon unresponsive: reindexMemoryArtifacts synchronous full-scan blocks event loop with thousands of memory artifacts
