
refactor: eliminate synchronous I/O and process spawn bottlenecks #3

Merged: vilmire merged 4 commits into main from refactor/async-io-foundation on May 25, 2026.


vilmire (Owner) commented May 25, 2026

Phase 1: Async Batch Writer, CPU Lock fix, and Sync API elimination.

github-actions bot (Contributor) commented May 25, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

vilmire (Owner, Author) commented May 25, 2026

I have read the CLA Document and I hereby sign the CLA

github-actions Bot added a commit that referenced this pull request May 25, 2026
vilmire (Owner, Author) commented May 25, 2026

I have read the CLA Document and I hereby sign the CLA

vilmire (Owner, Author) commented May 25, 2026

recheck

vilmire (Owner, Author) commented May 25, 2026

I have read the CLA Document and I hereby sign the CLA

vilmire merged commit 116a185 into main on May 25, 2026
3 checks passed
vilmire added a commit that referenced this pull request Jun 1, 2026
…s file, pending guard, cooldown sweep

appendRemoteLedgerEntries (#1):
- dedup now reads only the recent tail (tail:1000) instead of a full O(n)
  readLedgerEntries scan; P2P replication is cursor-based, so duplicates
  appear in the recent tail
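A minimal sketch of tail-bounded dedup, assuming each ledger entry carries a unique `id` field and that the local entries have already been read (the real readLedgerEntries call and entry schema are not shown in this PR). `filterNewRemoteEntries` and this `LedgerEntry` shape are hypothetical.

```typescript
// Hypothetical entry shape; the real ledger schema is not part of this PR.
interface LedgerEntry {
  id: string;
  kind: string;
  ts: number;
}

const DEDUP_TAIL = 1000; // matches tail:1000 from the commit message

// Hypothetical helper: keep remote entries whose id is not in the local tail.
function filterNewRemoteEntries(localEntries: LedgerEntry[], remote: LedgerEntry[]): LedgerEntry[] {
  // Only the last DEDUP_TAIL local entries are consulted; older ids are not checked.
  const seen = new Set(localEntries.slice(-DEDUP_TAIL).map((e) => e.id));
  const fresh: LedgerEntry[] = [];
  for (const entry of remote) {
    if (seen.has(entry.id)) continue; // already present locally
    seen.add(entry.id); // also drops duplicates inside the remote batch
    fresh.push(entry);
  }
  return fresh;
}
```

The trade-off is explicit: an id older than the last 1000 local entries would be accepted again, which is acceptable only because cursor-based replication resends recent entries rather than old ones.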

Archive rotation (#4):
- before appending new entries, compactLedger now rotates .archive.jsonl to
  .archive.N.jsonl (max 5) if the archive exceeds 50MB
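One way to implement this rotation is logrotate-style shifting, sketched below as a pure planning function. The PR does not say whether existing generations are shifted or whether the next free index is used, so the shifting scheme, reading "50MB" as 50 MiB, and the name `planArchiveRotation` are all assumptions.

```typescript
const MAX_ARCHIVES = 5; // "max 5" from the commit message
const ROTATE_BYTES = 50 * 1024 * 1024; // assumes "50MB" means 50 MiB

// to === null means "delete this file".
interface RenameOp {
  from: string;
  to: string | null;
}

// Hypothetical helper: returns the renames to perform, in order, for <base>.archive.jsonl.
function planArchiveRotation(base: string, archiveBytes: number, existing: Set<number>): RenameOp[] {
  if (archiveBytes <= ROTATE_BYTES) return []; // under the threshold: keep appending in place
  const numbered = (n: number) => `${base}.archive.${n}.jsonl`;
  const ops: RenameOp[] = [];
  // The oldest generation is dropped so at most MAX_ARCHIVES numbered files remain.
  if (existing.has(MAX_ARCHIVES)) ops.push({ from: numbered(MAX_ARCHIVES), to: null });
  // Shift N -> N+1 from the highest index down so no rename overwrites a live file.
  for (let n = MAX_ARCHIVES - 1; n >= 1; n--) {
    if (existing.has(n)) ops.push({ from: numbered(n), to: numbered(n + 1) });
  }
  // The current archive becomes generation 1; compaction then starts a fresh .archive.jsonl.
  ops.push({ from: `${base}.archive.jsonl`, to: numbered(1) });
  return ops;
}
```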

Worker JSON extraction validation (#5):
- extractJsonObjectFromSummary now requires at least one mesh worker result
  field (changedFiles|errors|gitStatus|nextAction|validationResults) before
  accepting a JSON block; prevents false positives from tool/log JSON in
  the final summary
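A sketch of the validation rule, assuming the summary is free text containing zero or more JSON objects and that the last matching object wins. The brace scanner and the names `findJsonObjectEnd`, `looksLikeWorkerResult`, and `extractWorkerResult` are illustrative; only the field list comes from the commit message.

```typescript
// Fields from the commit message; at least one must be present.
const WORKER_RESULT_FIELDS = ["changedFiles", "errors", "gitStatus", "nextAction", "validationResults"];

function looksLikeWorkerResult(value: unknown): value is Record<string, unknown> {
  if (typeof value !== "object" || value === null || Array.isArray(value)) return false;
  const obj: object = value;
  return WORKER_RESULT_FIELDS.some((field) => field in obj);
}

// Index of the '}' that closes the object opened at `start`, or -1 if unbalanced.
// Braces inside JSON strings (including escaped quotes) are ignored.
function findJsonObjectEnd(text: string, start: number): number {
  let depth = 0;
  let inString = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}" && --depth === 0) return i;
  }
  return -1;
}

// Returns the last JSON object in `summary` that looks like a mesh worker result.
function extractWorkerResult(summary: string): Record<string, unknown> | null {
  let result: Record<string, unknown> | null = null;
  let i = 0;
  while (i < summary.length) {
    if (summary[i] !== "{") {
      i++;
      continue;
    }
    const end = findJsonObjectEnd(summary, i);
    if (end < 0) {
      i++; // stray '{' in prose; keep looking for later objects
      continue;
    }
    let parsed: unknown;
    try {
      parsed = JSON.parse(summary.slice(i, end + 1));
    } catch {
      i++; // not valid JSON from this brace; try the next one
      continue;
    }
    if (looksLikeWorkerResult(parsed)) result = parsed; // tool/log JSON is skipped
    i = end + 1; // skip the whole object, including nested braces
  }
  return result;
}
```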

Archived counts file (#6):
- compactLedger writes cumulative counts to <meshId>.archived-counts.json
- getLedgerSummary reads and merges archived counts so taskCompleted/Failed/
  Stalled and totalEntries are accurate even after compaction
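A minimal sketch of the counting and merge, assuming the archived-counts file stores the same four counters as the summary and that completed, failed, and stalled tasks are recorded under the hypothetical kinds `task_completed`, `task_failed`, and `task_stalled`. `countEntries`, `addCounts`, `recordCompaction`, and `summarize` are illustrative names.

```typescript
interface LedgerCounts {
  taskCompleted: number;
  taskFailed: number;
  taskStalled: number;
  totalEntries: number;
}

const ZERO_COUNTS: LedgerCounts = { taskCompleted: 0, taskFailed: 0, taskStalled: 0, totalEntries: 0 };

// Tally a batch of entries; the kind strings are assumptions.
function countEntries(entries: { kind: string }[]): LedgerCounts {
  const counts = { ...ZERO_COUNTS };
  for (const e of entries) {
    counts.totalEntries++;
    if (e.kind === "task_completed") counts.taskCompleted++;
    else if (e.kind === "task_failed") counts.taskFailed++;
    else if (e.kind === "task_stalled") counts.taskStalled++;
  }
  return counts;
}

function addCounts(a: LedgerCounts, b: LedgerCounts): LedgerCounts {
  return {
    taskCompleted: a.taskCompleted + b.taskCompleted,
    taskFailed: a.taskFailed + b.taskFailed,
    taskStalled: a.taskStalled + b.taskStalled,
    totalEntries: a.totalEntries + b.totalEntries,
  };
}

// At compaction: add the moved entries to the persisted totals (cumulative, never overwritten).
function recordCompaction(archived: LedgerCounts, movedEntries: { kind: string }[]): LedgerCounts {
  return addCounts(archived, countEntries(movedEntries));
}

// At summary time: archived totals plus whatever is still in the active file.
function summarize(archived: LedgerCounts, activeEntries: { kind: string }[]): LedgerCounts {
  return addCounts(archived, countEntries(activeEntries));
}
```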

Pending events size guard (#2):
- queuePendingMeshCoordinatorEvent trims pending-events.jsonl to the last 50
  events before appending whenever the file exceeds 100KB; prevents unbounded
  growth when the coordinator stops draining
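A sketch of the size guard as a pure function over the file contents, assuming one JSON event per line and that the caller supplies the on-disk size (for example from a stat call). "100KB" is read as 100 KiB here, and `appendPendingEvent` is a hypothetical name.

```typescript
const MAX_PENDING_BYTES = 100 * 1024; // assumes "100KB" means 100 KiB
const KEEP_PENDING_EVENTS = 50;

// Returns the new file contents after (optionally) trimming and appending one event.
function appendPendingEvent(content: string, sizeBytes: number, event: object): string {
  let base = content;
  if (sizeBytes > MAX_PENDING_BYTES) {
    // Keep only the newest events; older undrained events are discarded.
    const lines = content.split("\n").filter((line) => line.length > 0);
    base = lines.slice(-KEEP_PENDING_EVENTS).map((line) => line + "\n").join("");
  }
  return base + JSON.stringify(event) + "\n";
}
```

Trimming drops the oldest undelivered events, so this bounds growth at the cost of losing events a stalled coordinator never drained.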

autoLaunchCooldownUntil cleanup (#3):
- sweepExpiredCooldowns() removes expired entries from the cooldown Map
  whenever a new cooldown is set; the inline check at the read site also
  deletes the checked key once it has expired; prevents Map accumulation in
  a long-lived daemon
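A minimal sketch of both cleanup paths, assuming the Map stores an absolute expiry timestamp in milliseconds per key. `autoLaunchCooldownUntil` and `sweepExpiredCooldowns` come from the commit message; `setCooldown`, `isCoolingDown`, and the explicit `now` parameter are hypothetical.

```typescript
// key -> epoch ms until which auto-launch is suppressed
const autoLaunchCooldownUntil = new Map<string, number>();

function sweepExpiredCooldowns(now: number): void {
  // Deleting entries inside Map.prototype.forEach is safe in JavaScript.
  autoLaunchCooldownUntil.forEach((until, key) => {
    if (until <= now) autoLaunchCooldownUntil.delete(key);
  });
}

function setCooldown(key: string, durationMs: number, now: number): void {
  sweepExpiredCooldowns(now); // opportunistic cleanup on every write
  autoLaunchCooldownUntil.set(key, now + durationMs);
}

function isCoolingDown(key: string, now: number): boolean {
  const until = autoLaunchCooldownUntil.get(key);
  if (until === undefined) return false;
  if (until <= now) {
    autoLaunchCooldownUntil.delete(key); // inline cleanup of the checked key
    return false;
  }
  return true;
}
```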

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
vilmire added a commit that referenced this pull request Jun 1, 2026
…ntext tail bound

Atomic drain (#1):
- drainPendingMeshCoordinatorEvents: rename-then-read with atomicDrainFile()
  so concurrent coordinators cannot both consume the same pending events file.
  renameSync is atomic on POSIX, so only one caller's rename succeeds; for the
  other, atomicDrainFile() returns null and the drain yields an empty result,
  eliminating duplicate event delivery.
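A sketch of rename-then-read using Node's `fs` module. It assumes one JSON event per line and claims the file by renaming it to a unique name in the same directory; the claim-name format and the `ENOENT` handling are assumptions, and `atomicDrainFile`/`drainPendingMeshCoordinatorEvents` here only mirror the names in the commit message, not their actual code. The guarantee relies on rename(2) being atomic within a single POSIX filesystem.

```typescript
import { readFileSync, renameSync, unlinkSync } from "node:fs";

// Claim the file by renaming it; only one concurrent caller's rename can succeed.
function atomicDrainFile(path: string): string | null {
  const claimed = `${path}.draining-${Date.now()}-${Math.random().toString(36).slice(2)}`;
  try {
    renameSync(path, claimed);
  } catch (err) {
    // ENOENT: the file is gone, most likely claimed by another caller.
    if ((err as { code?: string }).code === "ENOENT") return null;
    throw err;
  }
  try {
    return readFileSync(claimed, "utf8");
  } finally {
    unlinkSync(claimed); // the claimed copy is private to this caller
  }
}

function drainPendingMeshCoordinatorEvents(path: string): unknown[] {
  const raw = atomicDrainFile(path);
  if (raw === null) return []; // lost the race, or nothing was queued
  return raw
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as unknown);
}
```

A caller that loses the race gets an empty list rather than a second copy of the same events.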

Ledger read cache (#3):
- readLedgerEntries: 100ms TTL in-process cache via getCachedRawEntries().
  Absorbs repeated reads within a single event-processing burst (agent:stopped
  triggers 4+ ledger reads for the same file in one synchronous cycle).
- invalidateLedgerCache() called on every write: appendLedgerEntry,
  appendRemoteLedgerEntries, compactLedger — ensures readers always see fresh
  data after any mutation.
- Cache is keyed by meshId (not opts), with filters applied to the cached slice.
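A minimal sketch of the TTL cache and its invalidation hook, assuming the raw read is a synchronous loader passed in by the caller and that filters are a `kind` match plus a `tail` count. `getCachedRawEntries`, `invalidateLedgerCache`, and `readLedgerEntries` mirror names from the commit message; their signatures, the `load` callback, and the explicit `now` parameter are assumptions made so the sketch is testable.

```typescript
interface LedgerEntry {
  kind: string;
  ts: number;
}

type Loader = (meshId: string) => LedgerEntry[];

const LEDGER_CACHE_TTL_MS = 100; // 100ms TTL from the commit message
const ledgerCache = new Map<string, { entries: LedgerEntry[]; loadedAt: number }>();

// Raw entries keyed by meshId only; the returned array is shared, so treat it as read-only.
function getCachedRawEntries(meshId: string, load: Loader, now: number): LedgerEntry[] {
  const hit = ledgerCache.get(meshId);
  if (hit !== undefined && now - hit.loadedAt < LEDGER_CACHE_TTL_MS) return hit.entries;
  const entries = load(meshId);
  ledgerCache.set(meshId, { entries, loadedAt: now });
  return entries;
}

// Called after every write so the next read reloads from disk.
function invalidateLedgerCache(meshId: string): void {
  ledgerCache.delete(meshId);
}

function readLedgerEntries(
  meshId: string,
  load: Loader,
  opts: { kind?: string; tail?: number } = {},
  now: number = Date.now(),
): LedgerEntry[] {
  // Filters run on the cached slice, so calls with different opts share one cache entry.
  let entries = getCachedRawEntries(meshId, load, now);
  if (opts.kind !== undefined) entries = entries.filter((e) => e.kind === opts.kind);
  if (opts.tail !== undefined) entries = entries.slice(Math.max(0, entries.length - opts.tail));
  return entries;
}
```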

getSessionRecoveryContext tail bound (#2):
- Bounded to tail:500 instead of a full O(n) scan. task_dispatched is never
  archived (ARCHIVABLE_KINDS excludes it), so dispatch history is always
  present in the active file. The 30-min failure window means
  consecutiveNodeFailures never needs deep history. tail:500 covers all
  realistic use cases.
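The argument for tail:500 rests on the failure window: a streak count only looks at recent entries, so a bounded tail suffices. The sketch shows one hypothetical shape of that computation, assuming entries are appended in timestamp order, carry a `nodeId`, and use `task_failed`/`task_completed` kinds, with a success ending the streak; none of these details come from the PR.

```typescript
// Hypothetical entry shape for this sketch only.
interface LedgerEntry {
  kind: string;
  nodeId: string;
  ts: number; // epoch ms; entries are assumed to be appended in ts order
}

const RECOVERY_TAIL = 500; // tail:500 from the commit message
const FAILURE_WINDOW_MS = 30 * 60 * 1000; // the 30-min failure window

// Counts the current failure streak for nodeId, scanning newest-first.
function consecutiveNodeFailures(entries: LedgerEntry[], nodeId: string, now: number): number {
  const tail = entries.slice(-RECOVERY_TAIL);
  let streak = 0;
  for (let i = tail.length - 1; i >= 0; i--) {
    const e = tail[i];
    if (now - e.ts > FAILURE_WINDOW_MS) break; // everything earlier is older still
    if (e.nodeId !== nodeId) continue;
    if (e.kind === "task_failed") streak++;
    else if (e.kind === "task_completed") break; // a success ends the streak
  }
  return streak;
}
```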

Tests: 7 new tests (4 cache, 3 atomic drain), all 92 mesh tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

1 participant