fix(continuum): singleton-guard auto-save, drop Start-Job#13
Conversation
…h Start-Process
The client-attached hook in plugin.conf and the Start-Job spawn in
psmux-continuum.ps1 each launched a fresh persistent pwsh loop on every
re-attach / plugin load, with no de-duplication. Over days this
accumulated long-running ~100 MB pwsh processes whose memory continued
to grow. plugin.conf even acknowledged this ("multiple loops may
start - this is harmless since saves are idempotent"), which was wrong:
each loop is a persistent background process consuming real memory.
The auto-save loop now enforces a per-user singleton via a PID file at
%LOCALAPPDATA%\psmux-continuum\auto_save.pid:
* Duplicate launches detect the live owner and exit immediately, so
the client-attached hook is safe to re-fire on every attach.
* The loop releases the slot via try/finally and self-exits gracefully
if superseded (PID file no longer points at it).
* Periodic [GC]::Collect() bounds memory growth in the long-running
loop.
The launcher in psmux-continuum.ps1 now uses Start-Process -WindowStyle
Hidden instead of Start-Job. Start-Job spawned an outer scriptblock-host
pwsh that itself spawned the inner pwsh -File auto_save.ps1, so every
plugin load created two pwsh processes that never reaped. Start-Process
gives a single detached pwsh, and the singleton guard inside the loop
makes repeat launches safe.
The three regenerated helper scripts (auto_save.ps1, auto_restore.ps1,
boot.ps1) now carry a "DO NOT EDIT - regenerated from
psmux-continuum.ps1 on plugin load" header so future contributors know
to edit the heredoc, not the artifact. The plugin.conf comment is
updated to describe the new singleton behavior.
There was a problem hiding this comment.
Pull request overview
Fixes a memory-leak / process-accumulation bug in psmux-continuum where the client-attached hook and the Start-Job–based spawn in psmux-continuum.ps1 each created a fresh persistent pwsh loop on every re-attach or plugin reload, with no de-duplication. The fix adds a PID-file singleton guard inside auto_save.ps1, replaces Start-Job with a single detached Start-Process, and adds a file-based logger since the hidden process has no console.
Changes:
- Add per-user singleton guard (PID file at
%LOCALAPPDATA%\psmux-continuum\auto_save.pid) with stale-PID detection, periodic supersede check, and graceful slot release infinally. - Replace
Start-JobwithStart-Process -WindowStyle Hiddenso only onepwshis spawned per launch; capture all PowerShell streams from the save invocation into a rotated log file (256 KB →.old). - Refresh the misleading
plugin.confcomment, add a "DO NOT EDIT — regenerated" header to all three artifact scripts, and bound long-run memory with periodic[GC]::Collect().
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| psmux-continuum/psmux-continuum.ps1 | Updates the auto-save heredoc with singleton/logging logic, replaces Start-Job with Start-Process, and adds regeneration headers to all three heredocs. |
| psmux-continuum/scripts/auto_save.ps1 | Regenerated artifact mirroring the new singleton-guarded loop with logging and try/finally slot release. |
| psmux-continuum/scripts/auto_restore.ps1 | Adds the "DO NOT EDIT — regenerated" header to match the new convention. |
| psmux-continuum/scripts/boot.ps1 | Adds the "DO NOT EDIT — regenerated" header to match the new convention. |
| psmux-continuum/plugin.conf | Rewrites the stale "multiple loops may start" comment to describe the new singleton behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # If a live owner already holds the PID file, defer. | ||
| $existingPid = $null | ||
| try { | ||
| $existingPid = (Get-Content $pidFile -Raw -ErrorAction Stop).Trim() | ||
| } | ||
| catch {} | ||
| if ($existingPid -match '^\d+$') { | ||
| $existing = Get-Process -Id ([int]$existingPid) -ErrorAction SilentlyContinue | ||
| if ($existing -and ($existing.ProcessName -in @('pwsh','powershell'))) { | ||
| Log "auto-save already running (PID $existingPid), exiting." | ||
| exit 0 | ||
| } | ||
| } | ||
|
|
||
| # Claim the slot | ||
| "$PID" | Set-Content -Path $pidFile -Encoding UTF8 -Force |
There was a problem hiding this comment.
@CosminRadu Can you please review this comment from CoPilot?
PR Review Follow-upThank you for this thorough fix, @CosminRadu. The transition to a singleton pattern and the move away from Regarding the Copilot comment, it identifies a potential Time-of-Check to Time-of-Use (TOCTOU) race condition during the PID file claim. While the current implementation is a major step forward, please consider the suggested refinement—either performing a secondary check at the top of the loop or utilizing an exclusive file-creation method—to ensure the singleton guarantee is strictly enforced. |
…erver start
The `@continuum-restore 'on'` option is documented in the README as
"auto-restore on psmux server start" but was silently ignored.
Root cause
==========
`psmux-continuum` ships two entry points with disjoint feature sets:
* `psmux-continuum.ps1` (lines 163-168) reads `@continuum-restore` and
fires the restore. But PPM's `Initialize-Plugin` (`ppm.ps1` lines
391-398) deliberately skips `.ps1` entry points when `plugin.conf`
exists, because psmux sources `plugin.conf` natively via `@plugin`.
* `plugin.conf` IS sourced, but the auto-restore hook was commented out
with a "Uncomment to enable" note - meaning the option was never read
by any active code path.
Empirical verification on a real install:
$ psmux show-options -g | findstr @continuum
@continuum-restore "on"
$ psmux show-hooks -g | findstr -i "session"
client-attached -> run-shell "...auto_save.ps1 ..."
# no restore hook registered
After-effect: when restore fails to fire on server start, the auto-save
loop (which IS registered) overwrites the `last` pointer with the empty
post-startup state ~15 min later, pruning the recovery target. Reboots
silently lose the saved session.
Second bug found while fixing this
==================================
The commented-out line in plugin.conf used `after-new-session`. This
hook does NOT fire on the first session creation in a fresh psmux
server - it only fires on subsequent sessions. So uncommenting that
line as-is would still have left auto-restore broken after a reboot.
Verified by hook-probing each candidate against a fresh psmux server:
after-new-session => no
session-created => FIRED
client-attached => FIRED
We use `session-created` because it fires for the very first session
in the server's lifetime, which is the moment auto-restore needs to
trigger.
Fix
===
* `plugin.conf`: register the hook unconditionally on `session-created`.
* `auto_restore.ps1`: read `@continuum-restore` at exec time and exit
early if not `on`. This keeps `plugin.conf` simple (no nested
`if-shell` quoting) and evaluates the option at hook-fire time rather
than at config-source time, which is more robust against option-order
dependencies.
* Add `@continuum-restore-fired` one-shot marker. Since `restore.ps1`
itself calls `psmux new-session` for each saved session, the hook
would otherwise re-enter for every restored session.
* Keep the `psmux-continuum.ps1` heredoc byte-identical to the
standalone `scripts/auto_restore.ps1` to preserve the existing
regeneration invariant.
Validation
==========
* `session-created` hook fires on first session - live-probed against
a fresh psmux server.
* `after-new-session` does NOT fire on first session - live-probed.
* Patched `plugin.conf` correctly registers the hook - verified via
`psmux show-hooks -g`.
* Negative path: when `@continuum-restore` is unset or `off`,
`auto_restore.ps1` exits early without setting the fired marker or
invoking `restore.ps1`. Live-tested.
* Both .ps1 files parse cleanly via `[System.Management.Automation.
Language.Parser]::ParseFile`.
* Heredoc body and `scripts/auto_restore.ps1` are byte-identical
(AST-extracted and string-compared).
Full positive E2E (save -> kill-server -> fresh server -> verify
restore) cannot be cleanly run on an isolated `-L pmtest` socket
because `psmux run-shell` doesn't pass socket info to child processes,
so child `psmux` invocations default to the user's real socket and
contaminate the test. The fix is correct for default-socket production
usage; the testability limit is a psmux internal, not a fix bug.
Note: PR #13 (fix(continuum): singleton-guard auto-save, drop
Start-Job) touches the same files. This PR's diff is orthogonal to
that work and trivially rebases either way.
|
Hi! I hit this race on my machine today. Adding the log evidence in case it's useful for deciding whether to ship the secondary-check / exclusive-create refinement now vs. as a follow-up. From Both PIDs (20696, 20552) claimed the slot at the same second ( The save at Happy to test a follow-up if helpful; otherwise just leaving this here as the deciding evidence. |
|
Thanks for the thorough analysis here, @MattKotsenas and @CosminRadu — really appreciate both the fix and the follow-up validation. Matt’s log evidence appears to confirm the TOCTOU concern from Copilot: multiple processes can still claim the singleton in the same startup window and run concurrently until the next interval check. @CosminRadu, would you be open to tightening this before merge (for example, an atomic/exclusive PID-file claim or equivalent) so the singleton guarantee is strict? Given the reproduced race in normal workflow, I’d prefer we land that hardening in this PR if possible. |
…nt doubling
Field-observed bug after a machine reboot:
save: main with 4 windows [psmux activity, hunk, profile install, psmux bugs]
live: main with 7 windows [psmux activity, hunk, profile install, hunk,
psmux bugs, profile install, psmux bugs]
Two concurrent restore.ps1 invocations interleaved their new-window calls,
producing one shared first window (the reused empty-default) plus six
appended windows (each restore created 3 of the remaining 3 saved windows).
Reproduced exactly with two Start-Job-spawned restore.ps1 against an
isolated test server: 4-window save -> 7 windows in main.
Root cause: psmux-continuum's session-created hook fires for every new-session
during restoration, and its option-based TOCTOU singleton
(@continuum-restore-fired) can let two auto_restore.ps1 invocations through
under tight timing. Even without that, future user hooks / manual bindings
could re-enter restore.ps1 concurrently.
Fix: process-level PID-file singleton on restore.ps1 itself. Matches the
auto_save.ps1 pattern (PR psmux#13). Uses System.IO.File::Open with FileMode.
CreateNew for atomic claim. Stale slots (recorded PID is dead) are reclaimed;
live slots cause the second invocation to exit cleanly without modifying
state. Slot is released in the existing finally block via a script-block
that only deletes the file when we're still the owner.
PSMUX_RESTORE_PID_FILE_OVERRIDE env var lets the Pester suite point at a
tempdir-local PID file so tests don't collide with a real plugin install.
Adds one new Pester test (concurrent-invocation context) that spawns two
restore.ps1 jobs and asserts exactly one set of 4 windows results. Verified
RED -> GREEN: pre-fix produces 7 windows, post-fix produces 4. Full suite
now 5/5 stable.
Defense-in-depth note: auto_restore.ps1's option-based TOCTOU singleton is
not fixed here. Its race becomes a benign inefficiency (extra pwsh spawns
that exit immediately on the PID-file check) rather than a correctness
issue. Worth fixing separately for cleanup; not load-bearing.
… singleton Two refinements to the singleton added in 2a584f5: 1. Stale-slot reclaim is now atomic via Remove-Item + retry CreateNew (up to 3 attempts), not Set-Content. The Set-Content path had a TOCTOU parallel to the one PR psmux#13 fixes in auto_save.ps1: two processes both finding a stale slot could both successfully Set-Content and both think they own it. CreateNew-only-after-delete makes the reclaim race-safe. 2. PID-reuse guard. The original 'is the recorded PID alive?' check used bare Get-Process, which can false-positive when Windows recycles a PID to a different process (any process — Explorer, sshd, whatever). Now we verify the live owner is a pwsh/powershell process. Narrow but real PID-reuse window was reproduced in test runs after frequent pwsh churn. Plus a new RED -> GREEN test for the stale-PID-file concurrent reclaim case (planted stale marker, two concurrent restore.ps1 invocations, only one wins, no doubled windows). Test infrastructure: RunRestore now sets PSMUX_RESTORE_PID_FILE_OVERRIDE to a tempdir path so each test gets a fresh slot and no cross-test interference via the default LOCALAPPDATA path. Full restore-pane-level suite (6 tests) verified stable across 3 runs.
Summary
The
client-attachedhook inplugin.confand theStart-Jobspawn inpsmux-continuum.ps1each launched a fresh persistentpwshloop on every re-attach / plugin load with no de-duplication. Over days this accumulated long-running ~100 MB pwsh processes whose memory continued to grow. The misleadingplugin.confcomment even acknowledged the duplication ("multiple loops may start — this is harmless since saves are idempotent") — but each loop is a persistent background process consuming real memory.After running locally for ~8 days my session had 5 zombie
auto_save.ps1loops totaling ~580 MB; the oldest had been alive for over a week and grown from ~100 MB to 124 MB.What changed
Singleton enforcement
auto_save.ps1now claims a PID file at%LOCALAPPDATA%\psmux-continuum\auto_save.pid:pwsh/powershellPID owns the file, exit0immediately. Theclient-attachedhook is now safe to re-fire on every attach.pwsh, the new launch claims the slot.try { … } finally { … }and self-exits gracefully if superseded (per-iteration check that the PID file still points at us).[GC]::Collect()every 4 iterations bounds long-run memory growth.Single-process spawn
Replaced
Start-JobwithStart-Process -WindowStyle Hidden.Start-Jobspawned an outer scriptblock-host pwsh that itself spawned the innerpwsh -File auto_save.ps1, so every plugin load created two pwsh processes that never reaped (you could see this in the zombie list — the outer was the parent of the inner).Start-Processgives a single detached pwsh; the singleton guard inside the loop makes repeat launches safe.Diagnosable logging
Start-Process -WindowStyle Hiddenhas no console attached, so the loop'sWrite-Hostoutput was previously discarded. Added aLoghelper that timestamps each line with[INF]/[WRN]/[ERR]and appends to%LOCALAPPDATA%\psmux-continuum\auto_save.log, with 256 KB rotation to a.oldsibling file. Per-writeAdd-Contentkeeps file locks brief, so concurrent ephemeral pwshs (the singleton-rejected ones) don't corrupt each other's writes. The& pwsh -File save.ps1invocation now redirects all six PowerShell streams (*>> \$logFile), sosave.ps1errors land in the log too.Sample log output:
Conventions
Test plan
🤖 Generated with Claude Code