Rework the WSL2 guest memory reduction thread around explicit reclaim helpers #40777

Open
benhillis wants to merge 5 commits into microsoft:master from benhillis:benhill/mem-reclaim-1-helpers

Conversation

benhillis commented Jun 11, 2026

Member

First in a series of incremental changes reworking the WSL2 guest memory reduction thread. Each change is submitted one PR at a time; the next will follow once this merges. This PR does not change the default reclaim mode.

What this does

Replaces the ring-buffer idle detector and user-CPU-only sampling in the mini-init memory reduction thread with a clearer, helper-based design:

  • Samples aggregate non-idle CPU time (user, system, irq, softirq, steal) so kernel-bound work keeps the VM out of the idle state, instead of looking at user time alone.
  • ReadProcFile reads a full procfs snapshot into a caller buffer (close-on-exec, partial-read safe); GetReclaimableCacheBytes reads the cache counters through it. GetFreeMemoryBytes uses sysinfo() for the free-page total.
  • Gradual reclaims cold page cache (cgroup memory.reclaim) above a fixed floor while CPU-idle, with a hysteresis trigger threshold so it does not churn near the floor. Each interval reclaims at most a fixed step (256 MB) so the cache bleeds down over several intervals rather than being stripped to the floor in a single pass. This keeps gradual mode gentle, so that a brief idle pause does not evict a whole working set, and keeps it meaningfully distinct from DropCache.
  • DropCache stays gated on sustained CPU idle, drops once, and re-drops only after the reclaimable cache grows meaningfully.
  • Compaction is gated on free-memory growth so it runs only when there are newly-freed pages worth coalescing.
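The DropCache bullet above describes a small state machine: wait for sustained idle, drop once, then hold off until the cache has regrown. A minimal sketch of that decision follows. The names `DropCacheState`, `ShouldDropCache`, `NoteDropCompleted`, and both thresholds are hypothetical and not taken from the PR. Whether a busy interval also resets the "already dropped" state in the real code is not stated, so this sketch leaves it untouched.

```cpp
#include <cstdint>

// Illustrative thresholds only; the PR does not state the real values.
constexpr int kIdleTicksRequired = 3;                        // assumed "sustained idle" length
constexpr std::uint64_t kRegrowthBytes = 64ull * 1024 * 1024; // assumed "meaningful" regrowth

struct DropCacheState
{
    int idleStreak = 0;                    // consecutive idle intervals seen so far
    bool hasDropped = false;               // true once a drop has happened
    std::uint64_t bytesAfterLastDrop = 0;  // reclaimable cache measured right after that drop
};

// Returns true when the caller should drop caches on this tick.
bool ShouldDropCache(DropCacheState& state, bool cpuIdle, std::uint64_t reclaimableBytes)
{
    if (!cpuIdle)
    {
        state.idleStreak = 0; // any busy interval breaks the streak
        return false;
    }

    if (state.idleStreak < kIdleTicksRequired)
    {
        ++state.idleStreak;
    }

    if (state.idleStreak < kIdleTicksRequired)
    {
        return false; // idle, but not for long enough yet
    }

    if (!state.hasDropped)
    {
        return true; // first drop after sustained idle
    }

    // Re-drop only once the cache has regrown past the post-drop level.
    return reclaimableBytes >= state.bytesAfterLastDrop + kRegrowthBytes;
}

// Call after the drop has been performed, with a fresh cache measurement.
void NoteDropCompleted(DropCacheState& state, std::uint64_t bytesAfterDrop)
{
    state.hasDropped = true;
    state.bytesAfterLastDrop = bytesAfterDrop;
}
```

Recording the post-drop measurement, rather than the pre-drop one, is a design choice of this sketch: it makes "grows meaningfully" relative to what was left after the last drop.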

RequestCgroupReclaim performs the memory.reclaim write best-effort: it treats the kernel's expected EAGAIN (some, but not all, pages evicted) as success without logging, and never throws, so a transient write error cannot tear down the long-lived reduction thread.
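As a rough illustration of the best-effort write described above, the sketch below writes a byte count to a reclaim file, treats EAGAIN as a quiet success, and reports any other failure through its return value without throwing. `TryCgroupReclaim` is a hypothetical stand-in: the real RequestCgroupReclaim signature, the path it writes to, and its logging mechanism are not shown in this PR page, so the path is a parameter and logging goes to stderr here.

```cpp
#include <cerrno>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper, not the PR's RequestCgroupReclaim.
// Returns true if the write succeeded or the kernel reported EAGAIN
// (some, but not all, of the requested pages were evicted).
bool TryCgroupReclaim(const char* reclaimPath, std::uint64_t bytes) noexcept
{
    const int fd = open(reclaimPath, O_WRONLY | O_CLOEXEC);
    if (fd < 0)
    {
        std::fprintf(stderr, "open(%s) failed: %s\n", reclaimPath, std::strerror(errno));
        return false;
    }

    // Format the request into a stack buffer so nothing here can throw.
    char text[32];
    const int length = std::snprintf(text, sizeof(text), "%" PRIu64, bytes);

    const ssize_t written = write(fd, text, static_cast<std::size_t>(length));
    const int writeErrno = errno; // capture before close() can overwrite it
    close(fd);

    if (written >= 0 || writeErrno == EAGAIN)
    {
        return true; // full or partial reclaim; neither is logged
    }

    std::fprintf(stderr, "write(%s) failed: %s\n", reclaimPath, std::strerror(writeErrno));
    return false;
}
```

Returning a status instead of throwing lets the caller's loop carry on to the next interval regardless of the outcome.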

Series (submitted sequentially)

  1. this PR - rework around explicit reclaim helpers, with gradual bleeding in capped steps
  2. drive gradual reclaim by memory pressure (PSI)
  3. adaptive working-set floor via refaults
  4. make gradual the default

Copilot AI left a comment

Contributor

Pull request overview

This PR refactors the WSL2 mini-init memory reduction thread to use explicit helper functions for procfs reads and to base “idle” detection on aggregate non-idle CPU time, enabling gradual reclaim/drop-cache/compaction policies to be expressed more clearly while aiming to preserve existing idle-gated behavior.

Changes:

  • Introduces ReadProcFile and ReadAggregateCpuTimes to read procfs reliably and compute busy-vs-idle CPU deltas across intervals.
  • Adds helpers to compute reclaimable cache bytes (/proc/meminfo) and free memory bytes (sysinfo) and uses them to drive gradual reclaim, drop_caches, and compaction policies.
  • Replaces the prior ring-buffer/user-time-only logic with a stateful, tick-based policy loop using explicit thresholds and hysteresis.
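To make the busy-vs-idle delta concrete, here is a small sketch that parses the aggregate `cpu` line of `/proc/stat` and compares two samples. The busy set (user, system, irq, softirq, steal) follows the PR description. The names `CpuTimes`, `ParseAggregateCpuLine`, and `IsIdleInterval`, the busy-fraction threshold, and the choice to ignore `nice` and `iowait` are assumptions of this sketch, not details confirmed by the PR.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

struct CpuTimes
{
    std::uint64_t busy = 0; // user + system + irq + softirq + steal, in clock ticks
    std::uint64_t idle = 0; // idle ticks
};

// Parses the first ("cpu ...") line of /proc/stat text.
// Field order: user nice system idle iowait irq softirq steal ...
std::optional<CpuTimes> ParseAggregateCpuLine(const std::string& statText)
{
    std::istringstream in(statText);
    std::string label;
    std::uint64_t user = 0, nice = 0, system = 0, idle = 0;
    std::uint64_t iowait = 0, irq = 0, softirq = 0, steal = 0;

    if (!(in >> label >> user >> nice >> system >> idle >> iowait >> irq >> softirq >> steal) || label != "cpu")
    {
        return std::nullopt;
    }

    return CpuTimes{user + system + irq + softirq + steal, idle};
}

// True when the busy share of the interval is below busyFraction (e.g. 0.05).
bool IsIdleInterval(const CpuTimes& previous, const CpuTimes& current, double busyFraction)
{
    const std::uint64_t busyDelta = current.busy - previous.busy;
    const std::uint64_t idleDelta = current.idle - previous.idle;
    const std::uint64_t total = busyDelta + idleDelta;

    if (total == 0)
    {
        return false; // no ticks elapsed, so there is no evidence of idleness
    }

    return static_cast<double>(busyDelta) / static_cast<double>(total) < busyFraction;
}
```

Because system, irq, softirq, and steal all count toward `busy`, a kernel-bound interval registers as busy even when user time barely moves.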

Comment thread src/linux/init/util.cpp
Comment thread src/linux/init/util.cpp
Comment thread src/linux/init/util.cpp Outdated
@benhillis benhillis force-pushed the benhill/mem-reclaim-1-helpers branch from 12dd190 to 51f1c8e Compare June 11, 2026 18:10
Copilot AI review requested due to automatic review settings June 11, 2026 19:23
@benhillis benhillis force-pushed the benhill/mem-reclaim-1-helpers branch from 51f1c8e to 4730acd Compare June 11, 2026 19:23
@benhillis benhillis force-pushed the benhill/mem-reclaim-1-helpers branch from 4730acd to 8e6f845 Compare June 11, 2026 19:26

Copilot AI left a comment

Contributor

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@benhillis benhillis force-pushed the benhill/mem-reclaim-1-helpers branch from 8e6f845 to 3ed82ea Compare June 11, 2026 20:06
@benhillis benhillis marked this pull request as ready for review June 12, 2026 02:55
@benhillis benhillis requested a review from a team as a code owner June 12, 2026 02:55
Copilot AI review requested due to automatic review settings June 12, 2026 02:55

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Comment thread src/linux/init/util.cpp
Comment thread src/linux/init/util.cpp
Comment thread src/linux/init/util.cpp Outdated
@benhillis benhillis changed the title from "Rework the memory reduction thread around explicit reclaim helpers" to "Rework the WSL2 guest memory reduction thread around explicit reclaim helpers" Jun 12, 2026
Copilot AI review requested due to automatic review settings June 12, 2026 19:18

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment thread src/linux/init/util.cpp Outdated
Comment thread src/linux/init/util.cpp Outdated
@benhillis benhillis force-pushed the benhill/mem-reclaim-1-helpers branch from ad76aea to f9edda2 Compare June 17, 2026 16:36
Ben Hillis and others added 4 commits June 24, 2026 15:49
Replace the ring-buffer idle detector and user-CPU-only sampling in the
mini-init memory reduction thread with a clearer, helper-based design:

- Sample aggregate non-idle CPU time (user, system, irq, softirq, steal)
  so kernel-bound work keeps the VM out of the idle state, instead of
  looking at user time alone.
- ReadProcFile reads a full procfs snapshot into a caller buffer
  (close-on-exec, partial-read safe); GetReclaimableCacheBytes /
  GetFreeMemoryBytes read the relevant counters through it.
- Gradual mode reclaims cold page cache (cgroup memory.reclaim) above a
  fixed floor while CPU-idle, with a hysteresis margin so it does not
  churn near the floor.
- DropCache mode stays gated on sustained CPU idle, drops once, and
  re-drops only after the reclaimable cache grows meaningfully.
- Compaction is gated on free-memory growth so it runs only when there
  are newly-freed pages worth coalescing.

RequestCgroupReclaim performs the memory.reclaim write best-effort: it
treats the kernel's expected EAGAIN (some, but not all, pages evicted)
as success without logging, and never throws so a transient write error
cannot tear down the long-lived reduction thread.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

RunGradualTick reclaimed the entire excess above the floor in one pass
when idle, stripping the page cache to the floor almost instantly. That
made gradual behave like an eager drop_caches and could evict a working
set on a brief idle pause. Cap each interval to c_gradualStepBytes so
the cache bleeds down over several intervals, keeping reclaim gentle and
distinct from the DropCache policy.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
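
The per-interval cap from the commit above can be sketched as a pure function. The constant names `c_gradualStepBytes` and `c_gradualHysteresisBytes` appear in the commit messages, and the 256 MB step comes from the PR description (taken as 256 MiB here). The floor name, the floor and hysteresis values, and the exact comparison at the trigger threshold are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cstdint>

constexpr std::uint64_t kMiB = 1024ull * 1024ull;

constexpr std::uint64_t c_gradualStepBytes = 256 * kMiB;       // step size from the PR description
constexpr std::uint64_t c_gradualFloorBytes = 512 * kMiB;      // assumed value, hypothetical name
constexpr std::uint64_t c_gradualHysteresisBytes = 128 * kMiB; // assumed value

// Returns how many bytes to request from memory.reclaim this interval (0 = do nothing).
std::uint64_t GradualReclaimBytes(std::uint64_t reclaimableCacheBytes, bool cpuIdle)
{
    if (!cpuIdle)
    {
        return 0; // gradual reclaim is gated on the interval being CPU-idle
    }

    // The hysteresis value is a trigger threshold above the floor, not a retained
    // margin: nothing happens until the cache exceeds floor + hysteresis.
    if (reclaimableCacheBytes < c_gradualFloorBytes + c_gradualHysteresisBytes)
    {
        return 0;
    }

    // Once triggered, aim for the floor, but take at most one step per interval.
    const std::uint64_t excess = reclaimableCacheBytes - c_gradualFloorBytes;
    return std::min(excess, c_gradualStepBytes);
}
```

With these assumed values, a 2 GiB cache on an idle VM yields a 256 MiB request per interval, so it takes several idle intervals to approach the floor instead of one pass.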
…comment

- RunCompactionTick: seed FreeAtLastCompaction at the first CPU sample so
  the first tick measures free-memory growth from startup instead of from
  zero, which previously triggered an unconditional initial compaction.
- Clarify that the gradual hysteresis value is a trigger threshold, not a
  retained margin.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
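
The seeding fix in the commit above can be illustrated with a small gate. This is a sketch under assumptions: `CompactionState`, `ShouldCompact`, and the growth threshold are hypothetical, and the sketch seeds the baseline on the first call, whereas the commit seeds FreeAtLastCompaction at the first CPU sample.

```cpp
#include <cstdint>
#include <optional>

// Assumed threshold; the PR does not state the real value.
constexpr std::uint64_t kCompactionGrowthBytes = 128ull * 1024 * 1024;

struct CompactionState
{
    // Empty until seeded, so the first measurement is never compared against zero.
    std::optional<std::uint64_t> freeAtLastCompaction;
};

// Returns true when free memory has grown enough since the last compaction
// (or since startup) that coalescing newly freed pages is worthwhile.
bool ShouldCompact(CompactionState& state, std::uint64_t freeBytes)
{
    if (!state.freeAtLastCompaction.has_value())
    {
        state.freeAtLastCompaction = freeBytes; // seed the baseline, do not compact
        return false;
    }

    if (freeBytes < *state.freeAtLastCompaction + kCompactionGrowthBytes)
    {
        return false; // not enough newly freed memory yet
    }

    state.freeAtLastCompaction = freeBytes;
    return true;
}
```

If the baseline started at zero instead, almost any free-memory reading would exceed the growth threshold on the first tick, which is the unconditional initial compaction the commit removes.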
…laim comments

ReadProcFile no longer reads into a fixed stack buffer that could silently
truncate large /proc/meminfo (undercounting reclaimable cache). It now reads
until EOF into a std::string that grows as needed and returns std::optional.

Also correct the RunGradualTick and StartMemoryReductionThread doc comments to
match the implementation: gradual reclaim is gated on per-interval CPU idle (no
sustained-idle streak) and c_gradualHysteresisBytes is a trigger threshold above
the floor, not a retained margin.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
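
The ReadProcFile change in the commit above (read until EOF into a growing std::string, return std::optional) might look roughly like the following. `ReadWholeFile` is a hypothetical name and the PR's actual signature may differ. The explicit EINTR check stands in for the TEMP_FAILURE_RETRY macro mentioned elsewhere in this thread.

```cpp
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <optional>
#include <string>
#include <unistd.h>

// Hypothetical sketch, not the PR's ReadProcFile.
// Reads the whole file into a string, or returns std::nullopt on open/read failure.
std::optional<std::string> ReadWholeFile(const char* path)
{
    const int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
    {
        return std::nullopt;
    }

    std::string contents;
    char chunk[4096];
    for (;;)
    {
        const ssize_t count = read(fd, chunk, sizeof(chunk));
        if (count < 0)
        {
            if (errno == EINTR)
            {
                continue; // interrupted by a signal; retry the read
            }

            close(fd);
            return std::nullopt;
        }

        if (count == 0)
        {
            break; // EOF: the file has been read in full
        }

        // A short read is fine; append what arrived and keep going until EOF.
        contents.append(chunk, static_cast<std::size_t>(count));
    }

    close(fd);
    return contents;
}
```

Because the string grows as needed, a large `/proc/meminfo` is not silently truncated the way a fixed stack buffer could truncate it. A production version would likely wrap the descriptor in an RAII type so it is also closed if `append` throws.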
Copilot AI review requested due to automatic review settings June 24, 2026 22:50
@benhillis benhillis force-pushed the benhill/mem-reclaim-1-helpers branch from f9edda2 to d1c9ca7 Compare June 24, 2026 22:50

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread src/linux/init/util.cpp Outdated
Comment thread src/linux/init/util.cpp Outdated

Make the proc-file read helper and the cgroup reclaim write helper resilient
to EINTR, matching the existing TEMP_FAILURE_RETRY(read(...)) logic.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

3 participants