Skip to content

fix: scope runc CgroupsPath to container cgroup subtree on cgroup v2#219

Merged
rkoster merged 5 commits into
masterfrom
fix/cgroup-v2-container-path-scoping
Jun 8, 2026
Merged

fix: scope runc CgroupsPath to container cgroup subtree on cgroup v2#219
rkoster merged 5 commits into
masterfrom
fix/cgroup-v2-container-path-scoping

Conversation

@rkoster

@rkoster rkoster commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Problem

On cgroup v2 hosts, BPM's runc container and the host system can share the same cgroup scope name (e.g. bpm-uaa.scope). When a BOSH deployment creates a job with the same name as one already running on the host (as happens in bosh-lite, where the director itself runs uaa), runc fails with:

runc run failed: container's cgroup is not empty: N process(es) found

This occurs because both the host BPM process and the container BPM process share the same cgroup namespace (no NEWCGROUP), so they see each other's cgroup scopes.

The fix using NEWCGROUP was proposed in guardian but rejected because it breaks CPU throttling (same reason as the revert in guardian#495).

Solution

BPM reads its own cgroup path from /proc/self/cgroup at startup and uses it to scope the container's CgroupsPath to a subtree unique to the current host context:

  • cgroupfs driver: sets CgroupsPath to filepath.Join(selfPath, containerID) — nests the container under the host process's cgroup subtree
  • systemd cgroup driver (--systemd-cgroup): converts to slice:prefix:name format, embedding a unique identifier from the host's garden scope (e.g. system.slice:garden-abc-scope-bpm:bpm-uaa) — creates a scope name that cannot collide across warden containers

If cgroup path detection fails (e.g. not on cgroup v2, or not in a garden scope), BPM falls back to the original behaviour silently.

Changes

  • src/bpm/cgroups/cgroup.go: add SelfCgroupPath() (reads /proc/self/cgroup, returns unified-mode path) and ToSystemdCgroupsPath() (converts absolute path + container ID to systemd slice:prefix:name format)
  • src/bpm/runc/specbuilder/specbuilder.go: add WithCgroupsPath(path string) SpecOption
  • src/bpm/runc/adapter/adapter.go: accept injected cgroupsPathFor func(containerID string) (string, error) and apply it in BuildSpec
  • src/bpm/commands/root.go: implement cgroupsPathForContainer wiring SelfCgroupPath + ToSystemdCgroupsPath / filepath.Join depending on cgroup driver

Testing

  • Unit tests added for SelfCgroupPath, ToSystemdCgroupsPath, WithCgroupsPath, and BuildSpec cgroup path injection
  • Verified against a bosh-lite deployment: deploying uaa (which collides with the director's own uaa job) succeeds without cgroup errors using local dev release.

@coderabbitai

coderabbitai Bot commented Jun 4, 2026

Copy link
Copy Markdown

Review Change Stack

Walkthrough

This PR implements cgroup v2 path support for the BPM release. It adds helpers to read the current process's unified cgroup v2 path from /proc/self/cgroup and convert it to systemd-compatible slice:prefix:name format. A container-specific cgroup path callback is computed in the root lifecycle, wired through the RuncAdapter, and conditionally applied to the OCI spec during BuildSpec. Path conversions handle systemd vs non-systemd environments and normalize path components for systemd unit-name constraints. On path resolution failure, the build proceeds without applying a cgroups path.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 21.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: scoping runc's CgroupsPath to a container-specific subtree on cgroup v2, which directly addresses the core problem of cgroup name collisions.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, explaining the problem, solution, specific code changes across all modified files, and testing approach.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/cgroup-v2-container-path-scoping

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@rkoster rkoster force-pushed the fix/cgroup-v2-container-path-scoping branch from ae530a3 to dbf38b6 Compare June 4, 2026 11:12
coderabbitai[bot]
coderabbitai Bot previously requested changes Jun 4, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/bpm/cgroups/cgroup.go`:
- Around line 146-163: The function that formats the cgroup name currently falls
back to slice="system.slice" and leaves uniquePart empty when no parts end with
".slice", which loses host-specific uniqueness; update the logic (in the block
using variables slice, uniquePart, parts and the helper normalizeForSystemdName)
so that if the loop finds no ".slice" (uniquePart still ""), you pick the first
non-empty parts element, call normalizeForSystemdName on it, and if that returns
non-empty set uniquePart = normalized + "-" before returning
fmt.Sprintf("%s:%sbpm:%s", slice, uniquePart, containerID).

In `@src/bpm/runc/adapter/adapter.go`:
- Around line 54-68: The file has import/formatting issues causing golangci-lint
to fail; run goimports (or gofmt) on src/bpm/runc/adapter/adapter.go to fix
import ordering and formatting, then reformat the file so declarations like the
RuncAdapter struct and the NewRuncAdapter constructor (and their fields:
features, glob, shareMount, locker, cgroupsPathFor) are properly formatted and
imports are organized; commit the resulting changes to satisfy the linter.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 40c4f741-bef3-43d3-bcf4-d58b8c487715

📥 Commits

Reviewing files that changed from the base of the PR and between 12df73c and ae530a3.

📒 Files selected for processing (7)
  • src/bpm/cgroups/cgroup.go
  • src/bpm/cgroups/cgroup_test.go
  • src/bpm/commands/root.go
  • src/bpm/runc/adapter/adapter.go
  • src/bpm/runc/adapter/adapter_test.go
  • src/bpm/runc/specbuilder/specbuilder.go
  • src/bpm/runc/specbuilder/specbuilder_test.go

Comment thread src/bpm/cgroups/cgroup.go
Comment thread src/bpm/runc/adapter/adapter.go Outdated
@github-project-automation github-project-automation Bot moved this from Inbox to Waiting for Changes | Open for Contribution in Foundational Infrastructure Working Group Jun 4, 2026
@rkoster

rkoster commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 4, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@jochenehret

Copy link
Copy Markdown

I've tested this fix on the cf-deployment BOSH Lite environment and it worked!

stemcell: bosh-warden-boshlite-ubuntu-noble/1.383
cf-deployment: v57.0.0

Can you please cut a new bpm release soon?

@beyhan beyhan left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good to me in general. The one question I can't answer fully is the impact of the cgroup path change. That's a one-time migration effect on the first restart, not an ongoing problem, but do we break any external tooling that references cgroup paths by name? @aramprice, @mariash do you know such tooling which could be impacted by the changed container path?

@aramprice aramprice dismissed coderabbitai[bot]’s stale review June 8, 2026 18:24

Does not appear to have picked up code updates.

@aramprice aramprice requested a review from Copilot June 8, 2026 18:25

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses runc failures on cgroup v2 hosts caused by cgroup name collisions between BPM running on the host and BPM running inside a Garden container (shared cgroup namespace). It does this by scoping the container’s CgroupsPath under the caller’s own cgroup subtree (cgroupfs) or by generating a systemd slice:prefix:name cgroup path intended to be unique per Garden container.

Changes:

  • Add cgroup v2 self-path discovery from /proc/self/cgroup plus a helper to convert a unified-mode path into systemd slice:prefix:name format.
  • Add a WithCgroupsPath spec option and inject CgroupsPath into the runc spec via an adapter callback.
  • Wire cgroup path scoping into the BPM root command when constructing the runc adapter.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/bpm/runc/specbuilder/specbuilder.go Adds a WithCgroupsPath spec option to set spec.Linux.CgroupsPath.
src/bpm/runc/specbuilder/specbuilder_test.go Adds unit test coverage for WithCgroupsPath.
src/bpm/runc/adapter/adapter.go Injects a cgroupsPathFor callback into RuncAdapter and applies WithCgroupsPath during BuildSpec.
src/bpm/runc/adapter/adapter_test.go Adds tests covering successful, error, and nil-callback behavior for cgroup path injection.
src/bpm/commands/root.go Wires SelfCgroupPath + systemd/cgroupfs-specific conversions into adapter construction.
src/bpm/cgroups/cgroup.go Adds SelfCgroupPath, systemd path conversion (ToSystemdCgroupsPath), and normalization helper.
src/bpm/cgroups/cgroup_test.go Adds tests for SelfCgroupPath parsing and ToSystemdCgroupsPath conversion behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/bpm/runc/adapter/adapter.go
Comment thread src/bpm/cgroups/cgroup.go
@mkocher

mkocher commented Jun 8, 2026

Copy link
Copy Markdown
Member

Seems reasonable to me, I don't know of anything external that depends on the cgroups path.

aramprice
aramprice previously approved these changes Jun 8, 2026

@aramprice aramprice left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, maybe evaluate Copilot's input?

@github-project-automation github-project-automation Bot moved this from Waiting for Changes | Open for Contribution to Pending Merge | Prioritized in Foundational Infrastructure Working Group Jun 8, 2026
mkocher
mkocher previously approved these changes Jun 8, 2026
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@rkoster rkoster dismissed stale reviews from mkocher and aramprice via f1286f3 June 8, 2026 18:52

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/bpm/commands/root.go`:
- Around line 207-216: cgroupsPathForContainer currently returns silently on
error from cgroups.SelfCgroupPath, making it hard to debug why cgroup v2 scoping
didn't activate; update cgroupsPathForContainer to log the error at debug level
when cgroups.SelfCgroupPath() fails (include the error and some context like
containerID and that fallback will be used) by accepting a logger parameter (or
capturing it via closure) and calling logger.Debug or equivalent before
returning the fallback empty path/error; ensure the change references the
existing cgroupsPathForContainer and cgroups.SelfCgroupPath symbols and keeps
existing return behavior.

In `@src/bpm/runc/adapter/adapter.go`:
- Around line 303-307: The code silently ignores errors from the cgroup path
callback (a.cgroupsPathFor) which hampers observability; update the block in
adapter.go to log the error at debug level when
a.cgroupsPathFor(bpmCfg.ContainerID()) returns an error while preserving the
current fallback behavior—use the existing logger (or a.local logger variable)
to emit a debug message including the container id from bpmCfg.ContainerID() and
the error before continuing, keeping the specbuilder.Apply call unchanged when
err == nil.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9b79d7c0-5f23-422e-a5e0-a4796f77a3cc

📥 Commits

Reviewing files that changed from the base of the PR and between ae530a3 and f1286f3.

📒 Files selected for processing (5)
  • src/bpm/cgroups/cgroup.go
  • src/bpm/cgroups/cgroup_test.go
  • src/bpm/commands/root.go
  • src/bpm/runc/adapter/adapter.go
  • src/bpm/runc/adapter/adapter_test.go

Comment thread src/bpm/commands/root.go
Comment thread src/bpm/runc/adapter/adapter.go
@github-project-automation github-project-automation Bot moved this from Pending Merge | Prioritized to Waiting for Changes | Open for Contribution in Foundational Infrastructure Working Group Jun 8, 2026
@github-project-automation github-project-automation Bot moved this from Waiting for Changes | Open for Contribution to Pending Merge | Prioritized in Foundational Infrastructure Working Group Jun 8, 2026
@rkoster rkoster merged commit f8e0e29 into master Jun 8, 2026
4 checks passed
@github-project-automation github-project-automation Bot moved this from Pending Merge | Prioritized to Done in Foundational Infrastructure Working Group Jun 8, 2026
@rkoster rkoster deleted the fix/cgroup-v2-container-path-scoping branch June 8, 2026 19:05
@mariash

mariash commented Jun 8, 2026

Copy link
Copy Markdown
Member

Garden should be using go CgroupPath, which dynamically gets the cgroup path, I can't think of anything else. It would be a good practice to get cgroup path from /proc/self/cgroup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

7 participants