Skip to content

Harden Docker job workspace isolation#4518

Merged
YuanTingHsieh merged 9 commits intoNVIDIA:mainfrom
YuanTingHsieh:codex/docker-job-workspace-isolation
May 4, 2026
Merged

Harden Docker job workspace isolation#4518
YuanTingHsieh merged 9 commits intoNVIDIA:mainfrom
YuanTingHsieh:codex/docker-job-workspace-isolation

Conversation

@YuanTingHsieh
Copy link
Copy Markdown
Collaborator

@YuanTingHsieh YuanTingHsieh commented May 4, 2026

Summary

  • Tighten DockerJobLauncher workspace mounts so job containers no longer receive a bind mount of the whole NVFlare workspace.
  • Use a mount-only isolated view for SJ/CJ containers: an empty tmpfs workspace root with 0555 permissions, read-only startup/ and local/ bind mounts, and a read-write bind mount for only the current job workspace so job outputs persist on the host.
  • Run SJ/CJ containers from the current job workspace so relative job outputs and library-created files land in the persistent host job directory.
  • Keep Docker mode simple: no workspace transfer, copying, cleanup, or file-level filtering. This means local/ is visible read-only, including files such as study_data.yaml; sibling job workspaces and the parent/site workspace root are not mounted.
  • Update focused Docker launcher tests and the Docker launcher design doc for the new isolation contract.

Validation

  • python3 -m py_compile nvflare/app_opt/job_launcher/docker_launcher.py tests/unit_test/app_opt/job_launcher/docker_launcher_test.py
  • black==25.9.0 --check nvflare/app_opt/job_launcher/docker_launcher.py tests/unit_test/app_opt/job_launcher/docker_launcher_test.py
  • isort==5.13.2 --check-only nvflare/app_opt/job_launcher/docker_launcher.py tests/unit_test/app_opt/job_launcher/docker_launcher_test.py
  • flake8==7.1.1 nvflare/app_opt/job_launcher/docker_launcher.py tests/unit_test/app_opt/job_launcher/docker_launcher_test.py
  • git diff --check

pytest was not run locally because this environment does not have pytest installed (No module named pytest).

@YuanTingHsieh YuanTingHsieh changed the title [codex] Harden Docker job workspace isolation Harden Docker job workspace isolation May 4, 2026
@YuanTingHsieh YuanTingHsieh marked this pull request as ready for review May 4, 2026 21:12
Copilot AI review requested due to automatic review settings May 4, 2026 21:12
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 4, 2026

Greptile Summary

This PR replaces the previous whole-workspace bind mount for SJ/CJ containers with a three-layer isolated view: an empty tmpfs at the workspace root (mode 0555), read-only bind mounts for startup/ and local/, and a read-write bind mount for only the current job workspace directory. A new _safe_workspace_child_path helper validates that the job workspace name cannot escape the workspace via path traversal or nested paths before it is passed to Docker.

Confidence Score: 4/5

Safe to merge; the isolation logic is sound and well-tested, with one low-severity limitation in a secondary security guard.

Only P2 findings present. The single issue is that the os.path.islink guard inside _safe_workspace_child_path is inert when executed inside the SP/CP container (host paths are not visible there), so host-side symlinks on the job workspace name would not be caught. The primary string-based path-traversal protections all work correctly, and exploiting the gap requires prior write access to the host workspace directory.

nvflare/app_opt/job_launcher/docker_launcher.py — the _safe_workspace_child_path symlink/realpath guards (lines 99–105)

Important Files Changed

Filename Overview
nvflare/app_opt/job_launcher/docker_launcher.py Adds _safe_workspace_child_path path guard and replaces the single whole-workspace bind mount with a tmpfs root + selective read-only bind mounts for startup/local + read-write bind mount for only the current job workspace; the symlink guard inside the new helper is ineffective when running in container context.
tests/unit_test/app_opt/job_launcher/docker_launcher_test.py Expands _Mount mock to forward **kwargs (needed for tmpfs_mode), adds TestSafeWorkspaceChildPath covering traversal, reserved names, allow_reserved=True, and symlink rejection, and updates all mount-assertion tests to the new four-mount layout.
docs/design/docker_job_launcher_design.md Updated workspace/storage section, mount diagrams, and sequence diagram to reflect the new isolated workspace contract (tmpfs root + read-only startup/local + read-write job workspace); adds mounts to the reserved-keys list.

Sequence Diagram

sequenceDiagram
    participant SP as SP/CP Container
    participant L as DockerJobLauncher
    participant V as _safe_workspace_child_path
    participant D as Docker Daemon
    participant SJ as SJ/CJ Container

    SP->>L: launch_job(job_meta, fl_ctx)
    L->>V: validate job_workspace_name (job_id)
    V-->>L: host_job_workspace path
    L->>V: validate startup (allow_reserved=True)
    V-->>L: host_startup_dir path
    L->>V: validate local (allow_reserved=True)
    V-->>L: host_local_dir path
    L->>D: containers.run(mounts=[tmpfs at /ws (0555), bind startup ro, bind local ro, bind job_ws rw], working_dir=/ws/job_id)
    D->>SJ: start container with isolated workspace view
    SJ-->>SJ: reads startup/ and local/ (read-only)
    SJ-->>SJ: writes job outputs to /ws/job_id/ (read-write)
    SJ-->>SP: connect via PARENT_URL (Docker DNS)
Loading

Reviews (5): Last reviewed commit: "Use job workspace as Docker working dire..." | Re-trigger Greptile

Comment thread nvflare/app_opt/job_launcher/docker_launcher.py
Comment thread tests/unit_test/app_opt/job_launcher/docker_launcher_test.py
Comment thread nvflare/app_opt/job_launcher/docker_launcher.py
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the DockerJobLauncher’s workspace isolation by changing how the NVFlare workspace is exposed to job containers, aiming to limit job visibility to only startup/local (read-only) and the job’s own workspace directory (read-write), and updates unit tests and design documentation to reflect the new contract.

Changes:

  • Introduces _safe_workspace_child_path and uses it to validate job/startup/local workspace child paths before creating bind mounts.
  • Replaces the previous “bind-mount entire workspace” behavior with an isolated mount layout (tmpfs workspace root + RO startup/local + RW job directory) for SJ/CJ containers.
  • Updates Docker launcher unit tests and the Docker launcher design doc to match the new workspace mount contract.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
nvflare/app_opt/job_launcher/docker_launcher.py Adds safe workspace child path validation and switches job container workspace mounts to an isolated tmpfs + selective bind-mount model.
tests/unit_test/app_opt/job_launcher/docker_launcher_test.py Adds tests for _safe_workspace_child_path and updates mount assertions for the new mount layout.
docs/design/docker_job_launcher_design.md Documents the updated isolation model, mount structure, and updated “reserved keys” wording (mounts vs volumes).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread nvflare/app_opt/job_launcher/docker_launcher.py Outdated
Comment thread nvflare/app_opt/job_launcher/docker_launcher.py
Comment thread nvflare/app_opt/job_launcher/docker_launcher.py
@YuanTingHsieh
Copy link
Copy Markdown
Collaborator Author

/build

Copy link
Copy Markdown
Collaborator Author

Addressed the latest Greptile P2 note about working_dir in 985791645: Docker job containers now start in the per-job read-write workspace mount (/var/tmp/nvflare/workspace/<job_id>) instead of the 0555 tmpfs workspace root. This keeps the root non-writable while allowing relative job/library outputs to persist in the host job workspace.

@YuanTingHsieh
Copy link
Copy Markdown
Collaborator Author

/build

@YuanTingHsieh YuanTingHsieh merged commit 853bed2 into NVIDIA:main May 4, 2026
29 checks passed
@YuanTingHsieh YuanTingHsieh deleted the codex/docker-job-workspace-isolation branch May 4, 2026 22:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants