Security hardening: pinned versions, locked deps, CUDA/PyTorch upgrades#13
Security hardening: pinned versions, locked deps, CUDA/PyTorch upgrades#13MadiatorLabs wants to merge 2 commits intorunpod-workers:mainfrom
Conversation
12.6 is not a widely supported version and not well represented in our fleet. 12.4 was the last LTS before 13.0, which is why we are using that.
Please don't use 12.9 for anything, we are avoiding it across the board, so 12.8 or 13.0 are preferable. |
| fi | ||
|
|
||
| # Tail the log file | ||
| tail -f /workspace/runpod-slim/comfyui.log |
There was a problem hiding this comment.
The removal is safe. Here's why:
What the old loop did: On every boot, iterate all custom_nodes/*/ and re-run pip install -r requirements.txt, python install.py, pip install -e . for each.
Why it's not needed anymore:
- Baked nodes: All pre-installed nodes have their deps in the image already (hash-verified lock file). No boot-time install needed.
- User-installed nodes via Manager: When a user installs a node through ComfyUI-Manager at runtime, Manager handles the
pip installandinstall.pyduring that session. Those packages go into the persistent venv on/workspace, so they survive container restarts. - The old loop was wasteful: Re-running
install.pyon every boot could re-download models, overwrite user configs, or cause other side effects. Most well-written nodes check if setup is already done on import.
The only edge case is a node with a poorly-written install.py that doesn't persist its state — but that's a node bug, not something we should paper over with a boot-time loop that slows down every start.
Recommendation: The removal is correct. Document in the README that runtime-installed custom nodes are the user's responsibility and their deps persist in the venv across reboots.
TimPietruskyRunPod
left a comment
There was a problem hiding this comment.
Review Summary
Solid PR — unified Dockerfile, pinned deps with hash verification, and faster cold starts are all significant improvements. A few items to address before merging:
- PR description has wrong CUDA versions — description says 12.6/12.9 but code uses 12.8/13.0. Please update the description to match.
- Venv path rename (
.venv→.venv-cu128) is a breaking change for existing deployments with persistent/workspace. Should be documented in the PR description and README so users know a re-setup will be triggered on next boot. fetch-hashes.shuses GNU-onlygrep -oP— breaks on macOS where developers run it.prebake-manager-cache.pyregistry fetch should be non-fatal since it's an optimization.- Verify
opensslis available in the runtime stage — needed bystart.shfor SSH password generation.
See inline comments for details.
| } | ||
| variable "MANAGER_SHA" { | ||
| default = "f41365abe957" | ||
| } |
There was a problem hiding this comment.
PR description mismatch: The PR description says regular image goes from CUDA 12.4 → 12.6 (cu126) and 5090 from 12.8 → 12.9 (cu129), but the actual code here sets 12-8/cu128 for regular and 13-0/cu130 for 5090. Please update the PR description to match the code.
| set -e # Exit the script if any statement returns a non-true return value | ||
|
|
||
| COMFYUI_DIR="/workspace/runpod-slim/ComfyUI" | ||
| VENV_DIR="$COMFYUI_DIR/.venv" |
There was a problem hiding this comment.
Breaking change for existing deployments: Renaming from .venv to .venv-cu128 means existing containers with a persistent /workspace will not find VENV_DIR on next boot, triggering a full re-setup.
This is probably intentional (unifying with the old 5090 path), but it should be:
- Called out explicitly in the PR description as a migration impact
- Documented in the README or a migration note so users know to expect a one-time re-setup on first boot after upgrading
| get_current_hash() { | ||
| local var_name="$1" | ||
| grep -A2 "variable \"${var_name}\"" "$BAKE_FILE" | grep -oP 'default\s*=\s*"\K[^"]*' || echo "unknown" | ||
| } |
There was a problem hiding this comment.
grep -oP requires GNU grep (Perl regex mode). macOS ships BSD grep which doesn't support -oP. Since this is a developer utility run locally (not in Docker), this will break for macOS developers.
Consider replacing with sed or awk, e.g.:
grep 'default' | sed 's/.*default *= *"\([^"]*\)".*/\1/'Or at minimum add a comment/check that GNU grep is required.
| echo "ERROR" ; return | ||
| } | ||
| echo "$response" | grep -oP '"sha"\s*:\s*"\K[a-f0-9]{12}' | head -1 | ||
| } |
There was a problem hiding this comment.
Same GNU grep issue here — grep -oP with \K lookbehind won't work on macOS. Could use:
echo "$response" | grep -o '"sha" *: *"[a-f0-9]\{12\}' | head -1 | grep -o '[a-f0-9]\{12\}$'|
|
||
| # Fetch paginated registry and save as aggregated JSON | ||
| print(" Fetching ComfyRegistry (paginated)...") | ||
| nodes = fetch_registry_all() |
There was a problem hiding this comment.
If api.comfy.org is down or returns unexpected JSON (missing totalPages/nodes key), the entire Docker build fails with an unhelpful traceback. Since this cache is a cold-start optimization (not required — Manager re-fetches after 24h), consider wrapping the registry fetch in a try/except that prints a warning and continues:
try:
nodes = fetch_registry_all()
registry_data = json.dumps(nodes, separators=(",", ":"))
fname = cache_filename(REGISTRY_URL)
(CACHE_DIR / fname).write_bytes(registry_data.encode())
print(f" Cached {len(nodes)} registry nodes")
except Exception as e:
print(f" WARNING: Registry fetch failed ({e}), skipping — Manager will fetch on first start")| git remote add origin https://github.com/ltdrdata/ComfyUI-Manager.git && \ | ||
| cd /tmp/build/ComfyUI/custom_nodes/ComfyUI-KJNodes && \ | ||
| git init && git add -A && git -c user.name=- -c user.email=- commit -q -m "ComfyUI-KJNodes ${KJNODES_SHA}" && \ | ||
| git remote add origin https://github.com/kijai/ComfyUI-KJNodes.git && \ |
There was a problem hiding this comment.
Verify openssl availability in runtime stage: start.sh uses openssl rand -base64 12 for SSH password generation when no PUBLIC_KEY is set. The runtime stage doesn't explicitly install openssl. It likely comes as a transitive dependency of openssh-server or libssl-dev, but commit 6f77fe5 on main ("chore: add openssl") suggests this was previously an issue.
Please verify it's present in the final image, or add it explicitly to the apt-get install list to be safe.
Summary
Centralized version pinning with locked dependency chain
docker-bake.hclas single source of truthARGnames — no hardcoded defaultspip-compile --generate-hashes— full hash-verified supply chainscripts/fetch-hashes.shadded to query GitHub API for latest custom node commit SHAs with HCL-formatted outputBase image upgrades
ComfyUI-Manager compatibility