
feat: add Nvidia Jetson Thor Developer Kit GPU detection / --runtime nvidia #700

Open
whitesscott wants to merge 3 commits into NVIDIA:main from whitesscott:jetson-thor

Conversation


@whitesscott whitesscott commented Mar 23, 2026

Summary

This PR adds basic Jetson/Thor support to NemoClaw install-time GPU detection and NIM container startup.

Changes

  • detect Jetson/Thor devices using /etc/nv_tegra_release
  • add device-tree-based Jetson/Thor fallback detection
  • fall back to unified system memory when nvidia-smi does not report desktop-style VRAM
  • report half of Thor's shared system memory as the GPU memory budget
  • add --runtime nvidia to docker run so the NVIDIA container runtime mounts the JetPack and CUDA libraries listed in /etc/nvidia-container-runtime/host-files-for-container.d/devices.csv and drivers.csv into containers
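As a rough sketch of the unified-memory fallback described above (the function names here are illustrative, not the actual identifiers in bin/lib/nim.js):

```javascript
// Illustrative sketch only -- names are hypothetical, not the real
// identifiers in bin/lib/nim.js.

// Parse the MemTotal line of /proc/meminfo text, e.g. "MemTotal: 131072000 kB".
function systemMemoryMBFromMeminfo(meminfoText) {
  const match = /^MemTotal:\s+(\d+)\s+kB/m.exec(meminfoText);
  if (!match) return 0; // 0 signals a failed probe
  return Math.floor(Number(match[1]) / 1024);
}

// Report half of unified system memory as the GPU memory budget.
function gpuBudgetMB(meminfoText) {
  const totalMB = systemMemoryMBFromMeminfo(meminfoText);
  return totalMB > 0 ? Math.floor(totalMB / 2) : 0;
}
```

For a board whose /proc/meminfo reports MemTotal: 131072000 kB (128000 MB), this yields a 64000 MB budget.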

Why

On Jetson/Thor, GPU discovery and memory reporting differ from discrete GPU systems. These changes let NemoClaw recognize the platform and proceed further on Jetson devices.

Notes

  • this improves platform detection and container startup behavior on Jetson/Thor
  • it does not by itself guarantee that every downstream NIM image or model is supported on Jetson
  • local push hooks also reported an unrelated pyright failure resolving pytest in nemoclaw-blueprint/orchestrator/test_endpoint_validation.py

Summary by CodeRabbit

  • New Features

    • CLI now exposes only detect-gpu and start commands; container naming and supported models simplified.
    • Jetson/Thor detection added with unified-memory reporting; on Jetson devices container runs with NVIDIA runtime automatically.
  • Bug Fixes

    • Ignore invalid/non-positive NVIDIA VRAM readings.
    • Improved GPU and total-memory estimation (including DGX/Jetson adjustments).
  • Chores

    • Shell script formatting and option-parsing cleanup.
    • Pre-push checks now run static type checks and tests.


coderabbitai bot commented Mar 23, 2026

📝 Walkthrough

Walkthrough

Converted the Node nim library into a CLI entrypoint, added Jetson/Thor detection and system-memory GPU estimation, tightened NVIDIA VRAM parsing, conditionally enabled the Docker NVIDIA runtime on Jetson, refactored install.sh GPU probing and semver logic, and updated a pre-commit hook.

Changes

Cohort / File(s) Summary
Node nim CLI
bin/lib/nim.js
Replaced exported module with a CLI (#!/usr/bin/env node) running main(); removed lifecycle exports. Added isJetsonPlatform() and getSystemMemoryMB(), tightened nvidia-smi parsing, Jetson/Thor returns jetson:true/unifiedMemory:true and estimates GPU memory as half system RAM, hardcoded model→image map (llama, mistral), renamed container from nemoclaw-nim-... to nim-..., and conditionally adds --runtime nvidia on Jetson.
Installer script
install.sh
Expanded GPU detection to probe /etc/nv_tegra_release and /proc/device-tree (Jetson/Thor); VRAM detection now accepts only positive nvidia-smi totals and falls back to /proc/meminfo for unified-memory estimates; refactored helper formatting and rewrote version_gte() to use array parsing. Minor quoting/control-flow adjustments.
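The rewritten version_gte() itself is a bash helper, but the array-based comparison it performs can be transliterated as a sketch (JavaScript here for readability; versionGte is a hypothetical name, not the actual install.sh function):

```javascript
// Sketch of array-based dotted-version comparison: split each version into
// numeric components, then compare component-by-component, padding with 0.
function versionGte(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const x = pa[i] || 0;
    const y = pb[i] || 0;
    if (x > y) return true;
    if (x < y) return false;
  }
  return true; // all components equal
}
```

This avoids the classic string-comparison pitfall where "1.10" sorts below "1.9".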
Pre-commit config
.pre-commit-config.yaml
Updated pyright-check hook invocation to run uv with --with pyright --with pytest (added pytest).

Sequence Diagram(s)

sequenceDiagram
  participant User as User/Caller
  participant NimCLI as bin/lib/nim.js (Node CLI)
  participant Installer as install.sh
  participant OS as Host OS (/proc, /etc, system_profiler)
  participant Docker as docker

  User->>NimCLI: run `nim detect-gpu` or `nim start ...`
  NimCLI->>OS: probe `/etc/nv_tegra_release`, `/proc/device-tree`, `/proc/meminfo`
  NimCLI->>OS: run `nvidia-smi` when available
  alt Jetson/Thor detected
    NimCLI->>NimCLI: compute gpuMemory = systemMemory/2, set unifiedMemory:true
    NimCLI->>Docker: docker run ... --runtime nvidia ...
  else Standard NVIDIA or macOS
    NimCLI->>OS: parse `nvidia-smi` or `system_profiler`
    NimCLI->>Docker: docker run ... (no special runtime)
  end

  Note over Installer,OS: install.sh uses similar probes for installer-time decisions

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through /proc and sniffed a Tegra tune,
I halved the RAM beneath a sleepy moon,
Docker dons a tiny nvidia crown,
CLI shed exports and wore a runnable gown. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 20.69%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately summarizes the main change: adding Nvidia Jetson Thor GPU detection and --runtime nvidia support, which is reflected in extensive changes to GPU detection logic and container startup behavior across bin/lib/nim.js and install.sh.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
install.sh (1)

339-342: Redundant stdout redirection.

>/dev/stdout is a no-op since stdout already goes to stdout by default.

♻️ Suggested fix
-    if nvidia-smi -L >/dev/stdout; then
+    if nvidia-smi -L >/dev/null 2>&1; then

If the intent is to suppress output while checking exit status, redirect to /dev/null. If output should be visible, remove the redirection entirely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@install.sh` around lines 339 - 342, The call to nvidia-smi in the conditional
uses a redundant redirection (nvidia-smi -L >/dev/stdout); update that
invocation so it either emits output (remove the >/dev/stdout part) or
suppresses output properly (redirect both stdout and stderr to /dev/null)
depending on the intended behavior in the install.sh check; change the line
containing the nvidia-smi -L invocation accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/nim.js`:
- Around line 27-50: There are two conflicting isJetsonPlatform()
implementations; remove the local isJetsonPlatform() inside detectGpu() and
consolidate detection into the single module-level isJetsonPlatform() (ensure it
checks /etc/nv_tegra_release, /proc/device-tree/compatible, and
/proc/device-tree/model as the fuller implementation does), then update
detectGpu() to call the module-level isJetsonPlatform() and ensure
startNimContainer() uses that exported/common function so the --runtime nvidia
decision is based on the complete detection logic.
- Around line 118-127: The current NVIDIA Jetson detection returns nimCapable:
true even when getSystemMemoryMB() fails (returns 0), producing totalMemoryMB: 0
and perGpuMB: 0; update the logic in the function that constructs this return
object (the block that sets type: "nvidia", name: "NVIDIA Jetson", count:
gpuCount, totalMemoryMB, perGpuMB, nimCapable, jetson, unifiedMemory) so that if
getSystemMemoryMB() === 0 you either return null (no GPU detected) or set
nimCapable: false and avoid advertising 0 MB memory — compute perGpuMB only when
totalMemoryMB > 0 and ensure nimCapable reflects successful memory detection.

In `@install.sh`:
- Around line 380-388: The Jetson/Tegra fallback in the install.sh Jetson
detection block currently returns total system RAM but should match nim.js
behavior (which uses Math.floor(getSystemMemoryMB() / 2)); modify the Jetson
branch that computes mem_kb to echo half the system RAM in MB instead of the
full amount (i.e., compute mem_kb -> MB and divide by 2) and then return, so the
echoed value aligns with nim.js’s getSystemMemoryMB halving logic.
- Line 381: The Jetson detection is inconsistent: detect_gpu() checks
/proc/device-tree/compatible for "nvidia,tegra" or "nvidia,thor" but
get_vram_mb() only checks /etc/nv_tegra_release and /proc/device-tree/model, so
devices detected via compatible will return 0 VRAM; update get_vram_mb() to also
test /proc/device-tree/compatible for those strings (or better, create a shared
helper is_jetson_platform() used by both detect_gpu() and get_vram_mb()) so the
same Jetson detection logic is reused and VRAM fallback runs when appropriate.

---

Nitpick comments:
In `@install.sh`:
- Around line 339-342: The call to nvidia-smi in the conditional uses a
redundant redirection (nvidia-smi -L >/dev/stdout); update that invocation so it
either emits output (remove the >/dev/stdout part) or suppresses output properly
(redirect both stdout and stderr to /dev/null) depending on the intended
behavior in the install.sh check; change the line containing the nvidia-smi -L
invocation accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f2d86b11-72ba-4fbd-9dc7-9573cc49220b

📥 Commits

Reviewing files that changed from the base of the PR and between c55a309 and 12fe0c3.

📒 Files selected for processing (2)
  • bin/lib/nim.js
  • install.sh

Comment on lines +118 to +127
      return {
        type: "nvidia",
        name: "NVIDIA Jetson",
        count: gpuCount,
        totalMemoryMB: 0,
        perGpuMB: 0,
        nimCapable: true,
        jetson: true,
        unifiedMemory: true,
      };

⚠️ Potential issue | 🟡 Minor

Returning nimCapable: true with totalMemoryMB: 0 may cause downstream issues.

When getSystemMemoryMB() returns 0 (detection failure), this block returns nimCapable: true with totalMemoryMB: 0 and perGpuMB: 0. This could lead to unexpected behavior when model selection logic checks memory requirements against 0 MB.

Consider returning null (no GPU detected) or setting nimCapable: false when memory detection fails.

🔧 Suggested fix
-      return {
-        type: "nvidia",
-        name: "NVIDIA Jetson",
-        count: gpuCount,
-        totalMemoryMB: 0,
-        perGpuMB: 0,
-        nimCapable: true,
-        jetson: true,
-        unifiedMemory: true,
-      };
+      // Memory detection failed - still report Jetson but mark as not NIM-capable
+      return {
+        type: "nvidia",
+        name: "NVIDIA Jetson",
+        count: gpuCount,
+        totalMemoryMB: 0,
+        perGpuMB: 0,
+        nimCapable: false,
+        jetson: true,
+        unifiedMemory: true,
+      };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
      return {
        type: "nvidia",
        name: "NVIDIA Jetson",
        count: gpuCount,
        totalMemoryMB: 0,
        perGpuMB: 0,
        nimCapable: true,
        jetson: true,
        unifiedMemory: true,
      };
      // Memory detection failed - still report Jetson but mark as not NIM-capable
      return {
        type: "nvidia",
        name: "NVIDIA Jetson",
        count: gpuCount,
        totalMemoryMB: 0,
        perGpuMB: 0,
        nimCapable: false,
        jetson: true,
        unifiedMemory: true,
      };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 118 - 127, The current NVIDIA Jetson detection
returns nimCapable: true even when getSystemMemoryMB() fails (returns 0),
producing totalMemoryMB: 0 and perGpuMB: 0; update the logic in the function
that constructs this return object (the block that sets type: "nvidia", name:
"NVIDIA Jetson", count: gpuCount, totalMemoryMB, perGpuMB, nimCapable, jetson,
unifiedMemory) so that if getSystemMemoryMB() === 0 you either return null (no
GPU detected) or set nimCapable: false and avoid advertising 0 MB memory —
compute perGpuMB only when totalMemoryMB > 0 and ensure nimCapable reflects
successful memory detection.

Comment on lines +380 to +388
  # 2) Jetson / Thor fallback: use total system RAM as shared GPU memory budget proxy
  if [[ -f /etc/nv_tegra_release ]] || [[ -r /proc/device-tree/model ]]; then
    local mem_kb
    mem_kb="$(awk '/MemTotal:/ {print $2}' /proc/meminfo)"
    if [[ -n "${mem_kb:-}" && "$mem_kb" -gt 0 ]]; then
      echo $((mem_kb / 1024))
      return
    fi
  fi

⚠️ Potential issue | 🟠 Major

Inconsistent Jetson memory calculation between install.sh and nim.js.

install.sh returns full system RAM for Jetson devices, but nim.js (line 103) uses half: Math.floor(getSystemMemoryMB() / 2).

This inconsistency means:

  • install.sh might pull nemotron-3-super:120b on a 128GB Jetson Thor (reports 128GB)
  • nim.js would report only 64GB for the same device, potentially failing model requirements checks
🔧 Suggested fix: Align with nim.js by using half of system RAM
   # 2) Jetson / Thor fallback: use total system RAM as shared GPU memory budget proxy
   if [[ -f /etc/nv_tegra_release ]] || [[ -r /proc/device-tree/model ]]; then
     local mem_kb
     mem_kb="$(awk '/MemTotal:/ {print $2}' /proc/meminfo)"
     if [[ -n "${mem_kb:-}" && "$mem_kb" -gt 0 ]]; then
-      echo $((mem_kb / 1024))
+      echo $((mem_kb / 1024 / 2))  # Use half of system RAM as GPU budget (unified memory)
       return
     fi
   fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# 2) Jetson / Thor fallback: use total system RAM as shared GPU memory budget proxy
if [[ -f /etc/nv_tegra_release ]] || [[ -r /proc/device-tree/model ]]; then
local mem_kb
mem_kb="$(awk '/MemTotal:/ {print $2}' /proc/meminfo)"
if [[ -n "${mem_kb:-}" && "$mem_kb" -gt 0 ]]; then
echo $((mem_kb / 1024))
return
fi
fi
# 2) Jetson / Thor fallback: use total system RAM as shared GPU memory budget proxy
if [[ -f /etc/nv_tegra_release ]] || [[ -r /proc/device-tree/model ]]; then
local mem_kb
mem_kb="$(awk '/MemTotal:/ {print $2}' /proc/meminfo)"
if [[ -n "${mem_kb:-}" && "$mem_kb" -gt 0 ]]; then
echo $((mem_kb / 1024 / 2)) # Use half of system RAM as GPU budget (unified memory)
return
fi
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@install.sh` around lines 380 - 388, The Jetson/Tegra fallback in the
install.sh Jetson detection block currently returns total system RAM but should
match nim.js behavior (which uses Math.floor(getSystemMemoryMB() / 2)); modify
the Jetson branch that computes mem_kb to echo half the system RAM in MB instead
of the full amount (i.e., compute mem_kb -> MB and divide by 2) and then return,
so the echoed value aligns with nim.js’s getSystemMemoryMB halving logic.

# macOS — report unified memory as VRAM

# 2) Jetson / Thor fallback: use total system RAM as shared GPU memory budget proxy
if [[ -f /etc/nv_tegra_release ]] || [[ -r /proc/device-tree/model ]]; then

⚠️ Potential issue | 🟡 Minor

Missing /proc/device-tree/compatible check in get_vram_mb() Jetson fallback.

detect_gpu() (line 351) checks /proc/device-tree/compatible for nvidia,tegra|nvidia,thor, but get_vram_mb() only checks /etc/nv_tegra_release and /proc/device-tree/model.

A device detected solely via compatible would report 0 VRAM, causing incorrect model selection.

🔧 Suggested fix: Add compatible check
   # 2) Jetson / Thor fallback: use total system RAM as shared GPU memory budget proxy
-  if [[ -f /etc/nv_tegra_release ]] || [[ -r /proc/device-tree/model ]]; then
+  if [[ -f /etc/nv_tegra_release ]] \
+     || { [[ -r /proc/device-tree/compatible ]] && tr '\0' '\n' </proc/device-tree/compatible | grep -qiE 'nvidia,tegra|nvidia,thor'; } \
+     || { [[ -r /proc/device-tree/model ]] && tr '\0' '\n' </proc/device-tree/model | grep -qiE 'jetson|thor|nvidia'; }; then
     local mem_kb

Consider extracting a shared is_jetson_platform() function to avoid duplicating detection logic.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if [[ -f /etc/nv_tegra_release ]] || [[ -r /proc/device-tree/model ]]; then
if [[ -f /etc/nv_tegra_release ]] \
|| { [[ -r /proc/device-tree/compatible ]] && tr '\0' '\n' </proc/device-tree/compatible | grep -qiE 'nvidia,tegra|nvidia,thor'; } \
|| { [[ -r /proc/device-tree/model ]] && tr '\0' '\n' </proc/device-tree/model | grep -qiE 'jetson|thor|nvidia'; }; then
local mem_kb
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@install.sh` at line 381, The Jetson detection is inconsistent: detect_gpu()
checks /proc/device-tree/compatible for "nvidia,tegra" or "nvidia,thor" but
get_vram_mb() only checks /etc/nv_tegra_release and /proc/device-tree/model, so
devices detected via compatible will return 0 VRAM; update get_vram_mb() to also
test /proc/device-tree/compatible for those strings (or better, create a shared
helper is_jetson_platform() used by both detect_gpu() and get_vram_mb()) so the
same Jetson detection logic is reused and VRAM fallback runs when appropriate.


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
bin/lib/nim.js (2)

27-50: ⚠️ Potential issue | 🔴 Critical

Hoist isJetsonPlatform() to module scope.

Line 223 calls isJetsonPlatform() outside detectGpu(), but the helper now only exists inside detectGpu(). That raises ReferenceError before docker run on every call to startNimContainer(), not just on Jetson.

🔧 Minimal fix
+function isJetsonPlatform() {
+  try {
+    const nvTegra = runCapture("test -f /etc/nv_tegra_release && echo yes", {
+      ignoreError: true,
+    });
+    if (nvTegra && nvTegra.trim() === "yes") return true;
+  } catch {}
+
+  try {
+    const compat = runCapture("tr '\\0' '\\n' < /proc/device-tree/compatible", {
+      ignoreError: true,
+    });
+    if (compat && /nvidia,tegra|nvidia,thor/i.test(compat)) return true;
+  } catch {}
+
+  try {
+    const model = runCapture("tr '\\0' '\\n' < /proc/device-tree/model", {
+      ignoreError: true,
+    });
+    if (model && /jetson|thor|nvidia/i.test(model)) return true;
+  } catch {}
+
+  return false;
+}
+
 function detectGpu() {
-  function isJetsonPlatform() {
-    try {
-      const nvTegra = runCapture("test -f /etc/nv_tegra_release && echo yes", {
-        ignoreError: true,
-      });
-      if (nvTegra && nvTegra.trim() === "yes") return true;
-    } catch {}
-
-    try {
-      const compat = runCapture("tr '\\0' '\\n' < /proc/device-tree/compatible", {
-        ignoreError: true,
-      });
-      if (compat && /nvidia,tegra|nvidia,thor/i.test(compat)) return true;
-    } catch {}
-
-    try {
-      const model = runCapture("tr '\\0' '\\n' < /proc/device-tree/model", {
-        ignoreError: true,
-      });
-      if (model && /jetson|thor|nvidia/i.test(model)) return true;
-    } catch {}
-
-    return false;
-  }
-
   function getSystemMemoryMB() {

Also applies to: 223-225

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 27 - 50, The helper isJetsonPlatform() is
currently defined inside detectGpu(), but startNimContainer() (and other
callers) invoke isJetsonPlatform() at module scope, causing a ReferenceError;
move the isJetsonPlatform() function definition out of detectGpu() into the
module/global scope so it is available to startNimContainer() and any other
top-level callers, then update detectGpu() to call the now-top-level
isJetsonPlatform() (no other logic changes required).

118-127: ⚠️ Potential issue | 🟡 Minor

Don’t mark Jetson as NIM-capable when memory probing failed.

This branch returns nimCapable: true with both memory fields at 0. That makes the GPU descriptor contradictory and can mislead downstream model gating.

🔧 Minimal fix
       return {
         type: "nvidia",
         name: "NVIDIA Jetson",
         count: gpuCount,
         totalMemoryMB: 0,
         perGpuMB: 0,
-        nimCapable: true,
+        nimCapable: false,
         jetson: true,
         unifiedMemory: true,
       };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 118 - 127, The NVIDIA Jetson branch currently
returns nimCapable: true even when memory probing failed (totalMemoryMB and
perGpuMB are 0), which is contradictory; update the return object in the Jetson
branch (the object with fields type, name, count, totalMemoryMB, perGpuMB,
nimCapable, jetson, unifiedMemory) so that nimCapable is set to true only when
memory probing succeeded (e.g., totalMemoryMB and perGpuMB are non-zero or a
probeSuccess flag) and set to false when memory values are 0; ensure any
downstream callers still receive the jetson/unifiedMemory flags but rely on
nimCapable to indicate usable memory info.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@bin/lib/nim.js`:
- Around line 27-50: The helper isJetsonPlatform() is currently defined inside
detectGpu(), but startNimContainer() (and other callers) invoke
isJetsonPlatform() at module scope, causing a ReferenceError; move the
isJetsonPlatform() function definition out of detectGpu() into the module/global
scope so it is available to startNimContainer() and any other top-level callers,
then update detectGpu() to call the now-top-level isJetsonPlatform() (no other
logic changes required).
- Around line 118-127: The NVIDIA Jetson branch currently returns nimCapable:
true even when memory probing failed (totalMemoryMB and perGpuMB are 0), which
is contradictory; update the return object in the Jetson branch (the object with
fields type, name, count, totalMemoryMB, perGpuMB, nimCapable, jetson,
unifiedMemory) so that nimCapable is set to true only when memory probing
succeeded (e.g., totalMemoryMB and perGpuMB are non-zero or a probeSuccess flag)
and set to false when memory values are 0; ensure any downstream callers still
receive the jetson/unifiedMemory flags but rely on nimCapable to indicate usable
memory info.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 82bb7514-3484-4304-a3d4-06bf8d5a0870

📥 Commits

Reviewing files that changed from the base of the PR and between 12fe0c3 and a21dd2e.

📒 Files selected for processing (1)
  • bin/lib/nim.js

The pyright pre-push hook type-checks test files that import pytest, so pytest is added to the uv run invocation.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
bin/lib/nim.js (1)

212-235: ⚠️ Potential issue | 🟠 Major

Don't force-delete containers under a generic nim- prefix.

Line 235 removes any existing container with the computed name, and nim-${sandboxName} is generic enough to collide with unrelated local containers. Starting NemoClaw can now delete something it does not own; keep a project-specific prefix or verify an ownership label before docker rm -f.

🔧 Safer default
 function containerName(sandboxName) {
-  return `nim-${sandboxName}`;
+  return `nemoclaw-nim-${sandboxName}`;
 }
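The label-based ownership check the reviewer suggests could be sketched as follows. This is illustrative only: the nemoclaw.owned label name is an assumption for the sketch, not an existing convention in the codebase.

```javascript
// Decide whether we own a container, given the JSON output of
// `docker inspect <name>`. The "nemoclaw.owned" label is a hypothetical
// convention: containers we start would be created with
// `docker run --label nemoclaw.owned=true ...`.
function shouldRemoveContainer(inspectJson) {
  let info;
  try {
    info = JSON.parse(inspectJson);
  } catch {
    return false; // unreadable inspect output: leave the container alone
  }
  const labels = (info[0] && info[0].Config && info[0].Config.Labels) || {};
  return labels["nemoclaw.owned"] === "true";
}
```

startNimContainer() would then capture `docker inspect` on the computed name and call `docker rm -f` only when this returns true, so an unrelated local container that happens to share the name survives.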
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 212 - 235, The current
startNimContainer/containerName logic uses a generic "nim-" prefix and
unconditionally runs `docker rm -f`, which can delete unrelated containers;
change container ownership handling so we only remove containers we own: update
containerName to include a project-specific unique prefix (e.g., include a fixed
project id or env var) or, better, before calling `docker rm -f` in
startNimContainer, verify ownership by checking for a specific Docker label
(e.g., "nim.sandbox.owner" or "nim.project=<project>") on the container via
`docker ps`/`docker inspect` and only remove if the label matches; alternatively
stop/remove without forcing all containers by name if label absent. Ensure
references to containerName and startNimContainer are updated to use the new
prefix/label scheme consistently.
♻️ Duplicate comments (1)
bin/lib/nim.js (1)

133-142: ⚠️ Potential issue | 🟡 Minor

Treat a 0 MB unified-memory probe as detection failure.

Lines 133-142 and 153-161 still return nimCapable: true when getSystemMemoryMB() comes back 0, so downstream code sees a usable NVIDIA target with no memory. Return null or set nimCapable: false whenever the unified-memory probe fails.

🔧 Suggested fix
       return {
         type: "nvidia",
         name: "NVIDIA Jetson",
         count: gpuCount,
         totalMemoryMB: 0,
         perGpuMB: 0,
-        nimCapable: true,
+        nimCapable: false,
         jetson: true,
         unifiedMemory: true,
       };

Apply the same totalMemoryMB > 0 guard to the GB10 branch before returning nimCapable: true.

Also applies to: 153-161

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 133 - 142, The NVIDIA/Jetson detection currently
returns nimCapable: true even when getSystemMemoryMB() yields 0; update the
logic in the Jetson branch (the block that builds the object with properties
type:"nvidia", name:"NVIDIA Jetson", totalMemoryMB, perGpuMB, nimCapable,
jetson, unifiedMemory) and the GB10 branch to treat totalMemoryMB <= 0 as a
probe failure by either returning null or setting nimCapable: false instead of
true; ensure the same totalMemoryMB > 0 guard is applied before setting
nimCapable to true so downstream code never receives a nimCapable device with 0
MB.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/nim.js`:
- Around line 41-63: The code currently uses isJetsonPlatform() (which matches
both Jetson and Thor) to decide halving MemTotal; change the logic so only Thor
triggers the halving: add a new isThorPlatform() (or extend isJetsonPlatform()
to expose a Thor-specific check) that replicates the existing
compatible/model/nv_tegra checks but only matches "thor", then replace the
MemTotal /= 2 decision to use isThorPlatform() instead of isJetsonPlatform();
keep the original isJetsonPlatform() unchanged so regular Jetson boards are not
halved.
- Line 66: Restore detectGpu by exporting it from this module (e.g., add it to
module.exports or export the function so callers can call nim.detectGpu()), and
prevent main() from running on import by wrapping the unconditional call in a
CLI guard (use the Node check like if (require.main === module) { main(); } or
the ESM equivalent) so main() only executes when this file is the entrypoint;
update the exports to include detectGpu and leave main() guarded.

---

Outside diff comments:
In `@bin/lib/nim.js`:
- Around line 212-235: The current startNimContainer/containerName logic uses a
generic "nim-" prefix and unconditionally runs `docker rm -f`, which can delete
unrelated containers; change container ownership handling so we only remove
containers we own: update containerName to include a project-specific unique
prefix (e.g., include a fixed project id or env var) or, better, before calling
`docker rm -f` in startNimContainer, verify ownership by checking for a specific
Docker label (e.g., "nim.sandbox.owner" or "nim.project=<project>") on the
container via `docker ps`/`docker inspect` and only remove if the label matches;
alternatively stop/remove without forcing all containers by name if label
absent. Ensure references to containerName and startNimContainer are updated to
use the new prefix/label scheme consistently.

---

Duplicate comments:
In `@bin/lib/nim.js`:
- Around line 133-142: The NVIDIA/Jetson detection currently returns nimCapable:
true even when getSystemMemoryMB() yields 0; update the logic in the Jetson
branch (the block that builds the object with properties type:"nvidia",
name:"NVIDIA Jetson", totalMemoryMB, perGpuMB, nimCapable, jetson,
unifiedMemory) and the GB10 branch to treat totalMemoryMB <= 0 as a probe
failure by either returning null or setting nimCapable: false instead of true;
ensure the same totalMemoryMB > 0 guard is applied before setting nimCapable to
true so downstream code never receives a nimCapable device with 0 MB.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 58f27f0d-d8e9-4881-be61-af4d480deda1

📥 Commits

Reviewing files that changed from the base of the PR and between 8d418f0 and 5b549a6.

📒 Files selected for processing (2)
  • .pre-commit-config.yaml
  • bin/lib/nim.js
✅ Files skipped from review due to trivial changes (1)
  • .pre-commit-config.yaml

Comment on lines +41 to +63
function isJetsonPlatform() {
try {
const nvTegra = runCapture("test -f /etc/nv_tegra_release && echo yes", {
ignoreError: true,
});
if (nvTegra && nvTegra.trim() === "yes") return true;
} catch {}

try {
const compat = runCapture("tr '\\0' '\\n' < /proc/device-tree/compatible", {
ignoreError: true,
});
if (compat && /nvidia,tegra|nvidia,thor/i.test(compat)) return true;
} catch {}

try {
const model = runCapture("tr '\\0' '\\n' < /proc/device-tree/model", {
ignoreError: true,
});
if (model && /jetson|thor|nvidia/i.test(model)) return true;
} catch {}

return false;

⚠️ Potential issue | 🟠 Major

Only halve unified memory on Thor.

isJetsonPlatform() intentionally matches both Jetson and Thor, but Line 118 halves MemTotal for every match. The Thor-specific fallback in the PR objective now underreports regular Jetson boards by 50%, which can incorrectly disqualify models that should fit; split Thor detection out before applying the / 2 adjustment.

Also applies to: 105-126

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` around lines 41 - 63, The code currently uses
isJetsonPlatform() (which matches both Jetson and Thor) to decide halving
MemTotal; change the logic so only Thor triggers the halving: add a new
isThorPlatform() (or extend isJetsonPlatform() to expose a Thor-specific check)
that replicates the existing compatible/model/nv_tegra checks but only matches
"thor", then replace the MemTotal /= 2 decision to use isThorPlatform() instead
of isJetsonPlatform(); keep the original isJetsonPlatform() unchanged so regular
Jetson boards are not halved.

return false;
}

function detectGpu() {

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify that callers still use bin/lib/nim.js as a library and that this file exports nothing.

printf '\n-- nim.detectGpu() call sites --\n'
rg -n -C2 --type=js --type=ts '\bnim\.detectGpu\s*\('

printf '\n-- exports currently declared in bin/lib/nim.js --\n'
sed -n '1,320p' bin/lib/nim.js | rg -n '^\s*export\s+'

Repository: NVIDIA/NemoClaw

Length of output: 1248


Restore detectGpu() as an importable API and guard the CLI entrypoint.

bin/lib/onboard.js at line 405 (and test files) still call nim.detectGpu(), but detectGpu() is no longer exported here and line 279 now runs main() unconditionally on import, breaking the GPU-detection flow before callers can use the symbol.

  • Export detectGpu function
  • Guard main() execution to only run when the file is the CLI entrypoint, not when imported as a module
🔧 Proposed fix
+import { pathToFileURL } from "node:url";
-function detectGpu() {
+export function detectGpu() {
-main();
+if (process.argv[1] && import.meta.url === pathToFileURL(process.argv[1]).href) {
+  main();
+}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function detectGpu() {
import { pathToFileURL } from "node:url";
// ... other code ...
export function detectGpu() {
// function body
}
// ... other code ...
if (process.argv[1] && import.meta.url === pathToFileURL(process.argv[1]).href) {
main();
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/nim.js` at line 66, Restore detectGpu by exporting it from this
module (e.g., add it to module.exports or export the function so callers can
call nim.detectGpu()), and prevent main() from running on import by wrapping the
unconditional call in a CLI guard (use the Node check like if (require.main ===
module) { main(); } or the ESM equivalent) so main() only executes when this
file is the entrypoint; update the exports to include detectGpu and leave main()
guarded.
