helm: add nodeSelector and tolerations to NIMCache templates #1645

Open
abhay1999 wants to merge 2 commits into NVIDIA:main from abhay1999:1636-nimcache-nodeselector-tolerations

@abhay1999

Summary

Fixes #1636

The NIMCache resources in all NIM Operator-mode Helm templates were missing the nodeSelector and tolerations fields. Without them, NIMCache pods cannot be scheduled onto GPU nodes that are tainted or that require a node selector (e.g. cloud-provider GPU node pools with an nvidia.com/gpu: NoSchedule taint).

The NIMService sections of the same templates already expose these fields, apart from a missing nodeSelector in two templates that is also fixed here. This PR makes NIMCache consistent with NIMService.
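
As a sketch of what this enables, the values below would steer one model's pods onto a tainted GPU node pool. The `<model>` key is the same placeholder used in the template paths in this PR, and the `nvidia.com/gpu.present` node label is an assumption for illustration; substitute whatever labels your GPU nodes actually carry.

```yaml
# values.yaml override (illustrative sketch, not taken from the chart)
nimOperator:
  <model>:                             # placeholder: the model's key under nimOperator
    nodeSelector:
      nvidia.com/gpu.present: "true"   # assumed label; use your node pool's labels
    tolerations:
      - key: nvidia.com/gpu            # tolerates the nvidia.com/gpu NoSchedule taint
        operator: Exists
        effect: NoSchedule
```

With this change, the same values should reach both the NIMCache and the NIMService spec for that model.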

Changes

Added nodeSelector and tolerations to the NIMCache spec in all 8 affected templates:

  • helm/templates/llama-nemotron-embed-1b-v2.yaml
  • helm/templates/llama-nemotron-rerank-1b-v2.yaml
  • helm/templates/nemotron-graphic-elements-v1.yaml
  • helm/templates/nemotron-ocr-v1.yaml
  • helm/templates/nemotron-page-elements-v3.yaml
  • helm/templates/nemotron-table-structure-v1.yaml
  • helm/templates/nemotron-nano-12b-v2-vl.yaml (also adds missing nodeSelector to NIMService)
  • helm/templates/nemotron-parse.yaml (also adds missing nodeSelector to NIMService)

Before / After

Before — NIMCache would ignore nodeSelector/tolerations from values.yaml, causing pods to be unschedulable on tainted GPU nodes:

kind: NIMCache
spec:
  source: ...
  storage: ...
  # nodeSelector and tolerations missing

After — consistent with NIMService:

kind: NIMCache
spec:
  source: ...
  storage: ...
  nodeSelector:
    {{ toYaml .Values.nimOperator.<model>.nodeSelector | nindent 4 }}
  tolerations:
    {{ toYaml .Values.nimOperator.<model>.tolerations | nindent 4 }}
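
One caveat with the unconditional form above is that both keys are rendered even when the values are unset. A common Helm idiom wraps each field in `with` so that it is emitted only when a value is provided. This is a sketch of that idiom, not necessarily what this PR or the existing NIMService sections do, and `<model>` remains a placeholder.

```yaml
kind: NIMCache
spec:
  source: ...
  storage: ...
  # Emit nodeSelector only when the value is set and non-empty
  {{- with .Values.nimOperator.<model>.nodeSelector }}
  nodeSelector:
    {{- toYaml . | nindent 4 }}
  {{- end }}
  # Emit tolerations only when the list is set and non-empty
  {{- with .Values.nimOperator.<model>.tolerations }}
  tolerations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
```

The `{{-` trim markers remove the preceding whitespace, so no blank lines appear in the rendered manifest when a field is omitted.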

Testing

Verified that the new NIMCache fields follow the same template structure as the existing NIMService pattern used throughout the chart.

NIMCache resources were missing nodeSelector and tolerations fields
across all Operator-mode templates. Without these, NIMCache pods cannot
be scheduled on GPU nodes that have taints or require specific node
selection (e.g. cloud provider GPU node pools).

The NIMService sections of the same templates already expose these
fields correctly. This commit adds the equivalent fields to the NIMCache
sections of all eight affected templates, and also adds the missing
nodeSelector to the NIMService sections of nemotron-nano-12b-v2-vl and
nemotron-parse which only had tolerations.

Fixes NVIDIA#1636

Signed-off-by: abhay1999 <abhaychaurasiya19@gmail.com>
abhay1999 requested a review from a team as a code owner on March 18, 2026 at 01:53
abhay1999 requested a review from charlesbluca on March 18, 2026 at 01:53

copy-pr-bot bot commented Mar 18, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Remove extra trailing blank line to satisfy pre-commit end-of-file-fixer hook.

Signed-off-by: abhay1999 <abhaychaurasiya19@gmail.com>
Development

Successfully merging this pull request may close these issues.

[BUG]: Helm chart NIMCache templates missing nodeSelector and tolerations fields