9 changes: 6 additions & 3 deletions README.md
@@ -146,9 +146,12 @@ subscription.

NRI plugins can subscribe to the following pod lifecycle events:

- creation
- stopping
- removal
- creation (RunPodSandbox)
- stopping (StopPodSandbox)
- removal (RemovePodSandbox)

For detailed specifications of pod sandbox event timing, state requirements, and plugin
expectations, see [Pod Sandbox Lifecycle Hooks](docs/pod-sandbox-lifecycle.md).

The following table lists the pod sandbox properties exposed to NRI plugins, together with
the first NRI, containerd and CRI-O versions each was available in.
162 changes: 162 additions & 0 deletions docs/pod-sandbox-lifecycle.md
@@ -0,0 +1,162 @@
# NRI Pod Sandbox Lifecycle Hooks

## Relationship to CRI API

This specification defines how NRI plugins interact with pod sandbox lifecycle events. The underlying pod sandbox operations are defined by the [Kubernetes CRI API](https://github.com/kubernetes/cri-api):

- **RunPodSandbox (CRI)**: Creates and starts a pod-level sandbox. Runtimes must ensure the sandbox is in the ready state on success.
- **StopPodSandbox (CRI)**: Stops any running process that is part of the sandbox and directs the runtime to reclaim certain pod resources (e.g. Network Namespace, CNI teardown, and image mounts). May be called multiple times, and is idempotent.
- **RemovePodSandbox (CRI)**: Removes the sandbox. If there are any running containers, they must be forcibly terminated and removed.

This NRI specification details when and under what conditions NRI plugins receive notifications for these events, ensuring plugins can reliably depend on consistent sandbox state across different runtime implementations.

## Overview

The pod sandbox lifecycle consists of three distinct phases, each with a corresponding NRI event that plugins can subscribe to:

1. **RunPodSandbox**: Fired during the the runtime CRI RunPodSandbox execution, after the PodSandbox is created but before setting the pod to running and then replying success to CRI RunPodSandbox request.

Suggested change
- 1. **RunPodSandbox**: Fired during the the runtime CRI RunPodSandbox execution, after the PodSandbox is created but before setting the pod to running and then replying success to CRI RunPodSandbox request.
+ 1. **RunPodSandbox**: Fired during the runtime CRI RunPodSandbox execution, after the PodSandbox is created but before setting the pod to running and then replying success to CRI RunPodSandbox request.

2. **StopPodSandbox**: Fired when the runtime initiates CRI StopPodSandbox.
3. **RemovePodSandbox**: Fired when the runtime performs CRI RemovePodSandbox.

For each event, this specification defines:

- **Sandbox State Contract**: What sandbox infrastructure conditions runtimes MUST satisfy when firing the NRI event
- **Plugin Responsibilities and Capabilities**: What plugins can safely do in response to the event

## RunPodSandbox

**CRI Operation**: RunPodSandbox - Creates and starts a pod-level sandbox.

**NRI Event Timing**: The RunPodSandbox NRI event is fired after the runtime has successfully executed most of the CRI RunPodSandbox operation; NRI plugin execution is the final step before the sandbox reaches a "Ready" state. The Kubelet does not start workload containers until after the sandbox becomes "Ready".

Will container runtime keep retrying or fail on first failure/timeout?

Member

On errors other than plugin timeout, the runtime will fail the pod creation request. On plugin timeout the plugin is kicked out by the runtime.


### Sandbox State Contract

If NRI plugin timed out, will it receive the Stop event?

Member

@klihub klihub Apr 25, 2026

If an NRI plugin times out the runtime kicks it out by forcibly disconnecting the plugin.


When the runtime fires the RunPodSandbox NRI event, it guarantees:

let's say something about volumes and DRA devices in this list

Should DRA downward API be accessible?


- The Pod-level cgroup hierarchy has been established
- The Sandbox namespaces (IPC, Network, UTS) are created and active
- Network setup has been fully configured (network interfaces are up and assigned addressing)
- The pod IP address (if applicable) is assigned and available
- The "pause" container (if the runtime uses one) is running
- All prerequisite operations for workload container startup are complete. The pod is in the "unknown" state and becomes "Ready" once the NRI event has been processed, which gives the NRI plugin a window to allocate resources for the pod before any workload containers are started.

Can an NRI plugin start its own processes in this sandbox?

Member

Not by using NRI for this. NRI itself does not provide means for this.


### Plugin Responsibilities and Capabilities

Upon receiving the RunPodSandbox event, plugins can safely:

- Access the network namespace and inspect network configuration
- Perform network-level operations or monitoring
- Inject sandbox-level hardware configurations (e.g., RDMA, RoCEv2)
- Establish plugin-specific tracking or monitoring for the pod
- Store initial state or baseline metrics for later reference

let's clarify the ordering of NRI plugins and what NRI plugin will "see" if there are multiple plugins.

Member

Currently NRI plugins cannot mutate a pod via NRI. Therefore in the case of multiple plugins, all NRI plugins see identical pod sandbox data.


Plugins should treat this as an initialization phase. The sandbox infrastructure will remain accessible throughout the pod's lifetime until StopPodSandbox is called.

## StopPodSandbox

**CRI Operation**: StopPodSandbox - Stops any running process that is part of the sandbox and reclaims certain pod resources (e.g. Network Namespace, CNI teardown, and image mounts).

**NRI Event Timing**: The StopPodSandbox NRI event is fired when the runtime initiates the CRI StopPodSandbox operation.

### Sandbox State Contract

When the runtime fires the StopPodSandbox NRI event, it guarantees:

Let's add a guarantee: this sandbox will never be reused. There is never a scenario in which the container runtime starts any new workload (containers) there after the first attempt to stop the sandbox.

This can be called even if Run was never called, mostly to garbage-collect a sandbox whose creation was interrupted. For example: Run is called and containerd crashes almost at the end; on restart containerd doesn't know whether it needs to clean up state, so it must call Stop on each NRI plugin. Otherwise resources will leak.

Member

@mikebrow mikebrow Apr 25, 2026

nod.. at which point I believe the state of the pod would be "unknown"


- Workload containers within the sandbox are stopped or are stopping
- **CRITICAL**: The sandbox infrastructure still exists and remains fully accessible during this hook

can runtime guarantee that the state is "good enough" to start another container in this sandbox?

Member

@mikebrow mikebrow Apr 25, 2026

No. This is currently called after stopping the pod, which includes killing the containers and the pod container. When the pod container is killed, exit processing of its task sets the pod state to NotReady, which subsequently blocks new containers on the pod. This call comes before teardown of certain pod resources. Note that there was a discrepancy in the CRI-O placement of the call; work to normalize it is in progress.

We could add another call earlier in the stop processing if we want to add the goal of creating notifications that stop pod has been requested. And possibly a third call to indicate the pod containers are stopped but the pod container is still running and thus another container can be added.

- The pod resources allocated by the runtime (such as the network namespace, CNI networks, and image mounts) are not unmounted or deleted until this hook completes
- The pod's cgroups remain accessible
- All pod-level resources remain stable until this hook returns

### Plugin Responsibilities and Capabilities

responsibility: this event processing must be re-entrant.


StopPodSandbox is the designated cleanup and observation phase for plugins. Upon receiving this event, plugins can:

- Access the pod's network namespace to read final telemetry or metrics
- Collect final state for observability or troubleshooting
- Detach hardware interfaces or reconfigure resources
- Clean up custom firewall configurations, routing rules, or other network-level state
- Perform graceful cleanup or resource release before sandbox teardown

**Important**: Plugin processing must complete within the configured request timeout. Do not assume sandbox access persists after this hook returns or times out.

Again, I think we need to say that there will be a retry if timeout happened.

From that perspective, can we say that it can be assumed as long as plugin keeps returning error?


## RemovePodSandbox

**CRI Operation**: RemovePodSandbox - Removes the sandbox and forcibly terminates any remaining containers.

**NRI Event Timing**: The RemovePodSandbox NRI event is fired when the runtime initiates the CRI RemovePodSandbox operation, just prior to removing the pod from the pod list.

### Sandbox State Contract

When the runtime fires the RemovePodSandbox NRI event:

Another guarantee is that it is never called until Stop has succeeded.


- All workload containers have been removed
- The StopPodSandbox operation has completed
- Network setup teardown may be underway or complete
- The pod's namespaces (Network, IPC, UTS) may have already been deleted
- Pod-level cgroups may be destroyed
- Sandbox infrastructure access is **not guaranteed**

### Plugin Responsibilities and Capabilities

Plugin must be reentrant here as well


RemovePodSandbox is strictly for plugin-internal cleanup. Plugins MUST NOT attempt to access pod infrastructure (namespaces, cgroups, network configuration) during this hook, as their existence is not guaranteed.

Plugins receiving this event should only:

- Clean up plugin-internal memory caches or object tracking associated with the podSandboxID
- Remove host-level tracking files, database entries, or other locally stored pod references
- Release any plugin resources held for this specific pod
- Perform final accounting or bookkeeping

**Important**: This hook is informational only. Plugins should not assume any pod infrastructure exists. Only clean up information the plugin created or stored internally.
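
The plugin-internal cleanup described above could look roughly like the following sketch. The `Tracker` and `podState` types are hypothetical stand-ins invented for illustration and are not part of the NRI API. The point is only that removal touches nothing but the plugin's own records and is safe to repeat, which also matches the reviewers' request that the handler be re-entrant.

```go
package main

import (
	"fmt"
	"sync"
)

// podState is hypothetical plugin-internal bookkeeping for one pod sandbox.
type podState struct {
	Name string
}

// Tracker holds only state the plugin itself created, keyed by pod sandbox ID.
type Tracker struct {
	mu   sync.Mutex
	pods map[string]podState
}

// NewTracker returns an empty tracker.
func NewTracker() *Tracker { return &Tracker{pods: make(map[string]podState)} }

// Track records state for a sandbox, for example while handling RunPodSandbox.
func (t *Tracker) Track(id string, s podState) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.pods[id] = s
}

// Forget drops the plugin's own records for a sandbox. It does not touch
// namespaces, cgroups or network state, and it is safe to call repeatedly or
// for an ID that was never tracked. It reports whether an entry was removed.
func (t *Tracker) Forget(id string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	_, ok := t.pods[id]
	delete(t.pods, id)
	return ok
}

func main() {
	t := NewTracker()
	t.Track("sandbox-1", podState{Name: "web"})
	fmt.Println(t.Forget("sandbox-1")) // first call removes the entry
	fmt.Println(t.Forget("sandbox-1")) // repeated call is a harmless no-op
}
```

A real plugin would call something like `Forget` from its RemovePodSandbox handler, using the pod sandbox ID carried by the event.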

## Event Ordering and Guarantees

Runtimes MUST guarantee the following ordering:

1. **RunPodSandbox** NRI event fires near the end of CRI RunPodSandbox execution, after the sandbox has been created but before the pod is set to the "Ready" state.
2. **StopPodSandbox** NRI event fires during CRI StopPodSandbox execution, just prior to removing the pod resources allocated by the runtime (such as the network namespace, CNI networks, and image mounts)
3. **RemovePodSandbox** NRI event fires during CRI RemovePodSandbox execution
4. These events MUST fire in strict order: RunPodSandbox → StopPodSandbox → RemovePodSandbox
5. No workload containers will be started until after RunPodSandbox hook completes
6. All workload containers will be stopped before StopPodSandbox hook is called

this contradicts the contract above. It was saying stopped or being stopped:

  • Workload containers within the sandbox are stopped or are stopping

Member

Nod, I was wondering why the "or" was there; I didn't get a chance to verify that.

7. No network resource reclamation should occur during StopPodSandbox hook execution

See the [CRI API specification](https://github.com/kubernetes/cri-api) for details on each CRI operation.
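
A plugin that wants to check the ordering above for each pod could keep a small per-pod state machine. The sketch below uses hypothetical `Phase` and `Event` types that are not NRI identifiers. It assumes that a repeated StopPodSandbox event is acceptable, since the CRI operation is idempotent and may be called multiple times. It follows the ordering as written here and does not model the unresolved review question of Stop arriving without a prior Run.

```go
package main

import "fmt"

// Phase is a hypothetical plugin-side view of where a pod sandbox is in its lifecycle.
type Phase int

const (
	PhaseNone Phase = iota
	PhaseRunning
	PhaseStopped
	PhaseRemoved
)

// Event is a hypothetical label for the three pod sandbox NRI events.
type Event int

const (
	EventRun Event = iota
	EventStop
	EventRemove
)

// Next returns the phase that follows cur when ev arrives, or an error if ev
// would violate the Run -> Stop -> Remove ordering. A repeated Stop is
// accepted (assumption based on CRI StopPodSandbox being idempotent).
func Next(cur Phase, ev Event) (Phase, error) {
	switch {
	case ev == EventRun && cur == PhaseNone:
		return PhaseRunning, nil
	case ev == EventStop && (cur == PhaseRunning || cur == PhaseStopped):
		return PhaseStopped, nil
	case ev == EventRemove && cur == PhaseStopped:
		return PhaseRemoved, nil
	}
	return cur, fmt.Errorf("event %d not allowed in phase %d", ev, cur)
}

func main() {
	p := PhaseNone
	for _, ev := range []Event{EventRun, EventStop, EventStop, EventRemove} {
		next, err := Next(p, ev)
		if err != nil {
			fmt.Println("rejected:", err)
			return
		}
		p = next
	}
	fmt.Println("reached removed phase:", p == PhaseRemoved)
}
```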

## Plugin Implementation Guidance

### Subscribing to Events

Plugins subscribe to these events during the Configure phase by returning the appropriate event flags in the ConfigureResponse:

- `Event_RUN_POD_SANDBOX` (1 << 0) for RunPodSandbox
- `Event_STOP_POD_SANDBOX` (1 << 1) for StopPodSandbox
- `Event_REMOVE_POD_SANDBOX` (1 << 2) for RemovePodSandbox

These events are delivered to plugins using the RunPodSandbox, StopPodSandbox and RemovePodSandbox event handlers.
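
To illustrate how the three flags combine into one subscription mask, here is a minimal self-contained sketch. `EventMask`, `Subscribe` and `IsSet` are local stand-ins defined for this example rather than the NRI API types; only the bit positions are taken from the list above.

```go
package main

import "fmt"

// EventMask is a local stand-in for the subscription mask a plugin returns
// during Configure. It is not the NRI API type.
type EventMask int32

// Bit positions as listed in the text above.
const (
	RunPodSandbox    EventMask = 1 << 0
	StopPodSandbox   EventMask = 1 << 1
	RemovePodSandbox EventMask = 1 << 2
)

// Subscribe combines the given event flags into a single mask.
func Subscribe(events ...EventMask) EventMask {
	var m EventMask
	for _, e := range events {
		m |= e
	}
	return m
}

// IsSet reports whether every bit of e is present in m.
func (m EventMask) IsSet(e EventMask) bool { return m&e == e }

func main() {
	mask := Subscribe(RunPodSandbox, StopPodSandbox)
	fmt.Printf("mask=%03b stop=%v remove=%v\n",
		mask, mask.IsSet(StopPodSandbox), mask.IsSet(RemovePodSandbox))
}
```

A plugin that only needs cleanup notifications would subscribe to the stop and remove flags and leave the run flag unset.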

### Timeout Handling

All plugin processing must complete within the configured request timeout. Plugins should plan accordingly:

this statement is hard to parse as a spec. I suggest to specify that the timeout is treated as error by runtime


- **RunPodSandbox**: Failure may result in pod creation failure
- **StopPodSandbox**: Non-blocking for subsequent operations; the plugin should not depend on completion of subsequent teardown
- **RemovePodSandbox**: Non-blocking; removal will proceed regardless of plugin timeout

why are we not guaranteeing successful execution here? It may lead to resource leak. With DRA we guarantee successful unprepare call, why shouldn't we do the same here?

Member

We have discussed MUST-run plugins vs. optional plugins. It is a good point that in this RM processing we also need to consider which of the plugins MUST complete, or possibly cause leaks.


### Error Handling

On the teardown path, plugin errors MUST NOT prevent the operation from proceeding. Runtimes MUST ensure that a failing plugin cannot block pod or container teardown:

- **RunPodSandbox errors**: A plugin error may prevent the pod from being created, depending on runtime policy. Plugins bear responsibility for errors they return at this phase.
- **StopPodSandbox errors**: A plugin error MUST NOT prevent the sandbox from being stopped. The runtime MUST proceed with teardown regardless of plugin failures.

why? Is there any alternative way for plugin to handle clean up if runtime will not guarantee call and retry?

Member

One alternative: on sync of the list of pods, a plugin may be able to normalize its internal list of pods with resources against the actual sync list.

- **RemovePodSandbox errors**: A plugin error MUST NOT prevent the sandbox from being removed. The runtime MUST proceed with removal regardless of plugin failures.


### Multi-Plugin Coordination

When multiple plugins are active:

If a plugin didn't exist at sandbox creation and was notified about existing sandboxes on loading, what is the guarantee on consistency between that notification and the plugin's Stop/Remove calls?


- All RunPodSandbox hooks complete before first workload container starts
- Hooks execute in plugin index order; later plugins should not assume earlier plugins' modifications will persist
- RemovePodSandbox hooks are independent; plugins should not rely on side effects from other plugins