
define pod sandbox lifecycle contract#286

Open
aojea wants to merge 1 commit into containerd:main from aojea:pod_sandbox_hooks

Conversation

@aojea
Contributor

@aojea aojea commented Apr 9, 2026

Define the contract for the PodSandbox hooks for the NRI plugins.

The Sandbox hooks are based on the CRI-API RPCs, since the OCI runtime only specifies the container lifecycle.

/assign @samuelkarp @haircommander

@SergeyKanzhelev

The same way we do conformance for runtimes, some of these contracts may be added to conformance: kubernetes-sigs/cri-tools#2046. I think critest is the right place for it. But it opens an interesting discussion on whether we will also want to do conformance with other things like CNI. NRI is different enough that I would see it as a reasonable conformance requirement.

@aojea
Contributor Author

aojea commented Apr 11, 2026

> other things like CNI.

CNI is not part of CRI; it is an implementation detail of the runtimes and has several flaws that cause a lot of problems with current workloads ... During the last KubeCon, in the OCI meeting, we also discussed replacing it with a modular solution based on NRI ... I have a draft that I plan to finish and share once I have time, but I suggest not including CNI as part of conformance of anything.

Comment thread docs/pod-sandbox-lifecycle.md Outdated
Plugins should handle errors gracefully and avoid leaving the pod or system in an inconsistent state. Error recovery strategies:

- **RunPodSandbox errors**: Problematic; may block pod creation depending on failure severity and runtime policy
- **StopPodSandbox errors**: May not prevent scenario termination depending on runtime policy
Member

I think on the teardown path, actually both for pods and containers, we should not allow a plugin to try and prevent the operation with an error. If we agree, then we should clearly state here that, for StopPodSandbox and RemovePodSandbox, a plugin failing with an error will not prevent the operation from proceeding.

The current implementation has incorrect/inconsistent behavior in this regard when multiple plugins are involved: for some of these (or the corresponding container) teardown lifecycle events, a failure in one plugin will incorrectly prevent the event from being delivered to subsequent plugins, although it will not prevent the CRI-level operation from proceeding. There is a fix coming for this, but it is waiting for #274 to get merged first (which is waiting for #277 to get merged first).

Contributor Author

@klihub can you please suggest the better wording for this?

I do not feel I'm able to translate that correctly to words :)

Contributor Author

ok, let me take a stab at this, I think now I got what you mean

Contributor Author

see new commit rephrasing it

@klihub
Member

klihub commented Apr 11, 2026

@aojea Thank you, this looks great ! I only have a few comments.

@aojea
Contributor Author

aojea commented Apr 13, 2026

> @aojea Thank you, this looks great ! I only have a few comments.

@klihub addressed comments on a new commit to simplify the review

@klihub
Member

klihub commented Apr 13, 2026

> > @aojea Thank you, this looks great ! I only have a few comments.
>
> @klihub addressed comments on a new commit to simplify the review

@aojea Thanks ! This now LGTM. We can squash the commits before/during merging.

@aojea aojea force-pushed the pod_sandbox_hooks branch from 609d764 to 3c820b0 on April 13, 2026 19:10
@aojea
Contributor Author

aojea commented Apr 13, 2026

squashed

@klihub klihub self-requested a review April 14, 2026 06:52
Member

@klihub klihub left a comment

LGTM.

@klihub klihub requested a review from chrishenzie April 14, 2026 06:53
@dkennetzoracle

@aojea thanks for putting this together, great doc.

Member

@mikebrow mikebrow left a comment

looking good.. see comments

Comment thread docs/pod-sandbox-lifecycle.md Outdated

**CRI Operation**: RunPodSandbox - Creates and starts a pod-level sandbox.

**NRI Event Timing**: The RunPodSandbox NRI event is fired after the runtime has successfully executed the CRI RunPodSandbox operation and the sandbox has reached a "Ready" state, but before any workload containers are started.
Member

Suggested change
**NRI Event Timing**: The RunPodSandbox NRI event is fired after the runtime has successfully executed the CRI RunPodSandbox operation and the sandbox has reached a "Ready" state, but before any workload containers are started.
**NRI Event Timing**: The RunPodSandbox NRI event is fired when the runtime has tentatively finished executing the CRI RunPodSandbox operation but just before setting the pod to the "Ready" state, which occurs immediately after NRI event processing, and thus before any workload containers are/can be started as the Pod is still in the "unknown" state.

Member

Suggested change
**NRI Event Timing**: The RunPodSandbox NRI event is fired after the runtime has successfully executed the CRI RunPodSandbox operation and the sandbox has reached a "Ready" state, but before any workload containers are started.
**NRI Event Timing**: The RunPodSandbox NRI event is fired after the runtime has successfully executed most of the CRI RunPodSandbox operation; NRI plugin execution is the final step before the sandbox reaches a "Ready" state. The Kubelet does not start workload containers until after the sandbox becomes "Ready".

Comment thread docs/pod-sandbox-lifecycle.md Outdated
This specification defines how NRI plugins interact with pod sandbox lifecycle events. The underlying pod sandbox operations are defined by the [Kubernetes CRI API](https://github.com/kubernetes/cri-api):

- **RunPodSandbox (CRI)**: Creates and starts a pod-level sandbox. Runtimes must ensure the sandbox is in the ready state on success.
- **StopPodSandbox (CRI)**: Stops any running process that is part of the sandbox and reclaims network resources.
Member

Suggested change
- **StopPodSandbox (CRI)**: Stops any running process that is part of the sandbox and reclaims network resources.
- **StopPodSandbox (CRI)**: Stops any running process that is part of the sandbox and directs the runtime to reclaim certain pod resources (e.g. Network Namespace, CNI teardown, and image mounts). May be called multiple times, and is idempotent.
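Since the suggested wording says StopPodSandbox may be called multiple times and is idempotent, handlers on the stop path arguably need to tolerate repeated calls. A minimal sketch of one way to do that; the `Sandbox` type and its `Releases` counter are hypothetical and exist only to make the behavior observable:

```go
package main

import (
	"fmt"
	"sync"
)

// Sandbox is a hypothetical record used to illustrate an idempotent stop.
type Sandbox struct {
	mu       sync.Mutex
	stopped  bool
	Releases int // counts how many times resources were actually released
}

// Stop releases resources on the first call and is a no-op afterwards,
// so repeated StopPodSandbox requests are safe.
func (s *Sandbox) Stop() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.stopped {
		return // already stopped: nothing left to release
	}
	s.Releases++ // stands in for network namespace / CNI / mount teardown
	s.stopped = true
}

func main() {
	sb := &Sandbox{}
	sb.Stop()
	sb.Stop() // second call does not release anything again
	fmt.Println("releases:", sb.Releases)
}
```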

Comment thread docs/pod-sandbox-lifecycle.md Outdated

The pod sandbox lifecycle consists of three distinct phases, each with a corresponding NRI event that plugins can subscribe to:

1. **RunPodSandbox**: Fired after the runtime successfully executes CRI RunPodSandbox
Member

Suggested change
1. **RunPodSandbox**: Fired after the runtime successfully executes CRI RunPodSandbox
1. **RunPodSandbox**: Fired after the runtime successfully creates the pod, but before setting the pod to running and then replying success to CRI RunPodSandbox request.

Member

Suggested change
1. **RunPodSandbox**: Fired after the runtime successfully executes CRI RunPodSandbox
1. **RunPodSandbox**: Fired during the the runtime CRI RunPodSandbox execution

Comment thread docs/pod-sandbox-lifecycle.md Outdated
- Network setup has been fully configured (network interfaces are up and assigned addressing)
- The pod IP address (if applicable) is assigned and available
- The "pause" container (if the runtime uses one) is running
- All prerequisite operations for workload container startup are complete
Member

Suggested change
- All prerequisite operations for workload container startup are complete
- All prerequisite operations for workload container startup are complete, the pod is in the "unknown state" and will become "Ready" once the NRI event is processed. This guarantees the NRI plugin has a window to allocate resources for the pod before any workload containers are started.

Comment thread docs/pod-sandbox-lifecycle.md Outdated

- Workload containers within the sandbox are stopped or are stopping
- **CRITICAL**: The sandbox infrastructure still exists and remains fully accessible during this hook
- The network namespace is not unmounted or deleted until this hook completes
Member

Suggested change
- The network namespace is not unmounted or deleted until this hook completes
- The pod resources allocated by the runtime; such as network namespace, CNI networks, and image mounts; are not unmounted or deleted until this hook completes

Comment thread docs/pod-sandbox-lifecycle.md Outdated

**CRI Operation**: RemovePodSandbox - Removes the sandbox and forcibly terminates any remaining containers.

**NRI Event Timing**: The RemovePodSandbox NRI event is fired when the runtime initiates the CRI RemovePodSandbox operation, during final garbage collection.
Member

Suggested change
**NRI Event Timing**: The RemovePodSandbox NRI event is fired when the runtime initiates the CRI RemovePodSandbox operation, during final garbage collection.
**NRI Event Timing**: The RemovePodSandbox NRI event is fired by the runtime just prior to removing the pod from the pod list.

Member

The runtime doesn't initiate RemovePodSandbox, the Kubelet does.

Comment thread docs/pod-sandbox-lifecycle.md Outdated

Runtimes MUST guarantee the following ordering:

1. **RunPodSandbox** NRI event fires after successful CRI RunPodSandbox execution
Member

Suggested change
1. **RunPodSandbox** NRI event fires after successful CRI RunPodSandbox execution
1. **RunPodSandbox** NRI event fires after successful CRI RunPodSandbox execution, but before the pod is set to the "Ready" state

Comment thread docs/pod-sandbox-lifecycle.md Outdated
Runtimes MUST guarantee the following ordering:

1. **RunPodSandbox** NRI event fires after successful CRI RunPodSandbox execution
2. **StopPodSandbox** NRI event fires during CRI StopPodSandbox execution
Member

Suggested change
2. **StopPodSandbox** NRI event fires during CRI StopPodSandbox execution
2. **StopPodSandbox** NRI event fires during CRI StopPodSandbox execution, just prior to removing the runtime pod resources allocated by the runtime; such as network namespace, CNI networks, and image mounts

@aojea aojea force-pushed the pod_sandbox_hooks branch from 3c820b0 to 18b2b7b on April 23, 2026 08:04
@aojea
Contributor Author

aojea commented Apr 23, 2026

new commit trying to reconcile @mikebrow and @samuelkarp comments

Comment thread docs/pod-sandbox-lifecycle.md Outdated
@mikebrow
Member

pls squash :-)

Define the contract for the PodSandbox hooks for the  NRI plugins.

The Sandbox hooks are based on the CRI-API RPCs , since the OCI runtime
only specify the container lifecycle.

Co-authored-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: Antonio Ojea <aojea@google.com>
@aojea aojea force-pushed the pod_sandbox_hooks branch from b635dc2 to 9a74a7b on April 24, 2026 15:53
Member

@mikebrow mikebrow left a comment

LGTM


The pod sandbox lifecycle consists of three distinct phases, each with a corresponding NRI event that plugins can subscribe to:

1. **RunPodSandbox**: Fired during the the runtime CRI RunPodSandbox execution, after the PodSandbox is created but before setting the pod to running and then replying success to CRI RunPodSandbox request.

Suggested change
1. **RunPodSandbox**: Fired during the the runtime CRI RunPodSandbox execution, after the PodSandbox is created but before setting the pod to running and then replying success to CRI RunPodSandbox request.
1. **RunPodSandbox**: Fired during the runtime CRI RunPodSandbox execution, after the PodSandbox is created but before setting the pod to running and then replying success to CRI RunPodSandbox request.


**CRI Operation**: RunPodSandbox - Creates and starts a pod-level sandbox.

**NRI Event Timing**: The RunPodSandbox NRI event is fired after the runtime has successfully executed most of the CRI RunPodSandbox operation; NRI plugin execution is the final step before the sandbox reaches a "Ready" state. The Kubelet does not start workload containers until after the sandbox becomes "Ready".

Will the container runtime keep retrying, or fail on the first failure/timeout?

Member

On errors other than plugin timeout, the runtime will fail the pod creation request. On plugin timeout the plugin is kicked out by the runtime.


**NRI Event Timing**: The RunPodSandbox NRI event is fired after the runtime has successfully executed most of the CRI RunPodSandbox operation; NRI plugin execution is the final step before the sandbox reaches a "Ready" state. The Kubelet does not start workload containers until after the sandbox becomes "Ready".

### Sandbox State Contract

If an NRI plugin timed out, will it still receive the Stop event?

Member

@klihub klihub Apr 25, 2026

If an NRI plugin times out the runtime kicks it out by forcibly disconnecting the plugin.

- Network setup has been fully configured (network interfaces are up and assigned addressing)
- The pod IP address (if applicable) is assigned and available
- The "pause" container (if the runtime uses one) is running
- All prerequisite operations for workload container startup are complete, the pod is in the "unknown state" and will become "Ready" once the NRI event is processed. This guarantees the NRI plugin has a window to allocate resources for the pod before any workload containers are started.

Can an NRI plugin start its own processes in this sandbox?

Member

Not through NRI. NRI itself does not provide a means for this.


### Sandbox State Contract

When the runtime fires the RunPodSandbox NRI event, it guarantees:

let's say something about volumes and DRA devices in this list


Should DRA downward API be accessible?

3. **RemovePodSandbox** NRI event fires during CRI RemovePodSandbox execution
4. These events MUST fire in strict order: RunPodSandbox → StopPodSandbox → RemovePodSandbox
5. No workload containers will be started until after RunPodSandbox hook completes
6. All workload containers will be stopped before StopPodSandbox hook is called
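A plugin author who wants to check the strict Run → Stop → Remove ordering stated above could track the last event seen per sandbox. The sketch below is hypothetical (`Event`, `Tracker`, and `Observe` are not NRI names) and assumes the plugin was connected for the sandbox's whole lifecycle; it also treats a repeated event as a violation, which is an assumption of the sketch rather than something the document states.

```go
package main

import "fmt"

// Event is a hypothetical enumeration of the three pod sandbox NRI events.
type Event int

const (
	None Event = iota
	RunPodSandbox
	StopPodSandbox
	RemovePodSandbox
)

// Tracker remembers the last lifecycle event observed for each sandbox.
type Tracker struct {
	last map[string]Event
}

// NewTracker returns an empty Tracker.
func NewTracker() *Tracker {
	return &Tracker{last: make(map[string]Event)}
}

// Observe records ev for podID and returns an error if it does not directly
// follow the previously observed event in the Run → Stop → Remove order.
func (t *Tracker) Observe(podID string, ev Event) error {
	prev := t.last[podID] // None if the sandbox has not been seen yet
	if ev != prev+1 {
		return fmt.Errorf("sandbox %s: event %d after %d violates ordering", podID, ev, prev)
	}
	t.last[podID] = ev
	return nil
}

func main() {
	tr := NewTracker()
	fmt.Println(tr.Observe("pod-1", RunPodSandbox))    // in order
	fmt.Println(tr.Observe("pod-1", RemovePodSandbox)) // skips Stop: rejected
	fmt.Println(tr.Observe("pod-1", StopPodSandbox))   // in order
	fmt.Println(tr.Observe("pod-1", RemovePodSandbox)) // in order
}
```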

this contradicts the contract above. It was saying stopped or being stopped:

  • Workload containers within the sandbox are stopped or are stopping

Member

Nod, I was wondering why the "or" was there; I didn't get a chance to verify that.


- **RunPodSandbox**: Failure may result in pod creation failure
- **StopPodSandbox**: Non-blocking for subsequent operations; the plugin should not depend on completion of subsequent teardown
- **RemovePodSandbox**: Non-blocking; removal will proceed regardless of plugin timeout

Why are we not guaranteeing successful execution here? It may lead to a resource leak. With DRA we guarantee a successful unprepare call, so why shouldn't we do the same here?

Member

We have discussed MUST-run plugins vs. optional plugins. It is a good point that in this RM processing we also need to consider which of the plugins MUST complete, or else possibly cause leaks.


### Timeout Handling

All plugin processing must complete within the configured request timeout. Plugins should plan accordingly:

This statement is hard to parse as a spec. I suggest specifying that a timeout is treated as an error by the runtime.

On the teardown path, plugin errors MUST NOT prevent the operation from proceeding. Runtimes MUST ensure that a failing plugin cannot block pod or container teardown:

- **RunPodSandbox errors**: A plugin error may prevent the pod from being created, depending on runtime policy. Plugins bear responsibility for errors they return at this phase.
- **StopPodSandbox errors**: A plugin error MUST NOT prevent the sandbox from being stopped. The runtime MUST proceed with teardown regardless of plugin failures.

Why? Is there any alternative way for a plugin to handle cleanup if the runtime will not guarantee the call and a retry?

Member

One alternative: on sync of the list of pods, a plugin may be able to reconcile its internal list of pods with resources against the actual synced list.
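The reconciliation idea in this reply can be sketched as a set difference: anything the plugin still tracks that is absent from the synced list is a candidate for cleanup. `StaleSandboxes` and its parameters are hypothetical names; how a plugin actually receives the synced list is outside the scope of this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// StaleSandboxes returns the IDs the plugin still tracks that no longer
// appear in the list of sandboxes reported by the runtime on sync. The
// result is sorted so callers get a deterministic order.
func StaleSandboxes(tracked map[string]struct{}, synced []string) []string {
	live := make(map[string]struct{}, len(synced))
	for _, id := range synced {
		live[id] = struct{}{}
	}
	var stale []string
	for id := range tracked {
		if _, ok := live[id]; !ok {
			stale = append(stale, id)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	// The plugin allocated resources for three sandboxes, but missed the
	// Stop/Remove events for two of them.
	tracked := map[string]struct{}{"pod-a": {}, "pod-b": {}, "pod-c": {}}
	synced := []string{"pod-b"}

	for _, id := range StaleSandboxes(tracked, synced) {
		fmt.Println("releasing leaked resources for", id)
		delete(tracked, id)
	}
}
```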


### Multi-Plugin Coordination

When multiple plugins are active:

If a plugin didn't exist at sandbox creation and was notified about existing sandboxes on loading, what is the guarantee on consistency between that notification and the plugin's Stop/Remove calls?
