Implement support for PodSandboxStatus RPC #267
Conversation
This is a great thing to add, thanks @MikeZappa87 !
Force-pushed 5cae6e9 to b4bf04c
klihub left a comment:
I have a few questions about the chosen semantics. But it might be that I just lack the necessary context here...
Force-pushed 1b28406 to 6cc0259
Signed-off-by: Michael Zappa <michael.zappa@gmail.com>
Force-pushed 3690424 to f713eb0
if ipOwner != "" {
	return nil, fmt.Errorf("plugins %q and %q both tried to set PodSandboxStatus IP address field", ipOwner, plugin.name())
}
rsp.Ip = pluginRsp.Ip
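For context, the fragment above appears to come from the loop that merges per-plugin responses. Below is a minimal, self-contained sketch of that shape in Go; the names ipOwner, plugin.name(), pluginRsp, and rsp.Ip follow the fragment, while every other type and helper is a hypothetical stand-in for illustration, not the actual implementation in this PR.

package main

import "fmt"

// Hypothetical, simplified stand-ins for the real NRI types; only what is
// needed to illustrate the conflict check is included.
type PodSandboxStatusResponse struct {
	Ip string
}

type plugin struct {
	pluginName string
	ip         string // IP this plugin wants to report, if any
}

func (p *plugin) name() string { return p.pluginName }

// status stands in for the per-plugin PodSandboxStatus call.
func (p *plugin) status() *PodSandboxStatusResponse {
	return &PodSandboxStatusResponse{Ip: p.ip}
}

// podSandboxStatus merges per-plugin responses, rejecting the case where two
// plugins both try to set the IP address field.
func podSandboxStatus(plugins []*plugin) (*PodSandboxStatusResponse, error) {
	rsp := &PodSandboxStatusResponse{}
	ipOwner := ""
	for _, plugin := range plugins {
		pluginRsp := plugin.status()
		if pluginRsp.Ip == "" {
			continue
		}
		if ipOwner != "" {
			return nil, fmt.Errorf("plugins %q and %q both tried to set PodSandboxStatus IP address field",
				ipOwner, plugin.name())
		}
		ipOwner = plugin.name()
		rsp.Ip = pluginRsp.Ip
	}
	return rsp, nil
}

func main() {
	// Two plugins both reporting an IP triggers the conflict error.
	_, err := podSandboxStatus([]*plugin{
		{pluginName: "ipam-a", ip: "10.0.0.10"},
		{pluginName: "ipam-b", ip: "10.0.0.11"},
	})
	fmt.Println(err)
}

The point of the ipOwner bookkeeping is that the first plugin to set the IP becomes its owner, and any second writer produces the quoted error instead of silently overwriting the value.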
We already return the pod IPs (#119); do you need anything else from the Status?
Oh, ok, this is not for read, this is for update... This has a high risk of causing problems with the runtimes. Last time I checked, CRI-O and containerd had different hooks for setting the container IP relative to the NRI hook. First of all, I think we should define the contract between the NRI hooks and the runtimes: at what point each thing is executed. I added integration tests in containerd to ensure the currently defined order and lifecycle is respected and does not break (containerd/containerd#11331), but we should do the same exercise in CRI-O to avoid the runtimes drifting apart.
Are you talking about this? If you make any changes, those are not committed back to the internal store of containerd. I am not 100% following you here. Since the kubelet calls PodSandboxStatus, we need to hook into that RPC directly; otherwise the change needs to be made in either RunPodSandbox or the kubelet.
Having the kubelet use something internal, like reading a claim instead of PodSandboxStatus, would be an idea for sure. If we update RunPodSandbox, we would need to have that update the internal store, for example.
// PodSandboxStatus relays the corresponding CRI request to plugins.
// If a plugin returns IP addresses, those will be returned to the caller.
// An error is returned if multiple plugins attempt to set the same field.
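To make the quoted doc comment concrete from the plugin side, here is a rough, self-contained sketch of a plugin answering PodSandboxStatus with an IP it has assigned. The hook name, the request/response shapes, and the myPlugin type are assumptions for illustration; the real message definitions come from this PR's protobuf changes, and no such hook exists in released NRI.

package main

import (
	"context"
	"fmt"
)

// Hypothetical request/response shapes; the real messages are defined by this
// PR's protobuf changes and may differ.
type PodSandboxStatusRequest struct {
	PodSandboxId string
}

type PodSandboxStatusResponse struct {
	Ip string
}

// myPlugin sketches an NRI plugin that answers PodSandboxStatus with the pod
// IP it assigned out-of-band (for example, on a pre-existing interface).
type myPlugin struct {
	ips map[string]string // pod sandbox ID -> assigned IP
}

func (p *myPlugin) PodSandboxStatus(ctx context.Context, req *PodSandboxStatusRequest) (*PodSandboxStatusResponse, error) {
	ip, ok := p.ips[req.PodSandboxId]
	if !ok {
		// An empty response leaves the field to another plugin (or the CNI).
		return &PodSandboxStatusResponse{}, nil
	}
	return &PodSandboxStatusResponse{Ip: ip}, nil
}

func main() {
	p := &myPlugin{ips: map[string]string{"sandbox-1": "10.0.0.10"}}
	rsp, _ := p.PodSandboxStatus(context.Background(), &PodSandboxStatusRequest{PodSandboxId: "sandbox-1"})
	fmt.Println(rsp.Ip)
}

A plugin that has nothing to report would leave the field empty, which is what lets the adaptation layer detect when two plugins both try to claim it.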
This will be a nightmare to troubleshoot in production on Kubernetes clusters, since the surfaced error will most probably not indicate anything useful: the CNI will say it is correct, since it assigns the IP, and the NRI plugin will probably be opaque to the user... I think we first need to define a better authoritative model for IP assignment in the runtimes.
I think we have a couple of things here. If this PR merges, the CNI would be optional, so people would need to take that into consideration. I believe this concern applies across NRI as well: NRI can modify fields on the Pod and containers without the user knowing. The way I was going to implement this in containerd, at least, was to not allow this RPC to be modified if the CNI was enabled. That was just a thought to prevent these conditions; I'm open to all ideas here.
What are your thoughts on the authority for IP assignments?
Just some thoughts. If the NRI plugin can specify all necessary networking details for a pre-existing interface, then the container runtime could perhaps configure everything itself. But CNIs exist to set up an optimal (to them or the problem at hand) overlay network between nodes, so involving CNIs should perhaps not be removed from the equation altogether?
NRI can modify fields on the Pod and containers without the user knowing
Yeah, but those are properties of the application itself (rlimits, CPU) and assigned by the runtime, IIUC... The assigned IP is a property required to communicate with the rest of the world, and it is also assigned by another actor, so you are changing a convention in a way that will generate a lot of issues in production.
@MikeZappa87, this idea of a dedicated hook with explicit Pre- and PostSetupNetwork makes it easier to think about, since the runtime can choose:
if NRISetupNetwork {
	callNRINetworkPlugin()
} else if CNISetupNetwork {
	callCNI() // current state
} else {
	return error // no network plugin configured
}
This way there is no risk that someone assigns an IP and later realizes it is not being used.
if NRISetupNetwork {
	callNRINetworkPlugin()
} else if CNISetupNetwork {
	callCNI() // current state
} else {
	return error // no network plugin configured
}
The PoC works in this way. However, I debate whether I want to move this PR forward or not; it might be better to close it for now.
At the end of the day it is a containerd/CRI-O decision; I can only provide non-binding opinions here.
Thanks for doing this. I initiated this path with this in mind but unfortunately didn't have the necessary time to continue, so I'm super glad you are taking the initiative. I've added some comments from my previous research in this area.
A few years back we were investigating CNI lifecycle management in NRI by adding a set of CNI lifecycle hooks (https://github.com/containerd/nri/pull/57/changes#top). Here we added explicit Pre- and PostSetupNetwork, Pre- and PostNetworkDeleted, and NetworkConfigurationChanged messages to help with network lifecycle tracking. NRI plugins were able to alter the networking setup at will, and in this PoC quite wildly indeed. Although the messaging at the time was built around CNI, as we did not want to build an alternative CNI protocol, perhaps the idea of network lifecycle events can be of use here? There was also a quick demonstration/hack for the containerd of the day to provide a PoC for the whole setup (pfl/containerd@ecb9061). Our takeaway at the time was the lifecycle management, not the actual message format sent back and forth.

But back to the CNI/non-CNI issue. As most (all?) of the real-world CNIs seem to have network daemons running while getting configuration requests via CNI directory binaries, maybe there should be a revised protocol for contacting CNIs directly from container runtimes, with perhaps a fallback to the current CNI messaging if needed. Maybe the messaging could be redefined so that there is a distinction between choices already made by NRI plugins (IP, routing, network devices, etc.) and the ones not specified, which are left to the discretion of the CNI? With this I'm assuming the CNI daemons still want to set up an optimized overlay network needed for a particular cluster and its configuration?
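As a rough illustration of what such network lifecycle events could look like as a plugin interface, here is a sketch in Go. The hook names (Pre-/PostSetupNetwork, Pre-/PostNetworkDeleted, NetworkConfigurationChanged) are taken from the PoC referenced above; the signatures and the NetworkEvent payload are assumptions, not the PoC's actual API.

package nrinet

import "context"

// NetworkEvent is a hypothetical payload carrying whatever the runtime knows
// about the sandbox network at a given lifecycle point.
type NetworkEvent struct {
	PodSandboxId string
	Ips          []string
	// Interface, routing, and CNI-result details would go here.
}

// NetworkLifecyclePlugin lists the lifecycle hooks described above. The hook
// names follow the earlier PoC (containerd/nri#57); the signatures and the
// NetworkEvent type are illustrative assumptions, not the PoC's actual API.
type NetworkLifecyclePlugin interface {
	PreSetupNetwork(ctx context.Context, ev *NetworkEvent) (*NetworkEvent, error)
	PostSetupNetwork(ctx context.Context, ev *NetworkEvent) error
	PreNetworkDeleted(ctx context.Context, ev *NetworkEvent) error
	PostNetworkDeleted(ctx context.Context, ev *NetworkEvent) error
	NetworkConfigurationChanged(ctx context.Context, ev *NetworkEvent) error
}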
This PR is to add PodSandboxStatus support to NRI.
The reason we want to bring this RPC in now is that we want to begin the process of deprecating the CNI, and we need a way to return Pod IPs back to the kubelet. Without this we either need to come up with a new RPC outside of NRI or experiment directly in the kubelet; using NRI seems to be the best place right now. This is why the PodSandboxStatusResponse type only returns pod IP addresses; it is open to more fields being added.
I put together this branch in containerd that brings this NRI branch in and disables the CNI as a test bed.
containerd/containerd@main...MikeZappa87:containerd:mzappa/knext