fix(canopy): run proxy as a native sidecar so canopy Jobs complete#86
Merged
The canopy-proxy was added as a plain container in the restore and snapshot-list Jobs. In a Job Pod (restartPolicy: Never) the kubelet doesn't stop still-running containers when another exits, so once the main kopia container finished, the proxy kept serving and the Pod never reached a terminal phase. The Job sat Active until activeDeadlineSeconds fired -> DeadlineExceeded -> Failed, even though the snapshot-list callback had already been POSTed successfully. Confirmed live: the failed Job showed reason=DeadlineExceeded with both containers plain and no init containers.

Move the proxy to a native sidecar (init container with restartPolicy: Always). The kubelet keeps it running alongside the main container and SIGTERMs it once the main container exits, so the Pod completes on the main container's exit code. Requires k8s >= 1.29, where the SidecarContainers feature gate is enabled by default (beta; GA in 1.33); the target cluster is 1.34.

Also fix the proxy to wait on SIGTERM as well as SIGINT: tokio's ctrl_c() only catches SIGINT, so under k8s the proxy would have hung until SIGKILL and lost its final traffic stats. Add a 30s terminationGracePeriod so the sidecar can flush stats on SIGTERM.
🤖 The canopy-proxy was added as a plain container in the restore and
snapshot-list Jobs. In a Job Pod (restartPolicy: Never) the kubelet
doesn't stop still-running containers when another exits, so once the
main kopia container finished, the proxy kept serving and the Pod never
reached a terminal phase. The Job sat Active until activeDeadlineSeconds
fired → DeadlineExceeded → Failed, even though the snapshot-list callback
had already been POSTed successfully.
Confirmed on the live cluster: the failed canopy-replica-snapshot-list
Job showed reason=DeadlineExceeded, both containers plain, no init
containers.
Fix
Move the proxy to a native sidecar, i.e. an init container with
restartPolicy: Always. The kubelet keeps it running alongside the main
container and SIGTERMs it once the main container exits, so the Pod
completes on the main container's exit code. Needs k8s ≥ 1.29, where the
SidecarContainers feature gate is enabled by default (beta; GA in 1.33);
target cluster is 1.34.
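The difference between the two layouts can be illustrated with a toy model of how a restartPolicy: Never Pod reaches a terminal phase. This is only a sketch of the reasoning above, not kubelet code: `State`, `Container`, `PodPhase` and `pod_phase` are hypothetical names invented for the example, and the rule is deliberately simplified (regular containers decide the phase, native sidecars are ignored because the kubelet stops them once the regular containers have exited).

```rust
/// Simplified container state, roughly what the kubelet would report.
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Running,
    Exited(i32),
}

/// Hypothetical stand-in for one container in a Job Pod.
#[derive(Debug, Clone, Copy)]
struct Container {
    state: State,
    /// true for an init container with restartPolicy: Always.
    native_sidecar: bool,
}

#[derive(Debug, PartialEq)]
enum PodPhase {
    Running,
    Succeeded,
    Failed,
}

/// Toy phase rule for a Pod with restartPolicy: Never (an assumption-laden
/// simplification). Only regular containers are consulted: a native sidecar
/// is stopped by the kubelet after they exit, so it never holds the Pod open.
fn pod_phase(containers: &[Container]) -> PodPhase {
    let mut any_failed = false;
    for c in containers.iter().filter(|c| !c.native_sidecar) {
        match c.state {
            // A regular container that is still running keeps the Pod Running.
            State::Running => return PodPhase::Running,
            State::Exited(0) => {}
            State::Exited(_) => any_failed = true,
        }
    }
    if any_failed {
        PodPhase::Failed
    } else {
        PodPhase::Succeeded
    }
}
```

In this model, a finished kopia container next to a still-running plain proxy yields `PodPhase::Running` (the Job then waits for activeDeadlineSeconds), while the same proxy marked as a native sidecar yields `PodPhase::Succeeded`.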
Two supporting changes:
- Wait on SIGTERM as well as SIGINT in the proxy. tokio's ctrl_c() only
  catches SIGINT, so under k8s the proxy would have hung until SIGKILL and
  lost its final traffic stats.
- Add a 30s terminationGracePeriod so the sidecar can flush its stats
  callback on SIGTERM.
Regression test
canopy_restore_job_proxy_is_native_sidecar asserts the proxy is an init
container with restartPolicy=Always and not a plain container.
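The shape of that assertion might look roughly like the sketch below. It is not the actual test: `ContainerSpec`, `PodSpec` and `is_native_sidecar` are hypothetical, trimmed-down stand-ins for whatever Pod spec types the real code builds, and the container names used with it are assumed for illustration.

```rust
/// Hypothetical, trimmed-down view of one container entry in a Pod spec.
struct ContainerSpec {
    name: &'static str,
    /// Container-level restartPolicy; only meaningful on init containers.
    restart_policy: Option<&'static str>,
}

/// Hypothetical, trimmed-down Pod spec with just the two container lists.
struct PodSpec {
    init_containers: Vec<ContainerSpec>,
    containers: Vec<ContainerSpec>,
}

/// True when `name` is listed as an init container with restartPolicy=Always
/// and is not also listed among the plain containers.
fn is_native_sidecar(spec: &PodSpec, name: &str) -> bool {
    let as_sidecar = spec
        .init_containers
        .iter()
        .any(|c| c.name == name && c.restart_policy == Some("Always"));
    let as_plain = spec.containers.iter().any(|c| c.name == name);
    as_sidecar && !as_plain
}
```

Checking both conditions matters: a proxy that is listed as an init container but is also left in the plain container list would still keep the Pod from completing.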