General Technical Review: Confidential Containers - Incubation#2051
halcyondude wants to merge 2 commits into cncf:main
Signed-off-by: Matt Young <halcyondude@gmail.com>
fitzthum left a comment:
Made a few notes. Looks good generally.
* Describe how the project is handling certificate rotation and mitigating any issues with certificates.
**TODO (Maintainers):** Please describe the mechanisms for rotating the internal TLS/mTLS certificates used between Trustee, the CDH, and the Attestation Agent.
As mentioned below, the Trustee operator uses cert-manager for this. Users may also have their own approach or infrastructure, depending on how Trustee is tied into their network. Also, keep in mind that the KBS protocol is designed to be secure even without HTTPS.
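One reason the KBS protocol can remain secure without HTTPS is that the hardware evidence binds a fresh KBS-issued nonce to an ephemeral public key generated inside the TEE, and released resources are then encrypted to that key. The sketch below illustrates only the binding check. It assumes a hypothetical runtime-data layout (canonical JSON with `nonce` and `tee-pubkey` fields, hashed with SHA-384) and hypothetical function names; it is not Trustee's actual wire format, and verification of the hardware signature over the report is assumed to have already happened.

```python
import hashlib
import hmac
import json
import secrets


def runtime_data_digest(nonce: str, tee_pubkey: dict) -> bytes:
    """Hash the challenge nonce together with the TEE's ephemeral public key.

    Hypothetical layout: sorted-key JSON hashed with SHA-384. Trustee's real
    format may differ.
    """
    payload = json.dumps({"nonce": nonce, "tee-pubkey": tee_pubkey}, sort_keys=True)
    return hashlib.sha384(payload.encode()).digest()


def kbs_accepts(expected_nonce: str, claimed_pubkey: dict, report_data: bytes) -> bool:
    """Accept only if the (already signature-verified) report_data commits to
    this session's nonce and the public key the guest presented.

    A replayed report (old nonce) or a swapped key (man-in-the-middle) yields a
    different digest, so the check fails even without TLS.
    """
    expected = runtime_data_digest(expected_nonce, claimed_pubkey)
    return hmac.compare_digest(expected, report_data)


# The KBS issues a fresh nonce; the guest binds its ephemeral key to it.
nonce = secrets.token_urlsafe(32)
guest_key = {"kty": "EC", "crv": "P-256", "x": "guest-x", "y": "guest-y"}  # placeholder key
report_data = runtime_data_digest(nonce, guest_key)

attacker_key = {"kty": "EC", "crv": "P-256", "x": "evil-x", "y": "evil-y"}  # placeholder key
print(kbs_accepts(nonce, guest_key, report_data))          # True
print(kbs_accepts(nonce, attacker_key, report_data))       # False
print(kbs_accepts("stale-nonce", guest_key, report_data))  # False
```

Because the secret is encrypted to a key that only the attested TEE holds, an on-path observer without TLS sees ciphertext it cannot use.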
While we're here, there are several other places where we could talk about rotation. One is the attestation token: it is short-lived (usually 5 minutes), and the guest automatically tries to re-attest when it expires. Rotation of individual resources stored in the KBS is out of scope for Trustee and should be driven by the owner of those resources. For hardware evidence, revocation is platform-specific; refer to the certificate chain / collateral documentation for the various hardware platforms.
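The short-lived token behaviour described above can be pictured as a small guest-side cache that attests again once the token is expired or about to expire. This is a hypothetical sketch: `attest` stands in for the real attestation round-trip, the 5-minute lifetime comes from the comment above, and refreshing 30 seconds before expiry (rather than exactly at expiry) is an arbitrary assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Token:
    value: str
    expires_at: float  # Unix timestamp in seconds


class TokenCache:
    """Hypothetical guest-side cache that re-attests when the token expires."""

    def __init__(self, attest: Callable[[], Token], margin_s: float = 30.0,
                 clock: Callable[[], float] = time.time):
        self._attest = attest      # stand-in for the real attestation flow
        self._margin_s = margin_s  # assumption: refresh slightly before expiry
        self._clock = clock
        self._token: Optional[Token] = None

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._token.expires_at - self._margin_s:
            self._token = self._attest()  # missing or (nearly) expired: attest again
        return self._token.value


# Usage with a fake clock and a fake attestation function issuing 5-minute tokens.
fake_now = [0.0]
calls = []


def fake_attest() -> Token:
    calls.append(fake_now[0])
    return Token(value=f"token-{len(calls)}", expires_at=fake_now[0] + 300)


cache = TokenCache(fake_attest, clock=lambda: fake_now[0])
print(cache.get())  # token-1 (first attestation)
fake_now[0] = 200
print(cache.get())  # token-1 (still valid)
fake_now[0] = 280
print(cache.get())  # token-2 (within 30 s of expiry, so re-attested)
```

Keeping the token short-lived means a leaked token is only useful briefly, at the cost of periodic re-attestation.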
@fitzthum would you please point me to the docs you're referring to?
Which part? For hardware evidence, see something like the AMD VCEK specification, which describes using CRLs to check the AMD certificate chain. Other platforms have their own mechanisms. This sort of flow mainly sits below the CoCo project.
The Trustee operator's use of cert-manager is described in this blog post: https://confidentialcontainers.org/blog/2026/02/11/deploy-trustee-in-kubernetes/
| [JDCloud](https://www.jdcloud.com) | JoyScale | Beta | End-User / Service Provider | JoyScale leverages CoCo to protect AI data privacy in the company's business processes and for its end users. (For details: huoqifeng1@jd.com) |
| [Kubermatic](https://www.kubermatic.com/) | KubeOne | Beta | Service Provider / Consultancy | Running confidential containers on bare-metal KubeOne clusters. |
**TODO (Maintainers):** Please provide a brief summary or links to any additional adopter interviews, user surveys, or formal UX research (if any) conducted during the Sandbox phase.
Some additional adopters were shared with the TOC. These are not listed here due to privacy concerns or pending internal approvals.
Thanks, and understood regarding adopters. Who's a good person to follow up with regarding surveys, UX research, etc.?
* How can a rollout or rollback fail? Describe any impact to already running workloads.
**TODO (Maintainers):** Describe any specific failure modes during upgrades/downgrades. For instance, do existing VMs keep running if the host-level `kata-shim` or `containerd` drops connection? Are there state-migration issues with Trustee CRDs during a rollback?
OK, so similar to the main project, Trustee is planning to deprecate its operator in favor of a Helm chart? I personally think this is great news, since it replaces a layer of operational complexity with a simpler solution. If you could provide more details or links to resources, I'll update this accordingly. Thanks!
It's not quite as straightforward as replacing the Trustee operator with the Helm chart, although this is one potential outcome. The Helm chart will be designed a bit differently from the Trustee operator so that it is better suited to running Trustee inside confidential containers itself. It may take the place of the Trustee operator, but since the operator is currently used in some sophisticated production environments, we're not going to rush that. It's possible that we will end up with two options suited to different applications, although this isn't ideal for maintenance. We will see what makes the most sense down the road.
Anyway, there is some discussion about this here, as well as a PR to add a Helm chart to Trustee.
* Describe how the project is following and implementing [secure software supply chain best practices](https://project.linuxfoundation.org/hubfs/CNCF_SSCP_v1.pdf)
The project has achieved SLSA Build Level 2 (see [blog](https://confidentialcontainers.org/blog/2025/02/17/confidential-containerscoco-and-supply-chain-levels-for-software-artifacts-slsa)), automatically generating signed provenance in `in-toto` format via GitHub Actions for components like `kata-containers`, `guest-components`, and `cloud-api-adaptor`.
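For readers unfamiliar with the provenance format: an in-toto Statement lists the artifacts it covers in a `subject` array, each with a digest, and a consumer's first check is that the downloaded artifact matches one of those digests. The sketch below covers only that digest-binding step, using a hand-built statement; the artifact name and contents are placeholders, and the predicate body is left empty. Real verifiers (tools such as `slsa-verifier` or `gh attestation verify`) also check the signature and builder identity, which this example deliberately omits.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def subject_matches(statement: dict, name: str, artifact: bytes) -> bool:
    """Return True if `artifact` matches a named subject digest in the statement.

    Only the digest binding is checked; signature and builder-identity
    verification are out of scope for this sketch.
    """
    if statement.get("_type") != "https://in-toto.io/Statement/v1":
        return False
    digest = sha256_hex(artifact)
    return any(
        s.get("name") == name and s.get("digest", {}).get("sha256") == digest
        for s in statement.get("subject", [])
    )


artifact = b"example release tarball contents"  # placeholder artifact bytes
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [{"name": "example-artifact.tar.gz",
                 "digest": {"sha256": sha256_hex(artifact)}}],
    "predicateType": "https://slsa.dev/provenance/v1",
    "predicate": {},  # build definition and run details omitted
}

print(json.dumps(statement["subject"], indent=2))
print(subject_matches(statement, "example-artifact.tar.gz", artifact))         # True
print(subject_matches(statement, "example-artifact.tar.gz", artifact + b"!"))  # False
```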
It's also worth noting that supply chain security is itself a use case in CoCo's orbit. Artifacts and reference values are very important in confidential computing. Ultimately, we would like to build Confidential Containers itself inside confidential containers.
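As a small illustration of why reference values matter: an attestation service ultimately compares measurements reported in hardware evidence against values published for known-good artifacts. The sketch below is a hypothetical, simplified check with made-up claim names and placeholder digests; it is not Trustee's Reference Value Provider Service or its policy engine.

```python
from typing import Dict, List


def check_reference_values(evidence: Dict[str, str],
                           reference_values: Dict[str, List[str]]) -> List[str]:
    """Return the names of claims that match none of their accepted values.

    Each claim maps to a list of accepted digests, so several known-good
    builds can be allowed at once (for example during a rollout).
    A claim missing from the evidence counts as a mismatch.
    """
    failures = []
    for claim, accepted in reference_values.items():
        if evidence.get(claim) not in accepted:
            failures.append(claim)
    return failures


# Hypothetical claim names and short placeholder digests.
reference_values = {
    "launch_measurement": ["aa11", "bb22"],  # two accepted guest image builds
    "kernel_cmdline_hash": ["cc33"],
}
good = {"launch_measurement": "bb22", "kernel_cmdline_hash": "cc33"}
tampered = {"launch_measurement": "ff99", "kernel_cmdline_hash": "cc33"}

print(check_reference_values(good, reference_values))      # []
print(check_reference_values(tampered, reference_values))  # ['launch_measurement']
```

Reference values are only as trustworthy as the build that produced them, which is why provenance for the artifacts themselves matters.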
Having reviewed some of the CI workflows and associated docs, I think it would be worthwhile (as a suggestion) for the project to post a follow-up to the blog from around this time last year (https://confidentialcontainers.org/blog/2025/02/17/confidential-containerscoco-and-supply-chain-levels-for-software-artifacts-slsa). It could cover what the project has done in this domain in the past year leading up to its Incubation application. It could also serve as a valuable case study (and "working example") of release processes for other projects, describing how CoCo has hardened its build and release pipelines, how it produces artifacts, its SLSA work, etc.
There's an initiative in TAG Operational Resilience focused on curating resources and examples like these, and I'm sure it would welcome connecting with the project (#1849, attn: @krol3).
TODO: add link to https://confidentialcontainers.org/docs/use-cases/supply-chain/
* Describe the project’s resource requirements, including CPU, Network and Memory.
Worker nodes require virtualization support and a recommended minimum of 8 GB RAM and 4 CPUs to accommodate the hypervisor/Kata overhead.
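A quick pre-flight check against these minimums could look like the sketch below. It is hypothetical: the thresholds come from the sentence above (treating 8 GB as 8 GiB for simplicity), and it only checks for generic hardware virtualization (the `vmx` flag for Intel VT-x or `svm` for AMD-V in `/proc/cpuinfo`). Confidential-computing support is platform-specific and is not checked here. Note that `MemTotal` in `/proc/meminfo` is usually slightly below installed RAM, so a real check might allow some slack.

```python
import os
from typing import List, Set, Tuple

# Recommended minimums quoted above (8 GB treated as 8 GiB for simplicity).
MIN_CPUS = 4
MIN_MEM_GIB = 8


def preflight(cpu_count: int, mem_gib: float, cpu_flags: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means the node looks suitable.

    Only generic virtualization support (vmx/svm) is checked; TEE support
    such as SEV-SNP or TDX is platform-specific and not covered.
    """
    problems = []
    if cpu_count < MIN_CPUS:
        problems.append(f"only {cpu_count} CPUs (< {MIN_CPUS})")
    if mem_gib < MIN_MEM_GIB:
        problems.append(f"only {mem_gib:.1f} GiB RAM (< {MIN_MEM_GIB})")
    if not cpu_flags & {"vmx", "svm"}:
        problems.append("no hardware virtualization flag (vmx/svm)")
    return problems


def read_linux_node() -> Tuple[int, float, Set[str]]:
    """Gather preflight inputs from /proc on a Linux host (Linux-only helper)."""
    flags: Set[str] = set()
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                flags.update(line.split(":", 1)[1].split())
                break  # flags are normally identical across CPUs
    with open("/proc/meminfo") as f:
        mem_kib = next(int(line.split()[1]) for line in f if line.startswith("MemTotal:"))
    return os.cpu_count() or 0, mem_kib / (1024 * 1024), flags


print(preflight(8, 16.0, {"fpu", "svm"}))  # []
print(preflight(2, 4.0, {"fpu"}))
```

On a real Linux node, `preflight(*read_linux_node())` would run the same checks against the live host.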
Note that your worker node should also have confidential computing support unless you are using the dev/test runtime.
+1, I'll also link to https://confidentialcontainers.org/docs/getting-started/prerequisites/hardware/
This PR contains the General Technical Review for the Confidential Containers project, following the template (general-technical-questions.md), covering Day 0 and Day 1 questions for Incubation:
"human-friendly" reading link:
https://github.com/halcyondude/toc/blob/my-coco-incubation-tech-review/projects/confidential-containers/tech-review/2026-02-24-gtr-coco-incubation.md
There are a few questions remaining (marked with TODO) where input from project maintainers would be appreciated. Marking as a draft PR to solicit feedback from the TOC and Project Reviews Community.
Further resources:
Feedback heartily welcomed!
Related-to: #1504
Resolves: #2032