
doc: Add Kubeflow tech review snapshot#2180

Open
christian-heusel wants to merge 1 commit into cncf:main from christian-heusel:doc/kubeflow-add-gtr-snapshot

@christian-heusel


In the process of graduating Kubeflow as an official CNCF project, it is common practice to add a snapshot of the general technical review (GTR) document to the TOC repo.

Ref: #1861
Ref: #2117
Fixes: kubeflow/community#964


cc @kfaseela @andreyvelich

Note: The document was added as a direct copy of https://github.com/kubeflow/community/blob/master/KUBEFLOW-GENERAL-TECHNICAL-REVIEW.md. If any modifications are desired before it is added here, just let me know!

@christian-heusel requested a review from a team as a code owner May 29, 2026 17:57
@github-actions (bot) added needs-triage Indicates an issue or PR that has not been triaged yet (has a 'triage/foo' label applied) needs-kind Indicates an issue or PR that is missing an issue type or kind (a kind/foo label) labels May 29, 2026
@github-actions (bot) added the needs-group Indicates an issue or PR that has not been assigned a group (toc or tag/foo label applied) label May 29, 2026
@christian-heusel (Author)

Anything left to do here from the Kubeflow side? 🤗

@kfaseela (Contributor) commented Jun 9, 2026

@christian-heusel: thanks for creating the PR! @brandtkeller and I have yet to review this, as we are currently busy with the adopter interviews. We will get to it soon and give you feedback :)

Comment thread projects/kubeflow/tech-review/2026-06-18.md
Comment thread projects/kubeflow/tech-review/2026-04-24.md Outdated
In the process of graduating Kubeflow as an official CNCF project it is
common practice to add a snapshot of the general technical review
document to the toc repo.

Ref: cncf#1861
Ref: cncf#2117
Fixes: kubeflow/community#964
Signed-off-by: Christian Heusel <christian@heusel.eu>
@christian-heusel force-pushed the doc/kubeflow-add-gtr-snapshot branch from 8fcc1fa to 0623298 June 18, 2026 16:24
Comment thread projects/kubeflow/tech-review/2026-06-18.md
@joshgav added the review/tech Project Tech Review label Jun 25, 2026
@github-project-automation (bot) moved this to New - Pending Review in Project Reviews Jun 25, 2026
@joshgav added sub/project-reviews TOC Project Review Subproject kind/review Item related to a governance, tech, or other review labels Jun 25, 2026
@joshgav moved this from New - Pending Review to In Progress in Project Reviews Jun 25, 2026

@brandtkeller (Member) left a comment

Day 0 review (Day 1/2 review is WIP)


For more information, check the ROADMAP for each Kubeflow project:

- [Kubeflow Spark Operator](https://github.com/kubeflow/spark-operator/blob/master/ROADMAP.md)

@brandtkeller (Member):

Not blocking: worth noting that a number of these roadmaps are dated by year, and in those cases they appear to be out of date.

- [Kubeflow Notebooks](https://github.com/kubeflow/notebooks/blob/main/ROADMAP.md)

Community-wide changes are proposed as [Kubeflow Enhancement proposals (KEPs)](https://github.com/kubeflow/community/tree/master/proposals)
in the `kubeflow/community` repository or in the [Kubeflow sub-projects KEPs](https://github.com/kubeflow/trainer/tree/master/docs/proposals).

@brandtkeller (Member):

404 link


#### Explain which use cases have been identified as unsupported by the project

As Kubeflow is composed of multiple projects, each working group makes its own determinations as t

@brandtkeller (Member):

typo - perhaps:

Suggested change
As Kubeflow is composed of multiple projects, each working group makes its own determinations as t
As Kubeflow is composed of multiple projects, each working group makes its own determinations as to

- The projects are deployed in any Kubernetes (each release will specify tested versions),
regardless of the underlying infrastructure, independently through Kubernetes manifests leveraging
Kustomize and/or Helm Charts. However, the project doesn’t provide an implementation to be deployed
on infrastructure besides Kubernetes. - We do not officially enforce a deployment method or distribution.

@brandtkeller (Member):

Is this last bullet meant to be included in the surrounding bullet?


- Kubeflow doesn’t provide a GitOps implementation, however Kubeflow manifests can be integrated
into a GitOps solution. For example, Platform Engineers can create an ArgoCD Application (CRD)
to install and configure Kubeflow projects. by providing Kubeflow individual project manifests,

@brandtkeller (Member):

Suggested change
to install and configure Kubeflow projects. by providing Kubeflow individual project manifests,
to install and configure Kubeflow projects by providing Kubeflow individual project manifests,
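The ArgoCD integration described in the quoted text could look roughly like the following sketch, written in Python with only the standard library. It builds an Argo CD `Application` manifest as a dict and prints it as JSON, which `kubectl apply -f -` accepts. Only the `kubeflow/manifests` repository URL and the standard `argoproj.io/v1alpha1` `Application` fields are real; the application name, `manifests_path`, the `argocd` install namespace, and the `kubeflow` destination namespace are assumptions or hypothetical placeholders, not values taken from the Kubeflow docs.

```python
import json


def kubeflow_argocd_application(
    name: str,
    manifests_path: str,
    target_revision: str = "HEAD",
    destination_namespace: str = "kubeflow",
) -> dict:
    """Build an Argo CD Application that syncs one Kubeflow project's manifests.

    `manifests_path` is a hypothetical placeholder: point it at the Kustomize
    directory for the project you want to install.
    """
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        # Assumption: Argo CD is installed in the conventional "argocd" namespace.
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": "https://github.com/kubeflow/manifests",
                "targetRevision": target_revision,
                "path": manifests_path,
            },
            "destination": {
                # The cluster Argo CD itself runs in.
                "server": "https://kubernetes.default.svc",
                "namespace": destination_namespace,
            },
            # Automated sync: prune resources removed from Git, revert manual drift.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


if __name__ == "__main__":
    app = kubeflow_argocd_application(
        name="kubeflow-example",  # hypothetical application name
        manifests_path="path/to/project/overlay",  # hypothetical path
    )
    # kubectl accepts JSON as well as YAML, e.g. `python app.py | kubectl apply -f -`.
    print(json.dumps(app, indent=2))
```

One Application per Kubeflow project keeps each project independently syncable, which matches the document's point that projects are deployed individually rather than through a single mandated distribution.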

Comment on lines +162 to +163
- [Kubeflow 2025 Survey](https://docs.google.com/forms/d/11cSe5vmGLrGekJISHBMfjVh_97WFGuhcvGnd0l5aNLg/edit#responses)
- [2025:UX designers supporting Model Registry conducted a series of user sessions to understand preferred interaction patterns (link)](https://docs.google.com/forms/d/11cSe5vmGLrGekJISHBMfjVh_97WFGuhcvGnd0l5aNLg/edit#responses)

@brandtkeller (Member):

Not blocking: these two links point to the survey form itself, and I cannot see the responses.


#### Describe the user experience (UX) and user interface (UI) of the project

Kubeflow user experience in each project is a collection of projects, the user experience for the

@brandtkeller (Member):

Kubeflow user experience in each project is a collection of projects

Is this accurate?


<div style="text-align: center;">
<img
src="https://raw.githubusercontent.com/kubeflow/sdk/main/docs/images/persona_diagram.svg"

@brandtkeller (Member):

404 link

end users and vendors. However, we aim to provide a strong foundation through reference architectures
similar things from which to build on.

#### Describe the project’s High Availability requirements

@brandtkeller (Member):

I don't think this reference is accurate anymore, but regardless, this may be an area for improvement in future iterations. I imagine each project has more specific high-availability requirements to consider than simply adjusting replicas?

@github-actions (bot) removed needs-kind Indicates an issue or PR that is missing an issue type or kind (a kind/foo label) needs-group Indicates an issue or PR that has not been assigned a group (toc or tag/foo label applied) labels Jun 29, 2026

@brandtkeller (Member) left a comment

More feedback; consider none of this blocking.


#### How can this project be enabled or disabled in a live cluster? Please describe any downtime required of the control plane or nodes

Users can set the replica count to 0 in the Kubeflow projects deployment. Existing AI workloads

@brandtkeller (Member):

Seems functional, but maybe not a pragmatic choice for handling "disabling". Does this offer any benefits over more declarative abstractions (why not remove the component on disable instead of scaling it to zero)?
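To make the two options in this thread concrete, here is a hedged sketch in Python (standard library only) that builds the `kubectl` command for each: scaling a controller Deployment to zero replicas, as the quoted text describes, versus deleting everything rendered from the project's Kustomize directory, as the review suggests. The Deployment name, namespace, and manifest directory are hypothetical; commands are returned as argument lists and only executed when `run=True`.

```python
import subprocess


def scale_to_zero_cmd(deployment: str, namespace: str) -> list[str]:
    """Disable by scaling: every applied object stays; no controller pods run."""
    return ["kubectl", "scale", f"deployment/{deployment}",
            "--replicas=0", "-n", namespace]


def remove_cmd(manifest_dir: str) -> list[str]:
    """Disable by removal: delete every object rendered by the Kustomize directory."""
    return ["kubectl", "delete", "-k", manifest_dir]


def disable(deployment: str, namespace: str, manifest_dir: str,
            remove: bool = False, run: bool = False) -> list[str]:
    """Return (and optionally execute) the command for the chosen strategy."""
    if remove:
        cmd = remove_cmd(manifest_dir)
    else:
        cmd = scale_to_zero_cmd(deployment, namespace)
    if run:
        # Raises subprocess.CalledProcessError if kubectl exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical names; substitute the real controller Deployment and overlay.
    print(" ".join(disable("example-controller", "kubeflow", "path/to/overlay")))
    print(" ".join(disable("example-controller", "kubeflow", "path/to/overlay",
                           remove=True)))
```

One trade-off worth spelling out: if the Kustomize directory includes the project's CRDs, `kubectl delete -k` also deletes those CRDs, and Kubernetes then removes every custom resource of those kinds. Scaling to zero keeps users' resources in place but leaves them unreconciled until the controller is scaled back up.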


#### Explain how upgrades and rollbacks were tested and how the upgrade->downgrade->upgrade path was tested

Currently, it’s being manually tested by users, but automated tests are work in progress.

@brandtkeller (Member):

References would be great.


#### Describe the increase in resource usage in any components as a result of enabling this project, to include CPU, Memory, Storage, Throughput

Resources requirements for Kubeflow projects [are set here](https://github.com/kubeflow/manifests/pull/3091#issuecomment-3016609243).

@brandtkeller (Member):

The table highlights the maximum usage (as I understood it), whereas the comment also captured actual usage. It may be worth distinguishing between the minimum resource usage to expect and the maximum, for planning on both ends.


#### Describe how the project surfaces project resource requirements for adopters to monitor cloud and infrastructure costs, e.g. FinOps That must happen on the Kubernetes namespace level

Users are recommended to use third-party tools like Kubecost to measure cloud and infrastructure

@brandtkeller (Member):

Is this an implicit or explicit recommendation? (I did not see a docs reference.) I think it's fine to defer this to other processes and metrics.


#### How can an operator determine if the project is in use by workloads

- Check the Pods in `kubeflow-profil`e labeled namespaces.

@brandtkeller (Member):

Suggested change
- Check the Pods in `kubeflow-profil`e labeled namespaces.
- Check the Pods in `kubeflow-profile` labeled namespaces.

- Check the CRDs in user’s namespaces
- Check the Kubeflow Dashboard resources.

#### How can someone using this project know that it is working for his instance

@brandtkeller (Member):

Suggested change
#### How can someone using this project know that it is working for his instance
#### How can someone using this project know that it is working for their instance
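For the operator check quoted in this review ("Check the Pods in `kubeflow-profile` labeled namespaces"), a minimal sketch in Python (standard library only) could count active pods per profile namespace from `kubectl ... -o json` output. The label selector `app.kubernetes.io/part-of=kubeflow-profile` is an assumption about how profile namespaces are labeled and should be verified against the actual install; the sketch covers only that first bullet, not the CRD or Dashboard checks.

```python
import json
import subprocess

# Assumption: profile namespaces carry this label; verify against your install.
PROFILE_SELECTOR = "app.kubernetes.io/part-of=kubeflow-profile"


def kubectl_json(args: list[str]) -> dict:
    """Run a kubectl command with `-o json` and parse its output."""
    out = subprocess.run(["kubectl", *args, "-o", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)


def active_pods_by_profile(namespaces: dict, pods: dict) -> dict[str, int]:
    """Count Running or Pending pods in each profile namespace.

    `namespaces` and `pods` are the parsed `-o json` List objects.
    Namespaces with no active pods are reported with a count of 0.
    """
    counts = {ns["metadata"]["name"]: 0 for ns in namespaces.get("items", [])}
    for pod in pods.get("items", []):
        ns = pod["metadata"].get("namespace")
        phase = pod.get("status", {}).get("phase")
        if ns in counts and phase in ("Running", "Pending"):
            counts[ns] += 1
    return counts


if __name__ == "__main__":
    namespaces = kubectl_json(["get", "namespaces", "-l", PROFILE_SELECTOR])
    pods = kubectl_json(["get", "pods", "--all-namespaces"])
    for ns, n in sorted(active_pods_by_profile(namespaces, pods).items()):
        print(f"{ns}: {n} active pod(s)")
```

Keeping the counting logic separate from the `kubectl` calls makes it easy to run the same check against saved JSON, for example in a scheduled job that alerts when every profile namespace reports zero active pods.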

@brandtkeller (Member) left a comment

Realized the document is missing a section.

Each Kubeflow project handles failure modes differently beyond native Kubernetes fault tolerance.
Many of them are configured at the application level in user code.

### Security

@brandtkeller (Member):

Suggested change
### Security
### Compliance
* What steps does the project take to ensure that all third-party code and components have correct and complete attribution and license notices?
* Describe how the project ensures alignment with CNCF [recommendations](https://github.com/cncf/foundation/blob/main/policies-guidance/recommendations-for-attribution.md) for attribution notices.
<!--Note that each question describes a use case covered by the referenced policy document.-->
* How are notices managed for third-party code incorporated directly into the project's source files?
* How are notices retained for unmodified third-party components included within the project's repository?
* How are notices for all dependencies obtained at build time included in the project's distributed build artifacts (e.g. compiled binaries, container images)?
### Security

The compliance section is missing from this document.

angellk added a commit to angellk/cncf-toc that referenced this pull request Jun 30, 2026
DD for k8gb Sandbox → Incubation (cncf#1472).

Primary DD: @TheFoxAtWork and @ricardorocha
Adopter interviews: @angellk and @kgamanji

- Tech review: Satisfactory (Kashif Khan, TAG Infrastructure, 30-Jan-2026)
- Governance review: Satisfactory (joshgav, 21-Jan-2026)
- Security: Self-assessment complete, OpenSSF passing
- Adopter verification: 3 interviews across 3 orgs, 3 geographies
  (financial services x2, managed services x1). 220+ clusters combined.
- Must-fix resolved: API group renamed to k8gb.io/v1beta1 (cncf#2180)

Ref: cncf#1472
Signed-off-by: Karena Angell <karena.angell@gmail.com>

Labels

- kind/review: Item related to a governance, tech, or other review
- needs-triage: Indicates an issue or PR that has not been triaged yet (has a 'triage/foo' label applied)
- review/tech: Project Tech Review
- sub/project-reviews: TOC Project Review Subproject

Projects

Status: New
Status: In Progress
Status: No status
Status: No status
Status: No status

Development

Successfully merging this pull request may close these issues.

[CNCF Graduation] Submit Kubeflow GTR snapshot to cncf/toc

6 participants