diff --git a/projects/karmada/tech-review/2026-04-21.md b/projects/karmada/tech-review/2026-04-21.md new file mode 100644 index 000000000..15e567b62 --- /dev/null +++ b/projects/karmada/tech-review/2026-04-21.md @@ -0,0 +1,845 @@ +# General Technical Review - Karmada / Incubation + +- **Project:** Karmada +- **Project Version:** v1.17.1 +- **Website:** https://karmada.io/ +- **Date Updated:** 2026-04-30 +- **Template Version:** v1.0 +- **Template Link:** [cncf/toc/general-technical-questions.md](https://github.com/cncf/toc/blob/42ee695dd281dd913400d242416c75010dc29444/toc_subprojects/project-reviews-subproject/general-technical-questions.md) +- **Description:** Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration + +## Overview +This document provides a comprehensive technical review of the Karmada project as part of its CNCF Graduation process. It is intended to give reviewers, contributors, and adopters a clear understanding of Karmada’s +architecture, design principles, security posture, installation and operational procedures, and its alignment with cloud native best practices. + +The review is structured to follow the typical lifecycle of a cloud native project, covering planning, installation, security, enablement, upgrade, and operational considerations. +It addresses both technical and organizational aspects, including user personas, use cases, compliance, and governance. The goal is to ensure transparency, highlight strengths, and identify any areas for improvement as Karmada progresses through the CNCF Graduation process. + +## Day 0 - Planning Phase + +### Scope + +#### Describe the roadmap process, how scope is determined for mid to long term features, as well as how the roadmap maps back to current contributions and maintainer ladder? + +The Karmada roadmap is guided by the project’s vision and strategic goals for open, multi-cloud, multi-cluster Kubernetes orchestration. 
The roadmap is collaboratively defined by the maintainers and the broader community, and is regularly updated to reflect upcoming releases and long-term objectives. See the [Karmada Roadmap](https://github.com/karmada-io/community/blob/main/ROADMAP.md) for details. + +- **Feature Gathering:** + - Feature requests and proposals are collected from: + - GitHub issues and proposals submitted by contributors + - Feedback from users during community meetings + - Input from the [Karmada Adopter Group](https://github.com/karmada-io/community/tree/main/adopter-group) + - Community-driven planning sessions +- **Scope Determination:** + - **Mid-term:** Features are prioritized based on immediate impact and feasibility. + - **Long-term:** Direction is set by technical leadership and alignment with the broader cloud native ecosystem. +- **Mapping to Contributions:** + - Issues are tagged with milestone labels and tracked on release project boards to ensure alignment with the roadmap. +- **Mapping to Maintainer Ladder:** + - The maintainer ladder is used to assign tasks and recognize contributors, driving progress and encouraging community participation. + +#### Describe the target persona or user(s) for the project? + +- **Enterprise IT Managers/Architects:** + - Manage multiple Kubernetes clusters across different data centers or hybrid cloud environments efficiently. + - Simplify cluster management to ensure high availability and optimize resource utilization. + - Design scalable architectures to accommodate business growth and increasing workloads. + - Ensure seamless integration and coordination of applications across multiple clusters. +- **Cloud Service Providers:** + - Enhance multi-cluster management capabilities for Kubernetes-as-a-Service offerings. + - Differentiate services by providing advanced cross-cluster features such as application deployment, cluster failover, and resource sharing. 
+- **Application Developers and DevOps Engineers:**
+  - Simplify the process of deploying applications across multiple Kubernetes clusters.
+  - Integrate Karmada into CI/CD pipelines for cross-cluster application deployment and management.
+  - Ensure consistent application deployment, perform rolling updates, and manage application lifecycles across multiple clusters.
+- **AI Infrastructure Platform Teams:**
+  - Use Karmada to build and operate multi-cluster AI infrastructure for large-scale training and inference workloads.
+  - Support high availability, failover, and intelligent workload placement for inference services.
+  - Provide a unified multi-cluster control plane for deploying, scaling, and operating AI inference platforms reliably.
+
+#### Explain the primary use case for the project. What additional use cases are supported by the project?
+
+Primary use cases:
+- Turnkey automation for multi-cluster application management in multi-cloud and hybrid cloud scenarios
+- Multi-policy, multi-cluster scheduling
+- AI/Big Data workload management across multiple clusters
+
+Additional use cases:
+- Application/Cluster Failover
+- Federated Resource Quota
+- Resource status collection and aggregation
+- Global search for resources and events
+- Multi-cluster service discovery
+
+#### Explain which use cases have been identified as unsupported by the project.
+
+Karmada is designed to run in Kubernetes environments; non-Kubernetes environments are not supported.
+
+#### Describe the intended types of organizations who would benefit from adopting this project. (i.e. financial services, any software manufacturer, organizations providing platform engineering services)?
+
+The Karmada project is particularly beneficial for the following types of organizations:
+- Cloud Service Providers: Those managing multi-cloud or hybrid-cloud environments, aiming to offer unified cluster management capabilities to their customers.
+- Financial Services Organizations: Requiring high availability, disaster recovery, and compliance with regional data regulations across distributed clusters.
+- Large Enterprises with Distributed Infrastructure: Operating applications across multiple regions or cloud providers.
+- Platform Engineering Teams: Responsible for building internal developer platforms that abstract underlying infrastructure complexity while providing consistent deployment patterns.
+- Software-as-a-Service (SaaS) Providers: Needing to deploy and scale applications across multiple clusters to meet global customer demand and ensure fault tolerance.
+
+In summary, any organization operating Kubernetes clusters across multiple clouds, regions, or on-premises environments can benefit from Karmada's ability to centralize management, ensure high availability, and optimize resource utilization.
+
+#### Please describe any completed end user research and link to any reports.
+
+- [Survey](https://docs.google.com/document/d/1lOXHfpLiA0sg5dJr7ye9E0Qemd_YvOS-TACfsA3xQig/edit?usp=sharing) for installation and operation.
+  - Join the [mailing list](https://groups.google.com/g/karmada/c/Xe489XICWqs/m/7i-sPn8cAAAJ) for access.
+- [[survey] Need your feedback on Karmada Concept](https://github.com/karmada-io/community/issues/137): The Kubernetes SIG-Multicluster is proposing a standardized definition for the central cluster (currently termed "Host Cluster" in Karmada) to unify terminology across multi-cluster management projects such as Karmada, OCM, clusternet, kubefleet, MCO, and KubeAdmiral. This initiative aims to improve cross-project interoperability and reduce ecosystem fragmentation. To evaluate the impact of adopting this standard and gather user feedback, we are launching a community survey.
+
+### Usability
+
+#### How should the target personas interact with your project?
+ +Users determine the Karmada components to install based on their usage scenarios and customize relevant configurations, such as feature gates. During daily operation and maintenance, users interact with the control plane using `karmadactl` or `kubectl`. + +#### Describe the user experience (UX) and user interface (UI) of the project. + +- **CLI Tools: [Karmadactl](https://karmada.io/docs/reference/karmadactl/karmadactl-commands/karmadactl)** + - Reuses Kubernetes command patterns. + - Manages contexts for both the Karmada control plane and member clusters. +- **API Server & Custom Resources** + - Kubernetes API Compatibility: Zero-change upgrade from single-cluster to multi-cluster; seamless integration with existing Kubernetes toolchains. + - Custom Resource Definitions (CRDs): Extend Kubernetes with multi-cluster resources. +- **UI: [Karmada Dashboard](https://github.com/karmada-io/dashboard)** + - Cluster Management: Provides cluster access and an overview of cluster status. + - Resource Management: Manages the configuration of business resources. + - Policy Management: Manages Karmada policies. +- **Documentation and Community Support** + - Official documentation: Guides for installation, configuration, and use cases. + - Tutorials: Step-by-step examples for common scenarios. + - Community channels: Slack, GitHub issues, and forums for support. + - Security: Security recommendations. +- **Metrics:** + - Provides a rich set of [metrics](https://karmada.io/docs/next/reference/instrumentation/metrics) to characterize the running status of Karmada. + +#### Describe how this project integrates with other projects in a production environment. + +Karmada uses Kubernetes-native APIs, enabling seamless integration with existing Kubernetes toolchains in production environments. Karmada API server directly uses the implementation of kube-apiserver from Kubernetes, which is the reason why Karmada is naturally compatible with Kubernetes API. 
This makes integrating Karmada with the Kubernetes ecosystem straightforward: users can operate Karmada with kubectl and integrate it with tools such as Argo CD and Flux.
+
+### Design
+
+#### Explain the design principles and best practices the project is following.
+
+- Compatibility with Kubernetes: Karmada is designed to be compatible with Kubernetes native APIs and Custom Resource Definitions (CRDs). This allows existing Kubernetes-based systems to be migrated to a multi-cluster environment with little or no code refactoring, reducing the learning cost and migration difficulty for users.
+- Scalability and Flexibility: Karmada provides a rich variety of definable policies along with comprehensive default policies to address the diverse requirements of users.
+- Security and Multi-Tenancy:
+  - Use Kubernetes namespaces for tenant isolation, with configured resource limits and RBAC;
+  - Resource isolation between member clusters.
+- Community over Product or Company: Sustaining and growing our community takes priority over shipping code or sponsors' organizational goals. Each contributor participates in the project as an individual.
+
+#### Outline or link to the project’s architecture requirements? Describe how they differ for Proof of Concept, Development, Test and Production environments, as applicable.
+
+Karmada architecture requirements: https://karmada.io/docs/core-concepts/architecture
+
+- Proof of Concept/Development/Test: users can run the [quick installation script](https://github.com/karmada-io/karmada/blob/master/hack/local-up-karmada.sh) to install Karmada in a local environment for quick iteration.
+- Production: users can deploy Karmada on a Kubernetes cluster using the officially released Helm chart. Alternatively, they can use the `karmadactl init` command, or the karmada-operator to manage the lifecycle of Karmada.
+
+#### Define any specific service dependencies the project relies on in the cluster.
+
+- Prometheus: collects metrics from configured targets to monitor the Karmada control plane.
+
+#### Describe how the project implements Identity and Access Management.
+
+Karmada implements Identity and Access Management (IAM) through Kubernetes-native mechanisms.
+
+- **Kubernetes API Server Authentication:**
+  - Uses standard Kubernetes authentication plugins (e.g., bearer tokens, X509 certificates, OIDC, or webhook).
+  - Authenticates users and service accounts accessing the Karmada API server.
+- **Cross-Cluster Authentication:**
+  - Supports impersonation to act on behalf of users or service accounts across clusters.
+
+#### Describe how the project has addressed sovereignty.
+
+Karmada is designed to be cloud and platform agnostic, running on any Kubernetes cluster (on-premises or any cloud provider). It avoids vendor lock-in by using open standards and Kubernetes-native APIs. All data remains within the adopter's control, and integrations with external storage or identity providers are optional and configurable by the user.
+
+#### Describe any compliance requirements addressed by the project.
+
+- Open Source License Compliance: The Karmada project is released under the Apache License 2.0. All source code contributions to the project must comply with this license.
+
+#### Describe the project’s High Availability requirements.
+
+- Controller HA: The Karmada controllers can be deployed with multiple replicas across different nodes and availability zones. They use leader election to ensure only one instance is active at a time, while the others are on standby. This ensures that if one instance fails, another can take over without downtime.
+- Karmada Instance HA: A highly available managed Karmada control plane can be deployed across multiple management clusters that can span various data centers, thus fulfilling disaster recovery requirements.
+- Etcd HA: Karmada supports the use of an external etcd, enabling the creation of a DR-compliant etcd cluster.
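+
+The controller HA described above can be sketched as a deployment fragment (a sketch only: the replica count and zone-spread settings are illustrative, and the `--leader-elect` flag follows the common Kubernetes controller convention — verify both against the deployed Karmada version):
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: karmada-controller-manager
+  namespace: karmada-system
+spec:
+  replicas: 3                       # one active leader, two standbys
+  template:
+    spec:
+      topologySpreadConstraints:    # spread replicas across availability zones
+      - maxSkew: 1
+        topologyKey: topology.kubernetes.io/zone
+        whenUnsatisfiable: ScheduleAnyway
+        labelSelector:
+          matchLabels:
+            app: karmada-controller-manager
+      containers:
+      - name: karmada-controller-manager
+        command:
+        - /bin/karmada-controller-manager
+        - --leader-elect=true       # only one replica acts at a time
+```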
+
+#### Describe the project’s resource requirements, including CPU, Network and Memory.
+
+Resource requirements depend on the number of clusters managed by Karmada, as well as the scale of each cluster. For large-scale deployments, resource requests and limits should be adjusted accordingly.
+
+#### Describe the project’s storage requirements, including its use of ephemeral and/or persistent storage.
+
+The Karmada control plane requires persistent storage for etcd, which stores all cluster configuration data, resource definitions, and state information.
+
+#### Please outline the project’s API Design:
+##### Describe the project’s API topology and conventions
+
+- Karmada exposes a Kubernetes-native declarative API using Custom Resources and the Kubernetes API Aggregation Layer, mainly including:
+  - PropagationPolicy/ClusterPropagationPolicy (Custom Resource): represents the policy that propagates a group of resources to one or more clusters.
+  - Cluster (Kubernetes API Aggregation Layer): represents the desired state and status of a member cluster.
+- The API follows Kubernetes conventions for resource specification, status, and metadata, supporting standard CRUD operations via kubectl and the Kubernetes API server.
+- Karmada supports versioned APIs (e.g., work.karmada.io/v1alpha2) and uses OpenAPI schema validation for resource definitions.
+- For more details, see the [Karmada API Reference](https://karmada.io/docs/category/karmada-api).
+
+##### Describe the project defaults
+
+Karmada provides secure, production-ready defaults out of the box to simplify multi-cluster Kubernetes management while maintaining security best practices. For example:
+
+- **Security Considerations**
+  - To avoid the use of insecure algorithms such as 3DES during communication, the TLS configuration of Karmada-related components is set to `--tls-min-version=VersionTLS13` at installation time.
+- **Policy-related**
+  - The default `replicaSchedulingType` of a PropagationPolicy is `Duplicated`, which duplicates the workload's full replica count to each selected member cluster.
+- **Installation-related**
+  - Karmada components and related resources are deployed to the karmada-system namespace by default.
+- **Others**
+  - Karmada uses karmada-scheduler to schedule workloads across multiple clusters by default.
+
+##### Outline any additional configurations from default to make reasonable use of the project
+
+- In large-scale environments, administrators can adjust `cluster-api-qps` and `cluster-api-burst` for karmada-controller-manager, karmada-agent and karmada-metrics-adapter.
+- In large-scale environments, administrators can also raise the rate-limit parameters `kube-api-qps` and `kube-api-burst` that each component uses to access karmada-apiserver, and adjust `cluster-api-qps` and `cluster-api-burst` for the scheduler-estimator's access to member cluster apiservers.
+- Configure resource requests and limits for Karmada components based on cluster scale; the defaults may be insufficient for production workloads.
+- See the [Karmada configuration](https://karmada.io/docs/administrator/configuration/configure-controllers) for more configuration recommendations.
+
+##### Describe any new or changed API types and calls - including to cloud providers - that will result from this project being enabled and used
+
+- The `PropagationPolicy/ClusterPropagationPolicy` CRD represents the policy that propagates a group of resources to one or more clusters.
+- The `Cluster` API represents the desired state and status of a member cluster.
+- The `ResourceBinding/ClusterResourceBinding` CRD represents the binding of a Kubernetes resource to its scheduled target clusters.
+- The `Work` CRD defines a list of resources to be deployed on a member cluster.
+- The `OverridePolicy/ClusterOverridePolicy` CRD represents the policy that applies cluster-specific overrides to a group of resources propagated to one or more clusters.
+- The `FederatedResourceQuota` CRD sets aggregate quota restrictions enforced per namespace across all clusters.
+- The `WorkloadRebalancer` CRD represents the desired behavior and status of a job that can enforce a resource rebalance.
+- The `FederatedHPA/CronFederatedHPA` CRD can scale any resource implementing the scale subresource.
+- The `ResourceInterpreterCustomization/ResourceInterpreterWebhookConfiguration` CRD tells Karmada how to interpret the details of a resource object, especially for custom resources.
+- The `MultiClusterIngress/MultiClusterService` CRD mainly provides applications with service discovery across Kubernetes clusters.
+- The `Remedy` CRD represents cluster-level management strategies based on cluster conditions.
+- The `ResourceRegistry` CRD represents the configuration of the cache scope, mainly describing which resources in which clusters should be cached.
+- Enabling Karmada does not change existing Kubernetes APIs; it only adds these new CRDs and related endpoints.
+
+##### Describe compatibility of any new or changed APIs with API servers, including the Kubernetes API server
+
+- All Karmada CRDs are implemented using Kubernetes Custom Resource Definitions and are fully compatible with the Kubernetes API server.
+- These CRDs follow Kubernetes API conventions for resource creation, update, deletion, and status reporting, and can be managed using standard Kubernetes tools (kubectl, client libraries, etc.).
+- Karmada CRDs are versioned and validated using OpenAPI schemas, ensuring compatibility with Kubernetes admission controllers and API server validation mechanisms.
+- No changes are made to existing Kubernetes APIs; Karmada only extends the API surface by adding new resource types.
+
+##### Describe versioning of any new or changed APIs, including how breaking changes are handled
+
+- Karmada CRDs use Kubernetes API versioning conventions, such as v1alpha1 and v1alpha2 in their API groups.
+- New features and changes are introduced in alpha or beta versions before being promoted to stable (v1).
+- Conversion webhooks are provided to support seamless migration between CRD versions when breaking changes are introduced.
+- Deprecated fields and versions are announced in release notes and documentation, and are supported for a deprecation period before removal.
+- Backward compatibility is maintained for stable APIs, and breaking changes are only introduced in new major or beta versions, following Kubernetes best practices.
+
+##### Describe the project’s release processes, including major, minor and patch releases.
+
+- Karmada follows a structured release process for major, minor, and patch releases.
+- Each release includes versioning, release notes, and changelogs to communicate new features, bug fixes, and deprecations.
+- Release candidates are published for community testing before final releases.
+- The process includes automated CI/CD checks, validation, and artifact publishing to container registries and GitHub artifacts.
+- Deprecated features and APIs are maintained for a deprecation period before removal.
+- See [Karmada Releases](https://karmada.io/docs/releases) for detailed release processes.
+
+### Installation
+
+#### Describe how the project is installed and initialized, e.g. a minimal install with a few lines of code or does it require more complex integration and configuration?
+
+- For getting started, Karmada can be installed with a single script: `hack/local-up-karmada.sh`.
+- For production use, please refer to the [installation guide](https://karmada.io/docs/installation/).
+  - For example, Karmada can be installed via Helm.
+
+#### How does an adopter test and validate the installation?
+
+- Each of the installation guides mentioned above includes validation steps.
+- The installation can be verified by checking that the Karmada components are up and running:
+```bash
+$ kubectl get deployments -n karmada-system
+NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
+karmada-aggregated-apiserver   1/1     1            1           102s
+karmada-apiserver              1/1     1            1           2m34s
+karmada-controller-manager     1/1     1            1           116s
+karmada-scheduler              1/1     1            1           119s
+karmada-webhook                1/1     1            1           113s
+kube-controller-manager        1/1     1            1           2m3s
+
+$ kubectl get statefulsets -n karmada-system
+NAME   READY   AGE
+etcd   1/1    28m
+```
+
+### Security
+
+#### Please provide a link to the project’s cloud native [security self assessment](https://tag-security.cncf.io/community/assessments/).
+
+[Karmada security self assessment](https://github.com/karmada-io/community/blob/main/security-team/assessments/self-assessment.md)
+
+#### Please review the [Cloud Native Security Tenets](https://github.com/cncf/contribute-site/blob/main/docs/community/tags/security-and-compliance/publications/secure-defaults-cloud-native-8.md) from TAG Security.
+##### How are you satisfying the tenets of cloud native security projects?
+
+- Karmada is built with security as a foundational concern. By leveraging Kubernetes-native constructs such as Custom Resource Definitions (CRDs), Role-Based Access Control (RBAC), and Network Policies, Karmada integrates seamlessly into secure Kubernetes environments.
+- Secure defaults are enabled out of the box, but users can override them if needed.
+- Insecure options require explicit configuration.
+
+##### How do you recommend users alter security defaults in order to "loosen" the security of the project? Please link to any documentation the project has written concerning these use cases.
+
+- Karmada Security Considerations: https://karmada.io/docs/administrator/security/security-considerations
+- By default, the gRPC connection between the karmada-scheduler/karmada-descheduler component and the karmada-scheduler-estimator component uses mutual authentication.
+  - The karmada-scheduler/karmada-descheduler component can disable the verification of the karmada-scheduler-estimator's certificate by setting `insecure-skip-estimator-verify` to true. For example:
+    ```yaml
+    apiVersion: apps/v1
+    kind: Deployment
+    metadata:
+      name: karmada-scheduler
+      namespace: karmada-system
+    spec:
+      template:
+        spec:
+          containers:
+          - name: karmada-scheduler
+            command:
+            - /bin/karmada-scheduler
+            - --enable-scheduler-estimator=true
+            - --insecure-skip-estimator-verify=true
+    ```
+
+  - The karmada-scheduler-estimator component can disable the verification of the gRPC client's certificate by setting `insecure-skip-grpc-client-verify` to true. For example:
+    ```yaml
+    apiVersion: apps/v1
+    kind: Deployment
+    metadata:
+      name: karmada-scheduler-estimator-member1
+      namespace: karmada-system
+    spec:
+      template:
+        spec:
+          containers:
+          - name: karmada-scheduler-estimator
+            command:
+            - /bin/karmada-scheduler-estimator
+            - --insecure-skip-grpc-client-verify=true
+    ```
+
+- By default, Karmada components set the TLS configuration option `--tls-min-version` for client-to-server communication to `VersionTLS13` to avoid the use of insecure algorithms such as 3DES during communication. This is documented on the [Karmada website](https://karmada.io/docs/administrator/security/security-considerations#tls-configuration). To loosen this security setting, users can set `--tls-min-version` to another value (e.g., `VersionTLS12`, `VersionTLS11`).
For example:
+  ```yaml
+  apiVersion: apps/v1
+  kind: Deployment
+  metadata:
+    name: karmada-aggregated-apiserver
+    namespace: karmada-system
+  spec:
+    template:
+      spec:
+        containers:
+        - name: karmada-aggregated-apiserver
+          command:
+          - /bin/karmada-aggregated-apiserver
+          - --kubeconfig=/etc/karmada/config/karmada.config
+          - --tls-min-version=VersionTLS12
+  ```
+
+- By default, Karmada’s security context restricts containers from running in privileged mode and denies a process the ability to obtain more privileges than its parent process. To loosen these restrictions, modify the `securityContext` of the Karmada components. For example:
+  ```yaml
+  apiVersion: apps/v1
+  kind: Deployment
+  metadata:
+    name: karmada-controller-manager
+    namespace: karmada-system
+  spec:
+    template:
+      spec:
+        containers:
+        - name: karmada-controller-manager
+          securityContext:
+            allowPrivilegeEscalation: true
+            privileged: true
+  ```
+
+#### Security Hygiene
+##### Please describe the frameworks, practices and procedures the project uses to maintain the basic health and security of the project.
+
+- Robust CI/CD Pipeline: Unit tests and end-to-end tests are integrated into the CI/CD pipeline to catch bugs early in the pull request (PR) stage.
+- Vulnerability Scanning: Weekly scheduled image scanning using trivy.
+- Static Code Analysis: Code linting and static analysis tools are employed to enforce code quality and detect potential issues before they reach production.
+- Dependency Management: The project uses lock files (e.g., go.mod) for reproducibility. Automated generation of Software Bill of Materials (SBOMs) is integrated into the CI/CD pipeline for all container images and components, supporting supply chain security and transparency.
+- Secure Defaults: Default Kubernetes manifests and Helm charts are configured with security best practices, such as running containers as non-privileged.
+- Code Review Process: All code changes require peer review and explicit approval before being merged into the main branch. +- Transparent Governance: The project practices open governance with regular public meetings and community channels for raising and discussing security or health concerns. +- Structured Releases: Releases are managed with clear versioning and changelogs, ensuring traceability and transparency for all changes. + +##### Describe how the project has evaluated which features will be a security risk to users if they are not maintained by the project? + +Karmada has conducted a comprehensive evaluation of security-critical features that require ongoing maintenance to ensure user safety: + +**Core Security Features Requiring Continuous Maintenance:** +- **Authentication and Authorization**: RBAC policies, certificate management, and cross-cluster authentication mechanisms that, if unmaintained, could lead to privilege escalation or unauthorized access +- **TLS/Certificate Management**: Certificate rotation, validation, and secure communication channels between control plane and member clusters +- **Admission Controllers and Webhooks**: Security validation logic that prevents malicious or misconfigured resources from being deployed +- **Network Security**: Multi-cluster service discovery and network policies that ensure secure inter-cluster communication +- **Resource Isolation**: Tenant isolation mechanisms and resource quota enforcement across clusters + +**Risk Assessment Process:** +- **Dependency Analysis**: Continuous monitoring of third-party dependencies for security updates +- **Community Feedback**: Security issues reported through the security team and vulnerability disclosure process help identify maintenance-critical features +- **Compliance Requirements**: Features required for regulatory compliance (e.g., audit logging, access controls) are prioritized for ongoing maintenance + +**Mitigation Strategies:** +- **Core Maintainer 
Commitment**: Critical security features are assigned to core maintainers with long-term project commitment +- **Documentation and Knowledge Transfer**: Comprehensive documentation ensures security features can be maintained by multiple contributors +- **Deprecation Policy**: Clear communication and migration paths when security features need to be deprecated or replaced + +#### Cloud Native Threat Modeling +##### Explain the least minimal privileges required by the project and reasons for additional privileges. + +[Karmada Component Permissions docs](https://karmada.io/docs/administrator/security/component-permission) provides a detailed explanation of the resources each Karmada component needs to access and the reasons for these accesses. + +##### Describe how the project is handling certificate rotation and mitigates any issues with certificates. + +- When installing Karmada, users can either customize certificates or use the certificates automatically generated by Karmada. +- When generating certificates through Karmada, users can configure the validity period of the certificates. +- These certificates are stored in Kubernetes ConfigMaps or Secrets, and are manually provisioned and rotated by cluster administrators. + +##### Describe how the project is following and implementing [secure software supply chain best practices](https://project.linuxfoundation.org/hubfs/CNCF_SSCP_v1.pdf) + +- Automated Testing and CI/CD Integration: Karmada integrates unit tests and end-to-end tests within its CI/CD pipeline. This ensures that any changes are validated early, reducing the risk of introducing vulnerabilities. +- Vulnerability Scanning: The project employs vulnerability scanning tools such as trivy, dependabot and gosec to identify and address known security issues in dependencies. +- Use of Lock Files and SBOM Generation: Karmada utilizes dependency lock files (e.g., go.mod) to ensure reproducibility. 
Automated generation of Software Bill of Materials (SBOMs) is integrated into the CI/CD pipeline for all container images and components, enhancing the transparency of software components and supporting supply chain security.
+- DCO Check: Committers are required to sign and comply with the Developer Certificate of Origin (DCO) to affirm the legitimacy and authorship of their contributions.
+- Branch Protection: The project enforces branch protection rules to prevent unauthorized changes, enforce status checks, require pull request reviews, and control who can push to protected branches.
+- License Compliance: Automated license compliance checks are integrated into the CI/CD pipeline. All dependencies are scanned and validated for license compatibility as part of every pull request and release build, ensuring transparency and legal compliance.
+- Peer Review: Commits and builds are validated through peer-reviewed pull request workflows, requiring approval before merge.
+
+## Day 1 - Installation and Deployment Phase
+
+### Project Installation and Configuration
+
+#### Describe what project installation and configuration look like.
+
+- Please refer to the Installation section under Day 0 above.
+- During the installation process, Karmada provides multiple configurable options to meet users' diverse usage scenarios.
Taking `karmadactl init` as an example, its configuration file is as follows:
+  ```yaml
+  apiVersion: config.karmada.io/v1alpha1
+  kind: KarmadaInitConfig
+  spec:
+    hostCluster:
+      kubeconfig: "${KUBECONFIG_PATH}/${HOST_CLUSTER_NAME}.config"
+    components:
+      karmadaControllerManager:
+        repository: "${REGISTRY}/karmada-controller-manager"
+        tag: "${VERSION}"
+      karmadaScheduler:
+        repository: "${REGISTRY}/karmada-scheduler"
+        tag: "${VERSION}"
+      karmadaWebhook:
+        repository: "${REGISTRY}/karmada-webhook"
+        tag: "${VERSION}"
+      karmadaAggregatedAPIServer:
+        repository: "${REGISTRY}/karmada-aggregated-apiserver"
+        tag: "${VERSION}"
+    karmadaDataPath: "${HOME}/karmada"
+    karmadaPkiPath: "${HOME}/karmada/pki"
+    karmadaCrds: "./crds.tar.gz"
+  ```
+
+### Project Enablement and Rollback
+
+#### How can this project be enabled or disabled in a live cluster? Please describe any downtime required of the control plane or nodes.
+
+To enable or disable Karmada in a live cluster, follow these steps:
+
+- Enabling Karmada in a live cluster
+  - Install the Karmada components.
+  - Register member clusters: use `karmadactl join` (push mode) or `karmadactl register` (pull mode) to register existing clusters as members of Karmada.
+
+- Disabling Karmada in a live cluster
+  - Unregister member clusters: remove member clusters from Karmada using `karmadactl unjoin` (push mode) or `karmadactl unregister` (pull mode).
+  - Delete the Karmada control plane: use kubectl to delete the Karmada namespace and all associated resources.
+
+Neither operation requires downtime of the member clusters' control planes or nodes.
+
+#### Describe how enabling the project changes any default behavior of the cluster or running workloads.
+
+The default behavior and existing workloads will not be impacted. Karmada operates as an overlay control plane that manages resources across multiple clusters without modifying the underlying Kubernetes clusters' behavior.
+
+**Specific examples of non-impact:**
+
+- **Existing Workloads**: Applications already running in member clusters continue to operate normally.
Karmada does not interfere with existing pods, services, or other resources unless explicitly managed through Karmada policies.
+
+#### Describe how the project tests enablement and disablement.
+
+The relevant tests are integrated into the CI/CD pipeline:
+
+- Enablement by Helm: https://github.com/karmada-io/karmada/blob/release-1.14/.github/workflows/installation-chart.yaml
+- Enablement by `karmadactl init`: https://github.com/karmada-io/karmada/blob/release-1.14/.github/workflows/installation-cli.yaml
+- Enablement by operator: https://github.com/karmada-io/karmada/blob/release-1.14/.github/workflows/installation-operator.yaml
+- Disablement, integrated into the e2e tests: https://github.com/karmada-io/karmada/blob/release-1.14/.github/workflows/ci.yml
+
+These tests are triggered both when PRs are submitted and when they are merged, ensuring that the system behaves as expected when enabling or disabling Karmada.
+
+#### How does the project clean up any resources created, including CRDs?
+
+- **Member Cluster Unregistration**
+  - Use the `karmadactl unjoin/unregister` command to remove the member cluster from Karmada.
+  - This command deletes both the member cluster's resource records in the Karmada control plane and the Karmada-specific agent components (e.g., `karmada-agent`) running in the member cluster.
+- **Karmada Control Plane Removal** (Based on Installation Method)
+  - Karmada-Operator: Delete the Karmada Custom Resource (CR). The operator automatically cleans up all related resources (e.g., Deployments, Services).
+  - `karmadactl init`: Use `karmadactl deinit` to remove resources created by the initialization process.
+  - Helm: Execute `helm uninstall` to purge all Helm-managed resources.
+
+### Rollout, Upgrade and Rollback Planning
+
+#### How does the project intend to provide and maintain compatibility with infrastructure and orchestration management tools like Kubernetes and with what frequency? 
+
+Karmada has a defined compatibility matrix: [https://github.com/karmada-io/karmada#kubernetes-compatibility](https://github.com/karmada-io/karmada#kubernetes-compatibility)
+
+In addition, Karmada verifies the compatibility between its maintained versions and Kubernetes on a weekly basis.
+
+#### Describe how the project handles rollback procedures.
+
+Karmada supports multiple installation methods. Among them, `Helm` and `karmada-operator` support both rollout and rollback.
+
+- **Helm**: rollback can be performed by using the `helm rollback` command to restore a previous release revision.
+- **karmada-operator**: rollout is managed through updates to the `Karmada` custom resource (CR) spec, and rollback can be achieved by reverting the `Karmada` CR spec to its previous desired state.
+
+#### How can a rollout or rollback fail? Describe any impact to already running workloads.
+
+A rollout or rollback can fail due to various technical, operational, or environmental factors. For example:
+
+- Component options that have been deprecated or removed in the target version.
+- Incompatibility with the Kubernetes version of the underlying cluster.
+
+Traffic is not switched to the new revision unless that revision is ready to accept traffic. Hence, already running workloads are not affected if a rollout or rollback fails.
+
+#### Describe any specific metrics that should inform a rollback.
+
+A rollback is usually considered when the actual runtime state does not match the expected state. For Karmada, some signals that may inform a rollback decision include:
+
+- the readiness status of Karmada components remaining `False` for a sustained period;
+- for Helm-based installations, an abnormal or failed release status reported by `helm status`;
+- for `karmada-operator`-based installations, the Ready condition of the `Karmada` custom resource (CR) remaining `False`.
+
+#### Explain how upgrades and rollbacks were tested and how the upgrade->downgrade->upgrade path was tested.
+
+Upgrade->downgrade->upgrade testing is not currently covered. 
+
+For rollback tests:
+- If installed via Helm, use `helm rollback` to perform the rollback test.
+- If installed via karmada-operator, update the `Karmada` CR (custom resource) to a previous version to conduct the rollback test. Refer to https://github.com/karmada-io/karmada/blob/master/operator/README.md#upgrade-a-karmada-instance.
+
+For upgrade tests:
+- If installed via Helm, use `helm upgrade` to perform the upgrade test.
+- If installed via karmada-operator, update the `Karmada` CR (custom resource) to conduct the upgrade test. Refer to https://github.com/karmada-io/karmada/blob/master/operator/README.md#upgrade-a-karmada-instance.
+
+#### Explain how the project informs users of deprecations and removals of features and APIs.
+
+- All API changes are backward compatible and are announced in the release notes and on the official website.
+- Release notes: [https://github.com/karmada-io/karmada/tree/master/docs/CHANGELOG](https://github.com/karmada-io/karmada/tree/master/docs/CHANGELOG)
+- Upgrade instructions: [https://karmada.io/docs/administrator/upgrading/](https://karmada.io/docs/administrator/upgrading/)
+
+#### Explain how the project permits utilization of alpha and beta capabilities as part of a rollout.
+
+Karmada’s feature lifecycle follows the Kubernetes model. Features are initially implemented using alpha version strings in the API and are feature gated. Features then graduate to beta, and the feature gate is removed as part of the release lifecycle.
+
+In addition, conversion webhooks are provided when CRD versions are updated. For example, the conversion webhook for `resourcebindings.work.karmada.io` converts ResourceBinding objects to v1alpha2.
+
+## Day 2 - Day-to-Day Operations Phase
+
+### Scalability/Reliability
+
+#### Describe how the project increases the size or count of existing API objects.
+
+Overall, the total object count grows approximately linearly with the number of workloads and target clusters. 
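+This growth can be illustrated with a minimal `PropagationPolicy` sketch (the Deployment and member cluster names below are hypothetical):
+
+```yaml
+apiVersion: policy.karmada.io/v1alpha1
+kind: PropagationPolicy
+metadata:
+  name: nginx-propagation
+spec:
+  resourceSelectors:
+    - apiVersion: apps/v1
+      kind: Deployment
+      name: nginx
+  placement:
+    clusterAffinity:
+      clusterNames:
+        - member1
+        - member2
+```
+
+For this single Deployment, the control plane derives one ResourceBinding plus one Work object per selected cluster (two in this sketch), which is the dominant source of object growth.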
+When a user deploys a workload together with a PropagationPolicy in Karmada, the control plane creates a corresponding ResourceBinding object. It then creates one Work object for each target cluster selected for propagation. As a result, object growth is primarily fan-out based: a single source workload may produce one ResourceBinding and multiple per-cluster Work objects. + +#### Describe how the project defines Service Level Objectives (SLOs) and Service Level Indicators (SLIs). + +Scalability and performance are key characteristics of multi-cluster federation. Karmada provides a large set of [measurable metrics](https://karmada.io/docs/next/reference/instrumentation/metrics). Based on these metrics, Karmada community defines the following [SLIs and SLOs](https://karmada.io/blog/2022/10/26/test-report#slisslos) to evaluate the service quality of multi-cluster federation. + +1. API Call Latency + + +| Status | SLI | SLO | +|---------|--------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------| +| Official | P99 latency of Mutating API calls (including POST, PUT, DELETE, PATCH) to a single resource object in the last 5 minutes | P99 \<\= 1s | +| Official | P99 latency of non-streaming read-only API calls (including GET and LIST) in the last 5 minutes | (a)Scope=resource, P99 \<\= 1s, (b)Scope=namespace or Scope=cluster, P99 \<\= 30s | + +2. 
Resource Distribution Latency
+
+
+| Status | SLI | SLO |
+|--------|-----|-----|
+| Official | After the user submits the resource template and policy on the federated control plane, the P99 delay until the resource is created on the member cluster, excluding network fluctuation between the control plane and the member cluster | P99 \<\= 2s |
+
+3. Cluster Registration Latency
+
+
+| Status | SLI | SLO |
+|--------|-----|-----|
+| WIP | P99 delay from when the cluster is registered in the federation control plane to when its status can be collected by the control plane | TBD |
+
+4. Resource Usage
+
+
+| Status | SLI | SLO |
+|--------|-----|-----|
+| WIP | The amount of resource usage necessary for the cluster federation to maintain normal operation after registering a certain number of clusters | TBD |
+
+
+#### Describe any operations that will increase in time covered by existing SLIs/SLOs.
+
+The operations most likely to see increased latency under existing SLIs/SLOs are scheduling, propagation, and status aggregation. In general, these operations become slower as the number of workloads, target clusters, and derived per-cluster objects increases.
+
+#### Describe the increase in resource usage in any components as a result of enabling this project, to include CPU, Memory, Storage, Throughput.
+
+Enabling the project increases resource usage primarily in the control plane components. 
+- CPU: CPU usage mainly increases in `karmada-scheduler`, `karmada-scheduler-estimator` and `karmada-controller-manager` due to additional reconciliation, placement evaluation, propagation, and status aggregation work. +- Memory: Memory usage may increase because controllers and caches must hold more project-managed objects and more per-cluster state. For example, `karmada-search` is memory-intensive, as it mirrors the full state of registered resources from every member cluster. As a result, its memory cost grows multiplicatively with the number of clusters, resource types, and object counts. +- Storage: Storage usage may increase primarily in `karmada-etcd`, because the project creates additional objects. +- Throughput: The impact on the existing system is generally limited. The throughput increase mainly occurs within Karmada’s own control plane, where components such as `karmada-controller-manager` consume additional API read/write capacity from `karmada-apiserver`. + +Overall, the increase is expected to be driven mainly by the number of workloads, policies, and target clusters, and to grow approximately linearly with those scaling dimensions. + +#### Describe which conditions enabling / using this project would result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.) + +When using `hack/local-up-karmada.sh` to install Karmada, the issue of "too many open files" may sometimes occur on nodes running `karmada-webhook`. This scenario is covered in [Karmada troubleshooting documentation](https://karmada.io/docs/next/troubleshooting/trouble-shooting#karmada-webhook-keeps-on-crashing-due-to-too-many-open-files). 
Administrators can resolve this by using the following configurations: + +```bash +sysctl -w fs.inotify.max_user_watches=100000 +sysctl -w fs.inotify.max_user_instances=100000 +``` + +In addition, most Karmada components are CPU-intensive, so platform administrators need to ensure that the nodes have sufficient capacity to run Karmada components. + +#### Describe the load testing that has been performed on the project and the results. + +Karmada has performed both large-scale load testing and targeted performance testing. + +- [Test Report on Karmada's Support for 100 Large-Scale Clusters](https://karmada.io/blog/2022/10/26/test-report) + Tested Karmada on managing 100 Kubernetes clusters (each cluster containing 5k nodes and 20k pods) at the same time. The test results show that Karmada can stably support 100 large-scale clusters with 500,000 nodes connected at the same time, running more than 2 million pods. +- In addition to this large-scale validation, the Karmada community also performs targeted performance tests when making performance optimizations. Examples include: + - [optimize the mechanism of create or update dependencies-distribute resourcebinding](https://github.com/karmada-io/karmada/pull/7153) + - [Overview of performance improvements for v1.15](https://github.com/karmada-io/karmada/issues/6516) +- Karmada also provides [CI-integrated](https://github.com/karmada-io/karmada/blob/cef28e8d92bbf8e4b209e066aed1e6e39ea09625/.github/workflows/ci-performance-compare.yaml) and [script-based](https://github.com/karmada-io/karmada/blob/cef28e8d92bbf8e4b209e066aed1e6e39ea09625/hack/performance/README.md) performance testing capabilities, making it easier to run repeatable load tests in both automated and local environments. + +#### Describe the recommended limits of users, requests, system resources, etc. and how they were obtained. 
+
+Based on published [large-scale testing](https://karmada.io/blog/2022/10/26/test-report), Karmada is known to work at a scale of around 100 managed clusters. Scaling beyond that point may require increasing the resource requests/limits and replica counts of key Karmada control-plane components, as well as tuning supporting dependencies such as the Kubernetes API server and etcd.
+
+#### Describe which resilience pattern the project uses and how, including the circuit breaker pattern.
+
+Karmada uses the Kubernetes resilience pattern, which includes the following key elements:
+
+- **Control-plane HA:** Key control-plane components such as `karmada-controller-manager` can be deployed with multiple replicas and use leader election. If the active replica fails, another replica can take over and continue reconciliation.
+- **[etcd HA:](https://karmada.io/docs/next/installation/ha-installation/#options-for-highly-available-topology)** Karmada relies on etcd for control-plane state durability. Depending on the deployment model, adopters can use either a stacked etcd topology or an external etcd topology to achieve state durability.
+- **[Health-based scheduling isolation and failover:](https://karmada.io/docs/next/userguide/failover/failover-analysis)** Karmada uses taints on the Cluster object to control application scheduling and execution. The system automatically adds `NoSchedule` taints (e.g., `cluster.karmada.io/not-ready`, `cluster.karmada.io/unreachable`) based on cluster health conditions, preventing new workloads from being placed on unhealthy clusters. For more advanced scenarios, the `ClusterTaintPolicy` API allows administrators to define custom taint rules based on arbitrary cluster conditions. When the Failover feature gate is explicitly enabled, `NoExecute` taints can trigger eviction of workloads that do not tolerate the taint, enabling automatic failover to healthy clusters. 
+- **[Cluster-level remediation policies:](https://karmada.io/docs/next/reference/karmada-api/remedy-resources/remedy-v1alpha1/)** The `Remedy` CRD defines cluster-level management strategies based on cluster conditions and can be used together with failure handling and recovery workflows. + +### Observability Requirements + +#### Describe the signals the project is using or producing, including logs, metrics, profiles and traces. Please include supported formats, recommended configurations and data storage. + +Karmada produces several standard observability signals, including metrics, events, logs, and runtime profiles. + +- **Metrics**: Karmada exposes [Prometheus-compatible metrics](https://karmada.io/docs/next/reference/instrumentation/metrics) to report runtime status and component behavior. These metrics are intended to be scraped by Prometheus or another compatible monitoring backend. +- **Events**: Karmada uses [Kubernetes Events](https://karmada.io/docs/next/reference/instrumentation/event) to provide real-time visibility into operational state changes, errors, and important milestones in resource propagation and cluster management workflows. These events are stored through the standard Kubernetes event pipeline, typically in the API server/etcd according to the cluster’s event retention configuration. +- **Profiles**: Karmada components can [enable profiling](https://karmada.io/docs/next/developers/profiling-karmada/#enable-profiling) to collect runtime diagnostic data for debugging and performance analysis. +- **Logs**: Karmada components [emit logs and support both `json` and `text` formats](https://karmada.io/blog/2025/09/05/karmada-v1.15/karmada-v1.15/#structured-logging). The log format can be configured using the `--logging-format` flag. In production environments, structured JSON logs are generally recommended for easier collection and analysis by centralized logging systems. + +#### Describe how the project captures audit logging. 
+
+Platform admins can leverage [Kubernetes audit](https://kubernetes.io/docs/tasks/debug/debug-cluster/audit/) for the Karmada control plane.
+
+#### Describe any dashboards the project uses or implements as well as any dashboard requirements.
+
+Karmada provides and uses two main dashboard options:
+- **Karmada Dashboard**: [Karmada Dashboard](https://github.com/karmada-io/dashboard) is a general-purpose, web-based control panel for Karmada that provides a centralized UI for managing and observing Karmada resources and clusters.
+- **Grafana Dashboards**: Karmada also provides [production-ready Grafana dashboards](https://karmada.io/docs/next/administrator/monitoring/karmada-observability/#grafana-dashboards) that can be downloaded and imported directly. These dashboards offer comprehensive observability coverage for Karmada components and operations.
+
+**Requirements**:
+Karmada Dashboard requires deployment of the separate dashboard project. The Grafana dashboards require a monitoring stack capable of collecting Karmada metrics, typically Prometheus for scraping and storage, and Grafana for visualization.
+
+#### Describe how the project surfaces project resource requirements for adopters to monitor cloud and infrastructure costs, e.g. FinOps.
+
+Karmada does not currently implement a dedicated FinOps capability. Instead, it exposes standard operational and resource-related signals that adopters can use to monitor infrastructure footprint and estimate cloud costs. These include:
+- [configurable CPU and memory requests/limits](https://github.com/karmada-io/karmada/blob/cef28e8d92bbf8e4b209e066aed1e6e39ea09625/operator/pkg/apis/operator/v1alpha1/type.go#L679-L682) for Karmada components;
+- [Prometheus-compatible metrics](https://karmada.io/docs/next/reference/instrumentation/metrics), logs, and [Grafana dashboards](https://karmada.io/docs/next/administrator/monitoring/karmada-observability/#grafana-dashboards). 
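+As a reference point for cost estimation, the per-component requests/limits mentioned above follow the standard Kubernetes resource schema. A hypothetical sizing fragment for one Karmada control-plane container (values are illustrative only, not project recommendations):
+
+```yaml
+resources:
+  requests:
+    cpu: 500m
+    memory: 512Mi
+  limits:
+    cpu: "1"
+    memory: 1Gi
+```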
+
+In practice, adopters typically combine these signals with external monitoring and cost-management systems to perform FinOps analysis. For example, users could use per-member-cluster allocation gauges, such as [cluster_cpu_allocated_number](https://karmada.io/docs/next/administrator/monitoring/karmada-observability#cluster_cpu_allocated_number), to correlate resource demand with member-cluster spend.
+
+#### Which parameters is the project covering to ensure the health of the application/service and its workloads?
+
+Karmada uses standard Kubernetes health mechanisms such as [liveness and readiness probes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/) for core components, for example [karmada-controller-manager](https://github.com/karmada-io/karmada/blob/cef28e8d92bbf8e4b209e066aed1e6e39ea09625/pkg/karmadactl/cmdinit/kubernetes/deployments.go#L484-L498).
+
+#### How can an operator determine if the project is in use by workloads?
+
+Operators can determine whether workloads are using Karmada in several ways:
+- **The most direct method**: resources propagated by Karmada to member clusters carry the label `karmada.io/managed: "true"`. 
Operators can use this label to identify workloads managed by Karmada: +```bash +$ export KUBECONFIG=path/to/karmadaconfigfile +$ kubectl config use-context karmada-apiserver +$ karmadactl get all --operation-scope all -l karmada.io/managed=true +``` +- **By checking Karmada control-plane objects**: operators can inspect whether `PropagationPolicy`/`ClusterPropagationPolicy` objects have been created, and whether Karmada has generated the corresponding `ResourceBinding`/`ClusterResourceBinding` and `Work` objects for workload propagation: +```bash +$ karmadactl get propagationpolicy -A +$ karmadactl get clusterpropagationpolicy -A +$ karmadactl get resourcebinding -A +$ karmadactl get clusterresourcebinding -A +$ karmadactl get work -A +``` + +#### How can someone using this project know that it is working for their instance? + +Users can verify that Karmada is working for their instance by checking the health status of Karmada components and confirming that there are no abnormal logs. + +In addition, they can perform a basic cross-cluster scheduling test and verify that the scheduling and propagation results match expectations. + +#### Describe the SLOs (Service Level Objectives) for this project. + +Karmada defines SLOs around the scalability and performance characteristics of multi-cluster federation. + +The current official SLOs include: + +- **Mutating API latency:** P99 latency of mutating API calls to a single resource object should be **<= 1 second** over the last 5 minutes. +- **Read-only API latency:** + - for resource-scoped non-streaming read-only API calls, P99 latency should be **<= 1 second** over the last 5 minutes; + - for namespace-scoped or cluster-scoped non-streaming read-only API calls, P99 latency should be **<= 30 seconds** over the last 5 minutes. +- **Resource distribution latency:** P99 delay from submitting a resource template and policy at the federated control plane to resource creation on the member cluster should be **<= 2 seconds**. 
+
+In addition, Karmada tracks work-in-progress SLO areas, including cluster registration latency and resource usage, with concrete targets still under development.
+
+#### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
+
+Operators can use the following SLIs to assess the health and service quality of Karmada:
+
+- **P99 latency of mutating API calls** (including POST, PUT, DELETE, and PATCH) to a single resource object over the last 5 minutes.
+- **P99 latency of non-streaming read-only API calls** (including GET and LIST) over the last 5 minutes, measured separately by request scope.
+- **P99 resource distribution latency**, measured from the time a resource template and policy are submitted to the federated control plane to the time the resource is created on the member cluster.
+- **Cluster registration latency**, measured from the time a cluster is registered in the federation control plane to the time its status can be collected.
+- **Resource usage**, measured as the amount of resources required for the federation control plane to operate normally after managing a given number of clusters.
+
+These SLIs are backed by Karmada's measurable metrics and can be used by operators to monitor the runtime health, responsiveness, and scalability of the service.
+
+### Dependencies
+
+#### Describe the specific running services the project depends on in the cluster.
+
+Karmada runs as an application on a Kubernetes host/management cluster, so its core runtime dependency is the Kubernetes cluster as an execution environment rather than additional mandatory third-party in-cluster services. In particular, Karmada depends on the host cluster’s standard workload-running capabilities, including Pod scheduling and lifecycle management, ServiceAccounts, RBAC, ConfigMaps, Secrets, Services, namespaces, and normal in-cluster networking and DNS. 
+ +Optional services such as Prometheus, Grafana, or Dashboard can be integrated for observability and UI, but they are not required for core functionality. + +#### Describe the project's dependency lifecycle policy. + +Karmada manages dependencies through a combination of automated update tooling, security-driven patching and release maintenance. +- **Automated updates**: Karmada uses [Dependabot](https://github.com/karmada-io/karmada/blob/cef28e8d92bbf8e4b209e066aed1e6e39ea09625/.github/dependabot.yml) to track and propose updates for GitHub Actions and Docker image dependencies on a weekly basis. This is configured not only for the main branch, but also for maintained release branches. +- **Security lifecycle**: Karmada includes [image vulnerability scanning in CI](https://github.com/karmada-io/karmada/blob/cef28e8d92bbf8e4b209e066aed1e6e39ea09625/.github/workflows/ci-image-scanning.yaml). Security-driven dependency upgrades, including CVE-related updates, are handled as part of normal maintenance and release work. +- **Release maintenance**: Dependency updates are reflected in regular release and patch changelogs, including updates to Kubernetes dependencies, Go versions, base images, and supporting build/test tools. + +Overall, Karmada’s dependency lifecycle policy is to keep actively maintained branches current. + +#### How does the project incorporate and consider source composition analysis as part of its development and security hygiene? Describe how this source composition analysis (SCA) is tracked. + +Karmada incorporates source composition analysis (SCA) into its development and security hygiene through a combination of automated dependency monitoring, vulnerability scanning, license/composition analysis, and release transparency. 
+ +- **Dependency monitoring**: Karmada uses [Dependabot](https://github.com/karmada-io/karmada/blob/cef28e8d92bbf8e4b209e066aed1e6e39ea09625/.github/dependabot.yml) to track and propose updates for selected dependency classes, including GitHub Actions and Docker-related dependencies. This helps keep maintained branches current and reduces the window for known vulnerable dependencies. +- **Composition and license analysis**: Karmada runs [FOSSA](https://github.com/karmada-io/karmada/blob/cef28e8d92bbf8e4b209e066aed1e6e39ea09625/.github/workflows/fossa.yml) scans in GitHub Actions to analyze dependency composition and related license metadata as part of its supply-chain hygiene. +- **Vulnerability scanning**: Karmada uses [Trivy in CI and on a scheduled basis](https://github.com/karmada-io/karmada/blob/cef28e8d92bbf8e4b209e066aed1e6e39ea09625/.github/workflows/ci-image-scanning.yaml) to scan component images for both OS and library vulnerabilities. +- **SBOM and release transparency**: Karmada [publishes SPDX SBOMs](https://karmada.io/docs/administrator/security/verify-artifacts#sbom) as part of its release assets, which provides downstream users with a machine-readable view of released dependency composition. + +SCA is tracked through Dependabot PRs, GitHub Actions workflow results, GitHub Security tab SARIF uploads, published SPDX SBOMs in release assets, and the project’s SECURITY-INSIGHTS.yml, which documents the SCA-related tooling and practices. + +#### Describe how the project implements changes based on source composition analysis (SCA) and the timescale. + +Changes driven by source composition analysis are typically applied through dependency update pull requests, validated in CI, and released through normal maintenance, release, and patch release workflows. + +In terms of timescale, these changes are handled continuously as part of ongoing maintenance, with some update signals generated weekly and security-related fixes prioritized as needed. 
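+The weekly update cadence described above can be expressed in a Dependabot configuration along the following lines (a simplified sketch, not Karmada's actual `dependabot.yml`):
+
+```yaml
+version: 2
+updates:
+  - package-ecosystem: "github-actions"
+    directory: "/"
+    schedule:
+      interval: "weekly"
+  - package-ecosystem: "docker"
+    directory: "/"
+    schedule:
+      interval: "weekly"
+```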
+ +### Troubleshooting + +#### How does this project recover if a key component or feature becomes unavailable? e.g. Kubernetes API server, etcd, database, leader node, etc. + +Karmada recovers from key component or dependency failures primarily through Kubernetes-native self-healing, leader election, and reconciliation. +- **Component failures**: Core Karmada components such as `karmada-controller-manager`, `karmada-scheduler`, `karmada-webhook`, and `karmada-apiserver` run as Kubernetes-managed workloads. If a component Pod fails, Kubernetes restarts or recreates it according to the deployment/stateful workload configuration and health probes. +- **Leader failure**: Several Karmada control-plane components use leader election. In a multi-replica deployment, if the active leader becomes unavailable, another replica can take over and continue processing. +- **API server/etcd unavailability**: If the Karmada API server or its backing etcd becomes unavailable, control-plane operations such as scheduling, propagation, and status updates may be temporarily delayed. Once service is restored, Karmada controllers resume reconciliation and converge the system back to the desired state. +- **Member cluster/API unavailability**: If a member cluster or its API server becomes temporarily unreachable, Karmada may be unable to propagate resources to that cluster or collect status from it during the outage. After connectivity is restored, reconciliation and retry logic continue synchronization automatically. + +In practice, recovery is achieved through a combination of Kubernetes restart/self-healing behavior, multi-replica leader failover, and controller reconciliation after the unavailable component or dependency returns. + +#### Describe the known failure modes. + +Known failure modes in Karmada mainly include control-plane dependency failures, control-plane component unavailability, member-cluster connectivity issues, and scale-related performance degradation. 
+- **Karmada API server or etcd unavailability**: scheduling, propagation, and status updates may be delayed or blocked until the control plane recovers. +- **Controller-manager or scheduler unavailability**: existing workloads in member clusters usually continue running, but new scheduling, reconciliation, propagation, and failover handling may stall temporarily. +- **Member cluster connectivity failure**: Karmada may be unable to propagate resources to that cluster or collect status from it. + +Overall, Karmada’s known failure modes are typically expressed as temporary control-plane unavailability, partial cluster-level degradation, or increased latency under scale. + +### Compliance + +#### What steps does the project take to ensure that all third-party code and components have correct and complete attribution and license notices? + +Karmada uses the following measures to preserve attribution and license information for third-party code and components: + +- **Repository policy for project-owned files:** Project-owned source files are required to carry the standard Apache-2.0 header. This is enforced by [verify-license.sh](https://github.com/karmada-io/karmada/blob/cef28e8d92bbf8e4b209e066aed1e6e39ea09625/hack/verify-license.sh) in CI. +- **Consistent headers for generated files:** Generated source files inherit the same standard header through the shared boilerplate file in [boilerplate.go.txt](https://github.com/karmada-io/karmada/blob/cef28e8d92bbf8e4b209e066aed1e6e39ea09625/hack/boilerplate/boilerplate.go.txt). +- **Preservation of upstream third-party notices:** Third-party code is primarily kept in dedicated locations such as `vendor/**`. These paths are excluded from header rewriting so that upstream notices are preserved rather than overwritten. 
+- **Release-time transparency:** Karmada [packages its root `LICENSE` file with binary release archives](https://github.com/karmada-io/karmada/blob/cef28e8d92bbf8e4b209e066aed1e6e39ea09625/hack/release.sh#L33) and [publishes SPDX SBOMs as release assets](https://karmada.io/docs/administrator/security/verify-artifacts#sbom) to provide build-time dependency transparency. + +#### Describe how the project ensures alignment with CNCF [recommendations](https://github.com/cncf/foundation/blob/main/policies-guidance/recommendations-for-attribution.md) for attribution notices. + +##### How are notices managed for third-party code incorporated directly into the project's source files? + +Karmada primarily handles this case as follows: + +- **Preferred approach:** It avoids mixing third-party code snippets into ordinary project-owned source files whenever possible. +- **Retention in dedicated locations:** Where third-party code is kept in-tree, the project prefers to retain it in dedicated third-party paths so that upstream notices can remain intact. +- **No automatic overwriting of upstream notices:** Those third-party paths are excluded from the automated header normalization step, which helps preserve upstream attribution. +- **Separate treatment for project-owned files:** Project-owned and generated source files continue to use the standard Apache-2.0 header verified in CI. + +##### How are notices retained for unmodified third-party components included within the project's repository? + +Karmada retains notices for unmodified third-party components in the following way: + +- **Dedicated third-party paths:** Unmodified third-party components are generally stored under paths such as `vendor/` and `third_party/`. +- **Exclusion from automatic header rewriting:** These paths are excluded from the automated add-license check, which prevents upstream license and attribution notices from being replaced by project boilerplate. 
+- **Preservation of upstream materials:** Vendored dependencies are refreshed through `go mod vendor`, and upstream license files present in vendored packages are preserved in the repository as provided by upstream sources. + +##### How are notices for all dependencies obtained at build time included in the project's distributed build artifacts (e.g. compiled binaries, container images)? + +For build-time dependencies, Karmada uses the following approach: + +- **Bundled project license:** Binary release archives include the project `LICENSE` file via the [release packaging script](https://github.com/karmada-io/karmada/blob/cef28e8d92bbf8e4b209e066aed1e6e39ea09625/hack/release.sh#L33). +- **SBOM-based dependency disclosure:** Karmada publishes [SPDX SBOMs](https://karmada.io/docs/administrator/security/verify-artifacts#sbom) as release assets to disclose the dependency set incorporated into distributed outputs. + +### Security + +#### Security Hygiene + +##### How is the project executing access control? + +Karmada executes access control primarily through Kubernetes-native authentication and authorization mechanisms: + +- **RBAC-based authorization:** The Karmada API server is configured with `--authorization-mode=Node,RBAC`, and access to control-plane resources is governed through Kubernetes RBAC objects. +- **Certificate-based authentication and secure transport:** The control plane uses TLS, client CA validation, request-header authentication, and a minimum TLS version configuration to secure component-to-component and client-to-server access. +- **Optional OIDC integration:** The Karmada API server supports `--oidc-*` settings, allowing operators to integrate external identity providers for authentication. 
+- **Scoped permissions for agents and cluster registration:** During cluster registration, Karmada generates dedicated `ClusterRole`, `Role`, and binding resources for `karmada-agent`, including separate permissions for cluster access, secret access, and work access, rather than relying on a single broad permission set. +- **Impersonation for unified authorization:** Karmada uses impersonation in cluster proxy scenarios to pass the requesting user’s identity to the member cluster, where authorization is evaluated against the member cluster’s own RBAC rules. + +#### Cloud Native Threat Modeling + +##### How does the project ensure its security reporting and response team is representative of its community diversity (organizational and individual)? + +Karmada’s security reporting and response team is governed through [public community documentation](https://github.com/karmada-io/community/blob/7782163b8e140716a44d18ac17ed8a4c48e8dd5b/security-team/security-release-process.md#the-security-team) under security-team in the community repository. The joining, stepping-down, and contribution mechanisms are transparent and actionable, providing a clear path for qualified and interested individuals or organizations to participate in the rotation process. + +New participants first complete a three-month rotation in the Associate role and may then be nominated by maintainers. Participation is based on sustained contribution and active involvement, rather than on a specific organizational identity. + +##### How does the project invite and rotate security reporting team members? 
+
+Karmada documents both invitation and rotation expectations in its [security-team process](https://github.com/karmada-io/community/blob/7782163b8e140716a44d18ac17ed8a4c48e8dd5b/security-team/security-release-process.md#the-security-team):
+- **Invitation path:** According to `security-team/security-release-process.md`, prospective members first complete a minimum **3-month rotation** in the [Associate](https://github.com/karmada-io/community/blob/7782163b8e140716a44d18ac17ed8a4c48e8dd5b/security-team/security-release-process.md#Associate) role before joining the security team.
+- **Nomination:** Candidates who complete the rotation may then be nominated by maintainers.
+- **Stepping down and turnover:** Members may step down at any time.
+- **Removal for inactivity:** Members who are unreachable, or who do not fulfill their documented responsibilities, for more than 2 months may be removed through a super-majority vote of the members.
\ No newline at end of file
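+
+To make the scoped-permission model described in the access-control section above more concrete, the following is a minimal RBAC sketch of the kind of narrowly scoped `Role` a registering `karmada-agent` might be granted in the Karmada control plane. All names and rules below are illustrative assumptions made for this review, not the exact resources Karmada generates:
+
+```yaml
+# Illustrative sketch only: the real names and rules are generated by
+# Karmada during cluster registration and will differ.
+apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+  name: karmada-agent-secret-access        # hypothetical name
+  namespace: karmada-cluster-member1       # hypothetical per-cluster namespace
+rules:
+  - apiGroups: [""]
+    resources: ["secrets"]
+    resourceNames: ["member1-bootstrap"]   # hypothetical secret name
+    verbs: ["get"]
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: RoleBinding
+metadata:
+  name: karmada-agent-secret-access
+  namespace: karmada-cluster-member1
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: Role
+  name: karmada-agent-secret-access
+subjects:
+  - kind: User
+    name: system:karmada:agent:member1     # hypothetical certificate identity
+```
+
+The point of the sketch is the shape of the policy: per-cluster namespaces and `resourceNames`-scoped verbs keep each agent's access to secrets and work objects narrow, rather than relying on a single broad, cluster-wide permission set.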