From c99d65c00aff01e133a3000fd637bb06cd122dd1 Mon Sep 17 00:00:00 2001
From: Guillaume Boutry
Date: Sun, 12 Apr 2026 01:03:48 +0200
Subject: [PATCH] feat(ceph): decouple microceph from core and unify with
 storage framework
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Extract microceph management into a dedicated CephFeature
(EnableDisableFeature) with its own enable/disable lifecycle, and
introduce a CephProvider abstraction (MicrocephProvider / NoCephProvider)
following the existing OVN provider pattern. This allows deployments to
opt out of internal Ceph via --no-default-storage while retaining the
storage role for third-party backends (PureStorage, Hitachi, etc.).

Key changes:

Architecture:
- New sunbeam.core.ceph module with the CephProvider ABC, deployment mode
  persistence (CephDeploymentMode.MICROCEPH / NONE), and shared helpers
  (is_internal_ceph_enabled, is_internal_ceph_enabled_feature_aware,
  set_ceph_feature_enabled_state, get_default_ceph_bootstrap_steps, etc.)
- New sunbeam.features.ceph.feature.CephFeature managing the full
  microceph + internal-ceph backend lifecycle, with a
  get_bootstrap_deploy_steps helper consumed by the bootstrap/join plans
- New sunbeam.features.ceph.microceph.MicrocephProvider implementing
  CephProvider for local Ceph via the microceph charm
- New sunbeam.storage.backends.internal_ceph.backend.InternalCephBackend
  routing cinder-volume-ceph through the storage framework
- Moved microceph steps from sunbeam.steps.microceph to
  sunbeam.features.ceph.microceph (deleted the dead re-export shim)
- Deleted sunbeam.steps.cinder_volume: its functionality is absorbed by
  the storage framework (storage/steps.py)

Storage framework enhancements:
- Backends declare extra_integrations and hypervisor_integrations via
  typed dataclasses (BackendIntegration, HypervisorIntegration)
- Terraform plans support dynamic backend/hypervisor integrations via
  for_each instead of hardcoded cinder-volume-ceph resources
- DeploySpecificCinderVolumeStep and DestroySpecificCinderVolumeStep
  manage the per-backend cinder-volume lifecycle; deploy always
  recomputes machine_ids from clusterd so that scale-out places units on
  new storage nodes for every HA backend
- New CheckStorageNodeRemovalStep / RemoveStorageMachineUnitsStep for
  safe node removal
- register_storage_terraform_plan is extracted as a module-level helper
  in storage/base.py with STORAGE_TFPLAN / STORAGE_TFPLAN_DIR constants;
  StorageBackendBase.register_terraform_plan is kept as a thin shim
- The destroy step uses a setdefault-backed local variable so that a
  missing tfvars['backends'] entry no longer raises KeyError

Upgrade migration (storage_migration.py):
- MigrateCinderVolumeToStorageFrameworkStep migrates legacy
  cinder-volume-plan state to the unified storage backend plan, merging
  into existing third-party backend config rather than overwriting it;
  _clear_old_state runs last so that a partial failure leaves the legacy
  state intact and the step re-runs cleanly
- ImportCephResourcesToStorageFrameworkStep imports Juju resources into
  the new terraform state using an explicit HA principal lookup and
  clears StorageBackendLegacyImportIds from clusterd on success
- BackfillCephFeatureStateStep marks existing internal Ceph deployments
  as feature-enabled and explicitly sets CephConfig.mode = MICROCEPH so
  that upgraded clusters satisfy strict-equality checks
- Terraform moved blocks preserve brownfield hypervisor and storage
  backend resources across the migration

Bootstrap / join / remove:
- Bootstrap and maas deploy accept --no-default-storage to skip internal
  Ceph
- Microceph deploy + OSD config steps are inlined into local bootstrap
  plan1, local join plan4 (first storage node), and maas deploy plan2,
  ahead of DeployControlPlaneStep, so that data.juju_offer.microceph
  resolves at terraform plan time
- SetCephProviderStep runs in the pre-plan2 phase of maas deploy so that
  plan assembly sees the correct CephConfig.mode
- Local join runs ReapplyHypervisorOptionalIntegrationsStep in a plan5
  that follows ensure_default_ceph_feature, so that
  collect_hypervisor_integrations sees internal-ceph in clusterd and
  wires cinder-volume-ceph:ceph-access on the first storage join
- Node removal uses storage framework checks instead of the legacy
  cinder-volume distribution checks
- Feature join/depart hooks propagate exceptions to surface failures

Disable flow safety:
- Ceph disable uses phased execution: destroy the backend first (mode
  stays MICROCEPH), then flip mode to NONE, then reapply the control
  plane (which now sees NoCephProvider), then destroy microceph. This
  minimises the inconsistency window on partial failure.
- A ceph_disabling feature marker is set on entry and cleared only after
  the final phase; is_internal_ceph_enabled_feature_aware respects the
  marker so the transient window does not leak a stale "enabled" state
  to readers, and retries re-enter each idempotent phase safely

Telemetry:
- Added a --metrics-storage-offer option for S3-compatible metrics
  storage
- Telemetry enable/disable updates cinder-volume notification flags
  across all storage backends

Other:
- Shared filesystem feature blocks enablement without internal Ceph
- Shared filesystem, maintenance, and telemetry call sites use
  is_internal_ceph_enabled_feature_aware
- Maintenance commands cache is_internal_ceph_enabled per invocation
- Removed the deprecated cinder_volume_tfhelper parameter from
  hypervisor
- TerraformHelper gains import_resource() and improved error messages

Assisted-By: Claude Code (claude-opus-4-6)
Assisted-By: Codex (gpt-5-4)
Signed-off-by: Guillaume Boutry
---
 cloud/etc/deploy-openstack-hypervisor/main.tf |  17 +-
 .../deploy-openstack-hypervisor/variables.tf  |  12 +-
 cloud/etc/deploy-storage/main.tf              |   3 +
 .../deploy-storage/modules/backend/main.tf    |  53 +-
 .../modules/backend/variables.tf              |  23 +
 cloud/etc/deploy-storage/variables.tf         |   7 +
 sunbeam-python/sunbeam/commands/resize.py     |  26 +-
 sunbeam-python/sunbeam/core/ceph.py           | 330 ++++++
 sunbeam-python/sunbeam/core/deployment.py     |  24 +
 sunbeam-python/sunbeam/core/terraform.py      |  31 +-
 sunbeam-python/sunbeam/feature_manager.py     |  37 +
 .../sunbeam/features/ceph/__init__.py         |   2 +
 .../sunbeam/features/ceph/feature.py          | 474 +++++++++
 .../{steps => features/ceph}/microceph.py     |  76 ++
 .../features/instance_recovery/feature.py     |   2 +
 .../sunbeam/features/interface/v1/base.py     |  12 +
 .../sunbeam/features/maintenance/checks.py    |   2 +-
 .../sunbeam/features/maintenance/commands.py  |  21 +-
 .../sunbeam/features/secrets/feature.py       |   2 +
 .../features/shared_filesystem/feature.py     |   9 +
 .../sunbeam/features/telemetry/feature.py     | 536 +++++++---
 sunbeam-python/sunbeam/features/tls/ca.py     |   5 +-
 sunbeam-python/sunbeam/features/tls/common.py |   5 +-
 .../sunbeam/provider/local/commands.py        | 324 +++---
 .../sunbeam/provider/local/deployment.py      |  44 +-
 .../sunbeam/provider/maas/commands.py         | 468 +++++----
 sunbeam-python/sunbeam/provider/maas/steps.py |   2 +-
 sunbeam-python/sunbeam/steps/cinder_volume.py | 347 -------
 .../sunbeam/steps/cluster_status.py           |   9 +-
 sunbeam-python/sunbeam/steps/hypervisor.py    |  32 +-
 sunbeam-python/sunbeam/steps/maintenance.py   |   2 +-
 sunbeam-python/sunbeam/steps/openstack.py     |  50 +-
 .../sunbeam/steps/upgrades/inter_channel.py   |  23 +-
 .../sunbeam/steps/upgrades/intra_channel.py   | 131 ++-
 .../steps/upgrades/storage_migration.py       | 579 +++++++++++
 .../backends/internal_ceph/__init__.py        |   8 +
 .../storage/backends/internal_ceph/backend.py | 128 +++
 sunbeam-python/sunbeam/storage/base.py        | 153 ++-
 sunbeam-python/sunbeam/storage/manager.py     |  44 +-
 sunbeam-python/sunbeam/storage/steps.py       | 305 +++++-
 sunbeam-python/sunbeam/versions.py            |  18 +-
 .../tests/unit/sunbeam/core/test_terraform.py |  43 +
 .../features/ceph/test_ceph_feature.py        | 631 ++++++++++++
 .../features/maintenance/test_commands.py     |   4 +
 .../tests/unit/sunbeam/features/test_base.py  |  81 ++
 .../unit/sunbeam/features/test_telemetry.py   | 400 ++++----
 .../sunbeam/provider/local/test_commands.py   | 160 +++
 .../sunbeam/provider/maas/test_commands.py    |  83 ++
 .../unit/sunbeam/steps/test_cinder_volume.py  | 363 -------
 .../unit/sunbeam/steps/test_hypervisor.py     | 179 ++++
 .../unit/sunbeam/steps/test_maintenance.py    |   2 +-
 .../unit/sunbeam/steps/test_microceph.py      |   5 +-
 .../unit/sunbeam/steps/test_openstack.py      |   9 +
 .../steps/upgrades/test_intra_channel.py      |  62 +-
 .../steps/upgrades/test_storage_migration.py  | 890 ++++++++++++++++
 .../storage/backends/test_internal_ceph.py    | 162 +++
 .../tests/unit/sunbeam/storage/test_base.py   | 144 +++
 .../unit/sunbeam/storage/test_manager.py      | 173 ++++
 .../tests/unit/sunbeam/storage/test_steps.py  | 959 ++++++++++++++++++
 .../unit/sunbeam/test_terraform_configs.py    |  29 +
 60 files changed, 7046 insertions(+), 1709 deletions(-)
 create mode 100644 sunbeam-python/sunbeam/core/ceph.py
 create mode 100644 sunbeam-python/sunbeam/features/ceph/__init__.py
 create mode 100644 sunbeam-python/sunbeam/features/ceph/feature.py
 rename sunbeam-python/sunbeam/{steps => features/ceph}/microceph.py (89%)
 delete mode 100644 sunbeam-python/sunbeam/steps/cinder_volume.py
 create mode 100644 sunbeam-python/sunbeam/steps/upgrades/storage_migration.py
 create mode 100644 sunbeam-python/sunbeam/storage/backends/internal_ceph/__init__.py
 create mode 100644 sunbeam-python/sunbeam/storage/backends/internal_ceph/backend.py
 create mode 100644 sunbeam-python/tests/unit/sunbeam/features/ceph/test_ceph_feature.py
 create mode 100644 sunbeam-python/tests/unit/sunbeam/provider/local/test_commands.py
 create mode 100644 sunbeam-python/tests/unit/sunbeam/provider/maas/test_commands.py
 delete mode 100644 sunbeam-python/tests/unit/sunbeam/steps/test_cinder_volume.py
 create mode 100644 sunbeam-python/tests/unit/sunbeam/steps/upgrades/test_storage_migration.py
 create mode 100644 sunbeam-python/tests/unit/sunbeam/storage/backends/test_internal_ceph.py
 create mode 100644 sunbeam-python/tests/unit/sunbeam/test_terraform_configs.py

diff --git a/cloud/etc/deploy-openstack-hypervisor/main.tf b/cloud/etc/deploy-openstack-hypervisor/main.tf
index 7f02bed70..e99615c8a 100644
--- a/cloud/etc/deploy-openstack-hypervisor/main.tf
+++ b/cloud/etc/deploy-openstack-hypervisor/main.tf
@@ -103,6 +103,11 @@ moved {
   to   = juju_integration.hypervisor-ovn[0]
 }
 
+moved {
+  from = juju_integration.hypervisor-cinder-ceph[0]
+  to   = juju_integration.hypervisor-extra-integration["cinder-volume-ceph-ceph-access"]
+}
+
 resource "juju_integration" "hypervisor-ovn" {
   # Should be deployed if ovn-relay-offer-url set
   count = (var.ovn-relay-offer-url != null) ? 1 : 0
@@ -147,18 +152,20 @@ resource "juju_integration" "hypervisor-ceilometer" {
   }
 }
 
-resource "juju_integration" "hypervisor-cinder-ceph" {
-  count      = (var.cinder-volume-ceph-application-name != null) ? 1 : 0
+resource "juju_integration" "hypervisor-extra-integration" {
+  for_each = {
+    for i in var.extra_integrations : "${i.application_name}-${i.endpoint_name}" => i
+  }
   model_uuid = data.juju_model.machine_model.uuid
 
   application {
     name     = juju_application.openstack-hypervisor.name
-    endpoint = "ceph-access"
+    endpoint = each.value.hypervisor_endpoint_name
   }
 
   application {
-    name     = var.cinder-volume-ceph-application-name
-    endpoint = "ceph-access"
+    name     = each.value.application_name
+    endpoint = each.value.endpoint_name
   }
 }
 
diff --git a/cloud/etc/deploy-openstack-hypervisor/variables.tf b/cloud/etc/deploy-openstack-hypervisor/variables.tf
index bbdb111e6..9f5ded091 100644
--- a/cloud/etc/deploy-openstack-hypervisor/variables.tf
+++ b/cloud/etc/deploy-openstack-hypervisor/variables.tf
@@ -79,10 +79,14 @@ variable "ceilometer-offer-url" {
   default     = null
 }
 
-variable "cinder-volume-ceph-application-name" {
-  description = "Name for cinder-volume-ceph application"
-  type        = string
-  default     = null
+variable "extra_integrations" {
+  description = "Additional juju integrations for the hypervisor"
+  type = set(object({
+    application_name         = string
+    endpoint_name            = string
+    hypervisor_endpoint_name = string
+  }))
+  default = []
 }
 
 # Mandatory relation, no defaults
diff --git a/cloud/etc/deploy-storage/main.tf b/cloud/etc/deploy-storage/main.tf
index d715b1330..45f42c370 100644
--- a/cloud/etc/deploy-storage/main.tf
+++ b/cloud/etc/deploy-storage/main.tf
@@ -23,6 +23,8 @@ module "backends" {
 
   model_uuid            = data.juju_model.model.uuid
 
+  application_name      = each.value.application_name
+  units                 = each.value.units
   name                  = each.key
   principal_application = each.value.principal_application
   charm_name            = each.value.charm_name
@@ -32,6 +34,7 @@ module "backends" {
   charm_config          = each.value.charm_config
   endpoint_bindings     = each.value.endpoint_bindings
   secrets               = each.value.secrets
+  extra_integrations    = each.value.extra_integrations
 }
 
 module "cinder-volume" {
diff --git a/cloud/etc/deploy-storage/modules/backend/main.tf b/cloud/etc/deploy-storage/modules/backend/main.tf
index f6edac59f..03520ad94 100644
--- a/cloud/etc/deploy-storage/modules/backend/main.tf
+++ b/cloud/etc/deploy-storage/modules/backend/main.tf
@@ -21,35 +21,55 @@ data "juju_application" "cinder-volume" {
   model_uuid = data.juju_model.model.uuid
 }
 
+moved {
+  from = juju_secret.secret
+  to   = juju_secret.secret[0]
+}
+
 resource "juju_secret" "secret" {
+  count      = length(local.secret_values) > 0 ? 1 : 0
   model_uuid = data.juju_model.model.uuid
   name       = "${var.name}-config-secret"
-  value = {
-    # Only template secrets that have a corresponding charm config value
-    for k, v in var.secrets : v => var.charm_config[k] if can(var.charm_config[k])
-  }
+  value      = local.secret_values
+}
+
+moved {
+  from = juju_access_secret.secret-access
+  to   = juju_access_secret.secret-access[0]
 }
 
 resource "juju_access_secret" "secret-access" {
+  count        = length(local.secret_values) > 0 ? 1 : 0
   model_uuid   = data.juju_model.model.uuid
-  secret_id    = juju_secret.secret.secret_id
+  secret_id    = juju_secret.secret[0].secret_id
   applications = [juju_application.storage-backend.name]
 }
 
 locals {
+  application_name = var.application_name != null ? var.application_name : var.name
+
+  secret_values = {
+    # Only template secrets that have a corresponding charm config value
+    for k, v in var.secrets : v => var.charm_config[k] if can(var.charm_config[k])
+  }
+
   charm_config = merge(
     { volume-backend-name = var.name },
     var.charm_config,
     # Only template secrets uris in charm config if they have a value
-    { for k, v in var.secrets : k => juju_secret.secret.secret_uri if can(var.charm_config[k]) }
+    {
+      for k, v in var.secrets :
+      k => juju_secret.secret[0].secret_uri
+      if length(local.secret_values) > 0 && can(var.charm_config[k])
+    }
   )
 }
 
 # Deploy Storage backend charms
 resource "juju_application" "storage-backend" {
-  name       = var.name
+  name       = local.application_name
   model_uuid = data.juju_model.model.uuid
-  units      = 1
+  units      = var.units
 
   charm {
     name     = var.charm_name
@@ -77,3 +97,20 @@ resource "juju_integration" "storage-backend-to-cinder-volume" {
     endpoint = "cinder-volume"
   }
 }
+
+resource "juju_integration" "backend-extra-integration" {
+  for_each = {
+    for i in var.extra_integrations : "${i.application_name}-${i.endpoint_name}" => i
+  }
+  model_uuid = data.juju_model.model.uuid
+
+  application {
+    name     = juju_application.storage-backend.name
+    endpoint = each.value.backend_endpoint_name
+  }
+
+  application {
+    name     = each.value.application_name
+    endpoint = each.value.endpoint_name
+  }
+}
diff --git a/cloud/etc/deploy-storage/modules/backend/variables.tf b/cloud/etc/deploy-storage/modules/backend/variables.tf
index 3c2ad8c4e..6ec0a959f 100644
--- a/cloud/etc/deploy-storage/modules/backend/variables.tf
+++ b/cloud/etc/deploy-storage/modules/backend/variables.tf
@@ -12,6 +12,19 @@ variable "principal_application" {
   default     = "cinder-volume"
 }
 
+variable "application_name" {
+  description = "Juju application name for the deployed backend"
+  type        = string
+  default     = null
+}
+
+variable "units" {
+  description = "Requested unit count for the backend application"
+  type        = number
+  default     = 1
+  nullable    = true
+}
+
 variable "charm_name" {
   description = "Name of the Storage charm"
   type        = string
@@ -57,3 +70,13 @@ variable "secrets" {
   type        = map(string)
   default     = {}
 }
+
+variable "extra_integrations" {
+  description = "Additional juju integrations for this backend"
+  type = set(object({
+    application_name      = string
+    endpoint_name         = string
+    backend_endpoint_name = string
+  }))
+  default = []
+}
diff --git a/cloud/etc/deploy-storage/variables.tf b/cloud/etc/deploy-storage/variables.tf
index 6b17b4585..09ad9f5b7 100644
--- a/cloud/etc/deploy-storage/variables.tf
+++ b/cloud/etc/deploy-storage/variables.tf
@@ -28,6 +28,8 @@ variable "cinder-volumes" {
 variable "backends" {
   description = "Map of storage backend configurations"
   type = map(object({
+    application_name      = optional(string)
+    units                 = optional(number)
     principal_application = string
     charm_name            = string
     charm_base            = string
@@ -36,6 +38,11 @@ variable "backends" {
     charm_config          = map(string)
     endpoint_bindings     = set(map(string))
     secrets               = map(string)
+    extra_integrations = optional(set(object({
+      application_name      = string
+      endpoint_name         = string
+      backend_endpoint_name = string
+    })), [])
   }))
   default = {}
 }
diff --git a/sunbeam-python/sunbeam/commands/resize.py b/sunbeam-python/sunbeam/commands/resize.py
index d84d5dcca..1b5a06fd6 100644
--- a/sunbeam-python/sunbeam/commands/resize.py
+++ b/sunbeam-python/sunbeam/commands/resize.py
@@ -8,16 +8,16 @@ from rich.console import Console
 
 from sunbeam.clusterd.client import Client
+from sunbeam.core.ceph import is_internal_ceph_enabled
 from sunbeam.core.common import click_option_topology, run_plan
 from sunbeam.core.deployment import Deployment
 from sunbeam.core.juju import JujuHelper
 from sunbeam.core.terraform import TerraformInitStep
-from sunbeam.steps.cinder_volume import DeployCinderVolumeApplicationStep
-from sunbeam.steps.k8s import PatchCoreDNSStep
-from sunbeam.steps.microceph import (
+from sunbeam.features.ceph.microceph import (
     DeployMicrocephApplicationStep,
     SetCephMgrPoolSizeStep,
 )
+from sunbeam.steps.k8s import PatchCoreDNSStep
 from sunbeam.steps.openstack import DeployControlPlaneStep
 from sunbeam.utils import click_option_show_hints
 
@@ -48,7 +48,6 @@ def resize(
 
     openstack_tfhelper = deployment.get_tfhelper("openstack-plan")
     microceph_tfhelper = deployment.get_tfhelper("microceph-plan")
-    cinder_volume_tfhelper = deployment.get_tfhelper("cinder-volume-plan")
     jhelper = JujuHelper(deployment.juju_controller)
 
     storage_nodes = client.cluster.list_nodes_by_role("storage")
@@ -58,7 +57,7 @@ def resize(
         LOG.warning("WARNING: Option --force is deprecated and the value is ignored.")
 
     plan = []
-    if len(storage_nodes):
+    if len(storage_nodes) and is_internal_ceph_enabled(client):
         # Change default-pool-size based on number of storage nodes
         plan.extend(
             [
@@ -94,23 +93,6 @@ def resize(
         ]
     )
 
-    if len(storage_nodes):
-        # DeployCinderVolumeApplicationStep depends on openstack-tfhelper
-        # to get outputs, so let OpenStack deployment complete first
-        plan.extend(
-            [
-                TerraformInitStep(cinder_volume_tfhelper),
-                DeployCinderVolumeApplicationStep(
-                    deployment,
-                    client,
-                    cinder_volume_tfhelper,
-                    jhelper,
-                    manifest,
-                    deployment.openstack_machines_model,
-                ),
-            ]
-        )
-
     run_plan(plan, console, show_hints)
     click.echo("Resize complete.")
 
diff --git a/sunbeam-python/sunbeam/core/ceph.py b/sunbeam-python/sunbeam/core/ceph.py
new file mode 100644
index 000000000..9e5a23b4c
--- /dev/null
+++ b/sunbeam-python/sunbeam/core/ceph.py
@@ -0,0 +1,330 @@
+# SPDX-FileCopyrightText: 2025 - Canonical Ltd
+# SPDX-License-Identifier: Apache-2.0
+
+"""Abstract interface for Ceph storage providers.
+
+This module defines the contract between the Sunbeam core orchestration
+code and the storage infrastructure that provides Ceph services.
+
+It also provides the deployment-mode selection mechanism (MICROCEPH vs
+NONE) persisted in clusterd, following the OVN provider pattern.
+"""
+
+from __future__ import annotations
+
+import abc
+import enum
+import logging
+from typing import TYPE_CHECKING, Any
+
+import pydantic
+
+from sunbeam.clusterd.client import Client
+from sunbeam.core.common import BaseStep, Result, ResultType, StepContext
+from sunbeam.core.questions import load_answers, write_answers
+
+if TYPE_CHECKING:
+    from sunbeam.core.deployment import Deployment
+
+LOG = logging.getLogger(__name__)
+
+CLUSTERD_CONFIG_KEY = "CephConfig"
+INTERNAL_CEPH_BACKEND_NAME = "internal-ceph"
+
+# Feature-info key used by CephFeature.run_disable_plans to signal that
+# a multi-phase disable flow is in progress. Readers of
+# is_internal_ceph_enabled_feature_aware treat this as "not enabled" so
+# they do not observe the transient mode/feature disagreement between
+# phase 1 (backend destroyed) and phase 3 (microceph destroyed).
+CEPH_DISABLING_KEY = "ceph_disabling"
+
+
+class CephDeploymentMode(enum.StrEnum):
+    """How Ceph storage infrastructure is provided."""
+
+    MICROCEPH = "microceph"
+    NONE = "none"
+
+
+DEFAULT_MODE = CephDeploymentMode.MICROCEPH
+
+
+class CephConfig(pydantic.BaseModel):
+    """Persisted ceph deployment configuration."""
+
+    mode: CephDeploymentMode | None = None
+
+
+def load_ceph_config(client: Client) -> CephConfig:
+    """Load the Ceph deployment configuration from clusterd.
+
+    :param client: the Sunbeam clusterd client
+    :return: the Ceph deployment configuration
+    """
+    answers = load_answers(client, CLUSTERD_CONFIG_KEY)
+    return CephConfig.model_validate(answers)
+
+
+def write_ceph_config(client: Client, config: CephConfig) -> None:
+    """Write the Ceph deployment configuration to clusterd.
+
+    :param client: the Sunbeam clusterd client
+    :param config: the Ceph deployment configuration
+    """
+    write_answers(client, CLUSTERD_CONFIG_KEY, config.model_dump())
+
+
+def is_internal_ceph_enabled(client: Client) -> bool:
+    """Check whether deployment mode requires Sunbeam-managed internal Ceph.
+
+    This is a desired-state answer derived from ``CephConfig.mode``. It does
+    not imply that the Ceph feature lifecycle has already been reconciled.
+
+    :param client: the Sunbeam clusterd client
+    :return: True if the deployment mode is not NONE (or unset, defaulting
+        to MICROCEPH for backward compatibility)
+    """
+    config = load_ceph_config(client)
+    mode = config.mode if config.mode is not None else DEFAULT_MODE
+    return mode != CephDeploymentMode.NONE
+
+
+def set_ceph_feature_enabled_state(
+    deployment: Deployment, client: Client, enabled: bool
+) -> None:
+    """Persist ceph feature enablement state."""
+    from sunbeam.features.interface.v1.base import EnableDisableFeature
+
+    feature = deployment.get_feature_manager().resolve_feature("ceph")
+    if not isinstance(feature, EnableDisableFeature):
+        LOG.debug("Failed to resolve ceph feature to update feature state.")
+        return
+    feature.update_feature_info(client, {"enabled": str(enabled).lower()})
+
+
+def ensure_default_ceph_feature(
+    deployment: Deployment, show_hints: bool, **kwargs: Any
+) -> None:
+    """Ensure the default ceph feature is enabled for storage."""
+    feature = deployment.get_feature_manager().resolve_feature("ceph")
+    if feature is None:
+        raise RuntimeError("Failed to resolve ceph feature for default storage.")
+    feature_api: Any = feature
+    feature_api.enable_default_storage(deployment, show_hints, **kwargs)
+
+
+def get_default_ceph_bootstrap_steps(
+    deployment: Deployment,
+    *,
+    enabled: bool,
+    expect_storage_node: bool,
+    node_name: str | None = None,
+    accept_defaults: bool = False,
+) -> list[BaseStep]:
+    """Return bootstrap microceph deploy steps from the default ceph feature.
+
+    This is the entrypoint provider commands use to pre-deploy microceph
+    (and, for local, configure OSDs) before DeployControlPlaneStep runs
+    during bootstrap or first-storage-node join.
+
+    Returns [] if the ceph feature cannot be resolved or does not expose
+    ``get_bootstrap_deploy_steps`` — callers can rely on that as a safe
+    no-op when a future provider type does not need inline setup.
+
+    See ``CephFeature.get_bootstrap_deploy_steps`` for the flag semantics.
+    """
+    feature = deployment.get_feature_manager().resolve_feature("ceph")
+    if feature is None or not hasattr(feature, "get_bootstrap_deploy_steps"):
+        return []
+    feature_api: Any = feature
+    return feature_api.get_bootstrap_deploy_steps(
+        deployment,
+        enabled=enabled,
+        expect_storage_node=expect_storage_node,
+        node_name=node_name,
+        accept_defaults=accept_defaults,
+    )
+
+
+def is_internal_ceph_enabled_feature_aware(
+    deployment: Deployment, client: Client
+) -> bool:
+    """Return whether Sunbeam-managed internal Ceph is active.
+
+    Checks both the deployment mode and the ceph feature's enabled state.
+    Falls back to the deployment mode if the feature state cannot be read.
+
+    Returns False while a phased disable flow is in progress (signalled
+    via ``CEPH_DISABLING_KEY`` on the ceph feature info). This avoids the
+    transient window between phase 1 (backend destroyed) and phase 3
+    (microceph destroyed) where the ``enabled`` flag and ``mode`` would
+    otherwise disagree.
+    """
+    from sunbeam.features.interface.v1.base import EnableDisableFeature
+
+    mode_enables_internal_ceph = is_internal_ceph_enabled(client)
+    try:
+        feature = deployment.get_feature_manager().resolve_feature("ceph")
+        if isinstance(feature, EnableDisableFeature):
+            feature_info = feature.get_feature_info(client)
+            if feature_info.get(CEPH_DISABLING_KEY, "false").lower() == "true":
+                return False
+        ceph_feature_enabled = deployment.get_feature_manager().is_feature_enabled(
+            deployment, "ceph"
+        )
+    except Exception as e:
+        LOG.debug("Failed to read ceph feature enablement state: %r", e)
+        return mode_enables_internal_ceph
+
+    return ceph_feature_enabled or mode_enables_internal_ceph
+
+
+class SetCephProviderStep(BaseStep):
+    """Persist the Ceph deployment mode in clusterd."""
+
+    def __init__(self, client: Client, *, no_default_storage: bool = False):
+        super().__init__(
+            "Set Ceph provider",
+            "Setting Ceph provider in deployment configuration",
+        )
+        self.client = client
+        self.wanted_mode = (
+            CephDeploymentMode.NONE
+            if no_default_storage
+            else CephDeploymentMode.MICROCEPH
+        )
+
+    def is_skip(self, context: StepContext) -> Result:
+        """Skip if the mode is already set to the desired value."""
+        config = load_ceph_config(self.client)
+        if config.mode == self.wanted_mode:
+            LOG.debug(
+                "Ceph deployment mode is already set to %s",
+                self.wanted_mode,
+            )
+            return Result(ResultType.SKIPPED)
+        return Result(ResultType.COMPLETED)
+
+    def run(self, context: StepContext) -> Result:
+        """Write the desired mode to clusterd."""
+        config = load_ceph_config(self.client)
+        config.mode = self.wanted_mode
+        write_ceph_config(self.client, config)
+        return Result(ResultType.COMPLETED)
+
+
+class CephProvider(abc.ABC):
+    """Abstract interface for Ceph storage infrastructure providers.
+
+    This defines the contract between the Sunbeam orchestration code
+    and whatever provides Ceph services (microceph, remote ceph, etc.).
+
+    Implementations are responsible for providing terraform variables
+    that the control plane and cinder-volume steps need.
+    """
+
+    @abc.abstractmethod
+    def get_control_plane_tfvars(
+        self,
+        model_with_owner: str,
+        storage_node_count: int,
+    ) -> dict[str, Any]:
+        """Return Ceph-related terraform variables for the control plane.
+
+        Called by DeployControlPlaneStep / ReapplyControlPlaneStep to
+        compute the ceph portion of the openstack terraform variables.
+
+        :param model_with_owner: Juju model name with owner prefix
+            (e.g. "admin/openstack-machines").
+        :param storage_node_count: Number of nodes with the storage role.
+        :returns: dict with at minimum:
+            - enable-ceph: bool
+            - ceph-offer-url: str (when enabled)
+            - ceph-nfs-offer-url: str (when enabled)
+            - ceph-rgw-offer-url: str (when enabled)
+            - ceph-osd-replication-count: int (when enabled)
+        """
+
+    @abc.abstractmethod
+    def get_cinder_volume_tfvars(
+        self,
+        deployment: Deployment,
+        storage_node_count: int,
+    ) -> dict[str, Any]:
+        """Return Ceph-related terraform variables for cinder-volume.
+
+        Called by DeployCinderVolumeApplicationStep to configure the
+        cinder-volume-ceph subordinate charm.
+
+        :param deployment: The active deployment (for accessing terraform helpers).
+        :param storage_node_count: Number of nodes with the storage role.
+        :returns: dict with at minimum:
+            - charm_cinder_volume_ceph_config: dict (ceph-osd-replication-count)
+            - ceph-application-name: str (when storage nodes exist)
+        """
+
+    @abc.abstractmethod
+    def get_replica_count(self, storage_node_count: int) -> int:
+        """Return the Ceph OSD replication count.
+
+        :param storage_node_count: Number of nodes with the storage role.
+        """
+
+    @property
+    @abc.abstractmethod
+    def application_name(self) -> str:
+        """Return the Juju application name (e.g. 'microceph')."""
+
+    @property
+    @abc.abstractmethod
+    def status_column(self) -> str:
+        """Return the column name for cluster status display."""
+
+    @property
+    @abc.abstractmethod
+    def terraform_plan_name(self) -> str:
+        """Return the terraform plan key (e.g. 'microceph-plan')."""
+
+
+class NoCephProvider(CephProvider):
+    """Provider used when no local Ceph infrastructure is deployed.
+
+    Returns ``enable-ceph: False`` and empty cinder-volume ceph config
+    so that the control plane and cinder-volume steps work correctly
+    without any microceph deployment.
+    """
+
+    def get_control_plane_tfvars(
+        self,
+        model_with_owner: str,
+        storage_node_count: int,
+    ) -> dict[str, Any]:
+        """Return disabled Ceph terraform variables."""
+        return {"enable-ceph": False}
+
+    def get_cinder_volume_tfvars(
+        self,
+        deployment: Deployment,
+        storage_node_count: int,
+    ) -> dict[str, Any]:
+        """Return empty cinder-volume ceph config."""
+        return {"charm_cinder_volume_ceph_config": {}}
+
+    def get_replica_count(self, storage_node_count: int) -> int:
+        """Return zero replicas when no Ceph is deployed."""
+        return 0
+
+    @property
+    def application_name(self) -> str:
+        """Return empty application name."""
+        return ""
+
+    @property
+    def status_column(self) -> str:
+        """Return empty status column."""
+        return ""
+
+    @property
+    def terraform_plan_name(self) -> str:
+        """Return empty terraform plan name."""
+        return ""
diff --git a/sunbeam-python/sunbeam/core/deployment.py b/sunbeam-python/sunbeam/core/deployment.py
index 62a00a70f..4ca8b4dd9 100644
--- a/sunbeam-python/sunbeam/core/deployment.py
+++ b/sunbeam-python/sunbeam/core/deployment.py
@@ -43,10 +43,12 @@
 
 if TYPE_CHECKING:
     from sunbeam.core import ovn
+    from sunbeam.core.ceph import CephProvider
     from sunbeam.feature_manager import FeatureManager
     from sunbeam.features.interface.v1.base import BaseFeature
     from sunbeam.storage.manager import StorageBackendManager
 else:
+    CephProvider = object
     FeatureManager = object
     BaseFeature = object
     StorageBackendManager = object
@@ -218,6 +220,28 @@ def get_ovn_manager(self) -> "ovn.OvnManager":
 
         return ovn.OvnManager(self.get_client())
 
+    def get_ceph_provider(self) -> "CephProvider":
+        """Return the Ceph storage provider for the deployment.
+
+        Returns MicrocephProvider when the deployment mode is MICROCEPH
+        (or unset, for backward compatibility), NoCephProvider otherwise.
+
+        Not cached: the mode can change within a single process when
+        SetCephProviderStep runs (e.g. during enable/disable flows).
+        """
+        from sunbeam.core.ceph import (
+            CephDeploymentMode,
+            NoCephProvider,
+            load_ceph_config,
+        )
+        from sunbeam.features.ceph.microceph import MicrocephProvider
+
+        config = load_ceph_config(self.get_client())
+        mode = config.mode if config.mode is not None else CephDeploymentMode.MICROCEPH
+        if mode == CephDeploymentMode.MICROCEPH:
+            return MicrocephProvider()
+        return NoCephProvider()
+
     def get_proxy_settings(self) -> dict:
         """Fetch proxy settings from clusterd, if not available use defaults."""
         proxy = {}
diff --git a/sunbeam-python/sunbeam/core/terraform.py b/sunbeam-python/sunbeam/core/terraform.py
index 240cf42f7..43cbd016a 100644
--- a/sunbeam-python/sunbeam/core/terraform.py
+++ b/sunbeam-python/sunbeam/core/terraform.py
@@ -339,7 +339,7 @@ def state_list(self) -> list:
         except subprocess.CalledProcessError as e:
             LOG.error(f"terraform state list failed: {e.output}")
             LOG.error(e.stderr)
-            raise TerraformException(str(e))
+            raise TerraformException(e.stderr or e.output or str(e))
 
     def state_rm(self, resource: str) -> None:
         """Remove a resource from Terraform state."""
@@ -370,6 +370,35 @@ def state_rm(self, resource: str) -> None:
             LOG.error(e.stderr)
             raise TerraformException(str(e))
 
+    def import_resource(self, address: str, resource_id: str) -> None:
+        """Import an existing resource into Terraform state."""
+        os_env = os.environ.copy()
+        timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
+        tf_log = str(self.path / f"terraform-import-{timestamp}.log")
+        os_env.update({"TF_LOG_PATH": tf_log})
+        os_env.setdefault("TF_LOG", "INFO")
+        if self.env:
+            os_env.update(self.env)
+
+        try:
+            cmd = [self.terraform, "import", "-input=false", address, resource_id]
+            LOG.debug(f"Running command {' '.join(cmd)}")
+            process = subprocess.run(
+                cmd,
+                capture_output=True,
+                text=True,
+                check=True,
+                cwd=self.path,
+                env=os_env,
+            )
+            LOG.debug(
+                f"Command finished. stdout={process.stdout}, stderr={process.stderr}"
+            )
+        except subprocess.CalledProcessError as e:
+            LOG.error(f"terraform import failed: {e.output}")
+            LOG.error(e.stderr)
+            raise TerraformException(e.stderr or e.output or str(e))
+
     def sync(self, reporter: ProgressReporter | None = None) -> None:
         """Sync the running state back to the Terraform state file."""
         os_env = os.environ.copy()
diff --git a/sunbeam-python/sunbeam/feature_manager.py b/sunbeam-python/sunbeam/feature_manager.py
index 1163af54f..9dcc112c1 100644
--- a/sunbeam-python/sunbeam/feature_manager.py
+++ b/sunbeam-python/sunbeam/feature_manager.py
@@ -480,3 +480,40 @@ def update_features(
                     "channels"
                 )
                 feature.upgrade_hook(deployment)
+
+    def _call_enabled_features_hook(
+        self,
+        deployment: Deployment,
+        node: typing.Any,
+        hook: typing.Literal["on_join", "on_depart"],
+        **kwargs: typing.Any,
+    ) -> None:
+        """Call hook on enabled features."""
+        client = deployment.get_client()
+        for feature in self.features().values():
+            if not isinstance(feature, EnableDisableFeature):
+                continue
+            try:
+                if not feature.is_enabled(client):
+                    continue
+            except Exception as e:
+                LOG.debug(
+                    "Failed to check if feature %r is enabled for hook %r: %r",
+                    feature.name,
+                    hook,
+                    e,
+                )
+                continue
+            getattr(feature, hook)(deployment, node, **kwargs)
+
+    def call_enabled_features_on_join(
+        self, deployment: Deployment, node: typing.Any, **kwargs: typing.Any
+    ) -> None:
+        """Call on_join on enabled features."""
+        self._call_enabled_features_hook(deployment, node, "on_join", **kwargs)
+
def call_enabled_features_on_depart( + self, deployment: Deployment, node: typing.Any, **kwargs: typing.Any + ) -> None: + """Call on_depart on enabled features.""" + self._call_enabled_features_hook(deployment, node, "on_depart", **kwargs) diff --git a/sunbeam-python/sunbeam/features/ceph/__init__.py b/sunbeam-python/sunbeam/features/ceph/__init__.py new file mode 100644 index 000000000..12519b28d --- /dev/null +++ b/sunbeam-python/sunbeam/features/ceph/__init__.py @@ -0,0 +1,2 @@ +# SPDX-FileCopyrightText: 2025 - Canonical Ltd +# SPDX-License-Identifier: Apache-2.0 diff --git a/sunbeam-python/sunbeam/features/ceph/feature.py b/sunbeam-python/sunbeam/features/ceph/feature.py new file mode 100644 index 000000000..01f012730 --- /dev/null +++ b/sunbeam-python/sunbeam/features/ceph/feature.py @@ -0,0 +1,474 @@ +# SPDX-FileCopyrightText: 2025 - Canonical Ltd +# SPDX-License-Identifier: Apache-2.0 + +import logging +from typing import Any + +import click +from packaging.version import Version +from rich.console import Console + +from sunbeam.core.ceph import ( + CEPH_DISABLING_KEY, + INTERNAL_CEPH_BACKEND_NAME, + SetCephProviderStep, + is_internal_ceph_enabled, +) +from sunbeam.core.common import BaseStep, run_plan, update_config +from sunbeam.core.deployment import Deployment +from sunbeam.core.juju import JujuHelper +from sunbeam.core.manifest import FeatureConfig +from sunbeam.core.terraform import TerraformInitStep +from sunbeam.features.ceph.microceph import ( + ConfigureMicrocephOSDStep, + DeployMicrocephApplicationStep, + DestroyMicrocephApplicationStep, + ceph_replica_scale, +) +from sunbeam.features.interface.v1.base import EnableDisableFeature +from sunbeam.steps.openstack import DeployControlPlaneStep +from sunbeam.storage.backends.internal_ceph.backend import ( + InternalCephBackend, + InternalCephConfig, +) +from sunbeam.storage.steps import ( + DeploySpecificCinderVolumeStep, + DestroySpecificCinderVolumeStep, +) +from sunbeam.utils import 
click_option_show_hints, pass_method_obj + +LOG = logging.getLogger(__name__) +console = Console() + +DEFAULT_STORAGE_RECONCILED_KEY = "default_storage_reconciled" + + +class CephFeature(EnableDisableFeature): + version = Version("0.0.1") + + name = "ceph" + + def _get_internal_ceph_backend(self) -> InternalCephBackend: + """Create and return an InternalCephBackend instance.""" + return InternalCephBackend() + + def _get_provider_specific_steps( + self, deployment: Deployment, **kwargs: Any + ) -> list[BaseStep]: + """Return provider-specific storage setup steps.""" + client = deployment.get_client() + jhelper = JujuHelper(deployment.juju_controller) + manifest = deployment.get_manifest(self.user_manifest) + model = deployment.openstack_machines_model + + if deployment.type == "local": + node_name = kwargs.get("node_name") + if not node_name: + return [] + return [ + ConfigureMicrocephOSDStep( + client, + node_name, + jhelper, + model, + manifest=manifest, + accept_defaults=kwargs.get("accept_defaults", False), + ) + ] + + if deployment.type == "maas": + maas_client = kwargs.get("maas_client") + storage = kwargs.get("storage", []) + if maas_client is None or not storage: + return [] + from sunbeam.provider.maas.steps import MaasConfigureMicrocephOSDStep + + return [ + MaasConfigureMicrocephOSDStep( + client, + maas_client, + jhelper, + storage, + manifest, + model, + ) + ] + + return [] + + def _get_internal_ceph_enable_steps(self, deployment: Deployment) -> list[BaseStep]: + """Return the steps to register the internal-ceph backend.""" + backend = self._get_internal_ceph_backend() + backend.register_terraform_plan(deployment) + + client = deployment.get_client() + storage_tfhelper = deployment.get_tfhelper(backend.tfplan) + openstack_tfhelper = deployment.get_tfhelper("openstack-plan") + jhelper = JujuHelper(deployment.juju_controller) + manifest = deployment.get_manifest(self.user_manifest) + + # Compute replication count from storage node count + storage_nodes = 
client.cluster.list_nodes_by_role("storage") + replication_count = ceph_replica_scale(len(storage_nodes)) + + # Store config in clusterd + config = InternalCephConfig.model_validate( + {"ceph_osd_replication_count": replication_count}, by_name=True + ) + config_key = backend.config_key(INTERNAL_CEPH_BACKEND_NAME) + update_config( + client, config_key, config.model_dump(exclude_none=True, by_alias=True) + ) + + return [ + TerraformInitStep(storage_tfhelper), + TerraformInitStep(openstack_tfhelper), + DeploySpecificCinderVolumeStep( + deployment, + client, + storage_tfhelper, + jhelper, + manifest, + INTERNAL_CEPH_BACKEND_NAME, + backend, + deployment.openstack_machines_model, + ), + backend.create_deploy_step( + deployment, + client, + storage_tfhelper, + jhelper, + manifest, + config.model_dump(exclude_none=True, by_alias=True), + INTERNAL_CEPH_BACKEND_NAME, + deployment.openstack_machines_model, + accept_defaults=True, + ), + DeployControlPlaneStep( + deployment, + openstack_tfhelper, + jhelper, + manifest, + "auto", + deployment.openstack_machines_model, + ), + ] + + def _get_internal_ceph_disable_steps( + self, deployment: Deployment + ) -> list[BaseStep]: + """Return the steps to remove the internal-ceph backend. + + Does NOT include DeployControlPlaneStep — the caller must run + SetCephProviderStep(no_default_storage=True) and then construct + DeployControlPlaneStep separately so it picks up NoCephProvider. 
+ """ + backend = self._get_internal_ceph_backend() + backend.register_terraform_plan(deployment) + + client = deployment.get_client() + storage_tfhelper = deployment.get_tfhelper(backend.tfplan) + jhelper = JujuHelper(deployment.juju_controller) + manifest = deployment.get_manifest(self.user_manifest) + + return [ + TerraformInitStep(storage_tfhelper), + backend.create_destroy_step( + deployment, + client, + storage_tfhelper, + jhelper, + manifest, + INTERNAL_CEPH_BACKEND_NAME, + deployment.openstack_machines_model, + ), + DestroySpecificCinderVolumeStep( + deployment, + client, + storage_tfhelper, + jhelper, + manifest, + INTERNAL_CEPH_BACKEND_NAME, + backend, + deployment.openstack_machines_model, + ), + ] + + def run_enable_plans( + self, + deployment: Deployment, + config: FeatureConfig, + show_hints: bool, + *, + provider_kwargs: dict[str, Any] | None = None, + ) -> None: + """Run plans to enable ceph support via microceph. + + :param provider_kwargs: kwargs forwarded to + ``_get_provider_specific_steps`` (e.g. ``node_name``, + ``maas_client``, ``storage``). Passed explicitly instead of + stashed on ``self`` so multiple call sites can share the + feature instance safely. 
+ """ + client = deployment.get_client() + tfhelper = deployment.get_tfhelper("microceph-plan") + jhelper = JujuHelper(deployment.juju_controller) + manifest = deployment.get_manifest(self.user_manifest) + plan: list[BaseStep] = [ + SetCephProviderStep(client), + TerraformInitStep(tfhelper), + DeployMicrocephApplicationStep( + deployment, + client, + tfhelper, + jhelper, + manifest, + deployment.openstack_machines_model, + ), + ] + plan.extend( + self._get_provider_specific_steps(deployment, **(provider_kwargs or {})) + ) + plan.extend(self._get_internal_ceph_enable_steps(deployment)) + run_plan(plan, console, show_hints) + click.echo("Ceph enabled.") + + def get_bootstrap_deploy_steps( + self, + deployment: Deployment, + *, + enabled: bool, + expect_storage_node: bool, + node_name: str | None = None, + accept_defaults: bool = False, + ) -> list[BaseStep]: + """Return microceph deploy steps for bootstrap/join plans. + + These run BEFORE ``DeployControlPlaneStep`` so the microceph + offer exists when the openstack terraform plan reads + ``data.juju_offer.microceph``. + + Returns ``[]`` when internal Ceph is disabled (``enabled`` is + False, e.g. ``--no-default-storage``) or when no storage node + will be present at control-plane apply time (``expect_storage_node`` + is False). The latter mirrors + ``MicrocephProvider.get_control_plane_tfvars`` which returns + ``enable-ceph=False`` when ``storage_node_count == 0``. + + :param enabled: Whether internal Ceph should be deployed at all. + Driven by the caller's CLI flag (``--no-default-storage``) + to avoid a plan-assembly-time clusterd query. + :param expect_storage_node: Whether a storage-role node will be + present when the control plane terraform plan runs. + :param node_name: If set, append ``ConfigureMicrocephOSDStep`` + for that node. Only used by local deployments; MAAS uses + ``MaasConfigureMicrocephOSDStep`` later in the flow. + :param accept_defaults: Forwarded to ``ConfigureMicrocephOSDStep``. 
+ """ + if not enabled or not expect_storage_node: + return [] + + client = deployment.get_client() + tfhelper = deployment.get_tfhelper("microceph-plan") + jhelper = JujuHelper(deployment.juju_controller) + manifest = deployment.get_manifest(self.user_manifest) + + steps: list[BaseStep] = [ + TerraformInitStep(tfhelper), + DeployMicrocephApplicationStep( + deployment, + client, + tfhelper, + jhelper, + manifest, + deployment.openstack_machines_model, + ), + ] + if node_name: + steps.append( + ConfigureMicrocephOSDStep( + client, + node_name, + jhelper, + deployment.openstack_machines_model, + manifest=manifest, + accept_defaults=accept_defaults, + ) + ) + return steps + + def post_enable( + self, deployment: Deployment, config: FeatureConfig, show_hints: bool + ) -> None: + """Mark explicit Ceph enablement as fully reconciled.""" + self.update_feature_info( + deployment.get_client(), + {DEFAULT_STORAGE_RECONCILED_KEY: "true"}, + ) + + def _is_default_storage_reconciled(self, client: Any) -> bool: + """Return whether default-storage lifecycle has been fully reconciled.""" + info = self.get_feature_info(client) + return ( + info.get("enabled", "false").lower() == "true" + and info.get(DEFAULT_STORAGE_RECONCILED_KEY, "false").lower() == "true" + ) + + def enable_default_storage( + self, deployment: Deployment, show_hints: bool, **kwargs: Any + ) -> None: + """Enable the default ceph-backed storage path.""" + client = deployment.get_client() + if not is_internal_ceph_enabled(client): + return + if self._is_default_storage_reconciled(client): + return + + self.run_enable_plans( + deployment, FeatureConfig(), show_hints, provider_kwargs=kwargs + ) + self.update_feature_info( + client, + { + "enabled": "true", + DEFAULT_STORAGE_RECONCILED_KEY: "true", + }, + ) + + def on_join(self, deployment: Deployment, node: Any, **kwargs: Any) -> None: + """Reconcile default ceph-backed storage when a storage node joins.""" + roles = kwargs.get("roles") + if roles is None and 
isinstance(node, dict): + roles = node.get("role", []) + if "storage" not in (roles or []): + return + + client = deployment.get_client() + tfhelper = deployment.get_tfhelper("microceph-plan") + jhelper = JujuHelper(deployment.juju_controller) + manifest = deployment.get_manifest(self.user_manifest) + show_hints = kwargs.get("show_hints", False) + + plan: list[BaseStep] = [ + TerraformInitStep(tfhelper), + DeployMicrocephApplicationStep( + deployment, + client, + tfhelper, + jhelper, + manifest, + deployment.openstack_machines_model, + ), + ] + plan.extend(self._get_provider_specific_steps(deployment, **kwargs)) + run_plan(plan, console, show_hints) + + # When the feature is already reconciled, reapply the storage + # backend and control plane so the new node gets cinder-volume + # placement and the replica count is updated. + if self._is_default_storage_reconciled(client): + run_plan( + self._get_internal_ceph_enable_steps(deployment), + console, + show_hints, + ) + + def run_disable_plans(self, deployment: Deployment, show_hints: bool) -> None: + """Run plans to disable ceph support and teardown microceph. + + Three phases to minimise inconsistency on partial failure: + 1. Destroy internal-ceph backend and cinder-volume (mode still MICROCEPH) + 2. Persist mode=NONE, then reapply control plane (sees NoCephProvider) + 3. Destroy MicroCeph application + + A ``ceph_disabling`` marker is set on the feature info for the + duration of the flow so readers of + ``is_internal_ceph_enabled_feature_aware`` see the deployment as + "not enabled" throughout the transition window. The marker is + only cleared once all three phases succeed, so a retry after a + partial failure re-enters each idempotent phase safely. 
+ """ + client = deployment.get_client() + tfhelper = deployment.get_tfhelper("microceph-plan") + jhelper = JujuHelper(deployment.juju_controller) + manifest = deployment.get_manifest(self.user_manifest) + + self.update_feature_info(client, {CEPH_DISABLING_KEY: "true"}) + + # Phase 1: destroy backend (mode stays MICROCEPH — safe to retry) + run_plan( + self._get_internal_ceph_disable_steps(deployment), + console, + show_hints, + ) + + # Phase 2: flip mode, then reapply control plane with NoCephProvider. + # DeployControlPlaneStep must be constructed AFTER SetCephProviderStep + # runs so it picks up NoCephProvider via deployment.get_ceph_provider(). + openstack_tfhelper = deployment.get_tfhelper("openstack-plan") + run_plan( + [SetCephProviderStep(client, no_default_storage=True)], + console, + show_hints, + ) + run_plan( + [ + TerraformInitStep(openstack_tfhelper), + DeployControlPlaneStep( + deployment, + openstack_tfhelper, + jhelper, + manifest, + "auto", + deployment.openstack_machines_model, + ), + ], + console, + show_hints, + ) + + # Phase 3: destroy MicroCeph + run_plan( + [ + TerraformInitStep(tfhelper), + DestroyMicrocephApplicationStep( + client, + tfhelper, + jhelper, + manifest, + deployment.openstack_machines_model, + ), + ], + console, + show_hints, + ) + + # Only clear the marker once every phase has succeeded. + self.update_feature_info(client, {CEPH_DISABLING_KEY: "false"}) + click.echo("Ceph disabled.") + + @click.command() + @click_option_show_hints + @pass_method_obj + def enable_cmd(self, deployment: Deployment, show_hints: bool) -> None: + """Enable ceph support.""" + self.enable_feature(deployment, FeatureConfig(), show_hints) + + @click.command() + @click.option( + "--force", + is_flag=True, + default=False, + help="Force disable ceph. 
WARNING: This will result in data loss.", + ) + @click_option_show_hints + @pass_method_obj + def disable_cmd( + self, deployment: Deployment, force: bool = False, show_hints: bool = False + ) -> None: + """Disable ceph support.""" + if not force: + raise click.ClickException( + "Disabling ceph will result in data loss. Use --force to confirm." + ) + self.disable_feature(deployment, show_hints) diff --git a/sunbeam-python/sunbeam/steps/microceph.py b/sunbeam-python/sunbeam/features/ceph/microceph.py similarity index 89% rename from sunbeam-python/sunbeam/steps/microceph.py rename to sunbeam-python/sunbeam/features/ceph/microceph.py index de597a367..edc5a0545 100644 --- a/sunbeam-python/sunbeam/steps/microceph.py +++ b/sunbeam-python/sunbeam/features/ceph/microceph.py @@ -1,6 +1,13 @@ # SPDX-FileCopyrightText: 2023 - Canonical Ltd # SPDX-License-Identifier: Apache-2.0 +"""Microceph implementation of CephProvider. + +Provides local Ceph storage via the microceph charm deployed to the +machine model on storage-role nodes. This is the default provider and +the only one that deploys Ceph; NoCephProvider is the no-op variant. +""" + import ast import logging from typing import Any @@ -11,6 +18,7 @@ from sunbeam.clusterd.client import Client from sunbeam.clusterd.service import NodeNotExistInClusterException from sunbeam.core import questions +from sunbeam.core.ceph import CephProvider from sunbeam.core.common import ( BaseStep, Result, @@ -44,6 +52,7 @@ CEPH_NFS_RELATION = "ceph-nfs" NFS_OFFER_NAME = "microceph-ceph-nfs" RGW_OFFER_NAME = "microceph-ceph-rgw" +TERRAFORM_PLAN_NAME = "microceph-plan" MICROCEPH_APP_TIMEOUT = ( 1800 # 30 minutes, can trigger to deploy mutliple units in parallel ) @@ -607,3 +616,70 @@ def run(self, context: StepContext) -> Result: ) return super().run(context) + + +class MicrocephProvider(CephProvider): + """Ceph storage provider using the microceph charm. + + Deploys microceph locally on storage-role nodes. 
This is the default + provider and the only one that deploys Ceph. + """ + + def get_control_plane_tfvars( + self, + model_with_owner: str, + storage_node_count: int, + ) -> dict[str, Any]: + """Return Ceph-related terraform variables for the control plane.""" + tfvars: dict[str, Any] = {} + if storage_node_count > 0: + tfvars["enable-ceph"] = True + tfvars["ceph-offer-url"] = f"{model_with_owner}.{APPLICATION}" + tfvars["ceph-nfs-offer-url"] = f"{model_with_owner}.{NFS_OFFER_NAME}" + tfvars["ceph-rgw-offer-url"] = f"{model_with_owner}.{RGW_OFFER_NAME}" + tfvars["ceph-osd-replication-count"] = ceph_replica_scale( + storage_node_count + ) + else: + tfvars["enable-ceph"] = False + return tfvars + + def get_cinder_volume_tfvars( + self, + deployment: Deployment, + storage_node_count: int, + ) -> dict[str, Any]: + """Return Ceph-related terraform variables for cinder-volume.""" + tfvars: dict[str, Any] = { + "charm_cinder_volume_ceph_config": { + "ceph-osd-replication-count": ceph_replica_scale(storage_node_count), + }, + } + + if storage_node_count > 0: + microceph_tfhelper = deployment.get_tfhelper(TERRAFORM_PLAN_NAME) + microceph_tf_output = microceph_tfhelper.output() + ceph_application_name = microceph_tf_output.get("ceph-application-name") + if ceph_application_name: + tfvars["ceph-application-name"] = ceph_application_name + + return tfvars + + def get_replica_count(self, storage_node_count: int) -> int: + """Return the Ceph OSD replication count.""" + return ceph_replica_scale(storage_node_count) + + @property + def application_name(self) -> str: + """Return the microceph Juju application name.""" + return APPLICATION + + @property + def status_column(self) -> str: + """Return the column name for cluster status display.""" + return "storage" + + @property + def terraform_plan_name(self) -> str: + """Return the terraform plan key.""" + return TERRAFORM_PLAN_NAME diff --git a/sunbeam-python/sunbeam/features/instance_recovery/feature.py 
b/sunbeam-python/sunbeam/features/instance_recovery/feature.py index 2571b520a..41150976b 100644 --- a/sunbeam-python/sunbeam/features/instance_recovery/feature.py +++ b/sunbeam-python/sunbeam/features/instance_recovery/feature.py @@ -177,6 +177,7 @@ def run_enable_plans( self.manifest, deployment.openstack_machines_model, extra_tfvars=extra_tfvars, + deployment=deployment, ), ] ) @@ -199,6 +200,7 @@ def run_disable_plans(self, deployment: Deployment, show_hints: bool) -> None: self.manifest, deployment.openstack_machines_model, extra_tfvars=extra_tfvars, + deployment=deployment, ), RemoveSaasApplicationsStep( jhelper, diff --git a/sunbeam-python/sunbeam/features/interface/v1/base.py b/sunbeam-python/sunbeam/features/interface/v1/base.py index d14e3dcf3..f0be44dd6 100644 --- a/sunbeam-python/sunbeam/features/interface/v1/base.py +++ b/sunbeam-python/sunbeam/features/interface/v1/base.py @@ -589,6 +589,18 @@ def is_enabled(self, client: Client) -> bool: info = self.get_feature_info(client) return info.get("enabled", "false").lower() == "true" + def on_join( + self, deployment: Deployment, node: typing.Any, **kwargs: typing.Any + ) -> None: + """Hook invoked when a node joins the deployment.""" + pass + + def on_depart( + self, deployment: Deployment, node: typing.Any, **kwargs: typing.Any + ) -> None: + """Hook invoked when a node departs the deployment.""" + pass + def check_enabled_requirement_is_compatible( self, deployment: Deployment, requirement: FeatureRequirement ): diff --git a/sunbeam-python/sunbeam/features/maintenance/checks.py b/sunbeam-python/sunbeam/features/maintenance/checks.py index 34c757564..cc825608d 100644 --- a/sunbeam-python/sunbeam/features/maintenance/checks.py +++ b/sunbeam-python/sunbeam/features/maintenance/checks.py @@ -49,7 +49,7 @@ else: l_client = LazyImport("lightkube.core.client") l_apps = LazyImport("lightkube.resources.apps_v1") -from sunbeam.steps.microceph import APPLICATION as _MICROCEPH_APPLICATION +from 
sunbeam.features.ceph.microceph import APPLICATION as _MICROCEPH_APPLICATION console = Console() LOG = logging.getLogger(__name__) diff --git a/sunbeam-python/sunbeam/features/maintenance/commands.py b/sunbeam-python/sunbeam/features/maintenance/commands.py index ad16f317c..8e5ed7547 100644 --- a/sunbeam-python/sunbeam/features/maintenance/commands.py +++ b/sunbeam-python/sunbeam/features/maintenance/commands.py @@ -8,6 +8,7 @@ import click from rich.console import Console +from sunbeam.core.ceph import is_internal_ceph_enabled_feature_aware from sunbeam.core.checks import Check, run_preflight_checks from sunbeam.core.common import ( BaseStep, @@ -128,6 +129,9 @@ def __init__( self.client = deployment.get_client() self.jhelper = JujuHelper(deployment.juju_controller) self.ops_viewer = OperationViewer(self.node, OperationGoal.EnableMaintenance) + self._internal_ceph = is_internal_ceph_enabled_feature_aware( + self.deployment, self.client + ) def check(self, console: Console) -> None: """Run pre-flight checks.""" @@ -155,7 +159,7 @@ def check(self, console: Console) -> None: ), ] - if "storage" in node_status: + if "storage" in node_status and self._internal_ceph: preflight_checks += [ checks.MicroCephMaintenancePreflightCheck( client=self.client, @@ -219,7 +223,7 @@ def apply(self, console: Console, show_hints: bool, plan_results: dict) -> None: ) ) - if "storage" in node_status: + if "storage" in node_status and self._internal_ceph: operation_plan.append( MicroCephActionStep( client=self.client, @@ -308,7 +312,7 @@ def dry_run(self, console: Console, show_hints: bool) -> dict: ) ) - if "storage" in node_status: + if "storage" in node_status and self._internal_ceph: generate_operation_plan.append( MicroCephActionStep( client=self.client, @@ -363,7 +367,7 @@ def dry_run(self, console: Console, show_hints: bool) -> dict: if "compute" in node_status: self.ops_viewer.add_watch_actions(actions=audit_info["actions"]) - if "storage" in node_status: + if "storage" in 
node_status and self._internal_ceph: self.ops_viewer.add_maintenance_action_steps( action_result=microceph_enter_maintenance_dry_run_action_result ) @@ -399,6 +403,9 @@ def __init__( self.client = deployment.get_client() self.jhelper = JujuHelper(deployment.juju_controller) self.ops_viewer = OperationViewer(node, OperationGoal.DisableMaintenance) + self._internal_ceph = is_internal_ceph_enabled_feature_aware( + self.deployment, self.client + ) def check(self, console: Console) -> None: """Run pre-flight checks.""" @@ -446,7 +453,7 @@ def apply(self, console: Console, show_hints: bool, plan_results: dict) -> None: audit=audit_info["audit"], ), ] - if "storage" in node_status: + if "storage" in node_status and self._internal_ceph: operation_plan.append( MicroCephActionStep( client=self.client, @@ -507,7 +514,7 @@ def dry_run(self, console: Console, show_hints: bool) -> dict: deployment=self.deployment, node=self.node ) ) - if "storage" in node_status: + if "storage" in node_status and self._internal_ceph: generate_operation_plan.append( MicroCephActionStep( client=self.client, @@ -552,7 +559,7 @@ def dry_run(self, console: Console, show_hints: bool) -> dict: self.ops_viewer.add_step(step_name=EnableHypervisorStep.__name__) if not self.disable_instance_rebalancing: self.ops_viewer.add_watch_actions(actions=audit_info["actions"]) - if "storage" in node_status: + if "storage" in node_status and self._internal_ceph: self.ops_viewer.add_maintenance_action_steps( action_result=microceph_exit_maintenance_dry_run_action_result ) diff --git a/sunbeam-python/sunbeam/features/secrets/feature.py b/sunbeam-python/sunbeam/features/secrets/feature.py index aafeed30d..5bf449f7e 100644 --- a/sunbeam-python/sunbeam/features/secrets/feature.py +++ b/sunbeam-python/sunbeam/features/secrets/feature.py @@ -132,6 +132,7 @@ def run_enable_plans( self.manifest, deployment.openstack_machines_model, extra_tfvars=extra_tfvars, + deployment=deployment, ), ] run_plan(plan2, console, show_hints) 
@@ -156,6 +157,7 @@ def run_disable_plans(self, deployment: Deployment, show_hints: bool) -> None: self.manifest, deployment.openstack_machines_model, extra_tfvars=extra_tfvars, + deployment=deployment, ), RemoveSaasApplicationsStep( jhelper, diff --git a/sunbeam-python/sunbeam/features/shared_filesystem/feature.py b/sunbeam-python/sunbeam/features/shared_filesystem/feature.py index 138387131..d3c1cb095 100644 --- a/sunbeam-python/sunbeam/features/shared_filesystem/feature.py +++ b/sunbeam-python/sunbeam/features/shared_filesystem/feature.py @@ -8,6 +8,7 @@ from packaging.version import Version from rich.console import Console +from sunbeam.core.ceph import is_internal_ceph_enabled_feature_aware from sunbeam.core.common import ( BaseStep, run_plan, @@ -217,6 +218,14 @@ def set_tfvars_on_resize( @pass_method_obj def enable_cmd(self, deployment: Deployment, show_hints: bool) -> None: """Enable Shared Filesystems service.""" + client = deployment.get_client() + if not client.cluster.list_nodes_by_role( + "storage" + ) or not is_internal_ceph_enabled_feature_aware(deployment, client): + raise click.ClickException( + "Shared Filesystems requires internal Ceph storage." + " Add storage nodes and enable the ceph feature first."
+ ) self.enable_feature(deployment, FeatureConfig(), show_hints) @click.command() diff --git a/sunbeam-python/sunbeam/features/telemetry/feature.py b/sunbeam-python/sunbeam/features/telemetry/feature.py index e7313a033..82e53c762 100644 --- a/sunbeam-python/sunbeam/features/telemetry/feature.py +++ b/sunbeam-python/sunbeam/features/telemetry/feature.py @@ -4,12 +4,29 @@ import logging import click +import pydantic from packaging.version import Version from rich.console import Console -from sunbeam.core.common import BaseStep, run_plan +from sunbeam.clusterd.client import Client +from sunbeam.clusterd.service import ConfigItemNotFoundException +from sunbeam.core.ceph import is_internal_ceph_enabled_feature_aware +from sunbeam.core.checks import ( + Check, + JujuControllerRegistrationCheck, + run_preflight_checks, +) +from sunbeam.core.common import ( + BaseStep, + Result, + ResultType, + StepContext, + read_config, + run_plan, + update_config, +) from sunbeam.core.deployment import Deployment -from sunbeam.core.juju import JujuHelper +from sunbeam.core.juju import JujuHelper, JujuStepHelper, JujuWaitException from sunbeam.core.manifest import ( AddManifestStep, CharmManifest, @@ -17,6 +34,7 @@ SoftwareConfig, ) from sunbeam.core.openstack import OPENSTACK_MODEL +from sunbeam.core.questions import load_answers, write_answers from sunbeam.core.terraform import TerraformInitStep from sunbeam.features.interface.v1.openstack import ( DisableOpenStackApplicationStep, @@ -24,17 +42,183 @@ OpenStackControlPlaneFeature, TerraformPlanLocation, ) -from sunbeam.steps.cinder_volume import DeployCinderVolumeApplicationStep from sunbeam.steps.hypervisor import ReapplyHypervisorTerraformPlanStep from sunbeam.steps.juju import RemoveSaasApplicationsStep -from sunbeam.storage.manager import StorageBackendManager -from sunbeam.storage.steps import DeploySpecificCinderVolumeStep +from sunbeam.storage.base import STORAGE_TFPLAN, register_storage_terraform_plan +from 
sunbeam.storage.steps import ( + STORAGE_BACKEND_TFVAR_CONFIG_KEY, + ReapplyStorageBackendTerraformPlanStep, +) from sunbeam.utils import click_option_show_hints, pass_method_obj from sunbeam.versions import OPENSTACK_CHANNEL LOG = logging.getLogger(__name__) console = Console() +TELEMETRY_METRICS_BACKEND_KEY = "TelemetryMetricsBackend" + + +class UpdateCinderVolumeTelemetryTfvarsStep(BaseStep): + """Update enable-telemetry-notifications in all cinder-volume entries. + + Reads the storage backend tfvars from clusterd, flips the + ``enable-telemetry-notifications`` flag on every ``cinder-volumes`` + entry, and writes them back. A subsequent + ``ReapplyStorageBackendTerraformPlanStep`` is needed to actually + apply the change. + """ + + def __init__( + self, + client: Client, + enable: bool, + ): + action = "Enable" if enable else "Disable" + super().__init__( + f"{action} telemetry notifications on cinder-volume", + f"{action.lower().rstrip('e')}ing telemetry notifications" + " on cinder-volume entries", + ) + self.client = client + self.enable = enable + + def is_skip(self, context: StepContext) -> Result: + """Skip when there are no cinder-volume entries to update.""" + try: + tfvars = read_config(self.client, STORAGE_BACKEND_TFVAR_CONFIG_KEY) + except ConfigItemNotFoundException: + return Result( + ResultType.SKIPPED, + "No storage backend config found; nothing to update.", + ) + + if not tfvars.get("cinder-volumes"): + return Result( + ResultType.SKIPPED, + "No cinder-volume entries found; nothing to update.", + ) + + return Result(ResultType.COMPLETED) + + def run(self, context: StepContext) -> Result: + """Update cinder-volume tfvars with telemetry notification flag.""" + try: + tfvars = read_config(self.client, STORAGE_BACKEND_TFVAR_CONFIG_KEY) + except ConfigItemNotFoundException: + return Result(ResultType.COMPLETED) + + cinder_volumes = tfvars.get("cinder-volumes", {}) + if not cinder_volumes: + return Result(ResultType.COMPLETED) + + for entry in 
cinder_volumes.values(): + entry["enable-telemetry-notifications"] = self.enable + + update_config(self.client, STORAGE_BACKEND_TFVAR_CONFIG_KEY, tfvars) + return Result(ResultType.COMPLETED) + + +GNOCCHI_S3_ENDPOINT = "gnocchi:s3-credentials" +TELEMETRY_DEPLOY_TIMEOUT = 1200 # 20 minutes + + +class TelemetryMetricsBackendConfig(pydantic.BaseModel): + """Persisted metrics storage backend configuration.""" + + offer_url: str | None = None + + +class IntegrateMetricsStorageOfferStep(BaseStep, JujuStepHelper): + """Integrate Gnocchi with an external S3 offer. + + Workaround for https://github.com/juju/terraform-provider-juju/issues/119 + """ + + def __init__( + self, + deployment: Deployment, + feature: "TelemetryFeature", + jhelper: JujuHelper, + ): + super().__init__( + "Integrate metrics storage offer", + "Integrating S3 metrics storage offer with Gnocchi", + ) + self.deployment = deployment + self.feature = feature + self.jhelper = jhelper + + def run(self, context: StepContext) -> Result: + """Integrate gnocchi with the S3 offer.""" + if not self.feature.metrics_storage_offer_url: + return Result(ResultType.SKIPPED) + + self.integrate( + OPENSTACK_MODEL, + GNOCCHI_S3_ENDPOINT, + self.feature.metrics_storage_offer_url, + ) + + try: + self.jhelper.wait_application_ready( + "gnocchi", + OPENSTACK_MODEL, + timeout=TELEMETRY_DEPLOY_TIMEOUT, + ) + except (JujuWaitException, TimeoutError) as e: + LOG.debug("Gnocchi not ready after S3 integration", exc_info=True) + return Result(ResultType.FAILED, str(e)) + + return Result(ResultType.COMPLETED) + + +class RemoveMetricsStorageOfferStep(BaseStep, JujuStepHelper): + """Remove the S3 offer integration from Gnocchi. 
+
+    Workaround for https://github.com/juju/terraform-provider-juju/issues/119
+    """
+
+    def __init__(
+        self,
+        deployment: Deployment,
+        feature: "TelemetryFeature",
+        jhelper: JujuHelper,
+    ):
+        super().__init__(
+            "Remove metrics storage offer",
+            "Removing S3 metrics storage offer from Gnocchi",
+        )
+        self.deployment = deployment
+        self.feature = feature
+        self.jhelper = jhelper
+        self.endpoints = [GNOCCHI_S3_ENDPOINT]
+
+    def _get_relations(self, model: str, endpoints: list[str]) -> list[tuple]:
+        """Return model relations for the provided endpoints."""
+        relations = []
+        model_status = self.jhelper.get_model_status(model)
+        for endpoint in endpoints:
+            app, relation = endpoint.split(":")
+            if app not in model_status.apps:
+                continue
+            app_status = model_status.apps[app]
+            if relation in app_status.relations:
+                relations.append((endpoint, app_status.relations[relation]))
+        return relations
+
+    def run(self, context: StepContext) -> Result:
+        """Remove the S3 relation from Gnocchi."""
+        relations = self._get_relations(OPENSTACK_MODEL, self.endpoints)
+        LOG.debug(f"S3 relations to remove: {relations}")
+        for relation_pair in relations:
+            self.remove_relation(
+                OPENSTACK_MODEL,
+                relation_pair[0],
+                relation_pair[1],
+            )
+
+        return Result(ResultType.COMPLETED)
+

 class TelemetryFeature(OpenStackControlPlaneFeature):
     version = Version("0.0.1")
@@ -42,6 +226,52 @@ class TelemetryFeature(OpenStackControlPlaneFeature):
     name = "telemetry"
     tf_plan_location = TerraformPlanLocation.SUNBEAM_TERRAFORM_REPO

+    def __init__(self) -> None:
+        super().__init__()
+        self.metrics_storage_offer_url = ""
+
+    def _load_metrics_config(
+        self, deployment: Deployment
+    ) -> TelemetryMetricsBackendConfig:
+        """Load the metrics storage backend config from cluster DB."""
+        client = deployment.get_client()
+        answers = load_answers(client, TELEMETRY_METRICS_BACKEND_KEY)
+        return TelemetryMetricsBackendConfig.model_validate(answers)
+
+    def _save_metrics_config(
+        self, deployment: Deployment, config: TelemetryMetricsBackendConfig
+    ) -> None:
+        """Save the metrics storage backend config to cluster DB."""
+        client = deployment.get_client()
+        write_answers(client, TELEMETRY_METRICS_BACKEND_KEY, config.model_dump())
+
+    def _has_metrics_storage(self, deployment: Deployment) -> bool:
+        """Check if metrics storage (Ceph or S3) is available for Gnocchi.
+
+        S3 offer takes precedence over microceph: if an S3 offer was
+        configured (instance state or cluster DB), it wins even when
+        storage nodes exist.
+        """
+        # S3 offer — instance state (set during enable_cmd or disable flow)
+        if self.metrics_storage_offer_url:
+            return True
+        # S3 offer — cluster DB (persisted from a previous enable)
+        try:
+            config = self._load_metrics_config(deployment)
+            if config.offer_url:
+                return True
+        except ConfigItemNotFoundException:
+            LOG.debug("Failed to load metrics config from cluster DB", exc_info=True)
+        # Storage nodes with internal Ceph (microceph)
+        try:
+            client = deployment.get_client()
+            return bool(
+                client.cluster.list_nodes_by_role("storage")
+                and is_internal_ceph_enabled_feature_aware(deployment, client)
+            )
+        except ConfigItemNotFoundException:
+            return False
+
     def default_software_overrides(self) -> SoftwareConfig:
         """Feature software configuration."""
         return SoftwareConfig(
@@ -89,7 +319,6 @@ def run_enable_plans(
         tfhelper = deployment.get_tfhelper(self.tfplan)
         tfhelper_openstack = deployment.get_tfhelper("openstack-plan")
         tfhelper_hypervisor = deployment.get_tfhelper("hypervisor-plan")
-        tfhelper_cinder_volume = deployment.get_tfhelper("cinder-volume-plan")
         jhelper = JujuHelper(deployment.juju_controller)
         plan1: list[BaseStep] = []
         if self.user_manifest:
@@ -104,94 +333,54 @@ def run_enable_plans(
             )
         run_plan(plan1, console, show_hints)

+        # Integrate S3 metrics storage offer with Gnocchi after deployment
+        if self.metrics_storage_offer_url:
+            self._save_metrics_config(
+                deployment,
+                TelemetryMetricsBackendConfig(offer_url=self.metrics_storage_offer_url),
+            )
+            s3_plan: list[BaseStep] = [
+                IntegrateMetricsStorageOfferStep(deployment, self, jhelper),
+            ]
+            run_plan(s3_plan, console, show_hints)
+
         openstack_tf_output = tfhelper_openstack.output()
         extra_tfvars = {
             "ceilometer-offer-url": openstack_tf_output.get("ceilometer-offer-url")
         }
-        extra_tfvars_cinder_volume = {"enable-telemetry-notifications": True}
-        plan2: list[BaseStep] = []
-        plan2.extend(
-            [
-                TerraformInitStep(tfhelper_hypervisor),
-                # No need to pass any extra terraform vars for this feature
-                ReapplyHypervisorTerraformPlanStep(
-                    deployment.get_client(),
-                    tfhelper_hypervisor,
-                    jhelper,
-                    self.manifest,
-                    deployment.openstack_machines_model,
-                    extra_tfvars=extra_tfvars,
-                ),
-                TerraformInitStep(tfhelper_cinder_volume),
-                DeployCinderVolumeApplicationStep(
-                    deployment,
-                    deployment.get_client(),
-                    tfhelper_cinder_volume,
-                    jhelper,
-                    self.manifest,
-                    deployment.openstack_machines_model,
-                    extra_tfvars=extra_tfvars_cinder_volume,
-                ),
-            ]
-        )
+        plan2: list[BaseStep] = [
+            TerraformInitStep(tfhelper_hypervisor),
+            ReapplyHypervisorTerraformPlanStep(
+                deployment.get_client(),
+                tfhelper_hypervisor,
+                jhelper,
+                self.manifest,
+                deployment.openstack_machines_model,
+                extra_tfvars=extra_tfvars,
+                deployment=deployment,
+            ),
+        ]

         run_plan(plan2, console, show_hints)

-        # Deploy specific cinder-volume applications for each storage backend
+        # Update telemetry notification flag on all existing cinder-volume entries
         client = deployment.get_client()
-        storage_backends = client.cluster.get_storage_backends()
-
-        if storage_backends.root:
-            storage_manager = StorageBackendManager()
-            tfhelper_storage = deployment.get_tfhelper("storage-plan")
-
-            plan3: list[BaseStep] = []
-            plan3.append(TerraformInitStep(tfhelper_storage))
-
-            # Track principal applications to avoid duplicates
-            processed_principals = set()
-
-            for backend_metadata in storage_backends.root:
-                # Get the backend instance from the manager
-                backend_type = backend_metadata.type
-                backend_name = backend_metadata.name
-
-                try:
-                    backend_instance = storage_manager.backends().get(backend_type)
-                    if backend_instance:
-                        # Skip if we've already processed this principal application
-                        principal_app = backend_instance.principal_application
-                        if principal_app in processed_principals:
-                            LOG.debug(
-                                f"Skipping {backend_name}: principal application "
-                                f"{principal_app} already processed"
-                            )
-                            continue
-
-                        processed_principals.add(principal_app)
-
-                        # Add step to deploy specific cinder-volume for this backend
-                        plan3.append(
-                            DeploySpecificCinderVolumeStep(
-                                deployment,
-                                client,
-                                tfhelper_storage,
-                                jhelper,
-                                self.manifest,
-                                backend_name,
-                                backend_instance,
-                                deployment.openstack_machines_model,
-                                extra_tfvars=extra_tfvars_cinder_volume,
-                            )
-                        )
-                except Exception as e:
-                    LOG.warning(
-                        f"Failed to add specific cinder-volume step for backend "
-                        f"{backend_name}: {e}"
-                    )
+        register_storage_terraform_plan(deployment)
+        tfhelper_storage = deployment.get_tfhelper(STORAGE_TFPLAN)

-            if len(plan3) > 1:  # More than just TerraformInitStep
-                run_plan(plan3, console, show_hints)
+        plan3: list[BaseStep] = [
+            TerraformInitStep(tfhelper_storage),
+            UpdateCinderVolumeTelemetryTfvarsStep(client, enable=True),
+            ReapplyStorageBackendTerraformPlanStep(
+                deployment,
+                client,
+                tfhelper_storage,
+                jhelper,
+                self.manifest,
+                deployment.openstack_machines_model,
+            ),
+        ]
+        run_plan(plan3, console, show_hints)

         click.echo(f"OpenStack {self.display_name} application enabled.")

@@ -199,10 +388,23 @@ def run_disable_plans(self, deployment: Deployment, show_hints: bool) -> None:
         """Run plans to disable the feature."""
         tfhelper = deployment.get_tfhelper(self.tfplan)
         tfhelper_hypervisor = deployment.get_tfhelper("hypervisor-plan")
-        tfhelper_cinder_volume = deployment.get_tfhelper("cinder-volume-plan")
         jhelper = JujuHelper(deployment.juju_controller)
+
+        # Load persisted S3 config for cleanup
+        metrics_config = self._load_metrics_config(deployment)
+        if metrics_config.offer_url:
+            self.metrics_storage_offer_url = metrics_config.offer_url
+            s3_removal_plan: list[BaseStep] = [
+                RemoveMetricsStorageOfferStep(deployment, self, jhelper),
+                RemoveSaasApplicationsStep(
+                    jhelper,
+                    OPENSTACK_MODEL,
+                    offering_interfaces=["s3"],
+                ),
+            ]
+            run_plan(s3_removal_plan, console, show_hints)
+
         extra_tfvars = {"ceilometer-offer-url": None}
-        extra_tfvars_cinder_volume = {"enable-telemetry-notifications": False}
         plan = [
             TerraformInitStep(tfhelper_hypervisor),
             ReapplyHypervisorTerraformPlanStep(
@@ -212,16 +414,7 @@ def run_disable_plans(self, deployment: Deployment, show_hints: bool) -> None:
                 self.manifest,
                 deployment.openstack_machines_model,
                 extra_tfvars=extra_tfvars,
-            ),
-            TerraformInitStep(tfhelper_cinder_volume),
-            DeployCinderVolumeApplicationStep(
-                deployment,
-                deployment.get_client(),
-                tfhelper_cinder_volume,
-                jhelper,
-                self.manifest,
-                deployment.openstack_machines_model,
-                extra_tfvars=extra_tfvars_cinder_volume,
+                deployment=deployment,
             ),
             RemoveSaasApplicationsStep(
                 jhelper,
@@ -235,65 +428,31 @@ def run_disable_plans(self, deployment: Deployment, show_hints: bool) -> None:

         run_plan(plan, console, show_hints)

-        # Update specific cinder-volume applications for each storage backend
+        # Update telemetry notification flag on all existing cinder-volume entries
         client = deployment.get_client()
-        storage_backends = client.cluster.get_storage_backends()
-
-        if storage_backends.root:
-            storage_manager = StorageBackendManager()
-            tfhelper_storage = deployment.get_tfhelper("storage-plan")
-
-            plan2: list[BaseStep] = []
-            plan2.append(TerraformInitStep(tfhelper_storage))
-
-            # Track principal applications to avoid duplicates
-            processed_principals = set()
-
-            for backend_metadata in storage_backends.root:
-                # Get the backend instance from the manager
-                backend_type = backend_metadata.type
-                backend_name = backend_metadata.name
-
-                try:
-                    backend_instance = storage_manager.backends().get(backend_type)
-                    if backend_instance:
-                        # Skip if we've already processed this principal application
-                        principal_app = backend_instance.principal_application
-                        if principal_app in processed_principals:
-                            LOG.debug(
-                                f"Skipping {backend_name}: principal application "
-                                f"{principal_app} already processed"
-                            )
-                            continue
-
-                        processed_principals.add(principal_app)
-
-                        # Add step to update specific cinder-volume for this backend
-                        # (this will reapply with enable-telemetry-notifications=False)
-                        plan2.append(
-                            DeploySpecificCinderVolumeStep(
-                                deployment,
-                                client,
-                                tfhelper_storage,
-                                jhelper,
-                                self.manifest,
-                                backend_name,
-                                backend_instance,
-                                deployment.openstack_machines_model,
-                                extra_tfvars=extra_tfvars_cinder_volume,
-                            )
-                        )
-                except Exception as e:
-                    LOG.warning(
-                        f"Failed to add specific cinder-volume step for backend "
-                        f"{backend_name}: {e}"
-                    )
+        register_storage_terraform_plan(deployment)
+        tfhelper_storage = deployment.get_tfhelper(STORAGE_TFPLAN)

-            if len(plan2) > 1:  # More than just TerraformInitStep
-                run_plan(plan2, console, show_hints)
+        plan2: list[BaseStep] = [
+            TerraformInitStep(tfhelper_storage),
+            UpdateCinderVolumeTelemetryTfvarsStep(client, enable=False),
+            ReapplyStorageBackendTerraformPlanStep(
+                deployment,
+                client,
+                tfhelper_storage,
+                jhelper,
+                self.manifest,
+                deployment.openstack_machines_model,
+            ),
+        ]
+        run_plan(plan2, console, show_hints)

         click.echo(f"OpenStack {self.display_name} application disabled.")

+        # Clear persisted S3 config after successful disable
+        if metrics_config.offer_url:
+            self._save_metrics_config(deployment, TelemetryMetricsBackendConfig())
+
     def set_application_names(self, deployment: Deployment) -> list:
         """Application names handled by the terraform plan."""
         database_topology = self.get_database_topology(deployment)
@@ -302,7 +461,7 @@ def set_application_names(self, deployment: Deployment) -> list:
         if database_topology == "multi":
             apps.append("aodh-mysql")

-        if deployment.get_client().cluster.list_nodes_by_role("storage"):
+        if self._has_metrics_storage(deployment):
             apps.extend(["ceilometer", "gnocchi", "gnocchi-mysql-router"])
             if database_topology == "multi":
                 apps.append("gnocchi-mysql")
@@ -317,13 +476,19 @@ def set_tfvars_on_enable(
         self, deployment: Deployment, config: FeatureConfig
     ) -> dict:
         """Set terraform variables to enable the application."""
-        return {
+        tfvars: dict = {
             "enable-telemetry": True,
         }
+        if self.metrics_storage_offer_url:
+            tfvars["metrics-storage-offer-url"] = self.metrics_storage_offer_url
+        return tfvars

     def set_tfvars_on_disable(self, deployment: Deployment) -> dict:
         """Set terraform variables to disable the application."""
-        return {"enable-telemetry": False}
+        return {
+            "enable-telemetry": False,
+            "metrics-storage-offer-url": "",
+        }

     def set_tfvars_on_resize(
         self, deployment: Deployment, config: FeatureConfig
@@ -339,10 +504,69 @@ def get_database_charm_processes(self) -> dict[str, dict[str, int]]:
         }

     @click.command()
+    @click.option(
+        "--metrics-storage-controller",
+        type=str,
+        default=None,
+        help=(
+            "Juju controller name for the S3 metrics storage offer"
+            " (required for cross-controller offers)"
+        ),
+    )
+    @click.option(
+        "--metrics-storage-offer",
+        type=str,
+        default=None,
+        help=(
+            "Juju offer URL for S3-compatible storage backend for Gnocchi"
+            " (mandatory when microceph is not configured)"
+        ),
+    )
     @click_option_show_hints
     @pass_method_obj
-    def enable_cmd(self, deployment: Deployment, show_hints: bool) -> None:
-        """Enable OpenStack Telemetry applications."""
+    def enable_cmd(
+        self,
+        deployment: Deployment,
+        metrics_storage_controller: str | None,
+        metrics_storage_offer: str | None,
+        show_hints: bool,
+    ) -> None:
+        """Enable OpenStack Telemetry applications.
+
+        Metrics storage precedence: a configured S3 offer always takes
+        priority over internal Ceph-backed object storage. To switch back to
+        internal storage the user
+        must first disable the telemetry feature, then re-enable it
+        without --metrics-storage-offer.
+        """
+        client = deployment.get_client()
+        existing_config = self._load_metrics_config(deployment)
+
+        if metrics_storage_offer:
+            # Explicit S3 offer provided — use it (overrides any prior config)
+            if metrics_storage_controller:
+                self.metrics_storage_offer_url = (
+                    f"{metrics_storage_controller}:{metrics_storage_offer}"
+                )
+                data_location = self.snap.paths.user_data
+                preflight_checks: list[Check] = [
+                    JujuControllerRegistrationCheck(
+                        metrics_storage_controller, data_location
+                    )
+                ]
+                run_preflight_checks(preflight_checks, console)
+            else:
+                self.metrics_storage_offer_url = metrics_storage_offer
+        elif existing_config.offer_url:
+            # No new offer, but S3 was previously configured — keep it
+            self.metrics_storage_offer_url = existing_config.offer_url
+        elif not is_internal_ceph_enabled_feature_aware(deployment, client):
+            # No S3 (new or existing) and no internal Ceph storage.
+            raise click.ClickException(
+                "No internal storage is configured. --metrics-storage-offer is "
+                "required to provide S3-compatible storage for Gnocchi metrics."
+            )
+
         self.enable_feature(deployment, FeatureConfig(), show_hints)

     @click.command()
diff --git a/sunbeam-python/sunbeam/features/tls/ca.py b/sunbeam-python/sunbeam/features/tls/ca.py
index 5ec2fd502..77ccd166f 100644
--- a/sunbeam-python/sunbeam/features/tls/ca.py
+++ b/sunbeam-python/sunbeam/features/tls/ca.py
@@ -15,6 +15,7 @@
     ConfigItemNotFoundException,
 )
 from sunbeam.core import questions
+from sunbeam.core.ceph import is_internal_ceph_enabled
 from sunbeam.core.common import (
     FORMAT_TABLE,
     FORMAT_YAML,
@@ -243,7 +244,9 @@ def configure(
         preseed = ca.config.model_dump(by_alias=True)
         model = OPENSTACK_MODEL
         apps_to_monitor = ["traefik", "traefik-public", "keystone"]
-        if client.cluster.list_nodes_by_role("storage"):
+        if client.cluster.list_nodes_by_role("storage") and is_internal_ceph_enabled(
+            client
+        ):
             apps_to_monitor.append("traefik-rgw")

         try:
diff --git a/sunbeam-python/sunbeam/features/tls/common.py b/sunbeam-python/sunbeam/features/tls/common.py
index 3ef7b147b..3a4773bcc 100644
--- a/sunbeam-python/sunbeam/features/tls/common.py
+++ b/sunbeam-python/sunbeam/features/tls/common.py
@@ -13,6 +13,7 @@
 from sunbeam.clusterd.client import Client
 from sunbeam.clusterd.service import ConfigItemNotFoundException
 from sunbeam.core import questions
+from sunbeam.core.ceph import is_internal_ceph_enabled
 from sunbeam.core.common import (
     BaseStep,
     Result,
@@ -198,7 +199,9 @@ def post_disable(self, deployment: Deployment, show_hints: bool) -> None:
         apps_to_monitor = ["traefik", "traefik-public"]
         if not deployment.external_keystone_model:
             apps_to_monitor.append("keystone")
-        if client.cluster.list_nodes_by_role("storage"):
+        if client.cluster.list_nodes_by_role("storage") and is_internal_ceph_enabled(
+            client
+        ):
             apps_to_monitor.append("traefik-rgw")

         plan: list[BaseStep] = [
diff --git a/sunbeam-python/sunbeam/provider/local/commands.py b/sunbeam-python/sunbeam/provider/local/commands.py
index 348524ae9..9162db305 100644
--- a/sunbeam-python/sunbeam/provider/local/commands.py
+++ b/sunbeam-python/sunbeam/provider/local/commands.py
@@ -5,7 +5,7 @@
 import json
 import logging
 from pathlib import Path
-from typing import Tuple, Type
+from typing import Any, Tuple, Type

 import click
 import yaml
@@ -29,6 +29,14 @@
 from sunbeam.commands.dashboard_url import retrieve_dashboard_url
 from sunbeam.commands.proxy import PromptForProxyStep
 from sunbeam.core import ovn
+from sunbeam.core.ceph import (
+    SetCephProviderStep,
+    ensure_default_ceph_feature,
+    get_default_ceph_bootstrap_steps,
+    is_internal_ceph_enabled,
+    is_internal_ceph_enabled_feature_aware,
+    set_ceph_feature_enabled_state,
+)
 from sunbeam.core.checks import (
     Check,
     DaemonGroupCheck,
@@ -84,6 +92,10 @@
     feature_gate_option,
     split_roles_enabled,
 )
+from sunbeam.features.ceph.microceph import (
+    CheckMicrocephDistributionStep,
+    RemoveMicrocephUnitsStep,
+)
 from sunbeam.provider.base import ProviderBase
 from sunbeam.provider.common.multiregion import connect_to_region_controller
 from sunbeam.provider.local.deployment import LOCAL_TYPE, LocalDeployment
@@ -98,11 +110,6 @@
 )
 from sunbeam.steps import cluster_status
 from sunbeam.steps.bootstrap_state import SetBootstrapped
-from sunbeam.steps.cinder_volume import (
-    CheckCinderVolumeDistributionStep,
-    DeployCinderVolumeApplicationStep,
-    RemoveCinderVolumeUnitsStep,
-)
 from sunbeam.steps.clusterd import (
     AskManagementCidrStep,
     ClusterAddJujuUserStep,
@@ -166,12 +173,6 @@
     StoreK8SKubeConfigStep,
     UpdateK8SCloudStep,
 )
-from sunbeam.steps.microceph import (
-    CheckMicrocephDistributionStep,
-    ConfigureMicrocephOSDStep,
-    DeployMicrocephApplicationStep,
-    RemoveMicrocephUnitsStep,
-)
 from sunbeam.steps.microovn import (
     DeployMicroOVNApplicationStep,
     ReapplyMicroOVNOptionalIntegrationsStep,
@@ -196,6 +197,10 @@
     RemoveSunbeamMachineUnitsStep,
 )
 from sunbeam.steps.sync_feature_gates import SyncFeatureGatesToCluster
+from sunbeam.storage.steps import (
+    CheckStorageNodeRemovalStep,
+    RemoveStorageMachineUnitsStep,
+)
 from sunbeam.utils import (
     CatchGroup,
     click_option_show_hints,
@@ -218,6 +223,42 @@ def remove_trailing_dot(value: str) -> str:
     return value.rstrip(".")


+def _call_enabled_feature_join_hooks(
+    deployment: LocalDeployment,
+    node_info: Any,
+    name: str,
+    roles: list[str],
+    accept_defaults: bool = False,
+) -> None:
+    """Call on_join hook for all enabled features."""
+    deployment.get_feature_manager().call_enabled_features_on_join(
+        deployment,
+        node_info,
+        node_name=name,
+        roles=roles,
+        status="joined",
+        accept_defaults=accept_defaults,
+    )
+
+
+def _call_enabled_feature_depart_hooks(
+    deployment: LocalDeployment,
+    node_info: Any,
+    name: str,
+    roles: list[str],
+    force: bool,
+) -> None:
+    """Call on_depart hook for all enabled features."""
+    deployment.get_feature_manager().call_enabled_features_on_depart(
+        deployment,
+        node_info,
+        node_name=name,
+        roles=roles,
+        status="departed",
+        force=force,
+    )
+
+
 class LocalProvider(ProviderBase):
     def register_add_cli(self, add: click.Group) -> None:
         """A local provider cannot add deployments."""
@@ -624,6 +665,16 @@ def deploy_and_migrate_juju_controller(
     help="Token obtained from the region controller.",
     type=str,
 )
+@click.option(
+    "--no-default-storage",
+    "no_default_storage",
+    is_flag=True,
+    default=False,
+    help=(
+        "Do not deploy the default storage backend. "
+        "Storage role will still be available for external storage backends."
+    ),
+)
 @click_option_show_hints
 @click.pass_context
 def bootstrap(  # noqa: C901
@@ -636,6 +687,7 @@ def bootstrap(  # noqa: C901
     accept_defaults: bool = False,
     show_hints: bool = False,
     region_controller_token: str | None = None,
+    no_default_storage: bool = False,
 ) -> None:
     """Bootstrap the local node.
@@ -753,6 +805,7 @@ def bootstrap(  # noqa: C901
     plan.append(SyncFeatureGatesToCluster(client))
     plan.append(SaveManagementCidrStep(client, management_cidr))
     plan.append(SetOvnProviderStep(client, snap))
+    plan.append(SetCephProviderStep(client, no_default_storage=no_default_storage))
     plan.append(AddManifestStep(client, manifest_path))
     plan.append(
         PromptForProxyStep(
@@ -763,6 +816,8 @@ def bootstrap(  # noqa: C901
     plan.append(PromptRegionStep(client, manifest, accept_defaults))
     plan.append(ValidateIdentityManifest(client, manifest))
     run_plan(plan, console, show_hints)
+    if no_default_storage:
+        set_ceph_feature_enabled_state(deployment, client, enabled=False)

     if region_controller_token:
         connect_to_region_controller(
@@ -844,46 +899,23 @@ def bootstrap(  # noqa: C901
             )
         )

-    # Deploy Microceph application during bootstrap irrespective of node role.
-    microceph_tfhelper = deployment.get_tfhelper("microceph-plan")
-    plan1.append(TerraformInitStep(microceph_tfhelper))
-    plan1.append(
-        DeployMicrocephApplicationStep(
-            deployment,
-            client,
-            microceph_tfhelper,
-            jhelper,
-            manifest,
-            deployment.openstack_machines_model,
-        )
-    )
-    cinder_volume_tfhelper = deployment.get_tfhelper("cinder-volume-plan")
-    plan1.append(TerraformInitStep(cinder_volume_tfhelper))
-    plan1.append(
-        DeployCinderVolumeApplicationStep(
-            deployment,
-            client,
-            cinder_volume_tfhelper,
-            jhelper,
-            manifest,
-            deployment.openstack_machines_model,
-        )
-    )
-
     openstack_tfhelper = deployment.get_tfhelper("openstack-plan")
     plan1.append(TerraformInitStep(openstack_tfhelper))

-    if is_storage_node:
-        plan1.append(
-            ConfigureMicrocephOSDStep(
-                client,
-                fqdn,
-                jhelper,
-                deployment.openstack_machines_model,
-                accept_defaults=accept_defaults,
-                manifest=manifest,
-            )
+    # Inline microceph deploy + OSD config so data.juju_offer.microceph
+    # exists when DeployControlPlaneStep reads it. Gated by the CLI
+    # flag and the bootstrap node's storage role (mirrors the
+    # MicrocephProvider.get_control_plane_tfvars gate on
+    # storage_node_count > 0).
+    plan1.extend(
+        get_default_ceph_bootstrap_steps(
+            deployment,
+            enabled=not no_default_storage,
+            expect_storage_node=is_storage_node,
+            node_name=fqdn if is_storage_node else None,
+            accept_defaults=accept_defaults,
         )
+    )

     if is_control_node or is_region_controller:
         plan1.append(
@@ -921,30 +953,6 @@ def bootstrap(  # noqa: C901
                 manifest,
             )
         )
-        # Redeploy of Microceph is required to fill terraform vars
-        # related to traefik-rgw/keystone-endpoints offers from
-        # openstack model
-        plan1.append(
-            DeployMicrocephApplicationStep(
-                deployment,
-                client,
-                microceph_tfhelper,
-                jhelper,
-                manifest,
-                deployment.openstack_machines_model,
-            )
-        )
-        # Fill AMQP / Keystone / MySQL offers from openstack model
-        plan1.append(
-            DeployCinderVolumeApplicationStep(
-                deployment,
-                client,
-                cinder_volume_tfhelper,
-                jhelper,
-                manifest,
-                deployment.openstack_machines_model,
-            )
-        )
     if microovn_necessary:
         plan1.append(
             ReapplyMicroOVNOptionalIntegrationsStep(
@@ -960,6 +968,14 @@ def bootstrap(  # noqa: C901

     run_plan(plan1, console, show_hints)

+    if is_storage_node and not no_default_storage:
+        ensure_default_ceph_feature(
+            deployment,
+            show_hints,
+            node_name=fqdn,
+            accept_defaults=accept_defaults,
+        )
+
     plan2: list[BaseStep] = []

     if is_control_node or is_region_controller:
@@ -977,7 +993,6 @@ def bootstrap(  # noqa: C901
                 client,
                 hypervisor_tfhelper,
                 openstack_tfhelper,
-                cinder_volume_tfhelper,
                 jhelper,
                 manifest,
                 deployment.openstack_machines_model,
@@ -1053,6 +1068,7 @@ def configure_sriov(
             jhelper,
             manifest,
             model=deployment.openstack_machines_model,
+            deployment=deployment,
         ),
     ]
     if manifest and manifest.core.config.pci and manifest.core.config.pci.aliases:
@@ -1125,6 +1141,7 @@ def configure_dpdk(
             jhelper,
             manifest,
             model=deployment.openstack_machines_model,
+            deployment=deployment,
         ),
     ]
     run_plan(plan, console, show_hints)
@@ -1472,8 +1489,6 @@ def join(  # noqa: C901
     proxy_settings = deployment.get_proxy_settings()
     sunbeam_machine_tfhelper = deployment.get_tfhelper("sunbeam-machine-plan")
     k8s_tfhelper = deployment.get_tfhelper("k8s-plan")
-    cinder_volume_tfhelper = deployment.get_tfhelper("cinder-volume-plan")
-    microceph_tfhelper = deployment.get_tfhelper("microceph-plan")
     openstack_tfhelper = deployment.get_tfhelper("openstack-plan")
     hypervisor_tfhelper = deployment.get_tfhelper("hypervisor-plan")

@@ -1548,7 +1563,6 @@ def join(  # noqa: C901

     plan4.append(TerraformInitStep(openstack_tfhelper))
     plan4.append(TerraformInitStep(hypervisor_tfhelper))
-    plan4.append(TerraformInitStep(cinder_volume_tfhelper))
     if microovn_necessary:
         microovn_tfhelper = deployment.get_tfhelper("microovn-plan")
         plan4.append(TerraformInitStep(microovn_tfhelper))
@@ -1599,35 +1613,19 @@ def join(  # noqa: C901
     )

     if is_storage_node:
-        plan4.append(TerraformInitStep(microceph_tfhelper))
-        plan4.append(
-            DeployMicrocephApplicationStep(
+        # Inline microceph deploy + OSD config must run before the
+        # first-storage DeployControlPlaneStep below so that the
+        # microceph offer exists when openstack-plan reads it. The
+        # post-plan4 feature hooks / ensure_default_ceph_feature call
+        # still run afterwards and idempotently re-apply microceph +
+        # deploy the cinder-volume-ceph backend.
+        plan4.extend(
+            get_default_ceph_bootstrap_steps(
                 deployment,
-                client,
-                microceph_tfhelper,
-                jhelper,
-                manifest,
-                deployment.openstack_machines_model,
-            )
-        )
-        plan4.append(
-            ConfigureMicrocephOSDStep(
-                client,
-                name,
-                jhelper,
-                deployment.openstack_machines_model,
+                enabled=is_internal_ceph_enabled(client),
+                expect_storage_node=True,
+                node_name=name,
                 accept_defaults=accept_defaults,
-                manifest=manifest,
-            )
-        )
-        plan4.append(
-            DeployCinderVolumeApplicationStep(
-                deployment,
-                client,
-                cinder_volume_tfhelper,
-                jhelper,
-                manifest,
-                deployment.openstack_machines_model,
             )
         )

@@ -1647,46 +1645,6 @@ def join(  # noqa: C901
             )
         )

-        # Redeploy of Microceph is required to fill terraform vars
-        # related to traefik-rgw/keystone-endpoints offers from
-        # openstack model
-        microceph_tfhelper = deployment.get_tfhelper("microceph-plan")
-        plan4.append(TerraformInitStep(microceph_tfhelper))
-        plan4.append(
-            DeployMicrocephApplicationStep(
-                deployment,
-                client,
-                microceph_tfhelper,
-                jhelper,
-                manifest,
-                deployment.openstack_machines_model,
-            )
-        )
-        # Fill AMQP / Keystone / MySQL offers from openstack model
-        plan4.append(
-            DeployCinderVolumeApplicationStep(
-                deployment,
-                client,
-                cinder_volume_tfhelper,
-                jhelper,
-                manifest,
-                deployment.openstack_machines_model,
-            )
-        )
-
-        plan4.append(
-            ReapplyHypervisorOptionalIntegrationsStep(
-                deployment,
-                client,
-                hypervisor_tfhelper,
-                openstack_tfhelper,
-                cinder_volume_tfhelper,
-                jhelper,
-                manifest,
-                deployment.openstack_machines_model,
-            )
-        )
-
     if is_compute_node:
         hypervisor_tfhelper = deployment.get_tfhelper("hypervisor-plan")
         plan4.extend(
@@ -1697,7 +1655,6 @@ def join(  # noqa: C901
                     client,
                     hypervisor_tfhelper,
                     openstack_tfhelper,
-                    cinder_volume_tfhelper,
                     jhelper,
                     manifest,
                     deployment.openstack_machines_model,
@@ -1741,6 +1698,7 @@ def join(  # noqa: C901
                 jhelper,
                 manifest,
                 model=deployment.openstack_machines_model,
+                deployment=deployment,
             ),
         ]
     )
@@ -1757,6 +1715,36 @@ def join(  # noqa: C901
         )

     run_plan(plan4, console, show_hints)
+    node_info = client.cluster.get_node_info(name)
+    _call_enabled_feature_join_hooks(
+        deployment, node_info, name, roles_str, accept_defaults=accept_defaults
+    )
+    if is_storage_node and is_internal_ceph_enabled(client):
+        ensure_default_ceph_feature(
+            deployment,
+            show_hints,
+            node_name=name,
+            accept_defaults=accept_defaults,
+        )
+
+    # Reapply hypervisor optional integrations AFTER
+    # ensure_default_ceph_feature has registered internal-ceph in
+    # clusterd, so collect_hypervisor_integrations finds the backend
+    # and wires the cinder-volume-ceph:ceph-access integration on
+    # first storage join.
+    if is_storage_node:
+        plan5: list[BaseStep] = [
+            ReapplyHypervisorOptionalIntegrationsStep(
+                deployment,
+                client,
+                hypervisor_tfhelper,
+                openstack_tfhelper,
+                jhelper,
+                manifest,
+                deployment.openstack_machines_model,
+            ),
+        ]
+        run_plan(plan5, console, show_hints)

     click.echo(f"Node joined cluster with roles: {pretty_roles}")

@@ -1817,6 +1805,12 @@ def remove(ctx: click.Context, name: str, force: bool, show_hints: bool) -> None
     deployment: LocalDeployment = ctx.obj
     client = deployment.get_client()
     jhelper = JujuHelper(deployment.juju_controller)
+    try:
+        node_info = client.cluster.get_node_info(name)
+    except Exception:
+        node_info = {"name": name}
+    node_roles = node_info.get("role", []) if isinstance(node_info, dict) else []
+    internal_ceph_enabled = is_internal_ceph_enabled_feature_aware(deployment, client)

     preflight_checks = [DaemonGroupCheck()]
     run_preflight_checks(preflight_checks, console)
@@ -1828,15 +1822,8 @@ def remove(ctx: click.Context, name: str, force: bool, show_hints: bool) -> None
     if not force:
         plan.append(PromptCheckNodeExistStep(client, name))

-    plan.extend(
-        [
-            CheckCinderVolumeDistributionStep(
-                client,
-                name,
-                jhelper,
-                deployment.openstack_machines_model,
-                force=force,
-            ),
+    if internal_ceph_enabled:
+        plan.append(
             CheckMicrocephDistributionStep(
                 client,
                 name,
@@ -1844,6 +1831,18 @@ def remove(ctx: click.Context, name: str, force: bool, show_hints: bool) -> None
                 deployment.openstack_machines_model,
                 force=force,
             ),
+        )
+    plan.append(
+        CheckStorageNodeRemovalStep(
+            client,
+            name,
+            jhelper,
+            deployment.openstack_machines_model,
+            force=force,
+        ),
+    )
+    plan.extend(
+        [
             CheckMysqlK8SDistributionStep(
                 client,
                 name,
@@ -1877,12 +1876,21 @@ def remove(ctx: click.Context, name: str, force: bool, show_hints: bool) -> None
                 deployment.openstack_machines_model,
                 force,
             ),
-            RemoveCinderVolumeUnitsStep(
-                client, name, jhelper, deployment.openstack_machines_model
-            ),
+        ]
+    )
+    plan.append(
+        RemoveStorageMachineUnitsStep(
+            client, name, jhelper, deployment.openstack_machines_model
+        ),
+    )
+    if internal_ceph_enabled:
+        plan.append(
             RemoveMicrocephUnitsStep(
                 client, name, jhelper, deployment.openstack_machines_model
             ),
+        )
+    plan.extend(
+        [
             RemoveMicroOVNUnitsStep(
                 client, name, jhelper, deployment.openstack_machines_model
             ),
@@ -1927,6 +1935,9 @@ def remove(ctx: click.Context, name: str, force: bool, show_hints: bool) -> None
     )

     run_plan(plan, console, show_hints)
+    _call_enabled_feature_depart_hooks(
+        deployment, node_info, name, node_roles, force=force
+    )
     click.echo(f"Removed node {name} from the cluster")
     # Removing machine does not clean up all deployed juju components. This is
     # deliberate, see https://bugs.launchpad.net/juju/+bug/1851489.
@@ -2059,6 +2070,7 @@ def configure_cmd(
                 jhelper,
                 manifest,
                 model=deployment.openstack_machines_model,
+                deployment=deployment,
             )
         )

diff --git a/sunbeam-python/sunbeam/provider/local/deployment.py b/sunbeam-python/sunbeam/provider/local/deployment.py
index c98fe4c1e..7272a61df 100644
--- a/sunbeam-python/sunbeam/provider/local/deployment.py
+++ b/sunbeam-python/sunbeam/provider/local/deployment.py
@@ -17,6 +17,7 @@
     URLNotFoundException,
 )
 from sunbeam.commands.proxy import proxy_questions
+from sunbeam.core.ceph import is_internal_ceph_enabled
 from sunbeam.core.checks import DaemonGroupCheck
 from sunbeam.core.common import SunbeamException
 from sunbeam.core.deployment import PROXY_CONFIG_KEY, CertPair, Deployment, Networks
@@ -33,6 +34,7 @@
     generate_endpoint_preseed_questions,
 )
 from sunbeam.core.questions import QuestionBank, load_answers, show_questions
+from sunbeam.features.ceph.microceph import CONFIG_DISKS_KEY, microceph_questions
 from sunbeam.provider.local.steps import local_external_network_agent_questions
 from sunbeam.steps.clusterd import (
     BOOTSTRAP_CONFIG_KEY,
@@ -46,7 +48,6 @@
     user_questions,
 )
 from sunbeam.steps.k8s import K8S_ADDONS_CONFIG_KEY, k8s_addons_questions
-from sunbeam.steps.microceph import CONFIG_DISKS_KEY, microceph_questions
 from sunbeam.steps.openstack import (
     TOPOLOGY_KEY,
     database_topology_questions,
@@ -288,26 +289,27 @@ def generate_core_config(self, console: Console) -> str:
             variables = load_answers(client, CONFIG_DISKS_KEY)
         except ClusterServiceUnavailableException:
             variables = {}
-        microceph_content: list[str] = []
-        for name, disks in variables.get("microceph_config", {fqdn: None}).items():
-            microceph_config_bank = QuestionBank(
-                questions=microceph_questions(),
-                console=console,
-                previous_answers=disks,
-            )
-            lines = show_questions(
-                microceph_config_bank,
-                section="microceph_config",
-                subsection=name,
-                section_description="MicroCeph config",
-            )
-            # if there's more than one microceph,
-            # don't rewrite the section and section description
-            if len(microceph_content) < 2:
-                microceph_content.extend(lines)
-            else:
-                microceph_content.extend(lines[2:])
-        preseed_content.extend(microceph_content)
+        if is_internal_ceph_enabled(client):
+            microceph_content: list[str] = []
+            for name, disks in variables.get("microceph_config", {fqdn: None}).items():
+                microceph_config_bank = QuestionBank(
+                    questions=microceph_questions(),
+                    console=console,
+                    previous_answers=disks,
+                )
+                lines = show_questions(
+                    microceph_config_bank,
+                    section="microceph_config",
+                    subsection=name,
+                    section_description="MicroCeph config",
+                )
+                # if there's more than one microceph,
+                # don't rewrite the section and section description
+                if len(microceph_content) < 2:
+                    microceph_content.extend(lines)
+                else:
+                    microceph_content.extend(lines[2:])
+            preseed_content.extend(microceph_content)

         preseed_content_final = "\n".join(preseed_content)
         return preseed_content_final
diff --git a/sunbeam-python/sunbeam/provider/maas/commands.py b/sunbeam-python/sunbeam/provider/maas/commands.py
index 0cb5124ef..609639393 100644
--- a/sunbeam-python/sunbeam/provider/maas/commands.py
+++ b/sunbeam-python/sunbeam/provider/maas/commands.py
@@ -7,7 +7,7 @@
 from collections import Counter
 from datetime import datetime
 from pathlib import Path
-from typing import Sequence, Tuple, Type
+from typing import Any, Sequence, Tuple, Type

 import click
 import yaml
@@ -28,6 +28,13 @@
 from sunbeam.commands.dashboard_url import retrieve_dashboard_url
 from sunbeam.commands.proxy import PromptForProxyStep
 from sunbeam.core import ovn
+from sunbeam.core.ceph import (
+    SetCephProviderStep,
+    ensure_default_ceph_feature,
+    get_default_ceph_bootstrap_steps,
+    is_internal_ceph_enabled_feature_aware,
+    set_ceph_feature_enabled_state,
+)
 from sunbeam.core.checks import (
     Check,
     DiagnosticResultType,
@@ -69,6 +76,12 @@
 from sunbeam.core.openstack import OPENSTACK_MODEL
 from sunbeam.core.terraform import TerraformInitStep
 from sunbeam.feature_gates import
feature_gate_option, split_roles_enabled +from sunbeam.features.ceph.microceph import ( + CheckMicrocephDistributionStep, + DestroyMicrocephApplicationStep, + RemoveMicrocephUnitsStep, + SetCephMgrPoolSizeStep, +) from sunbeam.provider.base import ProviderBase from sunbeam.provider.common.multiregion import connect_to_region_controller from sunbeam.provider.maas.client import ( @@ -96,7 +109,6 @@ MaasClusterStatusStep, MaasConfigDPDKStep, MaasConfigSRIOVStep, - MaasConfigureMicrocephOSDStep, MaasCreateLoadBalancerIPPoolsStep, MaasDeployInfraMachinesStep, MaasDeployK8SApplicationStep, @@ -122,12 +134,6 @@ from sunbeam.steps.bootstrap_state import SetBootstrapped from sunbeam.steps.certificates import APPLICATION as CERTIFICATES_APPLICATION from sunbeam.steps.certificates import DeployCertificatesProviderApplicationStep -from sunbeam.steps.cinder_volume import ( - CheckCinderVolumeDistributionStep, - DeployCinderVolumeApplicationStep, - DestroyCinderVolumeApplicationStep, - RemoveCinderVolumeUnitsStep, -) from sunbeam.steps.clusterd import APPLICATION as CLUSTERD_APPLICATION from sunbeam.steps.clusterd import ( DeploySunbeamClusterdApplicationStep, @@ -170,13 +176,6 @@ StoreK8SKubeConfigStep, UpdateK8SCloudStep, ) -from sunbeam.steps.microceph import ( - CheckMicrocephDistributionStep, - DeployMicrocephApplicationStep, - DestroyMicrocephApplicationStep, - RemoveMicrocephUnitsStep, - SetCephMgrPoolSizeStep, -) from sunbeam.steps.microovn import ( DeployMicroOVNApplicationStep, ReapplyMicroOVNOptionalIntegrationsStep, @@ -203,6 +202,10 @@ ) from sunbeam.steps.sync_feature_gates import SyncFeatureGatesToCluster from sunbeam.steps.terraform import CleanTerraformPlansStep +from sunbeam.storage.steps import ( + CheckStorageNodeRemovalStep, + RemoveStorageMachineUnitsStep, +) from sunbeam.utils import ( CatchGroup, DefaultableMappingParameter, @@ -213,6 +216,41 @@ console = Console() +def _call_enabled_feature_join_hooks( + deployment: MaasDeployment, client: Any, 
node_names: Sequence[str] +) -> None: + """Call on_join hook for all enabled features.""" + feature_manager = deployment.get_feature_manager() + for node_name in sorted(set(node_names)): + try: + node_info = client.cluster.get_node_info(node_name) + except Exception: + node_info = {"name": node_name} + roles = node_info.get("role", []) if isinstance(node_info, dict) else [] + feature_manager.call_enabled_features_on_join( + deployment, + node_info, + node_name=node_name, + roles=roles, + status="joined", + ) + + +def _call_enabled_feature_depart_hooks( + deployment: MaasDeployment, node_info: Any, name: str, force: bool +) -> None: + """Call on_depart hook for all enabled features.""" + roles = node_info.get("role", []) if isinstance(node_info, dict) else [] + deployment.get_feature_manager().call_enabled_features_on_depart( + deployment, + node_info, + node_name=name, + roles=roles, + status="departed", + force=force, + ) + + @click.group("cluster", context_settings=CONTEXT_SETTINGS, cls=CatchGroup) @click.pass_context def cluster(ctx): @@ -559,6 +597,16 @@ def _name_mapper(node: dict) -> str: help="Manifest file.", type=click.Path(exists=True, dir_okay=False, path_type=Path), ) +@click.option( + "--no-default-storage", + "no_default_storage", + is_flag=True, + default=False, + help=( + "Do not deploy the default storage backend. " + "Storage role will still be available for external storage backends." + ), +) @click_option_topology @click_option_database @click_option_show_hints @@ -570,6 +618,7 @@ def deploy( topology: str = "auto", database: str = "auto", show_hints: bool = False, + no_default_storage: bool = False, ) -> None: """Deploy the MAAS-backed deployment. 
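`_call_enabled_feature_join_hooks` de-duplicates and sorts node names. It also falls back to a minimal `{"name": ...}` record when the clusterd lookup fails, and it extracts roles only from dict-shaped node info. The sketch below reproduces that control flow but records the `(node_name, roles)` pairs instead of calling `feature_manager.call_enabled_features_on_join`. `FakeCluster` is a hypothetical stand-in for the clusterd client.

```python
from typing import Any, Sequence


class FakeCluster:
    """Hypothetical clusterd stand-in: returns node info or raises for unknown nodes."""

    def __init__(self, nodes: dict[str, dict[str, Any]]) -> None:
        self._nodes = nodes

    def get_node_info(self, name: str) -> dict[str, Any]:
        if name not in self._nodes:
            raise LookupError(f"node {name} not found")
        return self._nodes[name]


def plan_join_hook_calls(
    cluster: FakeCluster, node_names: Sequence[str]
) -> list[tuple[str, list[str]]]:
    """Return (node_name, roles) pairs in the order on_join hooks would fire."""
    calls: list[tuple[str, list[str]]] = []
    # De-duplicate and sort: a node listed under several roles is notified
    # once, in a deterministic order.
    for node_name in sorted(set(node_names)):
        try:
            node_info: Any = cluster.get_node_info(node_name)
        except Exception:
            # Like the patch, fall back to a minimal record instead of aborting.
            node_info = {"name": node_name}
        roles = node_info.get("role", []) if isinstance(node_info, dict) else []
        calls.append((node_name, roles))
    return calls


cluster = FakeCluster({"node1": {"name": "node1", "role": ["control", "storage"]}})
print(plan_join_hook_calls(cluster, ["node2", "node1", "node1"]))
# [('node1', ['control', 'storage']), ('node2', [])]
```

Because the helper already applies `set()`, the `list(set(...))` wrapping at the `deploy` call site is redundant but harmless.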
@@ -629,14 +678,16 @@ def deploy( tfhelper_sunbeam_machine = deployment.get_tfhelper("sunbeam-machine-plan") tfhelper_k8s = deployment.get_tfhelper("k8s-plan") - tfhelper_microceph = deployment.get_tfhelper("microceph-plan") - tfhelper_cinder_volume = deployment.get_tfhelper("cinder-volume-plan") tfhelper_openstack_deploy = deployment.get_tfhelper("openstack-plan") tfhelper_hypervisor_deploy = deployment.get_tfhelper("hypervisor-plan") tfhelper_microovn = deployment.get_tfhelper("microovn-plan") ovn_manager = deployment.get_ovn_manager() plan: list[BaseStep] = [] + # Persist the ceph deployment mode BEFORE any plan that would query it, + # so later queries (and the inline helper at plan2-assembly time) see + # the correct mode. Plan assembly occurs after run_plan(plan) below. + plan.append(SetCephProviderStep(client, no_default_storage=no_default_storage)) plan.append(AddManifestStep(client, manifest_path)) plan.append(SyncFeatureGatesToCluster(client)) plan.append( @@ -659,6 +710,9 @@ def deploy( ) run_plan(plan, console, show_hints) + if no_default_storage: + set_ceph_feature_enabled_state(deployment, client, enabled=False) + control = list( map(_name_mapper, client.cluster.list_nodes_by_role(RoleTags.CONTROL.value)) ) @@ -780,38 +834,6 @@ def deploy( plan2.append(AddK8SCloudStep(deployment, jhelper)) plan2.append(PatchCoreDNSStep(deployment, jhelper)) - plan2.append(TerraformInitStep(tfhelper_microceph)) - plan2.append( - DeployMicrocephApplicationStep( - deployment, - client, - tfhelper_microceph, - jhelper, - manifest, - deployment.openstack_machines_model, - ) - ) - plan2.append( - MaasConfigureMicrocephOSDStep( - client, - maas_client, - jhelper, - storage, - manifest, - deployment.openstack_machines_model, - ) - ) - plan2.append(TerraformInitStep(tfhelper_cinder_volume)) - plan2.append( - DeployCinderVolumeApplicationStep( - deployment, - client, - tfhelper_cinder_volume, - jhelper, - manifest, - deployment.openstack_machines_model, - ) - ) # Deploy 
MicroOVN and subordinate openstack-network-agents on network nodes microovn_necessary = ovn_manager.is_microovn_necessary_maas( nb_network, nb_compute, nb_control @@ -839,6 +861,17 @@ def deploy( accept_defaults=accept_defaults, ) ) + # Inline microceph deploy so data.juju_offer.microceph exists when + # DeployControlPlaneStep reads it. Gated explicitly on the CLI flag + # rather than re-querying clusterd because plan2 is built before the + # plan above has completed. + plan2.extend( + get_default_ceph_bootstrap_steps( + deployment, + enabled=not no_default_storage, + expect_storage_node=bool(storage), + ) + ) plan2.append( DeployControlPlaneStep( deployment, @@ -863,97 +896,83 @@ def deploy( ovn_manager, ) ) - plan2.append( + run_plan(plan2, console, show_hints) + if ( + storage + and not no_default_storage + and is_internal_ceph_enabled_feature_aware(deployment, client) + ): + ensure_default_ceph_feature( + deployment, + show_hints, + maas_client=maas_client, + storage=storage, + ) + + plan3: list[BaseStep] = [ SetKeystoneSAMLCertAndKeyStep( deployment=deployment, tfhelper=tfhelper_openstack_deploy, jhelper=jhelper, manifest=manifest, - ) - ) - plan2.append( + ), DeployIdentityProvidersStep( deployment, tfhelper_openstack_deploy, jhelper, manifest, - ) - ) - - plan2.append( + ), OpenStackPatchLoadBalancerServicesIPPoolStep( client, deployment.public_api_label - ) - ) - # Redeploy of Microceph is required to fill terraform vars - # related to traefik-rgw/keystone-endpoints offers from - # openstack model - plan2.append( - DeployMicrocephApplicationStep( - deployment, - client, - tfhelper_microceph, - jhelper, - manifest, - deployment.openstack_machines_model, - ) - ) - # Fill AMQP / Keystone / MySQL offers from openstack model - plan2.append( - DeployCinderVolumeApplicationStep( - deployment, - client, - tfhelper_cinder_volume, - jhelper, - manifest, - deployment.openstack_machines_model, - ) - ) - plan2.append(OpenStackPatchLoadBalancerServicesIPStep(client, 
ovn_manager)) + ), + OpenStackPatchLoadBalancerServicesIPStep(client, ovn_manager), + ] if nb_compute: - plan2.append(TerraformInitStep(tfhelper_hypervisor_deploy)) - plan2.append( + plan3.append(TerraformInitStep(tfhelper_hypervisor_deploy)) + plan3.append( DeployHypervisorApplicationStep( deployment, client, tfhelper_hypervisor_deploy, tfhelper_openstack_deploy, - tfhelper_cinder_volume, jhelper, manifest, deployment.openstack_machines_model, ) ) - plan2 += [ - MaasConfigSRIOVStep( - deployment, - client, - jhelper, - deployment.openstack_machines_model, - manifest, - accept_defaults, - ), - MaasConfigDPDKStep( - deployment, - client, - jhelper, - deployment.openstack_machines_model, - manifest, - accept_defaults, - ), - ReapplyHypervisorTerraformPlanStep( - client, - tfhelper_hypervisor_deploy, - jhelper, - manifest, - model=deployment.openstack_machines_model, - ), - ] + plan3.extend( + [ + MaasConfigSRIOVStep( + deployment, + client, + jhelper, + deployment.openstack_machines_model, + manifest, + accept_defaults, + ), + MaasConfigDPDKStep( + deployment, + client, + jhelper, + deployment.openstack_machines_model, + manifest, + accept_defaults, + ), + ReapplyHypervisorTerraformPlanStep( + client, + tfhelper_hypervisor_deploy, + jhelper, + manifest, + model=deployment.openstack_machines_model, + deployment=deployment, + ), + ] + ) if manifest and manifest.core.config.pci and manifest.core.config.pci.aliases: - plan2.append( + plan3.append( ReapplyOpenStackTerraformPlanStep( deployment, client, @@ -964,8 +983,13 @@ def deploy( ) ) - plan2.append(SetBootstrapped(client)) - run_plan(plan2, console, show_hints) + plan3.append(SetBootstrapped(client)) + run_plan(plan3, console, show_hints) + _call_enabled_feature_join_hooks( + deployment, + client, + list(set(control + compute + network + storage + region_controllers)), + ) console.print( f"Deployment complete with {nb_control} control, " @@ -1075,6 +1099,7 @@ def configure_cmd( jhelper, manifest, 
model=deployment.openstack_machines_model, + deployment=deployment, ), MaasSetOpenStackNetworkAgentsStep( client, @@ -1613,6 +1638,11 @@ def remove_node(ctx: click.Context, name: str, force: bool, show_hints: bool) -> deployment: MaasDeployment = ctx.obj client = deployment.get_client() jhelper = JujuHelper(deployment.juju_controller) + try: + node_info = client.cluster.get_node_info(name) + except Exception: + node_info = {"name": name} + internal_ceph_enabled = is_internal_ceph_enabled_feature_aware(deployment, client) preflight_checks = [ LocalShareCheck(), @@ -1621,42 +1651,51 @@ def remove_node(ctx: click.Context, name: str, force: bool, show_hints: bool) -> check_plan: list[BaseStep] = [ JujuLoginStep(deployment.juju_account), - CheckCinderVolumeDistributionStep( - client, - name, - jhelper, - deployment.openstack_machines_model, - force=force, - ), - CheckMicrocephDistributionStep( - client, - name, - jhelper, - deployment.openstack_machines_model, - force=force, - ), - CheckMysqlK8SDistributionStep( - client, - name, - jhelper, - deployment.openstack_machines_model, - force=force, - ), - CheckOvnK8SDistributionStep( - client, - name, - jhelper, - deployment.openstack_machines_model, - force=force, - ), - CheckRabbitmqK8SDistributionStep( + ] + if internal_ceph_enabled: + check_plan.append( + CheckMicrocephDistributionStep( + client, + name, + jhelper, + deployment.openstack_machines_model, + force=force, + ), + ) + check_plan.append( + CheckStorageNodeRemovalStep( client, name, jhelper, deployment.openstack_machines_model, force=force, ), - ] + ) + check_plan.extend( + [ + CheckMysqlK8SDistributionStep( + client, + name, + jhelper, + deployment.openstack_machines_model, + force=force, + ), + CheckOvnK8SDistributionStep( + client, + name, + jhelper, + deployment.openstack_machines_model, + force=force, + ), + CheckRabbitmqK8SDistributionStep( + client, + name, + jhelper, + deployment.openstack_machines_model, + force=force, + ), + ] + ) run_plan(check_plan, 
console, show_hints) @@ -1673,50 +1712,73 @@ def remove_node(ctx: click.Context, name: str, force: bool, show_hints: bool) -> deployment.openstack_machines_model, force, ), - RemoveCinderVolumeUnitsStep( - client, name, jhelper, deployment.openstack_machines_model - ), - RemoveMicrocephUnitsStep( - client, name, jhelper, deployment.openstack_machines_model - ), - CordonK8SUnitStep(client, name, jhelper, deployment.openstack_machines_model), - DrainK8SUnitStep( - client, name, jhelper, deployment.openstack_machines_model, remove_pvc=True - ), - RemoveK8SUnitsStep(client, name, jhelper, deployment.openstack_machines_model), - EnsureCiliumDeviceByHostStep( - deployment, - client, - jhelper, - deployment.openstack_machines_model, - ), - EnsureL2AdvertisementByHostStep( - deployment, - client, - jhelper, - deployment.openstack_machines_model, - Networks.INTERNAL, - deployment.internal_ip_pool, - ), - EnsureL2AdvertisementByHostStep( - deployment, - client, - jhelper, - deployment.openstack_machines_model, - Networks.PUBLIC, - deployment.public_ip_pool, - ), - RemoveSunbeamMachineUnitsStep( - client, name, jhelper, deployment.openstack_machines_model - ), - RemoveJujuMachineStep( + ] + plan.append( + RemoveStorageMachineUnitsStep( client, name, jhelper, deployment.openstack_machines_model ), - MaasRemoveMachineFromClusterdStep(client, name), - SetCephMgrPoolSizeStep(client, jhelper, deployment.openstack_machines_model), - ] + ) + if internal_ceph_enabled: + plan.append( + RemoveMicrocephUnitsStep( + client, name, jhelper, deployment.openstack_machines_model + ), + ) + plan.extend( + [ + CordonK8SUnitStep( + client, name, jhelper, deployment.openstack_machines_model + ), + DrainK8SUnitStep( + client, + name, + jhelper, + deployment.openstack_machines_model, + remove_pvc=True, + ), + RemoveK8SUnitsStep( + client, name, jhelper, deployment.openstack_machines_model + ), + EnsureCiliumDeviceByHostStep( + deployment, + client, + jhelper, + deployment.openstack_machines_model, 
+ ), + EnsureL2AdvertisementByHostStep( + deployment, + client, + jhelper, + deployment.openstack_machines_model, + Networks.INTERNAL, + deployment.internal_ip_pool, + ), + EnsureL2AdvertisementByHostStep( + deployment, + client, + jhelper, + deployment.openstack_machines_model, + Networks.PUBLIC, + deployment.public_ip_pool, + ), + RemoveSunbeamMachineUnitsStep( + client, name, jhelper, deployment.openstack_machines_model + ), + RemoveJujuMachineStep( + client, name, jhelper, deployment.openstack_machines_model + ), + MaasRemoveMachineFromClusterdStep(client, name), + ] + ) + if internal_ceph_enabled: + plan.append( + SetCephMgrPoolSizeStep( + client, jhelper, deployment.openstack_machines_model + ), + ) run_plan(plan, console, show_hints) + _call_enabled_feature_depart_hooks(deployment, node_info, name, force) click.echo( f"Removed node {name} from the cluster." " Run `sunbeam cluster resize` to scale down the cluster" @@ -1788,7 +1850,6 @@ def destroy_deployment_cmd( openstack_tfhelper = deployment.get_tfhelper("openstack-plan") microceph_tfhelper = deployment.get_tfhelper("microceph-plan") - cinder_volume_tfhelper = deployment.get_tfhelper("cinder-volume-plan") k8s_tfhelper = deployment.get_tfhelper("k8s-plan") if client and clusterd_up: # note(gboutry): can't use terraform if no clusterd is up @@ -1803,27 +1864,28 @@ def destroy_deployment_cmd( manifest, deployment.openstack_machines_model, ), - TerraformInitStep(cinder_volume_tfhelper), - DestroyCinderVolumeApplicationStep( - client, - cinder_volume_tfhelper, - jhelper, - manifest, - deployment.openstack_machines_model, - ), TerraformInitStep(openstack_tfhelper), DestroyControlPlaneStep(deployment, openstack_tfhelper, jhelper), RemoveSaasApplicationsStep( jhelper, deployment.openstack_machines_model, OPENSTACK_MODEL ), - TerraformInitStep(microceph_tfhelper), - DestroyMicrocephApplicationStep( - client, - microceph_tfhelper, - jhelper, - manifest, - deployment.openstack_machines_model, - ), + ] + ) + if 
is_internal_ceph_enabled_feature_aware(deployment, client): + plan.extend( + [ + TerraformInitStep(microceph_tfhelper), + DestroyMicrocephApplicationStep( + client, + microceph_tfhelper, + jhelper, + manifest, + deployment.openstack_machines_model, + ), + ] + ) + plan.extend( + [ TerraformInitStep(k8s_tfhelper), DestroyK8SApplicationStep( client, @@ -1912,6 +1974,7 @@ def configure_sriov( jhelper, manifest, model=deployment.openstack_machines_model, + deployment=deployment, ), ] if manifest and manifest.core.config.pci and manifest.core.config.pci.aliases: @@ -1971,6 +2034,7 @@ def configure_dpdk( jhelper, manifest, model=deployment.openstack_machines_model, + deployment=deployment, ), ] run_plan(plan, console, show_hints) diff --git a/sunbeam-python/sunbeam/provider/maas/steps.py b/sunbeam-python/sunbeam/provider/maas/steps.py index 71c81d99c..e68157b77 100644 --- a/sunbeam-python/sunbeam/provider/maas/steps.py +++ b/sunbeam-python/sunbeam/provider/maas/steps.py @@ -21,10 +21,10 @@ from snaphelpers import Snap import sunbeam.core.questions +import sunbeam.features.ceph.microceph as microceph import sunbeam.provider.maas.client as maas_client import sunbeam.provider.maas.deployment as maas_deployment import sunbeam.steps.k8s as k8s -import sunbeam.steps.microceph as microceph import sunbeam.utils as sunbeam_utils from sunbeam.clusterd.client import Client from sunbeam.clusterd.service import NodeNotExistInClusterException diff --git a/sunbeam-python/sunbeam/steps/cinder_volume.py b/sunbeam-python/sunbeam/steps/cinder_volume.py deleted file mode 100644 index bb5db05b1..000000000 --- a/sunbeam-python/sunbeam/steps/cinder_volume.py +++ /dev/null @@ -1,347 +0,0 @@ -# SPDX-FileCopyrightText: 2025 - Canonical Ltd -# SPDX-License-Identifier: Apache-2.0 - -import logging -from typing import Any - -import sunbeam.steps.microceph as microceph -from sunbeam import versions -from sunbeam.clusterd.client import Client -from sunbeam.clusterd.service import ( - 
NodeNotExistInClusterException, -) -from sunbeam.core.common import BaseStep, Result, ResultType, Role, StepContext -from sunbeam.core.deployment import Deployment, Networks -from sunbeam.core.juju import ( - ApplicationNotFoundException, - JujuHelper, -) -from sunbeam.core.manifest import CharmManifest, Manifest -from sunbeam.core.steps import ( - DeployMachineApplicationStep, - DestroyMachineApplicationStep, - RemoveMachineUnitsStep, -) -from sunbeam.core.terraform import TerraformException, TerraformHelper - -LOG = logging.getLogger(__name__) -CONFIG_KEY = "TerraformVarsCinderVolumePlan" -APPLICATION = "cinder-volume" -CINDER_VOLUME_APP_TIMEOUT = 1200 -CINDER_VOLUME_UNIT_TIMEOUT = ( - 1800 # 30 minutes, adding / removing units can take a long time -) - - -def get_mandatory_control_plane_offers( - tfhelper: TerraformHelper, -) -> dict[str, str | None]: - """Get mandatory control plane offers.""" - openstack_tf_output = tfhelper.output() - - tfvars = { - "keystone-offer-url": openstack_tf_output.get("keystone-offer-url"), - "database-offer-url": openstack_tf_output.get( - "cinder-volume-database-offer-url" - ), - "amqp-offer-url": openstack_tf_output.get("rabbitmq-offer-url"), - } - return tfvars - - -def get_optional_control_plane_offers( - tfhelper: TerraformHelper, -) -> dict[str, str | None]: - """Get optional control plane offers.""" - openstack_tf_output = tfhelper.output() - - tfvars = { - "cert-distributor-offer-url": openstack_tf_output.get( - "cert-distributor-offer-url" - ), - } - return tfvars - - -class DeployCinderVolumeApplicationStep(DeployMachineApplicationStep): - """Deploy Cinder Volume application using Terraform.""" - - def __init__( - self, - deployment: Deployment, - client: Client, - tfhelper: TerraformHelper, - jhelper: JujuHelper, - manifest: Manifest, - model: str, - extra_tfvars: dict | None = None, - ): - super().__init__( - deployment, - client, - tfhelper, - jhelper, - manifest, - CONFIG_KEY, - APPLICATION, - model, - [Role.STORAGE], 
- "Deploy Cinder Volume", - "Deploying Cinder Volume", - ) - self._offers: dict[str, str | None] = {} - self._optional_offers: dict[str, str | None] = {} - self.override_tfvars: dict[str, Any] = extra_tfvars or {} - - def get_application_timeout(self) -> int: - """Return application timeout in seconds.""" - return CINDER_VOLUME_APP_TIMEOUT - - def get_accepted_application_status(self) -> list[str]: - """Return accepted application status.""" - accepted_status = super().get_accepted_application_status() - offers = self._get_offers() - if not offers or not all(offers.values()): - accepted_status.append("blocked") - return accepted_status - - def _get_offers(self): - if not self._offers: - self._offers = get_mandatory_control_plane_offers( - self.deployment.get_tfhelper("openstack-plan") - ) - return self._offers - - def _get_optional_offers(self): - if not self._optional_offers: - self._optional_offers = get_optional_control_plane_offers( - self.deployment.get_tfhelper("openstack-plan") - ) - return self._optional_offers - - def extra_tfvars(self) -> dict: - """Extra terraform vars to pass to terraform apply.""" - storage_nodes = self.client.cluster.list_nodes_by_role("storage") - tfvars: dict[str, Any] = { - "endpoint_bindings": [ - { - "space": self.deployment.get_space(Networks.MANAGEMENT), - }, - { - "endpoint": "amqp", - "space": self.deployment.get_space(Networks.INTERNAL), - }, - { - "endpoint": "database", - "space": self.deployment.get_space(Networks.INTERNAL), - }, - { - "endpoint": "cinder-volume", - "space": self.deployment.get_space(Networks.MANAGEMENT), - }, - { - "endpoint": "identity-credentials", - "space": self.deployment.get_space(Networks.INTERNAL), - }, - { - "endpoint": "receive-ca-cert", - "space": self.deployment.get_space(Networks.INTERNAL), - }, - { - # relation to cinder-api - "endpoint": "storage-backend", - "space": self.deployment.get_space(Networks.INTERNAL), - }, - ], - "cinder_volume_ceph_endpoint_bindings": [ - { - "space": 
self.deployment.get_space(Networks.MANAGEMENT), - }, - { - # relation between hypervisor and cinder-volume-ceph - # providing credentials to access Ceph - "space": self.deployment.get_space(Networks.MANAGEMENT), - "endpoint": "ceph-access", - }, - { - "space": self.deployment.get_space(Networks.STORAGE), - "endpoint": "ceph", - }, - ], - "charm_cinder_volume_config": {"snap-channel": versions.OPENSTACK_CHANNEL}, - "charm_cinder_volume_ceph_config": { - "ceph-osd-replication-count": microceph.ceph_replica_scale( - len(storage_nodes) - ), - }, - } - - charm_manifest: CharmManifest | None = self.manifest.core.software.charms.get( - APPLICATION - ) - if charm_manifest and charm_manifest.config: - tfvars["charm_cinder_volume_config"].update(charm_manifest.config) - - # This may not be required ideally as Cinder volume is deployed always - # before user can enable or disable telemetry. - feature_manager = self.deployment.get_feature_manager() - if feature_manager.is_feature_enabled(self.deployment, "telemetry"): - tfvars["enable-telemetry-notifications"] = True - else: - tfvars["enable-telemetry-notifications"] = False - - if len(storage_nodes): - microceph_tfhelper = self.deployment.get_tfhelper("microceph-plan") - microceph_tf_output = microceph_tfhelper.output() - - ceph_application_name = microceph_tf_output.get("ceph-application-name") - - if ceph_application_name: - tfvars["ceph-application-name"] = ceph_application_name - tfvars.update(self._get_offers()) - tfvars.update(self._get_optional_offers()) - - # Any tfvars that needs override will take precedence from self.override_tfvars - # Example usage: When telemetry is enabled/disabled, telemetry feature can set - # enable-telemetry-notifications using override_tfvars - tfvars.update(self.override_tfvars) - - return tfvars - - -class RemoveCinderVolumeUnitsStep(RemoveMachineUnitsStep): - """Remove Cinder Volume Unit.""" - - def __init__( - self, client: Client, names: list[str] | str, jhelper: JujuHelper, model: 
str - ): - super().__init__( - client, - names, - jhelper, - CONFIG_KEY, - APPLICATION, - model, - "Remove Cinder Volume unit(s)", - "Removing Cinder Volume unit(s) from machine", - ) - - def get_unit_timeout(self) -> int: - """Return unit timeout in seconds.""" - return CINDER_VOLUME_UNIT_TIMEOUT - - -class CheckCinderVolumeDistributionStep(BaseStep): - _APPLICATION = APPLICATION - - def __init__( - self, - client: Client, - name: str, - jhelper: JujuHelper, - model: str, - force: bool = False, - ): - super().__init__( - "Check Cinder Volume distribution", - "Check if node is hosting units of Cinder Volume", - ) - self.client = client - self.node = name - self.jhelper = jhelper - self.model = model - self.force = force - - def is_skip(self, context: StepContext) -> Result: - """Determines if the step should be skipped or not. - - :return: ResultType.SKIPPED if the Step should be skipped, - ResultType.COMPLETED or ResultType.FAILED otherwise - """ - try: - node_info = self.client.cluster.get_node_info(self.node) - except NodeNotExistInClusterException: - return Result( - ResultType.SKIPPED, f"Node {self.node} is not found in the cluster" - ) - - if Role.STORAGE.name.lower() not in node_info.get("role", ""): - LOG.debug("Node %s is not a storage node", self.node) - return Result(ResultType.SKIPPED) - try: - app = self.jhelper.get_application(self._APPLICATION, self.model) - except ApplicationNotFoundException: - LOG.debug("Failed to get application", exc_info=True) - return Result( - ResultType.SKIPPED, - f"Application {self._APPLICATION} has not been deployed yet", - ) - - for unit_name, unit in app.units.items(): - if unit.machine == str(node_info.get("machineid")): - LOG.debug("Unit %s is running on node %s", unit_name, self.node) - break - else: - LOG.debug("No %s units found on %s", self._APPLICATION, self.node) - return Result(ResultType.SKIPPED) - nb_storage_nodes = len(self.client.cluster.list_nodes_by_role("storage")) - if nb_storage_nodes == 1 and not 
self.force: - return Result( - ResultType.FAILED, - "Cannot remove the last cinder-volume," - "--force to override, volume capabilities" - " will be lost.", - ) - - return Result(ResultType.COMPLETED) - - -class DestroyCinderVolumeApplicationStep(DestroyMachineApplicationStep): - """Destroy Cinder Volume application using Terraform.""" - - def __init__( - self, - client: Client, - tfhelper: TerraformHelper, - jhelper: JujuHelper, - manifest: Manifest, - model: str, - ): - super().__init__( - client, - tfhelper, - jhelper, - manifest, - CONFIG_KEY, - [APPLICATION], - model, - "Destroy Cinder Volume", - "Destroying Cinder Volume", - ) - - def get_application_timeout(self) -> int: - """Return application timeout in seconds.""" - return CINDER_VOLUME_APP_TIMEOUT - - def run(self, context: StepContext) -> Result: - """Destroy Cinder Volume application.""" - # note(gboutry):this is a workaround for - # https://github.com/juju/terraform-provider-juju/issues/473 - try: - resources = self.tfhelper.state_list() - except TerraformException as e: - LOG.debug(f"Failed to list terraform state: {str(e)}") - return Result(ResultType.FAILED, "Failed to list terraform state") - - for resource in resources: - if "integration" in resource: - try: - self.tfhelper.state_rm(resource) - except TerraformException as e: - LOG.debug(f"Failed to remove resource {resource}: {str(e)}") - return Result( - ResultType.FAILED, - f"Failed to remove resource {resource} from state", - ) - - return super().run(context) diff --git a/sunbeam-python/sunbeam/steps/cluster_status.py b/sunbeam-python/sunbeam/steps/cluster_status.py index e20f3b0e4..4a1667002 100644 --- a/sunbeam-python/sunbeam/steps/cluster_status.py +++ b/sunbeam-python/sunbeam/steps/cluster_status.py @@ -22,7 +22,7 @@ from sunbeam.core.deployment import Deployment from sunbeam.core.juju import JujuHelper, ModelNotFoundException from sunbeam.core.steps import BaseStep -from sunbeam.steps import clusterd, hypervisor, k8s, microceph, microovn 
+from sunbeam.steps import clusterd, hypervisor, k8s, microovn from sunbeam.utils import merge_dict LOG = logging.getLogger(__name__) @@ -125,13 +125,16 @@ def models(self) -> list[str]: def applications_to_columns(self) -> dict: """Mapping of applications to columns.""" - return { + ceph_provider = self.deployment.get_ceph_provider() + mapping = { clusterd.APPLICATION: "clusterd", k8s.APPLICATION: "control", hypervisor.APPLICATION: "compute", - microceph.APPLICATION: "storage", microovn.APPLICATION: "network", } + if ceph_provider.application_name: + mapping[ceph_provider.application_name] = ceph_provider.status_column + return mapping @abc.abstractmethod def _update_microcluster_status(self, status: dict, microcluster_status: dict): diff --git a/sunbeam-python/sunbeam/steps/hypervisor.py b/sunbeam-python/sunbeam/steps/hypervisor.py index 479ef8365..8cc07efae 100644 --- a/sunbeam-python/sunbeam/steps/hypervisor.py +++ b/sunbeam-python/sunbeam/steps/hypervisor.py @@ -48,6 +48,7 @@ ) from sunbeam.lazy import LazyImport from sunbeam.steps.configure import get_external_network_configs +from sunbeam.storage.manager import StorageBackendManager if typing.TYPE_CHECKING: import openstack @@ -75,7 +76,6 @@ def __init__( client: Client, tfhelper: TerraformHelper, openstack_tfhelper: TerraformHelper, - cinder_volume_tfhelper: TerraformHelper, jhelper: JujuHelper, manifest: Manifest, model: str, @@ -94,14 +94,12 @@ def __init__( "Deploying OpenStack Hypervisor", ) self.openstack_tfhelper = openstack_tfhelper - self.cinder_volume_tfhelper = cinder_volume_tfhelper self.ovn_manager = deployment.get_ovn_manager() def extra_tfvars(self) -> dict: """Extra terraform vars to pass to terraform apply.""" openstack_tf_output = self.openstack_tfhelper.output() - storage_nodes = self.client.cluster.list_nodes_by_role("storage") # Always pass Offer URLs as extravars instead of terraform backend # so that sunbeam has control to remove the CMR integrations by passing # null value. 
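`applications_to_columns` no longer hardcodes `microceph.APPLICATION: "storage"`. Instead it asks the deployment's `CephProvider` for `application_name` and `status_column`, and it skips the entry when the provider reports no application. The sketch below models that behavior under several assumptions. The attributes are modeled as properties. The `"microceph"` application name is an assumed value. The default `status_column` of `"storage"` is taken from the mapping removed above. The base mapping keys are illustrative.

```python
from __future__ import annotations

import abc


class CephProvider(abc.ABC):
    """Sketch of the provider abstraction; only the two attributes used here."""

    @property
    @abc.abstractmethod
    def application_name(self) -> str | None:
        """Juju application backing internal Ceph, or None if there is none."""

    @property
    def status_column(self) -> str:
        # Assumed default, matching the column the old hardcoded mapping used.
        return "storage"


class MicrocephProvider(CephProvider):
    @property
    def application_name(self) -> str | None:
        return "microceph"  # assumed value of the microceph APPLICATION constant


class NoCephProvider(CephProvider):
    @property
    def application_name(self) -> str | None:
        return None  # --no-default-storage: no internal Ceph app to report


def applications_to_columns(
    provider: CephProvider, base: dict[str, str]
) -> dict[str, str]:
    """Extend the fixed app->column mapping with the provider's application."""
    mapping = dict(base)
    if provider.application_name:
        mapping[provider.application_name] = provider.status_column
    return mapping


base = {"k8s": "control", "openstack-hypervisor": "compute"}  # illustrative keys
print(applications_to_columns(MicrocephProvider(), base))
print(applications_to_columns(NoCephProvider(), base))
```

With `NoCephProvider`, the status table simply has no Ceph-backed column entry, which mirrors the OVN provider pattern the commit message cites.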
@@ -120,12 +118,11 @@ def extra_tfvars(self) -> dict: juju_offers.add("ovn-relay-offer-url") extra_tfvars = {offer: openstack_tf_output.get(offer) for offer in juju_offers} - if len(storage_nodes) > 0: - cinder_volume_tf_output = self.cinder_volume_tfhelper.output() - - app_name_key = "cinder-volume-ceph-application-name" - if app_name := cinder_volume_tf_output.get(app_name_key): - extra_tfvars[app_name_key] = app_name + manager = StorageBackendManager() + integrations = manager.collect_hypervisor_integrations( + self.deployment, self.client + ) + extra_tfvars["extra_integrations"] = [i.to_dict() for i in integrations] extra_tfvars.update( { @@ -203,7 +200,7 @@ def tf_apply_extra_args(self) -> list: "-target=juju_integration.hypervisor-cert-distributor", "-target=juju_integration.hypervisor-certs", "-target=juju_integration.hypervisor-ceilometer", - "-target=juju_integration.hypervisor-cinder-ceph", + "-target=juju_integration.hypervisor-extra-integration", "-target=juju_integration.hypervisor-masakari", "-target=juju_integration.hypervisor-barbican", ] @@ -344,7 +341,8 @@ def __init__( jhelper: JujuHelper, manifest: Manifest, model: str, - extra_tfvars: dict = {}, + extra_tfvars: dict | None = None, + deployment: Deployment | None = None, ): super().__init__( "Reapply OpenStack Hypervisor Terraform plan", @@ -355,7 +353,8 @@ def __init__( self.jhelper = jhelper self.manifest = manifest self.model = model - self.extra_tfvars = extra_tfvars + self.extra_tfvars = extra_tfvars.copy() if extra_tfvars else {} + self.deployment = deployment def is_skip(self, context: StepContext) -> Result: """Determines if the step should be skipped or not. 
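The hypervisor step replaces the single `cinder-volume-ceph-application-name` tfvar with an `extra_integrations` list built from `[i.to_dict() for i in integrations]`. That list feeds `juju_integration.hypervisor-extra-integration`. The sketch below shows that serialisation. The field names on `HypervisorIntegration` are hypothetical, because the real dataclass is not shown in this hunk. The `ceph-access` endpoint comes from the removed cinder-volume bindings.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class HypervisorIntegration:
    """Hypothetical fields; the real dataclass in the storage framework may differ."""

    application_name: str
    endpoint: str
    hypervisor_endpoint: str

    def to_dict(self) -> dict[str, str]:
        # Plain dict so the value can be serialised into Terraform tfvars.
        return asdict(self)


def hypervisor_extra_tfvars(integrations: list[HypervisorIntegration]) -> dict:
    """Always emit the key, even when the list is empty."""
    return {"extra_integrations": [i.to_dict() for i in integrations]}


ceph = HypervisorIntegration("cinder-volume-ceph", "ceph-access", "ceph-access")
print(hypervisor_extra_tfvars([ceph]))
print(hypervisor_extra_tfvars([]))
```

The sketch always emits the key, even as an empty list. This assumes the plan iterates over this variable with `for_each`, where an empty collection removes previously created integration instances. That matches the intent of the nearby comment about keeping control over CMR integration removal.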
@@ -401,6 +400,15 @@ def run(self, context: StepContext) -> Result: LOG.debug("Adding DPDK configuration: %s", dpdk_config) self.extra_tfvars["charm_config"].update(dpdk_config) + if self.deployment: + manager = StorageBackendManager() + integrations = manager.collect_hypervisor_integrations( + self.deployment, self.client + ) + self.extra_tfvars["extra_integrations"] = [ + i.to_dict() for i in integrations + ] + statuses = ["active", "unknown"] if len(self.client.cluster.list_nodes_by_role("storage")) < 1: LOG.debug("No storage nodes found, allowing hypervisor waiting status") diff --git a/sunbeam-python/sunbeam/steps/maintenance.py b/sunbeam-python/sunbeam/steps/maintenance.py index 94ffdb4c9..6bbde527b 100644 --- a/sunbeam-python/sunbeam/steps/maintenance.py +++ b/sunbeam-python/sunbeam/steps/maintenance.py @@ -24,12 +24,12 @@ UnitNotFoundException, ) from sunbeam.core.watcher import WatcherActionFailedException +from sunbeam.features.ceph.microceph import APPLICATION as _MICROCEPH_APPLICATION from sunbeam.steps.k8s import ( CordonK8SUnitStep, DrainK8SUnitStep, UncordonK8SUnitStep, ) -from sunbeam.steps.microceph import APPLICATION as _MICROCEPH_APPLICATION if TYPE_CHECKING: from watcherclient import v1 as watcher diff --git a/sunbeam-python/sunbeam/steps/openstack.py b/sunbeam-python/sunbeam/steps/openstack.py index d9c8ce950..7b23d8799 100644 --- a/sunbeam-python/sunbeam/steps/openstack.py +++ b/sunbeam-python/sunbeam/steps/openstack.py @@ -11,10 +11,10 @@ import tenacity from rich.console import Console -import sunbeam.steps.microceph as microceph from sunbeam.clusterd.client import Client from sunbeam.clusterd.service import ConfigItemNotFoundException from sunbeam.core import ovn +from sunbeam.core.ceph import is_internal_ceph_enabled from sunbeam.core.common import ( RAM_32_GB_IN_KB, BaseStep, @@ -560,6 +560,7 @@ def __init__( self.client = deployment.get_client() self.storage_manager = deployment.get_storage_manager() self.ovn_manager = 
deployment.get_ovn_manager() + self.ceph_provider = deployment.get_ceph_provider() self.tfhelper = tfhelper self.jhelper = jhelper self.manifest = manifest @@ -574,20 +575,14 @@ def __init__( def get_storage_tfvars(self, storage_nodes: list[dict]) -> dict: """Create terraform variables related to storage.""" + model_with_owner = self.get_model_name_with_owner(self.machine_model) tfvars: dict[str, str | bool | int | list[str]] = {} - if storage_nodes: - model_with_owner = self.get_model_name_with_owner(self.machine_model) - tfvars["enable-ceph"] = True - tfvars["ceph-offer-url"] = f"{model_with_owner}.{microceph.APPLICATION}" - tfvars["ceph-nfs-offer-url"] = ( - f"{model_with_owner}.{microceph.NFS_OFFER_NAME}" - ) - tfvars["ceph-rgw-offer-url"] = ( - f"{model_with_owner}.{microceph.RGW_OFFER_NAME}" - ) - tfvars["ceph-osd-replication-count"] = microceph.ceph_replica_scale( - len(storage_nodes) + tfvars.update( + self.ceph_provider.get_control_plane_tfvars( + model_with_owner, len(storage_nodes) ) + ) + if storage_nodes: tfvars["enable-cinder-volume"] = True urls = [f"{model_with_owner}.cinder-volume"] principal_apps = self.storage_manager.list_principal_applications( @@ -601,7 +596,6 @@ def get_storage_tfvars(self, storage_nodes: list[dict]) -> dict: tfvars["cinder-volume-offer-urls"] = urls else: - tfvars["enable-ceph"] = False tfvars["enable-cinder-volume"] = False return tfvars @@ -883,7 +877,9 @@ def services(self): # The region controller cluster is not expected to have ovn-relay. # Microovn based deployments do not use ovn-relay. 
services.append("ovn-relay") - if self.client.cluster.list_nodes_by_role("storage"): + if self.client.cluster.list_nodes_by_role( + "storage" + ) and is_internal_ceph_enabled(self.client): services.append("traefik-rgw") return services @@ -903,7 +899,9 @@ def __init__( def services(self): """List of services to patch.""" services = ["traefik-public"] - if self.client.cluster.list_nodes_by_role("storage"): + if self.client.cluster.list_nodes_by_role( + "storage" + ) and is_internal_ceph_enabled(self.client): services.append("traefik-rgw") return services @@ -934,6 +932,7 @@ def __init__( self.client = client self.storage_manager = deployment.get_storage_manager() self.ovn_manager = deployment.get_ovn_manager() + self.ceph_provider = deployment.get_ceph_provider() self.tfhelper = tfhelper self.jhelper = jhelper self.manifest = manifest @@ -943,20 +942,14 @@ def __init__( def get_storage_tfvars(self, storage_nodes: list[dict]) -> dict: """Create terraform variables related to storage.""" + model_with_owner = self.get_model_name_with_owner(self.machine_model) tfvars: dict[str, str | bool | int | list[str]] = {} - if storage_nodes: - model_with_owner = self.get_model_name_with_owner(self.machine_model) - tfvars["enable-ceph"] = True - tfvars["ceph-offer-url"] = f"{model_with_owner}.{microceph.APPLICATION}" - tfvars["ceph-nfs-offer-url"] = ( - f"{model_with_owner}.{microceph.NFS_OFFER_NAME}" - ) - tfvars["ceph-rgw-offer-url"] = ( - f"{model_with_owner}.{microceph.RGW_OFFER_NAME}" - ) - tfvars["ceph-osd-replication-count"] = microceph.ceph_replica_scale( - len(storage_nodes) + tfvars.update( + self.ceph_provider.get_control_plane_tfvars( + model_with_owner, len(storage_nodes) ) + ) + if storage_nodes: tfvars["enable-cinder-volume"] = True urls = [f"{model_with_owner}.cinder-volume"] principal_apps = self.storage_manager.list_principal_applications( @@ -970,7 +963,6 @@ def get_storage_tfvars(self, storage_nodes: list[dict]) -> dict: tfvars["cinder-volume-offer-urls"] = 
urls else: - tfvars["enable-ceph"] = False tfvars["enable-cinder-volume"] = False return tfvars diff --git a/sunbeam-python/sunbeam/steps/upgrades/inter_channel.py b/sunbeam-python/sunbeam/steps/upgrades/inter_channel.py index 5ed1c3a11..29d3faafa 100644 --- a/sunbeam-python/sunbeam/steps/upgrades/inter_channel.py +++ b/sunbeam-python/sunbeam/steps/upgrades/inter_channel.py @@ -23,14 +23,14 @@ ) from sunbeam.core.manifest import Manifest from sunbeam.core.terraform import TerraformException, TerraformHelper -from sunbeam.steps.cinder_volume import CONFIG_KEY as CINDER_VOLUME_CONFIG_KEY +from sunbeam.features.ceph.microceph import CONFIG_KEY as MICROCEPH_CONFIG_KEY from sunbeam.steps.hypervisor import CONFIG_KEY as HYPERVISOR_CONFIG_KEY from sunbeam.steps.k8s import K8S_CONFIG_KEY -from sunbeam.steps.microceph import CONFIG_KEY as MICROCEPH_CONFIG_KEY from sunbeam.steps.openstack import CONFIG_KEY as OPENSTACK_CONFIG_KEY from sunbeam.steps.openstack import OPENSTACK_DEPLOY_TIMEOUT from sunbeam.steps.sunbeam_machine import CONFIG_KEY as SUNBEAM_MACHINE_CONFIG_KEY from sunbeam.steps.upgrades.base import UpgradeCoordinator, UpgradeFeatures +from sunbeam.storage.steps import STORAGE_BACKEND_TFVAR_CONFIG_KEY from sunbeam.versions import ( MISC_CHARMS_K8S, MYSQL_CHARMS_K8S, @@ -308,7 +308,7 @@ def __init__( ) -class UpgradeCinderVolumeCharm(UpgradeMachineCharm): +class UpgradeStorageBackendCharms(UpgradeMachineCharm): def __init__( self, client: Client, @@ -317,7 +317,10 @@ def __init__( manifest: Manifest, model: str, ): - """Create instance of UpgradeCinderVolumeCharm class. + """Create instance of UpgradeStorageBackendCharms class. + + Upgrades the cinder-volume charm managed by the storage-backend + Terraform plan. :client: Client to connect to clusterdb :jhelper: Helper for interacting with pylibjuju @@ -325,15 +328,15 @@ def __init__( :model: Name of model containing charms. 
""" super().__init__( - "Upgrade Cinder Volume charm", - "Upgrading cinder-volume charm", + "Upgrade Storage Backend charms", + "Upgrading storage backend charms", client, tfhelper, jhelper, manifest, model, - ["cinder-volume", "cinder-volume-ceph"], - CINDER_VOLUME_CONFIG_KEY, + ["cinder-volume"], + STORAGE_BACKEND_TFVAR_CONFIG_KEY, 1200, ) @@ -461,9 +464,9 @@ def get_plan(self) -> list[BaseStep]: self.manifest, self.deployment.openstack_machines_model, ), - UpgradeCinderVolumeCharm( + UpgradeStorageBackendCharms( self.client, - get_tf("cinder-volume-plan"), + get_tf("storage-backend-plan"), self.jhelper, self.manifest, self.deployment.openstack_machines_model, diff --git a/sunbeam-python/sunbeam/steps/upgrades/intra_channel.py b/sunbeam-python/sunbeam/steps/upgrades/intra_channel.py index 6f3abe81a..582da601b 100644 --- a/sunbeam-python/sunbeam/steps/upgrades/intra_channel.py +++ b/sunbeam-python/sunbeam/steps/upgrades/intra_channel.py @@ -7,6 +7,7 @@ from rich.console import Console from sunbeam.clusterd.client import Client +from sunbeam.core.ceph import is_internal_ceph_enabled from sunbeam.core.common import ( BaseStep, Result, @@ -26,8 +27,8 @@ from sunbeam.core.manifest import Manifest from sunbeam.core.openstack import OPENSTACK_MODEL from sunbeam.core.terraform import TerraformInitStep +from sunbeam.features.ceph.microceph import DeployMicrocephApplicationStep from sunbeam.features.interface.v1.base import is_maas_deployment -from sunbeam.steps.cinder_volume import DeployCinderVolumeApplicationStep from sunbeam.steps.hypervisor import ReapplyHypervisorTerraformPlanStep from sunbeam.steps.k8s import ( DeployK8SApplicationStep, @@ -35,7 +36,6 @@ EnsureDefaultL2AdvertisementMutedStep, EnsureL2AdvertisementByHostStep, ) -from sunbeam.steps.microceph import DeployMicrocephApplicationStep from sunbeam.steps.microovn import DeployMicroOVNApplicationStep from sunbeam.steps.mysql import MySQLCharmUpgradeStep from sunbeam.steps.openstack import ( @@ -46,6 +46,13 @@ 
) from sunbeam.steps.sunbeam_machine import DeploySunbeamMachineApplicationStep from sunbeam.steps.upgrades.base import UpgradeCoordinator, UpgradeFeatures +from sunbeam.steps.upgrades.storage_migration import ( + BackfillCephFeatureStateStep, + ImportCephResourcesToStorageFrameworkStep, + MigrateCinderVolumeToStorageFrameworkStep, +) +from sunbeam.storage.base import STORAGE_TFPLAN, register_storage_terraform_plan +from sunbeam.storage.steps import ReapplyStorageBackendTerraformPlanStep LOG = logging.getLogger(__name__) console = Console() @@ -401,40 +408,50 @@ class LatestInChannelCoordinator(UpgradeCoordinator): def get_plan(self) -> list[BaseStep]: """Return the upgrade plan.""" - plan = [ + plan: list[BaseStep] = [ LatestInChannel(self.deployment, self.jhelper, self.manifest), ReapplyInfraModelConfigStep(self.deployment, self.jhelper, self.manifest), RefreshSnapStep(self.deployment, self.jhelper), - # Microceph introduces new offer urls for rgw and so microceph - # plan need to be applied before openstack plan - TerraformInitStep(self.deployment.get_tfhelper("microceph-plan")), - DeployMicrocephApplicationStep( - self.deployment, - self.client, - self.deployment.get_tfhelper("microceph-plan"), - self.jhelper, - self.manifest, - self.deployment.openstack_machines_model, - ), - TerraformInitStep(self.deployment.get_tfhelper("openstack-plan")), - ReapplyOpenStackTerraformPlanStep( - self.deployment, - self.client, - self.deployment.get_tfhelper("openstack-plan"), - self.jhelper, - self.manifest, - self.deployment.openstack_machines_model, - ), - TerraformInitStep(self.deployment.get_tfhelper("sunbeam-machine-plan")), - DeploySunbeamMachineApplicationStep( - self.deployment, - self.client, - self.deployment.get_tfhelper("sunbeam-machine-plan"), - self.jhelper, - self.manifest, - self.deployment.openstack_machines_model, - ), ] + # Microceph introduces new offer urls for rgw, so the microceph + # plan needs to be applied before the openstack plan + microceph_necessary =
is_internal_ceph_enabled(self.client) + if microceph_necessary: + plan.extend( + [ + TerraformInitStep(self.deployment.get_tfhelper("microceph-plan")), + DeployMicrocephApplicationStep( + self.deployment, + self.client, + self.deployment.get_tfhelper("microceph-plan"), + self.jhelper, + self.manifest, + self.deployment.openstack_machines_model, + ), + ] + ) + plan.extend( + [ + TerraformInitStep(self.deployment.get_tfhelper("openstack-plan")), + ReapplyOpenStackTerraformPlanStep( + self.deployment, + self.client, + self.deployment.get_tfhelper("openstack-plan"), + self.jhelper, + self.manifest, + self.deployment.openstack_machines_model, + ), + TerraformInitStep(self.deployment.get_tfhelper("sunbeam-machine-plan")), + DeploySunbeamMachineApplicationStep( + self.deployment, + self.client, + self.deployment.get_tfhelper("sunbeam-machine-plan"), + self.jhelper, + self.manifest, + self.deployment.openstack_machines_model, + ), + ] + ) if is_maas_deployment(self.deployment): from sunbeam.provider.maas.client import MaasClient # noqa: PLC0415 @@ -542,22 +559,55 @@ def get_plan(self) -> list[BaseStep]: ] ) + if microceph_necessary: + plan.extend( + [ + TerraformInitStep(self.deployment.get_tfhelper("microceph-plan")), + DeployMicrocephApplicationStep( + self.deployment, + self.client, + self.deployment.get_tfhelper("microceph-plan"), + self.jhelper, + self.manifest, + self.deployment.openstack_machines_model, + ), + ] + ) + # Migrate old cinder-volume plan state to the unified storage + # framework before reapplying the storage backend plan. 
+ old_cv_tfhelper = self.deployment.get_tfhelper("cinder-volume-plan") plan.extend( [ - TerraformInitStep(self.deployment.get_tfhelper("microceph-plan")), - DeployMicrocephApplicationStep( + TerraformInitStep(old_cv_tfhelper), + MigrateCinderVolumeToStorageFrameworkStep( self.deployment, self.client, - self.deployment.get_tfhelper("microceph-plan"), + old_cv_tfhelper, self.jhelper, self.manifest, self.deployment.openstack_machines_model, ), - TerraformInitStep(self.deployment.get_tfhelper("cinder-volume-plan")), - DeployCinderVolumeApplicationStep( + ] + ) + + # Register and reapply the storage backend terraform plan + # so that cinder-volume and backend charms pick up upgrades. + register_storage_terraform_plan(self.deployment) + storage_tfhelper = self.deployment.get_tfhelper(STORAGE_TFPLAN) + plan.extend( + [ + TerraformInitStep(storage_tfhelper), + ImportCephResourcesToStorageFrameworkStep( + self.deployment, + self.client, + storage_tfhelper, + self.jhelper, + self.deployment.openstack_machines_model, + ), + ReapplyStorageBackendTerraformPlanStep( self.deployment, self.client, - self.deployment.get_tfhelper("cinder-volume-plan"), + storage_tfhelper, self.jhelper, self.manifest, self.deployment.openstack_machines_model, @@ -569,6 +619,11 @@ def get_plan(self) -> list[BaseStep]: self.jhelper, self.manifest, self.deployment.openstack_machines_model, + deployment=self.deployment, + ), + BackfillCephFeatureStateStep( + self.deployment, + self.client, ), UpgradeFeatures(self.deployment, upgrade_release=False), ] diff --git a/sunbeam-python/sunbeam/steps/upgrades/storage_migration.py b/sunbeam-python/sunbeam/steps/upgrades/storage_migration.py new file mode 100644 index 000000000..344b3a1b3 --- /dev/null +++ b/sunbeam-python/sunbeam/steps/upgrades/storage_migration.py @@ -0,0 +1,579 @@ +# SPDX-FileCopyrightText: 2025 - Canonical Ltd +# SPDX-License-Identifier: Apache-2.0 + +"""Migration step for cinder-volume terraform state. 
+ +Moves resources from the retired ``deploy-cinder-volume`` plan into the +unified ``deploy-storage`` plan and registers the ``internal-ceph`` +backend in clusterd. +""" + +import logging + +from sunbeam.clusterd.client import Client +from sunbeam.clusterd.service import ( + ConfigItemNotFoundException, + StorageBackendNotFoundException, +) +from sunbeam.core.ceph import ( + INTERNAL_CEPH_BACKEND_NAME, + CephDeploymentMode, + is_internal_ceph_enabled, + load_ceph_config, + write_ceph_config, +) +from sunbeam.core.common import ( + BaseStep, + Result, + ResultType, + Role, + StepContext, + read_config, + update_config, +) +from sunbeam.core.deployment import Deployment, Networks +from sunbeam.core.juju import JujuHelper +from sunbeam.core.manifest import Manifest +from sunbeam.core.terraform import TerraformException, TerraformHelper +from sunbeam.features.ceph.feature import DEFAULT_STORAGE_RECONCILED_KEY +from sunbeam.features.ceph.microceph import ceph_replica_scale +from sunbeam.features.interface.v1.base import EnableDisableFeature +from sunbeam.storage.backends.internal_ceph.backend import ( + InternalCephBackend, + InternalCephConfig, +) +from sunbeam.storage.base import PRINCIPAL_HA_APPLICATION +from sunbeam.storage.steps import ( + STORAGE_BACKEND_TFVAR_CONFIG_KEY, + get_mandatory_control_plane_offers, + get_optional_control_plane_offers, +) +from sunbeam.versions import CINDER_VOLUME_CHARM + +LOG = logging.getLogger(__name__) + +STORAGE_BACKEND_LEGACY_IMPORT_IDS_KEY = "StorageBackendLegacyImportIds" + +LEGACY_TO_STORAGE_IMPORT_ADDRESS_MAP = { + "juju_application.cinder-volume": ( + 'module.cinder-volume["cinder-volume"].juju_application.cinder-volume' + ), + "juju_offer.storage-backend-offer": ( + 'module.cinder-volume["cinder-volume"].juju_offer.storage-backend-offer' + ), + "juju_integration.cinder-volume-identity[0]": ( + 'module.cinder-volume["cinder-volume"].juju_integration.cinder-volume-identity[0]' + ), + "juju_integration.cinder-volume-amqp[0]": 
( + 'module.cinder-volume["cinder-volume"].juju_integration.cinder-volume-amqp[0]' + ), + "juju_integration.cinder-volume-database[0]": ( + 'module.cinder-volume["cinder-volume"].juju_integration.cinder-volume-database[0]' + ), + "juju_integration.cinder-volume-cert-distributor[0]": ( + 'module.cinder-volume["cinder-volume"].juju_integration.cinder-volume-cert-distributor[0]' + ), + "juju_application.cinder-volume-ceph": ( + 'module.backends["internal-ceph"].juju_application.storage-backend' + ), + "juju_integration.cinder-volume-ceph-to-cinder-volume": ( + 'module.backends["internal-ceph"].juju_integration.storage-backend-to-cinder-volume' + ), + "juju_integration.cinder-volume-ceph-to-ceph[0]": ( + 'module.backends["internal-ceph"].juju_integration.backend-extra-integration["microceph-ceph"]' + ), +} + + +class BackfillCephFeatureStateStep(BaseStep): + """Backfill Ceph feature metadata for existing internal Ceph deployments.""" + + def __init__(self, deployment: Deployment, client: Client): + super().__init__( + "Backfill Ceph feature state", + "Backfilling Ceph feature metadata for internal Ceph deployments", + ) + self.deployment = deployment + self.client = client + + def run(self, context: StepContext) -> Result: + """Backfill Ceph feature metadata when internal Ceph is managed.""" + if not is_internal_ceph_enabled(self.client): + return Result(ResultType.COMPLETED) + + # Persist CephConfig.mode explicitly so that any future strict + # equality comparison (config.mode == CephDeploymentMode.MICROCEPH) + # works on upgraded clusters where the mode was never written. 
+ config = load_ceph_config(self.client) + if config.mode is None: + config.mode = CephDeploymentMode.MICROCEPH + write_ceph_config(self.client, config) + + feature = self.deployment.get_feature_manager().resolve_feature("ceph") + if not isinstance(feature, EnableDisableFeature): + LOG.debug("Failed to resolve ceph feature for state backfill.") + return Result(ResultType.COMPLETED) + + feature.update_feature_info( + self.client, + { + "enabled": "true", + DEFAULT_STORAGE_RECONCILED_KEY: "true", + }, + ) + return Result(ResultType.COMPLETED) + + +class ImportCephResourcesToStorageFrameworkStep(BaseStep): + """Import legacy internal-Ceph resources into the storage plan state.""" + + def __init__( + self, + deployment: Deployment, + client: Client, + tfhelper: TerraformHelper, + jhelper: JujuHelper, + model: str, + ): + super().__init__( + "Import Ceph resources to storage framework", + "Importing legacy Ceph resources into storage-backend Terraform state", + ) + self.deployment = deployment + self.client = client + self.tfhelper = tfhelper + self.jhelper = jhelper + self.model = model + + def is_skip(self, context: StepContext) -> Result: + """Skip when there is no migrated internal-ceph backend to import.""" + try: + tfvars = read_config(self.client, STORAGE_BACKEND_TFVAR_CONFIG_KEY) + except ConfigItemNotFoundException: + return Result(ResultType.SKIPPED, "No storage backend config found.") + + if INTERNAL_CEPH_BACKEND_NAME not in tfvars.get("backends", {}): + return Result(ResultType.SKIPPED, "No internal-ceph backend to import.") + + if not tfvars.get("cinder-volumes"): + return Result( + ResultType.SKIPPED, + "No cinder-volume resources configured for import.", + ) + + return Result(ResultType.COMPLETED) + + def _build_imports( + self, + tfvars: dict, + *, + model_uuid: str, + model_name: str, + ) -> list[tuple[str, str]]: + """Build the list of legacy Ceph resources to import.""" + # Use the known HA principal — legacy Ceph always deployed under + # 
cinder-volume (HA). Do NOT use next(iter(...)) which could + # pick a non-HA principal from another backend. + principal = PRINCIPAL_HA_APPLICATION + cinder_volume = tfvars["cinder-volumes"][principal] + backend = tfvars["backends"][INTERNAL_CEPH_BACKEND_NAME] + backend_application_name = backend.get("application_name", "cinder-volume-ceph") + + imports = { + f'module.cinder-volume["{principal}"].juju_application.cinder-volume': ( + f"{model_uuid}:{cinder_volume['application_name']}" + ), + f'module.cinder-volume["{principal}"].juju_offer.storage-backend-offer': ( + f"{model_name}.{cinder_volume['application_name']}" + ), + 'module.backends["internal-ceph"].juju_application.storage-backend': ( + f"{model_uuid}:{backend_application_name}" + ), + ( + 'module.backends["internal-ceph"].juju_integration.' + "storage-backend-to-cinder-volume" + ): ( + f"{model_uuid}:{principal}:cinder-volume:" + f"{backend_application_name}:cinder-volume" + ), + ( + 'module.backends["internal-ceph"].juju_integration.' 
+ 'backend-extra-integration["microceph-ceph"]' + ): (f"{model_uuid}:microceph:ceph:{backend_application_name}:ceph"), + } + + try: + legacy_import_ids = read_config( + self.client, STORAGE_BACKEND_LEGACY_IMPORT_IDS_KEY + ) + except ConfigItemNotFoundException: + legacy_import_ids = {} + + for old_address, import_id in legacy_import_ids.items(): + if new_address := LEGACY_TO_STORAGE_IMPORT_ADDRESS_MAP.get(old_address): + imports[new_address] = import_id + + return list(imports.items()) + + def run(self, context: StepContext) -> Result: + """Import migrated Ceph resources into the new storage plan state.""" + try: + tfvars = read_config(self.client, STORAGE_BACKEND_TFVAR_CONFIG_KEY) + except ConfigItemNotFoundException: + return Result(ResultType.COMPLETED) + + self.tfhelper.write_tfvars(tfvars) + + model_info = self.jhelper.get_model(self.model) + imports = self._build_imports( + tfvars, + model_uuid=model_info["model-uuid"], + model_name=model_info["name"], + ) + + try: + try: + existing_resources = set(self.tfhelper.state_list()) + except TerraformException as e: + if "No state file was found" in str(e): + existing_resources = set() + else: + raise + for address, resource_id in imports: + if address in existing_resources: + continue + self.tfhelper.import_resource(address, resource_id) + existing_resources.add(address) + except TerraformException as e: + LOG.error("Failed to import legacy Ceph resources: %s", e) + return Result(ResultType.FAILED, str(e)) + + # Once imports succeed, the legacy-id map in clusterd has served + # its purpose and should be removed so it does not linger past + # the migration window. + try: + self.client.cluster.delete_config(STORAGE_BACKEND_LEGACY_IMPORT_IDS_KEY) + except ConfigItemNotFoundException: + pass + + return Result(ResultType.COMPLETED) + + +class MigrateCinderVolumeToStorageFrameworkStep(BaseStep): + """Migrate cinder-volume resources to the storage framework. 
+ + Existing deployments manage cinder-volume and cinder-volume-ceph via + the ``deploy-cinder-volume`` Terraform plan. This step: + + 1. Removes all resources from the old plan's state (no destruction). + 2. Registers the ``internal-ceph`` backend in clusterd. + 3. Populates ``TerraformVarsStorageBackends`` so that the + ``deploy-storage`` plan can adopt the running Juju resources. + + The subsequent ``ReapplyStorageBackendTerraformPlanStep`` in the + upgrade flow runs ``terraform apply`` on the new plan, which + reconciles the existing Juju applications into the new state. + """ + + def __init__( + self, + deployment: Deployment, + client: Client, + old_tfhelper: TerraformHelper, + jhelper: JujuHelper, + manifest: Manifest, + model: str, + ): + super().__init__( + "Migrate cinder-volume to storage framework", + "Migrating cinder-volume terraform state to unified storage plan", + ) + self.deployment = deployment + self.client = client + self.old_tfhelper = old_tfhelper + self.jhelper = jhelper + self.manifest = manifest + self.model = model + + def _old_plan_has_resources(self) -> bool: + """Return True if the old cinder-volume plan has resources.""" + try: + resources = self.old_tfhelper.state_list() + return len(resources) > 0 + except TerraformException: + LOG.debug( + "Failed to list old cinder-volume plan state", + exc_info=True, + ) + return False + + def is_skip(self, context: StepContext) -> Result: + """Skip when migration is not needed. + + Migration is skipped when the old plan has no resources. + When the storage framework already has backends (e.g. PureStorage), + the migration merges the internal-ceph entries into the existing + config rather than overwriting it. 
+ """ + if not self._old_plan_has_resources(): + LOG.debug("Old cinder-volume plan has no resources; skipping migration.") + return Result( + ResultType.SKIPPED, + "Old cinder-volume plan has no resources.", + ) + + return Result(ResultType.COMPLETED) + + def _clear_old_state(self) -> None: + """Remove all resources from the old terraform plan state.""" + resources = self.old_tfhelper.state_list() + for resource in resources: + # Skip data sources (they are read-only references) + if resource.startswith("data."): + continue + LOG.debug("Removing resource %r from old cinder-volume plan", resource) + self.old_tfhelper.state_rm(resource) + + def _capture_legacy_import_ids(self) -> dict[str, str]: + """Capture exact legacy resource IDs from the old terraform state.""" + state = self.old_tfhelper.pull_state() + import_ids: dict[str, str] = {} + + for resource in state.get("resources", []): + mode = resource.get("mode") + resource_type = resource.get("type") + resource_name = resource.get("name") + if mode != "managed" or not resource_type or not resource_name: + continue + + for instance in resource.get("instances", []): + index_key = instance.get("index_key") + address = f"{resource_type}.{resource_name}" + if index_key is not None: + address = f"{address}[{index_key}]" + attributes = instance.get("attributes", {}) + import_id = attributes.get("id") + if import_id: + import_ids[address] = import_id + + return import_ids + + def _register_internal_ceph_backend(self, model_uuid: str) -> None: + """Register the internal-ceph backend in clusterd.""" + backend = InternalCephBackend() + storage_nodes = self.client.cluster.list_nodes_by_role("storage") + replication_count = ceph_replica_scale(len(storage_nodes)) + + config = InternalCephConfig.model_validate( + {"ceph_osd_replication_count": replication_count}, by_name=True + ) + config_dict = config.model_dump(exclude_none=True, by_alias=True) + config_key = backend.config_key(INTERNAL_CEPH_BACKEND_NAME) + 
update_config(self.client, config_key, config_dict) + try: + self.client.cluster.get_storage_backend(INTERNAL_CEPH_BACKEND_NAME) + self.client.cluster.update_storage_backend( + name=INTERNAL_CEPH_BACKEND_NAME, + backend_type=backend.backend_type, + config=config_dict, + principal=backend.principal_application, + model_uuid=model_uuid, + ) + except StorageBackendNotFoundException: + self.client.cluster.add_storage_backend( + name=INTERNAL_CEPH_BACKEND_NAME, + backend_type=backend.backend_type, + config=config_dict, + principal=backend.principal_application, + model_uuid=model_uuid, + ) + + backend.enable_backend(self.client) + + def _build_storage_tfvars(self, model_uuid: str) -> dict: + """Build TerraformVarsStorageBackends for the new plan. + + Merges the internal-ceph backend and its HA principal into any + existing storage framework config so that third-party backends + (PureStorage, Hitachi, etc.) are preserved. + """ + # Read existing config to merge into + try: + tfvars = read_config(self.client, STORAGE_BACKEND_TFVAR_CONFIG_KEY) + except ConfigItemNotFoundException: + tfvars = {} + + tfvars.setdefault("model", model_uuid) + tfvars.setdefault("cinder-volumes", {}) + tfvars.setdefault("backends", {}) + + backend = InternalCephBackend() + + # --- cinder-volume HA principal entry --- + principal = PRINCIPAL_HA_APPLICATION + + # Only create the HA principal entry if it doesn't already exist; + # another HA backend (e.g. PureStorage) may have set it up already. 
+ if principal not in tfvars["cinder-volumes"]: + storage_nodes = self.client.cluster.list_nodes_by_role( + Role.STORAGE.name.lower() + ) + machine_ids = sorted( + (str(node["machineid"]) for node in storage_nodes), key=int + ) + + cinder_volume_charm = self.manifest.core.software.charms.get( + CINDER_VOLUME_CHARM + ) + charm_config: dict = {} + charm_channel = None + charm_revision = None + if cinder_volume_charm: + charm_channel = cinder_volume_charm.channel + charm_revision = cinder_volume_charm.revision + if cinder_volume_charm.config: + charm_config.update(cinder_volume_charm.config) + + charm_config["snap-name"] = backend.snap_name + + cinder_volume_entry = { + "application_name": principal, + "charm_channel": charm_channel, + "charm_revision": charm_revision, + "charm_config": charm_config, + "machine_ids": machine_ids, + "endpoint_bindings": [ + {"space": self.deployment.get_space(Networks.MANAGEMENT)}, + { + "endpoint": "amqp", + "space": self.deployment.get_space(Networks.INTERNAL), + }, + { + "endpoint": "database", + "space": self.deployment.get_space(Networks.INTERNAL), + }, + { + "endpoint": "cinder-volume", + "space": self.deployment.get_space(Networks.MANAGEMENT), + }, + { + "endpoint": "identity-credentials", + "space": self.deployment.get_space(Networks.INTERNAL), + }, + { + "endpoint": "receive-ca-cert", + "space": self.deployment.get_space(Networks.INTERNAL), + }, + { + "endpoint": "storage-backend", + "space": self.deployment.get_space(Networks.INTERNAL), + }, + ], + } + + # Add control plane offers + try: + openstack_tfhelper = self.deployment.get_tfhelper("openstack-plan") + cinder_volume_entry.update( + get_mandatory_control_plane_offers(openstack_tfhelper) + ) + cinder_volume_entry.update( + get_optional_control_plane_offers(openstack_tfhelper) + ) + except Exception: + LOG.debug( + "Could not get control plane offers; " + "they will be populated on next apply", + exc_info=True, + ) + + # Telemetry flag + feature_manager = 
self.deployment.get_feature_manager() + cinder_volume_entry["enable-telemetry-notifications"] = ( + feature_manager.is_feature_enabled(self.deployment, "telemetry") + ) + + tfvars["cinder-volumes"][principal] = cinder_volume_entry + + # --- internal-ceph backend entry (always merged) --- + storage_nodes = self.client.cluster.list_nodes_by_role( + Role.STORAGE.name.lower() + ) + replication_count = ceph_replica_scale(len(storage_nodes)) + config = InternalCephConfig.model_validate( + {"ceph_osd_replication_count": replication_count}, by_name=True + ) + backend_tfvars = backend.build_terraform_vars( + self.deployment, + self.manifest, + INTERNAL_CEPH_BACKEND_NAME, + config, + ) + + tfvars["backends"][INTERNAL_CEPH_BACKEND_NAME] = backend_tfvars + + return tfvars + + def run(self, context: StepContext) -> Result: + """Execute the migration. + + Ordering is critical for retry safety: every clusterd write is + idempotent and happens BEFORE the irreversible old-state clear. + Any failure up to the final clear leaves the old plan state + intact, so `is_skip` will re-run the whole flow on retry. + """ + model_info = self.jhelper.get_model(self.model) + model_uuid = model_info["model-uuid"] + + # 1. Capture legacy IDs from the old state (read-only). + try: + legacy_import_ids = self._capture_legacy_import_ids() + update_config( + self.client, + STORAGE_BACKEND_LEGACY_IMPORT_IDS_KEY, + legacy_import_ids, + ) + except TerraformException as e: + LOG.error("Failed to capture legacy cinder-volume state IDs: %s", e) + return Result( + ResultType.FAILED, + f"Failed to capture legacy resource IDs: {e}", + ) + + # 2. Register internal-ceph in clusterd (idempotent). + try: + self._register_internal_ceph_backend(model_uuid) + except Exception as e: + LOG.error("Failed to register internal-ceph backend: %s", e) + return Result( + ResultType.FAILED, + f"Failed to register internal-ceph backend: {e}", + ) + + # 3. Populate TerraformVarsStorageBackends (idempotent merge). 
+ try: + tfvars = self._build_storage_tfvars(model_uuid) + update_config(self.client, STORAGE_BACKEND_TFVAR_CONFIG_KEY, tfvars) + except Exception as e: + LOG.error("Failed to populate storage backend tfvars: %s", e) + return Result( + ResultType.FAILED, + f"Failed to populate storage backend tfvars: {e}", + ) + + # 4. Clear the old plan state LAST — this is irreversible and + # is what makes `is_skip` return SKIPPED on subsequent runs. + try: + self._clear_old_state() + except TerraformException as e: + LOG.error("Failed to clear old cinder-volume plan state: %s", e) + return Result( + ResultType.FAILED, + f"Failed to clear old terraform state: {e}", + ) + + LOG.info( + "Successfully migrated cinder-volume terraform state " + "to the unified storage framework." + ) + return Result(ResultType.COMPLETED) diff --git a/sunbeam-python/sunbeam/storage/backends/internal_ceph/__init__.py b/sunbeam-python/sunbeam/storage/backends/internal_ceph/__init__.py new file mode 100644 index 000000000..4abcbe9b5 --- /dev/null +++ b/sunbeam-python/sunbeam/storage/backends/internal_ceph/__init__.py @@ -0,0 +1,8 @@ +# SPDX-FileCopyrightText: 2025 - Canonical Ltd +# SPDX-License-Identifier: Apache-2.0 + +"""Internal Ceph backend for Sunbeam storage.""" + +from sunbeam.storage.backends.internal_ceph.backend import InternalCephBackend + +__all__ = ["InternalCephBackend"] diff --git a/sunbeam-python/sunbeam/storage/backends/internal_ceph/backend.py b/sunbeam-python/sunbeam/storage/backends/internal_ceph/backend.py new file mode 100644 index 000000000..79e36fc40 --- /dev/null +++ b/sunbeam-python/sunbeam/storage/backends/internal_ceph/backend.py @@ -0,0 +1,128 @@ +# SPDX-FileCopyrightText: 2025 - Canonical Ltd +# SPDX-License-Identifier: Apache-2.0 + +"""Internal Ceph storage backend implementation. + +This backend deploys cinder-volume-ceph as a subordinate of the shared HA +cinder-volume principal and wires it to the locally-deployed microceph. 
+ +It is managed exclusively by CephFeature and is not exposed via the +``sunbeam storage add`` CLI. +""" + +import logging +from typing import Annotated + +from pydantic import Field + +from sunbeam.core.deployment import Deployment, Networks +from sunbeam.core.manifest import StorageBackendConfig +from sunbeam.storage.base import ( + BackendIntegration, + HypervisorIntegration, + StorageBackendBase, +) + +LOG = logging.getLogger(__name__) + + +class InternalCephConfig(StorageBackendConfig): + """Configuration for the internal-ceph storage backend.""" + + ceph_osd_replication_count: Annotated[ + int, + Field( + default=1, + description="Ceph OSD replication count", + ), + ] + + +class InternalCephBackend(StorageBackendBase): + """Internal Ceph storage backend. + + Deploys cinder-volume-ceph as an HA-aware subordinate backend. + Declares an extra integration to microceph (ceph relation) and a + hypervisor integration for ceph-access. + """ + + backend_type = "internal-ceph" + display_name = "Internal Ceph" + generally_available = True + + @property + def charm_name(self) -> str: + """Return the charm name for this backend.""" + return "cinder-volume-ceph" + + @property + def charm_channel(self) -> str: + """Return the charm channel for this backend.""" + return "2024.1/stable" + + @property + def charm_revision(self) -> str | None: + """Return the charm revision for this backend.""" + return None + + @property + def charm_base(self) -> str: + """Return the charm base for this backend.""" + return "ubuntu@24.04" + + @property + def supports_ha(self) -> bool: + """Return whether this backend supports HA deployments.""" + return True + + def config_type(self) -> type[StorageBackendConfig]: + """Return the configuration class for internal ceph backend.""" + return InternalCephConfig + + def get_endpoint_bindings(self, deployment: Deployment) -> list[dict[str, str]]: + """Endpoint bindings for the cinder-volume-ceph charm. 
+ + Includes the standard default space, ceph-access (MANAGEMENT) + and ceph (STORAGE) endpoints. + """ + return [ + {"space": deployment.get_space(Networks.MANAGEMENT)}, + { + "endpoint": "ceph-access", + "space": deployment.get_space(Networks.MANAGEMENT), + }, + { + "endpoint": "ceph", + "space": deployment.get_space(Networks.STORAGE), + }, + ] + + def get_application_name(self, backend_name: str) -> str: + """Return the Juju application name for the internal Ceph backend.""" + return self.charm_name + + def get_units(self) -> int | None: + """Return None so Terraform models the subordinate correctly.""" + return None + + def get_extra_integrations(self, deployment: Deployment) -> set[BackendIntegration]: + """Return the microceph ceph integration.""" + return { + BackendIntegration( + application_name="microceph", + endpoint_name="ceph", + backend_endpoint_name="ceph", + ) + } + + def get_hypervisor_integrations( + self, deployment: Deployment + ) -> set[HypervisorIntegration]: + """Return the ceph-access hypervisor integration.""" + return { + HypervisorIntegration( + application_name="cinder-volume-ceph", + endpoint_name="ceph-access", + hypervisor_endpoint_name="ceph-access", + ) + } diff --git a/sunbeam-python/sunbeam/storage/base.py b/sunbeam-python/sunbeam/storage/base.py index fc8f96389..8349dcf75 100644 --- a/sunbeam-python/sunbeam/storage/base.py +++ b/sunbeam-python/sunbeam/storage/base.py @@ -10,6 +10,7 @@ import re import types import typing +from dataclasses import asdict, dataclass from pathlib import Path from typing import Any @@ -60,6 +61,86 @@ PRINCIPAL_HA_APPLICATION = "cinder-volume" PRINCIPAL_NON_HA_APPLICATION = "cinder-volume-noha" +STORAGE_TFPLAN = "storage-backend-plan" +STORAGE_TFPLAN_DIR = "deploy-storage-backend" + + +def register_storage_terraform_plan(deployment: Deployment) -> "TerraformHelper": + """Register the unified storage backend terraform plan with the deployment. 
+ + Copies the plan source into the deployment's plan directory and wires + up a TerraformHelper on the deployment's tfhelpers map keyed by + STORAGE_TFPLAN. Returns the helper for convenience. + + This is the module-level entrypoint that callers outside of + StorageBackendBase should use (e.g. telemetry, upgrade steps). + Previously the equivalent logic was only reachable by instantiating + StorageBackendBase directly, which forced a ``# type: ignore`` and + coupled unrelated code to the abstract base. + """ + import shutil + + from sunbeam.core.terraform import TerraformHelper + + # Six parents walk from + # $SNAP/lib/python3.X/site-packages/sunbeam/storage/base.py + # back to $SNAP, which is where etc/deploy-storage lives in the + # packaged runtime. This path calculation intentionally mirrors + # the historical StorageBackendBase.register_terraform_plan so the + # snap layout continues to resolve correctly. + plan_source = ( + Path(__file__).parent.parent.parent.parent.parent.parent / "etc/deploy-storage" + ) + if not plan_source.exists(): + raise FileNotFoundError(f"Terraform plan not found at {plan_source}") + + dst = deployment.plans_directory / STORAGE_TFPLAN_DIR + shutil.copytree(plan_source, dst, dirs_exist_ok=True) + + env: dict[str, str] = {} + env.update(deployment._get_juju_clusterd_env()) + env.update(deployment.get_proxy_settings()) + + tfhelper = TerraformHelper( + path=dst, + plan=STORAGE_TFPLAN, + tfvar_map={}, + backend="http", + env=env, + clusterd_address=deployment.get_clusterd_http_address(), + ) + deployment._tfhelpers[STORAGE_TFPLAN] = tfhelper + return tfhelper + + +@dataclass(frozen=True) +class BackendIntegration: + """An additional juju integration a backend needs. + + Beyond the standard subordinate relation to cinder-volume. 
+ """ + + application_name: str + endpoint_name: str + backend_endpoint_name: str + + def to_dict(self) -> dict[str, str]: + """Return the integration as a dictionary.""" + return asdict(self) + + +@dataclass(frozen=True) +class HypervisorIntegration: + """An integration the hypervisor needs with a storage backend.""" + + application_name: str + endpoint_name: str + hypervisor_endpoint_name: str + + def to_dict(self) -> dict[str, str]: + """Return the integration as a dictionary.""" + return asdict(self) + def validate_juju_application_name(name: str) -> bool: """Validate that a name is a valid Juju application name. @@ -115,8 +196,8 @@ class StorageBackendBase(FeatureGateMixin, typing.Generic[BackendConfig]): def __init__(self) -> None: """Initialize storage backend.""" - self.tfplan = "storage-backend-plan" - self.tfplan_dir = "deploy-storage-backend" + self.tfplan = STORAGE_TFPLAN + self.tfplan_dir = STORAGE_TFPLAN_DIR self._manifest: Manifest | None = None def check_enabled(self, client: Client | None, snap: Snap) -> bool: @@ -238,7 +319,7 @@ def create_deploy_step( def create_destroy_step( self, deployment: Deployment, - client, + client: Client, tfhelper: TerraformHelper, jhelper: JujuHelper, manifest: Manifest, @@ -259,43 +340,7 @@ def create_destroy_step( def register_terraform_plan(self, deployment: Deployment) -> None: """Register storage backend Terraform plan with deployment system.""" - import shutil - - from sunbeam.core.terraform import TerraformHelper - - # Get the plan source path - backend_self_contained = ( - Path(__file__).parent.parent.parent.parent.parent.parent - / "etc/deploy-storage" # / "backends" / self.name / self.tfplan_dir - ) - - if backend_self_contained.exists(): - plan_source = backend_self_contained - else: - raise FileNotFoundError( - f"Terraform plan not found at {backend_self_contained}" - ) - - # Copy plan to deployment's plans directory - dst = deployment.plans_directory / self.tfplan_dir - shutil.copytree(plan_source, dst, 
dirs_exist_ok=True) - - # Create TerraformHelper - env = {} - env.update(deployment._get_juju_clusterd_env()) - env.update(deployment.get_proxy_settings()) - - tfhelper = TerraformHelper( - path=dst, - plan=self.tfplan, - tfvar_map={}, - backend="http", - env=env, - clusterd_address=deployment.get_clusterd_http_address(), - ) - - # Register the helper with the deployment's tfhelpers - deployment._tfhelpers[self.tfplan] = tfhelper + register_storage_terraform_plan(deployment) def add_backend_instance( self, @@ -630,6 +675,27 @@ def get_endpoint_bindings(self, deployment: Deployment) -> list[dict[str, str]]: }, ] + def get_extra_integrations(self, deployment: Deployment) -> set[BackendIntegration]: + """Additional juju integrations the backend needs. + + Beyond the standard subordinate relation to cinder-volume. + """ + return set() + + def get_application_name(self, backend_name: str) -> str: + """Return the Juju application name used for this backend.""" + return backend_name + + def get_units(self) -> int | None: + """Return the requested unit count for this backend application.""" + return 1 + + def get_hypervisor_integrations( + self, deployment: Deployment + ) -> set[HypervisorIntegration]: + """Integrations the hypervisor needs with this backend.""" + return set() + def build_terraform_vars( self, deployment: Deployment, @@ -637,7 +703,7 @@ def build_terraform_vars( backend_name: str, config: BackendConfig, ) -> dict[str, Any]: - """Generate Terraform variables for Pure Storage backend deployment.""" + """Generate Terraform variables for this storage backend deployment.""" # Map our configuration fields to the correct charm configuration option names config_dict = config.model_dump(exclude_none=True, by_alias=True) @@ -674,6 +740,8 @@ def build_terraform_vars( # Build Terraform variables to match the plan's expected format tfvars = { + "application_name": self.get_application_name(backend_name), + "units": self.get_units(), "principal_application": 
self.principal_application, "charm_name": self.charm_name, "charm_base": self.charm_base, @@ -684,6 +752,9 @@ def build_terraform_vars( "secrets": secret_fields, } + extra_integrations = self.get_extra_integrations(deployment) + tfvars["extra_integrations"] = [i.to_dict() for i in extra_integrations] + return tfvars # Common utility methods (Abstraction 2: IP/FQDN validation) diff --git a/sunbeam-python/sunbeam/storage/manager.py b/sunbeam-python/sunbeam/storage/manager.py index 9e632203d..098e69def 100644 --- a/sunbeam-python/sunbeam/storage/manager.py +++ b/sunbeam-python/sunbeam/storage/manager.py @@ -12,11 +12,12 @@ from rich.table import Table from snaphelpers import Snap +from sunbeam.clusterd.client import Client from sunbeam.core.deployment import Deployment from sunbeam.core.juju import JujuHelper from sunbeam.core.manifest import StorageInstanceManifest from sunbeam.errors import SunbeamException -from sunbeam.storage.base import StorageBackendBase +from sunbeam.storage.base import HypervisorIntegration, StorageBackendBase from sunbeam.storage.models import BackendNotFoundException, StorageBackendInfo from sunbeam.storage.service import StorageBackendService @@ -322,3 +323,44 @@ def list_principal_applications( principal_apps.append(model_app_tuple) return principal_apps + + def collect_hypervisor_integrations( + self, + deployment: Deployment, + client: Client, + ) -> set[HypervisorIntegration]: + """Collect hypervisor integrations from all registered backends. + + Queries clusterd for registered storage backend instances, then + for each registered backend type that exists in the manager's + loaded backends, calls get_hypervisor_integrations() and returns + the union of all results. + + Args: + deployment: The current deployment. + client: The clusterd client. + + Returns: + Set of HypervisorIntegration from all registered backends. 
+ """ + integrations: set[HypervisorIntegration] = set() + + registered = client.cluster.get_storage_backends() + + # Collect the unique backend types that have instances deployed + registered_types: set[str] = set() + for backend in registered.root: + registered_types.add(backend.type) + + for backend_type in registered_types: + backend_impl = self._backends.get(backend_type) + if backend_impl is None: + LOG.debug( + "Backend type %r registered in clusterd but not " + "loaded in manager, skipping", + backend_type, + ) + continue + integrations.update(backend_impl.get_hypervisor_integrations(deployment)) + + return integrations diff --git a/sunbeam-python/sunbeam/storage/steps.py b/sunbeam-python/sunbeam/storage/steps.py index 601971900..3b8117e0c 100644 --- a/sunbeam-python/sunbeam/storage/steps.py +++ b/sunbeam-python/sunbeam/storage/steps.py @@ -18,6 +18,7 @@ from sunbeam.clusterd.client import Client from sunbeam.clusterd.service import ( ConfigItemNotFoundException, + NodeNotExistInClusterException, StorageBackendNotFoundException, ) from sunbeam.core.common import ( @@ -32,6 +33,7 @@ ) from sunbeam.core.deployment import Deployment, Networks from sunbeam.core.juju import ( + ApplicationNotFoundException, ControllerNotFoundException, ControllerNotReachableException, JujuException, @@ -47,17 +49,12 @@ load_answers, write_answers, ) +from sunbeam.core.steps import RemoveMachineUnitsStep from sunbeam.core.terraform import ( TerraformException, TerraformHelper, TerraformStateLockedException, ) -from sunbeam.steps.cinder_volume import ( - APPLICATION, - CINDER_VOLUME_APP_TIMEOUT, - get_mandatory_control_plane_offers, - get_optional_control_plane_offers, -) from sunbeam.storage.models import SecretDictField from sunbeam.versions import CINDER_VOLUME_CHARM @@ -67,6 +64,38 @@ LOG = logging.getLogger(__name__) console = Console() +CINDER_VOLUME_APP_TIMEOUT = 1200 + + +def get_mandatory_control_plane_offers( + tfhelper: TerraformHelper, +) -> dict[str, str | None]: + 
"""Get mandatory control plane offers.""" + openstack_tf_output = tfhelper.output() + + tfvars = { + "keystone-offer-url": openstack_tf_output.get("keystone-offer-url"), + "database-offer-url": openstack_tf_output.get( + "cinder-volume-database-offer-url" + ), + "amqp-offer-url": openstack_tf_output.get("rabbitmq-offer-url"), + } + return tfvars + + +def get_optional_control_plane_offers( + tfhelper: TerraformHelper, +) -> dict[str, str | None]: + """Get optional control plane offers.""" + openstack_tf_output = tfhelper.output() + + tfvars = { + "cert-distributor-offer-url": openstack_tf_output.get( + "cert-distributor-offer-url" + ), + } + return tfvars + class ValidateStoragePrerequisitesStep(BaseStep): """Validate that Sunbeam is bootstrapped and storage role is deployed.""" @@ -165,25 +194,6 @@ def run(self, context: StepContext) -> Result: "before deploying storage backends.", ) - # 4. Check if cinder-volume application exists in OpenStack model - try: - cinder_volume_app = self.jhelper.get_application( - "cinder-volume", self.OPENSTACK_MACHINE_MODEL - ) - if not cinder_volume_app: - return Result( - ResultType.FAILED, - "cinder-volume application not found in OpenStack model. " - "Please deploy OpenStack storage services first.", - ) - except Exception as e: - LOG.debug(f"Failed to check cinder-volume application: {e}") - return Result( - ResultType.FAILED, - "Unable to verify cinder-volume application. 
" - "Please ensure OpenStack storage services are deployed.", - ) - return Result(ResultType.COMPLETED) except Exception as e: @@ -548,14 +558,14 @@ def run(self, context: StepContext) -> Result: LOG.warning(f"No configuration found for backend {self.backend_name}") tfvars = {} - backends = tfvars.get("backends", {}) + backends = tfvars.setdefault("backends", {}) # Drop backend from current configuration backends.pop(self.backend_name, None) # For removal: update config and apply atomically LOG.info(f"Performing removal for backend {self.backend_name}") - LOG.info(f"Remaining backends after removal: {list(tfvars['backends'].keys())}") + LOG.info(f"Remaining backends after removal: {list(backends.keys())}") # First update the configuration update_config( @@ -567,8 +577,7 @@ def run(self, context: StepContext) -> Result: try: LOG.info( - f"Writing Terraform variables with backends: " - f"{list(tfvars.get('backends', {}).keys())}" + f"Writing Terraform variables with backends: {list(backends.keys())}" ) self.tfhelper.update_tfvars_and_apply_tf( self.client, @@ -582,7 +591,6 @@ def run(self, context: StepContext) -> Result: LOG.debug("Error: Terraform state locked") raise e except TerraformException: - # Restore the backend configuration if apply fails LOG.debug("Terraform apply failed", exc_info=True) return Result( ResultType.FAILED, @@ -648,20 +656,17 @@ def __init__( def is_skip(self, context: StepContext) -> Result: """Determine if the step should be skipped. + Always proceed when storage nodes exist so that `run()` can + refresh `machine_ids` for scale-out. Previously this skipped + when a principal entry already existed, which prevented a + newly-joined storage node from getting a cinder-volume unit. + Returns: Result indicating whether to skip the step. 
""" nodes = self.client.cluster.list_nodes_by_role(Role.STORAGE.name.lower()) if not nodes: return Result(ResultType.FAILED, "No storage nodes found in the cluster.") - # For faster checks, skip if currently deployed backend - # supports main cinder-volume - if self.backend_instance.principal_application == APPLICATION: - return Result( - ResultType.SKIPPED, - f"Backend {self.backend_name} supports main cinder-volume;" - " skipping specific cinder-volume deployment.", - ) return Result(ResultType.COMPLETED) @@ -688,16 +693,10 @@ def run(self, context: StepContext) -> Result: tfvars = {} application_name = self.backend_instance.principal_application - machine_ids = ( - tfvars.get("cinder-volumes", {}) - .get(application_name, {}) - .get("machine_ids") - ) - if not machine_ids: - nodes = self.client.cluster.list_nodes_by_role(Role.STORAGE.name.lower()) - machine_ids = sorted((node["machineid"] for node in nodes), key=int) - if not self.backend_instance.supports_ha: - machine_ids = machine_ids[:1] + nodes = self.client.cluster.list_nodes_by_role(Role.STORAGE.name.lower()) + machine_ids = sorted((node["machineid"] for node in nodes), key=int) + if not self.backend_instance.supports_ha: + machine_ids = machine_ids[:1] if not tfvars.get("model"): tfvars["model"] = self.jhelper.get_model(self.model)["model-uuid"] @@ -832,24 +831,43 @@ def __init__( def is_skip(self, context: StepContext) -> Result: """Determine if the step should be skipped. + Skip when another backend still uses the same principal + application, or when there is nothing to destroy (the + principal entry does not exist in cinder-volumes tfvars). + Returns: Result indicating whether to skip the step. 
""" - if self.backend_instance.principal_application == APPLICATION: + try: + tfvars = read_config(self.client, self.backend_instance.tfvar_config_key) + except ConfigItemNotFoundException: return Result( ResultType.SKIPPED, - f"Backend {self.backend_name} does not use specific cinder-volume;" - " skipping specific cinder-volume destruction.", + "No storage configuration found; nothing to destroy.", ) - backends = self.client.cluster.get_storage_backends() - for backend in backends.root: - if self.backend_instance.principal_application == backend.principal: + + principal = self.backend_instance.principal_application + + # Check if any OTHER backend uses the same principal application + backends = tfvars.get("backends", {}) + for name, backend_vars in backends.items(): + if name == self.backend_name: + continue + if backend_vars.get("principal_application") == principal: return Result( ResultType.SKIPPED, - "Another backend is using the same cinder-volume instance;" - " skipping specific cinder-volume destruction.", + f"Another backend {name!r} still uses principal" + f" {principal!r}; skipping destruction.", ) + # No other backend needs the principal; check if it exists + if principal not in tfvars.get("cinder-volumes", {}): + return Result( + ResultType.SKIPPED, + f"Principal {principal!r} not found in cinder-volumes;" + " nothing to destroy.", + ) + return Result(ResultType.COMPLETED) def run(self, context: StepContext) -> Result: @@ -893,3 +911,178 @@ def run(self, context: StepContext) -> Result: def get_application_timeout(self) -> int: """Return application timeout in seconds.""" return CINDER_VOLUME_APP_TIMEOUT # 20 minutes, same as cinder-volume + + +STORAGE_BACKEND_TFVAR_CONFIG_KEY = "TerraformVarsStorageBackends" + + +class ReapplyStorageBackendTerraformPlanStep(BaseStep): + """Reapply the storage-backend Terraform plan. + + This step re-applies the storage-backend plan using the existing + Terraform variables stored in clusterd. 
It is used during upgrades + to pick up charm channel / revision changes without rebuilding the + full configuration from scratch. + """ + + def __init__( + self, + deployment: Deployment, + client: Client, + tfhelper: TerraformHelper, + jhelper: JujuHelper, + manifest: Manifest, + model: str, + ): + super().__init__( + "Reapply Storage Backend Terraform plan", + "Reapplying Storage Backend Terraform plan", + ) + self.deployment = deployment + self.client = client + self.tfhelper = tfhelper + self.jhelper = jhelper + self.manifest = manifest + self.model = model + + def is_skip(self, context: StepContext) -> Result: + """Skip when no storage backends are configured.""" + try: + tfvars = read_config(self.client, STORAGE_BACKEND_TFVAR_CONFIG_KEY) + except ConfigItemNotFoundException: + return Result(ResultType.SKIPPED, "No storage backends configured.") + + if not tfvars.get("backends") and not tfvars.get("cinder-volumes"): + return Result(ResultType.SKIPPED, "No storage backends configured.") + + return Result(ResultType.COMPLETED) + + @tenacity.retry( + wait=tenacity.wait_fixed(60), + stop=tenacity.stop_after_delay(300), + retry=tenacity.retry_if_exception_type(TerraformStateLockedException), + retry_error_callback=friendly_terraform_lock_retry_callback, + ) + def run(self, context: StepContext) -> Result: + """Reapply the storage backend Terraform plan.""" + try: + tfvars = read_config(self.client, STORAGE_BACKEND_TFVAR_CONFIG_KEY) + except ConfigItemNotFoundException: + LOG.debug("No storage backend config found, nothing to reapply.") + return Result(ResultType.COMPLETED) + + try: + self.tfhelper.update_tfvars_and_apply_tf( + self.client, + self.manifest, + tfvar_config=STORAGE_BACKEND_TFVAR_CONFIG_KEY, + override_tfvars=tfvars, + reporter=context.reporter, + ) + except TerraformStateLockedException: + raise + except TerraformException as e: + LOG.exception("Error reapplying storage backend plan") + return Result(ResultType.FAILED, str(e)) + + return 
Result(ResultType.COMPLETED) + + +CINDER_VOLUME_UNIT_TIMEOUT = 1800 # 30 minutes + + +class CheckStorageNodeRemovalStep(BaseStep): + """Check if a storage node can safely be removed. + + Prevents removing the last storage node when cinder-volume + is deployed, unless ``--force`` is specified. + """ + + def __init__( + self, + client: Client, + node_name: str, + jhelper: JujuHelper, + model: str, + force: bool = False, + ): + super().__init__( + "Check cinder-volume distribution", + "Checking if node hosts cinder-volume units", + ) + self.client = client + self.node = node_name + self.jhelper = jhelper + self.model = model + self.force = force + + def is_skip(self, context: StepContext) -> Result: + """Skip when the departing node is not a storage node.""" + try: + node_info = self.client.cluster.get_node_info(self.node) + except NodeNotExistInClusterException: + return Result( + ResultType.SKIPPED, + f"Node {self.node} is not found in the cluster", + ) + + if Role.STORAGE.name.lower() not in node_info.get("role", ""): + LOG.debug("Node %s is not a storage node", self.node) + return Result(ResultType.SKIPPED) + + # Check if cinder-volume application exists + try: + app = self.jhelper.get_application("cinder-volume", self.model) + except ApplicationNotFoundException: + LOG.debug("cinder-volume application not deployed") + return Result(ResultType.SKIPPED) + + # Check if this node hosts a cinder-volume unit + machine_id = str(node_info.get("machineid")) + for unit_name, unit in app.units.items(): + if unit.machine == machine_id: + LOG.debug("Unit %s is running on node %s", unit_name, self.node) + break + else: + LOG.debug("No cinder-volume units found on %s", self.node) + return Result(ResultType.SKIPPED) + + return Result(ResultType.COMPLETED) + + def run(self, context: StepContext) -> Result: + """Check whether removal would leave cinder-volume without nodes.""" + nb_storage_nodes = len(self.client.cluster.list_nodes_by_role("storage")) + if nb_storage_nodes <= 1 
and not self.force: + return Result( + ResultType.FAILED, + "Cannot remove the last storage node hosting cinder-volume." + " Use --force to override; volume capabilities will be lost.", + ) + + return Result(ResultType.COMPLETED) + + +class RemoveStorageMachineUnitsStep(RemoveMachineUnitsStep): + """Remove cinder-volume units from a departing storage node.""" + + def __init__( + self, + client: Client, + node_name: str, + jhelper: JujuHelper, + model: str, + ): + super().__init__( + client, + node_name, + jhelper, + STORAGE_BACKEND_TFVAR_CONFIG_KEY, + "cinder-volume", + model, + "Remove cinder-volume units", + "Removing cinder-volume units from departing node", + ) + + def get_unit_timeout(self) -> int: + """Return unit timeout in seconds.""" + return CINDER_VOLUME_UNIT_TIMEOUT diff --git a/sunbeam-python/sunbeam/versions.py b/sunbeam-python/sunbeam/versions.py index 425ddff0e..8882239d9 100644 --- a/sunbeam-python/sunbeam/versions.py +++ b/sunbeam-python/sunbeam/versions.py @@ -104,6 +104,7 @@ def determine_version() -> str: "k8s-plan": "deploy-k8s", "microceph-plan": "deploy-microceph", "microovn-plan": "deploy-microovn", + # Kept for upgrade migration (MigrateCinderVolumeToStorageFrameworkStep) "cinder-volume-plan": "deploy-cinder-volume", "openstack-plan": "deploy-openstack", "hypervisor-plan": "deploy-openstack-hypervisor", @@ -280,22 +281,6 @@ class VarMap(TypedDict, total=False): }, } } -DEPLOY_CINDER_VOLUME_TFVAR_MAP: VarMap = { - "charms": { - CINDER_VOLUME_CHARM: { - "channel": "charm_cinder_volume_channel", - "revision": "charm_cinder_volume_revision", - "config": "charm_cinder_volume_config", - }, - "cinder-volume-ceph": { - "channel": "charm_cinder_volume_ceph_channel", - "revision": "charm_cinder_volume_ceph_revision", - "config": "charm_cinder_volume_ceph_config", - }, - } -} - - MANIFEST_ATTRIBUTES_TFVAR_MAP: dict[str, VarMap] = { "sunbeam-machine-plan": DEPLOY_SUNBEAM_MACHINE_TFVAR_MAP, "k8s-plan": DEPLOY_K8S_TFVAR_MAP, @@ -303,5 +288,4 @@ class 
VarMap(TypedDict, total=False): "microovn-plan": DEPLOY_MICROOVN_TFVAR_MAP, "openstack-plan": DEPLOY_OPENSTACK_TFVAR_MAP, "hypervisor-plan": DEPLOY_OPENSTACK_HYPERVISOR_TFVAR_MAP, - "cinder-volume-plan": DEPLOY_CINDER_VOLUME_TFVAR_MAP, } diff --git a/sunbeam-python/tests/unit/sunbeam/core/test_terraform.py b/sunbeam-python/tests/unit/sunbeam/core/test_terraform.py index 4daf7d446..bb176dd6c 100644 --- a/sunbeam-python/tests/unit/sunbeam/core/test_terraform.py +++ b/sunbeam-python/tests/unit/sunbeam/core/test_terraform.py @@ -3,6 +3,7 @@ import functools import json +import subprocess from pathlib import Path from unittest.mock import MagicMock, Mock, patch @@ -83,6 +84,48 @@ def read_config(): class TestTerraformHelper: + def test_state_list_preserves_terraform_stderr(self, tmp_path): + """Terraform stderr is preserved in the raised exception.""" + tfhelper = TerraformHelper( + path=tmp_path, + plan="storage-backend-plan", + tfvar_map={}, + ) + + error = subprocess.CalledProcessError( + returncode=1, + cmd=["terraform", "state", "list"], + stderr="No state file was found!", + ) + + with patch("subprocess.run", side_effect=error): + with pytest.raises(TerraformException, match="No state file was found!"): + tfhelper.state_list() + + def test_import_resource_runs_non_interactively(self, tmp_path): + """Import should never prompt for input.""" + tfhelper = TerraformHelper( + path=tmp_path, + plan="storage-backend-plan", + tfvar_map={}, + ) + + with patch("subprocess.run") as mock_run: + tfhelper.import_resource("module.example.resource", "id-123") + + cmd = ( + mock_run.call_args.kwargs["args"] + if "args" in mock_run.call_args.kwargs + else mock_run.call_args.args[0] + ) + assert cmd == [ + tfhelper.terraform, + "import", + "-input=false", + "module.example.resource", + "id-123", + ] + def test_update_tfvars_and_apply_tf( self, mocker, diff --git a/sunbeam-python/tests/unit/sunbeam/features/ceph/test_ceph_feature.py 
b/sunbeam-python/tests/unit/sunbeam/features/ceph/test_ceph_feature.py new file mode 100644 index 000000000..7f07d30d1 --- /dev/null +++ b/sunbeam-python/tests/unit/sunbeam/features/ceph/test_ceph_feature.py @@ -0,0 +1,631 @@ +# SPDX-FileCopyrightText: 2025 - Canonical Ltd +# SPDX-License-Identifier: Apache-2.0 + +import json +from unittest.mock import ANY, Mock, call, patch + +import click +import pytest + +from sunbeam.clusterd.service import ConfigItemNotFoundException +from sunbeam.core.ceph import CephDeploymentMode, SetCephProviderStep +from sunbeam.core.terraform import TerraformInitStep +from sunbeam.feature_manager import FeatureManager +from sunbeam.features.ceph import feature as ceph_feature +from sunbeam.features.ceph.microceph import ( + ConfigureMicrocephOSDStep, + DeployMicrocephApplicationStep, + DestroyMicrocephApplicationStep, +) +from sunbeam.provider.maas.steps import MaasConfigureMicrocephOSDStep +from sunbeam.steps.openstack import DeployControlPlaneStep +from sunbeam.storage.steps import ( + BaseStorageBackendDeployStep, + BaseStorageBackendDestroyStep, + DeploySpecificCinderVolumeStep, + DestroySpecificCinderVolumeStep, +) + + +class TestCephFeature: + def test_feature_is_discovered(self): + manager = FeatureManager() + + assert "ceph" in manager.features() + assert isinstance(manager.features()["ceph"], ceph_feature.CephFeature) + + @patch.object(ceph_feature, "run_plan") + @patch.object(ceph_feature, "JujuHelper") + @patch.object(ceph_feature, "click", Mock()) + @patch.object(ceph_feature, "update_config") + def test_run_enable_plans(self, mock_update_config, _mock_jhelper, mock_run_plan): + deployment = Mock() + deployment.openstack_machines_model = "openstack" + deployment.get_manifest.return_value = Mock() + deployment.get_client.return_value = Mock() + deployment.get_client.return_value.cluster.list_nodes_by_role.return_value = [ + {"machineid": "0"}, + {"machineid": "1"}, + {"machineid": "2"}, + ] + + feature = 
ceph_feature.CephFeature() + with patch.object( + feature, + "_get_internal_ceph_backend", + ) as mock_get_backend: + mock_backend = Mock() + mock_backend.tfplan = "storage-backend-plan" + mock_backend.config_key.return_value = "Storage-internal-ceph" + mock_backend.create_deploy_step.return_value = Mock( + spec=BaseStorageBackendDeployStep + ) + mock_get_backend.return_value = mock_backend + + feature.run_enable_plans(deployment, Mock(), False) + + steps = mock_run_plan.call_args.args[0] + + # First 3 steps: microceph deployment + assert isinstance(steps[0], SetCephProviderStep) + assert steps[0].wanted_mode == CephDeploymentMode.MICROCEPH + assert isinstance(steps[1], TerraformInitStep) + assert isinstance(steps[2], DeployMicrocephApplicationStep) + + # Next 5 steps: internal-ceph backend registration + assert isinstance(steps[3], TerraformInitStep) # storage backend tf init + assert isinstance(steps[4], TerraformInitStep) # openstack tf init + assert isinstance(steps[5], DeploySpecificCinderVolumeStep) + assert isinstance(steps[6], BaseStorageBackendDeployStep) + assert isinstance(steps[7], DeployControlPlaneStep) + + assert len(steps) == 8 + + @patch.object(ceph_feature, "run_plan") + @patch.object(ceph_feature, "JujuHelper") + @patch.object(ceph_feature, "click", Mock()) + @patch.object(ceph_feature, "update_config") + def test_run_enable_plans_stores_config( + self, mock_update_config, _mock_jhelper, mock_run_plan + ): + """Verify that enable stores the InternalCephConfig in clusterd.""" + deployment = Mock() + deployment.openstack_machines_model = "openstack" + deployment.get_manifest.return_value = Mock() + client = Mock() + client.cluster.list_nodes_by_role.return_value = [ + {"machineid": "0"}, + {"machineid": "1"}, + ] + deployment.get_client.return_value = client + + feature = ceph_feature.CephFeature() + with patch.object( + feature, + "_get_internal_ceph_backend", + ) as mock_get_backend: + mock_backend = Mock() + mock_backend.tfplan = 
"storage-backend-plan" + mock_backend.config_key.return_value = "Storage-internal-ceph" + mock_backend.create_deploy_step.return_value = Mock( + spec=BaseStorageBackendDeployStep + ) + mock_get_backend.return_value = mock_backend + + feature.run_enable_plans(deployment, Mock(), False) + + # ceph_replica_scale(2) == 2, stored with kebab-case alias + mock_update_config.assert_called_once_with( + client, + "Storage-internal-ceph", + {"ceph-osd-replication-count": 2}, + ) + + @patch.object(ceph_feature, "run_plan") + @patch.object(ceph_feature, "JujuHelper") + @patch.object(ceph_feature, "click", Mock()) + def test_run_disable_plans(self, _mock_jhelper, mock_run_plan): + deployment = Mock() + deployment.openstack_machines_model = "openstack" + deployment.get_manifest.return_value = Mock() + client = Mock() + deployment.get_client.return_value = client + + feature = ceph_feature.CephFeature() + with ( + patch.object( + feature, + "_get_internal_ceph_backend", + ) as mock_get_backend, + patch.object(feature, "update_feature_info") as mock_update_info, + ): + mock_backend = Mock() + mock_backend.tfplan = "storage-backend-plan" + mock_backend.create_destroy_step.return_value = Mock( + spec=BaseStorageBackendDestroyStep + ) + mock_get_backend.return_value = mock_backend + + feature.run_disable_plans(deployment, False) + + # Phase 1: destroy backend (mode still MICROCEPH) + assert mock_run_plan.call_count == 4 + destroy_steps = mock_run_plan.call_args_list[0].args[0] + assert isinstance(destroy_steps[0], TerraformInitStep) + assert isinstance(destroy_steps[1], BaseStorageBackendDestroyStep) + assert isinstance(destroy_steps[2], DestroySpecificCinderVolumeStep) + + # Phase 2a: set mode to NONE + mode_steps = mock_run_plan.call_args_list[1].args[0] + assert len(mode_steps) == 1 + assert isinstance(mode_steps[0], SetCephProviderStep) + assert mode_steps[0].wanted_mode == CephDeploymentMode.NONE + + # Phase 2b: reapply control plane (now sees NoCephProvider) + cp_steps = 
mock_run_plan.call_args_list[2].args[0] + assert isinstance(cp_steps[0], TerraformInitStep) + assert isinstance(cp_steps[1], DeployControlPlaneStep) + + # Phase 3: destroy MicroCeph + mc_steps = mock_run_plan.call_args_list[3].args[0] + assert isinstance(mc_steps[0], TerraformInitStep) + assert isinstance(mc_steps[1], DestroyMicrocephApplicationStep) + + # The ceph_disabling marker is set before phase 1 and cleared + # after phase 3. + assert mock_update_info.call_args_list[0] == call( + client, {"ceph_disabling": "true"} + ) + assert mock_update_info.call_args_list[-1] == call( + client, {"ceph_disabling": "false"} + ) + + @patch.object(ceph_feature, "run_plan") + @patch.object(ceph_feature, "JujuHelper") + @patch.object(ceph_feature, "click", Mock()) + def test_run_disable_plans_does_not_clear_marker_on_failure( + self, _mock_jhelper, mock_run_plan + ): + """Marker survives across retries when a phase raises. + + If a phase raises, the ceph_disabling marker stays set so the + next retry still short-circuits is_internal_ceph_enabled_feature_aware + and the retry re-enters each idempotent phase safely. + """ + deployment = Mock() + deployment.openstack_machines_model = "openstack" + deployment.get_manifest.return_value = Mock() + client = Mock() + deployment.get_client.return_value = client + + feature = ceph_feature.CephFeature() + + # Raise on the last phase (phase 3 destroy microceph). 
+ def fake_run_plan(plan, *_args, **_kwargs): + if mock_run_plan.call_count >= 4: + raise RuntimeError("phase 3 failure") + + mock_run_plan.side_effect = fake_run_plan + + with ( + patch.object( + feature, + "_get_internal_ceph_backend", + ) as mock_get_backend, + patch.object(feature, "update_feature_info") as mock_update_info, + ): + mock_backend = Mock() + mock_backend.tfplan = "storage-backend-plan" + mock_backend.create_destroy_step.return_value = Mock( + spec=BaseStorageBackendDestroyStep + ) + mock_get_backend.return_value = mock_backend + + with pytest.raises(RuntimeError): + feature.run_disable_plans(deployment, False) + + # Marker was set at the top. + assert mock_update_info.call_args_list[0] == call( + client, {"ceph_disabling": "true"} + ) + # Marker is NOT cleared because phase 3 raised. + for call_args in mock_update_info.call_args_list: + assert call_args != call(client, {"ceph_disabling": "false"}) + + @patch.object(ceph_feature.CephFeature, "run_enable_plans") + def test_enable_default_storage_skips_when_mode_does_not_require_internal_ceph( + self, mock_run_enable_plans + ): + deployment = Mock() + client = Mock() + client.cluster.get_config.return_value = json.dumps( + {"mode": CephDeploymentMode.NONE} + ) + deployment.get_client.return_value = client + + feature = ceph_feature.CephFeature() + with patch.object(feature, "get_feature_info", return_value={}): + with patch.object(feature, "update_feature_info") as mock_update: + feature.enable_default_storage(deployment, False) + + mock_run_enable_plans.assert_not_called() + mock_update.assert_not_called() + + @patch.object(ceph_feature.CephFeature, "run_enable_plans") + def test_enable_default_storage_reconciles_when_mode_requires_internal_ceph( + self, mock_run_enable_plans + ): + deployment = Mock() + client = Mock() + client.cluster.get_config.return_value = json.dumps( + {"mode": CephDeploymentMode.MICROCEPH} + ) + deployment.get_client.return_value = client + + feature = 
ceph_feature.CephFeature() + with patch.object(feature, "get_feature_info", return_value={}): + with patch.object(feature, "update_feature_info") as mock_update: + feature.enable_default_storage(deployment, False) + + mock_run_enable_plans.assert_called_once_with( + deployment, ANY, False, provider_kwargs={} + ) + mock_update.assert_called_once_with( + client, + { + "enabled": "true", + ceph_feature.DEFAULT_STORAGE_RECONCILED_KEY: "true", + }, + ) + + @patch.object(ceph_feature.CephFeature, "run_enable_plans") + def test_enable_default_storage_reconciles_optimistic_enabled_state( + self, mock_run_enable_plans + ): + deployment = Mock() + client = Mock() + client.cluster.get_config.return_value = json.dumps( + {"mode": CephDeploymentMode.MICROCEPH} + ) + deployment.get_client.return_value = client + + feature = ceph_feature.CephFeature() + with patch.object( + feature, + "get_feature_info", + return_value={"enabled": "true"}, + ): + with patch.object(feature, "update_feature_info") as mock_update: + feature.enable_default_storage(deployment, False) + + mock_run_enable_plans.assert_called_once_with( + deployment, ANY, False, provider_kwargs={} + ) + mock_update.assert_called_once_with( + client, + { + "enabled": "true", + ceph_feature.DEFAULT_STORAGE_RECONCILED_KEY: "true", + }, + ) + + @patch.object(ceph_feature.CephFeature, "run_enable_plans") + def test_enable_default_storage_skips_when_fully_reconciled( + self, mock_run_enable_plans + ): + deployment = Mock() + client = Mock() + client.cluster.get_config.return_value = json.dumps( + {"mode": CephDeploymentMode.MICROCEPH} + ) + deployment.get_client.return_value = client + + feature = ceph_feature.CephFeature() + with patch.object( + feature, + "get_feature_info", + return_value={ + "enabled": "true", + ceph_feature.DEFAULT_STORAGE_RECONCILED_KEY: "true", + }, + ): + with patch.object(feature, "update_feature_info") as mock_update: + feature.enable_default_storage(deployment, False) + + 
mock_run_enable_plans.assert_not_called() + mock_update.assert_not_called() + + @patch.object(ceph_feature.CephFeature, "run_enable_plans") + def test_enable_feature_marks_default_storage_reconciled( + self, mock_run_enable_plans + ): + deployment = Mock() + client = Mock() + client.cluster.get_config.return_value = json.dumps( + {"mode": CephDeploymentMode.MICROCEPH} + ) + deployment.get_client.return_value = client + + feature = ceph_feature.CephFeature() + feature_state = {} + click_context = Mock() + click_context.parent = None + + def fake_update_feature_info(_client, info): + feature_state.update(info) + + with patch.object(feature, "pre_enable"): + with patch.object( + feature, + "get_feature_info", + side_effect=lambda _client: feature_state.copy(), + ): + with patch.object( + feature, + "update_feature_info", + side_effect=fake_update_feature_info, + ): + with patch( + "sunbeam.features.interface.v1.base.click.get_current_context", + return_value=click_context, + ): + feature.enable_feature(deployment, Mock(), False) + feature.enable_default_storage(deployment, False) + + assert feature_state == { + "enabled": "true", + ceph_feature.DEFAULT_STORAGE_RECONCILED_KEY: "true", + } + assert mock_run_enable_plans.call_count == 1 + + @patch.object(ceph_feature, "run_plan") + @patch.object(ceph_feature, "JujuHelper") + @patch.object(ceph_feature, "click", Mock()) + @patch.object(ceph_feature, "update_config") + def test_enable_default_storage_local_includes_local_disk_step( + self, _mock_update_config, _mock_jhelper, mock_run_plan + ): + deployment = Mock() + deployment.type = "local" + deployment.openstack_machines_model = "openstack" + deployment.get_manifest.return_value = Mock() + client = Mock() + client.cluster.list_nodes_by_role.return_value = [{"machineid": "0"}] + client.cluster.get_config.side_effect = ConfigItemNotFoundException("missing") + deployment.get_client.return_value = client + + feature = ceph_feature.CephFeature() + with patch.object(feature, 
"_get_internal_ceph_backend") as mock_get_backend: + mock_backend = Mock() + mock_backend.tfplan = "storage-backend-plan" + mock_backend.config_key.return_value = "Storage-internal-ceph" + mock_backend.create_deploy_step.return_value = Mock( + spec=BaseStorageBackendDeployStep + ) + mock_get_backend.return_value = mock_backend + + feature.enable_default_storage( + deployment, + False, + node_name="node-1", + accept_defaults=True, + ) + + steps = mock_run_plan.call_args.args[0] + assert any(isinstance(step, ConfigureMicrocephOSDStep) for step in steps) + + @patch.object(ceph_feature, "run_plan") + @patch.object(ceph_feature, "JujuHelper") + @patch.object(ceph_feature, "click", Mock()) + @patch.object(ceph_feature, "update_config") + def test_enable_default_storage_maas_includes_maas_disk_step( + self, _mock_update_config, _mock_jhelper, mock_run_plan + ): + deployment = Mock() + deployment.type = "maas" + deployment.openstack_machines_model = "openstack" + deployment.get_manifest.return_value = Mock() + client = Mock() + client.cluster.list_nodes_by_role.return_value = [{"machineid": "0"}] + client.cluster.get_config.side_effect = ConfigItemNotFoundException("missing") + deployment.get_client.return_value = client + + feature = ceph_feature.CephFeature() + with patch.object(feature, "_get_internal_ceph_backend") as mock_get_backend: + mock_backend = Mock() + mock_backend.tfplan = "storage-backend-plan" + mock_backend.config_key.return_value = "Storage-internal-ceph" + mock_backend.create_deploy_step.return_value = Mock( + spec=BaseStorageBackendDeployStep + ) + mock_get_backend.return_value = mock_backend + + feature.enable_default_storage( + deployment, + False, + maas_client=Mock(), + storage=["node-1"], + ) + + steps = mock_run_plan.call_args.args[0] + assert any(isinstance(step, MaasConfigureMicrocephOSDStep) for step in steps) + + +class TestCephFeatureGetBootstrapDeploySteps: + """Tests for CephFeature.get_bootstrap_deploy_steps. 
+ + These steps are inlined into provider bootstrap / join plans to + guarantee that the microceph offer exists before + DeployControlPlaneStep reads data.juju_offer.microceph. + """ + + def _make_deployment(self): + deployment = Mock() + deployment.openstack_machines_model = "openstack" + deployment.get_manifest.return_value = Mock() + deployment.get_client.return_value = Mock() + return deployment + + @patch.object(ceph_feature, "JujuHelper") + def test_returns_empty_when_disabled(self, _mock_jhelper): + feature = ceph_feature.CephFeature() + steps = feature.get_bootstrap_deploy_steps( + self._make_deployment(), + enabled=False, + expect_storage_node=True, + node_name="node-1", + ) + assert steps == [] + + @patch.object(ceph_feature, "JujuHelper") + def test_returns_empty_when_no_storage_expected(self, _mock_jhelper): + """Must return [] when no storage node is expected. + + Mirrors MicrocephProvider.get_control_plane_tfvars returning + enable-ceph=False when storage_node_count == 0. + """ + feature = ceph_feature.CephFeature() + steps = feature.get_bootstrap_deploy_steps( + self._make_deployment(), + enabled=True, + expect_storage_node=False, + ) + assert steps == [] + + @patch.object(ceph_feature, "JujuHelper") + def test_includes_terraform_init_and_microceph_deploy(self, _mock_jhelper): + feature = ceph_feature.CephFeature() + steps = feature.get_bootstrap_deploy_steps( + self._make_deployment(), + enabled=True, + expect_storage_node=True, + ) + assert len(steps) == 2 + assert isinstance(steps[0], TerraformInitStep) + assert isinstance(steps[1], DeployMicrocephApplicationStep) + + @patch.object(ceph_feature, "JujuHelper") + def test_appends_osd_step_when_node_name_given(self, _mock_jhelper): + feature = ceph_feature.CephFeature() + steps = feature.get_bootstrap_deploy_steps( + self._make_deployment(), + enabled=True, + expect_storage_node=True, + node_name="node-1", + ) + assert len(steps) == 3 + assert isinstance(steps[2], ConfigureMicrocephOSDStep) + + 
+class TestCephFeatureDisableForceFlag: + """Tests for the --force flag on disable_cmd.""" + + def _make_click_context(self, deployment): + """Create a click context with the deployment as obj.""" + ctx = click.Context(click.Command("test"), obj=deployment) + return ctx + + def test_disable_without_force_raises_error(self): + """disable_cmd without --force should raise ClickException.""" + feature = ceph_feature.CephFeature() + deployment = Mock() + + with self._make_click_context(deployment): + with pytest.raises(click.ClickException, match="data loss"): + # pass_method_obj injects deployment from click context + feature.disable_cmd.callback(feature, force=False, show_hints=False) + + def test_disable_with_force_proceeds(self): + """disable_cmd with --force should call disable_feature.""" + feature = ceph_feature.CephFeature() + deployment = Mock() + + with self._make_click_context(deployment): + with patch.object(feature, "disable_feature") as mock_disable: + # pass_method_obj injects deployment from click context + feature.disable_cmd.callback(feature, force=True, show_hints=False) + mock_disable.assert_called_once_with(deployment, False) + + def test_disable_cmd_has_force_option(self): + """disable_cmd should have a --force flag.""" + feature = ceph_feature.CephFeature() + cmd = feature.disable_cmd + param_names = [p.name for p in cmd.params] + assert "force" in param_names + + +class TestCephFeatureOnJoin: + """Tests for the on_join hook.""" + + @patch.object(ceph_feature, "run_plan") + @patch.object(ceph_feature, "JujuHelper") + @patch.object(ceph_feature, "update_config") + def test_on_join_reconciles_storage_when_already_enabled( + self, _mock_update_config, _mock_jhelper, mock_run_plan + ): + """on_join reapplies storage backend when feature is already reconciled.""" + deployment = Mock() + deployment.openstack_machines_model = "openstack" + deployment.get_manifest.return_value = Mock() + client = Mock() + deployment.get_client.return_value = client + 
client.cluster.list_nodes_by_role.return_value = [ + {"machineid": "0"}, + {"machineid": "1"}, + ] + + feature = ceph_feature.CephFeature() + # Mark feature as already reconciled + feature.get_feature_info = Mock( + return_value={"enabled": "true", "default_storage_reconciled": "true"} + ) + + with patch.object(feature, "_get_internal_ceph_backend") as mock_get_backend: + mock_backend = Mock() + mock_backend.tfplan = "storage-backend-plan" + mock_backend.config_key.return_value = "Storage-internal-ceph" + mock_backend.create_deploy_step.return_value = Mock( + spec=BaseStorageBackendDeployStep + ) + mock_get_backend.return_value = mock_backend + + feature.on_join( + deployment, + {"name": "node-2", "role": ["storage"]}, + roles=["storage"], + ) + + # Should be called twice: once for MicroCeph, once for storage reconciliation + assert mock_run_plan.call_count == 2 + + # Second call should include storage backend steps + reconcile_steps = mock_run_plan.call_args_list[1].args[0] + assert any( + isinstance(s, DeploySpecificCinderVolumeStep) for s in reconcile_steps + ) + assert any(isinstance(s, DeployControlPlaneStep) for s in reconcile_steps) + + @patch.object(ceph_feature, "run_plan") + @patch.object(ceph_feature, "JujuHelper") + def test_on_join_skips_reconciliation_when_not_yet_enabled( + self, _mock_jhelper, mock_run_plan + ): + """on_join does not reconcile storage when feature not yet reconciled.""" + deployment = Mock() + deployment.openstack_machines_model = "openstack" + deployment.get_manifest.return_value = Mock() + client = Mock() + deployment.get_client.return_value = client + + feature = ceph_feature.CephFeature() + # Feature not yet reconciled + feature.get_feature_info = Mock(return_value={}) + + feature.on_join( + deployment, + {"name": "node-1", "role": ["storage"]}, + roles=["storage"], + ) + + # Only the MicroCeph plan should run + assert mock_run_plan.call_count == 1 diff --git 
a/sunbeam-python/tests/unit/sunbeam/features/maintenance/test_commands.py b/sunbeam-python/tests/unit/sunbeam/features/maintenance/test_commands.py index f8d9f959d..363d3c519 100644 --- a/sunbeam-python/tests/unit/sunbeam/features/maintenance/test_commands.py +++ b/sunbeam-python/tests/unit/sunbeam/features/maintenance/test_commands.py @@ -109,6 +109,10 @@ def test_dry_run_creates_watcher_step_with_correct_migration_params( patch("sunbeam.features.maintenance.commands.run_plan") as mock_run_plan, patch("sunbeam.features.maintenance.commands.JujuHelper"), patch("sunbeam.features.maintenance.commands.OperationViewer"), + patch( + "sunbeam.features.maintenance.commands.is_internal_ceph_enabled_feature_aware", + return_value=False, + ), ): # Set up mock class name for get_step_message mock_create_watcher_step.__name__ = "CreateWatcherHostMaintenanceAuditStep" diff --git a/sunbeam-python/tests/unit/sunbeam/features/test_base.py b/sunbeam-python/tests/unit/sunbeam/features/test_base.py index c31b6d9a6..29552554a 100644 --- a/sunbeam-python/tests/unit/sunbeam/features/test_base.py +++ b/sunbeam-python/tests/unit/sunbeam/features/test_base.py @@ -1,6 +1,7 @@ # SPDX-FileCopyrightText: 2023 - Canonical Ltd # SPDX-License-Identifier: Apache-2.0 +import logging from unittest.mock import Mock, patch import click @@ -311,6 +312,14 @@ def run_disable_plans(self, deployment) -> None: class TestEnableDisableFeature: + def test_on_join_default_hook(self, deployment): + feature = DummyFeature() + assert feature.on_join(deployment, "node-1") is None + + def test_on_depart_default_hook(self, deployment): + feature = DummyFeature() + assert feature.on_depart(deployment, "node-1") is None + def test_check_enabled_feature_is_compatible_with_compatible_requirement( self, deployment, mocker ): @@ -609,3 +618,75 @@ def test_is_feature_enabled_when_feature_is_not_enable_disable_type( match="Feature test_feature is not of type EnableDisable", ): manager.is_feature_enabled(deployment, 
"test_feature") + + def test_call_enabled_features_on_join(self, deployment): + manager = FeatureManager() + enabled_feature = Mock(spec=EnableDisableFeature, name="enabled_feature") + enabled_feature.name = "enabled_feature" + enabled_feature.is_enabled.return_value = True + disabled_feature = Mock(spec=EnableDisableFeature, name="disabled_feature") + disabled_feature.name = "disabled_feature" + disabled_feature.is_enabled.return_value = False + base_feature = Mock(spec=BaseFeature, name="base_feature") + + with patch.object( + manager, + "features", + return_value={ + "enabled_feature": enabled_feature, + "disabled_feature": disabled_feature, + "base_feature": base_feature, + }, + ): + manager.call_enabled_features_on_join( + deployment, "node-1", token="token-value" + ) + + enabled_feature.on_join.assert_called_once_with( + deployment, "node-1", token="token-value" + ) + disabled_feature.on_join.assert_not_called() + + def test_call_enabled_features_on_depart_propagates_hook_error(self, deployment): + manager = FeatureManager() + failing_feature = Mock(spec=EnableDisableFeature, name="failing_feature") + failing_feature.name = "failing_feature" + failing_feature.is_enabled.return_value = True + failing_feature.on_depart.side_effect = RuntimeError("hook failed") + + with patch.object( + manager, + "features", + return_value={"failing_feature": failing_feature}, + ): + with pytest.raises(RuntimeError, match="hook failed"): + manager.call_enabled_features_on_depart(deployment, "node-1") + + def test_call_enabled_features_on_depart_logs_and_continues_on_enable_error( + self, deployment, caplog + ): + manager = FeatureManager() + failing_feature = Mock(spec=EnableDisableFeature, name="failing_feature") + failing_feature.name = "failing_feature" + failing_feature.is_enabled.side_effect = RuntimeError("enablement failed") + succeeding_feature = Mock(spec=EnableDisableFeature, name="succeeding_feature") + succeeding_feature.name = "succeeding_feature" + 
succeeding_feature.is_enabled.return_value = True + + with patch.object( + manager, + "features", + return_value={ + "failing_feature": failing_feature, + "succeeding_feature": succeeding_feature, + }, + ): + with caplog.at_level(logging.DEBUG): + manager.call_enabled_features_on_depart(deployment, "node-1") + + failing_feature.on_depart.assert_not_called() + succeeding_feature.on_depart.assert_called_once_with(deployment, "node-1") + assert ( + "Failed to check if feature 'failing_feature' is enabled for hook " + "'on_depart'" + ) in caplog.text diff --git a/sunbeam-python/tests/unit/sunbeam/features/test_telemetry.py b/sunbeam-python/tests/unit/sunbeam/features/test_telemetry.py index acbdd3722..fdfbd6fda 100644 --- a/sunbeam-python/tests/unit/sunbeam/features/test_telemetry.py +++ b/sunbeam-python/tests/unit/sunbeam/features/test_telemetry.py @@ -1,10 +1,13 @@ # SPDX-FileCopyrightText: 2025 - Canonical Ltd # SPDX-License-Identifier: Apache-2.0 +import json from unittest.mock import Mock, patch import pytest +from sunbeam.clusterd.service import ConfigItemNotFoundException +from sunbeam.core.common import ResultType from sunbeam.features.telemetry import feature as telemetry_feature @@ -16,308 +19,307 @@ def deployment(): client = deploy.get_client.return_value client.cluster.list_nodes_by_role.return_value = [{"name": "node1", "machineid": 1}] + # Return empty config for metrics backend (no S3 offer configured) + client.cluster.get_config.return_value = json.dumps({}) return deploy -@pytest.fixture() -def mock_storage_backends(): - """Mock storage backends with different principal applications.""" - backend1 = Mock() - backend1.name = "backend1" - backend1.type = "type1" - backend1.principal = "cinder-volume-noha" - - backend2 = Mock() - backend2.name = "backend2" - backend2.type = "type2" - backend2.principal = "cinder-volume-noha" # Same principal as backend1 - - backend3 = Mock() - backend3.name = "backend3" - backend3.type = "type3" - backend3.principal = 
"cinder-volume" # Different principal +class TestUpdateCinderVolumeTelemetryTfvarsStep: + """Test the UpdateCinderVolumeTelemetryTfvarsStep.""" - return [backend1, backend2, backend3] - - -@pytest.fixture() -def mock_backend_instances(): - """Mock backend instances from StorageBackendManager.""" - instance1 = Mock() - instance1.principal_application = "cinder-volume-noha" - - instance2 = Mock() - instance2.principal_application = "cinder-volume-noha" + @patch("sunbeam.features.telemetry.feature.read_config") + @patch("sunbeam.features.telemetry.feature.update_config") + def test_run_enables_telemetry_on_all_cinder_volumes( + self, + mock_update_config, + mock_read_config, + step_context, + ): + """Enabling telemetry should set flag=True on every cinder-volume entry.""" + client = Mock() + mock_read_config.return_value = { + "backends": {"backend-a": {}}, + "cinder-volumes": { + "cinder-volume": {"application_name": "cinder-volume"}, + "cinder-volume-noha": {"application_name": "cinder-volume-noha"}, + }, + } + + step = telemetry_feature.UpdateCinderVolumeTelemetryTfvarsStep( + client, enable=True + ) + result = step.run(step_context) + + assert result.result_type == ResultType.COMPLETED + written = mock_update_config.call_args[0][2] + for entry in written["cinder-volumes"].values(): + assert entry["enable-telemetry-notifications"] is True + + @patch("sunbeam.features.telemetry.feature.read_config") + @patch("sunbeam.features.telemetry.feature.update_config") + def test_run_disables_telemetry_on_all_cinder_volumes( + self, + mock_update_config, + mock_read_config, + step_context, + ): + """Disabling telemetry should set flag=False on every cinder-volume entry.""" + client = Mock() + mock_read_config.return_value = { + "backends": {"backend-a": {}}, + "cinder-volumes": { + "cinder-volume": { + "application_name": "cinder-volume", + "enable-telemetry-notifications": True, + }, + }, + } + + step = telemetry_feature.UpdateCinderVolumeTelemetryTfvarsStep( + client, 
enable=False + ) + result = step.run(step_context) + + assert result.result_type == ResultType.COMPLETED + written = mock_update_config.call_args[0][2] + assert ( + written["cinder-volumes"]["cinder-volume"]["enable-telemetry-notifications"] + is False + ) + + @patch("sunbeam.features.telemetry.feature.read_config") + def test_is_skip_when_no_config(self, mock_read_config, step_context): + """Step should skip when no storage backend config exists.""" + client = Mock() + mock_read_config.side_effect = ConfigItemNotFoundException("not found") + + step = telemetry_feature.UpdateCinderVolumeTelemetryTfvarsStep( + client, enable=True + ) + result = step.is_skip(step_context) + assert result.result_type == ResultType.SKIPPED + + @patch("sunbeam.features.telemetry.feature.read_config") + def test_is_skip_when_no_cinder_volumes(self, mock_read_config, step_context): + """Step should skip when cinder-volumes is empty.""" + client = Mock() + mock_read_config.return_value = { + "backends": {"backend-a": {}}, + "cinder-volumes": {}, + } + + step = telemetry_feature.UpdateCinderVolumeTelemetryTfvarsStep( + client, enable=True + ) + result = step.is_skip(step_context) + assert result.result_type == ResultType.SKIPPED + + @patch("sunbeam.features.telemetry.feature.read_config") + def test_is_skip_returns_completed_when_entries_exist( + self, mock_read_config, step_context + ): + """Step should not skip when cinder-volume entries exist.""" + client = Mock() + mock_read_config.return_value = { + "cinder-volumes": {"cinder-volume": {"application_name": "cinder-volume"}}, + } + + step = telemetry_feature.UpdateCinderVolumeTelemetryTfvarsStep( + client, enable=True + ) + result = step.is_skip(step_context) + assert result.result_type == ResultType.COMPLETED + + @patch("sunbeam.features.telemetry.feature.read_config") + @patch("sunbeam.features.telemetry.feature.update_config") + def test_run_completes_when_no_cinder_volumes( + self, mock_update_config, mock_read_config, step_context + 
): + """Run should complete gracefully when no cinder-volume entries.""" + client = Mock() + mock_read_config.return_value = {"backends": {}, "cinder-volumes": {}} - instance3 = Mock() - instance3.principal_application = "cinder-volume" + step = telemetry_feature.UpdateCinderVolumeTelemetryTfvarsStep( + client, enable=True + ) + result = step.run(step_context) - return { - "type1": instance1, - "type2": instance2, - "type3": instance3, - } + assert result.result_type == ResultType.COMPLETED + mock_update_config.assert_not_called() -class TestTelemetryFeatureDeduplication: - """Test deduplication logic in telemetry feature enable/disable plans.""" +class TestTelemetryFeatureEnableDisablePlans: + """Test that enable/disable plans use the correct storage backend plan name.""" + @patch("sunbeam.features.telemetry.feature.register_storage_terraform_plan") @patch("sunbeam.features.telemetry.feature.JujuHelper") - @patch("sunbeam.features.telemetry.feature.StorageBackendManager") - @patch("sunbeam.features.telemetry.feature.DeploySpecificCinderVolumeStep") @patch("sunbeam.features.telemetry.feature.run_plan") - def test_run_enable_plans_deduplicates_shared_principals( + def test_run_enable_plans_uses_storage_backend_plan( self, mock_run_plan, - mock_deploy_step_class, - mock_storage_manager_class, mock_jhelper_class, + mock_register_storage, deployment, - mock_storage_backends, - mock_backend_instances, ): - """Test that enable plans deduplicates backends sharing the same principal.""" - # Setup mocks - client = deployment.get_client.return_value - storage_backends_root = Mock() - storage_backends_root.root = mock_storage_backends - client.cluster.get_storage_backends.return_value = storage_backends_root - - # Mock StorageBackendManager - mock_storage_manager = mock_storage_manager_class.return_value - mock_storage_manager.backends.return_value = mock_backend_instances - - # Mock tfhelpers + """Enable plans should use storage-backend-plan, not storage-plan.""" tfhelper = 
Mock() tfhelper_openstack = Mock() tfhelper_openstack.output.return_value = {"ceilometer-offer-url": "url"} tfhelper_hypervisor = Mock() - tfhelper_cinder_volume = Mock() tfhelper_storage = Mock() deployment.get_tfhelper.side_effect = lambda plan: { "telemetry-plan": tfhelper, "openstack-plan": tfhelper_openstack, "hypervisor-plan": tfhelper_hypervisor, - "cinder-volume-plan": tfhelper_cinder_volume, - "storage-plan": tfhelper_storage, + "storage-backend-plan": tfhelper_storage, }[plan] - # Create feature and run enable plans feature = telemetry_feature.TelemetryFeature() feature._manifest = Mock() feature.run_enable_plans(deployment, Mock(), False) - # Verify DeploySpecificCinderVolumeStep was called only twice - # (once for cinder-volume-noha, once for cinder-volume) - # NOT three times (which would be without deduplication) - assert mock_deploy_step_class.call_count == 2 - - # Verify the principals that were processed - principals_processed = set() - for call in mock_deploy_step_class.call_args_list: - backend_instance = call[0][6] # 7th positional arg is backend_instance - principals_processed.add(backend_instance.principal_application) - - assert principals_processed == {"cinder-volume-noha", "cinder-volume"} + # Verify that get_tfhelper was called with "storage-backend-plan" + # (it should NOT raise KeyError for "storage-plan") + calls = deployment.get_tfhelper.call_args_list + plan_names = [call[0][0] for call in calls] + assert "storage-backend-plan" in plan_names + assert "storage-plan" not in plan_names + @patch("sunbeam.features.telemetry.feature.register_storage_terraform_plan") @patch("sunbeam.features.telemetry.feature.JujuHelper") - @patch("sunbeam.features.telemetry.feature.StorageBackendManager") - @patch("sunbeam.features.telemetry.feature.DeploySpecificCinderVolumeStep") @patch("sunbeam.features.telemetry.feature.run_plan") - def test_run_disable_plans_deduplicates_shared_principals( + def test_run_disable_plans_uses_storage_backend_plan( self, 
mock_run_plan, - mock_deploy_step_class, - mock_storage_manager_class, mock_jhelper_class, + mock_register_storage, deployment, - mock_storage_backends, - mock_backend_instances, ): - """Test that disable plans deduplicates backends sharing the same principal.""" - # Setup mocks - client = deployment.get_client.return_value - storage_backends_root = Mock() - storage_backends_root.root = mock_storage_backends - client.cluster.get_storage_backends.return_value = storage_backends_root - - # Mock StorageBackendManager - mock_storage_manager = mock_storage_manager_class.return_value - mock_storage_manager.backends.return_value = mock_backend_instances - - # Mock tfhelpers + """Disable plans should use storage-backend-plan, not storage-plan.""" tfhelper = Mock() tfhelper.state_list.return_value = [] tfhelper_openstack = Mock() tfhelper_hypervisor = Mock() - tfhelper_cinder_volume = Mock() tfhelper_storage = Mock() deployment.get_tfhelper.side_effect = lambda plan: { "telemetry-plan": tfhelper, "openstack-plan": tfhelper_openstack, "hypervisor-plan": tfhelper_hypervisor, - "cinder-volume-plan": tfhelper_cinder_volume, - "storage-plan": tfhelper_storage, + "storage-backend-plan": tfhelper_storage, }[plan] - # Create feature and run disable plans feature = telemetry_feature.TelemetryFeature() feature._manifest = Mock() feature.run_disable_plans(deployment, False) - # Verify DeploySpecificCinderVolumeStep was called only twice - # (once for cinder-volume-noha, once for cinder-volume) - assert mock_deploy_step_class.call_count == 2 - - # Verify the principals that were processed - principals_processed = set() - for call in mock_deploy_step_class.call_args_list: - backend_instance = call[0][6] # 7th positional arg is backend_instance - principals_processed.add(backend_instance.principal_application) - - assert principals_processed == {"cinder-volume-noha", "cinder-volume"} + calls = deployment.get_tfhelper.call_args_list + plan_names = [call[0][0] for call in calls] + assert 
"storage-backend-plan" in plan_names + assert "storage-plan" not in plan_names + @patch("sunbeam.features.telemetry.feature.register_storage_terraform_plan") @patch("sunbeam.features.telemetry.feature.JujuHelper") - @patch("sunbeam.features.telemetry.feature.StorageBackendManager") @patch("sunbeam.features.telemetry.feature.run_plan") - def test_run_enable_plans_no_storage_backends( + def test_run_enable_plans_includes_update_and_reapply_steps( self, mock_run_plan, - mock_storage_manager_class, mock_jhelper_class, + mock_register_storage, deployment, ): - """Test that enable plans works when there are no storage backends.""" - # Setup mocks - client = deployment.get_client.return_value - storage_backends_root = Mock() - storage_backends_root.root = [] # No backends - client.cluster.get_storage_backends.return_value = storage_backends_root - - # Mock tfhelpers - tfhelper = Mock() - tfhelper_openstack = Mock() - tfhelper_openstack.output.return_value = {"ceilometer-offer-url": "url"} - tfhelper_hypervisor = Mock() - tfhelper_cinder_volume = Mock() - - deployment.get_tfhelper.side_effect = lambda plan: { - "telemetry-plan": tfhelper, - "openstack-plan": tfhelper_openstack, - "hypervisor-plan": tfhelper_hypervisor, - "cinder-volume-plan": tfhelper_cinder_volume, - }[plan] + """Enable plan3 should include update and reapply steps. - # Create feature and run enable plans - feature = telemetry_feature.TelemetryFeature() - feature._manifest = Mock() - feature.run_enable_plans(deployment, Mock(), False) + Checks for UpdateCinderVolumeTelemetryTfvarsStep and + ReapplyStorageBackendTerraformPlanStep. 
+ """ + from sunbeam.storage.steps import ReapplyStorageBackendTerraformPlanStep - # Verify run_plan was called for plan1 and plan2, but not plan3 - # (plan3 is for storage backends which we don't have) - assert mock_run_plan.call_count == 2 - - @patch("sunbeam.features.telemetry.feature.JujuHelper") - @patch("sunbeam.features.telemetry.feature.StorageBackendManager") - @patch("sunbeam.features.telemetry.feature.DeploySpecificCinderVolumeStep") - @patch("sunbeam.features.telemetry.feature.run_plan") - def test_run_enable_plans_passes_extra_tfvars( - self, - mock_run_plan, - mock_deploy_step_class, - mock_storage_manager_class, - mock_jhelper_class, - deployment, - mock_storage_backends, - mock_backend_instances, - ): - """Test that enable plans passes correct extra_tfvars to steps.""" - # Setup mocks - client = deployment.get_client.return_value - storage_backends_root = Mock() - storage_backends_root.root = mock_storage_backends - client.cluster.get_storage_backends.return_value = storage_backends_root - - # Mock StorageBackendManager - mock_storage_manager = mock_storage_manager_class.return_value - mock_storage_manager.backends.return_value = mock_backend_instances - - # Mock tfhelpers tfhelper = Mock() tfhelper_openstack = Mock() tfhelper_openstack.output.return_value = {"ceilometer-offer-url": "url"} tfhelper_hypervisor = Mock() - tfhelper_cinder_volume = Mock() tfhelper_storage = Mock() deployment.get_tfhelper.side_effect = lambda plan: { "telemetry-plan": tfhelper, "openstack-plan": tfhelper_openstack, "hypervisor-plan": tfhelper_hypervisor, - "cinder-volume-plan": tfhelper_cinder_volume, - "storage-plan": tfhelper_storage, + "storage-backend-plan": tfhelper_storage, }[plan] - # Create feature and run enable plans feature = telemetry_feature.TelemetryFeature() feature._manifest = Mock() feature.run_enable_plans(deployment, Mock(), False) - # Verify all DeploySpecificCinderVolumeStep calls have correct extra_tfvars - for call in 
mock_deploy_step_class.call_args_list: - extra_tfvars = call[1]["extra_tfvars"] - assert extra_tfvars == {"enable-telemetry-notifications": True} - + # run_plan is called 3 times: plan1, plan2, plan3 + assert mock_run_plan.call_count == 3 + + # plan3 is the last call + plan3_steps = mock_run_plan.call_args_list[2][0][0] + step_types = [type(s) for s in plan3_steps] + assert telemetry_feature.UpdateCinderVolumeTelemetryTfvarsStep in step_types + assert ReapplyStorageBackendTerraformPlanStep in step_types + + # Verify the update step has enable=True + update_steps = [ + s + for s in plan3_steps + if isinstance(s, telemetry_feature.UpdateCinderVolumeTelemetryTfvarsStep) + ] + assert len(update_steps) == 1 + assert update_steps[0].enable is True + + @patch("sunbeam.features.telemetry.feature.register_storage_terraform_plan") @patch("sunbeam.features.telemetry.feature.JujuHelper") - @patch("sunbeam.features.telemetry.feature.StorageBackendManager") - @patch("sunbeam.features.telemetry.feature.DeploySpecificCinderVolumeStep") @patch("sunbeam.features.telemetry.feature.run_plan") - def test_run_disable_plans_passes_extra_tfvars( + def test_run_disable_plans_includes_update_and_reapply_steps( self, mock_run_plan, - mock_deploy_step_class, - mock_storage_manager_class, mock_jhelper_class, + mock_register_storage, deployment, - mock_storage_backends, - mock_backend_instances, ): - """Test that disable plans passes correct extra_tfvars to steps.""" - # Setup mocks - client = deployment.get_client.return_value - storage_backends_root = Mock() - storage_backends_root.root = mock_storage_backends - client.cluster.get_storage_backends.return_value = storage_backends_root - - # Mock StorageBackendManager - mock_storage_manager = mock_storage_manager_class.return_value - mock_storage_manager.backends.return_value = mock_backend_instances - - # Mock tfhelpers + """Disable plan2 should include update and reapply steps. 
+ + Checks for UpdateCinderVolumeTelemetryTfvarsStep and + ReapplyStorageBackendTerraformPlanStep. + """ + from sunbeam.storage.steps import ReapplyStorageBackendTerraformPlanStep + tfhelper = Mock() tfhelper.state_list.return_value = [] tfhelper_openstack = Mock() tfhelper_hypervisor = Mock() - tfhelper_cinder_volume = Mock() tfhelper_storage = Mock() deployment.get_tfhelper.side_effect = lambda plan: { "telemetry-plan": tfhelper, "openstack-plan": tfhelper_openstack, "hypervisor-plan": tfhelper_hypervisor, - "cinder-volume-plan": tfhelper_cinder_volume, - "storage-plan": tfhelper_storage, + "storage-backend-plan": tfhelper_storage, }[plan] - # Create feature and run disable plans feature = telemetry_feature.TelemetryFeature() feature._manifest = Mock() feature.run_disable_plans(deployment, False) - # Verify all DeploySpecificCinderVolumeStep calls have correct extra_tfvars - for call in mock_deploy_step_class.call_args_list: - extra_tfvars = call[1]["extra_tfvars"] - assert extra_tfvars == {"enable-telemetry-notifications": False} + # run_plan is called: plan (disable main), plan2 (storage update) + assert mock_run_plan.call_count == 2 + + # plan2 is the last call + plan2_steps = mock_run_plan.call_args_list[1][0][0] + step_types = [type(s) for s in plan2_steps] + assert telemetry_feature.UpdateCinderVolumeTelemetryTfvarsStep in step_types + assert ReapplyStorageBackendTerraformPlanStep in step_types + + # Verify the update step has enable=False + update_steps = [ + s + for s in plan2_steps + if isinstance(s, telemetry_feature.UpdateCinderVolumeTelemetryTfvarsStep) + ] + assert len(update_steps) == 1 + assert update_steps[0].enable is False diff --git a/sunbeam-python/tests/unit/sunbeam/provider/local/test_commands.py b/sunbeam-python/tests/unit/sunbeam/provider/local/test_commands.py new file mode 100644 index 000000000..066239682 --- /dev/null +++ b/sunbeam-python/tests/unit/sunbeam/provider/local/test_commands.py @@ -0,0 +1,160 @@ +# SPDX-FileCopyrightText: 
2025 - Canonical Ltd +# SPDX-License-Identifier: Apache-2.0 + +from unittest.mock import Mock + +import pytest + +from sunbeam.core import ceph as ceph_module +from sunbeam.features.interface.v1.base import EnableDisableFeature +from sunbeam.provider.local import commands as local_commands + + +@pytest.mark.parametrize( + ("enabled", "expected"), + [ + (True, {"enabled": "true"}), + (False, {"enabled": "false"}), + ], +) +def test_set_ceph_feature_enabled_state_updates_feature_info( + enabled: bool, expected: dict +): + deployment = Mock() + client = Mock() + ceph_feature = Mock(spec=EnableDisableFeature) + deployment.get_feature_manager.return_value.resolve_feature.return_value = ( + ceph_feature + ) + + ceph_module.set_ceph_feature_enabled_state(deployment, client, enabled=enabled) + + ceph_feature.update_feature_info.assert_called_once_with(client, expected) + + +def test_is_internal_ceph_enabled_feature_aware_uses_feature_state(mocker): + deployment = Mock() + deployment.get_feature_manager.return_value.is_feature_enabled.return_value = False + mocker.patch.object(ceph_module, "is_internal_ceph_enabled", return_value=True) + + result = ceph_module.is_internal_ceph_enabled_feature_aware(deployment, Mock()) + + assert result is True + + +def test_is_internal_ceph_enabled_feature_aware_returns_false_while_disabling( + mocker, +): + """Disabling marker must short-circuit feature-aware check to False. + + The ceph_disabling marker must short-circuit to False so callers + don't observe the transient window where mode=NONE has been written + but feature_enabled is still True. 
+ """ + deployment = Mock() + feature = Mock(spec=EnableDisableFeature) + feature.get_feature_info.return_value = {"ceph_disabling": "true"} + deployment.get_feature_manager.return_value.resolve_feature.return_value = feature + deployment.get_feature_manager.return_value.is_feature_enabled.return_value = True + mocker.patch.object(ceph_module, "is_internal_ceph_enabled", return_value=True) + + result = ceph_module.is_internal_ceph_enabled_feature_aware(deployment, Mock()) + + assert result is False + + +def test_call_enabled_feature_join_hooks_passes_node_context(): + deployment = Mock() + node_info = {"name": "node-1", "role": ["compute"]} + + local_commands._call_enabled_feature_join_hooks( + deployment, node_info, "node-1", ["compute"], accept_defaults=True + ) + + deployment.get_feature_manager.return_value.call_enabled_features_on_join.assert_called_once_with( + deployment, + node_info, + node_name="node-1", + roles=["compute"], + status="joined", + accept_defaults=True, + ) + + +def test_call_enabled_feature_depart_hooks_passes_node_context(): + deployment = Mock() + node_info = {"name": "node-1", "role": ["storage"]} + + local_commands._call_enabled_feature_depart_hooks( + deployment, node_info, "node-1", ["storage"], force=True + ) + + deployment.get_feature_manager.return_value.call_enabled_features_on_depart.assert_called_once_with( + deployment, + node_info, + node_name="node-1", + roles=["storage"], + status="departed", + force=True, + ) + + +def test_get_default_ceph_bootstrap_steps_delegates_to_feature(): + """The core helper resolves the ceph feature and forwards the flags.""" + deployment = Mock() + feature = Mock() + feature.get_bootstrap_deploy_steps = Mock(return_value=["STEP"]) + deployment.get_feature_manager.return_value.resolve_feature.return_value = feature + + result = ceph_module.get_default_ceph_bootstrap_steps( + deployment, + enabled=True, + expect_storage_node=True, + node_name="node-1", + accept_defaults=True, + ) + + assert result == 
["STEP"] + feature.get_bootstrap_deploy_steps.assert_called_once_with( + deployment, + enabled=True, + expect_storage_node=True, + node_name="node-1", + accept_defaults=True, + ) + + +def test_get_default_ceph_bootstrap_steps_returns_empty_when_feature_missing(): + """A missing or incompatible ceph feature must not crash callers.""" + deployment = Mock() + deployment.get_feature_manager.return_value.resolve_feature.return_value = None + + result = ceph_module.get_default_ceph_bootstrap_steps( + deployment, + enabled=True, + expect_storage_node=True, + ) + + assert result == [] + + +def test_ensure_default_ceph_feature_calls_feature(): + deployment = Mock() + ceph_feature = Mock() + deployment.get_feature_manager.return_value.resolve_feature.return_value = ( + ceph_feature + ) + + ceph_module.ensure_default_ceph_feature( + deployment, + show_hints=False, + node_name="node-1", + accept_defaults=True, + ) + + ceph_feature.enable_default_storage.assert_called_once_with( + deployment, + False, + node_name="node-1", + accept_defaults=True, + ) diff --git a/sunbeam-python/tests/unit/sunbeam/provider/maas/test_commands.py b/sunbeam-python/tests/unit/sunbeam/provider/maas/test_commands.py new file mode 100644 index 000000000..0cd6bd56e --- /dev/null +++ b/sunbeam-python/tests/unit/sunbeam/provider/maas/test_commands.py @@ -0,0 +1,83 @@ +# SPDX-FileCopyrightText: 2025 - Canonical Ltd +# SPDX-License-Identifier: Apache-2.0 + +from unittest.mock import Mock + +from sunbeam.core import ceph as ceph_module +from sunbeam.provider.maas import commands as maas_commands + + +def test_is_internal_ceph_enabled_feature_aware_uses_feature_state(mocker): + deployment = Mock() + deployment.get_feature_manager.return_value.is_feature_enabled.return_value = False + mocker.patch.object(ceph_module, "is_internal_ceph_enabled", return_value=True) + + result = ceph_module.is_internal_ceph_enabled_feature_aware(deployment, Mock()) + + assert result is True + + +def 
test_call_enabled_feature_join_hooks_passes_node_context(): + deployment = Mock() + feature_manager = Mock() + deployment.get_feature_manager.return_value = feature_manager + + client = Mock() + client.cluster.get_node_info.side_effect = lambda name: { + "name": name, + "role": [f"role-{name}"], + } + + maas_commands._call_enabled_feature_join_hooks( + deployment, client, ["node-2", "node-1", "node-1"] + ) + + assert feature_manager.call_enabled_features_on_join.call_count == 2 + _, first_kwargs = feature_manager.call_enabled_features_on_join.call_args_list[0] + _, second_kwargs = feature_manager.call_enabled_features_on_join.call_args_list[1] + assert first_kwargs["node_name"] == "node-1" + assert first_kwargs["roles"] == ["role-node-1"] + assert first_kwargs["status"] == "joined" + assert second_kwargs["node_name"] == "node-2" + assert second_kwargs["roles"] == ["role-node-2"] + assert second_kwargs["status"] == "joined" + + +def test_call_enabled_feature_depart_hooks_passes_node_context(): + deployment = Mock() + node_info = {"name": "node-1", "role": ["storage"]} + + maas_commands._call_enabled_feature_depart_hooks( + deployment, node_info, "node-1", force=True + ) + + deployment.get_feature_manager.return_value.call_enabled_features_on_depart.assert_called_once_with( + deployment, + node_info, + node_name="node-1", + roles=["storage"], + status="departed", + force=True, + ) + + +def test_ensure_default_ceph_feature_calls_feature(): + deployment = Mock() + ceph_feature = Mock() + deployment.get_feature_manager.return_value.resolve_feature.return_value = ( + ceph_feature + ) + + ceph_module.ensure_default_ceph_feature( + deployment, + show_hints=False, + maas_client=Mock(), + storage=["node-1"], + ) + + ceph_feature.enable_default_storage.assert_called_once_with( + deployment, + False, + maas_client=ceph_feature.enable_default_storage.call_args.kwargs["maas_client"], + storage=["node-1"], + ) diff --git 
a/sunbeam-python/tests/unit/sunbeam/steps/test_cinder_volume.py b/sunbeam-python/tests/unit/sunbeam/steps/test_cinder_volume.py deleted file mode 100644 index 930c10ad9..000000000 --- a/sunbeam-python/tests/unit/sunbeam/steps/test_cinder_volume.py +++ /dev/null @@ -1,363 +0,0 @@ -# SPDX-FileCopyrightText: 2025 - Canonical Ltd -# SPDX-License-Identifier: Apache-2.0 - -from unittest.mock import MagicMock, Mock, patch - -import pytest - -from sunbeam.steps.cinder_volume import ( - CINDER_VOLUME_APP_TIMEOUT, - CINDER_VOLUME_UNIT_TIMEOUT, - DeployCinderVolumeApplicationStep, - RemoveCinderVolumeUnitsStep, -) - - -# Common fixtures -# Additional fixtures specific to cinder volume tests -@pytest.fixture -def os_tfhelper(): - """OpenStack tfhelper mock.""" - return MagicMock() - - -@pytest.fixture -def mceph_tfhelper(): - """MicroCeph tfhelper mock.""" - return MagicMock() - - -@pytest.fixture -def deployment_with_tfhelpers(basic_deployment, os_tfhelper, mceph_tfhelper): - """Deployment mock with configured tfhelpers.""" - basic_deployment.get_tfhelper.side_effect = lambda plan: { - "microceph-plan": mceph_tfhelper, - "openstack-plan": os_tfhelper, - }[plan] - return basic_deployment - - -class TestDeployCinderVolumeApplicationStep: - @pytest.fixture - def deploy_cinder_volume_step( - self, - deployment_with_tfhelpers, - basic_client, - tfhelper, - basic_jhelper, - basic_manifest, - test_model, - ): - """Create DeployCinderVolumeApplicationStep instance for testing.""" - return DeployCinderVolumeApplicationStep( - deployment_with_tfhelpers, - basic_client, - tfhelper, - basic_jhelper, - basic_manifest, - test_model, - ) - - def test_get_unit_timeout(self, deploy_cinder_volume_step): - assert ( - deploy_cinder_volume_step.get_application_timeout() - == CINDER_VOLUME_APP_TIMEOUT - ) - - @patch( - "sunbeam.steps.cinder_volume.get_mandatory_control_plane_offers", - return_value={"keystone-offer-url": "url"}, - ) - def test_get_offers( - self, mandatory_control_plane_offers, 
deploy_cinder_volume_step - ): - assert deploy_cinder_volume_step._offers == {} - deploy_cinder_volume_step._get_offers() - mandatory_control_plane_offers.assert_called_once() - assert ( - deploy_cinder_volume_step._offers - == mandatory_control_plane_offers.return_value - ) - mandatory_control_plane_offers.reset_mock() - deploy_cinder_volume_step._get_offers() - # Should not call again - mandatory_control_plane_offers.assert_not_called() - - def test_get_accepted_application_status(self, deploy_cinder_volume_step): - deploy_cinder_volume_step._get_offers = Mock( - return_value={"keystone-offer-url": None} - ) - - accepted_status = deploy_cinder_volume_step.get_accepted_application_status() - assert "blocked" in accepted_status - - def test_get_accepted_application_status_with_offers( - self, deploy_cinder_volume_step - ): - deploy_cinder_volume_step._get_offers = Mock( - return_value={"keystone-offer-url": "url"} - ) - - accepted_status = deploy_cinder_volume_step.get_accepted_application_status() - assert "blocked" not in accepted_status - - @patch("sunbeam.steps.cinder_volume.microceph.ceph_replica_scale", return_value=3) - def test_extra_tfvars( - self, - mock_ceph_replica_scale, - deploy_cinder_volume_step, - basic_client, - mceph_tfhelper, - ): - basic_client.cluster.list_nodes_by_role.return_value = ["node1"] - mceph_tfhelper.output.return_value = {"ceph-application-name": "ceph-app"} - tfvars = deploy_cinder_volume_step.extra_tfvars() - assert tfvars["ceph-application-name"] == "ceph-app" - assert ( - tfvars["charm_cinder_volume_ceph_config"]["ceph-osd-replication-count"] == 3 - ) - - def test_extra_tfvars_after_openstack_model( - self, - deploy_cinder_volume_step, - basic_client, - os_tfhelper, - mceph_tfhelper, - basic_manifest, - ): - basic_client.cluster.list_nodes_by_role.return_value = ["node1"] - os_tfhelper.output.return_value = { - "keystone-offer-url": "keystone-offer", - "cinder-volume-database-offer-url": "database-offer", - 
"rabbitmq-offer-url": "amqp-offer", - "cert-distributor-offer-url": "cert-distributor-offer", - } - mceph_tfhelper.output.return_value = {"ceph-application-name": "ceph-app"} - basic_manifest.get_model.return_value = "openstack" - tfvars = deploy_cinder_volume_step.extra_tfvars() - assert tfvars["ceph-application-name"] == "ceph-app" - assert tfvars["database-offer-url"] == "database-offer" - assert tfvars["amqp-offer-url"] == "amqp-offer" - assert tfvars["cert-distributor-offer-url"] == "cert-distributor-offer" - assert ( - tfvars["charm_cinder_volume_ceph_config"]["ceph-osd-replication-count"] == 1 - ) - assert any( - binding.get("endpoint") == "receive-ca-cert" - for binding in tfvars["endpoint_bindings"] - ) - - @patch( - "sunbeam.steps.cinder_volume.get_mandatory_control_plane_offers", - return_value={"keystone-offer-url": "url"}, - ) - def test_extra_tfvars_no_storage_nodes( - self, - get_mandatory_control_plane_offers, - deploy_cinder_volume_step, - basic_client, - mceph_tfhelper, - ): - basic_client.cluster.list_nodes_by_role.return_value = [] - tfvars = deploy_cinder_volume_step.extra_tfvars() - mceph_tfhelper.output.assert_not_called() - get_mandatory_control_plane_offers.assert_not_called() - assert "ceph-application-name" not in tfvars - assert "keystone-offer-url" not in tfvars - assert "cert-distributor-offer-url" not in tfvars - assert any( - binding.get("endpoint") == "receive-ca-cert" - for binding in tfvars["endpoint_bindings"] - ) - - def test_init_with_extra_tfvars( - self, - deployment_with_tfhelpers, - basic_client, - tfhelper, - basic_jhelper, - basic_manifest, - test_model, - ): - """Test that extra_tfvars parameter is stored as override_tfvars.""" - extra_tfvars = {"enable-telemetry-notifications": True, "custom-key": "value"} - step = DeployCinderVolumeApplicationStep( - deployment_with_tfhelpers, - basic_client, - tfhelper, - basic_jhelper, - basic_manifest, - test_model, - extra_tfvars=extra_tfvars, - ) - assert step.override_tfvars == 
extra_tfvars - - def test_init_without_extra_tfvars( - self, - deployment_with_tfhelpers, - basic_client, - tfhelper, - basic_jhelper, - basic_manifest, - test_model, - ): - """Test that override_tfvars defaults to empty dict. - - When extra_tfvars is not provided. - """ - step = DeployCinderVolumeApplicationStep( - deployment_with_tfhelpers, - basic_client, - tfhelper, - basic_jhelper, - basic_manifest, - test_model, - ) - assert step.override_tfvars == {} - - @patch("sunbeam.steps.cinder_volume.microceph.ceph_replica_scale", return_value=3) - def test_extra_tfvars_override_precedence( - self, - mock_ceph_replica_scale, - deployment_with_tfhelpers, - basic_client, - tfhelper, - basic_jhelper, - basic_manifest, - test_model, - mceph_tfhelper, - ): - """Test that override_tfvars values take precedence over computed tfvars.""" - basic_client.cluster.list_nodes_by_role.return_value = ["node1"] - mceph_tfhelper.output.return_value = {"ceph-application-name": "ceph-app"} - - # Create step with override_tfvars - override_tfvars = { - "enable-telemetry-notifications": True, - "ceph-application-name": "override-ceph-app", - } - step = DeployCinderVolumeApplicationStep( - deployment_with_tfhelpers, - basic_client, - tfhelper, - basic_jhelper, - basic_manifest, - test_model, - extra_tfvars=override_tfvars, - ) - - # Mock the feature manager to return disabled telemetry - feature_manager = Mock() - feature_manager.is_feature_enabled.return_value = False - deployment_with_tfhelpers.get_feature_manager.return_value = feature_manager - - tfvars = step.extra_tfvars() - - # Verify override_tfvars values take precedence - assert tfvars["enable-telemetry-notifications"] is True # overridden - assert tfvars["ceph-application-name"] == "override-ceph-app" # overridden - - @patch("sunbeam.steps.cinder_volume.microceph.ceph_replica_scale", return_value=3) - def test_extra_tfvars_telemetry_feature_enabled( - self, - mock_ceph_replica_scale, - deployment_with_tfhelpers, - basic_client, - 
tfhelper, - basic_jhelper, - basic_manifest, - test_model, - mceph_tfhelper, - ): - """Test telemetry notifications are enabled. - - When telemetry feature is enabled. - """ - basic_client.cluster.list_nodes_by_role.return_value = [] - - step = DeployCinderVolumeApplicationStep( - deployment_with_tfhelpers, - basic_client, - tfhelper, - basic_jhelper, - basic_manifest, - test_model, - ) - - # Mock the feature manager to return enabled telemetry - feature_manager = Mock() - feature_manager.is_feature_enabled.return_value = True - deployment_with_tfhelpers.get_feature_manager.return_value = feature_manager - - tfvars = step.extra_tfvars() - - # Verify telemetry notifications are enabled - assert tfvars["enable-telemetry-notifications"] is True - feature_manager.is_feature_enabled.assert_called_once_with( - deployment_with_tfhelpers, "telemetry" - ) - - @patch("sunbeam.steps.cinder_volume.microceph.ceph_replica_scale", return_value=3) - def test_extra_tfvars_telemetry_feature_disabled( - self, - mock_ceph_replica_scale, - deployment_with_tfhelpers, - basic_client, - tfhelper, - basic_jhelper, - basic_manifest, - test_model, - mceph_tfhelper, - ): - """Test telemetry notifications are disabled. - - When telemetry feature is disabled. 
- """ - basic_client.cluster.list_nodes_by_role.return_value = [] - - step = DeployCinderVolumeApplicationStep( - deployment_with_tfhelpers, - basic_client, - tfhelper, - basic_jhelper, - basic_manifest, - test_model, - ) - - # Mock the feature manager to return disabled telemetry - feature_manager = Mock() - feature_manager.is_feature_enabled.return_value = False - deployment_with_tfhelpers.get_feature_manager.return_value = feature_manager - - tfvars = step.extra_tfvars() - - # Verify telemetry notifications are disabled - assert tfvars["enable-telemetry-notifications"] is False - feature_manager.is_feature_enabled.assert_called_once_with( - deployment_with_tfhelpers, "telemetry" - ) - - -class TestRemoveCinderVolumeUnitsStep: - @pytest.fixture - def test_names(self): - """Test node names.""" - return ["node1"] - - @pytest.fixture - def remove_cinder_volume_units_step( - self, basic_client, test_names, basic_jhelper, test_model - ): - """Create RemoveCinderVolumeUnitsStep instance for testing.""" - return RemoveCinderVolumeUnitsStep( - basic_client, - test_names, - basic_jhelper, - test_model, - ) - - def test_get_unit_timeout(self, remove_cinder_volume_units_step): - assert ( - remove_cinder_volume_units_step.get_unit_timeout() - == CINDER_VOLUME_UNIT_TIMEOUT - ) diff --git a/sunbeam-python/tests/unit/sunbeam/steps/test_hypervisor.py b/sunbeam-python/tests/unit/sunbeam/steps/test_hypervisor.py index ae06ce217..98f30ad50 100644 --- a/sunbeam-python/tests/unit/sunbeam/steps/test_hypervisor.py +++ b/sunbeam-python/tests/unit/sunbeam/steps/test_hypervisor.py @@ -11,10 +11,12 @@ from sunbeam.core.juju import ApplicationNotFoundException from sunbeam.core.terraform import TerraformException from sunbeam.steps.hypervisor import ( + DeployHypervisorApplicationStep, ReapplyHypervisorOptionalIntegrationsStep, ReapplyHypervisorTerraformPlanStep, RemoveHypervisorUnitStep, ) +from sunbeam.storage.base import HypervisorIntegration # Common fixtures @@ -29,6 +31,128 @@ def 
read_config_patch(): yield mock +class TestDeployHypervisorApplicationStep: + @pytest.fixture + def ovn_manager(self): + """Mock OVN manager.""" + mgr = Mock() + mgr.get_provider.return_value = Mock() + return mgr + + @pytest.fixture + def deploy_hypervisor_step( + self, + basic_deployment, + basic_client, + basic_tfhelper, + basic_jhelper, + basic_manifest, + test_model, + read_config_patch, + ovn_manager, + ): + """Create DeployHypervisorApplicationStep instance for testing.""" + openstack_tfhelper = Mock() + openstack_tfhelper.output.return_value = { + "rabbitmq-offer-url": "rabbitmq-url", + "keystone-offer-url": "keystone-url", + "cert-distributor-offer-url": "cert-distributor-url", + "ca-offer-url": "ca-url", + "nova-offer-url": "nova-url", + } + basic_deployment.get_ovn_manager.return_value = ovn_manager + basic_deployment.get_space.return_value = "test-space" + step = DeployHypervisorApplicationStep( + basic_deployment, + basic_client, + basic_tfhelper, + openstack_tfhelper, + basic_jhelper, + basic_manifest, + test_model, + ) + return step + + @patch("sunbeam.steps.hypervisor.StorageBackendManager") + def test_extra_tfvars_no_backends( + self, + mock_manager_class, + deploy_hypervisor_step, + ): + """extra_tfvars should have empty extra_integrations when no backends.""" + mock_manager = Mock() + mock_manager.collect_hypervisor_integrations.return_value = set() + mock_manager_class.return_value = mock_manager + + tfvars = deploy_hypervisor_step.extra_tfvars() + + assert "extra_integrations" in tfvars + assert tfvars["extra_integrations"] == [] + mock_manager.collect_hypervisor_integrations.assert_called_once_with( + deploy_hypervisor_step.deployment, + deploy_hypervisor_step.client, + ) + + @patch("sunbeam.steps.hypervisor.StorageBackendManager") + def test_extra_tfvars_with_integrations( + self, + mock_manager_class, + deploy_hypervisor_step, + ): + """extra_tfvars should include integrations from storage framework.""" + mock_manager = Mock() + 
mock_manager.collect_hypervisor_integrations.return_value = { + HypervisorIntegration( + application_name="cinder-volume-ceph", + endpoint_name="ceph-access", + hypervisor_endpoint_name="ceph-access", + ), + } + mock_manager_class.return_value = mock_manager + + tfvars = deploy_hypervisor_step.extra_tfvars() + + assert "extra_integrations" in tfvars + assert len(tfvars["extra_integrations"]) == 1 + integration = tfvars["extra_integrations"][0] + assert integration["application_name"] == "cinder-volume-ceph" + assert integration["endpoint_name"] == "ceph-access" + assert integration["hypervisor_endpoint_name"] == "ceph-access" + + @patch("sunbeam.steps.hypervisor.StorageBackendManager") + def test_extra_tfvars_includes_offer_urls( + self, + mock_manager_class, + deploy_hypervisor_step, + ): + """extra_tfvars should still include Juju offer URLs.""" + mock_manager = Mock() + mock_manager.collect_hypervisor_integrations.return_value = set() + mock_manager_class.return_value = mock_manager + + tfvars = deploy_hypervisor_step.extra_tfvars() + + assert tfvars["rabbitmq-offer-url"] == "rabbitmq-url" + assert tfvars["keystone-offer-url"] == "keystone-url" + assert tfvars["cert-distributor-offer-url"] == "cert-distributor-url" + assert tfvars["ca-offer-url"] == "ca-url" + assert tfvars["nova-offer-url"] == "nova-url" + + @patch("sunbeam.steps.hypervisor.StorageBackendManager") + def test_extra_tfvars_includes_integrations( + self, + mock_manager_class, + deploy_hypervisor_step, + ): + """extra_tfvars should include extra_integrations from storage backends.""" + mock_manager = Mock() + mock_manager.collect_hypervisor_integrations.return_value = set() + mock_manager_class.return_value = mock_manager + + tfvars = deploy_hypervisor_step.extra_tfvars() + assert "extra_integrations" in tfvars + + class TestRemoveHypervisorUnitStep: @pytest.fixture def remove_hypervisor_step( @@ -466,6 +590,61 @@ def test_run_after_configure_step( assert override_tfvars_from_mock_call == 
expected_override_tfvars assert result.result_type == ResultType.COMPLETED + @patch("sunbeam.steps.hypervisor.StorageBackendManager") + def test_run_refreshes_storage_hypervisor_integrations( + self, + mock_manager_class, + basic_client, + basic_tfhelper, + basic_jhelper, + basic_manifest, + basic_deployment, + test_model, + read_config_patch, + get_network_config_patch, + get_pci_whitelist_config_patch, + get_dpdk_config_patch, + step_context, + ): + """Reapply should keep backend-owned hypervisor integrations in tfvars.""" + mock_manager = Mock() + mock_manager.collect_hypervisor_integrations.return_value = { + HypervisorIntegration( + application_name="cinder-volume-ceph", + endpoint_name="ceph-access", + hypervisor_endpoint_name="ceph-access", + ), + } + mock_manager_class.return_value = mock_manager + basic_jhelper.get_model_uuid.return_value = "test-uuid" + basic_client.cluster.list_nodes_by_role.return_value = [] + step = ReapplyHypervisorTerraformPlanStep( + basic_client, + basic_tfhelper, + basic_jhelper, + basic_manifest, + test_model, + deployment=basic_deployment, + ) + + result = step.run(step_context) + + override_tfvars = basic_tfhelper.update_tfvars_and_apply_tf.call_args.kwargs[ + "override_tfvars" + ] + assert override_tfvars["extra_integrations"] == [ + { + "application_name": "cinder-volume-ceph", + "endpoint_name": "ceph-access", + "hypervisor_endpoint_name": "ceph-access", + } + ] + mock_manager.collect_hypervisor_integrations.assert_called_once_with( + basic_deployment, + basic_client, + ) + assert result.result_type == ResultType.COMPLETED + def test_run_tf_apply_failed( self, reapply_hypervisor_step, basic_tfhelper, step_context ): diff --git a/sunbeam-python/tests/unit/sunbeam/steps/test_maintenance.py b/sunbeam-python/tests/unit/sunbeam/steps/test_maintenance.py index e80332bd0..78485947d 100644 --- a/sunbeam-python/tests/unit/sunbeam/steps/test_maintenance.py +++ b/sunbeam-python/tests/unit/sunbeam/steps/test_maintenance.py @@ -9,6 +9,7 
@@ from sunbeam.core.common import ResultType, SunbeamException from sunbeam.core.juju import ActionFailedException, UnitNotFoundException +from sunbeam.features.ceph.microceph import APPLICATION as _MICROCEPH_APPLICATION from sunbeam.steps.maintenance import ( CordonControlRoleNodeStep, CreateWatcherAuditStepABC, @@ -19,7 +20,6 @@ RunWatcherAuditStep, UncordonControlRoleNodeStep, ) -from sunbeam.steps.microceph import APPLICATION as _MICROCEPH_APPLICATION @pytest.fixture(autouse=True) diff --git a/sunbeam-python/tests/unit/sunbeam/steps/test_microceph.py b/sunbeam-python/tests/unit/sunbeam/steps/test_microceph.py index 45710175c..f84678203 100644 --- a/sunbeam-python/tests/unit/sunbeam/steps/test_microceph.py +++ b/sunbeam-python/tests/unit/sunbeam/steps/test_microceph.py @@ -5,7 +5,10 @@ from sunbeam.core.common import ResultType from sunbeam.core.juju import ActionFailedException -from sunbeam.steps.microceph import ConfigureMicrocephOSDStep, SetCephMgrPoolSizeStep +from sunbeam.features.ceph.microceph import ( + ConfigureMicrocephOSDStep, + SetCephMgrPoolSizeStep, +) class TestConfigureMicrocephOSDStep: diff --git a/sunbeam-python/tests/unit/sunbeam/steps/test_openstack.py b/sunbeam-python/tests/unit/sunbeam/steps/test_openstack.py index f4789df7d..04a19ceec 100644 --- a/sunbeam-python/tests/unit/sunbeam/steps/test_openstack.py +++ b/sunbeam-python/tests/unit/sunbeam/steps/test_openstack.py @@ -75,6 +75,10 @@ def deployment_with_client(basic_client): storage_manager = Mock() storage_manager.list_principal_applications.return_value = [] deployment.get_storage_manager.return_value = storage_manager + + from sunbeam.features.ceph.microceph import MicrocephProvider + + deployment.get_ceph_provider.return_value = MicrocephProvider() return deployment @@ -493,6 +497,7 @@ def patch_client(self): """Client mock; returns node-1 for any role so ovn-relay is excluded.""" client = Mock() client.cluster.list_nodes_by_role.return_value = ["node-1"] + 
client.cluster.get_config.return_value = "{}" return client def _make_service(self, ip_annotation=None, ingress_ip=None): @@ -920,6 +925,10 @@ def openstack_deployment(self): storage_manager = Mock() storage_manager.list_principal_applications.return_value = [] deployment.get_storage_manager.return_value = storage_manager + + from sunbeam.features.ceph.microceph import MicrocephProvider + + deployment.get_ceph_provider.return_value = MicrocephProvider() return deployment def test_run( diff --git a/sunbeam-python/tests/unit/sunbeam/steps/upgrades/test_intra_channel.py b/sunbeam-python/tests/unit/sunbeam/steps/upgrades/test_intra_channel.py index 2017e4aa5..0806a7eef 100644 --- a/sunbeam-python/tests/unit/sunbeam/steps/upgrades/test_intra_channel.py +++ b/sunbeam-python/tests/unit/sunbeam/steps/upgrades/test_intra_channel.py @@ -25,6 +25,11 @@ ReapplyInfraModelConfigStep, RefreshSnapStep, ) +from sunbeam.steps.upgrades.storage_migration import ( + BackfillCephFeatureStateStep, + ImportCephResourcesToStorageFrameworkStep, +) +from sunbeam.storage.steps import ReapplyStorageBackendTerraformPlanStep _INTRA_CHANNEL = "sunbeam.steps.upgrades.intra_channel" @@ -825,8 +830,17 @@ def setup_method(self): ovn_manager.get_roles_for_microovn.return_value = [] self.client = Mock() self.client.cluster.list_nodes_by_role.return_value = [] + self.client.cluster.get_config.return_value = "{}" self.jhelper = Mock() self.manifest = Mock() + self.storage_backend_register_patcher = patch( + f"{_INTRA_CHANNEL}.register_storage_terraform_plan" + ) + self.storage_backend_register_patcher.start() + + def teardown_method(self): + """Stop active patchers.""" + self.storage_backend_register_patcher.stop() @patch(f"{_INTRA_CHANNEL}.is_maas_deployment") def test_get_plan_local_excludes_lb_ip_pool_step(self, mock_is_maas): @@ -890,8 +904,40 @@ def test_get_plan_always_includes_core_steps(self, mock_is_maas): assert LatestInChannel in step_types assert EnsureCiliumDeviceByHostStep in step_types 
assert ReapplyInfraModelConfigStep in step_types + assert ImportCephResourcesToStorageFrameworkStep in step_types + assert BackfillCephFeatureStateStep in step_types assert UpgradeFeatures in step_types + @patch(f"{_INTRA_CHANNEL}.is_maas_deployment") + def test_get_plan_backfills_ceph_state_before_upgrade_features(self, mock_is_maas): + """Ceph feature state backfill must run before feature upgrades.""" + mock_is_maas.return_value = False + + coordinator = LatestInChannelCoordinator( + self.deployment, self.client, self.jhelper, self.manifest + ) + plan = coordinator.get_plan() + + step_types = [type(step) for step in plan] + assert step_types.index(BackfillCephFeatureStateStep) < step_types.index( + UpgradeFeatures + ) + + @patch(f"{_INTRA_CHANNEL}.is_maas_deployment") + def test_get_plan_imports_ceph_resources_before_storage_reapply(self, mock_is_maas): + """Legacy Ceph resources must be imported before storage reapply.""" + mock_is_maas.return_value = False + + coordinator = LatestInChannelCoordinator( + self.deployment, self.client, self.jhelper, self.manifest + ) + plan = coordinator.get_plan() + + step_types = [type(step) for step in plan] + import_index = step_types.index(ImportCephResourcesToStorageFrameworkStep) + reapply_index = step_types.index(ReapplyStorageBackendTerraformPlanStep) + assert import_index < reapply_index + class TestReapplyInfraModelConfigStep: """Tests for ReapplyInfraModelConfigStep.""" @@ -1219,11 +1265,15 @@ def test_get_plan_includes_refresh_snap_step(self, mock_is_maas): deployment.get_ovn_manager = Mock(return_value=ovn_manager) client = Mock() client.cluster.list_nodes_by_role.return_value = [] + client.cluster.get_config.return_value = "{}" jhelper = Mock() manifest = Mock() - coordinator = LatestInChannelCoordinator(deployment, client, jhelper, manifest) - plan = coordinator.get_plan() + with patch(f"{_INTRA_CHANNEL}.register_storage_terraform_plan"): + coordinator = LatestInChannelCoordinator( + deployment, client, jhelper, 
manifest + ) + plan = coordinator.get_plan() step_types = [type(s) for s in plan] assert RefreshSnapStep in step_types @@ -1241,11 +1291,15 @@ def test_refresh_snap_step_placed_after_charm_refresh(self, mock_is_maas): deployment.get_ovn_manager = Mock(return_value=ovn_manager) client = Mock() client.cluster.list_nodes_by_role.return_value = [] + client.cluster.get_config.return_value = "{}" jhelper = Mock() manifest = Mock() - coordinator = LatestInChannelCoordinator(deployment, client, jhelper, manifest) - plan = coordinator.get_plan() + with patch(f"{_INTRA_CHANNEL}.register_storage_terraform_plan"): + coordinator = LatestInChannelCoordinator( + deployment, client, jhelper, manifest + ) + plan = coordinator.get_plan() step_types = [type(s) for s in plan] assert step_types.index(RefreshSnapStep) > step_types.index(LatestInChannel) diff --git a/sunbeam-python/tests/unit/sunbeam/steps/upgrades/test_storage_migration.py b/sunbeam-python/tests/unit/sunbeam/steps/upgrades/test_storage_migration.py new file mode 100644 index 000000000..24ed7764d --- /dev/null +++ b/sunbeam-python/tests/unit/sunbeam/steps/upgrades/test_storage_migration.py @@ -0,0 +1,890 @@ +# SPDX-FileCopyrightText: 2025 - Canonical Ltd +# SPDX-License-Identifier: Apache-2.0 + +import json +from unittest.mock import Mock, patch + +import pytest + +from sunbeam.clusterd.service import ( + ConfigItemNotFoundException, + StorageBackendNotFoundException, +) +from sunbeam.core.ceph import INTERNAL_CEPH_BACKEND_NAME +from sunbeam.core.common import ResultType +from sunbeam.core.terraform import TerraformException +from sunbeam.features.interface.v1.base import EnableDisableFeature +from sunbeam.steps.upgrades.storage_migration import ( + STORAGE_BACKEND_LEGACY_IMPORT_IDS_KEY, + BackfillCephFeatureStateStep, + ImportCephResourcesToStorageFrameworkStep, + MigrateCinderVolumeToStorageFrameworkStep, +) +from sunbeam.storage.steps import STORAGE_BACKEND_TFVAR_CONFIG_KEY + +_MODULE = 
"sunbeam.steps.upgrades.storage_migration" + + +@pytest.fixture +def deployment(): + dep = Mock() + dep.get_space.return_value = "mgmt" + dep.openstack_machines_model = "machines" + feature_manager = Mock() + feature_manager.is_feature_enabled.return_value = False + dep.get_feature_manager.return_value = feature_manager + openstack_tfhelper = Mock() + openstack_tfhelper.output.return_value = { + "keystone-offer-url": "admin/openstack.keystone", + "cinder-volume-database-offer-url": "admin/openstack.cinder-db", + "rabbitmq-offer-url": "admin/openstack.rabbitmq", + "cert-distributor-offer-url": None, + } + dep.get_tfhelper.return_value = openstack_tfhelper + return dep + + +@pytest.fixture +def client(): + c = Mock() + c.cluster.list_nodes_by_role.return_value = [ + {"machineid": "0"}, + {"machineid": "1"}, + {"machineid": "2"}, + ] + + def _get_config(key): + if key == STORAGE_BACKEND_TFVAR_CONFIG_KEY: + raise ConfigItemNotFoundException("not found") + if key == "StorageBackendsEnabled": + return "[]" + raise ConfigItemNotFoundException("not found") + + c.cluster.get_config.side_effect = _get_config + return c + + +@pytest.fixture +def old_tfhelper(): + helper = Mock() + helper.pull_state.return_value = {"resources": []} + return helper + + +@pytest.fixture +def jhelper(): + h = Mock() + h.get_model.return_value = { + "model-uuid": "test-model-uuid", + "name": "admin/openstack", + } + return h + + +@pytest.fixture +def manifest(): + m = Mock() + charm = Mock() + charm.channel = "2024.1/stable" + charm.revision = None + charm.config = {} + m.core.software.charms = {"cinder-volume": charm} + m.storage.root = {} + return m + + +@pytest.fixture +def step(deployment, client, old_tfhelper, jhelper, manifest): + return MigrateCinderVolumeToStorageFrameworkStep( + deployment=deployment, + client=client, + old_tfhelper=old_tfhelper, + jhelper=jhelper, + manifest=manifest, + model="machines", + ) + + +class TestMigrateCinderVolumeIsSkip: + """Tests for is_skip logic.""" + + 
def test_skip_when_old_plan_has_no_resources(self, step, old_tfhelper): + """Migration is skipped when old plan has no resources.""" + old_tfhelper.state_list.return_value = [] + + result = step.is_skip(Mock()) + + assert result.result_type == ResultType.SKIPPED + + def test_skip_when_old_plan_state_list_fails(self, step, old_tfhelper): + """Migration is skipped when listing old plan state fails.""" + old_tfhelper.state_list.side_effect = TerraformException("state list failed") + + result = step.is_skip(Mock()) + + assert result.result_type == ResultType.SKIPPED + + def test_not_skipped_when_storage_framework_already_configured( + self, step, old_tfhelper, client + ): + """Migration runs even when storage framework has backends (merges).""" + old_tfhelper.state_list.return_value = [ + "juju_application.cinder-volume", + ] + + def _get_config(key): + if key == STORAGE_BACKEND_TFVAR_CONFIG_KEY: + return '{"backends": {"internal-ceph": {}}, "cinder-volumes": {}}' + if key == "StorageBackendsEnabled": + return "[]" + raise ConfigItemNotFoundException("not found") + + client.cluster.get_config.side_effect = _get_config + + result = step.is_skip(Mock()) + + assert result.result_type == ResultType.COMPLETED + + def test_not_skipped_when_migration_needed(self, step, old_tfhelper, client): + """Migration proceeds when old plan has resources and new plan is empty.""" + old_tfhelper.state_list.return_value = [ + "juju_application.cinder-volume", + "juju_application.cinder-volume-ceph", + ] + client.cluster.get_config.side_effect = ConfigItemNotFoundException("not found") + + result = step.is_skip(Mock()) + + assert result.result_type == ResultType.COMPLETED + + def test_not_skipped_when_storage_config_empty(self, step, old_tfhelper, client): + """Migration proceeds when storage config exists but has no entries.""" + old_tfhelper.state_list.return_value = [ + "juju_application.cinder-volume", + ] + + def _get_config(key): + if key == STORAGE_BACKEND_TFVAR_CONFIG_KEY: + return 
'{"backends": {}, "cinder-volumes": {}}' + if key == "StorageBackendsEnabled": + return "[]" + raise ConfigItemNotFoundException("not found") + + client.cluster.get_config.side_effect = _get_config + + result = step.is_skip(Mock()) + + assert result.result_type == ResultType.COMPLETED + + +class TestMigrateCinderVolumeClearOldState: + """Tests for _clear_old_state.""" + + def test_removes_all_non_data_resources(self, step, old_tfhelper): + """All non-data resources are removed from old state.""" + old_tfhelper.state_list.return_value = [ + "data.juju_model.machine_model", + "juju_application.cinder-volume", + "juju_application.cinder-volume-ceph", + "juju_offer.storage-backend-offer", + "juju_integration.cinder-volume-identity[0]", + "juju_integration.cinder-volume-amqp[0]", + "juju_integration.cinder-volume-database[0]", + "juju_integration.cinder-volume-ceph-to-cinder-volume", + "juju_integration.cinder-volume-ceph-to-ceph[0]", + ] + + step._clear_old_state() + + # data source should NOT be removed + expected_removals = [ + "juju_application.cinder-volume", + "juju_application.cinder-volume-ceph", + "juju_offer.storage-backend-offer", + "juju_integration.cinder-volume-identity[0]", + "juju_integration.cinder-volume-amqp[0]", + "juju_integration.cinder-volume-database[0]", + "juju_integration.cinder-volume-ceph-to-cinder-volume", + "juju_integration.cinder-volume-ceph-to-ceph[0]", + ] + assert old_tfhelper.state_rm.call_count == len(expected_removals) + for resource in expected_removals: + old_tfhelper.state_rm.assert_any_call(resource) + + def test_data_sources_are_skipped(self, step, old_tfhelper): + """Data sources are not removed from state.""" + old_tfhelper.state_list.return_value = [ + "data.juju_model.machine_model", + ] + + step._clear_old_state() + + old_tfhelper.state_rm.assert_not_called() + + +class TestMigrateCinderVolumeRegisterBackend: + """Tests for _register_internal_ceph_backend.""" + + @patch(f"{_MODULE}.ceph_replica_scale", return_value=3) + 
@patch(f"{_MODULE}.update_config") + def test_registers_new_backend(self, mock_update_config, mock_scale, step, client): + """Backend is registered via add_storage_backend when not present.""" + client.cluster.get_storage_backend.side_effect = ( + StorageBackendNotFoundException("not found") + ) + + step._register_internal_ceph_backend("test-model-uuid") + + client.cluster.add_storage_backend.assert_called_once() + call_kwargs = client.cluster.add_storage_backend.call_args[1] + assert call_kwargs["name"] == INTERNAL_CEPH_BACKEND_NAME + assert call_kwargs["backend_type"] == "internal-ceph" + assert call_kwargs["model_uuid"] == "test-model-uuid" + + @patch(f"{_MODULE}.ceph_replica_scale", return_value=3) + @patch(f"{_MODULE}.update_config") + def test_updates_existing_backend( + self, mock_update_config, mock_scale, step, client + ): + """Backend is updated via update_storage_backend when already present.""" + client.cluster.get_storage_backend.return_value = Mock() + + step._register_internal_ceph_backend("test-model-uuid") + + client.cluster.update_storage_backend.assert_called_once() + client.cluster.add_storage_backend.assert_not_called() + + +class TestBackfillCephFeatureStateStep: + """Tests for Ceph feature state backfill during refresh.""" + + @patch(f"{_MODULE}.is_internal_ceph_enabled", return_value=False) + def test_skips_when_internal_ceph_not_managed( + self, _mock_is_internal_ceph_enabled, deployment, client + ): + step = BackfillCephFeatureStateStep(deployment, client) + + result = step.run(Mock()) + + assert result.result_type == ResultType.COMPLETED + deployment.get_feature_manager.return_value.resolve_feature.assert_not_called() + + @patch(f"{_MODULE}.write_ceph_config") + @patch(f"{_MODULE}.load_ceph_config") + @patch(f"{_MODULE}.is_internal_ceph_enabled", return_value=True) + def test_backfills_ceph_feature_state( + self, + _mock_is_internal_ceph_enabled, + mock_load_ceph_config, + mock_write_ceph_config, + deployment, + client, + ): + from 
sunbeam.core.ceph import CephConfig + + mock_load_ceph_config.return_value = CephConfig(mode=None) + feature = Mock(spec=EnableDisableFeature) + deployment.get_feature_manager.return_value.resolve_feature.return_value = ( + feature + ) + + step = BackfillCephFeatureStateStep(deployment, client) + result = step.run(Mock()) + + assert result.result_type == ResultType.COMPLETED + feature.update_feature_info.assert_called_once_with( + client, + { + "enabled": "true", + "default_storage_reconciled": "true", + }, + ) + + @patch(f"{_MODULE}.write_ceph_config") + @patch(f"{_MODULE}.load_ceph_config") + @patch(f"{_MODULE}.is_internal_ceph_enabled", return_value=True) + def test_writes_ceph_mode_when_unset( + self, + _mock_is_internal_ceph_enabled, + mock_load_ceph_config, + mock_write_ceph_config, + deployment, + client, + ): + """An upgraded cluster with CephConfig.mode=None should get mode=MICROCEPH.""" + from sunbeam.core.ceph import CephConfig, CephDeploymentMode + + mock_load_ceph_config.return_value = CephConfig(mode=None) + + step = BackfillCephFeatureStateStep(deployment, client) + step.run(Mock()) + + mock_write_ceph_config.assert_called_once() + written_client, written_config = mock_write_ceph_config.call_args.args + assert written_client is client + assert written_config.mode == CephDeploymentMode.MICROCEPH + + @patch(f"{_MODULE}.write_ceph_config") + @patch(f"{_MODULE}.load_ceph_config") + @patch(f"{_MODULE}.is_internal_ceph_enabled", return_value=True) + def test_leaves_ceph_mode_when_already_set( + self, + _mock_is_internal_ceph_enabled, + mock_load_ceph_config, + mock_write_ceph_config, + deployment, + client, + ): + """Already-set CephConfig.mode must not be rewritten.""" + from sunbeam.core.ceph import CephConfig, CephDeploymentMode + + mock_load_ceph_config.return_value = CephConfig( + mode=CephDeploymentMode.MICROCEPH + ) + + step = BackfillCephFeatureStateStep(deployment, client) + step.run(Mock()) + + mock_write_ceph_config.assert_not_called() + + 
+class TestImportCephResourcesToStorageFrameworkStep: + """Tests for importing legacy Ceph resources into the storage plan.""" + + @pytest.fixture + def storage_tfhelper(self): + helper = Mock() + helper.state_list.return_value = [] + return helper + + @pytest.fixture + def import_step(self, deployment, client, storage_tfhelper, jhelper): + return ImportCephResourcesToStorageFrameworkStep( + deployment=deployment, + client=client, + tfhelper=storage_tfhelper, + jhelper=jhelper, + model="machines", + ) + + @patch(f"{_MODULE}.read_config") + def test_builds_imports_for_legacy_internal_ceph( + self, mock_read_config, import_step, client, jhelper + ): + """Legacy cinder-volume resources are imported into new module addresses.""" + tfvars = { + "cinder-volumes": { + "cinder-volume": { + "application_name": "cinder-volume", + "keystone-offer-url": "admin/openstack.keystone", + "amqp-offer-url": "admin/openstack.rabbitmq", + "database-offer-url": "admin/openstack.cinder-db", + "cert-distributor-offer-url": None, + } + }, + "backends": {"internal-ceph": {}}, + } + mock_read_config.return_value = { + "juju_integration.cinder-volume-identity[0]": "legacy-id-identity", + "juju_integration.cinder-volume-amqp[0]": "legacy-id-amqp", + "juju_integration.cinder-volume-database[0]": "legacy-id-database", + } + + imports = import_step._build_imports( + tfvars, + model_uuid="test-model-uuid", + model_name="admin/openstack", + ) + + assert imports == [ + ( + 'module.cinder-volume["cinder-volume"].juju_application.cinder-volume', + "test-model-uuid:cinder-volume", + ), + ( + 'module.cinder-volume["cinder-volume"].juju_offer.storage-backend-offer', + "admin/openstack.cinder-volume", + ), + ( + 'module.backends["internal-ceph"].juju_application.storage-backend', + "test-model-uuid:cinder-volume-ceph", + ), + ( + 'module.backends["internal-ceph"].juju_integration.storage-backend-to-cinder-volume', + "test-model-uuid:cinder-volume:cinder-volume:cinder-volume-ceph:cinder-volume", + ), + ( + 
'module.backends["internal-ceph"].juju_integration.backend-extra-integration["microceph-ceph"]', + "test-model-uuid:microceph:ceph:cinder-volume-ceph:ceph", + ), + ( + 'module.cinder-volume["cinder-volume"].juju_integration.cinder-volume-identity[0]', + "legacy-id-identity", + ), + ( + 'module.cinder-volume["cinder-volume"].juju_integration.cinder-volume-amqp[0]', + "legacy-id-amqp", + ), + ( + 'module.cinder-volume["cinder-volume"].juju_integration.cinder-volume-database[0]', + "legacy-id-database", + ), + ] + + @patch(f"{_MODULE}.read_config") + def test_run_imports_missing_resources( + self, + mock_read_config, + import_step, + storage_tfhelper, + jhelper, + ): + """Only missing resources are imported into the storage plan state.""" + + def _read_config(_client, key): + if key == STORAGE_BACKEND_TFVAR_CONFIG_KEY: + return { + "cinder-volumes": { + "cinder-volume": { + "application_name": "cinder-volume", + "keystone-offer-url": "admin/openstack.keystone", + "amqp-offer-url": "admin/openstack.rabbitmq", + "database-offer-url": "admin/openstack.cinder-db", + "cert-distributor-offer-url": None, + } + }, + "backends": {"internal-ceph": {}}, + } + if key == STORAGE_BACKEND_LEGACY_IMPORT_IDS_KEY: + return { + "juju_integration.cinder-volume-identity[0]": "legacy-id-identity" + } + raise ConfigItemNotFoundException("not found") + + mock_read_config.side_effect = _read_config + storage_tfhelper.state_list.return_value = [ + 'module.cinder-volume["cinder-volume"].juju_application.cinder-volume' + ] + + result = import_step.run(Mock()) + + assert result.result_type == ResultType.COMPLETED + storage_tfhelper.write_tfvars.assert_called_once() + imported_addresses = [ + call.args[0] for call in storage_tfhelper.import_resource.call_args_list + ] + assert ( + 'module.cinder-volume["cinder-volume"].juju_application.cinder-volume' + not in imported_addresses + ) + assert ( + 'module.backends["internal-ceph"].juju_application.storage-backend' + in imported_addresses + ) + + 
@patch(f"{_MODULE}.read_config") + def test_run_treats_missing_state_file_as_empty_state( + self, + mock_read_config, + import_step, + storage_tfhelper, + ): + """A clean storage-backend plan should import from an empty initial state.""" + + def _read_config(_client, key): + if key == STORAGE_BACKEND_TFVAR_CONFIG_KEY: + return { + "cinder-volumes": { + "cinder-volume": { + "application_name": "cinder-volume", + "keystone-offer-url": "admin/openstack.keystone", + "amqp-offer-url": "admin/openstack.rabbitmq", + "database-offer-url": "admin/openstack.cinder-db", + "cert-distributor-offer-url": None, + } + }, + "backends": { + "internal-ceph": { + "application_name": "cinder-volume-ceph", + } + }, + } + raise ConfigItemNotFoundException("not found") + + mock_read_config.side_effect = _read_config + storage_tfhelper.state_list.side_effect = TerraformException( + "No state file was found!" + ) + + result = import_step.run(Mock()) + + assert result.result_type == ResultType.COMPLETED + storage_tfhelper.write_tfvars.assert_called_once() + storage_tfhelper.import_resource.assert_called() + + @patch(f"{_MODULE}.read_config") + def test_skip_when_internal_ceph_backend_missing( + self, + mock_read_config, + import_step, + ): + """Nothing is imported when migration tfvars do not include internal-ceph.""" + mock_read_config.return_value = { + "cinder-volumes": { + "cinder-volume": { + "application_name": "cinder-volume", + "keystone-offer-url": "admin/openstack.keystone", + "amqp-offer-url": "admin/openstack.rabbitmq", + "database-offer-url": "admin/openstack.cinder-db", + "cert-distributor-offer-url": None, + } + }, + "backends": {}, + } + + result = import_step.is_skip(Mock()) + + assert result.result_type == ResultType.SKIPPED + + +class TestMigrateCinderVolumeBuildTfvars: + """Tests for _build_storage_tfvars.""" + + def test_builds_correct_structure(self, step): + """Built tfvars have the expected top-level keys and structure.""" + tfvars = 
step._build_storage_tfvars("test-model-uuid") + + assert tfvars["model"] == "test-model-uuid" + assert "cinder-volume" in tfvars["cinder-volumes"] + assert INTERNAL_CEPH_BACKEND_NAME in tfvars["backends"] + + cv_entry = tfvars["cinder-volumes"]["cinder-volume"] + assert cv_entry["application_name"] == "cinder-volume" + assert cv_entry["machine_ids"] == ["0", "1", "2"] + assert tfvars["backends"][INTERNAL_CEPH_BACKEND_NAME]["application_name"] == ( + "cinder-volume-ceph" + ) + assert tfvars["backends"][INTERNAL_CEPH_BACKEND_NAME]["units"] is None + + def test_normalizes_integer_machine_ids_to_strings(self, step, client): + """Legacy integer machine IDs are normalized to the new tfvars contract.""" + client.cluster.list_nodes_by_role.return_value = [ + {"machineid": 0}, + {"machineid": 2}, + {"machineid": 1}, + ] + + tfvars = step._build_storage_tfvars("test-model-uuid") + + assert tfvars["cinder-volumes"]["cinder-volume"]["machine_ids"] == [ + "0", + "1", + "2", + ] + + def test_includes_control_plane_offers(self, step): + """Control plane offer URLs are included in cinder-volume entry.""" + tfvars = step._build_storage_tfvars("test-model-uuid") + cv_entry = tfvars["cinder-volumes"]["cinder-volume"] + + assert cv_entry["keystone-offer-url"] == "admin/openstack.keystone" + assert cv_entry["database-offer-url"] == "admin/openstack.cinder-db" + assert cv_entry["amqp-offer-url"] == "admin/openstack.rabbitmq" + + def test_preserves_existing_backends_on_merge(self, step, client): + """Existing third-party backends are preserved during migration.""" + pure_backend = {"charm_name": "cinder-purestorage", "units": 1} + pure_principal = { + "application_name": "cinder-volume", + "machine_ids": ["0", "1"], + } + + def _get_config(key): + if key == STORAGE_BACKEND_TFVAR_CONFIG_KEY: + return json.dumps( + { + "model": "existing-uuid", + "backends": {"pure": pure_backend}, + "cinder-volumes": {"cinder-volume": pure_principal}, + } + ) + if key == "StorageBackendsEnabled": + return 
"[]" + raise ConfigItemNotFoundException("not found") + + client.cluster.get_config.side_effect = _get_config + + tfvars = step._build_storage_tfvars("test-model-uuid") + + # Existing backend is preserved + assert "pure" in tfvars["backends"] + assert tfvars["backends"]["pure"] == pure_backend + # Internal-ceph is added + assert INTERNAL_CEPH_BACKEND_NAME in tfvars["backends"] + # Existing HA principal is NOT overwritten + assert tfvars["cinder-volumes"]["cinder-volume"] == pure_principal + # Model preserved from existing config + assert tfvars["model"] == "existing-uuid" + + def test_creates_principal_when_only_noha_exists(self, step, client): + """HA principal is created when only non-HA exists (Hitachi scenario).""" + noha_principal = { + "application_name": "cinder-volume-noha", + "machine_ids": ["0"], + } + + def _get_config(key): + if key == STORAGE_BACKEND_TFVAR_CONFIG_KEY: + return json.dumps( + { + "model": "existing-uuid", + "backends": {"hitachi": {"charm_name": "cinder-hitachi"}}, + "cinder-volumes": {"cinder-volume-noha": noha_principal}, + } + ) + if key == "StorageBackendsEnabled": + return "[]" + raise ConfigItemNotFoundException("not found") + + client.cluster.get_config.side_effect = _get_config + + tfvars = step._build_storage_tfvars("test-model-uuid") + + # Existing non-HA principal is preserved + assert "cinder-volume-noha" in tfvars["cinder-volumes"] + assert tfvars["cinder-volumes"]["cinder-volume-noha"] == noha_principal + # HA principal is created for internal-ceph + assert "cinder-volume" in tfvars["cinder-volumes"] + assert tfvars["cinder-volumes"]["cinder-volume"]["machine_ids"] == [ + "0", + "1", + "2", + ] + # Both backends present + assert "hitachi" in tfvars["backends"] + assert INTERNAL_CEPH_BACKEND_NAME in tfvars["backends"] + + +class TestMigrateCinderVolumeRun: + """Tests for run method.""" + + @patch(f"{_MODULE}.update_config") + def test_run_succeeds(self, mock_update_config, step, old_tfhelper, client): + """Migration run 
completes successfully.""" + old_tfhelper.state_list.return_value = [ + "juju_application.cinder-volume", + "juju_application.cinder-volume-ceph", + ] + old_tfhelper.pull_state.return_value = { + "resources": [ + { + "mode": "managed", + "type": "juju_application", + "name": "cinder-volume", + "instances": [{"attributes": {"id": "legacy-app-id"}}], + } + ] + } + client.cluster.get_storage_backend.side_effect = ( + StorageBackendNotFoundException("not found") + ) + + result = step.run(Mock()) + + assert result.result_type == ResultType.COMPLETED + # Old state was cleared + assert old_tfhelper.state_rm.call_count == 2 + # Backend was registered + client.cluster.add_storage_backend.assert_called_once() + # Tfvars were saved + mock_update_config.assert_called() + mock_update_config.assert_any_call( + client, + STORAGE_BACKEND_LEGACY_IMPORT_IDS_KEY, + {"juju_application.cinder-volume": "legacy-app-id"}, + ) + + def test_run_fails_on_state_clear_error(self, step, old_tfhelper): + """Run fails when clearing old state raises TerraformException.""" + old_tfhelper.state_list.return_value = [ + "juju_application.cinder-volume", + ] + old_tfhelper.state_rm.side_effect = TerraformException("state rm failed") + + result = step.run(Mock()) + + assert result.result_type == ResultType.FAILED + assert "terraform state" in result.message.lower() + + @patch(f"{_MODULE}.update_config") + def test_run_fails_on_backend_registration_error( + self, mock_update_config, step, old_tfhelper, client + ): + """Run fails when backend registration raises an exception.""" + old_tfhelper.state_list.return_value = [ + "juju_application.cinder-volume", + ] + client.cluster.get_storage_backend.side_effect = ( + StorageBackendNotFoundException("not found") + ) + client.cluster.add_storage_backend.side_effect = RuntimeError("api error") + + result = step.run(Mock()) + + assert result.result_type == ResultType.FAILED + assert "internal-ceph" in result.message.lower() + + @patch(f"{_MODULE}.update_config") 
+ def test_run_does_not_clear_old_state_when_registration_fails( + self, mock_update_config, step, old_tfhelper, client + ): + """Partial failure leaves old state intact for safe retry. + + Regression test for a retry wedge where state was cleared first: + on retry, is_skip saw no resources and returned SKIPPED, leaving + the cluster half-migrated. + """ + old_tfhelper.state_list.return_value = [ + "juju_application.cinder-volume", + ] + client.cluster.get_storage_backend.side_effect = ( + StorageBackendNotFoundException("not found") + ) + client.cluster.add_storage_backend.side_effect = RuntimeError("api error") + + result = step.run(Mock()) + + assert result.result_type == ResultType.FAILED + # The old state must remain intact so the retry can re-run. + old_tfhelper.state_rm.assert_not_called() + + @patch(f"{_MODULE}.update_config") + def test_run_clears_old_state_only_after_clusterd_writes_succeed( + self, mock_update_config, step, old_tfhelper, client + ): + """Ordering invariant: old state is cleared last.""" + old_tfhelper.state_list.return_value = [ + "juju_application.cinder-volume", + ] + old_tfhelper.pull_state.return_value = {"resources": []} + client.cluster.get_storage_backend.side_effect = ( + StorageBackendNotFoundException("not found") + ) + + call_order: list[str] = [] + + def _track_add_storage(**kwargs): + call_order.append("register") + + def _track_state_rm(resource): + call_order.append("state_rm") + + client.cluster.add_storage_backend.side_effect = _track_add_storage + old_tfhelper.state_rm.side_effect = _track_state_rm + + def _track_update_config(client_arg, key, value): + if key == STORAGE_BACKEND_TFVAR_CONFIG_KEY: + call_order.append("build_tfvars") + + mock_update_config.side_effect = _track_update_config + + result = step.run(Mock()) + + assert result.result_type == ResultType.COMPLETED + # register happens before build, build before state_rm + assert call_order.index("register") < call_order.index("build_tfvars") + assert 
call_order.index("build_tfvars") < call_order.index("state_rm") + + +class TestImportCephResourcesCleanup: + """Tests for legacy-import-id cleanup after successful import.""" + + @pytest.fixture + def storage_tfhelper(self): + helper = Mock() + helper.state_list.return_value = [] + return helper + + @pytest.fixture + def import_step(self, deployment, client, storage_tfhelper, jhelper): + return ImportCephResourcesToStorageFrameworkStep( + deployment=deployment, + client=client, + tfhelper=storage_tfhelper, + jhelper=jhelper, + model="machines", + ) + + @patch(f"{_MODULE}.read_config") + def test_run_clears_legacy_import_ids_after_success( + self, + mock_read_config, + import_step, + client, + storage_tfhelper, + ): + """The legacy-id map must be removed from clusterd once imports succeed.""" + + def _read_config(_client, key): + if key == STORAGE_BACKEND_TFVAR_CONFIG_KEY: + return { + "cinder-volumes": { + "cinder-volume": { + "application_name": "cinder-volume", + "keystone-offer-url": "admin/openstack.keystone", + "amqp-offer-url": "admin/openstack.rabbitmq", + "database-offer-url": "admin/openstack.cinder-db", + "cert-distributor-offer-url": None, + } + }, + "backends": {"internal-ceph": {}}, + } + if key == STORAGE_BACKEND_LEGACY_IMPORT_IDS_KEY: + return { + "juju_integration.cinder-volume-identity[0]": "legacy-id-identity" + } + raise ConfigItemNotFoundException("not found") + + mock_read_config.side_effect = _read_config + + result = import_step.run(Mock()) + + assert result.result_type == ResultType.COMPLETED + client.cluster.delete_config.assert_called_once_with( + STORAGE_BACKEND_LEGACY_IMPORT_IDS_KEY + ) + + @patch(f"{_MODULE}.read_config") + def test_run_tolerates_missing_legacy_import_ids_key_on_cleanup( + self, + mock_read_config, + import_step, + client, + storage_tfhelper, + ): + """Cleanup must not fail if the legacy-id key is already gone.""" + + def _read_config(_client, key): + if key == STORAGE_BACKEND_TFVAR_CONFIG_KEY: + return { + 
"cinder-volumes": { + "cinder-volume": { + "application_name": "cinder-volume", + "keystone-offer-url": "admin/openstack.keystone", + "amqp-offer-url": "admin/openstack.rabbitmq", + "database-offer-url": "admin/openstack.cinder-db", + "cert-distributor-offer-url": None, + } + }, + "backends": {"internal-ceph": {}}, + } + raise ConfigItemNotFoundException("not found") + + mock_read_config.side_effect = _read_config + client.cluster.delete_config.side_effect = ConfigItemNotFoundException( + "not found" + ) + + result = import_step.run(Mock()) + + assert result.result_type == ResultType.COMPLETED diff --git a/sunbeam-python/tests/unit/sunbeam/storage/backends/test_internal_ceph.py b/sunbeam-python/tests/unit/sunbeam/storage/backends/test_internal_ceph.py new file mode 100644 index 000000000..a99ec77a8 --- /dev/null +++ b/sunbeam-python/tests/unit/sunbeam/storage/backends/test_internal_ceph.py @@ -0,0 +1,162 @@ +# SPDX-FileCopyrightText: 2025 - Canonical Ltd +# SPDX-License-Identifier: Apache-2.0 + +"""Tests for internal-ceph storage backend.""" + +from unittest.mock import MagicMock + +import pytest + +from sunbeam.core.manifest import StorageBackendConfig +from sunbeam.storage.backends.internal_ceph.backend import ( + InternalCephBackend, + InternalCephConfig, +) +from sunbeam.storage.base import ( + BackendIntegration, + HypervisorIntegration, + StorageBackendBase, +) + + +@pytest.fixture +def internal_ceph_backend(): + """Provide an InternalCephBackend instance.""" + return InternalCephBackend() + + +@pytest.fixture +def mock_deployment(): + """Provide a mock deployment for endpoint binding tests.""" + deployment = MagicMock() + deployment.get_space.side_effect = lambda net: f"space-{net.value}" + return deployment + + +class TestInternalCephBackendAttributes: + """Tests for InternalCephBackend class attributes and properties.""" + + def test_backend_type(self, internal_ceph_backend): + """Test that backend_type is 'internal-ceph'.""" + assert 
internal_ceph_backend.backend_type == "internal-ceph" + + def test_display_name(self, internal_ceph_backend): + """Test that display_name is set.""" + assert internal_ceph_backend.display_name == "Internal Ceph" + + def test_generally_available(self, internal_ceph_backend): + """Test that generally_available is True.""" + assert internal_ceph_backend.generally_available is True + + def test_is_storage_backend_base(self, internal_ceph_backend): + """Test that backend inherits from StorageBackendBase.""" + assert isinstance(internal_ceph_backend, StorageBackendBase) + + def test_charm_name(self, internal_ceph_backend): + """Test that charm_name is 'cinder-volume-ceph'.""" + assert internal_ceph_backend.charm_name == "cinder-volume-ceph" + + def test_charm_channel(self, internal_ceph_backend): + """Test that charm_channel is set.""" + assert internal_ceph_backend.charm_channel == "2024.1/stable" + + def test_charm_base(self, internal_ceph_backend): + """Test that charm_base is ubuntu@24.04.""" + assert internal_ceph_backend.charm_base == "ubuntu@24.04" + + def test_supports_ha(self, internal_ceph_backend): + """Test that supports_ha is True.""" + assert internal_ceph_backend.supports_ha is True + + def test_principal_application(self, internal_ceph_backend): + """Test that principal_application is 'cinder-volume' (HA).""" + assert internal_ceph_backend.principal_application == "cinder-volume" + + def test_application_name(self, internal_ceph_backend): + """Test that the backend keeps the legacy subordinate app name.""" + assert internal_ceph_backend.get_application_name("internal-ceph") == ( + "cinder-volume-ceph" + ) + + def test_units(self, internal_ceph_backend): + """Test that internal Ceph is modeled as a subordinate app.""" + assert internal_ceph_backend.get_units() is None + + def test_config_type_returns_internal_ceph_config(self, internal_ceph_backend): + """Test that config_type() returns InternalCephConfig.""" + assert internal_ceph_backend.config_type() is 
InternalCephConfig + + def test_config_type_is_storage_backend_config_subclass( + self, internal_ceph_backend + ): + """Test that config_type() returns a StorageBackendConfig subclass.""" + config_class = internal_ceph_backend.config_type() + assert issubclass(config_class, StorageBackendConfig) + + +class TestInternalCephConfig: + """Tests for InternalCephConfig model.""" + + def test_default_replication_count(self): + """Test that default ceph_osd_replication_count is 1.""" + config = InternalCephConfig() + assert config.ceph_osd_replication_count == 1 + + def test_custom_replication_count(self): + """Test creating config with custom replication count.""" + config = InternalCephConfig.model_validate({"ceph-osd-replication-count": 3}) + assert config.ceph_osd_replication_count == 3 + + def test_config_is_pydantic_model(self): + """Test that InternalCephConfig is a Pydantic model.""" + from pydantic import BaseModel + + assert issubclass(InternalCephConfig, BaseModel) + + def test_config_is_storage_backend_config(self): + """Test that InternalCephConfig extends StorageBackendConfig.""" + assert issubclass(InternalCephConfig, StorageBackendConfig) + + +class TestInternalCephIntegrations: + """Tests for integration methods.""" + + def test_get_extra_integrations(self, internal_ceph_backend, mock_deployment): + """Test that get_extra_integrations returns microceph ceph integration.""" + integrations = internal_ceph_backend.get_extra_integrations(mock_deployment) + assert len(integrations) == 1 + + integration = next(iter(integrations)) + assert isinstance(integration, BackendIntegration) + assert integration.application_name == "microceph" + assert integration.endpoint_name == "ceph" + assert integration.backend_endpoint_name == "ceph" + + def test_get_hypervisor_integrations(self, internal_ceph_backend, mock_deployment): + """Test that get_hypervisor_integrations returns ceph-access integration.""" + integrations = internal_ceph_backend.get_hypervisor_integrations( + 
mock_deployment + ) + assert len(integrations) == 1 + + integration = next(iter(integrations)) + assert isinstance(integration, HypervisorIntegration) + assert integration.application_name == "cinder-volume-ceph" + assert integration.endpoint_name == "ceph-access" + assert integration.hypervisor_endpoint_name == "ceph-access" + + def test_get_endpoint_bindings(self, internal_ceph_backend, mock_deployment): + """Test endpoint bindings match the original cinder_volume_ceph.""" + bindings = internal_ceph_backend.get_endpoint_bindings(mock_deployment) + + # Should have default space, ceph-access, and ceph bindings + endpoints = {b.get("endpoint"): b.get("space") for b in bindings} + + # default space on MANAGEMENT + assert endpoints[None] == "space-management" + # ceph-access on MANAGEMENT space + assert endpoints["ceph-access"] == "space-management" + # ceph on STORAGE space + assert endpoints["ceph"] == "space-storage" + # No extra bindings + assert len(bindings) == 3 diff --git a/sunbeam-python/tests/unit/sunbeam/storage/test_base.py b/sunbeam-python/tests/unit/sunbeam/storage/test_base.py index 69ffe39f0..1df9ff11f 100644 --- a/sunbeam-python/tests/unit/sunbeam/storage/test_base.py +++ b/sunbeam-python/tests/unit/sunbeam/storage/test_base.py @@ -17,6 +17,8 @@ from sunbeam.storage.base import ( FQDN_PATTERN, JUJU_APP_NAME_PATTERN, + BackendIntegration, + HypervisorIntegration, validate_juju_application_name, ) @@ -291,6 +293,8 @@ def test_build_terraform_vars(self, backend, mock_deployment, mock_manifest): ) assert "principal_application" in tfvars + assert "units" in tfvars + assert tfvars["units"] == 1 assert tfvars["principal_application"] == backend.principal_application assert "charm_name" in tfvars assert tfvars["charm_name"] == backend.charm_name @@ -300,6 +304,51 @@ def test_build_terraform_vars(self, backend, mock_deployment, mock_manifest): assert "charm_config" in tfvars assert "secrets" in tfvars + def test_build_terraform_vars_extra_integrations_empty( 
+ self, backend, mock_deployment, mock_manifest + ): + """Test build_terraform_vars includes empty extra_integrations.""" + config = backend.config_type().model_validate( + { + "required-field": "test", + "secret-field": "secret123", + } + ) + + tfvars = backend.build_terraform_vars( + mock_deployment, mock_manifest, "test-backend", config + ) + + assert "extra_integrations" in tfvars + assert tfvars["extra_integrations"] == [] + + def test_build_terraform_vars_extra_integrations_non_empty( + self, backend, mock_deployment, mock_manifest + ): + """Test build_terraform_vars includes overridden extra_integrations.""" + integrations = { + BackendIntegration("microceph", "ceph", "ceph-access"), + BackendIntegration("vault", "secrets", "vault-kv"), + } + with patch.object(backend, "get_extra_integrations", return_value=integrations): + config = backend.config_type().model_validate( + { + "required-field": "test", + "secret-field": "secret123", + } + ) + + tfvars = backend.build_terraform_vars( + mock_deployment, mock_manifest, "test-backend", config + ) + + assert "extra_integrations" in tfvars + assert len(tfvars["extra_integrations"]) == 2 + # Convert to a set of frozensets for order-independent comparison + result_set = {frozenset(d.items()) for d in tfvars["extra_integrations"]} + expected_set = {frozenset(i.to_dict().items()) for i in integrations} + assert result_set == expected_set + def test_display_config_options(self, backend, mock_console): """Test display of configuration options.""" with patch("sunbeam.storage.base.console", mock_console): @@ -593,3 +642,98 @@ def test_enable_backend_add_to_existing_list(self, backend): mock_client.cluster.update_config.assert_called_once_with( "StorageBackendsEnabled", json.dumps(expected_backends) ) + + +class TestBackendIntegration: + def test_creation(self): + integration = BackendIntegration( + application_name="microceph", + endpoint_name="ceph", + backend_endpoint_name="ceph-access", + ) + assert 
integration.application_name == "microceph" + assert integration.endpoint_name == "ceph" + assert integration.backend_endpoint_name == "ceph-access" + + def test_frozen(self): + from dataclasses import FrozenInstanceError + + integration = BackendIntegration( + application_name="microceph", + endpoint_name="ceph", + backend_endpoint_name="ceph-access", + ) + with pytest.raises(FrozenInstanceError): + integration.application_name = "other" + + def test_hashable_in_set(self): + i1 = BackendIntegration("microceph", "ceph", "ceph-access") + i2 = BackendIntegration("microceph", "ceph", "ceph-access") + i3 = BackendIntegration("other", "ceph", "ceph-access") + s = {i1, i2, i3} + assert len(s) == 2 + + def test_to_dict(self): + integration = BackendIntegration("microceph", "ceph", "ceph-access") + d = integration.to_dict() + assert d == { + "application_name": "microceph", + "endpoint_name": "ceph", + "backend_endpoint_name": "ceph-access", + } + + +class TestHypervisorIntegration: + def test_creation(self): + integration = HypervisorIntegration( + application_name="cinder-volume-ceph", + endpoint_name="ceph-access", + hypervisor_endpoint_name="ceph-access", + ) + assert integration.application_name == "cinder-volume-ceph" + + def test_frozen(self): + from dataclasses import FrozenInstanceError + + integration = HypervisorIntegration( + application_name="cinder-volume-ceph", + endpoint_name="ceph-access", + hypervisor_endpoint_name="ceph-access", + ) + with pytest.raises(FrozenInstanceError): + integration.application_name = "other" + + def test_hashable_in_set(self): + i1 = HypervisorIntegration("cinder-volume-ceph", "ceph-access", "ceph-access") + i2 = HypervisorIntegration("cinder-volume-ceph", "ceph-access", "ceph-access") + s = {i1, i2} + assert len(s) == 1 + + def test_to_dict(self): + integration = HypervisorIntegration( + "cinder-volume-ceph", "ceph-access", "ceph-access" + ) + d = integration.to_dict() + assert d == { + "application_name": "cinder-volume-ceph", + 
"endpoint_name": "ceph-access", + "hypervisor_endpoint_name": "ceph-access", + } + + +class TestStorageBackendBaseIntegrationMethods: + """Tests for the default integration methods on StorageBackendBase.""" + + def test_get_extra_integrations_returns_empty_set( + self, mock_backend, mock_deployment + ): + result = mock_backend.get_extra_integrations(mock_deployment) + assert result == set() + assert isinstance(result, set) + + def test_get_hypervisor_integrations_returns_empty_set( + self, mock_backend, mock_deployment + ): + result = mock_backend.get_hypervisor_integrations(mock_deployment) + assert result == set() + assert isinstance(result, set) diff --git a/sunbeam-python/tests/unit/sunbeam/storage/test_manager.py b/sunbeam-python/tests/unit/sunbeam/storage/test_manager.py index c39479f04..cb4521e12 100644 --- a/sunbeam-python/tests/unit/sunbeam/storage/test_manager.py +++ b/sunbeam-python/tests/unit/sunbeam/storage/test_manager.py @@ -8,6 +8,7 @@ import click import pytest +from sunbeam.storage.base import HypervisorIntegration from sunbeam.storage.manager import StorageBackendManager from sunbeam.storage.models import StorageBackendInfo @@ -354,3 +355,175 @@ def test_show_backend_command(self, manager, mock_deployment): break assert show_command is not None + + +class TestCollectHypervisorIntegrations: + """Tests for StorageBackendManager.collect_hypervisor_integrations.""" + + @pytest.fixture + def mock_client(self): + """Create a mock clusterd client.""" + return Mock() + + @pytest.fixture + def mock_deployment(self): + """Create a mock deployment.""" + return Mock() + + @pytest.fixture + def manager_with_backends(self): + """Create a manager with controlled backends.""" + StorageBackendManager._backends = {} + StorageBackendManager._loaded = True + return StorageBackendManager() + + def test_returns_integrations_from_registered_backends( + self, manager_with_backends, mock_deployment, mock_client + ): + """Returns integrations from backends that have 
registered instances.""" + integration_a = HypervisorIntegration( + application_name="ceph-app", + endpoint_name="ceph-access", + hypervisor_endpoint_name="ceph", + ) + integration_b = HypervisorIntegration( + application_name="iscsi-app", + endpoint_name="iscsi-access", + hypervisor_endpoint_name="iscsi", + ) + + backend_ceph = Mock() + backend_ceph.get_hypervisor_integrations.return_value = {integration_a} + + backend_iscsi = Mock() + backend_iscsi.get_hypervisor_integrations.return_value = {integration_b} + + manager_with_backends._backends["ceph"] = backend_ceph + manager_with_backends._backends["iscsi"] = backend_iscsi + + # Simulate two registered backend instances, one ceph and one iscsi + registered_ceph = Mock() + registered_ceph.type = "ceph" + registered_iscsi = Mock() + registered_iscsi.type = "iscsi" + mock_client.cluster.get_storage_backends.return_value.root = [ + registered_ceph, + registered_iscsi, + ] + + result = manager_with_backends.collect_hypervisor_integrations( + mock_deployment, mock_client + ) + + assert result == {integration_a, integration_b} + backend_ceph.get_hypervisor_integrations.assert_called_once_with( + mock_deployment + ) + backend_iscsi.get_hypervisor_integrations.assert_called_once_with( + mock_deployment + ) + + def test_returns_empty_set_when_no_backends_registered( + self, manager_with_backends, mock_deployment, mock_client + ): + """Returns empty set when no backends are registered in clusterd.""" + mock_client.cluster.get_storage_backends.return_value.root = [] + + result = manager_with_backends.collect_hypervisor_integrations( + mock_deployment, mock_client + ) + + assert result == set() + + def test_skips_unknown_backend_type( + self, manager_with_backends, mock_deployment, mock_client + ): + """Skips backend types not loaded in the manager.""" + known_backend = Mock() + known_backend.get_hypervisor_integrations.return_value = set() + manager_with_backends._backends["known"] = known_backend + + registered_known = 
Mock() + registered_known.type = "known" + registered_unknown = Mock() + registered_unknown.type = "unknown-type" + mock_client.cluster.get_storage_backends.return_value.root = [ + registered_known, + registered_unknown, + ] + + result = manager_with_backends.collect_hypervisor_integrations( + mock_deployment, mock_client + ) + + assert result == set() + known_backend.get_hypervisor_integrations.assert_called_once_with( + mock_deployment + ) + + def test_deduplicates_backend_types( + self, manager_with_backends, mock_deployment, mock_client + ): + """Calls get_hypervisor_integrations once per type, multiple instances.""" + integration = HypervisorIntegration( + application_name="ceph-app", + endpoint_name="ceph-access", + hypervisor_endpoint_name="ceph", + ) + + backend = Mock() + backend.get_hypervisor_integrations.return_value = {integration} + manager_with_backends._backends["ceph"] = backend + + # Two instances of the same type + instance1 = Mock() + instance1.type = "ceph" + instance2 = Mock() + instance2.type = "ceph" + mock_client.cluster.get_storage_backends.return_value.root = [ + instance1, + instance2, + ] + + result = manager_with_backends.collect_hypervisor_integrations( + mock_deployment, mock_client + ) + + assert result == {integration} + backend.get_hypervisor_integrations.assert_called_once_with(mock_deployment) + + def test_unions_integrations_from_multiple_backends( + self, manager_with_backends, mock_deployment, mock_client + ): + """Returns the union of integrations from all matching backends.""" + shared = HypervisorIntegration( + application_name="shared-app", + endpoint_name="shared-ep", + hypervisor_endpoint_name="shared", + ) + unique = HypervisorIntegration( + application_name="unique-app", + endpoint_name="unique-ep", + hypervisor_endpoint_name="unique", + ) + + backend_a = Mock() + backend_a.get_hypervisor_integrations.return_value = {shared} + backend_b = Mock() + backend_b.get_hypervisor_integrations.return_value = {shared, unique} + 
+ manager_with_backends._backends["type-a"] = backend_a + manager_with_backends._backends["type-b"] = backend_b + + reg_a = Mock() + reg_a.type = "type-a" + reg_b = Mock() + reg_b.type = "type-b" + mock_client.cluster.get_storage_backends.return_value.root = [reg_a, reg_b] + + result = manager_with_backends.collect_hypervisor_integrations( + mock_deployment, mock_client + ) + + assert result == {shared, unique} + assert len(result) == 2 diff --git a/sunbeam-python/tests/unit/sunbeam/storage/test_steps.py b/sunbeam-python/tests/unit/sunbeam/storage/test_steps.py index dbc412f12..d312bfa45 100644 --- a/sunbeam-python/tests/unit/sunbeam/storage/test_steps.py +++ b/sunbeam-python/tests/unit/sunbeam/storage/test_steps.py @@ -7,10 +7,21 @@ import pydantic import pytest +from sunbeam.clusterd.service import ( + ConfigItemNotFoundException, + NodeNotExistInClusterException, +) +from sunbeam.core.common import ResultType +from sunbeam.core.juju import ApplicationNotFoundException from sunbeam.core.questions import PasswordPromptQuestion, PromptQuestion from sunbeam.storage.models import SecretDictField from sunbeam.storage.steps import ( + BaseStorageBackendDestroyStep, + CheckStorageNodeRemovalStep, DeploySpecificCinderVolumeStep, + DestroySpecificCinderVolumeStep, + RemoveStorageMachineUnitsStep, + ValidateStoragePrerequisitesStep, basemodel_validator, generate_questions_from_config, ) @@ -356,3 +367,951 @@ def test_run_extra_tfvars_precedence( ] is True ) + + +class TestDeploySpecificCinderVolumeStepIsSkip: + """Tests for DeploySpecificCinderVolumeStep.is_skip() lifecycle logic.""" + + @pytest.fixture + def ha_backend_instance(self): + """Mock HA storage backend instance.""" + backend = Mock() + backend.principal_application = "cinder-volume" + backend.supports_ha = True + backend.snap_name = "cinder-volume" + backend.tfvar_config_key = "TerraformVarsStorageBackends" + return backend + + @pytest.fixture + def noha_backend_instance(self): + """Mock non-HA storage backend 
instance.""" + backend = Mock() + backend.principal_application = "cinder-volume-noha" + backend.supports_ha = False + backend.snap_name = "cinder-volume_noha" + backend.tfvar_config_key = "TerraformVarsStorageBackends" + return backend + + @patch("sunbeam.storage.steps.read_config") + def test_ha_backend_does_not_skip_when_no_cinder_volume_entry( + self, + mock_read_config, + basic_deployment, + basic_client, + basic_tfhelper, + basic_jhelper, + basic_manifest, + test_model, + ha_backend_instance, + step_context, + ): + """HA backend should NOT skip deploy when no cinder-volume entry exists.""" + basic_client.cluster.list_nodes_by_role.return_value = [{"machineid": "0"}] + mock_read_config.return_value = {"backends": {}, "cinder-volumes": {}} + + step = DeploySpecificCinderVolumeStep( + basic_deployment, + basic_client, + basic_tfhelper, + basic_jhelper, + basic_manifest, + "test-backend", + ha_backend_instance, + test_model, + ) + + result = step.is_skip(step_context) + assert result.result_type == ResultType.COMPLETED + + @patch("sunbeam.storage.steps.read_config") + def test_ha_backend_does_not_skip_when_config_not_found( + self, + mock_read_config, + basic_deployment, + basic_client, + basic_tfhelper, + basic_jhelper, + basic_manifest, + test_model, + ha_backend_instance, + step_context, + ): + """HA backend should NOT skip deploy when config key doesn't exist yet.""" + basic_client.cluster.list_nodes_by_role.return_value = [{"machineid": "0"}] + mock_read_config.side_effect = ConfigItemNotFoundException("not found") + + step = DeploySpecificCinderVolumeStep( + basic_deployment, + basic_client, + basic_tfhelper, + basic_jhelper, + basic_manifest, + "test-backend", + ha_backend_instance, + test_model, + ) + + result = step.is_skip(step_context) + assert result.result_type == ResultType.COMPLETED + + def test_ha_backend_does_not_skip_when_cinder_volume_entry_exists( + self, + basic_deployment, + basic_client, + basic_tfhelper, + basic_jhelper, + basic_manifest, 
+ test_model, + ha_backend_instance, + step_context, + ): + """HA backend must not skip when a principal entry already exists. + + Regression test: previously the step skipped, which prevented + machine_ids from being refreshed when a new storage node joined. + """ + basic_client.cluster.list_nodes_by_role.return_value = [ + {"machineid": "0"}, + {"machineid": "1"}, + ] + + step = DeploySpecificCinderVolumeStep( + basic_deployment, + basic_client, + basic_tfhelper, + basic_jhelper, + basic_manifest, + "test-backend", + ha_backend_instance, + test_model, + ) + + result = step.is_skip(step_context) + assert result.result_type == ResultType.COMPLETED + + @patch("sunbeam.storage.steps.read_config") + def test_noha_backend_does_not_skip_when_no_cinder_volume_entry( + self, + mock_read_config, + basic_deployment, + basic_client, + basic_tfhelper, + basic_jhelper, + basic_manifest, + test_model, + noha_backend_instance, + step_context, + ): + """Non-HA backend should NOT skip when no cinder-volume entry exists.""" + basic_client.cluster.list_nodes_by_role.return_value = [{"machineid": "0"}] + mock_read_config.return_value = {"backends": {}, "cinder-volumes": {}} + + step = DeploySpecificCinderVolumeStep( + basic_deployment, + basic_client, + basic_tfhelper, + basic_jhelper, + basic_manifest, + "test-backend", + noha_backend_instance, + test_model, + ) + + result = step.is_skip(step_context) + assert result.result_type == ResultType.COMPLETED + + def test_deploy_fails_when_no_storage_nodes( + self, + basic_deployment, + basic_client, + basic_tfhelper, + basic_jhelper, + basic_manifest, + test_model, + ha_backend_instance, + step_context, + ): + """Deploy should fail when no storage nodes are found.""" + basic_client.cluster.list_nodes_by_role.return_value = [] + + step = DeploySpecificCinderVolumeStep( + basic_deployment, + basic_client, + basic_tfhelper, + basic_jhelper, + basic_manifest, + "test-backend", + ha_backend_instance, + test_model, + ) + + result = 
step.is_skip(step_context) + assert result.result_type == ResultType.FAILED + + +class TestDeploySpecificCinderVolumeStepRunMachineIds: + """Tests for DeploySpecificCinderVolumeStep.run() machine ID selection.""" + + @pytest.fixture + def ha_backend_instance(self): + backend = Mock() + backend.principal_application = "cinder-volume" + backend.supports_ha = True + backend.snap_name = "cinder-volume" + backend.tfvar_config_key = "TerraformVarsStorageBackends" + return backend + + @pytest.fixture + def noha_backend_instance(self): + backend = Mock() + backend.principal_application = "cinder-volume-noha" + backend.supports_ha = False + backend.snap_name = "cinder-volume_noha" + backend.tfvar_config_key = "TerraformVarsStorageBackends" + return backend + + @patch("sunbeam.storage.steps.read_config") + @patch("sunbeam.storage.steps.get_optional_control_plane_offers") + @patch("sunbeam.storage.steps.get_mandatory_control_plane_offers") + def test_ha_deploy_uses_all_storage_node_machine_ids( + self, + mock_get_offers, + mock_get_optional_offers, + mock_read_config, + basic_deployment, + basic_client, + basic_tfhelper, + basic_jhelper, + basic_manifest, + test_model, + ha_backend_instance, + step_context, + ): + """HA deploy should use all storage node machine IDs.""" + mock_read_config.return_value = {} + mock_get_offers.return_value = { + "keystone-offer-url": "keystone-url", + "amqp-offer-url": "amqp-url", + "database-offer-url": "database-url", + } + mock_get_optional_offers.return_value = {} + basic_client.cluster.list_nodes_by_role.return_value = [ + {"machineid": "0"}, + {"machineid": "1"}, + {"machineid": "2"}, + ] + basic_jhelper.get_model.return_value = {"model-uuid": "test-uuid"} + basic_jhelper.wait_application_ready = Mock() + basic_deployment.get_space.return_value = "test-space" + basic_deployment.get_tfhelper.return_value = Mock() + + feature_manager = Mock() + feature_manager.is_feature_enabled.return_value = False + 
basic_deployment.get_feature_manager.return_value = feature_manager + + mock_charm = Mock() + mock_charm.config = {} + mock_charm.channel = "2024.1/edge" + mock_charm.revision = 123 + basic_manifest.core.software.charms = {"cinder-volume": mock_charm} + + basic_tfhelper.update_tfvars_and_apply_tf = Mock() + + step = DeploySpecificCinderVolumeStep( + basic_deployment, + basic_client, + basic_tfhelper, + basic_jhelper, + basic_manifest, + "test-backend", + ha_backend_instance, + test_model, + ) + + step.run(step_context) + + call_args = basic_tfhelper.update_tfvars_and_apply_tf.call_args + tfvars = call_args[1]["override_tfvars"] + machine_ids = tfvars["cinder-volumes"]["cinder-volume"]["machine_ids"] + assert machine_ids == ["0", "1", "2"] + + @patch("sunbeam.storage.steps.read_config") + @patch("sunbeam.storage.steps.get_optional_control_plane_offers") + @patch("sunbeam.storage.steps.get_mandatory_control_plane_offers") + def test_noha_deploy_uses_first_node_only( + self, + mock_get_offers, + mock_get_optional_offers, + mock_read_config, + basic_deployment, + basic_client, + basic_tfhelper, + basic_jhelper, + basic_manifest, + test_model, + noha_backend_instance, + step_context, + ): + """Non-HA deploy should use first storage node only.""" + mock_read_config.return_value = {} + mock_get_offers.return_value = { + "keystone-offer-url": "keystone-url", + "amqp-offer-url": "amqp-url", + "database-offer-url": "database-url", + } + mock_get_optional_offers.return_value = {} + basic_client.cluster.list_nodes_by_role.return_value = [ + {"machineid": "0"}, + {"machineid": "1"}, + {"machineid": "2"}, + ] + basic_jhelper.get_model.return_value = {"model-uuid": "test-uuid"} + basic_jhelper.wait_application_ready = Mock() + basic_deployment.get_space.return_value = "test-space" + basic_deployment.get_tfhelper.return_value = Mock() + + feature_manager = Mock() + feature_manager.is_feature_enabled.return_value = False + basic_deployment.get_feature_manager.return_value = 
feature_manager + + mock_charm = Mock() + mock_charm.config = {} + mock_charm.channel = "2024.1/edge" + mock_charm.revision = 123 + basic_manifest.core.software.charms = {"cinder-volume": mock_charm} + + basic_tfhelper.update_tfvars_and_apply_tf = Mock() + + step = DeploySpecificCinderVolumeStep( + basic_deployment, + basic_client, + basic_tfhelper, + basic_jhelper, + basic_manifest, + "test-backend", + noha_backend_instance, + test_model, + ) + + step.run(step_context) + + call_args = basic_tfhelper.update_tfvars_and_apply_tf.call_args + tfvars = call_args[1]["override_tfvars"] + machine_ids = tfvars["cinder-volumes"]["cinder-volume-noha"]["machine_ids"] + assert machine_ids == ["0"] + + @patch("sunbeam.storage.steps.read_config") + @patch("sunbeam.storage.steps.get_optional_control_plane_offers") + @patch("sunbeam.storage.steps.get_mandatory_control_plane_offers") + def test_ha_deploy_refreshes_machine_ids_on_scale_out( + self, + mock_get_offers, + mock_get_optional_offers, + mock_read_config, + basic_deployment, + basic_client, + basic_tfhelper, + basic_jhelper, + basic_manifest, + test_model, + ha_backend_instance, + step_context, + ): + """Existing stale machine_ids must be overwritten on scale-out. + + Regression: the step previously skipped when a principal entry + existed, so a newly joined storage node never got a unit. 
+        """
+        mock_read_config.return_value = {
+            "model": "test-uuid",
+            "cinder-volumes": {
+                "cinder-volume": {
+                    "application_name": "cinder-volume",
+                    "machine_ids": ["0"],
+                }
+            },
+        }
+        mock_get_offers.return_value = {
+            "keystone-offer-url": "keystone-url",
+            "amqp-offer-url": "amqp-url",
+            "database-offer-url": "database-url",
+        }
+        mock_get_optional_offers.return_value = {}
+        basic_client.cluster.list_nodes_by_role.return_value = [
+            {"machineid": "0"},
+            {"machineid": "1"},
+            {"machineid": "2"},
+        ]
+        basic_jhelper.get_model.return_value = {"model-uuid": "test-uuid"}
+        basic_jhelper.wait_application_ready = Mock()
+        basic_deployment.get_space.return_value = "test-space"
+        basic_deployment.get_tfhelper.return_value = Mock()
+
+        feature_manager = Mock()
+        feature_manager.is_feature_enabled.return_value = False
+        basic_deployment.get_feature_manager.return_value = feature_manager
+
+        mock_charm = Mock()
+        mock_charm.config = {}
+        mock_charm.channel = "2024.1/edge"
+        mock_charm.revision = 123
+        basic_manifest.core.software.charms = {"cinder-volume": mock_charm}
+
+        basic_tfhelper.update_tfvars_and_apply_tf = Mock()
+
+        step = DeploySpecificCinderVolumeStep(
+            basic_deployment,
+            basic_client,
+            basic_tfhelper,
+            basic_jhelper,
+            basic_manifest,
+            "test-backend",
+            ha_backend_instance,
+            test_model,
+        )
+
+        step.run(step_context)
+
+        call_args = basic_tfhelper.update_tfvars_and_apply_tf.call_args
+        tfvars = call_args[1]["override_tfvars"]
+        machine_ids = tfvars["cinder-volumes"]["cinder-volume"]["machine_ids"]
+        assert machine_ids == ["0", "1", "2"]
+
+
+class TestDestroySpecificCinderVolumeStepIsSkip:
+    """Tests for DestroySpecificCinderVolumeStep.is_skip() lifecycle logic."""
+
+    @pytest.fixture
+    def ha_backend_instance(self):
+        backend = Mock()
+        backend.principal_application = "cinder-volume"
+        backend.supports_ha = True
+        backend.tfvar_config_key = "TerraformVarsStorageBackends"
+        return backend
+
+    @pytest.fixture
+    def noha_backend_instance(self):
+        backend = Mock()
+        backend.principal_application = "cinder-volume-noha"
+        backend.supports_ha = False
+        backend.tfvar_config_key = "TerraformVarsStorageBackends"
+        return backend
+
+    @patch("sunbeam.storage.steps.read_config")
+    def test_destroy_skips_when_another_ha_backend_uses_same_principal(
+        self,
+        mock_read_config,
+        basic_deployment,
+        basic_client,
+        basic_tfhelper,
+        basic_jhelper,
+        basic_manifest,
+        test_model,
+        ha_backend_instance,
+        step_context,
+    ):
+        """Destroy should skip when another backend still uses the same principal."""
+        mock_read_config.return_value = {
+            "backends": {
+                "backend-a": {"principal_application": "cinder-volume"},
+                "backend-b": {"principal_application": "cinder-volume"},
+            },
+            "cinder-volumes": {"cinder-volume": {"application_name": "cinder-volume"}},
+        }
+
+        step = DestroySpecificCinderVolumeStep(
+            basic_deployment,
+            basic_client,
+            basic_tfhelper,
+            basic_jhelper,
+            basic_manifest,
+            "backend-a",
+            ha_backend_instance,
+            test_model,
+        )
+
+        result = step.is_skip(step_context)
+        assert result.result_type == ResultType.SKIPPED
+
+    @patch("sunbeam.storage.steps.read_config")
+    def test_destroy_proceeds_when_no_other_backend_uses_principal(
+        self,
+        mock_read_config,
+        basic_deployment,
+        basic_client,
+        basic_tfhelper,
+        basic_jhelper,
+        basic_manifest,
+        test_model,
+        ha_backend_instance,
+        step_context,
+    ):
+        """Destroy should proceed when no other backend uses the principal."""
+        mock_read_config.return_value = {
+            "backends": {
+                "backend-a": {"principal_application": "cinder-volume"},
+            },
+            "cinder-volumes": {"cinder-volume": {"application_name": "cinder-volume"}},
+        }
+
+        step = DestroySpecificCinderVolumeStep(
+            basic_deployment,
+            basic_client,
+            basic_tfhelper,
+            basic_jhelper,
+            basic_manifest,
+            "backend-a",
+            ha_backend_instance,
+            test_model,
+        )
+
+        result = step.is_skip(step_context)
+        assert result.result_type == ResultType.COMPLETED
+
+    @patch("sunbeam.storage.steps.read_config")
+    def test_destroy_skips_when_principal_entry_not_in_cinder_volumes(
+        self,
+        mock_read_config,
+        basic_deployment,
+        basic_client,
+        basic_tfhelper,
+        basic_jhelper,
+        basic_manifest,
+        test_model,
+        ha_backend_instance,
+        step_context,
+    ):
+        """Destroy should skip when principal entry doesn't exist in tfvars."""
+        mock_read_config.return_value = {
+            "backends": {
+                "backend-a": {"principal_application": "cinder-volume"},
+            },
+            "cinder-volumes": {},
+        }
+
+        step = DestroySpecificCinderVolumeStep(
+            basic_deployment,
+            basic_client,
+            basic_tfhelper,
+            basic_jhelper,
+            basic_manifest,
+            "backend-a",
+            ha_backend_instance,
+            test_model,
+        )
+
+        result = step.is_skip(step_context)
+        assert result.result_type == ResultType.SKIPPED
+
+    @patch("sunbeam.storage.steps.read_config")
+    def test_destroy_skips_when_config_not_found(
+        self,
+        mock_read_config,
+        basic_deployment,
+        basic_client,
+        basic_tfhelper,
+        basic_jhelper,
+        basic_manifest,
+        test_model,
+        ha_backend_instance,
+        step_context,
+    ):
+        """Destroy should skip when config doesn't exist (nothing deployed)."""
+        mock_read_config.side_effect = ConfigItemNotFoundException("not found")
+
+        step = DestroySpecificCinderVolumeStep(
+            basic_deployment,
+            basic_client,
+            basic_tfhelper,
+            basic_jhelper,
+            basic_manifest,
+            "backend-a",
+            ha_backend_instance,
+            test_model,
+        )
+
+        result = step.is_skip(step_context)
+        assert result.result_type == ResultType.SKIPPED
+
+    @patch("sunbeam.storage.steps.read_config")
+    def test_destroy_noha_proceeds_when_only_backend(
+        self,
+        mock_read_config,
+        basic_deployment,
+        basic_client,
+        basic_tfhelper,
+        basic_jhelper,
+        basic_manifest,
+        test_model,
+        noha_backend_instance,
+        step_context,
+    ):
+        """Non-HA destroy should proceed when it is the only backend."""
+        mock_read_config.return_value = {
+            "backends": {
+                "backend-noha": {"principal_application": "cinder-volume-noha"},
+            },
+            "cinder-volumes": {
+                "cinder-volume-noha": {"application_name": "cinder-volume-noha"}
+            },
+        }
+
+        step = DestroySpecificCinderVolumeStep(
+            basic_deployment,
+            basic_client,
+            basic_tfhelper,
+            basic_jhelper,
+            basic_manifest,
+            "backend-noha",
+            noha_backend_instance,
+            test_model,
+        )
+
+        result = step.is_skip(step_context)
+        assert result.result_type == ResultType.COMPLETED
+
+
+class TestBaseStorageBackendDestroyStepRun:
+    """Tests for BaseStorageBackendDestroyStep.run() idempotency."""
+
+    @pytest.fixture
+    def backend_instance(self):
+        backend = Mock()
+        backend.display_name = "Test Backend"
+        backend.tfvar_config_key = "TerraformVarsStorageBackends"
+        backend.config_key = Mock(return_value="Storage-backend-a")
+        return backend
+
+    def _make_step(
+        self,
+        basic_deployment,
+        basic_client,
+        basic_tfhelper,
+        basic_jhelper,
+        basic_manifest,
+        backend_instance,
+        test_model,
+    ):
+        return BaseStorageBackendDestroyStep(
+            basic_deployment,
+            basic_client,
+            basic_tfhelper,
+            basic_jhelper,
+            basic_manifest,
+            "backend-a",
+            backend_instance,
+            test_model,
+        )
+
+    @patch("sunbeam.storage.steps.update_config")
+    @patch("sunbeam.storage.steps.read_config")
+    def test_run_does_not_raise_when_tfvars_config_missing(
+        self,
+        mock_read_config,
+        mock_update_config,
+        basic_deployment,
+        basic_client,
+        basic_tfhelper,
+        basic_jhelper,
+        basic_manifest,
+        backend_instance,
+        test_model,
+        step_context,
+    ):
+        """Regression test for KeyError when tfvars is {} from a missing config.
+
+        Previously the destroy step indexed tfvars['backends'] directly,
+        raising KeyError when ConfigItemNotFoundException set tfvars = {}.
+        The step must now tolerate a missing config item and return
+        COMPLETED after a clean terraform apply.
+        """
+        mock_read_config.side_effect = ConfigItemNotFoundException("not found")
+        basic_tfhelper.update_tfvars_and_apply_tf.return_value = None
+        basic_client.cluster.delete_storage_backend = Mock()
+        basic_client.cluster.delete_config = Mock()
+
+        step = self._make_step(
+            basic_deployment,
+            basic_client,
+            basic_tfhelper,
+            basic_jhelper,
+            basic_manifest,
+            backend_instance,
+            test_model,
+        )
+        result = step.run(step_context)
+
+        assert result.result_type == ResultType.COMPLETED
+        assert basic_tfhelper.update_tfvars_and_apply_tf.called
+        applied_tfvars = basic_tfhelper.update_tfvars_and_apply_tf.call_args[1][
+            "override_tfvars"
+        ]
+        assert applied_tfvars.get("backends") == {}
+
+
+class TestValidateStoragePrerequisitesStep:
+    """Tests for ValidateStoragePrerequisitesStep."""
+
+    @pytest.fixture
+    def validate_step(self, basic_deployment, basic_client, basic_jhelper):
+        """Create ValidateStoragePrerequisitesStep instance for testing."""
+        basic_deployment.openstack_machines_model = "openstack-machines"
+        return ValidateStoragePrerequisitesStep(
+            basic_deployment,
+            basic_client,
+            basic_jhelper,
+        )
+
+    def test_succeeds_without_cinder_volume_app(
+        self,
+        validate_step,
+        basic_client,
+        basic_jhelper,
+        step_context,
+    ):
+        """Step should succeed when cinder-volume app does not exist.
+
+        As long as Juju auth, bootstrap, model, and storage nodes are OK,
+        the absence of a cinder-volume application must not cause failure.
+        """
+        # Juju auth succeeds
+        basic_jhelper.models.return_value = ["openstack-machines"]
+
+        # Sunbeam bootstrapped
+        basic_client.cluster.check_sunbeam_bootstrapped.return_value = True
+
+        # OpenStack model exists
+        basic_jhelper.model_exists.return_value = True
+
+        # Storage nodes are deployed
+        basic_client.cluster.list_nodes_by_role.return_value = [
+            {"machineid": "0"},
+            {"machineid": "1"},
+        ]
+
+        # No cinder-volume application present (this must NOT cause failure)
+        basic_jhelper.get_application.side_effect = Exception(
+            "application cinder-volume not found"
+        )
+
+        result = validate_step.run(step_context)
+        assert result.result_type == ResultType.COMPLETED
+
+    def test_fails_when_not_bootstrapped(
+        self,
+        validate_step,
+        basic_client,
+        basic_jhelper,
+        step_context,
+    ):
+        """Step should fail when Sunbeam is not bootstrapped."""
+        basic_jhelper.models.return_value = ["openstack-machines"]
+        basic_client.cluster.check_sunbeam_bootstrapped.return_value = False
+
+        result = validate_step.run(step_context)
+        assert result.result_type == ResultType.FAILED
+
+    def test_fails_when_no_storage_nodes(
+        self,
+        validate_step,
+        basic_client,
+        basic_jhelper,
+        step_context,
+    ):
+        """Step should fail when no storage nodes exist."""
+        basic_jhelper.models.return_value = ["openstack-machines"]
+        basic_client.cluster.check_sunbeam_bootstrapped.return_value = True
+        basic_jhelper.model_exists.return_value = True
+        basic_client.cluster.list_nodes_by_role.return_value = []
+
+        result = validate_step.run(step_context)
+        assert result.result_type == ResultType.FAILED
+
+    def test_fails_when_model_does_not_exist(
+        self,
+        validate_step,
+        basic_client,
+        basic_jhelper,
+        step_context,
+    ):
+        """Step should fail when OpenStack model does not exist."""
+        basic_jhelper.models.return_value = ["openstack-machines"]
+        basic_client.cluster.check_sunbeam_bootstrapped.return_value = True
+        basic_jhelper.model_exists.return_value = False
+
+        result = validate_step.run(step_context)
+        assert result.result_type == ResultType.FAILED
+
+
+class TestCheckStorageNodeRemovalStep:
+    """Tests for CheckStorageNodeRemovalStep."""
+
+    @pytest.fixture
+    def make_step(self, basic_client, basic_jhelper):
+        """Factory for creating CheckStorageNodeRemovalStep with defaults."""
+
+        def _make(force=False, node_name="node-0", model="openstack-machines"):
+            return CheckStorageNodeRemovalStep(
+                basic_client, node_name, basic_jhelper, model, force=force
+            )
+
+        return _make
+
+    def test_skips_for_non_storage_node(self, make_step, basic_client, step_context):
+        """Step should skip when the departing node is not a storage node."""
+        basic_client.cluster.get_node_info.return_value = {
+            "name": "node-0",
+            "role": "control",
+            "machineid": "0",
+        }
+
+        step = make_step()
+        result = step.is_skip(step_context)
+        assert result.result_type == ResultType.SKIPPED
+
+    def test_skips_when_node_not_found(self, make_step, basic_client, step_context):
+        """Step should skip when the node does not exist in the cluster."""
+        basic_client.cluster.get_node_info.side_effect = NodeNotExistInClusterException(
+            "not found"
+        )
+
+        step = make_step()
+        result = step.is_skip(step_context)
+        assert result.result_type == ResultType.SKIPPED
+
+    def test_skips_when_cinder_volume_not_deployed(
+        self, make_step, basic_client, basic_jhelper, step_context
+    ):
+        """Step should skip when cinder-volume app does not exist."""
+        basic_client.cluster.get_node_info.return_value = {
+            "name": "node-0",
+            "role": "storage",
+            "machineid": "0",
+        }
+        basic_jhelper.get_application.side_effect = ApplicationNotFoundException(
+            "not found"
+        )
+
+        step = make_step()
+        result = step.is_skip(step_context)
+        assert result.result_type == ResultType.SKIPPED
+
+    def test_skips_when_no_cinder_volume_unit_on_node(
+        self, make_step, basic_client, basic_jhelper, step_context
+    ):
+        """Step should skip when the node has no cinder-volume units."""
+        basic_client.cluster.get_node_info.return_value = {
+            "name": "node-0",
+            "role": "storage",
+            "machineid": "0",
+        }
+
+        # cinder-volume app exists but units are on different machines
+        mock_app = Mock()
+        mock_unit = Mock()
+        mock_unit.machine = "99"
+        mock_app.units = {"cinder-volume/0": mock_unit}
+        basic_jhelper.get_application.return_value = mock_app
+
+        step = make_step()
+        result = step.is_skip(step_context)
+        assert result.result_type == ResultType.SKIPPED
+
+    def test_proceeds_when_node_hosts_cinder_volume(
+        self, make_step, basic_client, basic_jhelper, step_context
+    ):
+        """Step should NOT skip when the node hosts a cinder-volume unit."""
+        basic_client.cluster.get_node_info.return_value = {
+            "name": "node-0",
+            "role": "storage",
+            "machineid": "0",
+        }
+
+        mock_app = Mock()
+        mock_unit = Mock()
+        mock_unit.machine = "0"
+        mock_app.units = {"cinder-volume/0": mock_unit}
+        basic_jhelper.get_application.return_value = mock_app
+
+        step = make_step()
+        result = step.is_skip(step_context)
+        assert result.result_type == ResultType.COMPLETED
+
+    def test_fails_when_last_storage_node_without_force(
+        self, make_step, basic_client, step_context
+    ):
+        """Removing the last storage node should fail without --force."""
+        basic_client.cluster.list_nodes_by_role.return_value = [
+            {"name": "node-0", "machineid": "0"}
+        ]
+
+        step = make_step(force=False)
+        result = step.run(step_context)
+        assert result.result_type == ResultType.FAILED
+        assert "Cannot remove the last storage node" in result.message
+
+    def test_succeeds_with_force_on_last_node(
+        self, make_step, basic_client, step_context
+    ):
+        """Removing the last storage node should succeed with --force."""
+        basic_client.cluster.list_nodes_by_role.return_value = [
+            {"name": "node-0", "machineid": "0"}
+        ]
+
+        step = make_step(force=True)
+        result = step.run(step_context)
+        assert result.result_type == ResultType.COMPLETED
+
+    def test_succeeds_when_multiple_storage_nodes(
+        self, make_step, basic_client, step_context
+    ):
+        """Removing a storage node should succeed when others remain."""
+        basic_client.cluster.list_nodes_by_role.return_value = [
+            {"name": "node-0", "machineid": "0"},
+            {"name": "node-1", "machineid": "1"},
+        ]
+
+        step = make_step(force=False)
+        result = step.run(step_context)
+        assert result.result_type == ResultType.COMPLETED
+
+
+class TestRemoveStorageMachineUnitsStep:
+    """Tests for RemoveStorageMachineUnitsStep."""
+
+    def test_inherits_remove_machine_units_step(self):
+        """Step should inherit from RemoveMachineUnitsStep."""
+        from sunbeam.core.steps import RemoveMachineUnitsStep
+
+        assert issubclass(RemoveStorageMachineUnitsStep, RemoveMachineUnitsStep)
+
+    def test_constructor_sets_application(self, basic_client, basic_jhelper):
+        """Step should target cinder-volume application."""
+        step = RemoveStorageMachineUnitsStep(
+            basic_client, "node-0", basic_jhelper, "openstack-machines"
+        )
+        assert step.application == "cinder-volume"
+
+    def test_unit_timeout(self, basic_client, basic_jhelper):
+        """Step should use 30-minute timeout."""
+        step = RemoveStorageMachineUnitsStep(
+            basic_client, "node-0", basic_jhelper, "openstack-machines"
+        )
+        assert step.get_unit_timeout() == 1800
+
+    def test_skips_when_cinder_volume_not_deployed(
+        self, basic_client, basic_jhelper, step_context
+    ):
+        """Step should skip when cinder-volume application does not exist."""
+        basic_client.cluster.list_nodes.return_value = [
+            {"name": "node-0", "machineid": "0"}
+        ]
+        basic_jhelper.get_application.side_effect = ApplicationNotFoundException(
+            "not found"
+        )
+
+        step = RemoveStorageMachineUnitsStep(
+            basic_client, "node-0", basic_jhelper, "openstack-machines"
+        )
+        result = step.is_skip(step_context)
+        assert result.result_type == ResultType.SKIPPED
diff --git a/sunbeam-python/tests/unit/sunbeam/test_terraform_configs.py b/sunbeam-python/tests/unit/sunbeam/test_terraform_configs.py
new file mode 100644
index 000000000..5e138bf2f
--- /dev/null
+++ b/sunbeam-python/tests/unit/sunbeam/test_terraform_configs.py
@@ -0,0 +1,29 @@
+# SPDX-FileCopyrightText: 2026 - Canonical Ltd
+# SPDX-License-Identifier: Apache-2.0
+
+from pathlib import Path
+
+REPO_ROOT = Path(__file__).parents[4]
+
+
+def test_hypervisor_ceph_relation_has_moved_block():
+    """The brownfield Ceph relation should move to the keyed generic resource."""
+    tf_file = REPO_ROOT / "cloud/etc/deploy-openstack-hypervisor/main.tf"
+    tf_config = tf_file.read_text()
+
+    assert "from = juju_integration.hypervisor-cinder-ceph[0]" in tf_config
+    assert (
+        "to = juju_integration.hypervisor-extra-integration"
+        '["cinder-volume-ceph-ceph-access"]'
+    ) in tf_config
+
+
+def test_storage_backend_secret_resources_have_moved_blocks():
+    """Existing backend secrets should move to their counted addresses."""
+    tf_file = REPO_ROOT / "cloud/etc/deploy-storage/modules/backend/main.tf"
+    tf_config = tf_file.read_text()
+
+    assert "from = juju_secret.secret" in tf_config
+    assert "to = juju_secret.secret[0]" in tf_config
+    assert "from = juju_access_secret.secret-access" in tf_config
+    assert "to = juju_access_secret.secret-access[0]" in tf_config
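The moved-block assertions in `test_terraform_configs.py` are plain substring checks, so `"from = juju_secret.secret" in tf_config` is also satisfied by a line such as `from = juju_secret.secret[0]`, and `from` and `to` are not checked as belonging to the same `moved` block. A stricter variant could extract each `moved` block and compare the `(from, to)` pair as a unit. The sketch below is illustrative only: `MOVED_BLOCK` and `moved_pairs` are hypothetical names, and the regex assumes the simple two-attribute layout with `from` written before `to`, no comments inside the block, and no braces inside the addresses.

```python
import re

# Hypothetical pattern for the simple layout (assumption, not the repo's code):
#   moved {
#     from = <address>
#     to   = <address>
#   }
# Assumes `from` precedes `to`, with no comments or braces inside the block.
MOVED_BLOCK = re.compile(
    r"moved\s*\{\s*"
    r"from\s*=\s*(?P<src>[^\n]+?)\s*\n\s*"
    r"to\s*=\s*(?P<dst>[^\n]+?)\s*\}"
)


def moved_pairs(tf_config: str) -> set[tuple[str, str]]:
    """Return the (from, to) addresses declared by each `moved` block."""
    return {(m["src"], m["dst"]) for m in MOVED_BLOCK.finditer(tf_config)}


SAMPLE = """
moved {
  from = juju_secret.secret
  to   = juju_secret.secret[0]
}

moved {
  from = juju_access_secret.secret-access
  to   = juju_access_secret.secret-access[0]
}
"""

# A block that only re-indexes an already-counted address: the substring
# check still passes, but the pair-based check does not find the
# un-indexed -> [0] move.
INDEXED_ONLY = """
moved {
  from = juju_secret.secret[0]
  to   = juju_secret.secret[1]
}
"""

assert ("juju_secret.secret", "juju_secret.secret[0]") in moved_pairs(SAMPLE)
assert "from = juju_secret.secret" in INDEXED_ONLY
assert ("juju_secret.secret", "juju_secret.secret[0]") not in moved_pairs(INDEXED_ONLY)
```

The `\s*=\s*` pieces also make the check insensitive to the column alignment that `terraform fmt` applies to consecutive attributes (`to   =`), which the literal `"to = ..."` substrings are not.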