Description
Decision Goal
Proceed with development of node-service with updates to existing services to support boot profiles.
Category
Architecture
Stakeholders / Affected Areas
No response
Decision Needed By
No response
Problem Statement
Summary
We will introduce a profile dimension across OpenCHAMI’s config/boot surfaces so admins can GitOps-manage multiple versions of cluster configuration (e.g., per-branch “profiles”) and have nodes request either the default or a named profile via ?profile=<name>.
We will implement this in phases by:
- extending metadata-service and boot-service to store/serve profile-scoped resources and to default cleanly when profile is absent,
- creating a node-service (node shim) that presents a coherent Node API over SMD + boot-service + metadata-service and owns NodeSet resolution, and
- adding ProfileBinding (+ optional Campaign) to decouple “config intent” from SMD group membership while preserving compatibility.
Problem statement
Today, SMD is the de facto source of node identity and (often) group membership, while metadata-service holds richer configuration context (templates, precedence, composition) and boot services hold boot parameters. SMD groups/partitions are useful inventory primitives, but using them as the primary “config grouping” creates friction when admins want:
- multiple versions of config (branch profiles),
- clear config precedence and composition,
- temporary or targeted overrides without mutating inventory group membership,
- a single “Node API” view that explains what a node will receive at boot.
SMD explicitly supports grouping/partitioning, and that’s valuable for inventory/tenancy, but it is not the same thing as configuration intent. ([OpenCHAMI]1)
Goals
- Add a profile selector to the boot/config request path:
  - no profile → serve default
  - ?profile=name → serve named profile (if present; fallback policy is configurable)
- Keep Git integration out of band: external GitOps tooling creates/updates profile-scoped resources via APIs.
- Provide a node-centric API surface (node-service) that composes:
  - inventory (SMD),
  - boot parameters (boot-service),
  - cloud-init metadata composition (metadata-service).
- Introduce a clean mechanism to bind nodes (or NodeSets) to profiles without relying on mutating SMD group membership for config purposes.
Non-goals (initially)
- Embedding Git operations (clone/pull/webhooks) into OpenCHAMI services.
- Full-featured health checks/agent acknowledgements for rollout gates (can be added later).
- Replacing SMD groups/partitions; they remain an inventory/tenancy primitive.
Background and current capabilities
- metadata-service is a Fabrica-based cloud-init metadata server that serves NoCloud(-Net) metadata endpoints, renders group configs from templates, and integrates with SMD (or mock clients). ([GitHub]2)
- OpenCHAMI’s cloud-init approach relies on group-based configuration composition (multiple groups contribute config to a node). ([OpenCHAMI]3)
- boot-service is Fabrica-based and provides a resource-oriented API, including legacy compatibility behavior under /boot/v1 intended to be BSS-like. ([GitHub]4)
- BSS exists to deliver per-node kernel/initrd/flags based on inventory needs. ([OpenCHAMI]5)
Proposed Solution
Core concept: Profile namespace
A profile name becomes a first-class key used by both metadata-service and boot-service.
- profile=default is implied when absent.
- Profile-scoped resources are stored and served as:
  - metadata-service: (profile, group), (profile, clusterdefaults), (profile, instanceinfo)
  - boot-service: (profile, bootprofile) and per-node “effective boot params” resolution
- Request-time selection:
  - metadata-service NoCloud(-Net): .../meta-data?profile=<name> (and same for user-data/network-config as applicable)
  - boot-service legacy endpoints and/or new resource endpoints: ...?profile=<name>
Default behavior requirement: if profile is not specified, return the default profile.
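The default rule above can be sketched as a small helper. This is a minimal illustration, not the services' actual handler code: the function name ProfileFromQuery and the choice to treat a malformed or empty query value like a missing parameter are assumptions.

```go
package main

import (
	"fmt"
	"net/url"
)

// DefaultProfile is the profile served when a request names none.
const DefaultProfile = "default"

// ProfileFromQuery returns the value of the "profile" query parameter,
// or DefaultProfile when the parameter is absent or empty.
// (Hypothetical helper; not part of any existing OpenCHAMI API.)
func ProfileFromQuery(rawQuery string) string {
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		// Assumption: a malformed query is treated like a missing parameter.
		return DefaultProfile
	}
	if p := values.Get("profile"); p != "" {
		return p
	}
	return DefaultProfile
}

func main() {
	fmt.Println(ProfileFromQuery(""))
	fmt.Println(ProfileFromQuery("profile=feature-x"))
}
```

Keeping the rule in one function would let metadata-service and boot-service share identical default behavior.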
Node-service (node shim)
A new Fabrica-based node-service presents a coherent Node API by composing:
- SMD inventory + labels/partitions/groups (inventory view) ([GitHub]6)
- metadata-service effective config (config view) ([GitHub]2)
- boot-service effective boot parameters (boot view) ([OpenCHAMI]7)
Node-service is not on the hot path for node boots; it is an admin/operator API.
Fabrica resources and ownership
1) Node (node-service)
A composed view:
- identity (xname), role/subrole, labels, inventory attributes (from SMD)
- effective profile (default or bound)
- effective config groups and rendered artifacts (via metadata-service)
- effective boot parameters (via boot-service)
2) NodeSet (node-service)
A reusable selector + resolution + (optional) leasing primitive.
- Inputs: label selector, explicit xnames, count/percent, partitions, etc.
- Output: resolved xname list and conflict/lease info.
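To make the inputs and output concrete, here is a rough sketch of NodeSet resolution over an in-memory inventory. The Node and NodeSet types, their field names, and the union semantics (explicit xnames plus label matches, then an optional count cap) are all assumptions for illustration; partitions, percentages, and leasing are left out.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a hypothetical, minimal inventory record.
type Node struct {
	Xname  string
	Labels map[string]string
}

// NodeSet is a hypothetical selector: explicit xnames, a label selector,
// and an optional cap on the number of resolved nodes.
type NodeSet struct {
	Xnames      []string
	MatchLabels map[string]string
	Count       int // 0 means no limit
}

// matches reports whether the node carries every label in the selector.
func matches(n Node, selector map[string]string) bool {
	for k, v := range selector {
		if n.Labels[k] != v {
			return false
		}
	}
	return true
}

// Resolve returns a sorted, de-duplicated xname list. Explicit xnames that
// are not in the inventory are dropped; an empty label selector matches
// nothing (assumption, so an empty NodeSet never selects every node).
func Resolve(ns NodeSet, inventory []Node) []string {
	known := make(map[string]bool, len(inventory))
	for _, n := range inventory {
		known[n.Xname] = true
	}
	selected := map[string]bool{}
	for _, x := range ns.Xnames {
		if known[x] {
			selected[x] = true
		}
	}
	if len(ns.MatchLabels) > 0 {
		for _, n := range inventory {
			if matches(n, ns.MatchLabels) {
				selected[n.Xname] = true
			}
		}
	}
	out := make([]string, 0, len(selected))
	for x := range selected {
		out = append(out, x)
	}
	sort.Strings(out) // sorting keeps resolution deterministic
	if ns.Count > 0 && len(out) > ns.Count {
		out = out[:ns.Count]
	}
	return out
}

func main() {
	inventory := []Node{
		{Xname: "x1000c0s0b0n0", Labels: map[string]string{"role": "compute"}},
		{Xname: "x1000c0s0b0n1", Labels: map[string]string{"role": "compute"}},
		{Xname: "x1000c0s1b0n0", Labels: map[string]string{"role": "login"}},
	}
	ns := NodeSet{MatchLabels: map[string]string{"role": "compute"}, Count: 1}
	fmt.Println(Resolve(ns, inventory))
}
```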
3) BootProfile (boot-service)
Authoritative boot configuration object(s), profile-scoped:
- kernel/initrd/kargs/image/script fragments as needed
- referenced by name from bindings/campaigns
boot-service remains the system-of-record for boot configuration. ([GitHub]4)
4) ChangeBundle (optional: separate service or external controller concept)
A packaging concept for GitOps tooling, not necessarily a runtime OpenCHAMI resource at first.
- A ChangeBundle maps to a set of concrete API writes into the backing OpenCHAMI services.
We can add a Fabrica ChangeBundle API later if we want OpenCHAMI-native validation/status tracking of multi-service changes.
5) ProfileBinding (recommended, split-write-through)
This is the key to addressing the “SMD groups feel wrong” concern without scope-creep:
Purpose: Bind nodes/NodeSets to a profile, and optionally to specific config/boot targets, without overloading SMD group membership.
- Exposed via node-service for operator UX.
- Materialized into metadata-service and boot-service storage so those services can resolve profile locally during requests (no runtime hop to node-service).
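The write-through idea can be sketched as follows. Everything here is hypothetical: the Binding type, the BindingSink interface, and the in-memory MemorySink stand in for whatever client and storage each backend ends up exposing. The sketch also assumes writes are idempotent and that a partial failure is handled by retrying rather than by rollback.

```go
package main

import "fmt"

// Binding is a hypothetical record tying a set of nodes to a profile.
type Binding struct {
	Name    string
	Profile string
	Xnames  []string
}

// BindingSink is a hypothetical interface each backend client would satisfy.
type BindingSink interface {
	PutBinding(b Binding) error
}

// MemorySink is an in-memory stand-in for a backend's binding storage.
type MemorySink struct {
	Service string
	Records map[string]Binding
}

// NewMemorySink returns an empty sink labeled with a service name.
func NewMemorySink(service string) *MemorySink {
	return &MemorySink{Service: service, Records: map[string]Binding{}}
}

// PutBinding stores the binding by name; repeating a write is harmless.
func (s *MemorySink) PutBinding(b Binding) error {
	s.Records[b.Name] = b
	return nil
}

// WriteThrough materializes one binding into every sink, stopping at the
// first failure. Assumption: the caller retries on error; no rollback.
func WriteThrough(b Binding, sinks ...BindingSink) error {
	for _, s := range sinks {
		if err := s.PutBinding(b); err != nil {
			return fmt.Errorf("materialize binding %q: %w", b.Name, err)
		}
	}
	return nil
}

func main() {
	meta := NewMemorySink("metadata-service")
	boot := NewMemorySink("boot-service")
	b := Binding{Name: "canary", Profile: "feature-x", Xnames: []string{"x1000c0s0b0n0"}}
	if err := WriteThrough(b, meta, boot); err != nil {
		fmt.Println("write-through failed:", err)
		return
	}
	fmt.Println(meta.Records["canary"].Profile, boot.Records["canary"].Profile)
}
```

Because each backend holds its own copy, a request handler can look up the binding locally without calling node-service.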
6) Campaign (node-service; phase-gated)
Rollout orchestration:
- canary/batches/concurrency/approval/expiration
- implementation writes ProfileBindings (and optionally boot profile refs) for allocated nodes
Campaign is optional in the earliest phase; the profile mechanism stands on its own.
Addressing SMD group discomfort: scope vs necessity
This is not scope-creep; it’s boundary clarification.
- SMD groups/partitions are valuable inventory/tenancy primitives. ([OpenCHAMI]1)
- metadata-service groups carry richer configuration semantics and composition (multiple groups contribute config). ([OpenCHAMI]3)
Proposal: keep SMD groups as “InventoryGroups,” but introduce ProfileBinding as the “ConfigIntent” mechanism. Node-service exposes both views so admins can see what’s inventory grouping vs what’s configuration composition.
Backward compatibility remains: metadata-service can continue to use SMD group membership for the default profile while we migrate to bindings.
Phased implementation plan
Phase 0: Document conventions and profile contract
- Define profile semantics and default fallback policy across services.
- Define naming constraints (DNS label-ish recommended) and max length.
- Define versioning approach for profile-scoped resources (etag/observedGeneration).
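As one possible reading of “DNS label-ish”, the naming constraint could be checked like this. The exact rules are assumptions pending the Phase 0 decision: lowercase letters, digits, and hyphens, starting and ending with a letter or digit, with the 63-character limit borrowed from DNS labels. The name ValidProfileName is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// profileNameRE encodes the assumed rule: 1 to 63 characters, lowercase
// alphanumerics or '-', beginning and ending with an alphanumeric.
var profileNameRE = regexp.MustCompile(`^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$`)

// ValidProfileName reports whether name satisfies the assumed constraint.
func ValidProfileName(name string) bool {
	return profileNameRE.MatchString(name)
}

func main() {
	for _, name := range []string{"default", "feature-x", "-bad", "Bad_Name", ""} {
		fmt.Printf("%q valid=%v\n", name, ValidProfileName(name))
	}
}
```

A DNS-label-style rule keeps profile names safe to embed in URL paths and query strings without escaping.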
Phase 1: metadata-service profile support
Changes
- Add optional profile query parameter to NoCloud(-Net) endpoints; default when absent. ([GitHub]2)
- Update admin APIs for groups/defaults/instance overrides to be profile-scoped:
  - either by URL prefix (/admin/profiles/{profile}/groups/...) or query/header
- Storage key includes profile.
- Maintain current behavior as profile=default.
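A profile-scoped storage key might look roughly like the sketch below. The Key and Store types are hypothetical and in-memory only; the point is that an empty profile normalizes to default, so pre-existing, unscoped records and new default-profile records land on the same key.

```go
package main

import "fmt"

// DefaultProfile is the profile implied when none is given.
const DefaultProfile = "default"

// Key is a hypothetical composite storage key: (profile, kind, name).
type Key struct {
	Profile string
	Kind    string // e.g. "group", "clusterdefaults", "instanceinfo"
	Name    string
}

// NewKey normalizes an empty profile to DefaultProfile.
func NewKey(profile, kind, name string) Key {
	if profile == "" {
		profile = DefaultProfile
	}
	return Key{Profile: profile, Kind: kind, Name: name}
}

// Store is an in-memory stand-in for the service's real storage backend.
type Store struct {
	items map[Key]string
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{items: map[Key]string{}}
}

// Put writes a value under the given key.
func (s *Store) Put(k Key, value string) {
	s.items[k] = value
}

// Get reads the value stored under the given key, if any.
func (s *Store) Get(k Key) (string, bool) {
	v, ok := s.items[k]
	return v, ok
}

func main() {
	s := NewStore()
	s.Put(NewKey("", "group", "compute"), "packages: [chrony]")
	s.Put(NewKey("feature-x", "group", "compute"), "packages: [chrony, htop]")

	v, _ := s.Get(NewKey("default", "group", "compute"))
	fmt.Println(v)
	v, _ = s.Get(NewKey("feature-x", "group", "compute"))
	fmt.Println(v)
}
```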
Acceptance criteria
- Existing clients with no profile continue to work unchanged.
- Profile-scoped config can be created and retrieved.
- Rendering works for multiple profiles.
Phase 2: boot-service profile support + BootProfile resource hardening
Changes
- Add optional profile to boot endpoints (legacy /boot/v1 and/or new resources), default when absent. ([OpenCHAMI]7)
- Ensure BootProfile resources are profile-scoped and can be referenced deterministically.
- Maintain legacy compatibility behavior (BSS-like) while adding profile dimension. ([OpenCHAMI]7)
Acceptance criteria
- Existing boot flows work without profile.
- Nodes can request boot params for a non-default profile.
Phase 3: Create node-service (node shim) with Node + NodeSet
Changes
- Implement node-service with:
  - Node composed view (SMD + metadata-service + boot-service)
  - NodeSet selectors and resolution
- NodeSet can initially resolve from SMD inventory attributes/labels/partitions. ([GitHub]6)
- Add leasing (optional) so NodeSets/Campaigns can allocate without conflict.
Acceptance criteria
- Admins can query “effective node state” (inventory + boot + config references) in one place.
- NodeSets resolve deterministically and provide explicit node lists.
Phase 4: ProfileBinding (decouple config intent from SMD groups)
Changes
- Add a ProfileBinding API in node-service:
  - bind Node or NodeSet → profile (+ optional bootprofile/config group overrides)
- Implement write-through materialization:
  - node-service writes binding records into metadata-service and boot-service so each can resolve locally at request time.
- Update metadata-service/boot-service request handlers:
  - profile query param wins if explicitly provided
  - else apply binding-derived profile
  - else default
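The handler precedence (explicit param, then binding, then default) is small enough to sketch directly. The function name EffectiveProfile and the representation of bindings as a map from xname to profile are assumptions; a real handler would read bindings from the service's local store.

```go
package main

import "fmt"

// DefaultProfile is the profile served when nothing else applies.
const DefaultProfile = "default"

// EffectiveProfile applies the precedence order:
// explicit query parameter > binding-derived profile > default.
// bindings maps xname to bound profile (hypothetical representation).
func EffectiveProfile(explicit, xname string, bindings map[string]string) string {
	if explicit != "" {
		return explicit
	}
	if bound, ok := bindings[xname]; ok && bound != "" {
		return bound
	}
	return DefaultProfile
}

func main() {
	bindings := map[string]string{"x1000c0s0b0n0": "feature-x"}

	fmt.Println(EffectiveProfile("hotfix", "x1000c0s0b0n0", bindings)) // explicit wins
	fmt.Println(EffectiveProfile("", "x1000c0s0b0n0", bindings))       // binding applies
	fmt.Println(EffectiveProfile("", "x1000c0s0b0n1", bindings))       // falls to default
}
```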
Acceptance criteria
- Admin can bind a NodeSet to a profile without changing SMD group membership.
- Nodes with no ?profile still get default unless bound by policy.
- Boot/config request path does not depend on node-service availability.
Phase 5: Campaign (optional but aligned with earlier work)
Changes
- Add Campaign resource in node-service that:
  - allocates nodes from NodeSet
  - applies ProfileBindings for those nodes
  - supports expiration (auto-revert), approval gates, batching
Acceptance criteria
- A canary rollout can bind 2 nodes to a profile, then expand to the rest of the NodeSet.
- Automatic expiration removes bindings and returns nodes to default.
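The canary-then-expand criterion could be planned along these lines. This sketch covers batching only; approval gates and expiration are out of scope here. PlanBatches and its parameters are hypothetical names, and the input is assumed to be an already-resolved, ordered xname list.

```go
package main

import "fmt"

// PlanBatches splits resolved xnames into an optional canary batch followed
// by fixed-size batches. A non-positive batchSize puts all remaining nodes
// into a single batch. (Hypothetical helper for illustration.)
func PlanBatches(xnames []string, canary, batchSize int) [][]string {
	var plan [][]string
	rest := xnames
	if canary > len(rest) {
		canary = len(rest)
	}
	if canary > 0 {
		plan = append(plan, rest[:canary])
		rest = rest[canary:]
	}
	if batchSize <= 0 {
		batchSize = len(rest)
	}
	for len(rest) > 0 {
		n := batchSize
		if n > len(rest) {
			n = len(rest)
		}
		plan = append(plan, rest[:n])
		rest = rest[n:]
	}
	return plan
}

func main() {
	nodes := []string{"n0", "n1", "n2", "n3", "n4", "n5", "n6"}
	for i, batch := range PlanBatches(nodes, 2, 3) {
		fmt.Printf("batch %d: %v\n", i, batch)
	}
}
```

A Campaign controller could then write ProfileBindings for one batch at a time, waiting for approval between batches.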
API sketch (illustrative, not final)
metadata-service
- GET /cloud-init/meta-data?profile=...
- GET /cloud-init/user-data?profile=...
- PUT /cloud-init/admin/profiles/{profile}/groups/{name}
- PUT /cloud-init/admin/profiles/{profile}/clusterdefaults
- (optional) binding materialization endpoints if metadata-service owns binding storage
boot-service
- GET /boot/v1/bootparameters?profile=... (legacy-compatible shape) ([OpenCHAMI]7)
- PUT /boot/admin/profiles/{profile}/bootprofiles/{name}
- (optional) binding materialization endpoints if boot-service owns binding storage
node-service
- GET /nodes/{xname} (composed view)
- PUT /nodesets/{name} (selector + policy)
- GET /nodesets/{name}/resolved
- PUT /profilebindings/{name} (bind Node/NodeSet → profile)
- (phase 5) PUT /campaigns/{name}
Alternatives Considered
No response
Other Considerations
Risks and mitigations
- Ambiguous precedence (explicit ?profile vs bindings vs default): define and document a strict order (explicit param > binding > default).
- Config duplication across profiles: GitOps controllers can manage; later add inheritance (inherits: default) if needed.
- Too much logic in node-service: keep node-service off hot paths; write-through to backends for runtime serving.
- Back-compat: default profile preserves current behavior; legacy /boot/v1 remains usable. ([OpenCHAMI]7)
Open questions (to resolve in implementation)
- Where should binding records physically live (metadata-service vs boot-service vs separate small binding store)? This RFD recommends materializing into both for local resolution.
- Do we want “unknown profile” to fall back to default or error (configurable per deployment)?
- Should SMD labels be the primary selector input for NodeSets, or should we introduce higher-level logical labels in node-service?
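If the “unknown profile” question is answered with a per-deployment setting, the choice could be modeled as a small policy type. This is only a sketch of one option: the names UnknownPolicy, FallBackToDefault, RejectUnknown, and SelectProfile are hypothetical, and the set of known profiles is represented as a plain map.

```go
package main

import (
	"errors"
	"fmt"
)

// DefaultProfile is the profile served when none is requested.
const DefaultProfile = "default"

// UnknownPolicy is a hypothetical per-deployment setting for requests
// that name a profile which does not exist.
type UnknownPolicy int

const (
	FallBackToDefault UnknownPolicy = iota // serve default instead
	RejectUnknown                          // return an error to the client
)

// ErrUnknownProfile is returned when the policy rejects an unknown profile.
var ErrUnknownProfile = errors.New("unknown profile")

// SelectProfile returns the profile to serve for a request.
// The default profile is assumed to always exist.
func SelectProfile(requested string, known map[string]bool, policy UnknownPolicy) (string, error) {
	if requested == "" || requested == DefaultProfile {
		return DefaultProfile, nil
	}
	if known[requested] {
		return requested, nil
	}
	if policy == RejectUnknown {
		return "", fmt.Errorf("%w: %q", ErrUnknownProfile, requested)
	}
	return DefaultProfile, nil
}

func main() {
	known := map[string]bool{"feature-x": true}

	p, err := SelectProfile("typo", known, FallBackToDefault)
	fmt.Println(p, err)

	_, err = SelectProfile("typo", known, RejectUnknown)
	fmt.Println(errors.Is(err, ErrUnknownProfile))
}
```

Falling back favors boot availability, while rejecting surfaces typos in profile names early; which is preferable likely depends on the deployment.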
Related Docs / PRs