Define an authoritative instance-type resource catalog (single source of truth for vCPU/memory) #137

Description

@scotwells

Problem

A Datum compute instance type (e.g. datumcloud/d1-standard-2) advertises a size, but the platform has no authoritative definition of how much vCPU and memory that type actually provides. Today the size is only implied by a name-to-machine-type mapping (infra-provider-gcp maps datumcloud/d1-standard-2 → GCP n2-standard-2); there is no explicit vCPU/memory declared anywhere a service or a human can read.

To make quota accounting and real instance sizing correct, the vCPU/memory values now have to live somewhere. As an interim step they were added in two places: the compute Instance controller (for the quota claim) and the unikraft-provider (for sizing the running Pod). Both hardcode the same numbers, and the only thing keeping them in sync is a comment, so they will drift.

Why it matters

  • A customer who selects d1-standard-2 should get exactly that much CPU/memory on their instance, and have exactly that much counted against their quota. Those two numbers must agree and must be correct — today they're two independent copies that only happen to match.
  • Adding or changing an instance type should be one reviewable definition, not N copies spread across compute and every infra provider. A missed edit silently mis-sizes instances or mis-charges quota — exactly the kind of bug that's invisible until a customer is over/under-billed or under-provisioned.
  • The "size" of an instance type is a core part of the product contract, yet it isn't defined anywhere authoritative.

Desired outcome

A single authoritative instance-type catalog — vCPU + memory (with room for future dimensions) keyed by instance type — defined once (e.g. exported from compute/api/v1alpha) and consumed by every place that needs it:

  • the quota claim path (compute Instance controller),
  • instance/Pod sizing (unikraft-provider, and any future runtime provider),
  • the infra-provider machine-type mapping (which should reference the catalog rather than re-imply the size via a cloud machine-type name).
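
As a rough illustration of the shape this could take, here is a minimal Go sketch of a single exported catalog with one lookup function that both the quota path and the sizing path would call. The names `InstanceTypeResources` and `LookupInstanceType`, the field layout, and the use of plain integers are assumptions made for the sketch, not the existing API; a real version would more likely use Kubernetes `resource.Quantity` values and live in the exported compute API package.

```go
package main

import "fmt"

// InstanceTypeResources is a hypothetical catalog entry. Further dimensions
// (disk, GPU, ...) could be added as fields later.
type InstanceTypeResources struct {
	VCPU        int64 // whole vCPUs
	MemoryBytes int64 // memory in bytes
}

const gib = int64(1) << 30

// instanceTypeCatalog is the single definition. The one entry mirrors the
// interim value quoted in this issue (1 vCPU / 2 GiB).
var instanceTypeCatalog = map[string]InstanceTypeResources{
	"datumcloud/d1-standard-2": {VCPU: 1, MemoryBytes: 2 * gib},
}

// LookupInstanceType returns the resources for a named instance type and
// reports whether the type exists in the catalog.
func LookupInstanceType(name string) (InstanceTypeResources, bool) {
	r, ok := instanceTypeCatalog[name]
	return r, ok
}

func main() {
	// The quota claim path and the Pod sizing path would both resolve the
	// size through the same call, so the two values cannot diverge.
	if r, ok := LookupInstanceType("datumcloud/d1-standard-2"); ok {
		fmt.Printf("quota claim: %d vCPU, %d GiB\n", r.VCPU, r.MemoryBytes/gib)
		fmt.Printf("pod sizing:  %d vCPU, %d GiB\n", r.VCPU, r.MemoryBytes/gib)
	}
}
```

Returning a found/not-found flag instead of a zero value keeps an unknown type from being silently sized or charged as zero resources.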

Current state / scope

  • Interim implementation in place: instanceTypeCatalog in compute (quota) and ukcInstanceTypeCatalog in unikraft-provider (Pod sizing). Both define datumcloud/d1-standard-2 as 1 vCPU / 2 GiB, and they are kept in sync only by a cross-reference comment.
  • Consolidating to a single exported catalog requires the unikraft-provider to repin its compute dependency to a version that exports it (a small, separate change).
  • Only one instance type exists today (datumcloud/d1-standard-2 — validation rejects others), so the blast radius is small right now. Best to establish the single source of truth before more instance types are added and the duplication multiplies.

Future direction: custom / advanced sizes

The catalog above covers predefined, named instance types — the common case. We also need to brainstorm how to support custom sizes for consumers with more advanced resource needs: specifying explicit CPU/memory (and potentially other dimensions) outside the fixed menu of named types.

Open questions to work through:

  • API surface — how a custom size is expressed: explicit per-container/instance resource requests, a dedicated "custom" instance type, or both. (Validation currently allows only the single named type, and the controller + provider already have an explicit-resources path that custom sizing could build on.)
  • Quota — how a custom size is accounted for: the quota claim should be fed by the same resolved value that sizes the running instance, so the claim can't drift from what runs.
  • Runtime providers — how each provider honors a custom size (the unikraft-provider already prefers explicit container limits over the catalog, so the hook exists).
  • Guardrails — min/max bounds, allowed granularity, and which providers/regions can actually satisfy a given custom size.
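
One way to picture how these questions fit together is a single resolution function: an explicit size, when present, wins over the catalog (the preference the unikraft-provider is described as already having), it is checked against bounds, and the one resolved value is what both quota and the runtime would consume. The Go sketch below is only a sketch under those assumptions. `ResolveSize`, the `Resources` type, the error values, and the min/max numbers are all hypothetical and chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Resources is a hypothetical resolved size.
type Resources struct {
	VCPU        int64
	MemoryBytes int64
}

const gib = int64(1) << 30

// catalog holds the predefined named types (interim value from this issue).
var catalog = map[string]Resources{
	"datumcloud/d1-standard-2": {VCPU: 1, MemoryBytes: 2 * gib},
}

// Illustrative guardrails only; real bounds are an open question.
var (
	minSize = Resources{VCPU: 1, MemoryBytes: 1 * gib}
	maxSize = Resources{VCPU: 16, MemoryBytes: 64 * gib}
)

var (
	ErrUnknownType = errors.New("unknown instance type")
	ErrOutOfBounds = errors.New("custom size outside allowed bounds")
)

// ResolveSize returns the size to use for both the quota claim and the
// runtime. An explicit size takes precedence over the named type.
func ResolveSize(instanceType string, explicit *Resources) (Resources, error) {
	if explicit != nil {
		if explicit.VCPU < minSize.VCPU || explicit.VCPU > maxSize.VCPU ||
			explicit.MemoryBytes < minSize.MemoryBytes ||
			explicit.MemoryBytes > maxSize.MemoryBytes {
			return Resources{}, ErrOutOfBounds
		}
		return *explicit, nil
	}
	r, ok := catalog[instanceType]
	if !ok {
		return Resources{}, fmt.Errorf("%w: %q", ErrUnknownType, instanceType)
	}
	return r, nil
}

func main() {
	custom := &Resources{VCPU: 4, MemoryBytes: 8 * gib}
	size, err := ResolveSize("datumcloud/d1-standard-2", custom)
	fmt.Println(size, err)
}
```

Per-provider or per-region capability checks are left out here; they would be an additional validation step on the resolved value.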

This doesn't block the single-source catalog work, but the catalog design should leave room for custom sizes rather than assume a closed set of named types.
