feat(bluefield): add bluefield compute driver #1899
Introduce the openshell-driver-bluefield package as a set of private workspace member crates, starting with bf-core: the shared VF handle, role, claim, and lifecycle contracts the rest of the BlueField driver builds on. No behavior is wired into the gateway yet.
bf-inventory turns host sysfs into a set of claimable VF slots and owns the per-sandbox claim/release bookkeeping (VfPool) used by the lifecycle extension to hand out one VF per sandbox.
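As a rough illustration of the claim/release bookkeeping described above, a minimal sketch might look like the following. The struct fields, the `PoolError` type, and the lowest-index-first policy are assumptions made for the example; the actual `bf-inventory` API is not shown in this PR text.

```rust
use std::collections::BTreeMap;

/// Hypothetical slot record; the real bf-inventory slot type may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VfSlot {
    pub index: u32,
    pub pci_address: String,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum PoolError {
    Exhausted,
    AlreadyClaimed(String),
}

/// One-VF-per-sandbox bookkeeping: a free list plus a sandbox-id -> slot map.
#[derive(Debug, Default)]
pub struct VfPool {
    free: Vec<VfSlot>,
    claimed: BTreeMap<String, VfSlot>,
}

impl VfPool {
    pub fn new(mut slots: Vec<VfSlot>) -> Self {
        // Sort descending so `pop` hands out the lowest index first.
        slots.sort_by(|a, b| b.index.cmp(&a.index));
        Self { free: slots, claimed: BTreeMap::new() }
    }

    /// Claims one free slot for `sandbox_id`; a sandbox may hold at most one.
    pub fn claim(&mut self, sandbox_id: &str) -> Result<VfSlot, PoolError> {
        if self.claimed.contains_key(sandbox_id) {
            return Err(PoolError::AlreadyClaimed(sandbox_id.to_string()));
        }
        let slot = self.free.pop().ok_or(PoolError::Exhausted)?;
        self.claimed.insert(sandbox_id.to_string(), slot.clone());
        Ok(slot)
    }

    /// Returns the sandbox's slot to the free list; unknown ids yield `None`.
    pub fn release(&mut self, sandbox_id: &str) -> Option<VfSlot> {
        let slot = self.claimed.remove(sandbox_id)?;
        self.free.push(slot.clone());
        self.free.sort_by(|a, b| b.index.cmp(&a.index));
        Some(slot)
    }
}
```

Keying claims by sandbox id makes release idempotent, which is convenient when both the launch-failure path and the delete path may try to release the same VF.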
bf-vm plugs into the VM driver's lifecycle-extension seam. For each sandbox it claims a VF, checks host passthrough readiness, binds the VF to vfio-pci, persists the binding for restart recovery, and releases it on launch failure or delete. It also selects the BlueField guest kernel and wires the static guest-egress env contract.
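For readers unfamiliar with the bind step, here is a hedged sketch of the standard Linux sysfs `driver_override` sequence for handing a PCI function to `vfio-pci`. The function and type names (`plan_vfio_bind`, `SysfsWrite`, `apply`) are hypothetical and chosen for the example; whether `bf-vm` uses exactly this sequence is not stated in the PR text.

```rust
use std::path::PathBuf;

/// One sysfs write: put `value` into `path`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SysfsWrite {
    pub path: PathBuf,
    pub value: String,
}

/// Plans the `driver_override` rebind sequence for one PCI function.
/// `bdf` is the full PCI address, e.g. "0000:03:00.2".
/// `currently_bound` says whether some driver holds the device right now.
pub fn plan_vfio_bind(bdf: &str, currently_bound: bool) -> Vec<SysfsWrite> {
    let dev = PathBuf::from("/sys/bus/pci/devices").join(bdf);
    let mut steps = Vec::new();
    // 1. Pin the device to vfio-pci for the next probe.
    steps.push(SysfsWrite {
        path: dev.join("driver_override"),
        value: "vfio-pci".into(),
    });
    // 2. Detach the current driver, if any.
    if currently_bound {
        steps.push(SysfsWrite {
            path: dev.join("driver/unbind"),
            value: bdf.into(),
        });
    }
    // 3. Ask the PCI core to re-probe; the override selects vfio-pci.
    steps.push(SysfsWrite {
        path: PathBuf::from("/sys/bus/pci/drivers_probe"),
        value: bdf.into(),
    });
    steps
}

/// Applies a plan. Needs root and assumes the vfio-pci module is loaded.
pub fn apply(steps: &[SysfsWrite]) -> std::io::Result<()> {
    for step in steps {
        std::fs::write(&step.path, &step.value)?;
    }
    Ok(())
}
```

Separating the plan from its application keeps the sequence unit-testable without root. Restore-on-teardown would presumably reverse this by clearing `driver_override` and re-probing; that part is not sketched here.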
bf-driver is the external compute driver process. It parses the BlueField CLI/env surface, installs the bf-vm lifecycle extension for the workload-running roles, and serves the ComputeDriver gRPC API over an authenticated Unix socket (or unauthenticated TCP for local dev).
… seam
Introduce a generic guest-resource model on LaunchPlan so lifecycle extensions can request host PCI passthrough without growing the shared plan type per device class:
- lifecycle: LaunchPlan now carries an opaque `resources: Vec<GuestResource>` with an `add_resource` writer. `GuestResource::PciPassthrough` is the only variant today; new kinds (e.g. volume mounts) become new variants without touching the plan's shape or its constructors.
- runtime: render a `pcie-root-port` + `vfio-pci` pair per passthrough device and make the GPU device block optional, so a sandbox can carry a GPU plus one or more VF NICs at once.
- driver: relax the non-GPU QEMU guard when a concrete passthrough device backs the launch, and forward each device to the launched child via `--vm-pci-passthrough`.
The PCI-specific projection lives in the driver layer (which renders QEMU), keeping LaunchPlan generic; the exhaustive match forces that site to be revisited when a variant is added.
Wire the BlueField VM extension onto the seam: declare the claimed VF as a passthrough device in `configure_launch` (so the backend resolves to QEMU and the guard sees a concrete device) and bind it in `before_launch`, attaching it to the guest as an egress NIC alongside any GPU.
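The seam described in this commit message could be sketched roughly as below. `LaunchPlan`, `GuestResource::PciPassthrough`, `resources`, and `add_resource` are named in the commit; the `host_bdf` field, the `render_qemu_args` function, the `rp{i}` port ids, and the `bus=pcie.0` parent bus (which assumes a q35-style machine) are assumptions made for illustration.

```rust
/// Opaque guest resource. `PciPassthrough` is the only variant named in the
/// commit message; the `host_bdf` field name is an assumption.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum GuestResource {
    PciPassthrough { host_bdf: String },
}

/// Simplified stand-in for the shared plan type.
#[derive(Debug, Default)]
pub struct LaunchPlan {
    resources: Vec<GuestResource>,
}

impl LaunchPlan {
    pub fn add_resource(&mut self, resource: GuestResource) {
        self.resources.push(resource);
    }

    pub fn resources(&self) -> &[GuestResource] {
        &self.resources
    }
}

/// Driver-layer projection (hypothetical name): one `pcie-root-port` plus
/// `vfio-pci` device pair per passthrough resource.
pub fn render_qemu_args(plan: &LaunchPlan) -> Vec<String> {
    let mut args = Vec::new();
    for (i, resource) in plan.resources().iter().enumerate() {
        // Exhaustive match: adding a variant fails to compile until handled here.
        match resource {
            GuestResource::PciPassthrough { host_bdf } => {
                let port = format!("rp{i}");
                args.push("-device".to_string());
                args.push(format!(
                    "pcie-root-port,id={port},bus=pcie.0,chassis={n},slot={n}",
                    n = i + 1
                ));
                args.push("-device".to_string());
                args.push(format!("vfio-pci,host={host_bdf},bus={port}"));
            }
        }
    }
    args
}
```

Because the match has no wildcard arm, a future variant such as a volume mount cannot be silently ignored by the QEMU renderer, which is the property the commit message calls out.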
Signed-off-by: Patrick Riel <priel@nvidia.com>
…nctions
Rename the VF-specific inventory/handle types into runtime-neutral
network-function types so SF and future container/Kubernetes adapters can
reuse the discovery, allocation, and assignment contracts:
- bf-core: VfRef -> NetFunction, VfSlot -> FunctionSlot (with a FunctionKind
{ Vf, Sf } discriminant and a generic `index`), drop VM-centric guest_*
field names (guest_mac -> mac, guest_datapath_address -> datapath_address,
AttachSpec guest_ip -> endpoint_ip). BluefieldAssignment carries `kind` and
uses generalized label keys.
- bf-inventory: VfInventory -> FunctionInventory, VfPool -> FunctionPool,
StaticVfInventory -> StaticFunctionInventory, VfError/VfResult ->
InventoryError/InventoryResult. Sysfs VF/representor impls keep their
kind-specific names.
- bf-vm: update all call sites; VFIO binding mechanism names retained.
Also restructure the bluefield README into a package-marker overview that
links the bf-vm implementation guide.
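To make the rename concrete, the generalized types might take roughly the following shape. The type names and the `kind`, `index`, `mac`, and `datapath_address` fields come from the commit message; the exact field types, the `assignment_labels` helper, and the `bluefield/...` label keys are hypothetical, since the real label keys are not shown here.

```rust
use std::collections::BTreeMap;

/// Discriminant named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FunctionKind {
    Vf,
    Sf,
}

impl FunctionKind {
    /// Lowercase label value (an assumption for this sketch).
    pub fn as_label(self) -> &'static str {
        match self {
            FunctionKind::Vf => "vf",
            FunctionKind::Sf => "sf",
        }
    }
}

/// Kind-aware slot with a generic `index` instead of a VF-specific number.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FunctionSlot {
    pub kind: FunctionKind,
    pub index: u32,
}

/// Runtime-neutral handle; field names follow the rename (no `guest_` prefix).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NetFunction {
    pub slot: FunctionSlot,
    pub mac: Option<String>,
    pub datapath_address: Option<String>,
}

/// Projects a function into generalized labels. The keys are hypothetical.
pub fn assignment_labels(function: &NetFunction) -> BTreeMap<String, String> {
    let mut labels = BTreeMap::new();
    labels.insert(
        "bluefield/function-kind".to_string(),
        function.slot.kind.as_label().to_string(),
    );
    labels.insert(
        "bluefield/function-index".to_string(),
        function.slot.index.to_string(),
    );
    if let Some(mac) = &function.mac {
        labels.insert("bluefield/mac".to_string(), mac.clone());
    }
    if let Some(addr) = &function.datapath_address {
        labels.insert("bluefield/datapath-address".to_string(), addr.clone());
    }
    labels
}
```

With the kind carried as data rather than baked into type names, an SF or container adapter can reuse the same slot and assignment plumbing and branch on `kind` only where the mechanism differs.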
Signed-off-by: Patrick Riel <priel@nvidia.com>
elezar
left a comment
As a general comment: how would a user indicate intent here? That is, how would they indicate that they need access to a BlueField device in their sandbox? Or is this not a user concern at all, but something an admin/operator decides when deploying the gateway?
Hi @elezar, I'd imagine this would be an admin's or operator's role. From an end user's perspective, they would just request a sandbox without needing to know that storage, networking, etc., are backed by a BlueField device. Of course, if the agent needed, say, an RDMA-capable NIC, it would be able to request one, and the BlueField sandbox compute driver would be able to provide that as a resource to the sandbox.
OK. In that case no UX changes are needed, and this is just a matter of starting the relevant driver.
How would we expect an agent (or user) to signal this requirement? Note that we have an RFC for adding explicit resource requirements to a sandbox request (#1360), extending the current situation where GPUs are the only interesting resources. #1812 already goes some way toward updating the proto shape, but does nothing about discovering whether a driver supports this.
For the RDMA example, RFC 0004 is the right direction for how a user requests the resource when creating a sandbox. Compute drivers would still need a mechanism to advertise capabilities or resources.
Summary
Adds the host-side BlueField compute driver stack for OpenShell: a kind-aware network-function abstraction (`bf-core`), sysfs VF discovery and allocation (`bf-inventory`), a VM lifecycle extension that claims a BlueField VF, binds it to `vfio-pci`, passes it into a QEMU sandbox guest, and wires VF-backed egress (`bf-vm`), plus the external driver binary (`bf-driver`). The contracts are function-kind aware (`Vf`/`Sf`) so future SF, container, and Kubernetes adapters can reuse the same discovery, allocation, and assignment layers.
Related Issue
Not linked.
Changes
- `bf-core` contracts crate (kind-aware `NetFunction`/`FunctionSlot`/`FunctionKind`, lifecycle and runtime traits, and the label-based `BluefieldAssignment` contract) plus workspace wiring.
- `bf-inventory`: sysfs host-VF and DPU-representor discovery, a static inventory for tests, and the `FunctionPool` claim/release allocator.
- `bf-vm`: a BlueField lifecycle extension over the VM compute driver, covering per-sandbox VF claim, `vfio-pci` bind with restore-on-teardown, host-passthrough preflight gating, guest-egress env contract and init drop-in, host PF auto-resolution, and QEMU guest-kernel resolution.
- `bf-driver`: the external `openshell-driver-bluefield` compute-driver binary the gateway spawns.
- `openshell-driver-vm`: a generic PCI passthrough resource seam so host PCI devices can be passed into the guest.
- Network-function generalization (`FunctionKind { Vf, Sf }`; drop VM-centric `guest_*` field names) so SF/container/Kubernetes adapters can reuse the contracts.
- `openshell-driver-bluefield` README restructured into a package-marker overview that links the `bf-vm` implementation guide.
Testing
- `mise run pre-commit` passes
Checklist