
feat(bluefield): add bluefield compute driver#1899

Open
cheese-head wants to merge 14 commits into NVIDIA:main from cheese-head:feat/bluefield-compute-driver-stack

@cheese-head

Contributor

Summary

Adds the host-side BlueField compute driver stack for OpenShell: a kind-aware
network-function abstraction (bf-core), sysfs VF discovery and allocation
(bf-inventory), a VM lifecycle extension that claims a BlueField VF, binds it
to vfio-pci, passes it into a QEMU sandbox guest, and wires VF-backed egress
(bf-vm), plus the external driver binary (bf-driver). The contracts are
function-kind aware (Vf/Sf) so future SF, container, and Kubernetes
adapters can reuse the same discovery, allocation, and assignment layers.

Related Issue

Not linked.

Changes

  • Add bf-core contracts crate (kind-aware NetFunction/FunctionSlot/FunctionKind, lifecycle and runtime traits, and the label-based BluefieldAssignment contract) plus workspace wiring.
  • Add bf-inventory: sysfs host-VF and DPU-representor discovery, a static inventory for tests, and the FunctionPool claim/release allocator.
  • Add bf-vm: a BlueField lifecycle extension over the VM compute driver — per-sandbox VF claim, vfio-pci bind with restore-on-teardown, host-passthrough preflight gating, guest-egress env contract and init drop-in, host PF auto-resolution, and QEMU guest-kernel resolution.
  • Add bf-driver: the external openshell-driver-bluefield compute-driver binary the gateway spawns.
  • Extend openshell-driver-vm with a generic PCI passthrough resource seam so host PCI devices can be passed into the guest.
  • Generalize the VF-specific inventory/handle types into runtime-neutral network-function types (FunctionKind { Vf, Sf }; drop VM-centric guest_* field names) so SF/container/Kubernetes adapters can reuse the contracts.
  • Restructure the openshell-driver-bluefield README into a package-marker overview that links the bf-vm implementation guide.

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

Introduce the openshell-driver-bluefield package as a set of private
workspace member crates, starting with bf-core: the shared VF handle,
role, claim, and lifecycle contracts the rest of the BlueField driver
builds on. No behavior is wired into the gateway yet.

bf-inventory turns host sysfs into a set of claimable VF slots and owns
the per-sandbox claim/release bookkeeping (VfPool) used by the lifecycle
extension to hand out one VF per sandbox.
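The per-sandbox claim/release bookkeeping can be pictured as a free list plus a map from sandbox to claimed slot. This is a minimal sketch only: the `SlotPool` type, its method names, and the use of string sandbox IDs and PCI addresses are assumptions for illustration, not the actual `VfPool`/`FunctionPool` API.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum ClaimError {
    /// Every slot is already claimed by some sandbox.
    Exhausted,
}

/// Hypothetical allocator: hands out at most one slot (identified here by
/// its PCI address) per sandbox and returns it to the pool on release.
pub struct SlotPool {
    /// Free slots; `pop` takes from the end.
    free: Vec<String>,
    /// sandbox id -> claimed slot.
    claimed: HashMap<String, String>,
}

impl SlotPool {
    pub fn new(slots: Vec<String>) -> Self {
        // Reverse so that `pop` hands slots out in their original order.
        let mut free = slots;
        free.reverse();
        Self { free, claimed: HashMap::new() }
    }

    /// Claim a slot for `sandbox`. Claiming again for the same sandbox
    /// returns the slot it already holds instead of taking a second one.
    pub fn claim(&mut self, sandbox: &str) -> Result<String, ClaimError> {
        if let Some(slot) = self.claimed.get(sandbox) {
            return Ok(slot.clone());
        }
        let slot = self.free.pop().ok_or(ClaimError::Exhausted)?;
        self.claimed.insert(sandbox.to_string(), slot.clone());
        Ok(slot)
    }

    /// Release whatever `sandbox` holds; releasing twice is a no-op.
    pub fn release(&mut self, sandbox: &str) -> Option<String> {
        let slot = self.claimed.remove(sandbox)?;
        self.free.push(slot.clone());
        Some(slot)
    }
}
```

Making `claim` idempotent per sandbox is a choice made in this sketch so that a retried launch does not leak a second VF; the PR does not say how bf-inventory handles repeat claims.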

bf-vm plugs into the VM driver's lifecycle-extension seam. For each
sandbox it claims a VF, checks host passthrough readiness, binds the VF
to vfio-pci, persists the binding for restart recovery, and releases it
on launch failure or delete. It also selects the BlueField guest kernel
and wires the static guest-egress env contract.
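One common way to do a vfio-pci bind that can be undone on teardown is the Linux `driver_override` sysfs mechanism. The sketch below assumes that approach; it is not the bf-vm code. It only computes the ordered sysfs writes (so the plan can be inspected or tested without touching the host) and takes the VF's original driver as input, which is the piece of state that would need to be persisted for restart recovery. `SysfsWrite`, `vfio_bind_plan`, and `vfio_restore_plan` are hypothetical names.

```rust
use std::path::PathBuf;

/// One write to a sysfs attribute: `value` is written to `path`.
#[derive(Debug, PartialEq)]
pub struct SysfsWrite {
    pub path: PathBuf,
    pub value: String,
}

fn sysfs_write(path: PathBuf, value: &str) -> SysfsWrite {
    SysfsWrite { path, value: value.to_string() }
}

/// Writes that move PCI device `bdf` from `original` (its current driver,
/// if any) to vfio-pci.
pub fn vfio_bind_plan(sysfs: &str, bdf: &str, original: Option<&str>) -> Vec<SysfsWrite> {
    let pci = PathBuf::from(sysfs).join("bus/pci");
    // Pin the device to vfio-pci for the next probe.
    let mut plan = vec![sysfs_write(pci.join("devices").join(bdf).join("driver_override"), "vfio-pci")];
    if let Some(drv) = original {
        // Detach it from the driver it is currently bound to.
        plan.push(sysfs_write(pci.join("drivers").join(drv).join("unbind"), bdf));
    }
    // Re-probe; because of the override, the kernel picks vfio-pci.
    plan.push(sysfs_write(pci.join("drivers_probe"), bdf));
    plan
}

/// Inverse of `vfio_bind_plan`, run on teardown: detach from vfio-pci,
/// clear the override, and rebind the driver recorded at bind time.
pub fn vfio_restore_plan(sysfs: &str, bdf: &str, original: Option<&str>) -> Vec<SysfsWrite> {
    let pci = PathBuf::from(sysfs).join("bus/pci");
    let mut plan = vec![
        sysfs_write(pci.join("drivers/vfio-pci/unbind"), bdf),
        // Writing a bare newline clears driver_override.
        sysfs_write(pci.join("devices").join(bdf).join("driver_override"), "\n"),
    ];
    if let Some(drv) = original {
        plan.push(sysfs_write(pci.join("drivers").join(drv).join("bind"), bdf));
    }
    plan
}
```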

bf-driver is the external compute driver process. It parses the
BlueField CLI/env surface, installs the bf-vm lifecycle extension for
the workload-running roles, and serves the ComputeDriver gRPC API over
an authenticated Unix socket (or unauthenticated TCP for local dev).
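The commit message does not list the bf-driver CLI/env surface, so the following only illustrates the Unix-socket-versus-TCP split it describes. The `unix:<path>` / `tcp:<host:port>` listen syntax, `parse_listen`, and the explicit opt-in flag for TCP are all assumptions, not the driver's real flags; authentication itself is out of scope.

```rust
use std::net::SocketAddr;
use std::path::PathBuf;

/// Where the driver serves its gRPC API.
#[derive(Debug, PartialEq)]
pub enum ListenTarget {
    /// Normal path: a Unix socket (authenticated, per the PR description).
    Unix(PathBuf),
    /// Local development only: plain, unauthenticated TCP.
    Tcp(SocketAddr),
}

/// Parse a hypothetical `unix:<path>` / `tcp:<host:port>` listen string.
/// TCP is refused unless the caller explicitly opts in for local dev.
pub fn parse_listen(spec: &str, allow_insecure_tcp: bool) -> Result<ListenTarget, String> {
    if let Some(path) = spec.strip_prefix("unix:") {
        if path.is_empty() {
            return Err("empty unix socket path".into());
        }
        return Ok(ListenTarget::Unix(PathBuf::from(path)));
    }
    if let Some(addr) = spec.strip_prefix("tcp:") {
        if !allow_insecure_tcp {
            return Err("TCP listener is unauthenticated; enable it only for local dev".into());
        }
        return addr
            .parse::<SocketAddr>()
            .map(ListenTarget::Tcp)
            .map_err(|e| format!("invalid TCP address {addr:?}: {e}"));
    }
    Err(format!("unsupported listen target {spec:?}"))
}
```

Requiring an explicit opt-in keeps the unauthenticated path from being selected by a typo in production configuration; whether bf-driver gates TCP this way is not stated in the PR.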

… seam

Introduce a generic guest-resource model on LaunchPlan so lifecycle
extensions can request host PCI passthrough without growing the shared
plan type per device class:

- lifecycle: LaunchPlan now carries an opaque `resources: Vec<GuestResource>`
  with an `add_resource` writer. `GuestResource::PciPassthrough` is the only
  variant today; new kinds (e.g. volume mounts) become new variants without
  touching the plan's shape or its constructors.
- runtime: render a `pcie-root-port` + `vfio-pci` pair per passthrough
  device and make the GPU device block optional, so a sandbox can carry a
  GPU plus one or more VF NICs at once.
- driver: relax the non-GPU QEMU guard when a concrete passthrough device
  backs the launch, and forward each device to the launched child via
  `--vm-pci-passthrough`. The PCI-specific projection lives in the driver
  layer (which renders QEMU), keeping LaunchPlan generic; the exhaustive
  match forces that site to be revisited when a variant is added.
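A minimal sketch of the shape described above, assuming a hypothetical `host_address` field for the device's PCI address and a `LaunchPlan` reduced to its resource list. The rendering half emits QEMU's `pcie-root-port` and `vfio-pci` device types, one root port per passthrough device; the `rp<N>` IDs, the `chassis` numbering, and the `bus=pcie.0` parent (which assumes a q35 machine) are assumptions.

```rust
/// A host resource a lifecycle extension asks to attach to the guest.
#[derive(Debug, Clone, PartialEq)]
pub enum GuestResource {
    /// Pass a host PCI function (e.g. a BlueField VF) through via VFIO.
    PciPassthrough { host_address: String },
}

/// Simplified launch plan: only the generic resource list is modelled.
#[derive(Debug, Default)]
pub struct LaunchPlan {
    resources: Vec<GuestResource>,
}

impl LaunchPlan {
    pub fn add_resource(&mut self, resource: GuestResource) {
        self.resources.push(resource);
    }

    pub fn resources(&self) -> &[GuestResource] {
        &self.resources
    }
}

/// Driver-layer projection: turn each resource into QEMU `-device` args.
/// The exhaustive `match` (no `_` arm) stops compiling when a variant is
/// added, which is what forces this site to be revisited.
pub fn qemu_device_args(plan: &LaunchPlan) -> Vec<String> {
    let mut args = Vec::new();
    for (i, resource) in plan.resources().iter().enumerate() {
        match resource {
            GuestResource::PciPassthrough { host_address } => {
                let port = format!("rp{i}");
                // Each passthrough device gets its own PCIe root port.
                args.push("-device".to_string());
                args.push(format!("pcie-root-port,id={port},bus=pcie.0,chassis={}", i + 1));
                args.push("-device".to_string());
                args.push(format!("vfio-pci,host={host_address},bus={port}"));
            }
        }
    }
    args
}
```

Root-port `chassis` numbers must be unique within a guest, so a renderer that also emits an optional GPU block would need to share one counter across both rather than start from 1 as this sketch does.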

Wire the BlueField VM extension onto the seam: declare the claimed VF as a
passthrough device in `configure_launch` (so the backend resolves to QEMU
and the guard sees a concrete device) and bind it in `before_launch`,
attaching it to the guest as an egress NIC alongside any GPU.
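Splitting the work across the two hooks is about ordering: the passthrough declaration has to exist before the backend is chosen, and the bind only happens once QEMU is known to be the backend. A sketch under assumptions: the `LifecycleExtension` trait, its hook signatures, the `Backend` enum, the `select_backend` guard, and `BluefieldExt` are hypothetical simplifications, with the plan reduced to a GPU flag and a list of passthrough PCI addresses.

```rust
#[derive(Debug, PartialEq)]
pub enum Backend {
    /// Whatever non-QEMU backend the VM driver would otherwise use.
    Other,
    Qemu,
}

#[derive(Debug, Default)]
pub struct LaunchPlan {
    pub gpu: bool,
    pub pci_passthrough: Vec<String>,
}

pub trait LifecycleExtension {
    /// Runs before backend selection: declare what the guest needs.
    fn configure_launch(&mut self, plan: &mut LaunchPlan);
    /// Runs after backend selection, just before the VM starts.
    fn before_launch(&mut self, plan: &LaunchPlan) -> Result<(), String>;
}

/// Relaxed guard: QEMU is used when a GPU or a concrete passthrough
/// device backs the launch.
pub fn select_backend(plan: &LaunchPlan) -> Backend {
    if plan.gpu || !plan.pci_passthrough.is_empty() {
        Backend::Qemu
    } else {
        Backend::Other
    }
}

/// Hypothetical BlueField extension holding an already-claimed VF.
pub struct BluefieldExt {
    pub vf: String,
    pub bound: bool,
}

impl LifecycleExtension for BluefieldExt {
    fn configure_launch(&mut self, plan: &mut LaunchPlan) {
        // Declaring the VF here is what makes select_backend pick QEMU.
        plan.pci_passthrough.push(self.vf.clone());
    }

    fn before_launch(&mut self, plan: &LaunchPlan) -> Result<(), String> {
        if !plan.pci_passthrough.contains(&self.vf) {
            return Err(format!("VF {} was not declared in configure_launch", self.vf));
        }
        // A real implementation would bind the VF to vfio-pci here.
        self.bound = true;
        Ok(())
    }
}
```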
Signed-off-by: Patrick Riel <priel@nvidia.com>
…nctions

Rename the VF-specific inventory/handle types into runtime-neutral
network-function types so SF and future container/Kubernetes adapters can
reuse the discovery, allocation, and assignment contracts:

- bf-core: VfRef -> NetFunction, VfSlot -> FunctionSlot (with a FunctionKind
  { Vf, Sf } discriminant and a generic `index`), drop VM-centric guest_*
  field names (guest_mac -> mac, guest_datapath_address -> datapath_address,
  AttachSpec guest_ip -> endpoint_ip). BluefieldAssignment carries `kind` and
  uses generalized label keys.
- bf-inventory: VfInventory -> FunctionInventory, VfPool -> FunctionPool,
  StaticVfInventory -> StaticFunctionInventory, VfError/VfResult ->
  InventoryError/InventoryResult. Sysfs VF/representor impls keep their
  kind-specific names.
- bf-vm: update all call sites; VFIO binding mechanism names retained.
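The renamed contract can be pictured roughly as follows. Only `FunctionKind { Vf, Sf }`, `index`, `mac`, `kind`, and the `BluefieldAssignment` name come from the commit message; the string forms `"vf"`/`"sf"`, the `to_labels`/`from_labels` helpers, and the `bluefield.example/...` keys are hypothetical placeholders, since the actual generalized label keys are not listed here.

```rust
use std::collections::BTreeMap;
use std::fmt;
use std::str::FromStr;

/// Which kind of network function backs a slot.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FunctionKind {
    /// SR-IOV virtual function.
    Vf,
    /// Scalable function.
    Sf,
}

impl fmt::Display for FunctionKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            FunctionKind::Vf => "vf",
            FunctionKind::Sf => "sf",
        })
    }
}

impl FromStr for FunctionKind {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "vf" => Ok(FunctionKind::Vf),
            "sf" => Ok(FunctionKind::Sf),
            other => Err(format!("unknown function kind {other:?}")),
        }
    }
}

/// Simplified assignment record: which function a sandbox was given.
#[derive(Debug, Clone, PartialEq)]
pub struct BluefieldAssignment {
    pub kind: FunctionKind,
    pub index: u32,
    pub mac: String,
}

// Placeholder label keys; the real generalized keys are not shown in the PR.
const KIND: &str = "bluefield.example/function-kind";
const INDEX: &str = "bluefield.example/function-index";
const MAC: &str = "bluefield.example/mac";

impl BluefieldAssignment {
    /// Encode the assignment as string labels.
    pub fn to_labels(&self) -> BTreeMap<String, String> {
        BTreeMap::from([
            (KIND.to_string(), self.kind.to_string()),
            (INDEX.to_string(), self.index.to_string()),
            (MAC.to_string(), self.mac.clone()),
        ])
    }

    /// Decode an assignment, failing on a missing or malformed label.
    pub fn from_labels(labels: &BTreeMap<String, String>) -> Result<Self, String> {
        let get = |key: &str| labels.get(key).ok_or_else(|| format!("missing label {key}"));
        Ok(Self {
            kind: get(KIND)?.parse::<FunctionKind>()?,
            index: get(INDEX)?.parse::<u32>().map_err(|e| format!("bad index: {e}"))?,
            mac: get(MAC)?.clone(),
        })
    }
}
```

Keying the labels on the function kind rather than on "VF" is what lets an SF-backed or container-backed adapter reuse the same assignment record.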

Also restructure the bluefield README into a package-marker overview that
links the bf-vm implementation guide.
Signed-off-by: Patrick Riel <priel@nvidia.com>
@cheese-head cheese-head requested review from a team, derekwaynecarr and mrunalp as code owners June 13, 2026 21:41
@copy-pr-bot

copy-pr-bot Bot commented Jun 13, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cheese-head cheese-head marked this pull request as draft June 13, 2026 21:50

@elezar elezar left a comment

Member

As a general comment: how would a user indicate intent here? That is, how would they indicate that they need access to a BlueField device in their sandbox? Or is this not a user concern at all, but something decided by an admin or operator when deploying the gateway?

@cheese-head

Contributor Author

Hi @elezar, I'd imagine this would be an admin or operator's role. From an end user's perspective, they would just be requesting a sandbox without needing to know that storage, networking, etc., are backed by a BlueField device.

Of course, if the agent needed to, let's say, have an RDMA-capable NIC, they'd be able to request it, and the BlueField sandbox compute driver would be able to provide that as a resource to the sandbox.

@elezar

elezar commented Jun 15, 2026

Member

> Hi @elezar, I'd imagine this would be an admin or operator's role. From an end user's perspective, they would just be requesting a sandbox without needing to know that storage, networking, etc., are backed by a BlueField device.

Ok. For this there would be no UX changes needed, and this is just starting the relevant driver.

> Of course, if the agent needed to, let's say, have an RDMA-capable NIC, they'd be able to request it, and the BlueField sandbox compute driver would be able to provide that as a resource to the sandbox.

How would we expect an agent (or user) to signal this requirement? Note that we have an RFC for adding explicit resource requirements to a sandbox request (#1360), extending the current situation where GPUs are the only interesting resources. #1812 already goes some way toward updating the proto shape, but doesn't do anything about discovering whether a driver supports this.

@cheese-head

Contributor Author

For the RDMA example, RFC 0004 is the right direction for how a user requests the resource when creating a sandbox.

Compute drivers would still need a mechanism to advertise capabilities or resources.

@cheese-head cheese-head marked this pull request as ready for review June 15, 2026 16:12