viona: multiqueue device should stay multiqueue across migration#1121

Open
iximeow wants to merge 6 commits into master from ixi/viona-import-usepairs

Conversation


@iximeow iximeow commented Apr 21, 2026

We correctly export and import the propolis-side state for a multiqueue VirtIO device, but we did not communicate that state through to viona. On import, the virtio-nic device has only told viona it will use one queue pair; we skipped the "normal" set_features() in favor of setting features on the handle directly. Setting an imported multi-queue device to running at this point will try to reset the additional queues, but viona rings are not in a resettable state: the reset fails and the device is immediately marked NEEDS_RESET.

Communicating the correct number of queue pairs to viona is a clear improvement, but we're not quite out of bugs yet.

iximeow added the bug ("Something that isn't working."), networking ("Related to networking devices/backends."), and migration ("Issues related to live migration.") labels on Apr 21, 2026

iximeow commented Apr 23, 2026

between 70662f7 and 0b0566f I noticed an exciting issue very much like #1045: rebooting a guest would cause the PCI device to disappear. since peak wasn't retained across VirtQueues export/import, we'd assume any previously-initialized queues beyond len were never touched and not reset them. the guest then sees them as they were at import - enabled - after a "device reset" and rightfully refuses to operate the accursed device.

this would have been, if everything else was in a happy state, a bug introduced by #1047...

Comment on lines +1555 to +1556
delete_vnic(&vnic_name);
create_vnic(&underlying_nic, &vnic_name);

this is incidental but seems worth keeping: I was trying to chase out anything potentially sticking around across the "migrated" VMs, and the test vnic itself "could" stick around but really shouldn't.

the dladm commands won't (shouldn't?) block or be blocked by test operations, so I don't love blocking the runtime like this, but I'm also not worried about it.

