Possible race condition in az vm disk attach/detach #19568

@akx

Describe the bug

I haven't verified this bug in action, but reading the code:

```python
# region VirtualMachines Disks (Managed)
def attach_managed_data_disk(cmd, resource_group_name, vm_name, disk, new=False, sku=None,
                             size_gb=1023, lun=None, caching=None, enable_write_accelerator=False):
    '''attach a managed disk'''
    from msrestazure.tools import parse_resource_id
    vm = get_vm_to_update(cmd, resource_group_name, vm_name)
    DataDisk, ManagedDiskParameters, DiskCreateOption = cmd.get_models(
        'DataDisk', 'ManagedDiskParameters', 'DiskCreateOptionTypes')
    # pylint: disable=no-member
    if lun is None:
        lun = _get_disk_lun(vm.storage_profile.data_disks)
    if new:
        data_disk = DataDisk(lun=lun, create_option=DiskCreateOption.empty,
                             name=parse_resource_id(disk)['name'],
                             disk_size_gb=size_gb, caching=caching,
                             managed_disk=ManagedDiskParameters(storage_account_type=sku))
    else:
        params = ManagedDiskParameters(id=disk, storage_account_type=sku)
        data_disk = DataDisk(lun=lun, create_option=DiskCreateOption.attach, managed_disk=params, caching=caching)
    if enable_write_accelerator:
        data_disk.write_accelerator_enabled = enable_write_accelerator
    vm.storage_profile.data_disks.append(data_disk)
    set_vm(cmd, vm)


def detach_data_disk(cmd, resource_group_name, vm_name, disk_name):
    # here we handle both unmanaged or managed disk
    vm = get_vm_to_update(cmd, resource_group_name, vm_name)
    # pylint: disable=no-member
    leftovers = [d for d in vm.storage_profile.data_disks if d.name.lower() != disk_name.lower()]
    if len(vm.storage_profile.data_disks) == len(leftovers):
        raise CLIError("No disk with the name '{}' was found".format(disk_name))
    vm.storage_profile.data_disks = leftovers
    set_vm(cmd, vm)
# endregion
```
it looks like it's possible for az vm disk attach / az vm disk detach to attach or detach more disks than requested when they are called concurrently against the same VM.

Consider this concurrency diagram with two concurrent az vm disk attach calls:

```
Client 1                    Client 2
====================        ====================
$ az vm disk attach DISKX
get vm info
disks: []                   $ az vm disk attach DISKZ
add data disk DISKX         get vm info
disks: [DISKX]              disks: []
update vm with [DISKX]      add data disk DISKZ
                            disks: [DISKZ]
                            update vm with [DISKZ]
```

Unless there's a resource version involved in the VM model retrieved from the API, my intuition says this could end up in a situation where the VM only has DISKZ attached, when the intended result is that both DISKX and DISKZ are attached.
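The lost update above can be demonstrated with a toy in-memory stand-in for the VM resource (all names hypothetical; this models only the read-modify-write pattern, not the actual Azure API):

```python
import copy

# Hypothetical server-side state: the authoritative list of data disks.
server_vm = {"data_disks": []}

def get_vm():
    # Like GET on the VM resource: each client gets its own snapshot.
    return copy.deepcopy(server_vm)

def set_vm(vm):
    # Like PUT: replaces the resource with the client's local view.
    server_vm.update(copy.deepcopy(vm))

# Interleave two attach operations exactly as in the diagram above.
vm1 = get_vm()                       # client 1: read, sees disks == []
vm2 = get_vm()                       # client 2: read, also sees disks == []
vm1["data_disks"].append("DISKX")    # client 1: local append
set_vm(vm1)                          # client 1: write [DISKX]
vm2["data_disks"].append("DISKZ")    # client 2: local append on a stale base
set_vm(vm2)                          # client 2: write [DISKZ], clobbering DISKX

print(server_vm["data_disks"])       # only DISKZ survives
```

Client 2's PUT silently overwrites client 1's attach, because neither write is conditioned on the snapshot it was derived from.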

The same applies to az vm disk detach, where the CLI filters its local view of the data-disk list to exclude the disk being detached.

To Reproduce

As noted above, I haven't reproduced this in action, but if it is a real bug, it should be reproducible with two carefully timed az vm disk attach calls.

Expected behavior

Disk attaches and detaches are atomic.
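One general way to approximate atomicity on top of a plain GET/PUT API is optimistic concurrency: the server rejects any write whose base version is stale, and the client retries the whole read-modify-write. A minimal sketch with a hypothetical versioned store (not the Azure API; ARM's ETag/If-Match support would be the real-world analogue, where available):

```python
import copy

class VersionedStore:
    """Toy server that rejects writes made against a stale snapshot."""
    def __init__(self):
        self.version = 0
        self.vm = {"data_disks": []}

    def get(self):
        return self.version, copy.deepcopy(self.vm)

    def put(self, base_version, vm):
        # Compare-and-swap: accept the write only if the caller's
        # snapshot is still current.
        if base_version != self.version:
            return False
        self.vm = copy.deepcopy(vm)
        self.version += 1
        return True

def attach_disk(store, disk_name):
    # Retry the read-modify-write until the conditional write succeeds.
    while True:
        version, vm = store.get()
        vm["data_disks"].append(disk_name)
        if store.put(version, vm):
            return

store = VersionedStore()
# Mimic the race: both clients snapshot version 0 before either writes.
v1, vm1 = store.get()
v2, vm2 = store.get()
vm1["data_disks"].append("DISKX")
assert store.put(v1, vm1)            # client 1's write lands first
vm2["data_disks"].append("DISKZ")
assert not store.put(v2, vm2)        # client 2's stale write is rejected
attach_disk(store, "DISKZ")          # client 2 re-reads and retries
print(store.vm["data_disks"])        # both disks end up attached
```

With the version check in place, the second client's stale write fails instead of clobbering the first, and the retry re-reads the current disk list before appending.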

Environment summary

n/a

Additional context

I came across this while looking at how az vm disk attach is implemented, since, to my surprise, there is no direct attach/detach operation for disks in https://docs.microsoft.com/en-us/rest/api/compute/disks ...
