[UR][SYCL] Triple buffer issue #21694
iclsrc left a comment:
This PR addresses the triple-buffer problem (misaligned host pointer → SYCL shadow copy → UR staging buffer → three live copies) by deferring the shadow-copy decision until the target backend is known. For Level Zero, SYCL skips its shadow copy and lets the adapter manage the host pointer; for OpenCL/CUDA/HIP, SYCL materialises an aligned shadow copy as before.
The structural approach is sound, but there is a critical correctness gap for the Level Zero write-back path, a data race on MPendingShadowCopyAlignment, and an integer-overflow risk in the allocation size calculation. Details below.
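For context, a minimal repro of the scenario described above might look like the sketch below. It is illustrative only and is not taken from the PR or its tests; the alignment numbers in the comments are typical, not guaranteed.

```cpp
#include <sycl/sycl.hpp>
#include <vector>

int main() {
  std::vector<int> storage(1025, 7);
  // Element-aligned, but not aligned to the backend's import requirement.
  int *misaligned = storage.data() + 1;

  sycl::queue q;
  // Copy #1 is the user's memory. Before this PR, a backend that cannot import
  // `misaligned` ended up with copy #2 (SYCL's aligned shadow copy) and copy #3
  // (the UR staging buffer) once the buffer was used on a device.
  sycl::buffer<int, 1> buf(misaligned, sycl::range<1>(1024));

  q.submit([&](sycl::handler &cgh) {
     sycl::accessor acc(buf, cgh, sycl::read_only);
     cgh.parallel_for(sycl::range<1>(1024), [=](sycl::id<1> i) { (void)acc[i]; });
   }).wait();
  return 0;
}
```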
```cpp
if (hostPtr) {
  // Initial copy using Level Zero for USM HOST memory
  auto hDevice = hContext->getDevices()[0];
  UR_CALL_THROWS(
      synchronousZeCopy(hContext, hDevice, this->ptr.get(), hostPtr, size));
  // Store writeBackPtr for copy-back - needed when original pointer
  // cannot be imported (e.g., misaligned, wrong allocation type)
  writeBackPtr = hostPtr;
  mapToPtr = hostPtr;
```
🔴 Critical: write-back to misaligned host pointer is silently dropped for Level Zero integrated GPU
mapToPtr is set to the original (potentially misaligned) host pointer, but writeBackPtr, which is what copyBackToHostIfNeeded() uses in the destructor, is never assigned in this constructor. After this PR:

- SYCL skips its MUploadDataFunctor because MBackendOwnsWriteBack = true.
- UR skips copy-back in the destructor because writeBackPtr == nullptr.
The mapToPtr path does copy data from the device USM buffer back to the original pointer, but only when mapHostPtr is called with UR_MAP_FLAG_READ (i.e., during an explicit host-side map/unmap). Nothing triggers that at buffer-destruction time, so kernel writes are silently lost at scope exit for misaligned host pointers on integrated GPUs.
Either writeBackPtr should also be set here (mirroring mapToPtr), or copyBackToHostIfNeeded should use mapToPtr for the final synchronous copy, e.g.:
```cpp
if (hostPtr) {
  auto hDevice = hContext->getDevices()[0];
  UR_CALL_THROWS(
      synchronousZeCopy(hContext, hDevice, this->ptr.get(), hostPtr, size));
  mapToPtr = hostPtr;
  writeBackPtr = hostPtr;
}
```
The write-back test in the PR (runWriteKernel) will only catch this on integrated hardware; on discrete hardware the discrete-buffer path is taken and the bug would not manifest.
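As an illustration of the failure mode (hypothetical code, not the PR's runWriteKernel), a check like the one below would fail on an integrated Level Zero device while writeBackPtr is left unset, and pass on discrete hardware where the discrete-buffer path is taken:

```cpp
#include <sycl/sycl.hpp>
#include <cstdio>
#include <vector>

int main() {
  std::vector<int> storage(1025, 0);
  int *host = storage.data() + 1; // misaligned w.r.t. the backend requirement

  {
    sycl::queue q;
    sycl::buffer<int, 1> buf(host, sycl::range<1>(1024));
    q.submit([&](sycl::handler &cgh) {
      sycl::accessor acc(buf, cgh, sycl::write_only);
      cgh.parallel_for(sycl::range<1>(1024), [=](sycl::id<1> i) { acc[i] = 42; });
    });
  } // buffer destruction must flush device writes back into `host`

  // With writeBackPtr left unset on the integrated path, the writes stay in
  // the USM staging allocation and never reach `host`.
  bool ok = true;
  for (int i = 0; i < 1024; ++i)
    ok = ok && (host[i] == 42);
  std::printf("%s\n", ok ? "PASS" : "FAIL");
  return ok ? 0 : 1;
}
```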
Fixed: writeBackPtr = hostPtr is now set alongside mapToPtr so ~ur_integrated_buffer_handle_t copies device updates back even without explicit map/unmap.
```cpp
const size_t BackendRequiredAlign = getBackendShadowCopyAlignment(Context);
if (BackendRequiredAlign > MPendingShadowCopyAlignment)
  MPendingShadowCopyAlignment = BackendRequiredAlign;
```
🟡 Important: MPendingShadowCopyAlignment written outside the mutex — potential data race
MPendingShadowCopyAlignment is updated at line 274 before MCreateShadowCopyMtx is acquired (that happens at line 293). The same field is also written inside materializeShadowCopy() (line 202), which can be invoked concurrently from a lambda in MCreateShadowCopy on another thread.
Move the alignment update inside the lock, or capture the computed alignment in a local variable and only write the member after acquiring the mutex:
```cpp
const size_t BackendRequiredAlign = getBackendShadowCopyAlignment(Context);
std::lock_guard<std::mutex> Lock(MCreateShadowCopyMtx);
if (!MHasPendingAlignedShadowCopy)
  return;
if (BackendRequiredAlign > MPendingShadowCopyAlignment)
  MPendingShadowCopyAlignment = BackendRequiredAlign;
```
(Remove the now-redundant second lock-acquisition at line 293 and the second MHasPendingAlignedShadowCopy check at line 294.)
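The second option mentioned above, keeping the computation where it is and only publishing the result to the member once the existing lock is held, would be equally valid. A sketch using the same member names, with the surrounding code elided:

```cpp
const size_t BackendRequiredAlign = getBackendShadowCopyAlignment(Context);
// ... intervening work that does not touch MPendingShadowCopyAlignment ...
std::lock_guard<std::mutex> Lock(MCreateShadowCopyMtx); // pre-existing acquisition
if (!MHasPendingAlignedShadowCopy)
  return;
// Publish the alignment only now, so every write to the member is under the mutex.
if (BackendRequiredAlign > MPendingShadowCopyAlignment)
  MPendingShadowCopyAlignment = BackendRequiredAlign;
```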
Fixed: MPendingShadowCopyAlignment is now updated under MCreateShadowCopyMtx, after the in-lock MHasPendingAlignedShadowCopy re-check.
```cpp
const size_t AllocBytes =
    MSizeInBytes == 0 ? RequiredAlign
                      : ((MSizeInBytes + RequiredAlign - 1) / RequiredAlign) *
                            RequiredAlign;
```
🟡 Important: integer overflow when MSizeInBytes is close to SIZE_MAX
MSizeInBytes + RequiredAlign - 1 wraps when MSizeInBytes is near SIZE_MAX. The same pattern is used in the branch in sycl_mem_obj_t.hpp (line ~260) for the iterator-pair constructor.
Use an overflow-safe round-up:
```cpp
const size_t AllocBytes =
    MSizeInBytes == 0
        ? RequiredAlign
        : MSizeInBytes +
              (RequiredAlign - (MSizeInBytes % RequiredAlign)) % RequiredAlign;
```
This avoids the intermediate MSizeInBytes + RequiredAlign - 1 sum that can wrap.
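As a standalone illustration of the wrap itself (not PR code): for a size within RequiredAlign - 1 of SIZE_MAX, the intermediate sum wraps around and the naive round-up silently produces a tiny allocation size instead of failing loudly.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
  const std::size_t Size = SIZE_MAX - 2; // pathological but representable request
  const std::size_t Align = 64;
  // Size + 63 wraps around SIZE_MAX, so the "rounded up" size collapses to 0.
  const std::size_t Naive = ((Size + Align - 1) / Align) * Align;
  std::printf("requested %zu, naive round-up yields %zu\n", Size, Naive);
  return 0;
}
```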
Fixed: switched to the overflow-safe form MSizeInBytes + (RequiredAlign - MSizeInBytes % RequiredAlign) % RequiredAlign.
```cpp
if (SkipShadowCopy) {
  if (MShadowCopy != nullptr) {
    // A writable host accessor already forced a SYCL shadow copy. Keep using
    // that path so the final copy-back still targets the original user ptr.
    return;
```
🟡 Important: MHasPendingAlignedShadowCopy is not cleared on the early-return path, leaving state inconsistent across reallocations
When MShadowCopy != nullptr the function returns without clearing MHasPendingAlignedShadowCopy. This means that on a subsequent allocation (e.g., the buffer migrates to a different context), prepareForAllocation is called again, re-enters the SkipShadowCopy branch, finds MShadowCopy != nullptr again, and returns — which is fine for Level Zero.
However, if the first allocation was Level Zero (sets MBackendOwnsWriteBack = true, clears MHasPendingAlignedShadowCopy), then a host accessor forces a shadow copy (MShadowCopy != nullptr), and a second allocation on a non-Level-Zero backend calls prepareForAllocation: MHasPendingAlignedShadowCopy == false → early return at line 263 → MBackendOwnsWriteBack stays true even though SYCL now has a valid shadow copy and should own write-back.
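To make that sequence concrete, here is a toy model of the flag transitions (hypothetical code, not the real SYCLMemObjT; the branch logic is reduced to just the members discussed):

```cpp
#include <cassert>

struct ToyMemObj {
  bool MBackendOwnsWriteBack = false;
  bool MHasPendingAlignedShadowCopy = true; // misaligned user ptr still pending
  void *MShadowCopy = nullptr;

  void prepareForAllocation(bool SkipShadowCopy) {
    if (!MHasPendingAlignedShadowCopy)
      return; // early return at ~line 263
    if (SkipShadowCopy) {
      if (MShadowCopy != nullptr)
        return; // branch under discussion: flags left untouched
      MBackendOwnsWriteBack = true;
      MHasPendingAlignedShadowCopy = false;
      return;
    }
    MShadowCopy = this; // stand-in for materialising the aligned shadow copy
    MBackendOwnsWriteBack = false;
    MHasPendingAlignedShadowCopy = false;
  }
};

int main() {
  ToyMemObj Obj;
  Obj.prepareForAllocation(/*SkipShadowCopy=*/true);  // 1) Level Zero allocation
  Obj.MShadowCopy = &Obj;                             // 2) writable host accessor
  Obj.prepareForAllocation(/*SkipShadowCopy=*/false); // 3) non-Level-Zero allocation
  // SYCL now holds a valid shadow copy, yet the backend still "owns" write-back.
  assert(Obj.MShadowCopy != nullptr && Obj.MBackendOwnsWriteBack);
  return 0;
}
```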
Consider clearing MHasPendingAlignedShadowCopy and explicitly setting MBackendOwnsWriteBack = false in this branch to reflect that SYCL retains ownership:
```cpp
if (MShadowCopy != nullptr) {
  // A writable host accessor already forced a SYCL shadow copy. Keep using
  // that path so the final copy-back still targets the original user ptr.
  MBackendOwnsWriteBack = false;
  MHasPendingAlignedShadowCopy = false;
  return;
}
```
Fixed: the MShadowCopy != nullptr early-return now sets MBackendOwnsWriteBack = false and clears MHasPendingAlignedShadowCopy.
```cpp
  return MNeedWriteBack && MUploadDataFunctor && !MBackendOwnsWriteBack;
}

bool backendOwnsWriteBack() const { return MBackendOwnsWriteBack; }
```
🔵 Suggestion: backendOwnsWriteBack() is declared but has no callers outside SYCLMemObjT
grep finds no uses of backendOwnsWriteBack() in the rest of the source tree. If it is intended as a future hook or debug aid, a brief comment explaining its purpose would help; otherwise consider removing it to avoid exposing internal state unnecessarily.
Removed backendOwnsWriteBack(); internal access goes through MBackendOwnsWriteBack directly.
Why do we need a copy here?
Removed; checkExpectedPattern now iterates Ptr directly.
Is this internal scope needed? Wouldn't the function scope be enough?
This branch fixes the SYCL triple-buffer/shadow-copy ownership bug by deferring misaligned host-pointer shadow-copy decisions until allocation time (via a new prepareForAllocation hook), so a backend-aware policy can choose whether SYCL creates the shadow copy (NativeCPU/OpenCL/offload) or UR owns staging and final copy-back (Level Zero/CUDA/HIP).
It also adds a new end-to-end regression test (buffer_shadow_copy_platform_policy.cpp) that validates backend-dependent allocation behavior for misaligned read-only buffers and verifies correct data copy-back for writable buffers, covering the original failure mode and guarding against regressions.
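To make the deferred decision concrete, a backend policy in the spirit of the description could be sketched as below. This is illustrative only: the helper name and shape are hypothetical, not the PR's actual prepareForAllocation code, and the backend grouping follows the description above.

```cpp
#include <sycl/sycl.hpp>

// Hypothetical helper, not the PR's code: decide whether the SYCL runtime
// should materialise its own aligned shadow copy for a misaligned user
// pointer, or leave staging and final copy-back to the UR adapter.
inline bool syclOwnsShadowCopy(sycl::backend Backend) {
  switch (Backend) {
  case sycl::backend::ext_oneapi_level_zero:
  case sycl::backend::ext_oneapi_cuda:
  case sycl::backend::ext_oneapi_hip:
    return false; // UR adapter owns staging and write-back
  default:
    return true; // NativeCPU / OpenCL / offload: SYCL keeps the shadow copy
  }
}
```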