[UR][SYCL] Triple buffer issue #21694
iclsrc left a comment:
This PR addresses the triple-buffer problem (misaligned host pointer → SYCL shadow copy → UR staging buffer → three live copies) by deferring the shadow-copy decision until the target backend is known. For Level Zero, SYCL skips its shadow copy and lets the adapter manage the host pointer; for OpenCL/CUDA/HIP, SYCL materialises an aligned shadow copy as before.
The structural approach is sound, but there is a critical correctness gap for the Level Zero write-back path, a data race on MPendingShadowCopyAlignment, and an integer-overflow risk in the allocation size calculation. Details below.
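For context, a minimal repro of the scenario described above might look like the sketch below. It is illustrative only and is not taken from the PR or its tests; the alignment numbers in the comments are typical, not guaranteed.

```cpp
#include <sycl/sycl.hpp>
#include <vector>

int main() {
  std::vector<int> storage(1025, 7);
  // Element-aligned, but not aligned to the backend's import requirement.
  int *misaligned = storage.data() + 1;

  sycl::queue q;
  // Copy #1 is the user's memory. Before this PR, a backend that cannot import
  // `misaligned` ended up with copy #2 (SYCL's aligned shadow copy) and copy #3
  // (the UR staging buffer) once the buffer was used on a device.
  sycl::buffer<int, 1> buf(misaligned, sycl::range<1>(1024));

  q.submit([&](sycl::handler &cgh) {
     sycl::accessor acc(buf, cgh, sycl::read_only);
     cgh.parallel_for(sycl::range<1>(1024), [=](sycl::id<1> i) { (void)acc[i]; });
   }).wait();
  return 0;
}
```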
```cpp
if (hostPtr) {
  // Initial copy using Level Zero for USM HOST memory
  auto hDevice = hContext->getDevices()[0];
  UR_CALL_THROWS(
      synchronousZeCopy(hContext, hDevice, this->ptr.get(), hostPtr, size));
  // Store writeBackPtr for copy-back - needed when original pointer
  // cannot be imported (e.g., misaligned, wrong allocation type)
  writeBackPtr = hostPtr;
  mapToPtr = hostPtr;
```
🔴 Critical: write-back to misaligned host pointer is silently dropped for Level Zero integrated GPU
mapToPtr is set to the original (potentially misaligned) host pointer, but writeBackPtr, which is what copyBackToHostIfNeeded() uses in the destructor, is never assigned in this constructor. After this PR:

- SYCL skips its MUploadDataFunctor because MBackendOwnsWriteBack = true.
- UR skips copy-back in the destructor because writeBackPtr == nullptr.
The mapToPtr path does copy data from the device USM buffer back to the original pointer, but only when mapHostPtr is called with UR_MAP_FLAG_READ (i.e., during an explicit host-side map/unmap). Nothing triggers that at buffer-destruction time, so kernel writes are silently lost at scope exit for misaligned host pointers on integrated GPUs.
Either writeBackPtr should also be set here (mirroring mapToPtr), or copyBackToHostIfNeeded should use mapToPtr for the final synchronous copy, e.g.:
```cpp
if (hostPtr) {
  auto hDevice = hContext->getDevices()[0];
  UR_CALL_THROWS(
      synchronousZeCopy(hContext, hDevice, this->ptr.get(), hostPtr, size));
  mapToPtr = hostPtr;
  writeBackPtr = hostPtr;
}
```
The write-back test in the PR (runWriteKernel) will only catch this on integrated hardware; on discrete hardware the discrete-buffer path is taken and the bug would not manifest.
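As an illustration of the failure mode (hypothetical code, not the PR's runWriteKernel), a check like the one below would fail on an integrated Level Zero device while writeBackPtr is left unset, and pass on discrete hardware where the discrete-buffer path is taken:

```cpp
#include <sycl/sycl.hpp>
#include <cstdio>
#include <vector>

int main() {
  std::vector<int> storage(1025, 0);
  int *host = storage.data() + 1; // misaligned w.r.t. the backend requirement

  {
    sycl::queue q;
    sycl::buffer<int, 1> buf(host, sycl::range<1>(1024));
    q.submit([&](sycl::handler &cgh) {
      sycl::accessor acc(buf, cgh, sycl::write_only);
      cgh.parallel_for(sycl::range<1>(1024), [=](sycl::id<1> i) { acc[i] = 42; });
    });
  } // buffer destruction must flush device writes back into `host`

  // With writeBackPtr left unset on the integrated path, the writes stay in
  // the USM staging allocation and never reach `host`.
  bool ok = true;
  for (int i = 0; i < 1024; ++i)
    ok = ok && (host[i] == 42);
  std::printf("%s\n", ok ? "PASS" : "FAIL");
  return ok ? 0 : 1;
}
```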
Fixed: writeBackPtr = hostPtr is now set alongside mapToPtr so ~ur_integrated_buffer_handle_t copies device updates back even without explicit map/unmap.
```cpp
const size_t BackendRequiredAlign = getBackendShadowCopyAlignment(Context);
if (BackendRequiredAlign > MPendingShadowCopyAlignment)
  MPendingShadowCopyAlignment = BackendRequiredAlign;
```
🟡 Important: MPendingShadowCopyAlignment written outside the mutex — potential data race
MPendingShadowCopyAlignment is updated at line 274 before MCreateShadowCopyMtx is acquired (that happens at line 293). The same field is also written inside materializeShadowCopy() (line 202), which can be invoked concurrently from a lambda in MCreateShadowCopy on another thread.
Move the alignment update inside the lock, or capture the computed alignment in a local variable and only write the member after acquiring the mutex:
```cpp
const size_t BackendRequiredAlign = getBackendShadowCopyAlignment(Context);
std::lock_guard<std::mutex> Lock(MCreateShadowCopyMtx);
if (!MHasPendingAlignedShadowCopy)
  return;
if (BackendRequiredAlign > MPendingShadowCopyAlignment)
  MPendingShadowCopyAlignment = BackendRequiredAlign;
```
(Remove the now-redundant second lock-acquisition at line 293 and the second MHasPendingAlignedShadowCopy check at line 294.)
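The second option mentioned above, keeping the computation where it is and only publishing the result to the member once the existing lock is held, would be equally valid. A sketch using the same member names, with the surrounding code elided:

```cpp
const size_t BackendRequiredAlign = getBackendShadowCopyAlignment(Context);
// ... intervening work that does not touch MPendingShadowCopyAlignment ...
std::lock_guard<std::mutex> Lock(MCreateShadowCopyMtx); // pre-existing acquisition
if (!MHasPendingAlignedShadowCopy)
  return;
// Publish the alignment only now, so every write to the member is under the mutex.
if (BackendRequiredAlign > MPendingShadowCopyAlignment)
  MPendingShadowCopyAlignment = BackendRequiredAlign;
```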
Fixed: MPendingShadowCopyAlignment is now updated under MCreateShadowCopyMtx, after the in-lock MHasPendingAlignedShadowCopy re-check.
```cpp
const size_t AllocBytes =
    MSizeInBytes == 0 ? RequiredAlign
                      : ((MSizeInBytes + RequiredAlign - 1) / RequiredAlign) *
                            RequiredAlign;
```
🟡 Important: integer overflow when MSizeInBytes is close to SIZE_MAX
MSizeInBytes + RequiredAlign - 1 wraps when MSizeInBytes is near SIZE_MAX. The same pattern is used in the branch in sycl_mem_obj_t.hpp (line ~260) for the iterator-pair constructor.
Use an overflow-safe round-up:
```cpp
const size_t AllocBytes =
    MSizeInBytes == 0
        ? RequiredAlign
        : MSizeInBytes +
              (RequiredAlign - (MSizeInBytes % RequiredAlign)) % RequiredAlign;
```
This avoids the intermediate MSizeInBytes + RequiredAlign - 1 sum that can wrap.
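As a standalone illustration of the wrap itself (not PR code): for a size within RequiredAlign - 1 of SIZE_MAX, the intermediate sum wraps around and the naive round-up silently produces a tiny allocation size instead of failing loudly.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
  const std::size_t Size = SIZE_MAX - 2; // pathological but representable request
  const std::size_t Align = 64;
  // Size + 63 wraps around SIZE_MAX, so the "rounded up" size collapses to 0.
  const std::size_t Naive = ((Size + Align - 1) / Align) * Align;
  std::printf("requested %zu, naive round-up yields %zu\n", Size, Naive);
  return 0;
}
```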
Fixed: switched to the overflow-safe form MSizeInBytes + (RequiredAlign - MSizeInBytes % RequiredAlign) % RequiredAlign.
```cpp
if (SkipShadowCopy) {
  if (MShadowCopy != nullptr) {
    // A writable host accessor already forced a SYCL shadow copy. Keep using
    // that path so the final copy-back still targets the original user ptr.
    return;
```
🟡 Important: MHasPendingAlignedShadowCopy is not cleared on the early-return path, leaving state inconsistent across reallocations
When MShadowCopy != nullptr the function returns without clearing MHasPendingAlignedShadowCopy. This means that on a subsequent allocation (e.g., the buffer migrates to a different context), prepareForAllocation is called again, re-enters the SkipShadowCopy branch, finds MShadowCopy != nullptr again, and returns — which is fine for Level Zero.
However, if the first allocation was Level Zero (sets MBackendOwnsWriteBack = true, clears MHasPendingAlignedShadowCopy), then a host accessor forces a shadow copy (MShadowCopy != nullptr), and a second allocation on a non-Level-Zero backend calls prepareForAllocation: MHasPendingAlignedShadowCopy == false → early return at line 263 → MBackendOwnsWriteBack stays true even though SYCL now has a valid shadow copy and should own write-back.
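To make that sequence concrete, here is a toy model of the flag transitions (hypothetical code, not the real SYCLMemObjT; the branch logic is reduced to just the members discussed):

```cpp
#include <cassert>

struct ToyMemObj {
  bool MBackendOwnsWriteBack = false;
  bool MHasPendingAlignedShadowCopy = true; // misaligned user ptr still pending
  void *MShadowCopy = nullptr;

  void prepareForAllocation(bool SkipShadowCopy) {
    if (!MHasPendingAlignedShadowCopy)
      return; // early return at ~line 263
    if (SkipShadowCopy) {
      if (MShadowCopy != nullptr)
        return; // branch under discussion: flags left untouched
      MBackendOwnsWriteBack = true;
      MHasPendingAlignedShadowCopy = false;
      return;
    }
    MShadowCopy = this; // stand-in for materialising the aligned shadow copy
    MBackendOwnsWriteBack = false;
    MHasPendingAlignedShadowCopy = false;
  }
};

int main() {
  ToyMemObj Obj;
  Obj.prepareForAllocation(/*SkipShadowCopy=*/true);  // 1) Level Zero allocation
  Obj.MShadowCopy = &Obj;                             // 2) writable host accessor
  Obj.prepareForAllocation(/*SkipShadowCopy=*/false); // 3) non-Level-Zero allocation
  // SYCL now holds a valid shadow copy, yet the backend still "owns" write-back.
  assert(Obj.MShadowCopy != nullptr && Obj.MBackendOwnsWriteBack);
  return 0;
}
```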
Consider clearing MHasPendingAlignedShadowCopy and explicitly setting MBackendOwnsWriteBack = false in this branch to reflect that SYCL retains ownership:
```cpp
if (MShadowCopy != nullptr) {
  // A writable host accessor already forced a SYCL shadow copy. Keep using
  // that path so the final copy-back still targets the original user ptr.
  MBackendOwnsWriteBack = false;
  MHasPendingAlignedShadowCopy = false;
  return;
}
```
Fixed: the MShadowCopy != nullptr early-return now sets MBackendOwnsWriteBack = false and clears MHasPendingAlignedShadowCopy.
```cpp
  return MNeedWriteBack && MUploadDataFunctor && !MBackendOwnsWriteBack;
}

bool backendOwnsWriteBack() const { return MBackendOwnsWriteBack; }
```
🔵 Suggestion: backendOwnsWriteBack() is declared but has no callers outside SYCLMemObjT
grep finds no uses of backendOwnsWriteBack() in the rest of the source tree. If it is intended as a future hook or debug aid, a brief comment explaining its purpose would help; otherwise consider removing it to avoid exposing internal state unnecessarily.
Removed backendOwnsWriteBack(); internal access goes through MBackendOwnsWriteBack directly.
Why do we need a copy here?
Removed; checkExpectedPattern now iterates Ptr directly.
Is this internal scope needed? Wouldn't the function scope be enough?
This branch fixes the SYCL triple-buffer/shadow-copy ownership bug by deferring misaligned host-pointer shadow-copy decisions until allocation time (via a new prepareForAllocation hook), so a backend-aware policy can choose whether SYCL creates the shadow copy (NativeCPU/OpenCL/offload) or UR owns staging and final copy-back (Level Zero/CUDA/HIP).
It also adds a new end-to-end regression test (buffer_shadow_copy_platform_policy.cpp) that validates backend-dependent allocation behavior for misaligned read-only buffers and verifies correct data copy-back for writable buffers, covering the original failure mode and guarding against regressions.
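To make the deferred decision concrete, a backend policy in the spirit of the description could be sketched as below. This is illustrative only: the helper name and shape are hypothetical, not the PR's actual prepareForAllocation code, and the backend grouping follows the description above.

```cpp
#include <sycl/sycl.hpp>

// Hypothetical helper, not the PR's code: decide whether the SYCL runtime
// should materialise its own aligned shadow copy for a misaligned user
// pointer, or leave staging and final copy-back to the UR adapter.
inline bool syclOwnsShadowCopy(sycl::backend Backend) {
  switch (Backend) {
  case sycl::backend::ext_oneapi_level_zero:
  case sycl::backend::ext_oneapi_cuda:
  case sycl::backend::ext_oneapi_hip:
    return false; // UR adapter owns staging and write-back
  default:
    return true; // NativeCPU / OpenCL / offload: SYCL keeps the shadow copy
  }
}
```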