
[UR][SYCL] Triple buffer issue #21694

Open
mateuszpn wants to merge 12 commits into intel:sycl from mateuszpn:triple-buffer-bug

Conversation

@mateuszpn
Contributor

This branch fixes the SYCL triple-buffer/shadow-copy ownership bug by deferring the shadow-copy decision for misaligned host pointers until allocation time, via a new prepareForAllocation hook. A backend-aware policy can then choose whether SYCL creates the shadow copy (NativeCPU/OpenCL/offload) or UR owns staging and the final copy-back (Level Zero/CUDA/HIP).

It also adds a new end-to-end regression test (buffer_shadow_copy_platform_policy.cpp) that validates backend-dependent allocation behavior for misaligned read-only buffers and verifies correct data copy-back for writable buffers, covering the original failure mode and guarding against regressions.

@mateuszpn mateuszpn force-pushed the triple-buffer-bug branch 3 times, most recently from 5716585 to da2337a on April 13, 2026 11:27
@mateuszpn mateuszpn requested a review from kswiecicki April 16, 2026 11:58
Comment thread sycl/source/detail/sycl_mem_obj_t.cpp Outdated
Comment thread sycl/source/detail/sycl_mem_obj_t.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/memory.hpp Outdated
Comment thread sycl/source/detail/memory_manager.cpp Outdated
@mateuszpn mateuszpn force-pushed the triple-buffer-bug branch from 763e548 to 979efe0 on May 4, 2026 09:39
@mateuszpn mateuszpn marked this pull request as ready for review May 6, 2026 08:52
@mateuszpn mateuszpn requested review from a team as code owners May 6, 2026 08:52
@mateuszpn mateuszpn requested a review from slawekptak May 6, 2026 08:52
@mateuszpn mateuszpn marked this pull request as draft May 6, 2026 09:05
@mateuszpn mateuszpn force-pushed the triple-buffer-bug branch 3 times, most recently from d134ec2 to 77e781e on May 7, 2026 11:19
@mateuszpn mateuszpn marked this pull request as ready for review May 8, 2026 12:56
Collaborator

@iclsrc iclsrc left a comment


This PR addresses the triple-buffer problem (misaligned host pointer → SYCL shadow copy → UR staging buffer → three live copies) by deferring the shadow-copy decision until the target backend is known. For Level Zero, SYCL skips its shadow copy and lets the adapter manage the host pointer; for OpenCL/CUDA/HIP, SYCL materialises an aligned shadow copy as before.

The structural approach is sound, but there is a critical correctness gap for the Level Zero write-back path, a data race on MPendingShadowCopyAlignment, and an integer-overflow risk in the allocation size calculation. Details below.

Comment on lines 124 to +128
if (hostPtr) {
// Initial copy using Level Zero for USM HOST memory
auto hDevice = hContext->getDevices()[0];
UR_CALL_THROWS(
synchronousZeCopy(hContext, hDevice, this->ptr.get(), hostPtr, size));
// Store writeBackPtr for copy-back - needed when original pointer
// cannot be imported (e.g., misaligned, wrong allocation type)
writeBackPtr = hostPtr;
mapToPtr = hostPtr;
Collaborator

🔴 Critical: write-back to misaligned host pointer is silently dropped for Level Zero integrated GPU

mapToPtr is set to the original (potentially misaligned) host pointer, but writeBackPtr — which is what copyBackToHostIfNeeded() uses in the destructor — is never assigned in this constructor. After this PR:

  • SYCL skips its MUploadDataFunctor because MBackendOwnsWriteBack = true.
  • UR skips copy-back in the destructor because writeBackPtr == nullptr.

The mapToPtr path does copy data from the device USM buffer back to the original pointer, but only when mapHostPtr is called with UR_MAP_FLAG_READ (i.e., during an explicit host-side map/unmap). There is no call to this at buffer-destruction time, so kernel writes are silently lost at scope exit for misaligned host pointers on integrated GPUs.

Either writeBackPtr should also be set here (mirroring mapToPtr), or copyBackToHostIfNeeded should use mapToPtr for the final synchronous copy, e.g.:

Suggested change
if (hostPtr) {
// Initial copy using Level Zero for USM HOST memory
auto hDevice = hContext->getDevices()[0];
UR_CALL_THROWS(
synchronousZeCopy(hContext, hDevice, this->ptr.get(), hostPtr, size));
// Store writeBackPtr for copy-back - needed when original pointer
// cannot be imported (e.g., misaligned, wrong allocation type)
writeBackPtr = hostPtr;
mapToPtr = hostPtr;
if (hostPtr) {
auto hDevice = hContext->getDevices()[0];
UR_CALL_THROWS(
synchronousZeCopy(hContext, hDevice, this->ptr.get(), hostPtr, size));
mapToPtr = hostPtr;
writeBackPtr = hostPtr;
}

The write-back test in the PR (runWriteKernel) will only catch this on integrated hardware; on discrete hardware the discrete-buffer path is taken and the bug would not manifest.

Contributor Author

Fixed: writeBackPtr = hostPtr is now set alongside mapToPtr so ~ur_integrated_buffer_handle_t copies device updates back even without explicit map/unmap.

Comment thread sycl/source/detail/sycl_mem_obj_t.cpp Outdated
Comment on lines +272 to +274
const size_t BackendRequiredAlign = getBackendShadowCopyAlignment(Context);
if (BackendRequiredAlign > MPendingShadowCopyAlignment)
MPendingShadowCopyAlignment = BackendRequiredAlign;
Collaborator

🟡 Important: MPendingShadowCopyAlignment written outside the mutex — potential data race

MPendingShadowCopyAlignment is updated at line 274 before MCreateShadowCopyMtx is acquired (that happens at line 293). The same field is also written inside materializeShadowCopy() (line 202), which can be invoked concurrently from a lambda in MCreateShadowCopy on another thread.

Move the alignment update inside the lock, or capture the computed alignment in a local variable and only write the member after acquiring the mutex:

Suggested change
const size_t BackendRequiredAlign = getBackendShadowCopyAlignment(Context);
if (BackendRequiredAlign > MPendingShadowCopyAlignment)
MPendingShadowCopyAlignment = BackendRequiredAlign;
const size_t BackendRequiredAlign = getBackendShadowCopyAlignment(Context);
std::lock_guard<std::mutex> Lock(MCreateShadowCopyMtx);
if (!MHasPendingAlignedShadowCopy)
return;
if (BackendRequiredAlign > MPendingShadowCopyAlignment)
MPendingShadowCopyAlignment = BackendRequiredAlign;

(Remove the now-redundant second lock-acquisition at line 293 and the second MHasPendingAlignedShadowCopy check at line 294.)

Contributor Author

Fixed: MPendingShadowCopyAlignment is now updated under MCreateShadowCopyMtx, after the in-lock MHasPendingAlignedShadowCopy re-check.

Comment thread sycl/source/detail/sycl_mem_obj_t.cpp Outdated
Comment on lines +223 to +226
const size_t AllocBytes =
MSizeInBytes == 0 ? RequiredAlign
: ((MSizeInBytes + RequiredAlign - 1) / RequiredAlign) *
RequiredAlign;
Collaborator

🟡 Important: integer overflow when MSizeInBytes is close to SIZE_MAX

MSizeInBytes + RequiredAlign - 1 wraps when MSizeInBytes is near SIZE_MAX. The same pattern appears in the iterator-pair constructor branch in sycl_mem_obj_t.hpp (line ~260).

Use an overflow-safe round-up:

Suggested change
const size_t AllocBytes =
MSizeInBytes == 0 ? RequiredAlign
: ((MSizeInBytes + RequiredAlign - 1) / RequiredAlign) *
RequiredAlign;
const size_t AllocBytes =
MSizeInBytes == 0 ? RequiredAlign
: MSizeInBytes + (RequiredAlign - (MSizeInBytes % RequiredAlign)) % RequiredAlign;

This avoids the intermediate MSizeInBytes + RequiredAlign - 1 sum that overflows.

Contributor Author

Fixed: switched to the overflow-safe form MSizeInBytes + (RequiredAlign - MSizeInBytes % RequiredAlign) % RequiredAlign.

Comment on lines +296 to +300
if (SkipShadowCopy) {
if (MShadowCopy != nullptr) {
// A writable host accessor already forced a SYCL shadow copy. Keep using
// that path so the final copy-back still targets the original user ptr.
return;
Collaborator

🟡 Important: MHasPendingAlignedShadowCopy is not cleared on the early-return path, leaving state inconsistent across reallocations

When MShadowCopy != nullptr the function returns without clearing MHasPendingAlignedShadowCopy. This means that on a subsequent allocation (e.g., the buffer migrates to a different context), prepareForAllocation is called again, re-enters the SkipShadowCopy branch, finds MShadowCopy != nullptr again, and returns — which is fine for Level Zero.

However, if the first allocation was Level Zero (sets MBackendOwnsWriteBack = true, clears MHasPendingAlignedShadowCopy), then a host accessor forces a shadow copy (MShadowCopy != nullptr), and a second allocation on a non-Level-Zero backend calls prepareForAllocation: MHasPendingAlignedShadowCopy == false → early return at line 263 → MBackendOwnsWriteBack stays true even though SYCL now has a valid shadow copy and should own write-back.

Consider clearing MHasPendingAlignedShadowCopy and explicitly setting MBackendOwnsWriteBack = false in this branch to reflect that SYCL retains ownership:

Suggested change
if (SkipShadowCopy) {
if (MShadowCopy != nullptr) {
// A writable host accessor already forced a SYCL shadow copy. Keep using
// that path so the final copy-back still targets the original user ptr.
return;
if (MShadowCopy != nullptr) {
// A writable host accessor already forced a SYCL shadow copy. Keep using
// that path so the final copy-back still targets the original user ptr.
MBackendOwnsWriteBack = false;
MHasPendingAlignedShadowCopy = false;
return;
}

Contributor Author

Fixed: the MShadowCopy != nullptr early-return now sets MBackendOwnsWriteBack = false and clears MHasPendingAlignedShadowCopy.

Comment thread sycl/source/detail/sycl_mem_obj_t.hpp Outdated
return MNeedWriteBack && MUploadDataFunctor && !MBackendOwnsWriteBack;
}

bool backendOwnsWriteBack() const { return MBackendOwnsWriteBack; }
Collaborator

🔵 Suggestion: backendOwnsWriteBack() is declared but has no callers outside SYCLMemObjT

grep finds no uses of backendOwnsWriteBack() in the rest of the source tree. If it is intended as a future hook or debug aid, a brief comment explaining its purpose would help; otherwise consider removing it to avoid exposing internal state unnecessarily.

Contributor Author

Removed backendOwnsWriteBack(); internal access goes through MBackendOwnsWriteBack directly.

Contributor

Why do we need a copy here?

Contributor Author

Removed; checkExpectedPattern now iterates Ptr directly.

Contributor

Is this internal scope needed? Wouldn't the function scope be enough?

Contributor Author

Removed.
