Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
f9743a9
bindless image semaphore usage requires queues that use immediate com…
cperkinsintel May 13, 2026
ed6c1d3
narrower include
cperkinsintel May 15, 2026
a58cfbc
revise coupla tests
cperkinsintel May 15, 2026
d49c119
validation optional
cperkinsintel May 15, 2026
eb3f2f1
spelling correction
cperkinsintel May 15, 2026
3f0ce1e
DG2 testing going well. Reenabling lots of tests
cperkinsintel May 15, 2026
16f8fa2
exploratory
cperkinsintel May 18, 2026
77ad8f3
clang-format never fails
cperkinsintel May 18, 2026
cc223b3
clang-format is decidedly hostile.
cperkinsintel May 18, 2026
57c5d75
BMG enablement for DX12 tests
cperkinsintel May 18, 2026
ced6462
enable some moar vulkan tests
cperkinsintel May 19, 2026
bed00b0
clang-format is hereby uninvited from all my future social gatherings…
cperkinsintel May 19, 2026
4a0b7f6
bump
cperkinsintel May 19, 2026
88ba550
merge
cperkinsintel May 19, 2026
212abd3
Eric's suggestion
cperkinsintel May 19, 2026
3a0c314
updating tests.
cperkinsintel May 20, 2026
7a0e921
moar reenablement
cperkinsintel May 20, 2026
fdfc513
clang-for-nothing
cperkinsintel May 20, 2026
79ddf91
requiring driver v
cperkinsintel May 21, 2026
1bed428
resolve merge conflict
cperkinsintel May 21, 2026
52547c5
hmm
cperkinsintel May 21, 2026
3d703be
resolve merge conflicts
cperkinsintel May 21, 2026
a51df22
the whole lin: win: thing on the driver gate is confusing A.F. we nee…
cperkinsintel May 22, 2026
b717ddf
precautionary
cperkinsintel May 22, 2026
d9775a6
removing unorm8
cperkinsintel May 22, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -2259,13 +2259,18 @@ memory resources handles can take different forms of structure and type
depending on the API and operating system, so do external semaphore resource
handles.

It is important to note, that the use of imported external semaphore objects
within SYCL has the restriction in that imported external semaphores can only
be used in conjuction with SYCL queues that have been constructed with the
`property::queue::in_order` property. The semaphore synchronization mechanism
is not supported for the default SYCL out-of-order queues. Use of the semaphore
synchronization mechanism with SYCL queues which were not constructed with the
`queue::in_order` property will result in undefined behaviour.
It is important to note that the use of imported external semaphore objects
within SYCL requires the SYCL queue to have been constructed with both of the
following properties:

* `sycl::property::queue::in_order` -- the semaphore synchronization mechanism
is not supported on the default out-of-order queues.
* `sycl::ext::intel::property::queue::immediate_command_list` -- external
semaphore operations are only supported on queues backed by immediate
command lists. This restriction might be lifted in the future.

Use of the semaphore synchronization mechanism with a SYCL queue that was not
constructed with both of these properties will result in undefined behaviour.

External semaphore import is facilitated through the following proposed
descriptor struct.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@
#include "dx11_interop.h"

#include <sycl/ext/oneapi/bindless_images.hpp>
#include <sycl/properties/queue_properties.hpp>

#ifdef TEST_SEMAPHORE_IMPORT
#include <d3d11_4.h> // Used for ID3D11Device5 / ID3D11DeviceContext4 / ID3D11Fence
Expand Down Expand Up @@ -438,8 +439,16 @@ int runTest(D3D11ProgramState &d3d11ProgramState, sycl::queue syclQueue,
}

int main() {
// Create SYCL queue, relying on SYCL device selection
// Create SYCL queue, relying on SYCL device selection.
#ifdef TEST_SEMAPHORE_IMPORT
// External semaphore ops require an in-order queue backed by immediate
// command lists (see sycl_ext_oneapi_bindless_images.asciidoc).
sycl::queue syclQueue{
{sycl::property::queue::in_order{},
sycl::ext::intel::property::queue::immediate_command_list{}}};
#else
sycl::queue syclQueue;
#endif
sycl::device syclDevice = syclQueue.get_device();

// Initialize D3D11 and create DX11 programs state from the SYCL device
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,18 +2,7 @@
// REQUIRES: aspect-ext_oneapi_external_semaphore_import
// REQUIRES: windows

// UNSUPPORTED: gpu-intel-dg2
// UNSUPPORTED-TRACKER: GSD-12428
// semaphores-do-not-work-dg2

// UNSUPPORTED: gpu-intel-gen12
// UNSUPPORTED-TRACKER: GSD-12427
// Gen12-semaphores-work-but-this-test-hangs.

// UNSUPPORTED: arch-intel_gpu_bmg_g21
// UNSUPPORTED-TRACKER: GSD-12436
// this test works on BMG, but if run in parallel with itself, or with other
// semaphore tests, it can hang.
// REQUIRES-INTEL-DRIVER: lin: 38303 win: 101.9999

// RUN: %{build} %link-directx -o %t.exe %if target-spir %{ -Wno-ignored-attributes %}
// RUN: %{run} %t.exe --no-sem
Expand Down Expand Up @@ -50,6 +39,7 @@
#include <string>
#include <sycl/detail/core.hpp>
#include <sycl/ext/oneapi/bindless_images.hpp>
#include <sycl/properties/queue_properties.hpp>
#include <vector>

#define WIN32_LEAN_AND_MEAN
Expand Down Expand Up @@ -120,7 +110,15 @@ int main(int argc, char **argv) {

// SYCL INTEROP
try {
sycl::queue q;
// Bindless image interop requires an in-order queue (per spec). External
// semaphore ops additionally require immediate command lists; see
// sycl_ext_oneapi_bindless_images.asciidoc.
sycl::property_list qProps =
useSemaphores ? sycl::property_list{sycl::property::queue::in_order{},
sycl::ext::intel::property::queue::
immediate_command_list{}}
: sycl::property_list{sycl::property::queue::in_order{}};
sycl::queue q{qProps};
auto device = q.get_device();
auto context = q.get_context();

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,14 +2,7 @@
// REQUIRES: aspect-ext_oneapi_external_semaphore_import
// REQUIRES: windows

// UNSUPPORTED: gpu-intel-dg2
// UNSUPPORTED-TRACKER: GSD-12428

// UNSUPPORTED: gpu-intel-gen12
// UNSUPPORTED-TRACKER: GSD-12427

// UNSUPPORTED: arch-intel_gpu_bmg_g21
// UNSUPPORTED-TRACKER: https://github.com/intel/llvm/issues/22028
// REQUIRES-INTEL-DRIVER: lin: 38303 win: 101.9999

// RUN: %{build} %link-directx -o %t.exe %if target-spir %{ -Wno-ignored-attributes %}
// RUN: %{run} %t.exe --no-sem
Expand All @@ -35,6 +28,7 @@
#include <string>
#include <sycl/detail/core.hpp>
#include <sycl/ext/oneapi/bindless_images.hpp>
#include <sycl/properties/queue_properties.hpp>
#include <vector>

#define WIN32_LEAN_AND_MEAN
Expand Down Expand Up @@ -183,7 +177,15 @@ int main(int argc, char **argv) {

// SYCL INTEROP - using resource_win32_name NATIVELY
try {
sycl::queue q;
// Bindless image interop requires an in-order queue (per spec). External
// semaphore ops additionally require immediate command lists; see
// sycl_ext_oneapi_bindless_images.asciidoc.
sycl::property_list qProps =
useSemaphores ? sycl::property_list{sycl::property::queue::in_order{},
sycl::ext::intel::property::queue::
immediate_command_list{}}
: sycl::property_list{sycl::property::queue::in_order{}};
sycl::queue q{qProps};
auto device = q.get_device();
auto context = q.get_context();

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -74,19 +74,18 @@


// Semaphore coverage tests
// At this time, semaphores aren't working on DG2 (GSD-12428), and can hang on BMG if run in parallel (GSD-12436).
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type float --channels 4 --semaphores 33x
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type half --channels 2 --semaphores 33x
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type int32 --channels 1 --semaphores 33x
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type uint32 --channels 4 --semaphores 33x
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type int16 --channels 2 --semaphores 33x
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type uint16 --channels 1 --semaphores 33x
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type uint8 --channels 4 --semaphores 33x
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type int8 --channels 2 --semaphores 33x
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type float --channels 4 --sampled --semaphores 33x
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type half --channels 2 --sampled --semaphores 33x
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type int32 --channels 1 --sampled --semaphores 33x
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type uint32 --channels 4 --sampled --semaphores 33x
// RUN: %{run} %t.exe --type float --channels 4 --semaphores 33x
// RUN: %{run} %t.exe --type half --channels 2 --semaphores 33x
// RUN: %{run} %t.exe --type int32 --channels 1 --semaphores 33x
// RUN: %{run} %t.exe --type uint32 --channels 4 --semaphores 33x
// RUN: %{run} %t.exe --type int16 --channels 2 --semaphores 33x
// RUN: %{run} %t.exe --type uint16 --channels 1 --semaphores 33x
// RUN: %{run} %t.exe --type uint8 --channels 4 --semaphores 33x
// RUN: %{run} %t.exe --type int8 --channels 2 --semaphores 33x
// RUN: %{run} %t.exe --type float --channels 4 --sampled --semaphores 33x
// RUN: %{run} %t.exe --type half --channels 2 --sampled --semaphores 33x
// RUN: %{run} %t.exe --type int32 --channels 1 --sampled --semaphores 33x
// RUN: %{run} %t.exe --type uint32 --channels 4 --sampled --semaphores 33x

// clang-format on

Expand All @@ -98,6 +97,7 @@
#include <sycl/detail/core.hpp>
#include <sycl/ext/oneapi/bindless_images.hpp>
#include <sycl/ext/oneapi/bindless_images_interop.hpp>
#include <sycl/properties/queue_properties.hpp>

namespace syclexp = sycl::ext::oneapi::experimental;

Expand Down Expand Up @@ -187,7 +187,15 @@ int runTest(

// SYCL Import and Verification
try {
sycl::queue q;
// Bindless image interop requires an in-order queue (per spec). External
// semaphore ops additionally require immediate command lists; see
// sycl_ext_oneapi_bindless_images.asciidoc.
sycl::property_list qProps =
useSemaphores ? sycl::property_list{sycl::property::queue::in_order{},
sycl::ext::intel::property::queue::
immediate_command_list{}}
: sycl::property_list{sycl::property::queue::in_order{}};
sycl::queue q{qProps};

syclexp::external_mem_descriptor<syclexp::resource_win32_handle> extMemDesc{
imgRes.sharedHandle, syclexp::external_mem_handle_type::win32_nt_handle,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@

// clang-format off
/*
clang++.exe -fsycl -o ds1w.exe D3D12_sycl_interop_1D_write.cpp -ld3d12 -ldxgi -ld3dcompiler
clang++.exe -fsycl -o ds1w.exe D3D12_sycl_interop_1D_write_unsampled.cpp -ld3d12 -ldxgi -ld3dcompiler

FLAGS:
--sampled ERROR: Sampled image writes are not supported
Expand Down Expand Up @@ -42,24 +42,19 @@
// RUN: %{run} %t.exe --type int8 --channels 1 33x
// RUN: %{run} %t.exe --type int8 --channels 2 33x
// RUN: %{run} %t.exe --type int8 --channels 4 33x
// RUN: %{run} %t.exe --type unorm8 --channels 1 33x
// RUN: %{run} %t.exe --type unorm8 --channels 2 33x
// RUN: %{run} %t.exe --type unorm8 --channels 4 33x

// Semaphore coverage tests
// At this time, semaphores aren't working on DG2 (GSD-12428), and can hang on BMG if run in parallel (GSD-12436).
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type float --channels 4 --semaphores 33x
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type float --channels 1 --semaphores 33x
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type half --channels 2 --semaphores 33x
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type int32 --channels 4 --semaphores 33x
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type uint32 --channels 1 --semaphores 33x
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type int16 --channels 2 --semaphores 33x
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type uint16 --channels 4 --semaphores 33x
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type uint8 --channels 1 --semaphores 33x
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type int8 --channels 2 --semaphores 33x
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type unorm8 --channels 4 --semaphores 33x
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type half --channels 4 --semaphores 33x
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type uint32 --channels 2 --semaphores 33x
// RUN: %{run} %t.exe --type float --channels 4 --semaphores 33x
// RUN: %{run} %t.exe --type float --channels 1 --semaphores 33x
// RUN: %{run} %t.exe --type half --channels 2 --semaphores 33x
// RUN: %{run} %t.exe --type int32 --channels 4 --semaphores 33x
// RUN: %{run} %t.exe --type uint32 --channels 1 --semaphores 33x
// RUN: %{run} %t.exe --type int16 --channels 2 --semaphores 33x
// RUN: %{run} %t.exe --type uint16 --channels 4 --semaphores 33x
// RUN: %{run} %t.exe --type uint8 --channels 1 --semaphores 33x
// RUN: %{run} %t.exe --type int8 --channels 2 --semaphores 33x
// RUN: %{run} %t.exe --type half --channels 4 --semaphores 33x
// RUN: %{run} %t.exe --type uint32 --channels 2 --semaphores 33x

// clang-format on

Expand All @@ -71,6 +66,7 @@
#include <sycl/detail/core.hpp>
#include <sycl/ext/oneapi/bindless_images.hpp>
#include <sycl/ext/oneapi/bindless_images_interop.hpp>
#include <sycl/properties/queue_properties.hpp>

namespace syclexp = sycl::ext::oneapi::experimental;

Expand Down Expand Up @@ -148,7 +144,15 @@ int runTest(
}

try {
sycl::queue q;
// Bindless image interop requires an in-order queue (per spec). External
// semaphore ops additionally require immediate command lists; see
// sycl_ext_oneapi_bindless_images.asciidoc.
sycl::property_list qProps =
useSemaphores ? sycl::property_list{sycl::property::queue::in_order{},
sycl::ext::intel::property::queue::
immediate_command_list{}}
: sycl::property_list{sycl::property::queue::in_order{}};
sycl::queue q{qProps};

syclexp::external_mem_descriptor<syclexp::resource_win32_handle> extMemDesc{
imgRes.sharedHandle, syclexp::external_mem_handle_type::win32_nt_handle,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,6 @@
// REQUIRES: aspect-ext_oneapi_external_memory_import
// REQUIRES: windows

// UNSUPPORTED: arch-intel_gpu_bmg_g21
// UNSUPPORTED-TRACKER: https://github.com/intel/llvm/issues/20384
// also GSD-12429

// RUN: %{build} -o %t.exe %link-directx
// RUN: %{run} %t.exe --type float --channels 4 8x8

Expand All @@ -26,7 +22,6 @@

DG2:
- WORKS, including --sampled
- semaphores segfault

DG2 $ sycl-ls
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over
Expand Down Expand Up @@ -61,9 +56,6 @@
// RUN: %{run} %t.exe --type int8 --channels 1 32x33
// RUN: %{run} %t.exe --type int8 --channels 2 32x33
// RUN: %{run} %t.exe --type int8 --channels 4 32x33
// RUN-IF: !gpu-intel-bmg, %{run} %t.exe --type unorm8 --channels 1 32x33
// RUN-IF: !gpu-intel-bmg, %{run} %t.exe --type unorm8 --channels 2 32x33
// RUN-IF: !gpu-intel-bmg, %{run} %t.exe --type unorm8 --channels 4 32x33
// RUN: %{run} %t.exe --type float --channels 1 --sampled 32x33
// RUN: %{run} %t.exe --type float --channels 2 --sampled 32x33
// RUN: %{run} %t.exe --type float --channels 4 --sampled 32x33
Expand All @@ -88,24 +80,19 @@
// RUN: %{run} %t.exe --type int8 --channels 1 --sampled 32x33
// RUN: %{run} %t.exe --type int8 --channels 2 --sampled 32x33
// RUN: %{run} %t.exe --type int8 --channels 4 --sampled 32x33
// RUN-IF: !gpu-intel-bmg, %{run} %t.exe --type unorm8 --channels 1 --sampled 32x33
// RUN-IF: !gpu-intel-bmg, %{run} %t.exe --type unorm8 --channels 2 --sampled 32x33
// RUN-IF: !gpu-intel-bmg, %{run} %t.exe --type unorm8 --channels 4 --sampled 32x33

// Semaphore coverage tests
// At this time, semaphores aren't working on DG2 (GSD-12428), and can hang on BMG if run in parallel (GSD-12436).
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type float --channels 4 --semaphores 32x33
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type half --channels 2 --semaphores 32x33
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type int32 --channels 1 --semaphores 32x33
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type uint32 --channels 4 --semaphores 32x33
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type int16 --channels 2 --semaphores 32x33
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type uint16 --channels 1 --semaphores 32x33
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type uint8 --channels 4 --semaphores 32x33
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type int8 --channels 2 --semaphores 32x33
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type float --channels 4 --sampled --semaphores 32x33
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type half --channels 2 --sampled --semaphores 32x33
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type int32 --channels 1 --sampled --semaphores 32x33
// RUN-IF: (!gpu-intel-dg2 && !arch-intel_gpu_bmg_g21), %{run} %t.exe --type unorm8 --channels 4 --sampled --semaphores 32x33
// RUN: %{run} %t.exe --type float --channels 4 --semaphores 32x33
// RUN: %{run} %t.exe --type half --channels 2 --semaphores 32x33
// RUN: %{run} %t.exe --type int32 --channels 1 --semaphores 32x33
// RUN: %{run} %t.exe --type uint32 --channels 4 --semaphores 32x33
// RUN: %{run} %t.exe --type int16 --channels 2 --semaphores 32x33
// RUN: %{run} %t.exe --type uint16 --channels 1 --semaphores 32x33
// RUN: %{run} %t.exe --type uint8 --channels 4 --semaphores 32x33
// RUN: %{run} %t.exe --type int8 --channels 2 --semaphores 32x33
// RUN: %{run} %t.exe --type float --channels 4 --sampled --semaphores 32x33
// RUN: %{run} %t.exe --type half --channels 2 --sampled --semaphores 32x33
// RUN: %{run} %t.exe --type int32 --channels 1 --sampled --semaphores 32x33

// clang-format on

Expand All @@ -118,6 +105,7 @@
#include <sycl/detail/core.hpp>
#include <sycl/ext/oneapi/bindless_images.hpp>
#include <sycl/ext/oneapi/bindless_images_interop.hpp>
#include <sycl/properties/queue_properties.hpp>
#include <vector>

namespace syclexp = sycl::ext::oneapi::experimental;
Expand Down Expand Up @@ -415,7 +403,15 @@ int runTest(
signalExportableFence(ctx, extFenceB);

try {
sycl::queue q;
// Bindless image interop requires an in-order queue (per spec). External
// semaphore ops additionally require immediate command lists; see
// sycl_ext_oneapi_bindless_images.asciidoc.
sycl::property_list qProps =
useSemaphores ? sycl::property_list{sycl::property::queue::in_order{},
sycl::ext::intel::property::queue::
immediate_command_list{}}
: sycl::property_list{sycl::property::queue::in_order{}};
sycl::queue q{qProps};

auto extMemA = syclexp::import_external_memory(
syclexp::external_mem_descriptor<syclexp::resource_win32_handle>{
Expand Down
Loading
Loading