[SYCL] Embed fsycl-id-queries-range in device image as property.#21979
[SYCL] Embed fsycl-id-queries-range in device image as property.#21979uditagarwal97 wants to merge 8 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR shifts -fsycl-id-queries-range enforcement from a host-side header check to an enqueue-time runtime check by embedding the selected mode into the device image as a binary property and reading it during kernel submission.
Changes:
- Add an
idQueriesRangemisc property to SYCL device images (emitted bysycl-post-link, driven by a new post-link option forwarded from clang). - Validate launch ranges at enqueue time in the scheduler using the embedded property; remove the old header-based
checkValueRangemechanism and associated include-deps expectations. - Remove the dedicated
sycl/unittests/range/*unit tests and their CMake wiring.
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
sycl/source/detail/scheduler/commands.cpp |
Fetches idQueriesRange property from the device image and performs enqueue-time range validation. |
sycl/source/detail/ndrange_desc.hpp |
Adds getNumGlobalWorkGroups() helper used by the new validation logic. |
sycl/source/detail/device_binary_image.cpp |
Adjusts dynamic image merge behavior for misc properties (now also mentions idQueriesRange). |
llvm/tools/sycl-post-link/sycl-post-link.cpp |
Introduces -id-queries-range= option and emits the property via GlobalBinImageProps. |
llvm/lib/SYCLPostLink/ComputeModuleRuntimeInfo.cpp |
Emits idQueriesRange into the device image misc property set. |
llvm/include/llvm/SYCLPostLink/ComputeModuleRuntimeInfo.h |
Extends GlobalBinImageProps with IdQueriesRange. |
clang/lib/Driver/ToolChains/Clang.cpp |
Forwards -fsycl-id-queries-range= to sycl-post-link as -id-queries-range=. |
sycl/include/sycl/queue.hpp |
Removes host-side checkValueRange usage and the dependency on the deleted header. |
sycl/include/sycl/handler.hpp |
Removes host-side checkValueRange usage and the dependency on the deleted header. |
sycl/include/sycl/detail/id_queries_fit_in_int.hpp |
Deletes the old header implementing host-side range validation. |
sycl/test/include_deps/sycl_khr_includes_*.hpp.cpp and sycl/test/include_deps/sycl_detail_core.hpp.cpp |
Updates include dependency expectations after removing id_queries_fit_in_int.hpp. |
sycl/unittests/CMakeLists.txt |
Removes add_subdirectory(range) from the SYCL unit test suite. |
sycl/unittests/range/** |
Deletes the -fsycl-id-queries-range range validation unit tests and their CMake glue. |
| // Add device image property only if the image has a non-default | ||
| // SYCL Id range. The default range is 0 (signed int). | ||
| if (IdQueriesRange != 0) |
There was a problem hiding this comment.
The default range in the SYCL spec is size_t. Would it be safer to treat this as the default and require explicit opt-in to a smaller type, e.g. signed int?
There was a problem hiding this comment.
Even though default range as per SYCL spec is size_t, we set -fsycl-id-queries-range=int option by default for performance benefits
| // Make sure maximum number of work groups in each dimension does not | ||
| // exceed uint32_t. | ||
| uint64_t NumGlobalWorkGroups = NDRDesc.getNumGlobalWorkGroups(); | ||
| uint64_t MaxUint = | ||
| static_cast<uint64_t>(std::numeric_limits<uint32_t>::max()); | ||
| // Split the max work groups across all dimensions and check if | ||
| // any dimension exceeds uint32_t. | ||
| uint64_t NumGlobalWorkGroupsPerDim = NumGlobalWorkGroups / NDRDesc.Dims; |
There was a problem hiding this comment.
I don't think this error check is doing what is intended. Consider a 2D range with with [2 x UINT32_MAX, 1] work-groups. I think we want to emit the error string here, since the first dimension exceeds MaxUint, but due to the division by NDRDesc.Dims we will not. We should be checking each dimension of the range, not anything based on the total number of work-groups.
Noting for completeness also: Devices may support an even smaller maximum group count in each dimension (refer to ze_device_compute_properties_t), but a group count that exceeds UINT32_MAX is definitely too big for all devices currently.
| ErrMsg = | ||
| "The kernel was compiled with -fsycl-id-queries-range=int, but the " | ||
| "provided range/offset exceeds the maximum value storable in an " | ||
| "int. Either reduce the range/offset or " | ||
| "recompile the kernel with -fsycl-id-queries-range=[uint|size_t]."; |
There was a problem hiding this comment.
Can you check that this string doesn't get constructed unconditionally (with an associated memory allocation and free) when the check passes? From a performance POV we need to be careful not to introduce memory allocations into the submission path.
| MaxRange = static_cast<uint64_t>(std::numeric_limits<size_t>::max()); | ||
| ErrMsg = "The provided range/offset exceeds the maximum " | ||
| "value storable in a size_t, " | ||
| "which is the maximum value supported by DPCPP."; |
There was a problem hiding this comment.
Maybe not worth worrying about, but this error message should never be generated, correct? How could a provided range exceed size_t?
| if (Dims == 0) | ||
| return 0; |
There was a problem hiding this comment.
Nitpick: What are the semantics of a zero-dimensional range? Should this be zero work-groups, or one work-group?
| // TODO: Can we have a case where only GFlobalSize and GlobalOffset are | ||
| // are set, and not LocalSize? If so, we need to handle that case as well. |
There was a problem hiding this comment.
I think we can, but the global offset does not affect the number of work-groups calculation, so I think this comment can just be removed.
If we decide to keep it though, note the small misspelling:
| // TODO: Can we have a case where only GFlobalSize and GlobalOffset are | |
| // are set, and not LocalSize? If so, we need to handle that case as well. | |
| // TODO: Can we have a case where only GlobalSize and GlobalOffset are | |
| // are set, and not LocalSize? If so, we need to handle that case as well. |
| // Localproduct equals to zero means user has not specified local size | ||
| // and backend is free to choose it. In this case, the maximum number of | ||
| // workgroups is equal to the total global size, assuming local size to | ||
| // be 1. | ||
| if (LocalProduct == 0) | ||
| return GlobalProduct; |
There was a problem hiding this comment.
Assuming a local work-group size of one is almost certainly going to over-estimate the number of work-groups in most cases. I haven't thought through all of the scenarios to know if this is a problem or not, just mentioning for completeness.
| // Complile the kernel with different -fsycl-id-queries-range values | ||
| // to test overflow detection and Level Zero work-group limits. | ||
|
|
||
| // REQUIRES: level_zero |
There was a problem hiding this comment.
What's level_zero-specific about this test? Seems like the int and uint cases especially should apply to all backends (size_t might be a bit of a special-case).
| // Forward -fsycl-id-queries-range= to sycl-post-link. | ||
| if (Arg *A = TCArgs.getLastArg(options::OPT_fsycl_id_queries_range_EQ)) { | ||
| SmallString<64> IdQueriesRangeOpt("-id-queries-range="); | ||
| IdQueriesRangeOpt += A->getValue(); | ||
| addArgs(PostLinkArgs, TCArgs, {IdQueriesRangeOpt.str()}); | ||
| } |
There was a problem hiding this comment.
| // Forward -fsycl-id-queries-range= to sycl-post-link. | |
| if (Arg *A = TCArgs.getLastArg(options::OPT_fsycl_id_queries_range_EQ)) { | |
| SmallString<64> IdQueriesRangeOpt("-id-queries-range="); | |
| IdQueriesRangeOpt += A->getValue(); | |
| addArgs(PostLinkArgs, TCArgs, {IdQueriesRangeOpt.str()}); | |
| } | |
| // Forward -fsycl-id-queries-range= to sycl-post-link. | |
| if (const Arg *A = TCArgs.getLastArg(options::OPT_fsycl_id_queries_range_EQ)) { | |
| PostLinkArgs.push_back(TCArgs.MakeArgString( | |
| Twine("-id-queries-range=") + A->getValue())); | |
| } |
|
|
||
| // Forward -fsycl-id-queries-range= to sycl-post-link. | ||
| if (Arg *A = TCArgs.getLastArg(options::OPT_fsycl_id_queries_range_EQ)) { | ||
| SmallString<64> IdQueriesRangeOpt("-id-queries-range="); |
There was a problem hiding this comment.
The option definition is OPT_fsycl_id_queries_range_EQ and has 'fsycl in it.
Curious, why is the option not -fsycl-id-queries-range=<value>
| MNodes; | ||
| }; | ||
|
|
||
| void checkNDRangeBoundsAndThrow(const NDRDescT &NDRDesc, |
There was a problem hiding this comment.
This declaration signature doesn't match the definition.
| IdQueryRangeProp = | ||
| DeviceImageImpl.get_bin_image_ref()->getProperty("idQueriesRange"); |
There was a problem hiding this comment.
Property value is queried at every kernel launch, which is expensive, please consider caching this value as it doesn't change for device image.
Problem
Consider the following case:
(1) Compile a.cpp with a kernel foo (say, in a functor), with
-fsycl-id-queries-range=uint(2) Compile b.cpp that submits kernel foo, without
-fsycl-id-queries-rangeoption (by default that means ID queries/ranges are limited to INT_MAX)(3) Link the .o files from (1) and (2).
Currently, we emit the check whether id/ranges are in bounds, at compile-time, so when compiling b.cpp, we'd emit check to make sure SYCL queries/id are less than INT_MAX, even though the kernel we are submitting (from a.cpp) can support range upto UINT_MAX.
Solution
Store
-fsycl-id-queries-rangeas a device image property and do the range checks at runtime.