Commit 910187b

Update docs for offloading (#58)

1 parent c03daf2 commit 910187b

3 files changed: +212 −34 lines changed

README.md

Lines changed: 4 additions & 4 deletions

@@ -100,10 +100,10 @@ out the [Documentation](https://pyomp.readthedocs.io).

 PyOMP supports both CPU and GPU programming.
 For GPU programming, PyOMP implements OpenMP's `target` directive for offloading
-and supports the `device` clause, with `device(0)` by convention offloading to a
-GPU device.
-It is also possible to use the host as a multi-core CPU target device (mainly
-for testing purposes) by setting `device(1)`.
+and supports the `device` clause to select the offloading target device.
+For more information see the [GPU
+Offloading](https://pyomp.readthedocs.io/en/latest/openmp.html#openmp-and-gpu-offloading-support)
+section in the documentation.

 ### Example


docs/source/installation.rst

Lines changed: 1 addition & 1 deletion

@@ -7,7 +7,7 @@ You can install PyOMP from PyPI using `pip`:

    $ pip install pyomp

-It is also possible to install PyOMP through conda:
+It is also possible to install PyOMP through `conda`:

 .. code-block:: console

docs/source/openmp.rst

Lines changed: 207 additions & 29 deletions
@@ -127,45 +127,106 @@ Combines ``target``, ``teams``, ``distribute``, and ``parallel for`` directives.

 OpenMP runtime functions
 -------------------------

-**Thread and team information:**
-
-* ``omp_get_thread_num()`` - Returns the unique identifier of the calling thread
-* ``omp_get_num_threads()`` - Returns the total number of threads in the current parallel region
-* ``omp_set_num_threads(n)`` - Sets the number of threads for subsequent parallel regions
-* ``omp_get_max_threads()`` - Returns the maximum number of threads available
-* ``omp_get_num_procs()`` - Returns the number of processors in the system
-* ``omp_get_thread_limit()`` - Returns the thread limit for the parallel region
-* ``omp_in_parallel()`` - Returns 1 if called within a parallel region, 0 otherwise
-* ``omp_get_team_num()`` - Returns the team number in a target region
-* ``omp_get_num_teams()`` - Returns the number of teams in a target region
+Thread and team information
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. list-table::
+   :widths: 35 65
+
+   * - **omp_get_thread_num()**
+     - Returns the unique identifier of the calling thread
+   * - **omp_get_num_threads()**
+     - Returns the total number of threads in the current parallel region
+   * - **omp_set_num_threads(n)**
+     - Sets the number of threads for subsequent parallel regions
+   * - **omp_get_max_threads()**
+     - Returns the maximum number of threads available
+   * - **omp_get_num_procs()**
+     - Returns the number of processors in the system
+   * - **omp_get_thread_limit()**
+     - Returns the thread limit for the parallel region
+   * - **omp_in_parallel()**
+     - Returns 1 if called within a parallel region, 0 otherwise
+   * - **omp_get_team_num()**
+     - Returns the team number in a target region
+   * - **omp_get_num_teams()**
+     - Returns the number of teams in a target region
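A minimal sketch of how the thread-information functions above are used. It assumes PyOMP is installed (`pip install pyomp`) and that `omp_get_thread_num` is importable from `numba.openmp` alongside `njit` and `openmp_context`, matching the import style of this commit's offloading examples; a serial fallback keeps the sketch runnable without PyOMP:

```python
import numpy as np

try:
    # Assumed import style, following the `numba.openmp` imports used in
    # this commit's offloading examples; requires PyOMP to be installed.
    from numba.openmp import njit, openmp_context as openmp
    from numba.openmp import omp_get_thread_num

    @njit
    def thread_of_iteration(n):
        ids = np.empty(n, dtype=np.int64)
        with openmp("parallel for"):
            for i in range(n):
                # Record which thread executed iteration i
                ids[i] = omp_get_thread_num()
        return ids
except ImportError:
    # Serial stand-in so the sketch stays runnable without PyOMP:
    # a single "thread" (id 0) executes every iteration.
    def thread_of_iteration(n):
        return np.zeros(n, dtype=np.int64)

ids = thread_of_iteration(16)
```

With PyOMP, `ids` records a valid thread id per iteration; the distribution of ids depends on the schedule and thread count.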
-**Timing:**
-
-* ``omp_get_wtime()`` - Returns elapsed wall-clock time (useful for performance profiling)
+Timing
+~~~~~~
+
+.. list-table::
+   :widths: 35 65
+
+   * - **omp_get_wtime()**
+     - Returns elapsed wall-clock time (useful for performance profiling)
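A small timing sketch using ``omp_get_wtime()``, assuming it is importable from `numba.openmp` (an assumption mirroring the other `omp_*` bindings; requires PyOMP). Without PyOMP, `time.perf_counter()` plays the same wall-clock role so the sketch still runs:

```python
import time

try:
    # Assumed imports; requires PyOMP.
    from numba.openmp import njit, omp_get_wtime

    @njit
    def timed_sum(n):
        t0 = omp_get_wtime()
        s = 0.0
        for i in range(n):
            s += i
        # Return the result and the elapsed wall-clock time in seconds
        return s, omp_get_wtime() - t0
except ImportError:
    # Fallback: time.perf_counter() stands in for omp_get_wtime().
    def timed_sum(n):
        t0 = time.perf_counter()
        s = float(sum(range(n)))
        return s, time.perf_counter() - t0

total, elapsed = timed_sum(1_000)
```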

-**Nested and hierarchical parallelism:**
-
-* ``omp_set_nested(flag)`` - Enables or disables nested parallelism
-* ``omp_set_dynamic(flag)`` - Enables or disables dynamic thread adjustment
-* ``omp_set_max_active_levels(n)`` - Sets the maximum number of nested parallel levels
-* ``omp_get_max_active_levels()`` - Returns the maximum number of nested parallel levels
-* ``omp_get_level()`` - Returns the current nesting level
-* ``omp_get_active_level()`` - Returns the current active nesting level
-* ``omp_get_ancestor_thread_num(level)`` - Returns the thread number at a given nesting level
-* ``omp_get_team_size(level)`` - Returns the team size at a given nesting level
-* ``omp_get_supported_active_levels()`` - Returns the supported number of nested active levels
+Nested and hierarchical parallelism
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. list-table::
+   :widths: 35 65
+
+   * - **omp_set_nested(flag)**
+     - Enables or disables nested parallelism
+   * - **omp_set_dynamic(flag)**
+     - Enables or disables dynamic thread adjustment
+   * - **omp_set_max_active_levels(n)**
+     - Sets the maximum number of nested parallel levels
+   * - **omp_get_max_active_levels()**
+     - Returns the maximum number of nested parallel levels
+   * - **omp_get_level()**
+     - Returns the current nesting level
+   * - **omp_get_active_level()**
+     - Returns the current active nesting level
+   * - **omp_get_ancestor_thread_num(level)**
+     - Returns the thread number at a given nesting level
+   * - **omp_get_team_size(level)**
+     - Returns the team size at a given nesting level
+   * - **omp_get_supported_active_levels()**
+     - Returns the supported number of nested active levels
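A sketch of the set/get pairing for nesting limits, assuming `omp_set_max_active_levels` and `omp_get_max_active_levels` are importable from `numba.openmp` (an assumption; requires PyOMP). The fallback simply mirrors the OpenMP semantics that the getter returns the value last set:

```python
try:
    # Assumed imports; requires PyOMP.
    from numba.openmp import (
        njit,
        omp_set_max_active_levels,
        omp_get_max_active_levels,
    )

    @njit
    def nesting_limit():
        # Allow up to two nested active parallel levels, then read it back.
        omp_set_max_active_levels(2)
        return omp_get_max_active_levels()
except ImportError:
    # Stand-in mirroring the OpenMP set/get semantics without a runtime.
    def nesting_limit():
        max_active_levels = 2
        return max_active_levels

limit = nesting_limit()
```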
-**Advanced features:**
-
-* ``omp_get_proc_bind()`` - Returns the processor binding policy
-* ``omp_get_num_places()`` - Returns the number of available places
-* ``omp_get_place_num_procs(place)`` - Returns the number of processors in a place
-* ``omp_get_place_num()`` - Returns the current place number
+Advanced features
+~~~~~~~~~~~~~~~~~
+
+.. list-table::
+   :widths: 35 65
+
+   * - **omp_get_proc_bind()**
+     - Returns the processor binding policy
+   * - **omp_get_num_places()**
+     - Returns the number of available places
+   * - **omp_get_place_num_procs(place)**
+     - Returns the number of processors in a place
+   * - **omp_get_place_num()**
+     - Returns the current place number
+   * - **omp_in_final()**
+     - Returns 1 if called in a final task, 0 otherwise
+
+Device and target offloading
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. list-table::
+   :widths: 35 65
+
+   * - **omp_get_num_devices()**
+     - Returns the number of available target devices
+   * - **omp_get_device_num()**
+     - Returns the device number of the current target device
+   * - **omp_set_default_device(device_id)**
+     - Sets the default device for subsequent target regions
+   * - **omp_get_default_device()**
+     - Returns the default device ID for target regions
+   * - **omp_is_initial_device()**
+     - Returns 1 if executing on the initial device (host), 0 otherwise
+   * - **omp_get_initial_device()**
+     - Returns the device ID of the initial device (host)
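A sketch of querying the device runtime functions above from host code, assuming `omp_get_num_devices` and `omp_is_initial_device` are importable from `numba.openmp` (an assumption; requires PyOMP). Outside any ``target`` region, execution is on the initial (host) device; the fallback mirrors a machine with no target devices:

```python
try:
    # Assumed imports; requires PyOMP.
    from numba.openmp import njit, omp_get_num_devices, omp_is_initial_device

    @njit
    def device_info():
        # Outside a target region this executes on the initial (host) device,
        # so omp_is_initial_device() returns 1.
        return omp_get_num_devices(), omp_is_initial_device()
except ImportError:
    # Stand-in mirroring a host-only machine: zero target devices.
    def device_info():
        return 0, 1

num_devices, on_host = device_info()
```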

 Supported features and platforms
 ---------------------------------

-OpenMP and GPU Offloading Support
+OpenMP and GPU offloading support
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 PyOMP builds on `Numba <https://numba.pydata.org/>`_ Just-In-Time (JIT)

@@ -179,6 +240,111 @@ PyOMP also supports GPU offloading for NVIDIA GPUs. The supported GPU
 architectures depend on the LLVM version and its OpenMP runtime. Consult the
 LLVM OpenMP documentation for details on your specific version.

+Device selection and querying
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+PyOMP provides utilities in the ``offloading`` module to query available OpenMP target
+devices and select specific devices for offloading based on device type, vendor, and
+architecture. This enables fine-grained control over where target regions execute.
+
+Discovering available devices
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To see all available devices and their properties, use ``print_offloading_info()``:
+
+.. code-block:: python
+
+    from numba.openmp.offloading import print_offloading_info
+
+    print_offloading_info()
+
+This prints information about all devices, including device counts and default device settings.
+
+Finding devices by criteria
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To programmatically find device IDs matching specific criteria, use ``find_device_ids()``:
+
+.. code-block:: python
+
+    from numba.openmp.offloading import find_device_ids
+
+    # Find all GPU devices
+    gpu_devices = find_device_ids(type="gpu")
+
+    # Find all NVIDIA GPUs
+    nvidia_gpus = find_device_ids(vendor="nvidia")
+
+    # Find NVIDIA GPUs with a specific architecture (e.g., sm_80)
+    sm80_gpus = find_device_ids(vendor="nvidia", arch="sm_80")
+
+    # Find all AMD GPUs
+    amd_gpus = find_device_ids(vendor="amd")
+
+    # Find the host/CPU device
+    host_devices = find_device_ids(type="host")
+
+The function returns a list of device IDs (integers) matching the criteria. Any parameter
+can be ``None`` to act as a wildcard and match all values.
+
+Querying device properties
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To determine the type, vendor, or architecture of a specific device ID, use the property
+getter functions:
+
+.. code-block:: python
+
+    from numba.openmp.offloading import (
+        get_device_type,
+        get_device_vendor,
+        get_device_arch,
+    )
+
+    # Check the device type
+    dev_type = get_device_type(device_id)  # Returns "gpu", "host", or None
+
+    # Check the vendor
+    vendor = get_device_vendor(device_id)  # Returns "nvidia", "amd", "host", or None
+
+    # Check the architecture
+    arch = get_device_arch(device_id)  # Returns an architecture string or None
+
+Using device IDs in target regions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Once you have identified a device ID, you can use it in OpenMP target directives via the
+``device`` clause:
+
+.. code-block:: python
+
+    import numpy as np
+
+    from numba.openmp import njit, openmp_context as openmp
+    from numba.openmp.offloading import find_device_ids
+
+    # Find the first available NVIDIA GPU
+    nvidia_devices = find_device_ids(vendor="nvidia")
+    if nvidia_devices:
+        device_id = nvidia_devices[0]
+    else:
+        # Fall back to the host if no NVIDIA GPU is found
+        device_id = find_device_ids(type="host")[0]
+
+    @njit
+    def inc(x):
+        with openmp(f"target loop device({device_id}) map(tofrom: x)"):
+            # The computation runs on the specified device
+            for i in range(len(x)):
+                x[i] = x[i] + 1
+        return x
+
+    x = inc(np.ones(10))
+    print(f"Result on device {device_id}: {x}")
+
 Version and platform support
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -195,7 +361,19 @@ The following table shows tested combinations of PyOMP, Numba, Python, LLVM, and

 0.3.x                 0.57.x - 0.60.x      3.9 - 3.12           14.x         linux-64, osx-arm64, linux-arm64
 ===================== ==================== ==================== ============ ================================

-Platform Details
+OpenMP parallelism support by platform
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+=========== ================ ================= ===================
+Platform    CPU              NVIDIA GPU        AMD GPU
+=========== ================ ================= ===================
+linux-64    ✅ Supported     ✅ Supported      🔶 Work in progress
+linux-arm64 ✅ Supported     ✅ Supported      🔶 Work in progress
+osx-arm64   ✅ Supported     ❌ Unsupported    ❌ Unsupported
+=========== ================ ================= ===================
+
+Platform details
 ^^^^^^^^^^^^^^^^

 * **linux-64**: Linux x86_64 architecture
