Commit d75a7bd
feat(cuda.core): cu12 fallback for prefetch_batch (N3)

Per Leo's review on PR #1775 (_managed_memory_ops.pyx:228), raising
NotImplementedError on cu12 forces users to write their own loop. The
CUDA driver semantics for cuMemPrefetchBatchAsync are equivalent to
per-range cuMemPrefetchAsync calls; batching is just more efficient
because it happens at the driver level.

On cu12 builds (where cuMemPrefetchBatchAsync is not exposed), fall
back to a Python-level loop calling cuMemPrefetchAsync per buffer.
The single-range path (_do_single_prefetch) already works on cu12
via the IF/ELSE split inside it.
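
The fallback described above can be sketched roughly as follows. This is an illustration of the dispatch pattern only, not the code in _managed_memory_ops.pyx: `prefetch_batch_sketch`, `prefetch_one`, and `prefetch_many` are hypothetical names, and the callables stand in for the driver entry points, whose real signatures differ.

```python
from typing import Callable, Optional, Sequence

# Hypothetical stand-ins for the two driver entry points. The real
# bindings are not called here and take different arguments.
PrefetchOne = Callable[[int, int, int, int], None]
PrefetchMany = Callable[[Sequence[int], Sequence[int], Sequence[int], int], None]


def prefetch_batch_sketch(
    ptrs: Sequence[int],
    sizes: Sequence[int],
    locations: Sequence[int],
    stream: int,
    *,
    prefetch_one: PrefetchOne,
    prefetch_many: Optional[PrefetchMany] = None,
) -> str:
    """Prefetch every range, batching when a batched call is available."""
    if not (len(ptrs) == len(sizes) == len(locations)):
        raise ValueError("ptrs, sizes, and locations must have equal length")

    if prefetch_many is not None:
        # Batched path: one call covers all ranges.
        prefetch_many(ptrs, sizes, locations, stream)
        return "batched"

    # Fallback path: one single-range call per buffer, in order.
    for ptr, size, location in zip(ptrs, sizes, locations):
        prefetch_one(ptr, size, location, stream)
    return "looped"
```

Passing the entry points in as arguments keeps the sketch runnable without a GPU; the string return value exists only so a caller or test can see which path was taken.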

Note that this fallback applies only to prefetch_batch; discard_batch
and discard_prefetch_batch keep the cu12 NotImplementedError because
the driver has no single-range cuMemDiscard{,AndPrefetch}Async to fall
back to.
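
By contrast, an operation with no single-range equivalent has nothing to loop over, so the only honest option is to raise. A minimal sketch under the same assumptions: `discard_batch_sketch` and `discard_many` are hypothetical names, and the callable stands in for the batched driver call.

```python
from typing import Callable, Optional, Sequence

# Hypothetical stand-in for a batched discard entry point.
DiscardMany = Callable[[Sequence[int], Sequence[int], int], None]


def discard_batch_sketch(
    ptrs: Sequence[int],
    sizes: Sequence[int],
    stream: int,
    *,
    discard_many: Optional[DiscardMany] = None,
) -> None:
    """Discard every range, or raise when no batched call is available."""
    if discard_many is None:
        # No per-range call exists to fall back to, so fail loudly
        # instead of silently doing nothing.
        raise NotImplementedError(
            "batched discard is not available in this build"
        )
    discard_many(ptrs, sizes, stream)
```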

The test skips for cuMemPrefetchBatchAsync unavailability are dropped
from TestPrefetchBatch.test_same_location and
test_per_buffer_location; the fallback path now runs on cu12 builds
too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 parent a9cd713 commit d75a7bd
2 files changed
Lines changed: 12 additions & 11 deletions
Diff line content was not preserved; only the hunk line numbers survive.

| File | Hunk | Removed (original lines) | Added (new lines) |
|---|---|---|---|
| 1 | 1 | 283-286 | 283-287 |
| 1 | 2 | 367-369 | 368-374 |
| 2 | 1 | 352-353 | none |
| 2 | 2 | 375-376 | none |