
Commit fce123f

feat(core.utils): add program caches (in-memory, sqlite, filestream)
Convert cuda.core.utils to a package and add ObjectCode caches for artifacts produced by Program.compile.

Public API (cuda.core.utils):

* ProgramCacheResource -- abstract bytes|str -> ObjectCode mapping with context-manager support. Path-backed ObjectCode is rejected at write time, because storing it would persist only the path, not the bytes.
* InMemoryProgramCache -- in-process OrderedDict backend that stores entries by reference (no pickling). Optional max_entries and max_size_bytes caps with LRU eviction. __getitem__ promotes the entry in LRU order; __contains__ is read-only. A threading.RLock serialises every method.
* SQLiteProgramCache -- single-file sqlite3 backend (WAL mode, autocommit) with LRU eviction and an optional size cap. A threading.RLock serialises connection use, so one cache object is safe to share across threads. wal_checkpoint(TRUNCATE) + VACUUM run after evictions so the cap bounds real on-disk usage. __contains__ is read-only; __len__ prunes corrupt rows. On open, a schema mismatch drops the tables and rebuilds them; corrupt or non-SQLite files are reinitialised empty; a transient OperationalError propagates without nuking the file (and the partial connection is closed).
* FileStreamProgramCache -- directory of atomically written entries (tmp + os.replace), safe across concurrent processes. Filenames are blake2b(32) hashes, so arbitrary-length keys never overflow filesystem limits. Reader pruning, clear(), and _enforce_size_cap are all stat-guarded (inode/size/mtime snapshot; unlink is refused on mismatch) so a concurrent writer's os.replace is preserved. _enforce_size_cap also decrements its running ``total`` when a concurrent deleter wins the unlink race, so a suppressed FileNotFoundError cannot over-evict newly committed entries. Stale temp files are swept on open; live temp files count toward the size cap. Windows ERROR_SHARING_VIOLATION (32) and ERROR_LOCK_VIOLATION (33) on os.replace are retried with bounded backoff (~185 ms) before being treated as a non-fatal cache miss; other PermissionErrors and all POSIX failures propagate.
* make_program_cache_key -- stable 32-byte blake2b digest over code, code_type, ProgramOptions, target_type, name expressions, and environment probes: cuda-core version, NVRTC version, NVVM lib+IR version, and linker backend+version for PTX inputs (driver version only on the cuLink path). Backend-specific gates mirror Program/Linker:
  - code_type is lower-cased to match Program_init.
  - code_type/target_type are validated against Program's SUPPORTED_TARGETS matrix.
  - NVRTC side-effect options (create_pch, time, fdevice_time_trace) and external-content options (include_path, pre_include, pch, use_pch, pch_dir) require an extra_digest; so does NVVM use_libdevice=True. An NVRTC options.name with a directory component (e.g. '/abs/k.cu') also requires extra_digest (or no_source_include=True), because NVRTC searches that directory for #include "..." lookups; bare labels fall back to CWD and are still accepted.
  - extra_sources and bytes-like ``code`` are rejected for non-NVVM backends.
  - PTX (Linker) options pass through per-field gates that match _prepare_nvjitlink_options / _prepare_driver_options: ptxas_options is canonicalised across str/list/tuple/empty shapes; driver-linker hard rejections (time, ptxas_options, split_compile) raise at key time; ftz/prec_div/prec_sqrt/fma collapse under the driver linker.
  - name_expressions are gated on backend == "nvrtc".
  - Failed environment probes mix the exception class name into a *_probe_failed label, so broken environments never collide with working ones while the key stays stable across processes and repeated calls.

Lazy import: ``from cuda.core.utils import StridedMemoryView`` does not pull in any cache backend. The cache classes and make_program_cache_key are exposed via module __getattr__. _LAZY_CACHE_ATTRS is a single ordered tuple spliced into __all__ via ``*_LAZY_CACHE_ATTRS``, so the two lists cannot drift. Star-import still walks __all__ and therefore resolves every lazy attribute; that is expected, since star-imports are discouraged anyway.
sqlite3 is imported lazily inside SQLiteProgramCache.__init__, so the package is usable on interpreters built without libsqlite3.

Tests: ~200 cache tests covering:

* single-process CRUD for all three backends;
* LRU/size-cap (logical and on-disk, including stat-guarded race scenarios);
* over-eviction race (monkeypatched Path.unlink);
* InMemory combined caps, overwrite-updates-size, LRU-touch-on-read, contains-does-not-bump, and degenerate caps (single entry > cap, max_entries=0);
* NVRTC source-directory path-name guard with POSIX/Windows separators and both accept paths (extra_digest and no_source_include=True);
* corruption + __len__ pruning;
* schema-mismatch table DROP;
* threaded SQLite and InMemory (4 writers + 4 readers x 200 ops);
* cross-process FileStream stress (writer/reader race exercising the stat-guarded prune; clear/eviction race injection via generator cleanup);
* Windows vs POSIX PermissionError narrowing (winerror 32/33 swallowed + retried, others propagate; partial-connection close on OperationalError);
* lazy-import subprocess test;
* _SUPPORTED_TARGETS_BY_CODE_TYPE parity test that parses _program.pyx via tokenize + ast.literal_eval;
* end-to-end real CUDA C++ compile -> store -> reopen -> get_kernel roundtrip, parametrized over the two persistent backends.

Closes #177
Closes #178
Closes #179
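The InMemoryProgramCache behaviours exercised by these tests (combined caps, overwrite-updates-size, LRU-touch-on-read, contains-does-not-bump) can be sketched with collections.OrderedDict. This is an illustration under stated assumptions: the hypothetical `SketchLRUCache` stores bytes so that `len()` gives an entry size, whereas the real backend stores ObjectCode by reference and may account for size differently.

```python
import threading
from collections import OrderedDict
from typing import Optional


class SketchLRUCache:
    """Hypothetical bytes-valued LRU cache with entry-count and byte caps."""

    def __init__(self, max_entries: Optional[int] = None,
                 max_size_bytes: Optional[int] = None):
        self._data: "OrderedDict[bytes, bytes]" = OrderedDict()
        self._size = 0  # running total of stored bytes
        self._max_entries = max_entries
        self._max_size_bytes = max_size_bytes
        self._lock = threading.RLock()  # serialises every method

    def __getitem__(self, key: bytes) -> bytes:
        with self._lock:
            value = self._data[key]
            self._data.move_to_end(key)  # a read promotes the entry to MRU
            return value

    def __contains__(self, key: bytes) -> bool:
        with self._lock:
            return key in self._data  # read-only: no LRU bump

    def __setitem__(self, key: bytes, value: bytes) -> None:
        with self._lock:
            old = self._data.pop(key, None)
            if old is not None:
                self._size -= len(old)  # overwrite updates the running size
            self._data[key] = value     # (re)inserted as most recently used
            self._size += len(value)
            self._evict()

    def __len__(self) -> int:
        with self._lock:
            return len(self._data)

    def _over_cap(self) -> bool:
        if self._max_entries is not None and len(self._data) > self._max_entries:
            return True
        return self._max_size_bytes is not None and self._size > self._max_size_bytes

    def _evict(self) -> None:
        while self._data and self._over_cap():
            _, evicted = self._data.popitem(last=False)  # drop the LRU entry
            self._size -= len(evicted)
```

`move_to_end` and `popitem(last=False)` are both O(1), which is why an OrderedDict suits LRU bookkeeping. In this sketch an entry larger than max_size_bytes is evicted immediately after insertion.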
1 parent 19fac32 commit fce123f

6 files changed

Lines changed: 4618 additions & 8 deletions


cuda_core/cuda/core/utils.py

Lines changed: 0 additions & 8 deletions
This file was deleted.
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# SPDX-License-Identifier: Apache-2.0
+
+from cuda.core._memoryview import (
+    StridedMemoryView,
+    args_viewable_as_strided_memory,
+)
+
+# Lazily expose the program-cache APIs so ``from cuda.core.utils import
+# StridedMemoryView`` stays lightweight -- the cache backends pull in driver,
+# NVRTC, and module-load machinery that memoryview-only consumers do not need.
+# The laziness guarantee is for explicit imports only: ``from cuda.core.utils
+# import *`` walks ``__all__`` and therefore resolves every lazy attribute,
+# which eagerly pulls ``_program_cache`` in. Star-imports are discouraged
+# anyway, so treat that as expected.
+_LAZY_CACHE_ATTRS = (
+    "FileStreamProgramCache",
+    "InMemoryProgramCache",
+    "ProgramCacheResource",
+    "SQLiteProgramCache",
+    "make_program_cache_key",
+)
+
+__all__ = [
+    "StridedMemoryView",
+    "args_viewable_as_strided_memory",
+    *_LAZY_CACHE_ATTRS,
+]
+
+
+def __getattr__(name):
+    if name in _LAZY_CACHE_ATTRS:
+        from cuda.core.utils import _program_cache
+
+        value = getattr(_program_cache, name)
+        globals()[name] = value  # cache for subsequent accesses
+        return value
+    raise AttributeError(f"module 'cuda.core.utils' has no attribute {name!r}")
+
+
+def __dir__():
+    # Merge the lazy public API with the real module namespace so REPL and
+    # introspection tools still surface ``__file__``, ``__spec__``, etc.
+    return sorted(set(globals()) | set(__all__))
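The module-level __getattr__ in this file relies on PEP 562 (Python 3.7+): it runs only when normal attribute lookup fails, and writing the value into the module namespace makes later lookups bypass it. A self-contained sketch of the same pattern, using a hypothetical throwaway module built with types.ModuleType instead of cuda.core.utils, shows the loader running exactly once.

```python
import types

# Hypothetical stand-in for cuda.core.utils: "HeavyThing" plays the role of
# a cache class whose import is expensive. Each load is recorded here.
load_calls = []


def make_lazy_module() -> types.ModuleType:
    mod = types.ModuleType("lazy_demo")
    lazy_attrs = ("HeavyThing",)

    def __getattr__(name):
        # Called only when normal module attribute lookup fails (PEP 562).
        if name in lazy_attrs:
            load_calls.append(name)     # stands in for the deferred import
            value = f"<loaded {name}>"
            mod.__dict__[name] = value  # cache: later lookups skip __getattr__
            return value
        raise AttributeError(f"module 'lazy_demo' has no attribute {name!r}")

    mod.__getattr__ = __getattr__       # stored in the module's __dict__
    mod.eager = 1
    mod.__all__ = ["eager", *lazy_attrs]
    return mod
```

Registering such a module in sys.modules and running `from lazy_demo import *` would call __getattr__ for every lazy name in __all__, which mirrors the star-import caveat in the comment above.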
