Commit fce123f
feat(core.utils): add program caches (in-memory, sqlite, filestream)
Convert cuda.core.utils to a package and add ObjectCode caches for
artifacts produced by Program.compile.
Public API (cuda.core.utils):
* ProgramCacheResource -- abstract bytes|str -> ObjectCode mapping
with context manager. Path-backed ObjectCode is rejected at write
time (would store only the path, not the bytes).
* InMemoryProgramCache -- in-process OrderedDict backend that
stores entries by reference (no pickling). Optional max_entries
and max_size_bytes caps with LRU eviction. __getitem__ promotes
LRU; __contains__ is read-only. threading.RLock serialises every
method.
* SQLiteProgramCache -- single-file sqlite3 backend (WAL mode,
autocommit) with LRU eviction and an optional size cap. A
threading.RLock serialises connection use so one cache object is
safe across threads. wal_checkpoint(TRUNCATE) + VACUUM run after
evictions so the cap bounds real on-disk usage. __contains__ is
read-only. __len__ prunes corrupt rows. Schema-mismatch on open
drops tables and rebuilds; corrupt / non-SQLite files reinitialise
empty; transient OperationalError propagates without nuking the
file (and closes the partial connection).
* FileStreamProgramCache -- directory of atomically-written entries
(tmp + os.replace) safe across concurrent processes. blake2b(32)
hashed filenames so arbitrary-length keys never overflow
filesystem limits. Reader pruning, clear(), and _enforce_size_cap
are all stat-guarded (inode/size/mtime snapshot; refuse unlink on
mismatch) so a concurrent writer's os.replace is preserved.
_enforce_size_cap also decrements its running ``total`` when a
concurrent deleter wins the unlink race, so a suppressed
FileNotFoundError cannot over-evict newly committed entries.
Stale temp files swept on open; live temps count toward the size
cap. Windows ERROR_SHARING_VIOLATION (32) and ERROR_LOCK_VIOLATION
(33) on os.replace are retried with bounded backoff (~185ms)
before being treated as a non-fatal cache miss; other
PermissionError and all POSIX failures propagate.
* make_program_cache_key -- stable 32-byte blake2b digest over code,
code_type, ProgramOptions, target_type, name expressions, and
environment probes: cuda-core version, NVRTC version, NVVM lib+IR
version, linker backend+version for PTX inputs (driver version
only on the cuLink path). Backend-specific gates mirror
Program/Linker:
- code_type lower-cased to match Program_init.
- code_type/target_type validated against Program's
SUPPORTED_TARGETS matrix.
- NVRTC side-effect options (create_pch, time,
fdevice_time_trace) and external-content options
(include_path, pre_include, pch, use_pch, pch_dir) require
an extra_digest. NVVM use_libdevice=True likewise. NVRTC
options.name with a directory component (e.g. '/abs/k.cu')
also requires extra_digest (or no_source_include=True) because
NVRTC searches that directory for #include "..." lookups;
bare labels fall back to CWD and stay accepted.
- extra_sources rejected for non-NVVM; bytes-like ``code``
rejected for non-NVVM.
- PTX (Linker) options pass through per-field gates that match
_prepare_nvjitlink_options / _prepare_driver_options;
ptxas_options canonicalised across str/list/tuple/empty
shapes; driver-linker hard rejections (time, ptxas_options,
split_compile) raise at key time; ftz/prec_div/prec_sqrt/fma
collapse under the driver linker.
- name_expressions gated on backend == "nvrtc".
- Failed environment probes mix the exception class name into a
*_probe_failed label so broken environments never collide
with working ones while staying stable across processes and
repeated calls.
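
Two of the mechanisms listed above can be sketched in a few lines: a stable 32-byte blake2b key that folds failed probes into a *_probe_failed label, and an atomic tmp + os.replace write under a hashed filename. This is a minimal illustration under assumptions, not the code in this commit: the helper names (probe, sketch_key, entry_path, atomic_write), the length-prefixed field layout, and the "tmp-" prefix are all hypothetical.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def probe(label, fn):
    """Return (label, value); on failure return a *_probe_failed label instead."""
    try:
        return label, str(fn())
    except Exception as exc:
        # Only the exception class name is mixed in, so the result is the
        # same across processes and repeated calls.
        return f"{label}_probe_failed", type(exc).__name__


def sketch_key(code, code_type, probes):
    """32-byte blake2b digest over length-prefixed fields (assumed layout)."""
    h = hashlib.blake2b(digest_size=32)
    fields = [code, code_type.lower()]  # lower-cased, as the message describes
    for label, value in probes:
        fields += [label, value]
    for field in fields:
        data = field.encode("utf-8")
        # The length prefix keeps ("ab", "c") distinct from ("a", "bc").
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)
    return h.digest()


def entry_path(root, key):
    """Fixed-length filename, however long the key is."""
    return Path(root) / hashlib.blake2b(key, digest_size=32).hexdigest()


def atomic_write(root, key, payload):
    """Write to a temp file in the same directory, then os.replace into place."""
    target = entry_path(root, key)
    fd, tmp = tempfile.mkstemp(dir=root, prefix="tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        os.replace(tmp, target)  # readers see the old entry or the new one
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return target
```

Keeping the temp file in the cache directory keeps os.replace on a single filesystem, which is what makes the rename atomic. The stat-guarded pruning and the Windows retry loop described above are left out of this sketch.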
Lazy import: ``from cuda.core.utils import StridedMemoryView`` does
not pull in any cache backend. The cache classes and
make_program_cache_key are exposed via module __getattr__.
_LAZY_CACHE_ATTRS is a single ordered tuple spliced into __all__ via
``*_LAZY_CACHE_ATTRS`` so the two lists cannot drift; star-import
still walks __all__ and therefore resolves every lazy attribute,
which is expected given star-imports are discouraged anyway.
sqlite3 is imported lazily inside SQLiteProgramCache.__init__ so the
package is usable on interpreters built without libsqlite3.
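
The lazy-attribute pattern described above can be illustrated with a module-level __getattr__ (PEP 562). This is a rough sketch, not the package's __init__.py: the module is built with types.ModuleType so the example is self-contained, stdlib classes (Fraction, Decimal) stand in for the cache classes, and the _SOURCES mapping and the resolved list are hypothetical.

```python
import importlib
import types

# One ordered tuple feeds both the lazy lookup and __all__, so the two
# cannot drift apart.
_LAZY_CACHE_ATTRS = ("Fraction", "Decimal")
# Hypothetical mapping: public name -> module imported on first access.
_SOURCES = {"Fraction": "fractions", "Decimal": "decimal"}
resolved = []  # records which lazy names were actually resolved

pkg = types.ModuleType("utils_sketch")
pkg.eager_value = 42  # stands in for eagerly imported names
pkg.__all__ = ["eager_value", *_LAZY_CACHE_ATTRS]


def _lazy_getattr(name):
    """Called only when normal module attribute lookup fails."""
    if name in _LAZY_CACHE_ATTRS:
        value = getattr(importlib.import_module(_SOURCES[name]), name)
        resolved.append(name)
        setattr(pkg, name, value)  # later lookups bypass __getattr__
        return value
    raise AttributeError(f"module {pkg.__name__!r} has no attribute {name!r}")


# In a real package this would be a plain `def __getattr__(name)` at module
# level; assigning it on the module object has the same effect.
pkg.__getattr__ = _lazy_getattr
```

Accessing pkg.eager_value resolves nothing lazily, while the first access to pkg.Fraction triggers the import and caches the result on the module. A star-import walks __all__ and would therefore resolve every lazy name, matching the behaviour noted above.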
Tests: ~200 cache tests covering single-process CRUD for all three
backends; LRU/size-cap (logical and on-disk, including stat-guarded
race scenarios); over-eviction race (monkeypatched Path.unlink);
InMemory combined caps, overwrite-updates-size, LRU-touch-on-read,
contains-does-not-bump, degenerate caps (single entry > cap,
max_entries=0); NVRTC source-directory path-name guard with
POSIX/Windows separators and both accept paths; corruption +
__len__ pruning; schema-mismatch table-DROP; threaded SQLite and
InMemory (4 writers + 4 readers x 200 ops); cross-process
FileStream stress (writer/reader race exercising the stat-guard
prune; clear/eviction race injection via generator cleanup);
Windows vs POSIX PermissionError narrowing (winerror 32/33 swallow
+ retry, others propagate; partial-conn close on OperationalError);
lazy-import subprocess test; _SUPPORTED_TARGETS_BY_CODE_TYPE parity
test that parses _program.pyx via tokenize + ast.literal_eval; and
end-to-end real CUDA C++ compile -> store -> reopen -> get_kernel
roundtrip parametrized over the two persistent backends.
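
The InMemory behaviours these tests exercise (LRU-touch-on-read, contains-does-not-bump, overwrite-updates-size, degenerate caps) can be illustrated with a small stand-in. LRUSketch below is hypothetical and stores raw bytes rather than ObjectCode; how it handles an entry larger than the cap (the entry is evicted immediately) is an assumption, not something the message specifies.

```python
import threading
from collections import OrderedDict


class LRUSketch:
    """OrderedDict-backed LRU with optional entry-count and byte-size caps."""

    def __init__(self, max_entries=None, max_size_bytes=None):
        self._data = OrderedDict()
        self._size = 0
        self._lock = threading.RLock()  # serialises every method
        self._max_entries = max_entries
        self._max_size_bytes = max_size_bytes

    def __setitem__(self, key, value):
        with self._lock:
            if key in self._data:
                # Overwrite: drop the old size before adding the new one.
                self._size -= len(self._data.pop(key))
            self._data[key] = value
            self._size += len(value)
            self._evict()

    def __getitem__(self, key):
        with self._lock:
            value = self._data[key]      # KeyError on a miss
            self._data.move_to_end(key)  # a read promotes the entry
            return value

    def __contains__(self, key):
        with self._lock:
            return key in self._data     # read-only: no promotion

    def __len__(self):
        with self._lock:
            return len(self._data)

    def _over_cap(self):
        too_many = self._max_entries is not None and len(self._data) > self._max_entries
        too_big = self._max_size_bytes is not None and self._size > self._max_size_bytes
        return too_many or too_big

    def _evict(self):
        # Pop least-recently-used entries until both caps hold (assumed policy).
        while self._data and self._over_cap():
            _, old = self._data.popitem(last=False)
            self._size -= len(old)
```

With max_entries=2, inserting "a" and "b", checking `"a" in cache`, and then inserting a third key evicts "a", because the membership check does not count as a use. Reading `cache["b"]` before the next insert keeps "b" alive instead.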
Closes #177
Closes #178
Closes #179
1 parent 19fac32 commit fce123f
6 files changed
Lines changed: 4618 additions & 8 deletions
File tree
- cuda_core
- cuda/core
- utils
- docs/source
- tests
This file was deleted.