Feature Description
Make the model cache explicit and controllable on deploy and delete, so that redeploying a model reuses the already-downloaded GGUF instead of re-fetching it, and deleting a deployment does not force a re-download next time.
Requested shape:
llmkube delete <name> --save-to-cache # delete the deployment but keep the cached model file
llmkube deploy <name> --gpu --from-cache # reuse the cached file (the default)
llmkube deploy <name> --gpu --skip-cache # force a fresh download
Problem Statement
As an operator iterating on deployments, I want delete + redeploy to be fast by reusing the cached model file, so that I am not re-downloading multi-GB GGUFs every time I rebuild a deployment.
Proposed Solution
The controller already caches by content key (computeCacheKey(model.Spec.Source) in internal/controller/model_controller.go) and skips the download when the file is present ("Model found in cache, skipping download"), and there is a cache subcommand (list/clear/preload). So cache REUSE already happens on a hit; this request is about making it explicit and controllable:
- deploy --skip-cache (new): force a fresh download even on a cache hit (re-validate / re-pull). Default remains cache-reuse, which --from-cache can name explicitly for clarity.
- delete preserves the cache by default, with an opt-in --purge-cache to also remove the cached file. (Confirm/define current delete behavior: it removes the Model + InferenceService; the request is that the cached blob survives so the next deploy is a cache hit.)
This keeps the fast-rebuild path as the default and gives an escape hatch for forcing a clean pull.
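The flag semantics above can be sketched as a small decision function. This is a minimal illustration only, not code from the llmkube repository: the names ResolveDeploy, DeployFlags, Action and DeleteTargets are hypothetical, and two behaviors are assumptions rather than part of the request as written: that --skip-cache and --from-cache are treated as mutually exclusive, and that --skip-download takes precedence because no fetch happens at all in that case.

```go
package main

import "errors"

// Action is what a deploy does about the model file.
// Hypothetical type for illustration; not taken from the llmkube codebase.
type Action int

const (
	ReuseCache Action = iota + 1 // cache hit: skip the download
	Download                     // fetch the model and (re)populate the cache
	UseBakedIn                   // image already contains the model; nothing to fetch
)

// DeployFlags mirrors the flags discussed in this request.
type DeployFlags struct {
	SkipCache    bool // proposed --skip-cache
	FromCache    bool // proposed --from-cache (names the default explicitly)
	SkipDownload bool // existing --skip-download
}

// ResolveDeploy decides what to do given the flags and whether the cache
// already holds a file for the model's cache key.
func ResolveDeploy(f DeployFlags, cacheHit bool) (Action, error) {
	// Assumption: the two cache flags contradict each other, so reject both.
	if f.SkipCache && f.FromCache {
		return 0, errors.New("--skip-cache and --from-cache are mutually exclusive")
	}
	// Assumption: a baked-in model bypasses the cache question entirely.
	if f.SkipDownload {
		return UseBakedIn, nil
	}
	// Forced re-pull, or an ordinary cache miss.
	if f.SkipCache || !cacheHit {
		return Download, nil
	}
	// Default path: reuse the cached file.
	return ReuseCache, nil
}

// DeleteTargets lists what a delete removes. The cached file is only
// included when the proposed --purge-cache flag is set.
func DeleteTargets(purgeCache bool) []string {
	targets := []string{"Model", "InferenceService"}
	if purgeCache {
		targets = append(targets, "cached model file")
	}
	return targets
}
```

With these rules, a plain deploy after a delete resolves to ReuseCache as long as the cached file survived, and only --skip-cache or a genuine miss leads to Download.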
Alternatives Considered
- Rely on the implicit cache + cache clear for purging. Works, but the behavior is invisible at the deploy/delete call site; explicit flags make the fast-rebuild contract obvious and let a user force a re-download without clearing the whole cache.
Additional Context
- Cache mechanism: model_controller.go computeCacheKey + cache-hit skip; pkg/cli/cache.go (list/clear/preload).
- deploy currently has --skip-download (for images with the model baked in), which is a different concept from --skip-cache.
Priority
Willingness to Contribute