[FEATURE] Explicit model-cache control on deploy/delete (--from-cache/--skip-cache, --save-to-cache) #722

@Defilan

Feature Description

Make the model cache explicit and controllable on deploy and delete, so that redeploying a model reuses the already-downloaded GGUF instead of re-fetching it, and deleting a deployment does not force a re-download next time.

Requested shape:

llmkube delete <name> --save-to-cache        # delete the deployment but keep the cached model file
llmkube deploy <name> --gpu --from-cache      # reuse the cached file (the default)
llmkube deploy <name> --gpu --skip-cache      # force a fresh download

Problem Statement

As an operator iterating on deployments, I want delete + redeploy to be fast by reusing the cached model file, so that I am not re-downloading multi-GB GGUFs every time I rebuild a deployment.

Proposed Solution

The controller already caches by content key (computeCacheKey(model.Spec.Source) in internal/controller/model_controller.go) and skips the download when the file is present (log line: "Model found in cache, skipping download"). There is also a cache subcommand (list/clear/preload). Cache reuse therefore already happens on a hit; this request is about making it explicit and controllable:

  • deploy --skip-cache (new): force a fresh download even on a cache hit (re-validate / re-pull). Default remains cache-reuse, which --from-cache can name explicitly for clarity.
  • delete preserves the cache by default, with an opt-in --purge-cache to also remove the cached file. (To confirm and define before implementing: delete currently removes the Model + InferenceService; the request is that the cached blob survives so that the next deploy is a cache hit.)

This keeps the fast-rebuild path as the default and gives an escape hatch for forcing a clean pull.
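The intended behavior can be written down as a small decision table. The Go sketch below is illustrative only and is not taken from the llmkube codebase: cacheKeyFor is a hypothetical stand-in for computeCacheKey (whose real implementation is not shown in this issue), and decideDeploy and removeCachedFileOnDelete are made-up names for the proposed flag semantics.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKeyFor is a hypothetical stand-in for computeCacheKey: it derives a
// stable key from the model source string. The real function may differ.
func cacheKeyFor(source string) string {
	sum := sha256.Sum256([]byte(source))
	return hex.EncodeToString(sum[:])[:16]
}

// DeployAction describes what a deploy does with the model file.
type DeployAction string

const (
	ReuseCached   DeployAction = "reuse-cached"
	DownloadFresh DeployAction = "download-fresh"
)

// decideDeploy encodes the proposed contract: reuse the cached file on a
// hit (the default, which --from-cache names explicitly) unless
// --skip-cache forces a fresh download.
func decideDeploy(cacheHit, skipCache bool) DeployAction {
	if cacheHit && !skipCache {
		return ReuseCached
	}
	return DownloadFresh
}

// removeCachedFileOnDelete encodes the proposed delete contract: the cached
// blob survives unless the user opts in with --purge-cache.
func removeCachedFileOnDelete(purgeCache bool) bool {
	return purgeCache
}

func main() {
	// The URL is a placeholder, not a real model source.
	fmt.Println("cache key:", cacheKeyFor("https://example.com/models/demo.gguf"))
	fmt.Println("hit, default:     ", decideDeploy(true, false))
	fmt.Println("hit, --skip-cache:", decideDeploy(true, true))
	fmt.Println("miss, default:    ", decideDeploy(false, false))
	fmt.Println("delete removes blob by default:", removeCachedFileOnDelete(false))
}
```

Keeping the decision in pure functions like these would make the default (cache reuse, cache-preserving delete) easy to cover with a table-driven unit test, independent of the controller's download code.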

Alternatives Considered

  • Rely on the implicit cache + cache clear for purging. Works, but the behavior is invisible at the deploy/delete call site; explicit flags make the fast-rebuild contract obvious and let a user force a re-download without clearing the whole cache.

Additional Context

  • Cache mechanism: model_controller.go computeCacheKey + cache-hit skip; pkg/cli/cache.go (list/clear/preload).
  • deploy currently has --skip-download (for images with the model baked in) which is a different concept from --skip-cache.
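To keep the two concepts apart at the call site, flag validation could reject contradictory combinations. The sketch below uses Go's standard flag package purely for illustration; the actual CLI may use a different flag library, and the deployOptions type, the parseDeployFlags helper, and both conflict rules are assumptions rather than existing or agreed behavior. Only the flags are passed in, because the standard flag package stops parsing at the first positional argument such as <name>.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// deployOptions is a hypothetical container for the cache-related flags.
type deployOptions struct {
	FromCache    bool // proposed: explicit name for the default cache-reuse path
	SkipCache    bool // proposed: force a fresh download even on a cache hit
	SkipDownload bool // existing concept: the image already has the model baked in
}

// parseDeployFlags parses the flags and rejects combinations that
// contradict each other. The conflict rules here are assumptions.
func parseDeployFlags(args []string) (deployOptions, error) {
	var opts deployOptions
	fs := flag.NewFlagSet("deploy", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the sketch's output
	fs.BoolVar(&opts.FromCache, "from-cache", false, "reuse the cached model file")
	fs.BoolVar(&opts.SkipCache, "skip-cache", false, "force a fresh download")
	fs.BoolVar(&opts.SkipDownload, "skip-download", false, "model is baked into the image")
	if err := fs.Parse(args); err != nil {
		return opts, err
	}
	if opts.FromCache && opts.SkipCache {
		return opts, errors.New("--from-cache and --skip-cache are mutually exclusive")
	}
	if opts.SkipDownload && opts.SkipCache {
		return opts, errors.New("--skip-cache forces a download and cannot be combined with --skip-download")
	}
	return opts, nil
}

func main() {
	for _, args := range [][]string{
		{"--skip-cache"},
		{"--from-cache", "--skip-cache"},
		{"--skip-download", "--skip-cache"},
	} {
		opts, err := parseDeployFlags(args)
		fmt.Printf("%v -> opts=%+v err=%v\n", args, opts, err)
	}
}
```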

Priority

  • Medium - Nice to have (cache reuse already works on a hit; this makes it explicit and guarantees delete is cache-preserving)

Willingness to Contribute

  • Yes, I can submit a PR

Metadata

Labels

enhancement (New feature or request)
