Cache LLM generation results by artifact hash + interpreted metadata + prompt version #35

@AK11105

What problem does this solve?

Re-deploying the same artifact (e.g. after a server restart or a fix attempt) burns LLM API calls on identical codegen. The artifact hasn't changed, the prompt hasn't changed, but the full generate → validate → repair loop runs again from scratch.

Proposed solution

Cache LLM generation results keyed by a SHA-256 hash of:

  • artifact file bytes
  • interpreted metadata JSON (from the LLM interpretation stage, A3) — captures user answers to clarifying questions
  • PROMPT_VERSION constant in agent.py — bumped manually when system prompts change

This is more precise than hashing the artifact alone: two deploys of the same artifact with different answers to clarifying questions (e.g. "state_dict" vs "full model") produce different cache keys and different cached code.
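One way the key derivation could look, as a minimal sketch rather than the actual implementation. The function name `compute_cache_key`, the canonical JSON serialization, and the length-prefix framing are assumptions not stated in this issue; `PROMPT_VERSION` is assumed to mirror the constant in agent.py.

```python
import hashlib
import json

# Assumption: mirrors the PROMPT_VERSION constant in agent.py.
PROMPT_VERSION = "3"


def compute_cache_key(
    artifact_bytes: bytes,
    metadata: dict,
    prompt_version: str = PROMPT_VERSION,
) -> str:
    """Return a SHA-256 hex digest over artifact bytes, metadata and prompt version."""
    # Serialize metadata canonically so dict key order does not change the hash.
    metadata_bytes = json.dumps(
        metadata, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")

    hasher = hashlib.sha256()
    for part in (artifact_bytes, metadata_bytes, prompt_version.encode("utf-8")):
        # Length-prefix each part so the boundaries between parts are unambiguous.
        hasher.update(len(part).to_bytes(8, "big"))
        hasher.update(part)
    return hasher.hexdigest()
```

With this shape, the same artifact deployed with `{"save_format": "state_dict"}` and with `{"save_format": "full model"}` (a hypothetical metadata field) yields two different keys, which is the behavior described above.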

Cache stored at ~/.inference-engine/cache/<hash>.json:

{
  "key": "<sha256>",
  "load_body": "...",
  "predict_body": "...",
  "framework": "pytorch",
  "created_at": "2026-05-14T07:00:00Z",
  "prompt_version": "3"
}
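Reading and writing such an entry might be sketched as follows. The helper names `save_entry` and `load_entry`, the `cache_dir` parameter, and the write-to-temp-then-rename step are illustrative assumptions; only the path layout and the JSON fields come from this proposal.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Path layout taken from the proposal: ~/.inference-engine/cache/<hash>.json
DEFAULT_CACHE_DIR = Path.home() / ".inference-engine" / "cache"


def save_entry(key: str, load_body: str, predict_body: str, framework: str,
               prompt_version: str, cache_dir: Path = DEFAULT_CACHE_DIR) -> Path:
    """Write one cache entry as <cache_dir>/<key>.json and return its path."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry = {
        "key": key,
        "load_body": load_body,
        "predict_body": predict_body,
        "framework": framework,
        "created_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "prompt_version": prompt_version,
    }
    target = cache_dir / f"{key}.json"
    # Write to a temp file first, then rename, so readers never see a partial file.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(entry, handle, indent=2)
    os.replace(tmp_path, target)
    return target


def load_entry(key: str, cache_dir: Path = DEFAULT_CACHE_DIR) -> Optional[dict]:
    """Return the cached entry for key, or None on a miss or unreadable file."""
    path = Path(cache_dir) / f"{key}.json"
    try:
        with path.open(encoding="utf-8") as handle:
            entry = json.load(handle)
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    # Treat an entry whose stored key does not match as a miss.
    if not isinstance(entry, dict) or entry.get("key") != key:
        return None
    return entry
```

Treating a corrupt or mismatched file as a miss means the deploy falls back to a fresh generation instead of failing.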

On deploy, if a cache hit exists:

  • interactive: "Cached generation found (pytorch, created 2026-05-14). Use it? [Y/n]"
  • --yes mode: always use cache, print "Using cached generation."
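The hit-handling rules above could be sketched like this. The function name `should_use_cache` and the injectable `ask` / `echo` callables are hypothetical, added here so the logic can be exercised without a terminal; the prompt strings follow the wording in this issue.

```python
from typing import Callable


def should_use_cache(
    entry: dict,
    assume_yes: bool,
    ask: Callable[[str], str] = input,
    echo: Callable[[str], None] = print,
) -> bool:
    """Decide whether a cache hit is used, for --yes mode and interactive mode."""
    if assume_yes:
        # --yes mode: always use the cache and say so.
        echo("Using cached generation.")
        return True

    # Interactive mode: show the framework and creation date, default to yes.
    created_date = entry["created_at"][:10]  # "2026-05-14T07:00:00Z" -> "2026-05-14"
    answer = ask(
        f"Cached generation found ({entry['framework']}, "
        f"created {created_date}). Use it? [Y/n] "
    )
    return answer.strip().lower() in ("", "y", "yes")
```

An empty answer counts as yes, matching the capitalized default in `[Y/n]`.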

Cache entries never expire by time, since artifacts are immutable. An entry is effectively invalidated only when PROMPT_VERSION changes: the new version produces a different key, so the old entry is simply never matched.

Alternatives considered

No cache. Acceptable for one-off deploys but wasteful in CI pipelines that re-deploy the same artifact on every run.

Area

CLI (deploy / fix)

Labels

enhancement (New feature or request)
