Harden vLLM prompt-cache replay execution

Parent tracker: #69
Design context: #65

## Summary

Harden the vLLM execution path for compact prefix replay using `prompt_cache_ref + append_token_ids`.

## Scope

- Keep vLLM as the authority for rendering, tokenization, prompt-cache handle execution, and generation.
- Harden the `prompt_cache_ref + append_token_ids` path.
- Decide where marginal rendering lives: incremental Responses render API in vLLM, cached prefix IDs plus marginal messages in vLLM, or an `agentic-api` path that calls the same renderer/tokenizer source of truth.
- Scope `prompt_cache_ref` to the selected serving endpoint or cache epoch.
- Add handle miss fallback, reseeding, and restart behavior.
- Avoid returning full `prompt_token_ids` on every production turn.

## Acceptance criteria

- Handle replay either executes safely or falls back to a full render path.
- vLLM restart or handle miss behavior is deterministic and tested.
- The production path can avoid full prompt-token arrays in steady state.
- Marginal rendering ownership is documented and implemented behind an explicit interface.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Harden vLLM prompt-cache replay execution #70

Summary

Scope

Acceptance criteria

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Harden vLLM prompt-cache replay execution #70

Description

Summary

Scope

Acceptance criteria

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions