Parent tracker: #69
Design context: #65
Summary
Persist enough prompt-token prefix metadata in `agentic-api` to decide whether a cached prefix can be safely replayed for a Responses or Conversation continuation.
Scope
- Store prefix metadata with the response or conversation checkpoint.
- Include model, tokenizer, renderer/template, effective instructions/tools identity, prefix hash, prefix token count, and safe-boundary proof.
- Add strict-prefix validation against a fresh full render before replay is enabled.
- Replace Harmony-specific boundary checks with renderer/template-specific safe-boundary proof.
- Measure latest-span lookup and span persistence overhead in the state store.
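As a rough illustration of the first three scope items, the stored record and the strict-prefix check could look something like the sketch below. Every name here (`PrefixMetadata`, `hash_tokens`, `is_strict_prefix`, and the field names) is hypothetical and not taken from agentic-api. The boolean `safe_boundary` is only a stand-in for whatever renderer/template-specific proof is eventually chosen, and SHA-256 over fixed-width token IDs is one possible hashing scheme, not a decided one.

```python
import hashlib
from dataclasses import dataclass
from typing import Sequence


def hash_tokens(tokens: Sequence[int]) -> str:
    """Hash token IDs using a fixed-width encoding so boundaries are unambiguous."""
    digest = hashlib.sha256()
    for token in tokens:
        # Assumes non-negative token IDs that fit in 8 bytes.
        digest.update(token.to_bytes(8, "big", signed=False))
    return digest.hexdigest()


@dataclass(frozen=True)
class PrefixMetadata:
    """Hypothetical record stored with a response or conversation checkpoint."""
    model: str
    tokenizer: str
    renderer_template: str
    instructions_tools_id: str
    prefix_hash: str
    prefix_token_count: int
    safe_boundary: bool  # placeholder for the real safe-boundary proof


def is_strict_prefix(meta: PrefixMetadata, fresh_tokens: Sequence[int]) -> bool:
    """Return True only if the cached prefix is a strict prefix of a fresh full render."""
    n = meta.prefix_token_count
    if not meta.safe_boundary:
        return False
    # Strict prefix: non-empty and strictly shorter than the fresh render.
    if n <= 0 or n >= len(fresh_tokens):
        return False
    return hash_tokens(fresh_tokens[:n]) == meta.prefix_hash
```

In this sketch a cached prefix that equals the whole fresh render is rejected, since a continuation is expected to append at least one new token. Whether that is the right policy is an open question for the design.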
Acceptance criteria
- Replay planning can prove that a cached prefix still matches the active model-visible history.
- Replay is refused when model/tokenizer/renderer/template/instructions/tools fingerprints differ.
- Tests cover strict-prefix validation and unsafe append-boundary rejection.
- Storage lookup/persistence overhead is measured and reported.
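The refusal criterion could be tested against a comparison along these lines. This is a minimal sketch under assumptions: the fingerprint field names mirror the list in the criteria above, but the dictionary layout and the helpers `mismatched_fingerprints` and `may_replay` are hypothetical. A fingerprint missing from the stored record is treated as a mismatch, which is a conservative choice rather than a settled requirement.

```python
from typing import Dict, List

# Field names follow the acceptance criteria; the storage layout is assumed.
FINGERPRINT_FIELDS = (
    "model",
    "tokenizer",
    "renderer",
    "template",
    "instructions",
    "tools",
)


def mismatched_fingerprints(stored: Dict[str, str], active: Dict[str, str]) -> List[str]:
    """List the fingerprint fields that are missing from storage or differ."""
    return [
        field
        for field in FINGERPRINT_FIELDS
        if stored.get(field) is None or stored.get(field) != active.get(field)
    ]


def may_replay(stored: Dict[str, str], active: Dict[str, str]) -> bool:
    """Allow replay only when every fingerprint is present and matches."""
    return not mismatched_fingerprints(stored, active)
```

Returning the list of differing fields, rather than a bare boolean, would let a refusal be logged with its reason, which may help when reporting why a cached prefix was not reused.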