The reminder delivery contract (structured DeliveryKind on set_reminder) is fully implemented, but it lacks behavioral eval cases verifying that the LLM picks the correct delivery kind.
Suggested cases:
- "remind me in X minutes to check Y" from a Slack session -> LLM selects delivery.kind = current_session
- "when Z happens, post to #general" -> LLM selects delivery.kind = channel, transport = slack, address = #general
- Silent audit-style task -> LLM selects delivery.kind = none or current_session with deliveryRequired = false
- Regression case mirroring session D0AC6CKBK5K/1776697725.361339: Mode B reminder from Slack thread surfaces in the thread
Related: #690, #644, PR #692