Add eval cases for reminder delivery kind selection #793

@Aaronontheweb

The reminder delivery contract (the structured DeliveryKind on set_reminder) is fully implemented, but there are no behavioral eval cases verifying that the LLM picks the correct delivery kind.

Suggested cases:

  • "remind me in X minutes to check Y" from a Slack session -> LLM selects delivery.kind = current_session
  • "when Z happens, post to #general" -> LLM selects delivery.kind = channel, transport = slack, address = #general
  • Silent audit-style task -> LLM selects delivery.kind = none or current_session with deliveryRequired = false
  • Regression case mirroring session D0AC6CKBK5K/1776697725.361339: Mode B reminder from Slack thread surfaces in the thread
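
The first three cases above could be encoded as data plus a small matcher, roughly as in the Python sketch below. Only the values quoted in this issue (`current_session`, `channel`, `none`, `slack`, `#general`, `deliveryRequired`) come from the contract. `DeliveryEvalCase`, `delivery_matches`, the `origin` label, and the flat dict shape for the delivery arguments are hypothetical names for illustration, not the project's eval harness. The sketch also assumes that `deliveryRequired = false` pairs with `current_session` in the silent case. The regression case is left out because its expected delivery fields are not spelled out here.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DeliveryEvalCase:
    """One behavioral eval case: a prompt plus the delivery shapes that pass."""
    name: str
    prompt: str  # X, Y, Z are placeholders copied from the issue text
    origin: str  # hypothetical label for where the request came from
    # Each dict lists fields that must match; matching any one dict passes.
    accepted: list[dict[str, Any]] = field(default_factory=list)


def delivery_matches(actual: dict[str, Any], accepted: list[dict[str, Any]]) -> bool:
    """Return True if `actual` agrees with every field of at least one accepted shape."""
    return any(
        all(actual.get(key) == value for key, value in shape.items())
        for shape in accepted
    )


CASES = [
    DeliveryEvalCase(
        name="slack_relative_reminder",
        prompt="remind me in X minutes to check Y",
        origin="slack",
        accepted=[{"kind": "current_session"}],
    ),
    DeliveryEvalCase(
        name="post_to_channel",
        prompt="when Z happens, post to #general",
        origin="slack",
        accepted=[{"kind": "channel", "transport": "slack", "address": "#general"}],
    ),
    DeliveryEvalCase(
        name="silent_audit_task",
        prompt="(silent audit-style task)",
        origin="slack",
        # Assumed reading: either no delivery, or current_session with delivery optional.
        accepted=[
            {"kind": "none"},
            {"kind": "current_session", "deliveryRequired": False},
        ],
    ),
]
```

In a real eval, `actual` would be the delivery arguments extracted from the LLM's `set_reminder` tool call, and each case would pass when `delivery_matches(actual, case.accepted)` is true.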

Related: #690, #644, PR #692
