Skip to content

New idea: Multi-turn & Compositional Reasoning Evaluation #62

@sharmaanchita

Description

@sharmaanchita

Problem
The system only supports single-shot prompt-response evaluation, missing critical real-world reasoning capabilities.

Basis of issue

  1. Multi-turn conversation evaluation
  2. Compositional task design (chained reasoning)
  3. Logical consistency / trace quality metrics
  4. Stateful prompt handling

Importance

  1. Real-world AI usage is multi-turn
  2. Compositional reasoning is a core capability
  3. Modern benchmarks already evaluate this

Current implementation gap

  1. Single prompt → single response only

Implementation checklist

  1. Support for multi-step prompt sequences
  2. Scoring based on reasoning consistency across turns
  3. Optional evaluation of intermediate reasoning quality
  4. Backward compatibility with single-shot prompts

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions