
feat(eval): code grader multimodal — structured Content in CodeGraderInput#844

Merged
christso merged 3 commits into main from feat/821-code-grader-mm on Mar 29, 2026
@christso
Collaborator

Closes #821

Changes

Extends CodeGraderInput with typed Content[] so code graders can inspect structured multimodal output. ContentImage blocks carry file paths (not inline base64).

  • Added ContentTextSchema, ContentImageSchema, ContentFileSchema, ContentSchema to @agentv/eval
  • Updated MessageSchema.content from loose unknown to typed string | Content[]
  • materializeContentForGrader() converts data URIs to temp file paths for the grader payload
  • Exported all Content schemas from eval package
  • 21 new tests
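
As a rough illustration of what the typed blocks make possible, the sketch below shows a code grader checking whether the output contains any image blocks. The type shapes are assumptions inferred from this description (only the `path` field on image blocks is stated in the PR); the real definitions are the Zod schemas exported from @agentv/eval, and `gradeHasImage` is a hypothetical grader, not part of the package.

```typescript
// Assumed shapes, mirroring the schema names in this PR. Field names other
// than `path` on image blocks are illustrative guesses.
type ContentText = { type: "text"; text: string };
type ContentImage = { type: "image"; path: string };
type ContentFile = { type: "file"; path: string };
type Content = ContentText | ContentImage | ContentFile;

type Message = { role: string; content: string | Content[] };

// Hypothetical grader: passes when at least one image block is present.
function gradeHasImage(output: Message[]): { score: number; reason: string } {
  const imagePaths: string[] = [];
  for (const message of output) {
    // Plain string content carries no structured blocks.
    if (typeof message.content === "string") continue;
    for (const block of message.content) {
      // The `type` discriminant narrows the union, so `path` is typed here.
      if (block.type === "image") imagePaths.push(block.path);
    }
  }
  return imagePaths.length > 0
    ? { score: 1, reason: `found ${imagePaths.length} image block(s)` }
    : { score: 0, reason: "no image blocks in output" };
}
```

Because image blocks carry a path rather than inline base64, a grader like this can hand the file to an external tool without decoding anything itself.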

christso and others added 3 commits March 29, 2026 04:34
…th industry patterns

- {{output}}, {{input}}, {{expected_output}} now resolve to human-readable
  text instead of JSON.stringify'd message arrays
- Deprecated _text aliases ({{input_text}}, {{output_text}},
  {{expected_output_text}}) still work but emit a stderr warning
- Removed outputText, inputText, expectedOutputText from CodeGraderInput
  schema — code graders should extract text from Message.content using
  getTextContent() from @agentv/core
- Removed EnrichedCodeGraderInput type (no longer needed)
- Updated default evaluator template to use new variable names
- Updated prompt-validator to accept both new and deprecated variable names
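
With outputText and friends removed, a code grader has to derive text from Message.content itself. The sketch below is a local stand-in showing the kind of extraction involved; it is not the real getTextContent() from @agentv/core, whose exact signature and joining behavior are not shown in this PR. `textOf` and the block shape are hypothetical.

```typescript
// Assumed minimal block shape; real blocks are validated by the eval schemas.
type Block = { type: string; text?: string };
type MessageContent = string | Block[];

// Hypothetical helper: returns plain strings unchanged, otherwise joins the
// text of every text block and skips non-text blocks such as images.
function textOf(content: MessageContent): string {
  if (typeof content === "string") return content;
  const parts: string[] = [];
  for (const block of content) {
    if (block.type === "text" && typeof block.text === "string") {
      parts.push(block.text);
    }
  }
  return parts.join("\n");
}
```

Joining with a newline is an arbitrary choice for this sketch; graders that care about exact whitespace should check what the real helper does.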

Closes #825

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Update Claude and Pi providers to preserve non-text content blocks
(images) in Message.content instead of discarding them via
extractTextContent(). This enables multimodal content to flow from
provider response through to evaluators.

Changes:
- Create shared claude-content.ts with toContentArray() and
  extractTextContent() used by all 3 Claude providers
- Update claude-cli, claude-sdk, claude providers to use
  structuredContent ?? textContent pattern
- Add toPiContentArray() to pi-utils.ts for Pi provider
- Update pi-coding-agent convertAgentMessage() to preserve
  structured content
- Add 23 unit tests covering content preservation, backward
  compat, and end-to-end multimodal flow

Text-only responses still produce plain strings (no unnecessary
wrapping). extractTextContent() remains available for backward
compatibility.
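
The structuredContent ?? textContent pattern described above can be sketched as follows. This assumes the structured converter returns undefined when a response contains only text, which is one way to get the "no unnecessary wrapping" behavior; the actual toContentArray() in claude-content.ts may differ. All names here are hypothetical stand-ins.

```typescript
// Assumed provider block shape, for illustration only.
type ProviderBlock = { type: string; text?: string };

// Stand-in for a structured converter: only returns blocks when there is
// something other than text to preserve.
function toStructured(blocks: ProviderBlock[]): ProviderBlock[] | undefined {
  const hasNonText = blocks.some((block) => block.type !== "text");
  return hasNonText ? blocks : undefined;
}

// Stand-in for a text extractor: concatenates the text blocks.
function toText(blocks: ProviderBlock[]): string {
  return blocks
    .filter((block) => block.type === "text")
    .map((block) => block.text ?? "")
    .join("");
}

// Text-only responses fall through to a plain string; anything with an
// image (or other non-text block) keeps its structure.
function toMessageContent(blocks: ProviderBlock[]): string | ProviderBlock[] {
  return toStructured(blocks) ?? toText(blocks);
}
```

Using ?? rather than || matters in a design like this: an empty string is still a valid text result and would not be confused with a missing structured value.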

Closes #818

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…Input

- Add ContentTextSchema, ContentImageSchema, ContentFileSchema, ContentSchema
  as Zod discriminated union in packages/eval/src/schemas.ts
- Update MessageSchema.content to accept string | Content[] (typed blocks)
- Add materializeContentForGrader() in code-evaluator.ts:
  - Data URI images decoded and written to temp files (path, not base64)
  - Non-URI images pass source through as path field
  - Text/file blocks unchanged; string content unchanged
- Lazy temp dir creation for image files, cleaned up in finally block
- Export Content schemas and types from @agentv/eval
- Add comprehensive unit tests for schema validation and materialization
- Add integration tests for CodeEvaluator with multimodal output
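
The materialization steps listed above (decode data URIs to temp files, pass other sources through as paths, create the temp dir lazily, clean up in finally) can be sketched as below. This is a simplified synchronous illustration under assumed block shapes, not the code in code-evaluator.ts; `withMaterializedImages`, the `source` field name on the input side, and the file naming are all hypothetical.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Assumed shapes: provider output carries `source`, the grader payload `path`.
type ImageIn = { type: "image"; source: string };
type ImageOut = { type: "image"; path: string };

const DATA_URI = /^data:([^;,]+);base64,(.+)$/;

// Runs `run` with materialized image blocks, then removes any temp files.
function withMaterializedImages<T>(images: ImageIn[], run: (out: ImageOut[]) => T): T {
  let tempDir: string | undefined;
  try {
    const out: ImageOut[] = [];
    for (let i = 0; i < images.length; i++) {
      const match = DATA_URI.exec(images[i].source);
      if (!match) {
        // Not a data URI: pass the source through as the path.
        out.push({ type: "image", path: images[i].source });
        continue;
      }
      // Create the temp dir only once a data URI is actually seen.
      if (tempDir === undefined) {
        tempDir = fs.mkdtempSync(path.join(os.tmpdir(), "grader-"));
      }
      const ext = match[1].split("/")[1] ?? "bin";
      const file = path.join(tempDir, `image-${i}.${ext}`);
      fs.writeFileSync(file, Buffer.from(match[2], "base64"));
      out.push({ type: "image", path: file });
    }
    return run(out);
  } finally {
    // Cleanup happens whether the grader callback returned or threw.
    if (tempDir !== undefined) fs.rmSync(tempDir, { recursive: true, force: true });
  }
}
```

A real evaluator would most likely await an asynchronous grader process inside the try block; the synchronous callback here just keeps the lifetime of the temp files easy to see.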

Closes #821

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@christso christso merged commit 468ff01 into main Mar 29, 2026
1 of 2 checks passed
@christso christso deleted the feat/821-code-grader-mm branch March 29, 2026 04:35
Development

Successfully merging this pull request may close these issues.

feat(eval): code grader multimodal input — structured Content in CodeGraderInput
