feat(eval): code grader multimodal — structured Content in CodeGraderInput #844
Merged
…th industry patterns
- {{output}}, {{input}}, {{expected_output}} now resolve to human-readable
text instead of JSON.stringify'd message arrays
- Deprecated _text aliases ({{input_text}}, {{output_text}},
{{expected_output_text}}) still work but emit a stderr warning
- Removed outputText, inputText, expectedOutputText from CodeGraderInput
schema — code graders should extract text from Message.content using
getTextContent() from @agentv/core
- Removed EnrichedCodeGraderInput type (no longer needed)
- Updated default evaluator template to use new variable names
- Updated prompt-validator to accept both new and deprecated variable names
Closes #825
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
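As a rough illustration of the migration described in this commit, a code grader that used to read `outputText` could derive the same text from `Message.content`. The sketch below is self-contained and does not import `@agentv/core`: the `ContentBlock` and `Message` shapes and the `textOf()` helper are hypothetical stand-ins for the real types and for `getTextContent()`, whose exact signature and join behaviour are not shown here.

```typescript
// Hypothetical shapes; the real types live in @agentv/core and may differ.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; path: string }
  | { type: "file"; path: string };

interface Message {
  role: string;
  content: string | ContentBlock[];
}

// Stand-in for getTextContent(): returns string content as-is, otherwise
// joins the text blocks and skips image/file blocks.
// The "\n" separator is an assumption made for this sketch.
function textOf(content: string | ContentBlock[]): string {
  if (typeof content === "string") return content;
  const parts: string[] = [];
  for (const block of content) {
    if (block.type === "text") parts.push(block.text);
  }
  return parts.join("\n");
}

// Replacement for the removed outputText field: derive it from the messages.
function outputTextOf(output: Message[]): string {
  return output.map((message) => textOf(message.content)).join("\n");
}

const exampleOutput: Message[] = [
  { role: "assistant", content: "Here is the chart." },
  {
    role: "assistant",
    content: [
      { type: "text", text: "Revenue by quarter" },
      { type: "image", path: "/tmp/chart.png" },
    ],
  },
];

// Human-readable text rather than a JSON.stringify'd message array.
console.log(outputTextOf(exampleOutput));
```

This mirrors the template-variable change as well: the readable form is what `{{output}}` is described as resolving to, while the structured blocks remain available to graders that need them.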
Update Claude and Pi providers to preserve non-text content blocks (images) in Message.content instead of discarding them via extractTextContent(). This enables multimodal content to flow from provider response through to evaluators.
Changes:
- Create shared claude-content.ts with toContentArray() and extractTextContent() used by all 3 Claude providers
- Update claude-cli, claude-sdk, claude providers to use structuredContent ?? textContent pattern
- Add toPiContentArray() to pi-utils.ts for Pi provider
- Update pi-coding-agent convertAgentMessage() to preserve structured content
- Add 23 unit tests covering content preservation, backward compat, and end-to-end multimodal flow
Text-only responses still produce plain strings (no unnecessary wrapping). extractTextContent() remains available for backward compatibility.
Closes #818
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
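The `structuredContent ?? textContent` pattern mentioned in this commit can be sketched roughly as follows. This is not the code from claude-content.ts: `RawBlock`, `structuredOrUndefined()`, `joinText()`, and `messageContent()` are hypothetical names, and the rule "return an array only when a non-text block is present" is an assumption inferred from the note that text-only responses still produce plain strings.

```typescript
// Hypothetical provider block shape, loosely modelled on typed content blocks.
type RawBlock = { type: string; text?: string; [key: string]: unknown };

// Joins the text of all text blocks (assumed behaviour of a text extractor).
function joinText(blocks: RawBlock[]): string {
  return blocks
    .filter((block) => block.type === "text" && typeof block.text === "string")
    .map((block) => block.text as string)
    .join("");
}

// Returns the block array only when it carries something other than text;
// otherwise returns undefined so the caller falls back to a plain string.
function structuredOrUndefined(blocks: RawBlock[]): RawBlock[] | undefined {
  return blocks.some((block) => block.type !== "text") ? blocks : undefined;
}

// The fallback pattern: structured content if present, else plain text.
function messageContent(blocks: RawBlock[]): string | RawBlock[] {
  const structuredContent = structuredOrUndefined(blocks);
  const textContent = joinText(blocks);
  return structuredContent ?? textContent;
}

// Text-only response: stays a plain string, no wrapping.
console.log(messageContent([{ type: "text", text: "done" }]));

// Mixed response: the image block is preserved alongside the text.
console.log(
  messageContent([
    { type: "text", text: "see attached" },
    { type: "image", source: "https://example.com/plot.png" },
  ]),
);
```

Using `??` rather than `||` matters here only in that an explicitly `undefined` structured result triggers the fallback, which keeps existing text-only consumers unchanged.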
…Input
- Add ContentTextSchema, ContentImageSchema, ContentFileSchema, ContentSchema as Zod discriminated union in packages/eval/src/schemas.ts
- Update MessageSchema.content to accept string | Content[] (typed blocks)
- Add materializeContentForGrader() in code-evaluator.ts:
  - Data URI images decoded and written to temp files (path, not base64)
  - Non-URI images pass source through as path field
  - Text/file blocks unchanged; string content unchanged
  - Lazy temp dir creation for image files, cleaned up in finally block
- Export Content schemas and types from @agentv/eval
- Add comprehensive unit tests for schema validation and materialization
- Add integration tests for CodeEvaluator with multimodal output
Closes #821
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
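The materialization rules listed in this commit could look roughly like the sketch below, which uses only Node's standard library. It is not the actual `materializeContentForGrader()`: the function name `materializeSketch()`, the block shapes, the `source` field on incoming image blocks, the file naming, and the returned `cleanup` callback are all assumptions made for illustration.

```typescript
import { mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical shapes: images arrive with `source`, graders receive `path`.
type InputBlock =
  | { type: "text"; text: string }
  | { type: "image"; source: string }
  | { type: "file"; path: string };

type GraderBlock =
  | { type: "text"; text: string }
  | { type: "image"; path: string }
  | { type: "file"; path: string };

const IMAGE_DATA_URI = /^data:image\/([a-z0-9.+-]+);base64,(.+)$/i;

function materializeSketch(content: string | InputBlock[]): {
  content: string | GraderBlock[];
  cleanup: () => void;
} {
  // String content is passed through unchanged.
  if (typeof content === "string") return { content, cleanup: () => {} };

  let tempDir: string | undefined; // created lazily, only if an image needs it

  const blocks = content.map((block, index): GraderBlock => {
    if (block.type !== "image") return block; // text/file blocks unchanged
    const match = IMAGE_DATA_URI.exec(block.source);
    // Non-data-URI sources (paths, URLs) are passed through as `path`.
    if (!match) return { type: "image", path: block.source };

    tempDir ??= mkdtempSync(join(tmpdir(), "grader-images-"));
    const extension = match[1].toLowerCase().replace(/\+.*$/, ""); // "svg+xml" -> "svg"
    const filePath = join(tempDir, `image-${index}.${extension}`);
    writeFileSync(filePath, Buffer.from(match[2], "base64"));
    return { type: "image", path: filePath };
  });

  const cleanup = () => {
    if (tempDir) rmSync(tempDir, { recursive: true, force: true });
  };
  return { content: blocks, cleanup };
}

// Usage: run the grader inside try/finally so temp files are always removed.
// The payload is a placeholder ("hello" in base64), not a real PNG.
const materialized = materializeSketch([
  { type: "text", text: "chart attached" },
  { type: "image", source: "data:image/png;base64,aGVsbG8=" },
]);
try {
  console.log(materialized.content);
} finally {
  materialized.cleanup();
}
```

Handing graders a file path instead of inline base64 keeps the grader input small and lets external scripts open the image with ordinary file tools.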
Closes #821
Changes
Extends CodeGraderInput with typed Content[] so code graders can inspect structured multimodal output. ContentImage blocks carry file paths (not inline base64).
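To show what inspecting structured output might look like from the grader's side, here is a minimal sketch of a check that the output contains at least one image delivered as a file path. The `Content`, `Message`, `GraderInputSketch`, and `GraderResultSketch` shapes and the `gradeImageOutput()` function are hypothetical; the real `CodeGraderInput` has more fields, and the actual result format is not described here.

```typescript
// Hypothetical subset of the grader-facing types.
type Content =
  | { type: "text"; text: string }
  | { type: "image"; path: string }
  | { type: "file"; path: string };

interface Message {
  role: string;
  content: string | Content[];
}

interface GraderInputSketch {
  output: Message[];
}

interface GraderResultSketch {
  score: number;
  reasoning: string;
}

// Passes when the output has at least one image block and none of them
// still carries an inline data URI in its path field.
function gradeImageOutput(input: GraderInputSketch): GraderResultSketch {
  const imagePaths: string[] = [];
  for (const message of input.output) {
    if (typeof message.content === "string") continue; // plain text message
    for (const block of message.content) {
      if (block.type === "image") imagePaths.push(block.path);
    }
  }

  if (imagePaths.length === 0) {
    return { score: 0, reasoning: "no image blocks in output" };
  }
  const inlineCount = imagePaths.filter((p) => p.startsWith("data:")).length;
  if (inlineCount > 0) {
    return { score: 0, reasoning: `${inlineCount} image(s) still inline` };
  }
  return { score: 1, reasoning: `found ${imagePaths.length} image(s)` };
}

console.log(
  gradeImageOutput({
    output: [
      {
        role: "assistant",
        content: [
          { type: "text", text: "Here is the diagram." },
          { type: "image", path: "/tmp/grader-images/image-1.png" },
        ],
      },
    ],
  }),
);
```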