Skip to content

fix(evals): require tool assertions before Run in test template editor#1901

Draft
Vu-John wants to merge 1 commit intomainfrom
fix/test-run-requires-tool-assertion
Draft

fix(evals): require tool assertions before Run in test template editor#1901
Vu-John wants to merge 1 commit intomainfrom
fix/test-run-requires-tool-assertion

Conversation

@Vu-John
Copy link
Copy Markdown
Collaborator

@Vu-John Vu-John commented Apr 23, 2026

Summary

Users could start a test run with a prompt but no expected tool calls (no assertions), which is not a useful eval and matches product feedback. This enforces a clear invariant: Run requires at least one tool assertion across the prompt flow.

Changes

  • Run gating: Run stays disabled when there is no asserted expected tool on any prompt turn. Save validation is unchanged; only the run path is stricter.
  • Defense in depth: handleRunCompare returns early with a toast if the invariant is not met, so a programmatic trigger cannot bypass the UI.
  • UX: When the inline “blocked run” message is shown under the header actions, the tooltip on the Run button is not duplicated (same text was shown twice; now one primary explanation).
  • Tests: Added coverage for a valid prompt with zero tool assertions; existing prompt-empty behavior remains.

How to test

  1. Open a test case in the template editor, enter a user prompt, leave Add expected tool call empty.
  2. Confirm Run is disabled and the hint explains adding a tool assertion.
  3. Add a complete expected tool; Run enables when other prerequisites (e.g. model, suite servers) are satisfied.

Made with Cursor

- Disable Run when no expected tool call is defined on any prompt turn
- Add toast guard in handleRunCompare for the same invariant
- Avoid duplicate Run tooltip when inline blocked hint is shown
- Extend validation tests for no-assertion case

Made-with: Cursor
@dosubot dosubot Bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Apr 23, 2026
@chelojimenez
Copy link
Copy Markdown
Contributor

Snyk checks have passed. No issues have been found so far.

Status Scan Engine Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@github-actions
Copy link
Copy Markdown

Internal preview

Preview URL: https://mcp-inspector-pr-1901.up.railway.app
Deployed commit: 6aa857c
PR head commit: 23907bb
Backend target: staging fallback.
Health: ❌ Convex unreachable — see upsert-preview job logs (staging may need convex deploy)
Access is employee-only in non-production environments.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 23, 2026

Walkthrough

This change introduces validation logic to require expected tool assertions before running tests in the TestTemplateEditor component. A new detection mechanism identifies whether any asserted tool calls exist across prompt turns. When valid prompts lack expected tool assertions, the Run button becomes disabled with an updated tooltip message. A new test validates this behavior by confirming the Run button disables and displays the appropriate warning text when assertions are absent. Additionally, a runtime check prevents compare runs from starting without assertions and emits a toast error.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
mcpjam-inspector/client/src/components/evals/test-template-editor.tsx (1)

647-655: ⚠️ Potential issue | 🟡 Minor

Preserve prompt-validation precedence before the assertion gate.

Line 647 and Line 1197 now report “Add at least one expected tool assertion…” even when the prompt itself is empty. That changes the existing empty-prompt guidance/toast. Validate prompt completeness first, then apply the new assertion requirement.

Proposed ordering fix
-    if (!hasRequiredRunAssertions) {
-      return "Add at least one expected tool assertion before running.";
-    }
     if (!arePromptTurnsValid && editForm) {
       return (
         getPromptTurnBlockReason(editForm.promptTurns) ??
         "Fix the test configuration before running."
       );
     }
+    if (!hasRequiredRunAssertions) {
+      return "Add at least one expected tool assertion before running.";
+    }
-    if (!hasAnyAssertedToolCall(editForm.promptTurns)) {
-      toast.error("Add at least one expected tool assertion before running.");
-      return;
-    }
-
     if (!validatePromptTurns(editForm.promptTurns)) {
       toast.error(
         getPromptTurnBlockReason(editForm.promptTurns) ??
           "Fix the test configuration before running."
       );
       return;
     }
+
+    if (!hasAnyAssertedToolCall(editForm.promptTurns)) {
+      toast.error("Add at least one expected tool assertion before running.");
+      return;
+    }

Also applies to: 1197-1208

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/evals/test-template-editor.tsx` around
lines 647 - 655, The current gate checks assertions before prompt validity
causing the "Add at least one expected tool assertion..." message to override
empty-prompt guidance; change the logic in the block that uses
hasRequiredRunAssertions, arePromptTurnsValid, editForm and
getPromptTurnBlockReason so you validate the prompt first: if editForm exists
and arePromptTurnsValid is false, return
getPromptTurnBlockReason(editForm.promptTurns) ?? "Fix the test configuration
before running." before checking hasRequiredRunAssertions; apply the same
reordered check in the other occurrence that uses these same symbols (the block
around the other start/end range).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@mcpjam-inspector/client/src/components/evals/__tests__/test-template-editor-validation.test.tsx`:
- Around line 192-220: The test mutates shared fixtures caseDoc.query and
baseIteration.testCaseSnapshot.query causing cross-test pollution; fix by using
per-test clones or resetting those fields in beforeEach: locate the test
"disables Run when there are no tool assertions" and either replace direct
assignments to caseDoc and baseIteration with cloned copies ( e.g., clone the
fixture objects before modifying ) or add explicit resets in the test file's
beforeEach to restore caseDoc.query and baseIteration.testCaseSnapshot.query to
their original values so each test gets a fresh, unmodified fixture.

---

Outside diff comments:
In `@mcpjam-inspector/client/src/components/evals/test-template-editor.tsx`:
- Around line 647-655: The current gate checks assertions before prompt validity
causing the "Add at least one expected tool assertion..." message to override
empty-prompt guidance; change the logic in the block that uses
hasRequiredRunAssertions, arePromptTurnsValid, editForm and
getPromptTurnBlockReason so you validate the prompt first: if editForm exists
and arePromptTurnsValid is false, return
getPromptTurnBlockReason(editForm.promptTurns) ?? "Fix the test configuration
before running." before checking hasRequiredRunAssertions; apply the same
reordered check in the other occurrence that uses these same symbols (the block
around the other start/end range).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0544adef-53c6-4e86-9695-1aff95f9cf70

📥 Commits

Reviewing files that changed from the base of the PR and between 456ef0f and 23907bb.

📒 Files selected for processing (2)
  • mcpjam-inspector/client/src/components/evals/__tests__/test-template-editor-validation.test.tsx
  • mcpjam-inspector/client/src/components/evals/test-template-editor.tsx

Comment on lines +192 to +220
it("disables Run when there are no tool assertions", async () => {
caseDoc.query = "Hello";
baseIteration.testCaseSnapshot.query = "Hello";

renderWithProviders(
<TestTemplateEditor
suiteId="suite-1"
selectedTestCaseId="case-1"
connectedServerNames={new Set(["srv"])}
workspaceId={null}
availableModels={[
{
provider: "openai",
model: "gpt-4",
label: "GPT-4",
} as any,
]}
/>,
);

await waitFor(() => {
expect(screen.getByDisplayValue("Hello")).toBeInTheDocument();
});

expect(screen.getByRole("button", { name: /^Run$/ })).toBeDisabled();
expect(
screen.getByText("Add at least one expected tool assertion before running."),
).toBeInTheDocument();
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Avoid leaking prompt fixture mutations across tests.

Line 193 mutates shared caseDoc, and Line 194 mutates shared baseIteration. Since beforeEach does not reset those fields, later tests can inherit "Hello" and become order-dependent. Prefer per-test cloned fixtures or reset these fields in beforeEach.

Minimal reset option
   beforeEach(() => {
     vi.clearAllMocks();
+    caseDoc.query = "";
+    baseIteration.testCaseSnapshot.query = "";
     useMutationMock.mockReturnValue(vi.fn());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@mcpjam-inspector/client/src/components/evals/__tests__/test-template-editor-validation.test.tsx`
around lines 192 - 220, The test mutates shared fixtures caseDoc.query and
baseIteration.testCaseSnapshot.query causing cross-test pollution; fix by using
per-test clones or resetting those fields in beforeEach: locate the test
"disables Run when there are no tool assertions" and either replace direct
assignments to caseDoc and baseIteration with cloned copies ( e.g., clone the
fixture objects before modifying ) or add explicit resets in the test file's
beforeEach to restore caseDoc.query and baseIteration.testCaseSnapshot.query to
their original values so each test gets a fresh, unmodified fixture.

@Vu-John Vu-John marked this pull request as draft April 23, 2026 06:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:S This PR changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants