Add LLM evaluation suite support with config validation and reporting #1772
prathmeshpatel wants to merge 3 commits into main from
Conversation
Adds `mcpjam evals run` and `mcpjam evals validate` commands backed by
the SDK's EvalSuite, EvalTest, TestAgent, and MCPClientManager. Users
define servers, agent config, and test cases in a single JSON config
file with ${ENV_VAR} interpolation. Supports json, human, and junit-xml
output formats with CLI flag overrides for iterations and concurrency.
https://claude.ai/code/session_01AKW9UqD6M3RBdPdMXkEf4P
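Based on the fields exercised in the code under review (`servers`, `agent`, `tests`, `options`), a config might look like the following sketch. The server entry shape and the model string format are assumptions, not a documented schema:

```json
{
  "servers": {
    "echo": { "command": "npx", "args": ["my-echo-mcp-server"] }
  },
  "agent": {
    "model": "anthropic/claude-sonnet-4",
    "apiKey": "${ANTHROPIC_API_KEY}",
    "temperature": 0.2,
    "maxSteps": 10
  },
  "tests": [
    {
      "name": "echoes input",
      "prompt": "Echo the word hello",
      "expectedToolCalls": ["echo"],
      "matchMode": "subset"
    }
  ],
  "options": { "iterations": 5, "concurrency": 3, "timeoutMs": 30000, "retries": 0 }
}
```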
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)

Walkthrough: Adds an

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 3
🧹 Nitpick comments (1)
cli/tests/evals-config.test.ts (1)
188-211: Add tests for `options` validation and non-finite numeric inputs. The suite currently skips the `options.iterations`/`concurrency`/`timeoutMs`/`retries` validation paths and edge values like `NaN`/`Infinity`, which are worth pinning once validation is tightened.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cli/tests/evals-config.test.ts` around lines 188 - 211, Add unit tests covering validation of the options object and non-finite numeric inputs for validateEvalsConfig: create tests that assert validateEvalsConfig rejects out-of-range or missing options.iterations, options.concurrency, options.timeoutMs, and options.retries (and accepts valid edge values), and separate tests that pass NaN/Infinity for numeric fields (e.g., agent.temperature, options.*) and assert a CliError is thrown mentioning the field; use the existing test patterns (assert.throws with CliError checks) and the function validateEvalsConfig plus the test names like "validateEvalsConfig rejects temperature out of range" as references to add these new tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cli/src/commands/evals.ts`:
- Around line 93-110: The validate flow accepts "junit-xml" but never emits XML;
update the validate branch in evals.ts (the block using parseEvalsFormat,
loadEvalsConfig, summary, and writeResult) to handle format === "junit-xml":
either (preferred) implement a small serializer (e.g., writeJUnitXml(summary))
that converts the summary into a minimal JUnit XML testsuite/testcase structure
and writes it to stdout, then call that when format === "junit-xml", or change
parseEvalsFormat to reject "junit-xml" for the validate command so it errors
earlier; reference the parseEvalsFormat, loadEvalsConfig, writeResult, and the
local format/summary variables when making the change.
In `@cli/src/lib/evals-config.ts`:
- Around line 177-182: The current validation only checks that config.options is
an object; enhance validation inside the same block (where config.options is
referenced) to validate individual fields (e.g., config.options.iterations,
config.options.concurrency, config.options.timeout, etc.): ensure iterations is
a positive integer (>0), concurrency is a positive integer (>0),
timeouts/durations are finite non-negative numbers, and reject wrong types
(e.g., string "3") by throwing usageError with a clear message; implement these
checks directly after the existing typeof/Array.isArray guard in evals-config.ts
so invalid option values are caught early and rejected before runtime.
- Around line 126-135: The validation for agent.maxSteps and agent.temperature
currently allows NaN and Infinity; update the checks in the validation logic
that throws usageError to also require numeric finiteness (use Number.isFinite
or isFinite) for both agent.maxSteps and agent.temperature so that non-finite
values are rejected—specifically modify the conditions around agent.maxSteps and
agent.temperature (the same blocks that call usageError) to include a finiteness
check before accepting the value.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0e44006f-da7a-4082-ab10-eea7d52feb27
📒 Files selected for processing (7)
- cli/src/commands/evals.ts
- cli/src/index.ts
- cli/src/lib/evals-config.ts
- cli/src/lib/evals-output.ts
- cli/src/lib/evals-runner.ts
- cli/tests/evals-config.test.ts
- cli/tests/evals-output.test.ts
```ts
const format = parseEvalsFormat(command);

try {
  const config = loadEvalsConfig(options.config);
  const summary = {
    valid: true,
    servers: Object.keys(config.servers).length,
    tests: config.tests.length,
    model: config.agent.model,
  };

  if (format === "human") {
    process.stdout.write(
      `Config valid: ${summary.servers} server(s), ${summary.tests} test(s), model ${summary.model}\n`,
    );
  } else {
    writeResult(summary);
  }
```
`evals validate --format junit-xml` is accepted but not actually produced.
The validate path parses `junit-xml` as valid but emits JSON/human only. Either reject `junit-xml` for validate, or implement XML output for this command.
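A minimal serializer along the lines the comment suggests could look like this sketch. The `writeJUnitXml` name comes from the review comment, but the `ValidateSummary` shape shown here and the exact attribute layout are assumptions, not the project's actual API:

```typescript
// Hypothetical sketch: serialize the validate summary as minimal JUnit XML.
interface ValidateSummary {
  valid: boolean;
  servers: number;
  tests: number;
  model: string;
}

// Escape the characters that are unsafe inside XML text and attributes.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function writeJUnitXml(summary: ValidateSummary): string {
  const failures = summary.valid ? 0 : 1;
  const name = escapeXml(`config valid (model ${summary.model})`);
  const testcase = summary.valid
    ? `  <testcase name="${name}" />`
    : `  <testcase name="${name}"><failure message="invalid config" /></testcase>`;
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    `<testsuite name="evals validate" tests="1" failures="${failures}">`,
    testcase,
    "</testsuite>",
  ].join("\n");
}

console.log(writeJUnitXml({ valid: true, servers: 2, tests: 4, model: "gpt-4o" }));
```

The alternative the reviewer offers — rejecting `junit-xml` in `parseEvalsFormat` for the validate command — avoids maintaining a second XML writer alongside the one the `run` command already needs.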
```ts
if (agent.maxSteps !== undefined && (typeof agent.maxSteps !== "number" || agent.maxSteps <= 0)) {
  throw usageError('"agent.maxSteps" must be a positive number.');
}

if (
  agent.temperature !== undefined &&
  (typeof agent.temperature !== "number" || agent.temperature < 0 || agent.temperature > 2)
) {
  throw usageError('"agent.temperature" must be a number between 0 and 2.');
}
```
Guard against non-finite numeric values in agent settings.
`NaN`/`Infinity` currently pass these checks and can propagate into agent runtime settings.
Suggested fix
```diff
- if (agent.maxSteps !== undefined && (typeof agent.maxSteps !== "number" || agent.maxSteps <= 0)) {
+ if (
+   agent.maxSteps !== undefined &&
+   (!Number.isFinite(agent.maxSteps) || agent.maxSteps <= 0)
+ ) {
    throw usageError('"agent.maxSteps" must be a positive number.');
  }
@@
-   (typeof agent.temperature !== "number" || agent.temperature < 0 || agent.temperature > 2)
+   (!Number.isFinite(agent.temperature) || agent.temperature < 0 || agent.temperature > 2)
  ) {
    throw usageError('"agent.temperature" must be a number between 0 and 2.');
  }
```
```ts
// options (optional)
if (config.options !== undefined) {
  if (typeof config.options !== "object" || Array.isArray(config.options)) {
    throw usageError('"options" must be an object.');
  }
}
```
Validate `options` fields, not only the `options` shape.
`options` is accepted as any object, so invalid values like `iterations: 0`, `concurrency: "3"`, or negative/`NaN` timeouts can reach runtime execution.
Suggested fix
```diff
  // options (optional)
  if (config.options !== undefined) {
    if (typeof config.options !== "object" || Array.isArray(config.options)) {
      throw usageError('"options" must be an object.');
    }
+   const options = config.options as Record<string, unknown>;
+   if (
+     options.iterations !== undefined &&
+     (!Number.isInteger(options.iterations) || (options.iterations as number) <= 0)
+   ) {
+     throw usageError('"options.iterations" must be a positive integer.');
+   }
+   if (
+     options.concurrency !== undefined &&
+     (!Number.isInteger(options.concurrency) || (options.concurrency as number) <= 0)
+   ) {
+     throw usageError('"options.concurrency" must be a positive integer.');
+   }
+   if (
+     options.timeoutMs !== undefined &&
+     (!Number.isFinite(options.timeoutMs) || (options.timeoutMs as number) <= 0)
+   ) {
+     throw usageError('"options.timeoutMs" must be a positive number.');
+   }
+   if (
+     options.retries !== undefined &&
+     (!Number.isInteger(options.retries) || (options.retries as number) < 0)
+   ) {
+     throw usageError('"options.retries" must be a non-negative integer.');
+   }
  }
```
When tests fail, the runner now throws descriptive errors showing expected vs actual tool calls and any prompt-level errors (model failures, connection issues). The human formatter shows a deduplicated Failures section so users can immediately see what went wrong. https://claude.ai/code/session_01AKW9UqD6M3RBdPdMXkEf4P
Example config for running evals against the Excalidraw MCP server. Tests cover simple shapes, flowcharts, colored elements, and architecture diagrams — all expecting read_me + create_view calls. https://claude.ai/code/session_01AKW9UqD6M3RBdPdMXkEf4P
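Given the tool names mentioned above, one test entry in that config might look like the following sketch (the prompt text and field values here are illustrative, not the committed config):

```json
{
  "name": "architecture diagram",
  "prompt": "Draw a three-tier web architecture diagram",
  "expectedToolCalls": ["read_me", "create_view"],
  "matchMode": "subset"
}
```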
Actionable comments posted: 2
🧹 Nitpick comments (1)
cli/tests/evals-output.test.ts (1)
143-149: Strengthen XML-escaping assertion for quotes. The input includes `"`, but the assertion only checks `<`, `>`, and `&`. Add a check for `"` to catch regressions in attribute escaping.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cli/tests/evals-output.test.ts` around lines 143 - 149, the test "formatEvalsJUnit escapes XML special characters" currently asserts that formatEvalsJUnit(result, 'Suite <"test"> & more') escapes <, > and &, but doesn't check for escaped quotes; update the assertion in cli/tests/evals-output.test.ts to also verify that the output includes &quot; (i.e., ensure output.includes("&quot;") is checked) so that formatEvalsJUnit's attribute-quoting behavior is covered; reference the test helper makeSuiteResult() and the function under test formatEvalsJUnit when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cli/src/lib/evals-runner.ts`:
- Around line 91-93: The finally block currently awaits
manager.disconnectAllServers() which can throw and overwrite the original error
from the try; modify the finally in the eval runner (where
manager.disconnectAllServers() is called) to wrap the disconnect in its own
try/catch, store the original error thrown from the try block (if any), and if
the cleanup throws as well, aggregate both errors (e.g., attach cleanup error as
a cause or append its message) and rethrow the original error with the cleanup
failure attached so the original eval failure is preserved; ensure this logic
references manager.disconnectAllServers() and the surrounding eval invocation
scope so the original exception is rethrown (with cleanup details) rather than
being replaced by the cleanup exception.
In `@cli/tests/evals-output.test.ts`:
- Around line 51-52: The mocked test summaries for the "echo" entry have
successes/failures counters that don't match the per-iteration data in
results/iterationDetails; update the makeTestResult calls (the tests.set("echo",
makeTestResult(...)) entries) so the successes and failures values reflect the
actual counts derived from results/iterationDetails (e.g., if one iteration in
iterationDetails is failing, set failures: 1 and successes accordingly). Repeat
the same correction for the second occurrence around the other makeTestResult
call (the block referenced in the comment).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 890c8bbe-cd41-4e31-868e-cb10f173de8a
📒 Files selected for processing (3)
- cli/src/lib/evals-output.ts
- cli/src/lib/evals-runner.ts
- cli/tests/evals-output.test.ts
✅ Files skipped from review due to trivial changes (1)
- cli/src/lib/evals-output.ts
```ts
} finally {
  await manager.disconnectAllServers();
}
```
🧩 Analysis chain
🏁 Script executed: `fd -t f "evals-runner.ts" --exec cat -n {} \;`
Repository: MCPJam/inspector
Length of output: 50373
🏁 Script executed: `rg -n "runEvals|disconnectAllServers" cli/src/lib/evals-runner.ts`
Repository: MCPJam/inspector
Length of output: 139
🏁 Script executed: `sed -n '17,100p' cli/src/lib/evals-runner.ts`
Repository: MCPJam/inspector
Length of output: 2362
Preserve the original eval failure when cleanup also fails.
If `disconnectAllServers()` throws, it will overwrite the original error from the try block due to JavaScript `finally` semantics, masking critical diagnostics from the eval run. Wrap the cleanup in its own try/catch, capture the original error, and aggregate both if cleanup also fails.
Proposed fix
```diff
 export async function runEvals(
   config: EvalsConfig,
   overrides: RunOverrides,
 ): Promise<EvalSuiteResult> {
   const manager = new MCPClientManager(config.servers);
+  let runError: unknown;
   try {
     const tools = await manager.getTools();
     const agent = new TestAgent({
       tools,
       model: config.agent.model,
       apiKey: config.agent.apiKey,
       systemPrompt: config.agent.systemPrompt,
       temperature: config.agent.temperature,
       maxSteps: config.agent.maxSteps,
       mcpClientManager: manager,
     });
     const suite = new EvalSuite({
       name: config.mcpjam?.suiteName ?? "CLI Eval Suite",
       mcpjam: config.mcpjam,
     });
     for (const testCase of config.tests) {
       const matchMode = testCase.matchMode ?? "subset";
       const expected = testCase.expectedToolCalls;
       const evalTest = new EvalTest({
         name: testCase.name,
         test: async (evalAgent) => {
           const result = await evalAgent.prompt(testCase.prompt);
           // Surface prompt-level errors (model failures, connection issues)
           if (result.hasError()) {
             throw new Error(`Prompt error: ${result.getError()}`);
           }
           const called = result.toolsCalled();
           const matched = matchMode === "exact"
             ? matchToolCalls(expected, called)
             : matchToolCallsSubset(expected, called);
           if (!matched) {
             const actualStr = called.length > 0 ? called.join(", ") : "(none)";
             const expectedStr = expected.join(", ");
             throw new Error(
               `Tool call mismatch (${matchMode}): expected [${expectedStr}], got [${actualStr}]`,
             );
           }
           return true;
         },
         expectedToolCalls: expected.map((toolName) => ({ toolName })),
       });
       suite.add(evalTest);
     }
     const iterations = overrides.iterations ?? config.options?.iterations ?? 5;
     const concurrency = overrides.concurrency ?? config.options?.concurrency ?? 3;
     const timeoutMs = config.options?.timeoutMs ?? 30_000;
     const retries = config.options?.retries ?? 0;
     const result = await suite.run(agent, {
       iterations,
       concurrency,
       timeoutMs,
       retries,
     });
     return result;
+  } catch (error) {
+    runError = error;
+    throw error;
   } finally {
-    await manager.disconnectAllServers();
+    try {
+      await manager.disconnectAllServers();
+    } catch (disconnectError) {
+      if (runError) {
+        throw new AggregateError(
+          [runError, disconnectError],
+          "Eval run failed and cleanup also failed",
+        );
+      }
+      throw disconnectError;
+    }
   }
 }
```
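The masking behavior this fix addresses can be reproduced in isolation — a throw inside `finally` replaces whatever exception is already in flight. A standalone illustration (not project code):

```typescript
function cleanupMasksError(): void {
  try {
    throw new Error("original failure");
  } finally {
    // This throw wins: the "original failure" above is silently discarded.
    throw new Error("cleanup failure");
  }
}

let seen = "";
try {
  cleanupMasksError();
} catch (err) {
  seen = (err as Error).message;
}
console.log(seen); // "cleanup failure" — the original error is gone
```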
```ts
tests.set("echo", makeTestResult({ successes: 5, failures: 0 }));
```
Keep mocked pass/fail counters consistent with iteration-level data.
At Line 51 and Line 114, the overrides set `successes`/`failures` to all-pass, but `results` and `iterationDetails` still contain a failing iteration. This can let tests pass for the wrong reason.
💡 Proposed fix
```diff
+function makeAllPassTestResult(): EvalRunResult {
+  const base = makeTestResult();
+  return {
+    ...base,
+    successes: base.iterations,
+    failures: 0,
+    results: Array(base.iterations).fill(true),
+    iterationDetails: base.iterationDetails.map(({ latencies, tokens }) => ({
+      passed: true,
+      latencies,
+      tokens,
+    })),
+  };
+}
@@
-  tests.set("echo", makeTestResult({ successes: 5, failures: 0 }));
+  tests.set("echo", makeAllPassTestResult());
@@
-  result.tests.set("addition", makeTestResult({ successes: 5, failures: 0 }));
+  result.tests.set("addition", makeAllPassTestResult());
```

Also applies to: 111-115

Summary
This PR adds comprehensive LLM evaluation suite functionality to the CLI, enabling users to define and run test cases against MCP servers with configurable agents, multiple output formats, and detailed performance metrics.
Key Changes
Evals Configuration System (`evals-config.ts`):
- `EvalsConfig` interface supporting servers, agent configuration, test cases, and run options
- `${VAR}` environment-variable interpolation syntax throughout config files
- `loadEvalsConfig()` function to load, interpolate, and validate JSON config files

Evals Runner (`evals-runner.ts`):
- `runEvals()` function orchestrating test execution against MCP servers
- `@mcpjam/sdk` for agent-based testing and tool matching

Output Formatting (`evals-output.ts`)

CLI Commands (`evals.ts`):
- `evals run` command to execute test suites with JSON/human/JUnit-XML output options
- `evals validate` command to validate config files without running tests

Comprehensive Test Coverage (`evals-config.test.ts`, `evals-output.test.ts`)

Notable Implementation Details
- … (`tests[0].prompt`)
- … `parseLLMString()` from SDK to ensure valid format

https://claude.ai/code/session_01AKW9UqD6M3RBdPdMXkEf4P
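The `${VAR}` interpolation described above could be sketched as follows. This is a hypothetical illustration — `interpolateEnv` is not the actual helper name, and the real `loadEvalsConfig` may handle missing variables differently:

```typescript
// Hypothetical sketch of ${VAR} interpolation over a raw JSON config string.
function interpolateEnv(
  raw: string,
  env: Record<string, string | undefined>,
): string {
  return raw.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match, name: string) => {
    const value = env[name];
    if (value === undefined) {
      // Failing fast keeps a typo'd variable from silently becoming "".
      throw new Error(`Missing environment variable: ${name}`);
    }
    return value;
  });
}

console.log(
  interpolateEnv('{"apiKey": "${ANTHROPIC_API_KEY}"}', {
    ANTHROPIC_API_KEY: "sk-test",
  }),
); // {"apiKey": "sk-test"}
```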