Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/bug_report.yml
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
name: Bug report
description: Report a bug or issue with the Apify MCP server
title: "[Bug]: "
labels: ["bug"]
labels: ["bug","t-ai"]
body:
- type: dropdown
id: server-type
Expand Down
51 changes: 51 additions & 0 deletions .github/workflows/_smoke_test_npm_package.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
name: Smoke test npm package

on:
workflow_call:
inputs:
npm_tag:
description: NPM dist-tag to install (e.g. "beta", "latest"). Ignored when version is set.
required: false
type: string
default: latest
version:
description: Exact version to install (e.g. "1.2.3"). Takes precedence over npm_tag.
required: false
type: string
default: ""

jobs:
smoke_test:
name: Smoke test published package
runs-on: ubuntu-latest
steps:
- name: Use Node.js
uses: actions/setup-node@v6
with:
node-version: 22

- name: Install published package
uses: nick-fields/retry@v4
env:
PKG_VERSION: ${{ inputs.version || inputs.npm_tag }}
with:
timeout_minutes: 5
max_attempts: 3
retry_wait_seconds: 30
command: |

Check warning on line 35 in .github/workflows/_smoke_test_npm_package.yaml

View check run for this annotation

Claude / Claude Code Review

@apify/mcpc unpinned in smoke test

The smoke test installs `@apify/mcpc` without a version pin (`npm install "@apify/actors-mcp-server@${PKG_VERSION}" @apify/mcpc`), so it always installs the latest published version. If `@apify/mcpc` releases a breaking change (renamed CLI flags, changed output format), the smoke test would fail even when `actors-mcp-server` itself is perfectly healthy, creating false CI failures that could block releases. Pinning to a known-good version (e.g. `@apify/mcpc@0.2.0` or `@apify/mcpc@^0.2.0`) would m
Comment on lines +31 to +35
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The smoke test installs @apify/mcpc without a version pin (npm install "@apify/actors-mcp-server@${PKG_VERSION}" @apify/mcpc), so it always installs the latest published version. If @apify/mcpc releases a breaking change (renamed CLI flags, changed output format), the smoke test would fail even when actors-mcp-server itself is perfectly healthy, creating false CI failures that could block releases. Pinning to a known-good version (e.g. @apify/mcpc@0.2.0 or @apify/mcpc@^0.2.0) would make the test deterministic.

Extended reasoning...

What the bug is and how it manifests

In .github/workflows/_smoke_test_npm_package.yaml (line 35), the install command is:

npm install "@apify/actors-mcp-server@${PKG_VERSION}" @apify/mcpc

@apify/mcpc has no version constraint, so npm always resolves and installs the latest published version. This means the smoke-test environment is non-deterministic: different CI runs can silently pull in different versions of the test harness.

The specific code path that triggers it

The smoke test creates a fresh smoke-test/ directory, runs npm init -y, and installs both packages from scratch. Because @apify/mcpc is not pinned (unlike the main package under test, which is pinned to ${PKG_VERSION}), any new release of @apify/mcpc published between two CI runs will be picked up automatically. The subsequent smoke commands—npx mcpc connect, npx mcpc @dev ping, npx mcpc --json @dev tools-list, etc.—all depend on the CLI API of whichever version was installed.

Why existing code does not prevent it

The project's own package.json pins @apify/mcpc at ^0.2.0 in devDependencies, demonstrating that the team already values version stability for this tool. However, the smoke-test workflow bypasses package.json entirely: it sets up a standalone directory with its own package.json (via npm init -y) and installs packages directly. The retry mechanism (max_attempts: 3) mitigates transient npm registry failures but does nothing about determinism across versions.

What the impact would be

If @apify/mcpc 0.3.0 (hypothetically) renames the connect subcommand, changes the --json flag semantics, or alters the tools-list output format, the smoke test would fail at step 2 or 3—even though @apify/actors-mcp-server itself is perfectly healthy. This would block the beta or stable release pipeline. Because the failure comes from the test harness rather than the package under test, it could be confusing to diagnose, especially in time-sensitive release windows.

How to fix it

Pin @apify/mcpc to the same range already used in devDependencies:

npm install "@apify/actors-mcp-server@${PKG_VERSION}" "@apify/mcpc@^0.2.0"

Or use an exact version for maximum reproducibility:

npm install "@apify/actors-mcp-server@${PKG_VERSION}" "@apify/mcpc@0.2.0"

The refutation argument—that @apify/mcpc is an Apify-owned package and won't introduce uncoordinated breaking changes—is reasonable and lowers the practical risk. However, the same logic would argue against pinning anything, which contradicts the team's existing practice. Even for internal tools, a pinned test harness version is a best-practice that prevents CI surprise failures and makes version-upgrade decisions explicit and intentional.

mkdir -p smoke-test && cd smoke-test
npm init -y
npm install "@apify/actors-mcp-server@${PKG_VERSION}" @apify/mcpc

- name: Smoke test MCP server via mcpc
working-directory: smoke-test
run: |
echo '{"mcpServers":{"dev":{"command":"npx","args":["actors-mcp-server","--tools","docs,search-actors,fetch-actor-details","--telemetry-enabled","false"]}}}' > .mcp.json
npx mcpc connect .mcp.json:dev @dev
npx mcpc @dev ping
npx mcpc --json @dev tools-list | jq -e 'length > 0'
npx mcpc --json @dev tools-call search-actors keywords:="web scraper" | jq -e '.content[0].text'
npx mcpc --json @dev tools-call fetch-actor-details actor:="apify/web-scraper" | jq -e '.content[0].text'
npx mcpc --json @dev tools-call search-apify-docs query:="web scraping" docSource:="apify" | jq -e '.content[0].text'
npx mcpc --json @dev tools-call fetch-apify-docs url:="https://docs.apify.com/platform/actors" | jq -e '.content[0].text'
npx mcpc @dev close
7 changes: 7 additions & 0 deletions .github/workflows/manual_release_stable.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -119,9 +119,16 @@
"tag": "latest"
}

smoke_test:
name: Smoke test published package
needs: [ release_metadata, publish_to_npm ]
uses: ./.github/workflows/_smoke_test_npm_package.yaml
with:
version: ${{ needs.release_metadata.outputs.version_number }}

bump_dependency_in_internal_repo:
name: Bump dependency in apify-mcp-server-internal
needs: [ release_metadata, publish_to_npm ]

Check failure on line 131 in .github/workflows/manual_release_stable.yaml

View check run for this annotation

Claude / Claude Code Review

bump_dependency_in_internal_repo runs in parallel with smoke_test

The new smoke_test job was added to gate releases, but bump_dependency_in_internal_repo still has needs: [release_metadata, publish_to_npm], so it runs in parallel with the smoke test rather than after it. If the smoke test fails (broken npm package), the internal-repo dependency bump has already started and will create a PR pinning the broken version. Adding smoke_test to the needs list of bump_dependency_in_internal_repo fixes this.
Comment on lines 129 to 131
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The new smoke_test job was added to gate releases, but bump_dependency_in_internal_repo still has needs: [release_metadata, publish_to_npm], so it runs in parallel with the smoke test rather than after it. If the smoke test fails (broken npm package), the internal-repo dependency bump has already started and will create a PR pinning the broken version. Adding smoke_test to the needs list of bump_dependency_in_internal_repo fixes this.

Extended reasoning...

What the bug is and how it manifests

The PR introduces a new smoke_test job (lines 122-127 in manual_release_stable.yaml) that installs the just-published npm package and runs a series of mcpc commands to verify it is functional. The intent is that this acts as a quality gate before the release is propagated to the internal repository. However, bump_dependency_in_internal_repo (line 129) has the same needs: [release_metadata, publish_to_npm] as smoke_test, so GitHub Actions schedules both jobs to run concurrently as soon as publish_to_npm finishes.

The specific code path that triggers it

In .github/workflows/manual_release_stable.yaml, after publish_to_npm completes, GitHub Actions fans out to any job whose needs list is fully satisfied. Both smoke_test and bump_dependency_in_internal_repo only require [release_metadata, publish_to_npm], so they both start simultaneously. If the smoke test catches a broken release and fails, the bump job is already mid-execution: checking out the internal repo, running npm install, and opening a PR.

Why existing code does not prevent it

There is no needs: smoke_test dependency on bump_dependency_in_internal_repo. GitHub Actions job ordering is determined solely by the needs graph; there is no implicit sequencing. The smoke_test job cannot cancel a sibling job that is already running.

What the impact would be

If a broken version is caught by the smoke test, bump_dependency_in_internal_repo will have already opened a pull request in apify/apify-mcp-server-internal pinning the broken version. Although this is only a PR (not an auto-merge), it creates noise, risks accidental merges by reviewers who trust CI to gate quality, and defeats the entire purpose of adding the smoke test quality gate.

Step-by-step proof

  1. Developer triggers manual_release_stable.yaml with release type patch.
  2. release_metadata runs and outputs version_number = 0.9.19.
  3. publish_to_npm runs and publishes a broken 0.9.19 to npm (e.g., startup crash).
  4. GitHub Actions marks publish_to_npm as succeeded (the publish itself worked).
  5. Both smoke_test AND bump_dependency_in_internal_repo become runnable and start in parallel.
  6. bump_dependency_in_internal_repo checks out apify-mcp-server-internal, installs @apify/actors-mcp-server@0.9.19, and opens a PR titled "chore: bump @apify/actors-mcp-server to 0.9.19".
  7. smoke_test installs 0.9.19, runs npx mcpc against it, gets a crash, and fails — but the PR in the internal repo is already open.

How to fix it

Change the needs list of bump_dependency_in_internal_repo to include smoke_test:

bump_dependency_in_internal_repo:
    name: Bump dependency in apify-mcp-server-internal
    needs: [ release_metadata, publish_to_npm, smoke_test ]

This ensures the bump only runs after the smoke test has confirmed the published package is functional. Note that on_master.yaml correctly adds smoke_test as a leaf job with no downstream dependents, so that workflow does not have this issue.

runs-on: ubuntu-latest
env:
GH_TOKEN: ${{ secrets.APIFY_SERVICE_ACCOUNT_GITHUB_TOKEN }}
Expand Down
7 changes: 7 additions & 0 deletions .github/workflows/on_master.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -46,3 +46,10 @@ jobs:
"ref": "${{ needs.release_metadata.outputs.changelog_commitish }}",
"tag": "beta"
}

smoke_test:
name: Smoke test published package
needs: [ publish_to_npm ]
uses: ./.github/workflows/_smoke_test_npm_package.yaml
with:
npm_tag: beta
8 changes: 6 additions & 2 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,11 @@
All notable changes to this project will be documented in this file.

<!-- git-cliff-unreleased-start -->
## 0.9.18 - **not yet released**
## 0.9.19 - **not yet released**


<!-- git-cliff-unreleased-end -->
## [0.9.18](https://github.com/apify/apify-mcp-server/releases/tag/v0.9.18) (2026-04-14)

### 🚀 Features

Expand All @@ -16,9 +20,9 @@ All notable changes to this project will be documented in this file.
- Sanitize all env values in evals to prevent ERR_INVALID_CHAR ([#665](https://github.com/apify/apify-mcp-server/pull/665)) ([9f0b9d9](https://github.com/apify/apify-mcp-server/commit/9f0b9d9d07a68999a26f3ed2e439fac47a872863)) by [@jirispilka](https://github.com/jirispilka)
- Actor run cancellation handling and add tests for abort scenarios ([#662](https://github.com/apify/apify-mcp-server/pull/662)) ([acedbf5](https://github.com/apify/apify-mcp-server/commit/acedbf59b374a1243bb31d2930ebac8403b7c47e)) by [@jirispilka](https://github.com/jirispilka)
- DNS rebinding protection for dev server ([#653](https://github.com/apify/apify-mcp-server/pull/653)) ([9781b45](https://github.com/apify/apify-mcp-server/commit/9781b450a53c96e0df49f463881253507502f195)) by [@jirispilka](https://github.com/jirispilka)
- Fix error in the validation pipeline ([#669](https://github.com/apify/apify-mcp-server/pull/669)) ([b844f31](https://github.com/apify/apify-mcp-server/commit/b844f31f4ccbae1c53c98d2441a95c1e65799230)) by [@jirispilka](https://github.com/jirispilka)


<!-- git-cliff-unreleased-end -->
## [0.9.17](https://github.com/apify/apify-mcp-server/releases/tag/v0.9.17) (2026-04-08)

### 🚀 Features
Expand Down
5 changes: 5 additions & 0 deletions DEVELOPMENT.md
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,11 @@ Restart Claude Code for the change to take effect. This token is picked up by bo
| **Unit tests** | `npm run test:unit` | Individual modules in isolation — no credentials needed |
| **Integration tests** | `npm run test:integration` | Full server over all transports against real Apify API (requires `APIFY_TOKEN` + `npm run build`) |
| **mcpc probing** | `mcpc @stdio tools-call ...` | Interactive end-to-end verification during development |
| **LLM evals** | CI only — apply `validated` label | Runs `evals/run_evaluation.ts` against multiple models via OpenRouter; requires `PHOENIX_*` and `OPENROUTER_*` secrets |

To trigger the eval workflow on a PR, apply the **`validated`** label.
The workflow then runs automatically and posts results to Phoenix.
It also runs automatically on every merge to the `master` branch.

### Live probing with mcpc

Expand Down
2 changes: 1 addition & 1 deletion evals/config.ts
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ import { fileURLToPath } from 'node:url';
import log from '@apify/log';

// Re-export shared config
export { OPENROUTER_CONFIG, sanitizeEnvValue, validateEnvVars, getRequiredEnvVars } from './shared/config.js';
export { OPENROUTER_CONFIG, sanitizeEnvValue, sanitizeProcessEnv, validateEnvVars, getRequiredEnvVars } from './shared/config.js';

// Read the version from test-cases.json
function getTestCasesVersion(): string {
Expand Down
3 changes: 2 additions & 1 deletion evals/create_dataset.ts
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ import { hideBin } from 'yargs/helpers';

import log from '@apify/log';

import { sanitizeEnvValue, validatePhoenixEnvVars } from './config.js';
import { sanitizeEnvValue, sanitizeProcessEnv, validatePhoenixEnvVars } from './config.js';
import { loadTestCases, filterByCategory, filterById, type TestCase } from './evaluation_utils.js';

// Set log level to debug
Expand All @@ -32,6 +32,7 @@ type CliArgs = {

// Load environment variables from .env file if present
dotenv.config({ path: '.env' });
sanitizeProcessEnv();

// Parse command line arguments using yargs
const argv = yargs(hideBin(process.argv))
Expand Down
2 changes: 2 additions & 0 deletions evals/run_evaluation.ts
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ import {
PHOENIX_MAX_RETRIES,
type EvaluatorName,
sanitizeEnvValue,
sanitizeProcessEnv,
validatePhoenixEnvVars
} from './config.js';

Expand Down Expand Up @@ -58,6 +59,7 @@ const RUN_LLM_EVALUATOR = true;
const RUN_TOOLS_EXACT_MATCH_EVALUATOR = true;

dotenv.config({ path: '.env' });
sanitizeProcessEnv();

// Parse command line arguments using yargs
const argv = yargs(hideBin(process.argv))
Expand Down
54 changes: 51 additions & 3 deletions evals/shared/config.ts
Original file line number Diff line number Diff line change
Expand Up @@ -23,12 +23,60 @@
}

/**
* Removes newlines, trims whitespace, and strips surrounding quotes from env values.
* CI secrets often include trailing newlines or quotes that break HTTP headers and URLs.
* Strips control characters, trims whitespace, and removes surrounding double quotes.
* CI secrets often contain trailing newlines or invisible control chars that break HTTP headers.
*/
export function sanitizeEnvValue(value?: string): string | undefined {
if (value == null) return value;
return value.replace(/[\r\n]/g, '').trim().replace(/^"|"$/g, '');
// eslint-disable-next-line no-control-regex
return value.replace(/[\x00-\x08\x0a-\x1f\x7f]/g, '').trim().replace(/^"|"$/g, '');
}

/**
* Env vars used in HTTP headers (API keys, tokens, URLs).
*
* Why in-place? The phoenix-otel OTel exporter reads PHOENIX_API_KEY directly
* from process.env (inside getEnvApiKey()) and passes it to node:http, which
* throws ERR_INVALID_CHAR on any control characters. We can't intercept that
* read, so we sanitize process.env itself before any library loads.
*/
const ENV_KEYS_TO_SANITIZE = [
'OPENROUTER_API_KEY',
'OPENROUTER_BASE_URL',
'PHOENIX_API_KEY',
'PHOENIX_BASE_URL',
];

/**
* Redact a value for safe logging: shows first 4 and last 4 chars, masks the rest.
* Fully masks short values (≤ 8 chars) to prevent reconstruction from the log line.
* Returns '(empty)' for empty strings, '(unset)' for undefined/null.
*/
function redact(value?: string | null): string {
if (value == null) return '(unset)';
if (value.length === 0) return '(empty)';
if (value.length <= 6) return `*** (${value.length} chars)`;
return `${value.slice(0, 3)}***${value.slice(-3)} (${value.length} chars)`;
}

Check warning on line 60 in evals/shared/config.ts

View check run for this annotation

Claude / Claude Code Review

redact() JSDoc says 4 chars / <=8 threshold but code uses 3 chars / <=6

The JSDoc for the new `redact()` function claims it "shows first 4 and last 4 chars" and "Fully masks short values (≤ 8 chars)", but the implementation uses `slice(0,3)`/`slice(-3)` (3 chars each side) and masks values with `length <= 6`. For a security-adjacent logging helper, the docs should accurately reflect how much of a secret is revealed.
Comment on lines +50 to +60
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The JSDoc for the new redact() function claims it "shows first 4 and last 4 chars" and "Fully masks short values (≤ 8 chars)", but the implementation uses slice(0,3)/slice(-3) (3 chars each side) and masks values with length <= 6. For a security-adjacent logging helper, the docs should accurately reflect how much of a secret is revealed.

Extended reasoning...

What the bug is: The JSDoc comment for the newly-introduced redact() function (evals/shared/config.ts, lines 50–60) contains two factual errors relative to the implementation. (1) The comment says "shows first 4 and last 4 chars" but the code executes value.slice(0, 3) and value.slice(-3), exposing only 3 characters from each end. (2) The comment says "Fully masks short values (≤ 8 chars)" but the guard is value.length <= 6, not <= 8.

The specific code path: The function is called inside sanitizeProcessEnv() to produce redacted log lines such as env PHOENIX_API_KEY: abc***xyz (20 chars). Reviewers reading only the JSDoc would believe secrets shorter than 9 characters are fully masked, and that exactly 4 characters are revealed on each side—neither is accurate.

Why existing code doesn't prevent it: There are no automated tests that assert the format of the redacted output (only that sanitizeProcessEnv modifies process.env correctly). The mismatch slipped through because the function is private and the JSDoc was simply never updated to match the final implementation.

Impact: No functional impact—the actual redaction logic is self-consistent and secrets are still protected. However, the incorrect documentation could mislead a security reviewer auditing information-leakage guarantees of the CI log output. For example: a 7-character API key fragment would expose 6 of its 7 characters (3+3), whereas the JSDoc implies it would be fully masked (≤ 8 chars threshold).

Step-by-step proof: Take an 8-character value like sk-12345 (length = 8). According to the JSDoc it should be fully masked ("≤ 8 chars"). The code checks value.length <= 6—8 > 6, so it falls through to the reveal path and returns sk-***345 (8 chars), exposing 6 of 8 characters. Similarly, for a 10-character value the JSDoc implies 4+4=8 characters are visible, but the code reveals 3+3=6 characters.

How to fix: Correct the JSDoc to match the implementation: change "first 4 and last 4 chars" to "first 3 and last 3 chars" and change "≤ 8 chars" to "≤ 6 chars".


/**
* Sanitize env vars in-place on process.env and log redacted values for CI debugging.
* Must be called before any library reads these values.
*/
export function sanitizeProcessEnv(): void {
for (const key of ENV_KEYS_TO_SANITIZE) {
const raw = process.env[key];
if (raw != null) {
const sanitized = sanitizeEnvValue(raw)!;
const changed = raw !== sanitized;
process.env[key] = sanitized;
// eslint-disable-next-line no-console
console.log(`env ${key}: ${redact(sanitized)}${changed ? ' (sanitized)' : ''}`);
} else {
// eslint-disable-next-line no-console
console.log(`env ${key}: ${redact(raw)}`);
}
}
}

/**
Expand Down
2 changes: 1 addition & 1 deletion evals/workflows/config.ts
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
*/

// Re-export shared config for convenience
export { OPENROUTER_CONFIG, sanitizeEnvValue, validateEnvVars } from '../shared/config.js';
export { OPENROUTER_CONFIG, sanitizeEnvValue, sanitizeProcessEnv, validateEnvVars } from '../shared/config.js';

/**
* Default model configuration for agent and judge
Expand Down
2 changes: 1 addition & 1 deletion manifest.json
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
"manifest_version": "0.3",
"name": "apify-mcp-server",
"display_name": "Apify",
"version": "0.9.18",
"version": "0.9.19",
"description": "Extract data from any website with thousands of scrapers, crawlers, and automations on Apify Store.",
"long_description": "Apify is the world's largest marketplace of tools for web scraping, crawling, data extraction, and web automation. Get structured data from social media, e-commerce, search engines, maps, travel sites, or any other website.",
"keywords": [
Expand Down
4 changes: 2 additions & 2 deletions package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion package.json
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
{
"name": "@apify/actors-mcp-server",
"version": "0.9.18",
"version": "0.9.19",
"type": "module",
"description": "Apify MCP Server",
"mcpName": "com.apify/apify-mcp-server",
Expand Down
2 changes: 1 addition & 1 deletion server.json
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
"url": "https://github.com/apify/apify-mcp-server",
"source": "github"
},
"version": "0.9.18",
"version": "0.9.19",
"remotes": [
{
"type": "streamable-http",
Expand Down
5 changes: 5 additions & 0 deletions src/mcp/server.ts
Original file line number Diff line number Diff line change
Expand Up @@ -1258,6 +1258,9 @@ export class ActorsMcpServer {
taskId,
mcpSessionId,
});
// Set final statusMessage before transitioning to terminal state,
// because storeTaskResult does not accept a statusMessage parameter.
await this.taskStore.updateTaskStatus(taskId, 'working', `${getToolFullName(tool)}: completed`, mcpSessionId);
await this.taskStore.storeTaskResult(taskId, 'completed', result, mcpSessionId);
log.debug('Task completed successfully', { taskId, toolName: tool.name, mcpSessionId });

Expand All @@ -1267,6 +1270,7 @@ export class ActorsMcpServer {
const httpStatus = getHttpStatusCode(error);
if (httpStatus === HTTP_PAYMENT_REQUIRED) {
logHttpError(error, 'Payment required while calling tool (task mode)', { toolName: tool.name });
await this.taskStore.updateTaskStatus(taskId, 'working', `${getToolFullName(tool)}: payment required`, mcpSessionId);
await this.taskStore.storeTaskResult(taskId, 'completed', buildPaymentRequiredResponse(error), mcpSessionId);
finishTaskTracking(TOOL_STATUS.SOFT_FAIL, {
failure_category: FAILURE_CATEGORY.INVALID_INPUT,
Expand Down Expand Up @@ -1312,6 +1316,7 @@ export class ActorsMcpServer {
taskId,
mcpSessionId,
});
await this.taskStore.updateTaskStatus(taskId, 'working', `${getToolFullName(tool)}: failed`, mcpSessionId);
await this.taskStore.storeTaskResult(taskId, 'failed', {
content: [{
type: 'text' as const,
Expand Down
33 changes: 31 additions & 2 deletions tests/unit/evals.config.test.ts
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
import { describe, expect, it } from 'vitest';
import { afterEach, describe, expect, it } from 'vitest';

import { sanitizeEnvValue } from '../../evals/shared/config.js';
import { sanitizeEnvValue, sanitizeProcessEnv } from '../../evals/shared/config.js';

describe('sanitizeEnvValue', () => {
it('returns undefined for undefined', () => {
Expand Down Expand Up @@ -47,8 +47,37 @@ describe('sanitizeEnvValue', () => {
expect(sanitizeEnvValue('')).toBe('');
});

it('strips control characters', () => {
expect(sanitizeEnvValue('sk-abc\x00123')).toBe('sk-abc123'); // null byte
expect(sanitizeEnvValue('sk-abc\x01123')).toBe('sk-abc123'); // SOH
expect(sanitizeEnvValue('sk-abc\x0b123')).toBe('sk-abc123'); // vertical tab
expect(sanitizeEnvValue('sk-abc\x0c123')).toBe('sk-abc123'); // form feed
expect(sanitizeEnvValue('sk-abc\x1f123')).toBe('sk-abc123'); // unit separator
expect(sanitizeEnvValue('sk-abc\x7f123')).toBe('sk-abc123'); // DEL
});

it('is idempotent', () => {
const value = ' "sk-abc123"\r\n';
expect(sanitizeEnvValue(sanitizeEnvValue(value))).toBe(sanitizeEnvValue(value));
});
});

describe('sanitizeProcessEnv', () => {
afterEach(() => {
delete process.env.PHOENIX_API_KEY;
delete process.env.OPENROUTER_API_KEY;
});

it('sanitizes env vars in-place', () => {
process.env.PHOENIX_API_KEY = 'key-with-newline\n';
process.env.OPENROUTER_API_KEY = ' "quoted-key"\r\n';
sanitizeProcessEnv();
expect(process.env.PHOENIX_API_KEY).toBe('key-with-newline');
expect(process.env.OPENROUTER_API_KEY).toBe('quoted-key');
});

it('leaves unset vars untouched', () => {
sanitizeProcessEnv();
expect(process.env.PHOENIX_API_KEY).toBeUndefined();
});
});
Loading