
fix: set final statusMessage before storeTaskResult (#638)#677

Closed
jirispilka wants to merge 9 commits into master from claude/investigate-issue-638-eFPfG

Conversation

Collaborator

@jirispilka jirispilka commented Apr 15, 2026

Summary

  • storeTaskResult() does not accept a statusMessage parameter, so it leaves the previous intermediate progress message on the task after completion
  • The fix calls updateTaskStatus(taskId, 'working', '<tool>: completed|failed|payment required') immediately before each storeTaskResult() so the final message is in place before the terminal transition
  • Verified end-to-end via mcpc: tasks/list now shows "apify/rag-web-browser: completed" instead of the stale "Starting the crawler."
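
The ordering the fix enforces can be sketched with a minimal stand-in for the task store (the interfaces and function bodies below are illustrative assumptions, not the project's real implementations):

```typescript
// Minimal in-memory stand-ins for the real task store (illustrative only).
interface Task {
    status: string;
    statusMessage?: string;
    result?: unknown;
}

const tasks = new Map<string, Task>();

function updateTaskStatus(taskId: string, status: string, statusMessage: string): void {
    const task = tasks.get(taskId) ?? { status };
    tasks.set(taskId, { ...task, status, statusMessage });
}

// Note: no statusMessage parameter — whatever message is already on the
// task survives the terminal transition. This is the root cause of #638.
function storeTaskResult(taskId: string, result: unknown): void {
    const task = tasks.get(taskId);
    if (!task) throw new Error(`unknown task ${taskId}`);
    tasks.set(taskId, { ...task, status: 'completed', result });
}

// The fix: set the final message right before storing the result.
updateTaskStatus('t1', 'working', 'apify/rag-web-browser: completed');
storeTaskResult('t1', { ok: true });

console.log(tasks.get('t1')?.statusMessage); // prints "apify/rag-web-browser: completed"
```

Without the intermediate updateTaskStatus() call, the completed task would still carry whatever progress message was set last (e.g. "Starting the crawler.").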

Test plan

  • Unit tests: npm run test:unit (2 regression cases in task.statusMessage.test.ts)
  • mcpc smoke test: call apify/rag-web-browser and check that the tasks/list statusMessage shows "<tool>: completed"

🤖 Generated with Claude Code

claude and others added 9 commits April 14, 2026 07:25
storeTaskResult() has no statusMessage parameter, so the last
intermediate progress message (e.g. "Starting the crawler.") persisted
as the final statusMessage on completed/failed tasks. Call
updateTaskStatus() with a descriptive final message right before each
storeTaskResult() call.

https://claude.ai/code/session_01ABYjPsZxtCdib54jfGcx3f

* fix: strip all invalid HTTP header characters in sanitizeEnvValue

The previous regex only removed \r\n but Node.js rejects all control
characters outside [\t\x20-\x7e\x80-\xff] with ERR_INVALID_CHAR.
CI secrets can contain null bytes or other control chars that slipped
through.

https://claude.ai/code/session_01Kj7rEHNuhTG5U7NrM9RWaG
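
A minimal sketch of such a filter, assuming the allowed set is exactly the header character range named above (this is illustrative, not necessarily the project's exact regex):

```typescript
// Illustrative sketch: drop every character Node's HTTP header validation
// rejects, i.e. anything outside [\t\x20-\x7e\x80-\xff].
function sanitizeEnvValue(value: string): string {
    // eslint-disable-next-line no-control-regex
    return value.replace(/[^\t\x20-\x7e\x80-\xff]/g, '');
}

sanitizeEnvValue('abc\u0000def\r\n'); // → 'abcdef' (null byte and CRLF stripped)
sanitizeEnvValue('tab\tok');          // → 'tab\tok' (tab is allowed)
```

A blocklist of control characters (as the later commit adopts) expresses the same set more readably than this allowlist form.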

* fix: sanitize env vars in-place so third-party libs get clean values

The OTel OTLP exporter (used by @arizeai/phoenix-otel) reads
PHOENIX_API_KEY directly from process.env via getEnvApiKey(), bypassing
our sanitizeEnvValue() wrapper. It then passes the raw value to
node:http which throws ERR_INVALID_CHAR.

Add sanitizeProcessEnv() that rewrites sensitive env vars on
process.env itself, called right after dotenv.config(). This ensures
every reader — including third-party libraries — gets clean values.

https://claude.ai/code/session_01Kj7rEHNuhTG5U7NrM9RWaG

* fix: simplify regex, improve comments, add redacted env logging

- Replace cryptic allowlist regex with readable control-char blocklist
- Explain why in-place sanitization is needed (phoenix-otel reads env directly)
- Log redacted env var values during sanitizeProcessEnv() for CI debugging

https://claude.ai/code/session_01Kj7rEHNuhTG5U7NrM9RWaG

* fix: simplify tests for sanitizeEnvValue and sanitizeProcessEnv

* docs: update DEVELOPMENT.md to include LLM evals

* fix: improve value redaction logic for safer logging

---------

Co-authored-by: Claude <noreply@anthropic.com>

* ci: add post-publish smoke test for npm package

Installs the published package from npm in an isolated directory and
verifies it works using mcpc: library imports, server startup, ping,
tools-list, and tool calls (search-apify-docs, fetch-apify-docs).

Uses --tools docs mode (no APIFY_TOKEN needed). Runs after both beta
(on_master) and stable (manual_release_stable) publishes.

https://claude.ai/code/session_01W3QpQDWu44fBWmKsd7rA5Q

* ci: add temporary debug workflow for smoke test (remove before merge)

https://claude.ai/code/session_01W3QpQDWu44fBWmKsd7rA5Q

* ci: clarify file deletion reminder

https://claude.ai/code/session_01W3QpQDWu44fBWmKsd7rA5Q

* ci: simplify smoke test workflow

https://claude.ai/code/session_01W3QpQDWu44fBWmKsd7rA5Q

* ci: enhance smoke test workflow to support exact version installation

* ci: update smoke test to use version 0.9.17

* ci: update retry action to version 4

* ci: reduce timeout and max attempts for package installation; add additional fetch commands

---------

Co-authored-by: Claude <noreply@anthropic.com>
#671)

* fix: Use bin entry point in smoke test instead of direct JS path

The smoke test was running `node node_modules/.../dist/stdio.js` directly,
bypassing the package's bin entry point. Switch to `npx actors-mcp-server`
so we actually test the published package as users would use it.

Includes a temporary workflow_dispatch copy for testing in PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: trigger test workflow on push (delete later)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Use environment variable for package version in smoke test workflow

* chore: remove temp smoke test

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The third test asserted that a stale intermediate statusMessage is preserved
through storeTaskResult — which tests InMemoryTaskStore internals, not the
bug. The two remaining cases are sufficient regression coverage.
Comment on lines 129 to 131
bump_dependency_in_internal_repo:
    name: Bump dependency in apify-mcp-server-internal
    needs: [ release_metadata, publish_to_npm ]

🔴 The new smoke_test job was added to gate releases, but bump_dependency_in_internal_repo still has needs: [release_metadata, publish_to_npm], so it runs in parallel with the smoke test rather than after it. If the smoke test fails (broken npm package), the internal-repo dependency bump has already started and will create a PR pinning the broken version. Adding smoke_test to the needs list of bump_dependency_in_internal_repo fixes this.

Extended reasoning...

What the bug is and how it manifests

The PR introduces a new smoke_test job (lines 122-127 in manual_release_stable.yaml) that installs the just-published npm package and runs a series of mcpc commands to verify it is functional. The intent is that this acts as a quality gate before the release is propagated to the internal repository. However, bump_dependency_in_internal_repo (line 129) has the same needs: [release_metadata, publish_to_npm] as smoke_test, so GitHub Actions schedules both jobs to run concurrently as soon as publish_to_npm finishes.

The specific code path that triggers it

In .github/workflows/manual_release_stable.yaml, after publish_to_npm completes, GitHub Actions fans out to any job whose needs list is fully satisfied. Both smoke_test and bump_dependency_in_internal_repo only require [release_metadata, publish_to_npm], so they both start simultaneously. If the smoke test catches a broken release and fails, the bump job is already mid-execution: checking out the internal repo, running npm install, and opening a PR.

Why existing code does not prevent it

There is no needs: smoke_test dependency on bump_dependency_in_internal_repo. GitHub Actions job ordering is determined solely by the needs graph; there is no implicit sequencing. The smoke_test job cannot cancel a sibling job that is already running.

What the impact would be

If a broken version is caught by the smoke test, bump_dependency_in_internal_repo will have already opened a pull request in apify/apify-mcp-server-internal pinning the broken version. Although this is only a PR (not an auto-merge), it creates noise, risks accidental merges by reviewers who trust CI to gate quality, and defeats the entire purpose of adding the smoke test quality gate.

Step-by-step proof

  1. Developer triggers manual_release_stable.yaml with release type patch.
  2. release_metadata runs and outputs version_number = 0.9.19.
  3. publish_to_npm runs and publishes a broken 0.9.19 to npm (e.g., startup crash).
  4. GitHub Actions marks publish_to_npm as succeeded (the publish itself worked).
  5. Both smoke_test AND bump_dependency_in_internal_repo become runnable and start in parallel.
  6. bump_dependency_in_internal_repo checks out apify-mcp-server-internal, installs @apify/actors-mcp-server@0.9.19, and opens a PR titled "chore: bump @apify/actors-mcp-server to 0.9.19".
  7. smoke_test installs 0.9.19, runs npx mcpc against it, gets a crash, and fails — but the PR in the internal repo is already open.

How to fix it

Change the needs list of bump_dependency_in_internal_repo to include smoke_test:

bump_dependency_in_internal_repo:
    name: Bump dependency in apify-mcp-server-internal
    needs: [ release_metadata, publish_to_npm, smoke_test ]

This ensures the bump only runs after the smoke test has confirmed the published package is functional. Note that on_master.yaml correctly adds smoke_test as a leaf job with no downstream dependents, so that workflow does not have this issue.

Comment on lines +31 to +35
with:
    timeout_minutes: 5
    max_attempts: 3
    retry_wait_seconds: 30
    command: |

🟡 The smoke test installs @apify/mcpc without a version pin (npm install "@apify/actors-mcp-server@${PKG_VERSION}" @apify/mcpc), so it always installs the latest published version. If @apify/mcpc releases a breaking change (renamed CLI flags, changed output format), the smoke test would fail even when actors-mcp-server itself is perfectly healthy, creating false CI failures that could block releases. Pinning to a known-good version (e.g. @apify/mcpc@0.2.0 or @apify/mcpc@^0.2.0) would make the test deterministic.

Extended reasoning...

What the bug is and how it manifests

In .github/workflows/_smoke_test_npm_package.yaml (line 35), the install command is:

npm install "@apify/actors-mcp-server@${PKG_VERSION}" @apify/mcpc

@apify/mcpc has no version constraint, so npm always resolves and installs the latest published version. This means the smoke-test environment is non-deterministic: different CI runs can silently pull in different versions of the test harness.

The specific code path that triggers it

The smoke test creates a fresh smoke-test/ directory, runs npm init -y, and installs both packages from scratch. Because @apify/mcpc is not pinned (unlike the main package under test, which is pinned to ${PKG_VERSION}), any new release of @apify/mcpc published between two CI runs will be picked up automatically. The subsequent smoke commands—npx mcpc connect, npx mcpc @dev ping, npx mcpc --json @dev tools-list, etc.—all depend on the CLI API of whichever version was installed.

Why existing code does not prevent it

The project's own package.json pins @apify/mcpc at ^0.2.0 in devDependencies, demonstrating that the team already values version stability for this tool. However, the smoke-test workflow bypasses package.json entirely: it sets up a standalone directory with its own package.json (via npm init -y) and installs packages directly. The retry mechanism (max_attempts: 3) mitigates transient npm registry failures but does nothing about determinism across versions.

What the impact would be

If @apify/mcpc 0.3.0 (hypothetically) renames the connect subcommand, changes the --json flag semantics, or alters the tools-list output format, the smoke test would fail at step 2 or 3—even though @apify/actors-mcp-server itself is perfectly healthy. This would block the beta or stable release pipeline. Because the failure comes from the test harness rather than the package under test, it could be confusing to diagnose, especially in time-sensitive release windows.

How to fix it

Pin @apify/mcpc to the same range already used in devDependencies:

npm install "@apify/actors-mcp-server@${PKG_VERSION}" "@apify/mcpc@^0.2.0"

Or use an exact version for maximum reproducibility:

npm install "@apify/actors-mcp-server@${PKG_VERSION}" "@apify/mcpc@0.2.0"

The refutation argument—that @apify/mcpc is an Apify-owned package and won't introduce uncoordinated breaking changes—is reasonable and lowers the practical risk. However, the same logic would argue against pinning anything, which contradicts the team's existing practice. Even for internal tools, a pinned test harness version is a best-practice that prevents CI surprise failures and makes version-upgrade decisions explicit and intentional.

Comment thread: evals/shared/config.ts
Comment on lines +50 to +60
/**
 * Redact a value for safe logging: shows first 4 and last 4 chars, masks the rest.
 * Fully masks short values (≤ 8 chars) to prevent reconstruction from the log line.
 * Returns '(empty)' for empty strings, '(unset)' for undefined/null.
 */
function redact(value?: string | null): string {
    if (value == null) return '(unset)';
    if (value.length === 0) return '(empty)';
    if (value.length <= 6) return `*** (${value.length} chars)`;
    return `${value.slice(0, 3)}***${value.slice(-3)} (${value.length} chars)`;
}

🟡 The JSDoc for the new redact() function claims it "shows first 4 and last 4 chars" and "Fully masks short values (≤ 8 chars)", but the implementation uses slice(0,3)/slice(-3) (3 chars each side) and masks values with length <= 6. For a security-adjacent logging helper, the docs should accurately reflect how much of a secret is revealed.

Extended reasoning...

What the bug is: The JSDoc comment for the newly-introduced redact() function (evals/shared/config.ts, lines 50–60) contains two factual errors relative to the implementation. (1) The comment says "shows first 4 and last 4 chars" but the code executes value.slice(0, 3) and value.slice(-3), exposing only 3 characters from each end. (2) The comment says "Fully masks short values (≤ 8 chars)" but the guard is value.length <= 6, not <= 8.

The specific code path: The function is called inside sanitizeProcessEnv() to produce redacted log lines such as env PHOENIX_API_KEY: abc***xyz (20 chars). Reviewers reading only the JSDoc would believe secrets shorter than 9 characters are fully masked, and that exactly 4 characters are revealed on each side—neither is accurate.

Why existing code doesn't prevent it: There are no automated tests that assert the format of the redacted output (only that sanitizeProcessEnv modifies process.env correctly). The mismatch slipped through because the function is private and the JSDoc was simply never updated to match the final implementation.

Impact: No functional impact—the actual redaction logic is self-consistent and secrets are still protected. However, the incorrect documentation could mislead a security reviewer auditing information-leakage guarantees of the CI log output. For example: a 7-character API key fragment would expose 6 of its 7 characters (3+3), whereas the JSDoc implies it would be fully masked (≤ 8 chars threshold).

Step-by-step proof: Take an 8-character value like sk-12345 (length = 8). According to the JSDoc it should be fully masked ("≤ 8 chars"). The code checks value.length <= 6—8 > 6, so it falls through to the reveal path and returns sk-***345 (8 chars), exposing 6 of 8 characters. Similarly, for a 10-character value the JSDoc implies 4+4=8 characters are visible, but the code reveals 3+3=6 characters.

How to fix: Correct the JSDoc to match the implementation: change "first 4 and last 4 chars" to "first 3 and last 3 chars" and change "≤ 8 chars" to "≤ 6 chars".
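
The mismatch can be confirmed by running the function exactly as committed (JSDoc omitted here; the return values below follow directly from the code):

```typescript
// redact() as committed in evals/shared/config.ts — the JSDoc (not shown)
// claims 4+4 revealed chars and a ≤ 8 full-mask threshold, but the code
// reveals 3+3 and masks only at length ≤ 6.
function redact(value?: string | null): string {
    if (value == null) return '(unset)';
    if (value.length === 0) return '(empty)';
    if (value.length <= 6) return `*** (${value.length} chars)`;
    return `${value.slice(0, 3)}***${value.slice(-3)} (${value.length} chars)`;
}

redact('sk-12345'); // → 'sk-***345 (8 chars)' — not fully masked despite the "≤ 8 chars" claim
redact('abc123');   // → '*** (6 chars)' — full mask only at length ≤ 6
redact('');         // → '(empty)'
redact(null);       // → '(unset)'
```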

@jirispilka
Collaborator Author

Closing in favor of #680 — same fix on a clean branch rebased on current master (the old branch had 7 stale commits from an older master state that muddled the diff).

@jirispilka jirispilka closed this Apr 15, 2026
@jirispilka jirispilka deleted the claude/investigate-issue-638-eFPfG branch April 15, 2026 14:32