test(junior): Rework testing architecture by dcramer · Pull Request #532 · getsentry/junior

dcramer · 2026-06-05T11:45:01Z

Reworks Junior's testing architecture so behavior coverage lives in the right layer: evals for agent-facing outcomes, integration for real runtime/Slack wiring, component tests for deterministic orchestration ports, and unit tests for local invariants. This branch also thins duplicate brittle tests, removes test-only production singleton mutation patterns, and updates the testing policies so mocks and telemetry assertions stay rare and explicit.

Boundary Enforcement

The old Slack-specific checker is now the broader test:boundaries command. It runs from both Junior and eval package scripts, scans eval sources, rejects integration module mocks, and blocks observability mocks/assertions outside rare tests/unit/logging/** contract suites.
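
A minimal sketch of what such a boundary check could look like, assuming tests live under `tests/integration/` and `tests/unit/` and use Vitest's `vi.mock`. The rule names, regexes, and the "logging or tracing in the module path" heuristic are hypothetical and are not taken from the actual `test:boundaries` implementation.

```typescript
// Hypothetical sketch: rule names, regexes, and path conventions are
// illustrative assumptions, not the real test:boundaries implementation.
interface SourceFile {
  path: string;
  source: string;
}

interface Violation {
  path: string;
  rule: string;
}

// Any Vitest module mock call.
const MODULE_MOCK = /\bvi\.mock\s*\(/;
// Assumed convention: observability modules have "logging" or "tracing" in their path.
const OBSERVABILITY_MOCK =
  /\bvi\.mock\s*\(\s*["'][^"']*(logging|tracing)[^"']*["']/;

function checkBoundaries(files: SourceFile[]): Violation[] {
  const violations: Violation[] = [];
  for (const file of files) {
    const isIntegration = file.path.includes("tests/integration/");
    const isLoggingContract = file.path.includes("tests/unit/logging/");
    // Integration tests must exercise real wiring, so module mocks are rejected.
    if (isIntegration && MODULE_MOCK.test(file.source)) {
      violations.push({ path: file.path, rule: "no-integration-module-mocks" });
    }
    // Observability mocks are only allowed in the logging contract suites.
    if (!isLoggingContract && OBSERVABILITY_MOCK.test(file.source)) {
      violations.push({ path: file.path, rule: "no-observability-mocks" });
    }
  }
  return violations;
}
```

A script built on a function like this would exit non-zero when the returned list is non-empty, which lets both package scripts share one policy.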

Harness And Fixtures

Deterministic controls now use named harness ports, shared Vitest helpers, default clock helpers, memory adapters, MSW fixtures, and a shared direct-tool runtime fixture instead of module mocks or ad hoc empty runtime objects. Reply runtime overrides sit under ReplyRequestContext.harness, capability catalog injection no longer shares the production global cache across test sources, direct Slack tool contracts use typed state/context fixtures, and evals expose compact turn diagnostics instead of scraping logs, spans, or prompts.
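
To illustrate the capability catalog point, here is a rough sketch of a catalog whose cache is scoped to the instance rather than to a module-level global. The `Capability` shape, `CatalogLoader` signature, and `createCapabilityCatalog` name are assumptions made for this example and may not match Junior's code.

```typescript
// Hypothetical sketch: the catalog shape and loader signature are assumptions.
type Capability = { name: string };
type CatalogLoader = () => Capability[];

interface CapabilityCatalog {
  list(): Capability[];
}

// Each catalog owns its cache, so a catalog built for one test source never
// reads or writes entries cached by another catalog instance.
function createCapabilityCatalog(load: CatalogLoader): CapabilityCatalog {
  let cached: Capability[] | undefined;
  return {
    list() {
      if (cached === undefined) {
        cached = load();
      }
      return cached;
    },
  };
}
```

With this shape a test can build a catalog from a fixture loader without resetting or mutating any production singleton.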

Auth Regression

The cleanup uncovered a pending-auth reuse bug, which this branch fixes: MCP auth reused a direct state import, and both the MCP and plugin reuse checks depended on wall-clock time. Pending auth reuse now flows through injected services and clocks, with focused regression coverage.
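
A minimal sketch of a reuse check driven by an injected store and clock instead of a direct state import and `Date.now()`. The record shape, the ten-minute TTL, and every name here are hypothetical; the PR text does not show the real signatures.

```typescript
// Hypothetical sketch: the record shape, TTL, and function names are assumptions.
interface Clock {
  now(): number; // epoch milliseconds
}

interface PendingAuth {
  provider: string;
  createdAtMs: number;
}

interface PendingAuthStore {
  get(userId: string, provider: string): PendingAuth | undefined;
}

// Assumed lifetime of a pending auth request.
const PENDING_AUTH_TTL_MS = 10 * 60 * 1000;

function findReusablePendingAuth(
  store: PendingAuthStore,
  clock: Clock,
  userId: string,
  provider: string,
): PendingAuth | undefined {
  const pending = store.get(userId, provider);
  if (!pending) return undefined;
  // Age is computed from the injected clock, so a test can pin "now".
  const ageMs = clock.now() - pending.createdAtMs;
  return ageMs >= 0 && ageMs < PENDING_AUTH_TTL_MS ? pending : undefined;
}
```

Because neither the store nor the clock is read from module state, a regression test can cover the expiry edge deterministically.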

Local Junior typecheck and the full package test command pass. The eval harness unit tests pass; live eval execution is still blocked in this worktree by missing Vercel/Gateway project configuration.

vercel · 2026-06-05T11:45:04Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
junior-docs	Ready	Preview, Comment	Jun 13, 2026 7:01pm

Replace repeated any-cast Slack message stubs with a small Message fixture. This keeps the unit suite focused on thread context normalization while exercising the real Chat SDK message shape. Co-Authored-By: GPT-5 Codex <codex@openai.com>
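
As an illustration of the fixture pattern, the sketch below builds a fully typed message with overridable defaults. The `Message` interface is a stand-in with assumed fields; the real suite would import the type from the Chat SDK, and `createMessage` is a hypothetical name.

```typescript
// Stand-in type: the real suite would import Message from the Chat SDK.
// These fields are assumptions for illustration only.
interface Message {
  id: string;
  text: string;
  author: { userId: string; isBot: boolean };
  threadId: string;
}

// Returns a complete Message so tests override only the fields they care
// about, with no `as any` casts at the call site.
function createMessage(overrides: Partial<Message> = {}): Message {
  return {
    id: "msg-1",
    text: "hello",
    author: { userId: "U123", isBot: false },
    threadId: "thread-1",
    ...overrides,
  };
}
```

A test would then write something like `createMessage({ text: "deploy status?" })`, and the compiler flags the fixture if the message type changes.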

Add a focused routing eval for deterministic one-step transforms. The eval asserts turn diagnostics directly so thinking-level routing is checked as behavior rather than incidental rubric prose. Co-Authored-By: GPT-5 Codex <codex@openai.com>

Move the host loadSkill cases into the canonical tool suite and delete the misplaced skills test file. Keep the same coverage while removing result any-casts, cleaning up temporary skill directories, and avoiding real skill discovery in the unknown-skill unit case. Co-Authored-By: GPT-5 Codex <codex@openai.com>

Centralize the fake MCP manager and tool result builders in the callMcpTool unit suite. Keep the invalid payload coverage while containing the unsafe call path in one helper. Co-Authored-By: GPT-5 Codex <codex@openai.com>

Centralize webSearch execution and AI SDK result fixtures so the suite keeps the same Gateway adapter coverage with fewer casts and less repeated setup. Restore the patched AbortController in a finally block for better isolation. Co-Authored-By: GPT-5 Codex <codex@openai.com>
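
The restore-in-finally pattern mentioned above can be sketched as follows. `withPatchedGlobal` is a hypothetical helper name, not something taken from the suite, and this version only covers synchronous callbacks.

```typescript
// Hypothetical helper; the name and signature are assumptions for illustration.
function withPatchedGlobal<T>(
  name: string,
  replacement: unknown,
  run: () => T,
): T {
  const globals = globalThis as unknown as Record<string, unknown>;
  const original = globals[name];
  globals[name] = replacement;
  try {
    return run();
  } finally {
    // Runs even when run() throws, so later tests see the original global.
    globals[name] = original;
  }
}
```

For an asynchronous test body, the equivalent would make the helper `async` and `await run()` inside the `try`, so the restore happens only after the awaited work settles.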

Use typed completeText and fetch fixtures in the imageGenerate unit suite and centralize tool execution. This keeps the same adapter coverage while removing repeated execute casts and broad dependency casts. Co-Authored-By: GPT-5 Codex <codex@openai.com>

Keep rebased resume and reporting tests aligned with the testing policy. Drop stale telemetry footer assertions and preserve the focused runtime test seams. Co-Authored-By: GPT-5 Codex <codex@openai.com>

Keep the coverage test script aligned with the consolidated test boundary policy command. Co-Authored-By: GPT-5 Codex <codex@openai.com>

Use real plugin registry, memory state, env stubs, fake timers, and temp files for tests that previously relied on production dependency wrappers. Keep explicit fakes at real external boundaries such as Vercel Sandbox, Slack delivery, OAuth launch, model completion, and HTTP fetch. Update testing policy docs to reject production dependency parameters for fs, env, time, logging, spans, and local helpers. This keeps behavior paths wired through real adapters by default. Co-Authored-By: GPT-5 Codex <codex@openai.com>

Remove low-signal prompt-shape and persistence-failure cases from runtime component tests. Keep auth, yield, and timeout contracts covered through real state and adapter boundaries, and make the snapshot lock wait test use fake timers. Co-Authored-By: GPT-5 Codex <codex@openai.com>

Remove duplicate sandbox and Slack image tests that asserted private implementation details or call-count-only behavior. Normalize dashboard reporting tests onto the shared Vitest clock helper. Co-Authored-By: GPT-5 Codex <codex@openai.com>

Remove low-signal sandbox assertions, private prompt-wrapper checks, and duplicated Slack test helpers. Keep coverage focused on public behavior while sharing small fixture utilities across Slack integration tests. Co-Authored-By: GPT-5 Codex <codex@openai.com>

Replace nested runtime service override bags with role-named adapter controls. Remove the Slack runtime clock dependency and use the shared fake clock helper in tests. Document when to use module-owned adapter selection versus explicit runtime scenario adapters so the test seam stays narrow. Co-Authored-By: GPT-5 Codex <codex@openai.com>

Remove dependency injection that only existed to steer local helpers in tests. Keep production code on direct filesystem, skill loading, and turn-session state helpers where those are not real adapter boundaries. Update the affected tests to use temp app/plugin files, memory state, and env fixtures so coverage exercises the production paths more honestly. Co-Authored-By: GPT-5 Codex <codex@openai.com>

Add an eval for image attachments when vision is unavailable so the model must acknowledge the image without inventing contents. Remove the remaining webFetch local-helper injection seam and its call-choreography unit test. Keep image generation adapters limited to the external model and fetch boundaries. Co-Authored-By: GPT-5 Codex <codex@openai.com>

Update the eval behavior harness to use the flat Slack runtime adapter API so eval fixtures keep replacing only named scenario boundaries. Remove the broad runtime-factory override from harness unit tests and route those tests through the real Slack runtime with deterministic reply fixtures. Add the eval package typecheck to the normal root typecheck path so harness contract drift is caught before evals run. Co-Authored-By: GPT-5 Codex <codex@openai.com>

Add a default Slack destination in the shared test runtime fixture so behavior tests keep using real runtime wiring after the destination contract from main. Remove stale generic tool-context channel capability overrides and update the subscribed-message retry test to use runtime adapter overrides. Co-Authored-By: GPT-5 Codex <codex@openai.com>

Update heartbeat resume recovery tests to include the runtime destination now required for timeout resume scheduling. Adjust the scheduler heartbeat blocked-run case to exercise invalid credential routing, since scheduler storage now rejects malformed destinations before heartbeat can process them. Co-Authored-By: GPT-5 Codex <codex@openai.com>

Replace prototype-style slash and JuniorChat ingress unit tests with signed Slack slash-command integration coverage. Add deterministic webFetch integration coverage for page extraction, image delivery, and HTTP client failures. Co-Authored-By: GPT-5 Codex <codex@openai.com>

Preserve the mainline conversation-work and reporting changes while keeping the test cleanup branch focused on reliable boundaries. Prune stale split tests, move auth orchestration coverage to component tests, and keep shared fixtures aligned with the runtime contracts. Fix timeout continuation retries to use the timeout-resume reason while accepting legacy continuation errors during the cutover. Co-Authored-By: GPT-5 Codex <codex@openai.com>

Add the ai and zod peer dependencies used by chat so the eval harness resolves the same chat type instance as Junior runtime fixtures. This keeps the rebased eval typecheck green without changing test behavior. Co-Authored-By: GPT-5 Codex <codex@openai.com>

Narrow runtime test adapters to role-named scenario seams and group eval harness overrides by contract area. Move shared fixtures into feature folders, split the broad respond helper module, and update testing policy/enforcement so raw Slack captures and legacy flat eval override keys do not drift back in. Co-Authored-By: GPT-5 Codex <noreply@openai.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit 74c8bd1.

Align the eval package lockfile entry with the root ai override so pnpm frozen install succeeds in CI. Set shared Vitest timeouts for the coverage-heavy Junior suite and reserve explicit timeouts for known long-running build checks. Co-Authored-By: GPT-5 Codex <codex@openai.com>

Remove stale per-test timeout overrides that are now covered by the shared Junior Vitest timeout budget. Keep local overrides reserved for known slow external or build boundaries. Co-Authored-By: GPT-5 Codex <codex@openai.com>

vercel Bot deployed to Preview – junior-docs June 5, 2026 11:45 View deployment

cursor Bot reviewed Jun 5, 2026

Comment thread packages/junior/src/chat/services/mcp-auth-orchestration.ts Outdated

vercel Bot deployed to Preview – junior-docs June 5, 2026 12:54 View deployment

vercel Bot deployed to Preview – junior-docs June 5, 2026 13:39 View deployment

vercel Bot deployed to Preview – junior-docs June 5, 2026 14:26 View deployment

vercel Bot deployed to Preview – junior-docs June 5, 2026 15:05 View deployment

vercel Bot deployed to Preview – junior-docs June 5, 2026 15:49 View deployment

vercel Bot deployed to Preview – junior-docs June 5, 2026 16:01 View deployment

vercel Bot deployed to Preview – junior-docs June 5, 2026 16:25 View deployment

vercel Bot deployed to Preview – junior-docs June 5, 2026 16:28 View deployment

vercel Bot deployed to Preview – junior-docs June 5, 2026 19:00 View deployment

vercel Bot deployed to Preview – junior-docs June 5, 2026 19:33 View deployment

vercel Bot deployed to Preview – junior-docs June 5, 2026 19:42 View deployment

vercel Bot deployed to Preview – junior-docs June 5, 2026 19:55 View deployment

vercel Bot deployed to Preview – junior-docs June 5, 2026 20:08 View deployment

cursor Bot reviewed Jun 5, 2026

Comment thread packages/junior/src/chat/mcp/oauth.ts

vercel Bot deployed to Preview – junior-docs June 5, 2026 21:16 View deployment

vercel Bot deployed to Preview – junior-docs June 5, 2026 22:15 View deployment

vercel Bot deployed to Preview – junior-docs June 5, 2026 23:11 View deployment

vercel Bot deployed to Preview – junior-docs June 5, 2026 23:33 View deployment

vercel Bot deployed to Preview – junior-docs June 6, 2026 02:52 View deployment

cursor Bot reviewed Jun 6, 2026

Comment thread packages/junior/package.json

dcramer force-pushed the codex/testing-architecture-cleanup branch from 0d12ef9 to 43a47d4 on June 6, 2026 04:03

vercel Bot deployed to Preview – junior-docs June 6, 2026 04:04 View deployment

vercel Bot deployed to Preview – junior-docs June 6, 2026 04:09 View deployment

vercel Bot deployed to Preview – junior-docs June 6, 2026 04:57 View deployment

vercel Bot deployed to Preview – junior-docs June 6, 2026 05:24 View deployment

vercel Bot deployed to Preview – junior-docs June 6, 2026 12:53 View deployment

vercel Bot deployed to Preview – junior-docs June 6, 2026 13:02 View deployment

dcramer and others added 21 commits June 13, 2026 08:58

test(junior): Use real thread context messages

c7f4881

test(evals): Cover low thinking routing

f19de89

test(junior): Tighten MCP call tool fixtures

8f10a8f

test(junior): Reapply cleanup after rebase

3336d63

test(junior): Use renamed boundary check in coverage

a31f809

dcramer force-pushed the codex/testing-architecture-cleanup branch from 18e46e8 to 0b75c6d on June 13, 2026 16:06

vercel Bot had a problem deploying to Preview – junior-docs June 13, 2026 16:06 Failure

vercel Bot had a problem deploying to Preview – junior-docs June 13, 2026 17:21 Failure

cursor Bot reviewed Jun 13, 2026

Comment thread packages/junior/src/chat/respond.ts

vercel Bot deployed to Preview – junior-docs June 13, 2026 18:20 View deployment

test(junior): Centralize ordinary Vitest timeouts

2c24908

vercel Bot deployed to Preview – junior-docs June 13, 2026 19:01 View deployment

dcramer wants to merge 130 commits into main from codex/testing-architecture-cleanup
