test(junior): Rework testing architecture#532

Draft
dcramer wants to merge 130 commits into main from codex/testing-architecture-cleanup

@dcramer dcramer commented Jun 5, 2026

Reworks Junior's testing architecture so each kind of behavior is covered in the right layer: evals for agent-facing outcomes, integration tests for real runtime and Slack wiring, component tests for deterministic orchestration ports, and unit tests for local invariants. This branch also prunes duplicated, brittle tests, removes test-only patterns that mutated production singletons, and updates the testing policies so mocks and telemetry assertions stay rare and explicit.

Boundary Enforcement

The old Slack-specific checker has been generalized into the test:boundaries command. It runs from both the Junior and eval package scripts, scans eval sources, rejects module mocks in integration tests, and blocks observability mocks and assertions outside the few contract suites under tests/unit/logging/**.
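
A minimal sketch of the kind of rule such a boundary checker can apply. It assumes tests declare module mocks with `vi.mock("specifier")` and that observability code lives under the hypothetical specifiers `@/chat/logging` and `@/chat/observability`; the real test:boundaries rules, paths, and module names are not shown here.

```typescript
// Hypothetical boundary-checker sketch; not the actual test:boundaries code.

interface SourceFile {
  path: string;
  contents: string;
}

interface Violation {
  path: string;
  rule: string;
}

// Assumed observability module specifiers (placeholders for illustration).
const OBSERVABILITY_MODULES = ["@/chat/logging", "@/chat/observability"];

// The rare contract suites allowed to mock or assert on observability.
function isLoggingContractSuite(path: string): boolean {
  return path.startsWith("tests/unit/logging/");
}

function isIntegrationTest(path: string): boolean {
  return path.startsWith("tests/integration/");
}

// Matches vi.mock("specifier", ...) and captures the module specifier.
const VI_MOCK = /vi\.mock\(\s*["'`]([^"'`]+)["'`]/g;

function checkBoundaries(files: SourceFile[]): Violation[] {
  const violations: Violation[] = [];
  for (const file of files) {
    for (const match of file.contents.matchAll(VI_MOCK)) {
      const specifier = match[1];
      if (isIntegrationTest(file.path)) {
        // Integration tests must exercise real wiring, so any module mock is rejected.
        violations.push({ path: file.path, rule: `module mock in integration test: ${specifier}` });
      } else if (
        OBSERVABILITY_MODULES.some((m) => specifier.startsWith(m)) &&
        !isLoggingContractSuite(file.path)
      ) {
        violations.push({ path: file.path, rule: `observability mock outside logging suites: ${specifier}` });
      }
    }
  }
  return violations;
}
```

Running this over every test and eval source and failing on a non-empty result keeps the policy enforceable in CI rather than relying on review.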

Harness And Fixtures

Deterministic controls now go through named harness ports, shared Vitest helpers, default clock helpers, memory adapters, MSW fixtures, and a shared direct-tool runtime fixture, replacing module mocks and ad hoc empty runtime objects. Reply runtime overrides sit under ReplyRequestContext.harness; capability catalog injection no longer shares the production global cache across test sources; direct Slack tool contracts use typed state/context fixtures; and evals expose compact turn diagnostics instead of scraping logs, spans, or prompts.
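
A minimal sketch of a named harness port with a production default, assuming a clock port on ReplyRequestContext.harness. Only the ReplyRequestContext.harness location comes from this PR; the `clock` port, the `threadId` field, and the helpers `resolveClock` and `createFakeClock` are hypothetical.

```typescript
// Hypothetical sketch of a harness port; field and helper names are assumptions.

interface Clock {
  now(): number; // epoch milliseconds
}

interface ReplyHarness {
  clock?: Clock;
}

interface ReplyRequestContext {
  threadId: string;
  harness?: ReplyHarness; // test-only overrides live here, never in globals
}

const systemClock: Clock = { now: () => Date.now() };

// Production code resolves each port once, defaulting to the real adapter.
function resolveClock(ctx: ReplyRequestContext): Clock {
  return ctx.harness?.clock ?? systemClock;
}

// Test helper: a fixed clock that only moves when the test advances it.
function createFakeClock(startMs: number): Clock & { advance(ms: number): void } {
  let current = startMs;
  return {
    now: () => current,
    advance(ms: number) {
      current += ms;
    },
  };
}
```

Because the override travels with the request context, two tests running side by side cannot leak clock state into each other the way a mutated module singleton could.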

Auth Regression

The cleanup uncovered a pending-auth reuse bug, which this branch fixes: MCP auth reuse relied on a direct state import, and both the MCP and plugin reuse checks depended on wall-clock time. Pending-auth reuse now flows through injected services and clocks, with focused regression coverage.
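
A minimal sketch of a reuse check that takes its store and clock as parameters instead of importing state directly and reading wall time. The names `PendingAuth`, `PendingAuthStore`, `findReusablePendingAuth`, and the 10-minute `REUSE_WINDOW_MS` are all hypothetical.

```typescript
// Hypothetical pending-auth reuse check; names and the window are assumptions.

interface Clock {
  now(): number;
}

interface PendingAuth {
  provider: string;
  authorizeUrl: string;
  createdAtMs: number;
}

interface PendingAuthStore {
  get(key: string): PendingAuth | undefined;
}

const REUSE_WINDOW_MS = 10 * 60 * 1000; // assumed 10-minute reuse window

// Reuse a pending flow only while it is fresh according to the injected clock,
// so a regression test can pin "now" instead of depending on wall time.
function findReusablePendingAuth(
  store: PendingAuthStore,
  clock: Clock,
  key: string,
): PendingAuth | undefined {
  const pending = store.get(key);
  if (!pending) return undefined;
  const ageMs = clock.now() - pending.createdAtMs;
  return ageMs >= 0 && ageMs < REUSE_WINDOW_MS ? pending : undefined;
}
```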

Local Junior typecheck and the full package test command pass. The eval harness unit tests pass; live eval execution is still blocked in this worktree by missing Vercel/Gateway project configuration.

vercel Bot commented Jun 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| junior-docs | Ready | Preview, Comment | Jun 13, 2026 7:01pm |

Comment thread packages/junior/src/chat/services/mcp-auth-orchestration.ts Outdated
Comment thread packages/junior/src/chat/mcp/oauth.ts
Comment thread packages/junior/package.json
dcramer and others added 21 commits June 13, 2026 08:58
Replace repeated any-cast Slack message stubs with a small Message fixture. This keeps the unit suite focused on thread context normalization while exercising the real Chat SDK message shape.

Co-Authored-By: GPT-5 Codex <codex@openai.com>
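
A minimal sketch of a typed fixture builder that replaces `as any` stubs. The local `Message` interface only stands in for the Chat SDK type, and `makeMessage` plus its defaults are hypothetical.

```typescript
// Hypothetical fixture; the real Chat SDK Message type has more fields.
interface Message {
  id: string;
  threadId: string;
  text: string;
  author: { userId: string; isBot: boolean };
}

let nextId = 0;

// Build a complete, correctly typed Message; tests override only what they need.
function makeMessage(overrides: Partial<Message> = {}): Message {
  nextId += 1;
  return {
    id: `msg-${nextId}`,
    threadId: "thread-1",
    text: "hello",
    author: { userId: "U_TEST", isBot: false },
    ...overrides,
  };
}
```
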
Add a focused routing eval for deterministic one-step transforms. The eval asserts turn diagnostics directly so thinking-level routing is checked as behavior rather than incidental rubric prose.

Co-Authored-By: GPT-5 Codex <codex@openai.com>
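
A minimal sketch of asserting routing on turn diagnostics rather than grading reply prose. The `TurnDiagnostics` shape, the thinking-level values, and `expectRouted` are assumptions; the PR only says evals expose compact turn diagnostics.

```typescript
// Hypothetical diagnostics shape; field names and levels are assumptions.
type ThinkingLevel = "none" | "low" | "high";

interface TurnDiagnostics {
  thinkingLevel: ThinkingLevel;
  toolCalls: string[];
}

// Fail fast on routing mismatches instead of relying on a rubric to notice them.
function expectRouted(
  diagnostics: TurnDiagnostics,
  expected: { thinkingLevel: ThinkingLevel; maxToolCalls: number },
): void {
  if (diagnostics.thinkingLevel !== expected.thinkingLevel) {
    throw new Error(`expected thinking level ${expected.thinkingLevel}, got ${diagnostics.thinkingLevel}`);
  }
  if (diagnostics.toolCalls.length > expected.maxToolCalls) {
    throw new Error(`expected at most ${expected.maxToolCalls} tool calls, got ${diagnostics.toolCalls.length}`);
  }
}
```
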
Move the host loadSkill cases into the canonical tool suite and delete the misplaced skills test file. Keep the same coverage while removing result any-casts, cleaning up temporary skill directories, and avoiding real skill discovery in the unknown-skill unit case.

Co-Authored-By: GPT-5 Codex <codex@openai.com>
Centralize the fake MCP manager and tool result builders in the callMcpTool unit suite. Keep the invalid payload coverage while containing the unsafe call path in one helper.

Co-Authored-By: GPT-5 Codex <codex@openai.com>
Centralize webSearch execution and AI SDK result fixtures so the suite keeps the same Gateway adapter coverage with fewer casts and less repeated setup. Restore the patched AbortController in a finally block for better isolation.

Co-Authored-By: GPT-5 Codex <codex@openai.com>
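
A minimal sketch of restoring a patched global in a `finally` block so one test cannot leak its replacement into the next, even when the test body throws. The helper name `withPatchedAbortController` is hypothetical.

```typescript
// Hypothetical helper: swap globalThis.AbortController for one test body and
// always put the original back, whether the body resolves or rejects.
async function withPatchedAbortController<T>(
  replacement: typeof AbortController,
  body: () => Promise<T>,
): Promise<T> {
  const original = globalThis.AbortController;
  globalThis.AbortController = replacement;
  try {
    return await body();
  } finally {
    globalThis.AbortController = original;
  }
}
```
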
Use typed completeText and fetch fixtures in the imageGenerate unit suite and centralize tool execution. This keeps the same adapter coverage while removing repeated execute casts and broad dependency casts.

Co-Authored-By: GPT-5 Codex <codex@openai.com>
Keep rebased resume and reporting tests aligned with the testing policy.

Drop stale telemetry footer assertions and preserve the focused runtime test seams.

Co-Authored-By: GPT-5 Codex <codex@openai.com>
Keep the coverage test script aligned with the consolidated test boundary policy command.

Co-Authored-By: GPT-5 Codex <codex@openai.com>
Use real plugin registry, memory state, env stubs, fake timers, and temp files for tests that previously relied on production dependency wrappers. Keep explicit fakes at real external boundaries such as Vercel Sandbox, Slack delivery, OAuth launch, model completion, and HTTP fetch.

Update testing policy docs to reject production dependency parameters for fs, env, time, logging, spans, and local helpers. This keeps behavior paths wired through real adapters by default.

Co-Authored-By: GPT-5 Codex <codex@openai.com>
Remove low-signal prompt-shape and persistence-failure cases from runtime component tests. Keep auth, yield, and timeout contracts covered through real state and adapter boundaries, and make the snapshot lock wait test use fake timers.

Co-Authored-By: GPT-5 Codex <codex@openai.com>
Remove duplicate sandbox and Slack image tests that asserted private implementation details or call-count-only behavior. Normalize dashboard reporting tests onto the shared Vitest clock helper.

Co-Authored-By: GPT-5 Codex <codex@openai.com>
Remove low-signal sandbox assertions, private prompt-wrapper checks, and duplicated Slack test helpers. Keep coverage focused on public behavior while sharing small fixture utilities across Slack integration tests.

Co-Authored-By: GPT-5 Codex <codex@openai.com>
Replace nested runtime service override bags with role-named adapter controls. Remove the Slack runtime clock dependency and use the shared fake clock helper in tests.

Document when to use module-owned adapter selection versus explicit runtime scenario adapters so the test seam stays narrow.

Co-Authored-By: GPT-5 Codex <codex@openai.com>
Remove dependency injection that only existed to steer local helpers in tests. Keep production code on direct filesystem, skill loading, and turn-session state helpers where those are not real adapter boundaries.

Update the affected tests to use temp app/plugin files, memory state, and env fixtures so coverage exercises the production paths more faithfully.

Co-Authored-By: GPT-5 Codex <codex@openai.com>
Add an eval for image attachments when vision is unavailable so the model must acknowledge the image without inventing contents.

Remove the remaining webFetch local-helper injection seam and its call-choreography unit test. Keep image generation adapters limited to the external model and fetch boundaries.

Co-Authored-By: GPT-5 Codex <codex@openai.com>
Update the eval behavior harness to use the flat Slack runtime adapter API so eval fixtures keep replacing only named scenario boundaries.

Remove the broad runtime-factory override from harness unit tests and route those tests through the real Slack runtime with deterministic reply fixtures.

Add the eval package typecheck to the normal root typecheck path so harness contract drift is caught before evals run.

Co-Authored-By: GPT-5 Codex <codex@openai.com>
Add a default Slack destination to the shared test runtime fixture so behavior tests keep using real runtime wiring after picking up the destination contract from main.

Remove stale generic tool-context channel capability overrides and update the subscribed-message retry test to use runtime adapter overrides.

Co-Authored-By: GPT-5 Codex <codex@openai.com>
Update heartbeat resume recovery tests to include the runtime destination now required for timeout resume scheduling.

Adjust the scheduler heartbeat blocked-run case to exercise invalid credential routing, since scheduler storage now rejects malformed destinations before heartbeat can process them.

Co-Authored-By: GPT-5 Codex <codex@openai.com>
Replace prototype-style slash and JuniorChat ingress unit tests with signed Slack slash-command integration coverage. Add deterministic webFetch integration coverage for page extraction, image delivery, and HTTP client failures.

Co-Authored-By: GPT-5 Codex <codex@openai.com>
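
Signed slash-command integration tests need requests that pass real signature verification. Below is a minimal sketch of Slack's documented v0 signing scheme (HMAC-SHA256 over `v0:{timestamp}:{body}`, sent in the `X-Slack-Signature` and `X-Slack-Request-Timestamp` headers). The helper `buildSignedSlashCommand` is hypothetical, and the five-minute timestamp freshness check Slack recommends is omitted for brevity.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Slack v0 scheme: hex HMAC-SHA256 of "v0:{timestamp}:{raw body}", prefixed "v0=".
function signSlackRequest(signingSecret: string, timestamp: number, body: string): string {
  const base = `v0:${timestamp}:${body}`;
  return "v0=" + createHmac("sha256", signingSecret).update(base).digest("hex");
}

// Constant-time comparison; timingSafeEqual requires equal-length buffers.
function verifySlackSignature(
  signingSecret: string,
  timestamp: number,
  body: string,
  signature: string,
): boolean {
  const expected = Buffer.from(signSlackRequest(signingSecret, timestamp, body));
  const actual = Buffer.from(signature);
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}

// Hypothetical test helper: form-encoded slash-command body plus signed headers.
function buildSignedSlashCommand(
  signingSecret: string,
  params: Record<string, string>,
  nowSeconds: number,
) {
  const body = new URLSearchParams(params).toString();
  return {
    body,
    headers: {
      "content-type": "application/x-www-form-urlencoded",
      "x-slack-request-timestamp": String(nowSeconds),
      "x-slack-signature": signSlackRequest(signingSecret, nowSeconds, body),
    },
  };
}
```

Sending these requests through the real ingress route tests signature handling and command parsing together, which prototype-level unit tests could not.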
Preserve the mainline conversation-work and reporting changes while keeping the test cleanup branch focused on reliable boundaries.

Prune stale split tests, move auth orchestration coverage to component tests, and keep shared fixtures aligned with the runtime contracts.

Fix timeout continuation retries to use the timeout-resume reason while accepting legacy continuation errors during the cutover.

Co-Authored-By: GPT-5 Codex <codex@openai.com>
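
A minimal sketch of the cutover pattern: retries always record the new timeout-resume reason, while the reader still accepts errors persisted in the legacy form. The error shape and the legacy code `continuation_timeout` are hypothetical; the PR does not show the real values.

```typescript
// Hypothetical error shape and legacy code; the real identifiers are not shown.
interface ContinuationError {
  reason?: string;
  code?: string;
}

const TIMEOUT_RESUME = "timeout-resume";
const LEGACY_TIMEOUT_CODES = new Set(["continuation_timeout"]);

function isTimeoutResume(error: ContinuationError): boolean {
  if (error.reason === TIMEOUT_RESUME) return true;
  // During the cutover, older persisted errors may carry only the legacy code.
  return error.code !== undefined && LEGACY_TIMEOUT_CODES.has(error.code);
}

// New retries are always scheduled with the current reason, never the legacy one.
function retryReasonFor(error: ContinuationError): string | undefined {
  return isTimeoutResume(error) ? TIMEOUT_RESUME : undefined;
}
```

Once no legacy errors remain in storage, the legacy set can be deleted without touching the retry path.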
Add the ai and zod peer dependencies used by chat so the eval harness resolves the same chat type instance as Junior runtime fixtures. This keeps the rebased eval typecheck green without changing test behavior.

Co-Authored-By: GPT-5 Codex <codex@openai.com>
Narrow runtime test adapters to role-named scenario seams and group eval harness overrides by contract area.

Move shared fixtures into feature folders, split the broad respond helper module, and update testing policy/enforcement so raw Slack captures and legacy flat eval override keys do not drift back in.

Co-Authored-By: GPT-5 Codex <noreply@openai.com>
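
A minimal sketch of eval harness overrides grouped by contract area, with a check that rejects legacy flat keys. The area names (`slack`, `model`, `auth`), their fields, and the legacy keys are hypothetical placeholders.

```typescript
// Hypothetical grouped override shape; area and field names are assumptions.
type EvalOverrides = {
  slack?: { botUserId?: string };
  model?: { replyText?: string };
  auth?: { grantAll?: boolean };
};

// Assumed examples of legacy flat keys that should not drift back in.
const LEGACY_FLAT_KEYS = new Set(["slackBotUserId", "modelReplyText", "authGrantAll"]);
const OVERRIDE_AREAS = new Set<string>(["slack", "model", "auth"]);

function findOverrideKeyViolations(raw: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const key of Object.keys(raw)) {
    if (LEGACY_FLAT_KEYS.has(key)) {
      problems.push(`legacy flat override key: ${key}`);
    } else if (!OVERRIDE_AREAS.has(key)) {
      problems.push(`unknown override area: ${key}`);
    }
  }
  return problems;
}
```

Using a type alias rather than an interface lets a typed `EvalOverrides` value be passed where `Record<string, unknown>` is expected.
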

cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 74c8bd1.

Comment thread packages/junior/src/chat/respond.ts
Align the eval package lockfile entry with the root ai override so pnpm frozen install succeeds in CI.

Set shared Vitest timeouts for the coverage-heavy Junior suite and reserve explicit timeouts for known long-running build checks.

Co-Authored-By: GPT-5 Codex <codex@openai.com>
Remove stale per-test timeout overrides that are now covered by the shared Junior Vitest timeout budget. Keep local overrides reserved for known slow external or build boundaries.

Co-Authored-By: GPT-5 Codex <codex@openai.com>