base-deep => gpt-5.4

jahooma · jahooma · commit 9c65ed1cbb15 · 2026-03-06T17:21:09.000-08:00
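The change below is a consistent rename of two agent ids and one model string across prompts, agent definitions, and the spawnable-agents list. As a rough illustration only, the substitution can be sketched as a lookup table. The `RENAMES` table and the `renameAgentRefs` helper are hypothetical names introduced here and are not part of the repository.

```typescript
// Hypothetical sketch, not repository code: the old -> new names that
// this commit swaps in prompts and agent definitions.
const RENAMES: Record<string, string> = {
  'thinker-codex': 'thinker-gpt',
  'code-reviewer-codex': 'code-reviewer-gpt',
  'openai/gpt-5.3-codex': 'openai/gpt-5.4',
}

// Replace every occurrence of each old name in a piece of text.
function renameAgentRefs(text: string): string {
  let result = text
  for (const [oldName, newName] of Object.entries(RENAMES)) {
    result = result.split(oldName).join(newName)
  }
  return result
}
```

The sketch does not cover the `displayName` change, and `gpt-5-agent` is left alone, as in the diff. The e2e test keeps its local `thinkerCodex` import binding and only the import path changes.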
diff --git a/agents/base2/base-deep.ts b/agents/base2/base-deep.ts
@@ -29,7 +29,7 @@ Use the spawn_agents tool to spawn specialized agents to help you complete the u
 - **Spawn multiple agents in parallel:** This increases the speed of your response **and** allows you to be more comprehensive by spawning more total agents to synthesize the best response.
 - **Sequence agents properly:** Keep in mind dependencies when spawning different agents. Don't spawn agents in parallel that depend on each other.
   - Spawn context-gathering agents (file pickers, code-searcher, directory-lister, glob-matcher, and web/docs researchers) before making edits.
-  - Spawn the thinker-codex after gathering context to solve complex problems or when the user asks you to think about a problem. (gpt-5-agent is a last resort for complex problems)
+  - Spawn the thinker-gpt after gathering context to solve complex problems or when the user asks you to think about a problem. (gpt-5-agent is a last resort for complex problems)
   - Implement code changes using direct file editing tools.
   - Prefer apply_patch for existing-file edits. Use write_file only for creating or replacing entire files when that is simpler.
   - Spawn commanders sequentially if the second command depends on the first.
@@ -62,13 +62,13 @@ For other questions, you can direct them to codebuff.com, or especially codebuff
 
 [ Phase 1 — Codebase Context & Research: You spawn file-pickers, code-searchers, and researchers (web/docs) in parallel to find relevant files and research external libraries/APIs, then read the results to build understanding ]
 
-[ Phase 2 — Spec: You draft an initial SPEC.md, then use ask_user iteratively to refine it, then run thinker-codex critique loop until clean ]
+[ Phase 2 — Spec: You draft an initial SPEC.md, then use ask_user iteratively to refine it, then run thinker-gpt critique loop until clean ]
 
-[ Phase 3 — Plan: You write a detailed PLAN.md with all implementation steps, run thinker-codex critique loop, then write implementation todos ]
+[ Phase 3 — Plan: You write a detailed PLAN.md with all implementation steps, run thinker-gpt critique loop, then write implementation todos ]
 
 [ Phase 4 — Implement: You fully implement the spec using direct file editing tools ]
 
-[ Phase 5 — Review Loop: You spawn code-reviewer-codex, fix any issues found, and re-run the reviewer until no new issues are found ]
+[ Phase 5 — Review Loop: You spawn code-reviewer-gpt, fix any issues found, and re-run the reviewer until no new issues are found ]
 
 [ Phase 6 — Validate: You run unit tests, add new tests, fix failures, and attempt E2E verification by running the application ]${noLearning ? '' : `
 
@@ -156,9 +156,9 @@ Draft a spec first, then refine it with the user:
 5. **Do NOT ask obvious questions.** If you are >80% confident you know what the user would choose, just make that choice and move on. Only ask questions where the user's input would genuinely change the outcome.
 6. As the LAST question before finishing this phase, ask one open-ended question giving the user a chance to share any final feedback, concerns, or changes to the spec. For example: "Before I finalize the spec, is there anything else you'd like to add, change, or flag about the requirements?"`}
 ${noAskUser ? '3' : '7'}. Iteratively critique the spec:
-   a. Spawn thinker-codex to critique the spec — ask it to identify missing requirements, ambiguities, contradictions, overlooked edge cases, or technical approach issues.
+   a. Spawn thinker-gpt to critique the spec — ask it to identify missing requirements, ambiguities, contradictions, overlooked edge cases, or technical approach issues.
    b. If the thinker raises valid critiques, update SPEC.md to address them.
-   c. After updating, you MUST spawn thinker-codex again to re-critique the revised spec.
+   c. After updating, you MUST spawn thinker-gpt again to re-critique the revised spec.
    d. Repeat until the thinker finds no new substantive critiques. Do NOT skip the re-critique — every revision must be verified.
 ${noAskUser ? '4' : '8'}. Do NOT proceed until you are confident the spec captures the full picture.
 
@@ -171,17 +171,17 @@ Create a detailed implementation plan, iteratively critique it, and save it alon
    - **Dependencies / Ordering**: Note which steps depend on others and the recommended order of implementation.
    - **Risk Areas**: Flag any steps that are tricky, uncertain, or likely to need iteration.
 2. Iteratively critique the plan:
-   a. Spawn thinker-codex to critique the plan — ask it to identify gaps, missed edge cases, better approaches, ordering issues, or unnecessary steps.
+   a. Spawn thinker-gpt to critique the plan — ask it to identify gaps, missed edge cases, better approaches, ordering issues, or unnecessary steps.
    b. If the thinker raises valid critiques, update PLAN.md to address them.
-   c. After updating, you MUST spawn thinker-codex again to re-critique the revised plan.
+   c. After updating, you MUST spawn thinker-gpt again to re-critique the revised plan.
    d. Repeat until the thinker finds no new substantive critiques. Do NOT skip the re-critique — every revision must be verified.
 3. Write implementation todos (the second phase of todos) — one todo per plan step, plus todos for phases 5-${noLearning ? '6' : '7'}.
 
 ## Phase 4 — Implement
 
 Fully implement the spec:
 
-1. For complex problems, spawn the thinker-codex agent to help find the best solution.
+1. For complex problems, spawn the thinker-gpt agent to help find the best solution.
 2. Implement all changes using direct file editing tools. Prefer apply_patch for edits.
 3. Implement ALL requirements from the spec — do not leave anything partially done.
 4. Narrate what you are doing as you go.
@@ -190,9 +190,9 @@ Fully implement the spec:
 
 Iteratively review until the code is clean:
 
-1. Spawn code-reviewer-codex to review all changes.
+1. Spawn code-reviewer-gpt to review all changes.
 2. If the reviewer finds ANY issues, fix them.
-3. After fixing, you MUST spawn code-reviewer-codex again to re-review.
+3. After fixing, you MUST spawn code-reviewer-gpt again to re-review.
 4. Repeat steps 1-3 until the reviewer finds no new issues. Do NOT skip the re-review — every fix must be verified.
 
 ## Phase 6 — Validate
@@ -234,9 +234,9 @@ Capture learnings for future sessions:
      - Reference the specific session directory where each piece of knowledge was learned (e.g. "(from .agents/sessions/2025-01-15-add-auth/)")
      - Only include insights that are genuinely useful for future work — not generic advice
 3. Iteratively improve lessons and skills:
-   a. Spawn thinker-codex to critique your LESSONS.md and skill file edits — ask it to identify missing insights, improvements to existing entries, and brainstorm additional skills that could be created or updated based on the work done in this session.
+   a. Spawn thinker-gpt to critique your LESSONS.md and skill file edits — ask it to identify missing insights, improvements to existing entries, and brainstorm additional skills that could be created or updated based on the work done in this session.
    b. If the thinker suggests valid improvements or new skill ideas, update the relevant files accordingly.
-   c. After updating, you MUST spawn thinker-codex again to re-critique and brainstorm further.
+   c. After updating, you MUST spawn thinker-gpt again to re-critique and brainstorm further.
    d. Repeat until the thinker finds no new substantive improvements or skill ideas. Do NOT skip the re-critique — every revision must be verified.`}${noAskUser ? '' : `
 ${noLearning ? '1' : '4'}. Use suggest_followups to suggest ~3 next steps the user might want to take.`}
 
@@ -255,8 +255,8 @@ export function createBaseDeep(options?: {
   const { noAskUser = false, noLearning = false } = options ?? {}
   return {
     publisher,
-    model: 'openai/gpt-5.3-codex',
-    displayName: 'Buffy the Codex Orchestrator',
+    model: 'openai/gpt-5.4',
+    displayName: 'Buffy the GPT Orchestrator',
     spawnerPrompt:
       'Advanced base agent that orchestrates planning, editing, and reviewing for complex coding tasks',
     inputSchema: {
@@ -296,8 +296,8 @@ export function createBaseDeep(options?: {
       'researcher-web',
       'researcher-docs',
       'commander',
-      'thinker-codex',
-      'code-reviewer-codex',
+      'thinker-gpt',
+      'code-reviewer-gpt',
       'gpt-5-agent',
       'context-pruner',
     ],
@@ -307,14 +307,14 @@ export function createBaseDeep(options?: {
 
 **Planning todos** (write at start): Phase 1 → Phase 2 → Phase 3
 1. Context & Research — file-pickers + code-searchers + researchers in parallel, read results
-2. Spec — draft SPEC.md, ${noAskUser ? '' : 'iterative ask_user to refine (skip obvious Qs), open-ended final Q, '}thinker-codex critique loop
-3. Plan — write PLAN.md, thinker-codex critique loop
+2. Spec — draft SPEC.md, ${noAskUser ? '' : 'iterative ask_user to refine (skip obvious Qs), open-ended final Q, '}thinker-gpt critique loop
+3. Plan — write PLAN.md, thinker-gpt critique loop
 
 **Implementation todos** (write after Plan): one todo per plan step + phases 5-${noLearning ? '6' : '7'}
 4. Implement — fully build the spec using file editing tools
-5. Review Loop — code-reviewer-codex → fix → re-review until clean
+5. Review Loop — code-reviewer-gpt → fix → re-review until clean
 6. Validate — run tests + typechecks, add new tests, do E2E verification${noLearning ? '' : `
-7. Lessons — write LESSONS.md, update/create skills, iterative thinker-codex brainstorm loop`}`,
+7. Lessons — write LESSONS.md, update/create skills, iterative thinker-gpt brainstorm loop`}`,
     handleSteps: function* ({ params }) {
       while (true) {
         // Run context-pruner before each step.
diff --git a/agents/e2e/base-deep.e2e.test.ts b/agents/e2e/base-deep.e2e.test.ts
@@ -9,7 +9,7 @@ import { beforeAll, describe, expect, it } from 'bun:test'
 import { $ } from 'bun'
 
 import baseDeep from '../base2/base-deep'
-import thinkerCodex from '../thinker/thinker-codex'
+import thinkerCodex from '../thinker/thinker-gpt'
 
 import type { PrintModeEvent } from '@codebuff/common/types/print-mode'
 
diff --git a/agents/reviewer/code-reviewer-gpt.ts b/agents/reviewer/code-reviewer-gpt.ts
@@ -3,9 +3,9 @@ import type { SecretAgentDefinition } from '../types/secret-agent-definition'
 import { createReviewer } from './code-reviewer'
 
 const definition: SecretAgentDefinition = {
-  id: 'code-reviewer-codex',
+  id: 'code-reviewer-gpt',
   publisher,
-  ...createReviewer('openai/gpt-5.3-codex'),
+  ...createReviewer('openai/gpt-5.4'),
 }
 
 export default definition
diff --git a/agents/thinker/thinker-gpt.ts b/agents/thinker/thinker-gpt.ts
@@ -4,11 +4,11 @@ import type { SecretAgentDefinition } from '../types/secret-agent-definition'
 
 const definition: SecretAgentDefinition = {
   ...thinker,
-  id: 'thinker-codex',
-  model: 'openai/gpt-5.3-codex',
+  id: 'thinker-gpt',
+  model: 'openai/gpt-5.4',
   outputSchema: undefined,
   outputMode: 'last_message',
-  instructionsPrompt: `You are the thinker-codex agent. Think deeply about the user request and when satisfied, write out your response.
+  instructionsPrompt: `You are the thinker-gpt agent. Think deeply about the user request and when satisfied, write out your response.
   
 The parent agent will see your response. DO NOT call any tools. No need to spawn the thinker agent, because you are already the thinker agent. Just do the thinking work now.`,
   handleSteps: function* () {