Background
The 2026-05-15 codex backend smoke test (#485) surfaced four interacting wizard / install-flow bugs that cost multiple deploy iterations to work around. Each is independent but they share a theme: the wizard and the install apply path do not robustly carry codex-specific configuration through to the running bot.
Bug 1 — make config does not switch DEFAULT_MODEL on backend change
When the operator runs make config and switches AGENT_BACKEND from claude to codex, the wizard accepts the prior DEFAULT_MODEL="opus" value and writes it into install.conf. The next install apply writes that value into /etc/kai/env. The bot starts up with a claude model selected against the codex backend; the pool dispatches a Starting persistent Codex app-server process (model=opus, ...), and codex either fails on that model or silently substitutes its default.
The wizard should validate (AGENT_BACKEND, DEFAULT_MODEL) together and either re-prompt or reset the model to a backend-appropriate default. The apply-time validator in _cmd_apply already runs validate_model_for_provider and raises SystemExit on bad combos for some providers, but in this session the operator ended up with AGENT_BACKEND=codex + DEFAULT_MODEL=opus in /etc/kai/env more than once — either the validator missed the codex case or the wizard wrote past it.
Repro: start with a working claude install, run make config, switch backend to codex, accept defaults through the rest of the wizard. Inspect install.conf and /etc/kai/env afterward.
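A minimal sketch of the wizard-side check described above, assuming a simple per-backend model table. The names BACKEND_MODELS, BACKEND_DEFAULTS, and reconcile_model are hypothetical, and the codex model name is a placeholder; only "opus" comes from this report. The real validate_model_for_provider may work differently.

```python
# Hypothetical model table: only "opus" is taken from the report above;
# the codex entry is a placeholder, not a real model name.
BACKEND_MODELS = {
    "claude": {"opus"},
    "codex": {"codex-placeholder-model"},
}
BACKEND_DEFAULTS = {
    "claude": "opus",
    "codex": "codex-placeholder-model",
}


def reconcile_model(backend: str, model: str) -> tuple[str, bool]:
    """Return (model_to_write, was_reset) for a backend/model pair.

    If the carried-over model is not valid for the chosen backend,
    fall back to that backend's default instead of writing it through.
    """
    if backend not in BACKEND_MODELS:
        raise ValueError(f"unknown backend: {backend!r}")
    if model in BACKEND_MODELS[backend]:
        return model, False
    return BACKEND_DEFAULTS[backend], True
```

The wizard could call this right after the backend prompt and either re-prompt or print a notice whenever was_reset is true, so a stale model never reaches install.conf.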
Bug 2 — make install strips CODEX_BIN through its inner sudo
The Makefile recipe is:
install:
sudo DRY_RUN="$(DRY_RUN)" $(BIN)python -m kai install apply
When the operator runs sudo CODEX_BIN=/path/to/codex make install, make inherits CODEX_BIN, but the recipe does not pass it on the inner sudo command line, so sudo's default environment reset drops it. The python process never sees it, the apply path skips the CODEX_BIN write, and /etc/kai/env does not get the path.
The proven workaround during the smoke test was bypassing make entirely with sudo CODEX_BIN=... .venv/bin/python -m kai install apply. The proper fix is one of:
- Add CODEX_BIN="$(CODEX_BIN)" to the recipe so make forwards it.
- Better: prompt for the codex binary path in make config and persist it in install.conf, eliminating the env-var dance entirely.
The second option also closes Bug 4.
Bug 3 — Post-install codex login hint is stale
install.py _cmd_apply prints:
Codex subscription auth required:
sudo -u kai codex login
But with the per-user os_user wrap from PR #484, the auth lives at ~<os_user>/.codex/auth.json, not at ~kai/.codex/auth.json. The hint sends the operator down a path that produces Operation not permitted / Permission denied. The hint should enumerate the distinct os_user values from users.yaml and tell each one to run codex login as themselves.
There is also a stale variant earlier in _cmd_config ("After install, run 'codex login' as the service user.") that has the same problem.
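One way the per-user hint could be rendered, as a sketch only. It assumes users.yaml has already been parsed into a list of dicts that may each carry an "os_user" key; the real schema is not shown in this report, and render_login_hint is a hypothetical name.

```python
def render_login_hint(users: list[dict]) -> str:
    """Build a per-user codex login hint from parsed users.yaml entries.

    Assumes each entry is a dict that may carry an "os_user" key; the
    real schema of users.yaml is not shown in this report.
    """
    # Collect distinct os_user values, keeping first-seen order.
    os_users = list(dict.fromkeys(
        u["os_user"] for u in users if u.get("os_user")
    ))
    if not os_users:
        return "Codex subscription auth required: no os_user entries found in users.yaml."
    lines = ["Codex subscription auth required. Each OS user must log in as themselves:"]
    for name in os_users:
        lines.append(f"  {name}: run 'codex login' (auth is stored in ~{name}/.codex/auth.json)")
    return "\n".join(lines)
```

Both _cmd_apply and _cmd_config could print the result of the same helper, which would keep the two hints from drifting apart again.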
Bug 4 — Sudoers regression after make install without CODEX_BIN=
install.py _generate_sudoers reads CODEX_BIN from os.environ with a fallback of /opt/homebrew/bin/codex. Operators who ran make install without setting CODEX_BIN (because they forgot, or because Bug 2 ate it) ended up with sudoers reverting to the Homebrew default, which doesn't match the npm-global path codex actually lives at. The sudoers rule then no longer matches the binary being spawned, and sudo fails the spawn with "a password is required".
The right fix: persist the codex binary path in install.conf via a wizard prompt (same change that closes Bug 2). The fallback can stay for first-time installs where the path is unknown, but it should never silently overwrite a previously correct value.
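The lookup order this implies could look like the sketch below. It assumes install.conf is available as a dict with a "CODEX_BIN" key (a hypothetical key name), and that an explicit environment variable should still win over the persisted value; that precedence is a design choice, not something this report specifies.

```python
import os

FALLBACK_CODEX_BIN = "/opt/homebrew/bin/codex"  # fallback named in the report


def resolve_codex_bin(conf: dict, env=None) -> str:
    """Pick the codex path: explicit env var, then install.conf, then fallback."""
    env = os.environ if env is None else env
    explicit = env.get("CODEX_BIN", "").strip()
    if explicit:
        return explicit
    # Hypothetical key: the persisted path written by the wizard.
    persisted = str(conf.get("CODEX_BIN", "")).strip()
    if persisted:
        return persisted
    # Only reached on first-time installs where no path is known yet.
    return FALLBACK_CODEX_BIN
```

With this ordering, a make install that loses CODEX_BIN falls through to the persisted path rather than the Homebrew default, so a previously correct sudoers entry is not overwritten.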
Combined fix
A single PR that adds a codex-binary-path prompt to _cmd_config, stores it in install.conf, and threads it through _apply_secrets and _generate_sudoers would close Bugs 2 and 4 outright, and provide the natural anchor for re-prompting DEFAULT_MODEL (Bug 1) and re-rendering the login hint per-user (Bug 3).
Acceptance
- make config validates DEFAULT_MODEL against the chosen backend and re-prompts on mismatch.
- make config prompts for the codex binary path on codex installs, with which codex as the suggested default and validation that the path exists.
- The codex binary path is persisted in install.conf and used by both _apply_secrets (writes CODEX_BIN to /etc/kai/env) and _generate_sudoers (writes the matching SETENV rule).
- sudo make install works without an explicit CODEX_BIN= prefix.
- The post-install hint lists the codex login actions for each os_user.
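For the binary-path prompt, a sketch of the suggested default and the existence check, assuming the wizard is free to use the standard library for both. The function names suggest_codex_bin and validate_codex_bin are hypothetical and not part of install.py.

```python
import os
import shutil


def suggest_codex_bin() -> str:
    """Suggested default for the prompt: what `which codex` would find, or ""."""
    return shutil.which("codex") or ""


def validate_codex_bin(path: str) -> str:
    """Return the absolute path if it names an executable file, else raise."""
    expanded = os.path.abspath(os.path.expanduser(path.strip()))
    if not os.path.isfile(expanded):
        raise ValueError(f"codex binary not found: {expanded}")
    if not os.access(expanded, os.X_OK):
        raise ValueError(f"codex binary is not executable: {expanded}")
    return expanded
```

Storing the absolute path returned by validate_codex_bin keeps the value written to /etc/kai/env and the one in the sudoers rule identical.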