codex wizard hardening (post-#485 smoke test followups) #486

@dcellison

Background

The 2026-05-15 codex backend smoke test (#485) surfaced four interacting wizard / install-flow bugs that cost multiple deploy iterations to work around. Each is independent but they share a theme: the wizard and the install apply path do not robustly carry codex-specific configuration through to the running bot.

Bug 1 — make config does not switch DEFAULT_MODEL on backend change

When the operator runs make config and switches AGENT_BACKEND from claude to codex, the wizard accepts the prior DEFAULT_MODEL="opus" value and writes it into install.conf. The next install apply writes that value into /etc/kai/env. The bot then starts with a claude model selected against the codex backend: the pool logs "Starting persistent Codex app-server process (model=opus, ...)", and codex either fails on that model or silently substitutes its own default.

The wizard should validate (AGENT_BACKEND, DEFAULT_MODEL) together and either re-prompt or reset the model to a backend-appropriate default. The apply-time validator in _cmd_apply already runs validate_model_for_provider and raises SystemExit on bad combos for some providers, but in this session the operator ended up with AGENT_BACKEND=codex + DEFAULT_MODEL=opus in /etc/kai/env more than once — either the validator missed the codex case or the wizard wrote past it.

Repro: start with a working claude install, run make config, switch backend to codex, accept defaults through the rest of the wizard. Inspect install.conf and /etc/kai/env afterward.
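
A minimal sketch of the joint (AGENT_BACKEND, DEFAULT_MODEL) check the wizard could run before writing install.conf. The BACKEND_MODELS table and the reconcile_model helper are hypothetical names for illustration, and every model name other than "opus" is a placeholder. The real lists would come from wherever validate_model_for_provider gets them.

```python
from __future__ import annotations

# Hypothetical table: which models each backend accepts, default first.
# Only "opus" appears in the issue; the other names are placeholders.
BACKEND_MODELS: dict[str, tuple[str, ...]] = {
    "claude": ("opus", "claude-model-placeholder"),
    "codex": ("codex-model-placeholder",),
}


def reconcile_model(backend: str, model: str) -> tuple[str, bool]:
    """Return (model_to_write, changed) for the chosen backend.

    If the carried-over model is not valid for the backend, fall back to
    that backend's first listed model and report that a change was made,
    so the wizard can re-prompt or tell the operator about the reset.
    """
    allowed = BACKEND_MODELS.get(backend)
    if allowed is None:
        raise ValueError(f"unknown backend: {backend!r}")
    if model in allowed:
        return model, False
    return allowed[0], True
```

With this shape, switching to codex while carrying DEFAULT_MODEL="opus" reports changed=True instead of letting the stale value pass through to /etc/kai/env.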

Bug 2 — make install strips CODEX_BIN through its inner sudo

The Makefile recipe is:

install:
	sudo DRY_RUN="$(DRY_RUN)" $(BIN)python -m kai install apply

When the operator runs sudo CODEX_BIN=/path/to/codex make install, make inherits CODEX_BIN, but the recipe passes only DRY_RUN on the inner sudo command line. The inner sudo therefore does not carry CODEX_BIN through (presumably because of its environment reset), the python process never sees it, the apply path skips the CODEX_BIN write, and /etc/kai/env does not get the path.

The proven workaround during the smoke test was bypassing make entirely with sudo CODEX_BIN=... .venv/bin/python -m kai install apply. The proper fix is one of:

  • Add CODEX_BIN="$(CODEX_BIN)" to the recipe so make forwards it.
  • Better: prompt for the codex binary path in make config and persist it in install.conf, eliminating the env-var dance entirely.

The second option also closes Bug 4.

Bug 3 — Post-install codex login hint is stale

install.py _cmd_apply prints:

Codex subscription auth required:
  sudo -u kai codex login

But with the per-user os_user wrap from PR #484, the auth lives at ~<os_user>/.codex/auth.json, not at ~kai/.codex/auth.json. The hint sends the operator down a path that ends in "Operation not permitted" / "Permission denied" errors. The hint should enumerate the distinct os_user values from users.yaml and tell each one to run codex login as themselves.

There is also a stale variant earlier in _cmd_config ("After install, run 'codex login' as the service user.") that has the same problem.
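
A rough sketch of how the per-user hint could be rendered. It assumes the users.yaml entries have already been loaded into a list of mappings with an optional os_user key. That shape, the codex_login_hints name, and the exact wording of the output are assumptions, not what install.py does today.

```python
from __future__ import annotations

from typing import Iterable, Mapping


def codex_login_hints(users: Iterable[Mapping[str, str]]) -> list[str]:
    """Build hint lines, one per distinct os_user, in first-seen order."""
    distinct: list[str] = []
    for entry in users:
        name = entry.get("os_user")
        # Skip entries without an os_user and collapse duplicates.
        if name and name not in distinct:
            distinct.append(name)

    lines = ["Codex subscription auth required (each OS user runs this as themselves):"]
    lines += [f"  {name}: codex login" for name in distinct]
    return lines
```

The same helper could back both the _cmd_apply hint and the earlier _cmd_config message, so the two cannot drift apart again.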

Bug 4 — Sudoers regression after make install without CODEX_BIN=

install.py _generate_sudoers reads CODEX_BIN from os.environ with a fallback of /opt/homebrew/bin/codex. Operators who ran make install without setting CODEX_BIN (because they forgot, or because Bug 2 ate it) ended up with sudoers reverting to the Homebrew default, which doesn't match the npm-global path codex actually lives at. Sudo then finds no rule matching the actual binary and fails the spawn with "a password is required".

The right fix: persist the codex binary path in install.conf via a wizard prompt (same change that closes Bug 2). The fallback can stay for first-time installs where the path is unknown, but it should never silently overwrite a previously correct value.
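
One way to express the "never silently overwrite" rule is an explicit precedence: an explicit CODEX_BIN in the environment wins, then the value persisted in install.conf, and only then the Homebrew fallback. The sketch below assumes the persisted value has already been read from install.conf and is passed in as a string or None. resolve_codex_bin is a hypothetical helper, not an existing function in install.py.

```python
from __future__ import annotations

from typing import Mapping, Optional

# Fallback path named in the issue; only meant for first-time installs.
FALLBACK_CODEX_BIN = "/opt/homebrew/bin/codex"


def resolve_codex_bin(env: Mapping[str, str], persisted: Optional[str]) -> tuple[str, str]:
    """Return (path, source) where source says which tier supplied the path."""
    explicit = env.get("CODEX_BIN", "").strip()
    if explicit:
        return explicit, "env"
    if persisted:
        # A previously stored path beats the fallback, so a plain
        # `make install` cannot revert sudoers to the Homebrew default.
        return persisted, "install.conf"
    return FALLBACK_CODEX_BIN, "fallback"
```

Returning the source alongside the path lets the apply step log a warning whenever the fallback tier is used, which would have made this regression visible during the smoke test.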

Combined fix

A single PR that adds a codex-binary-path prompt to _cmd_config, stores it in install.conf, and threads it through _apply_secrets and _generate_sudoers would close Bugs 2 and 4 outright, and provide the natural anchor for re-prompting DEFAULT_MODEL (Bug 1) and re-rendering the login hint per-user (Bug 3).

Acceptance

  • make config validates DEFAULT_MODEL against the chosen backend and re-prompts on mismatch.
  • make config prompts for the codex binary path on codex installs, with which codex as the suggested default and validation that the path exists.
  • The path is persisted in install.conf and used by both _apply_secrets (writes CODEX_BIN to /etc/kai/env) and _generate_sudoers (writes the matching SETENV rule).
  • sudo make install works without an explicit CODEX_BIN= prefix.
  • Post-install hints enumerate the per-user codex login actions for each os_user.
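
The codex-binary-path prompt from the acceptance list could look roughly like the sketch below. prompt_codex_bin and its injectable ask / which / exists parameters are hypothetical and exist mainly to keep the function testable. shutil.which and os.path.isfile are the standard-library calls standing in for "which codex" and "validation that the path exists".

```python
from __future__ import annotations

import os
import shutil
from typing import Callable, Optional


def prompt_codex_bin(
    ask: Callable[[str], str] = input,
    which: Callable[[str], Optional[str]] = shutil.which,
    exists: Callable[[str], bool] = os.path.isfile,
) -> str:
    """Prompt until the operator supplies a path that exists."""
    # Suggest whatever `codex` resolves to on PATH, if anything.
    default = which("codex") or ""
    while True:
        suffix = f" [{default}]" if default else ""
        # An empty answer accepts the suggested default.
        answer = ask(f"Path to codex binary{suffix}: ").strip() or default
        if answer and exists(answer):
            return answer
        print(f"Not an existing file: {answer!r}; please try again.")
```

The returned path is what would be written to install.conf and later consumed by _apply_secrets and _generate_sudoers.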
