Skip to content

fix: harden browser mode stability#117

Open
frankekn wants to merge 3 commits intosteipete:mainfrom
frankekn:fix/browser-stability
Open

fix: harden browser mode stability#117
frankekn wants to merge 3 commits intosteipete:mainfrom
frankekn:fix/browser-stability

Conversation

@frankekn
Copy link

Summary

  • harden browser-mode prompt submission with stdin -p - support, dead-composer detection, tighter prompt truncation handling, and one controlled reload retry
  • lock browser response capture to the expected ChatGPT conversation so stale threads and old attachment turns are not misread as the current run
  • align browser model labels and smoke docs/scripts with the current ChatGPT picker targets, including current Pro routing and consult/browser default expectations

Testing

  • pnpm run lint
  • pnpm run test
  • pnpm run build
  • printf 'Summarize the attached file as exactly two markdown bullets: one for CPU and one for memory.\nDo not quote the prompt.\n' | node dist/bin/oracle-cli.js --engine browser --model gpt-5.4-pro --browser-manual-login --browser-keep-browser --browser-attachments always --verbose --slug browser-stability-smoke-pro-54-fix-4 -p - -f /tmp/browser-report-smoke-4.txt
  • printf 'Return exactly one line and nothing else: pro-ok\n' | node dist/bin/oracle-cli.js --engine browser --model gpt-5.2-instant --browser-manual-login --browser-keep-browser --verbose --slug browser-smoke-pro-deterministic-instant-2 -p -

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eb3674866a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

if (isDeadComposer && !retriedDeadComposer) {
retriedDeadComposer = true;
await reloadPromptComposer();
const submission = await raceWithDisconnect(submitOnce(promptText, attachments));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve prompt-too-large fallback after dead-composer retry

In runBrowserMode, the dead-composer recovery path retries submitOnce directly, but that second submission is no longer inside the prompt-too-large fallback handling. If the reload succeeds but the retried submit then throws code: "prompt-too-large" (with fallbackSubmission available), the error escapes and we skip the upload-based fallback entirely, causing avoidable failures for large prompts.

Useful? React with 👍 / 👎.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e4a6e72. I factored submit/retry/fallback handling into a shared recovery helper so a dead-composer reload retry still preserves the prompt-too-large upload fallback path. Covered with a regression test in tests/browser/index.test.ts.

Comment on lines +649 to +650
if (!verified) {
throw new Error("Sent user message did not expose attachment UI after upload.");
logger("Sent user message did not expose attachment UI after upload; continuing.");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Fail run when sent turn attachments cannot be verified

This branch now logs and continues when sent-turn attachment verification fails, which allows the run to complete even when the uploaded files are not actually attached to the submitted user turn. In attachment-dependent prompts, that can silently produce answers without file context (or from the wrong turn) instead of surfacing a hard failure, which is a correctness regression versus the previous behavior.

Useful? React with 👍 / 👎.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e4a6e72. Sent-turn attachment verification is a hard failure again when the attachment UI should be present but cannot be verified on the submitted user turn. Regression coverage remains in tests/browser/attachmentsCompletion.test.ts and the browser recovery tests.

@frankekn
Copy link
Author

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e4a6e725cb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1741 to +1743
!EXPECTED_PROMPT_PREFIX ||
text.includes(EXPECTED_PROMPT_PREFIX) ||
EXPECTED_PROMPT_PREFIX.includes(text.slice(0, Math.min(text.length, EXPECTED_PROMPT_PREFIX.length)));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject empty turn text in prompt-match guard

The new prompt guard can be bypassed when the extracted user-turn text is empty: EXPECTED_PROMPT_PREFIX.includes(text.slice(...)) is true for "", so promptMatches becomes true even though no prompt evidence was found. In runs where ChatGPT does not expose innerText for the latest user turn (for example, attachment-heavy or transiently collapsed turns), waitForUserTurnAttachments may incorrectly accept attachment evidence from a non-matching turn and report verification success for the wrong submission.

Useful? React with 👍 / 👎.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d031dfe. The prompt-match fallback now requires non-empty extracted turn text before we treat a shorter prefix as evidence, so empty/hidden user-turn text can no longer bypass the guard. Added expression-level regression coverage in tests/browser/pageActionsExpressions.test.ts.

@frankekn
Copy link
Author

frankekn commented Mar 17, 2026

@codex review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant