Skip to content

Add Python help contract regression coverage#47

Draft
t-kalinowski wants to merge 10 commits intomainfrom
add-python-help-contract-tests
Draft

Add Python help contract regression coverage#47
t-kalinowski wants to merge 10 commits intomainfrom
add-python-help-contract-tests

Conversation

@t-kalinowski
Copy link
Copy Markdown
Member

@t-kalinowski t-kalinowski commented Apr 25, 2026

Summary

Add public regression coverage for the native Python help contract.

The new tests cover help(len), pydoc.help(len), and an interactive help() roundtrip through the public Python REPL surface. They assert that help output stays inline, does not expose pager prompts, returns to the Python prompt, and leaves the session usable for follow-up input.

This also pins Python startup to pydoc's plain in-band pager so inherited PAGER, MANPAGER, or terminal settings cannot hand help output to an external pager. The PR adds a files-mode snapshot for the same help flow and moves the Python help contract plan from active to completed.

Public changes

  • Adds regression coverage for documented Python help() / pydoc.help() behavior.
  • Keeps Python help output in-band even when the parent environment has an interactive pager configured.
  • No public API or tool schema changes.

Internal changes

  • Adds a Python files-mode snapshot helper for McpSnapshot.
  • Overrides pydoc.pager to pydoc.plainpager during Python worker startup.
  • Normalizes Python help snapshots across Python versions while leaving prompt readiness to the direct behavior test.
  • Normalizes current Codex CLI-only approval metadata in the approval TUI snapshots.
  • Stabilizes existing worker-process unit tests that were blocking the required plain cargo test check under parallel execution.
  • Closes the checked-in Python help contract plan.

Diff composition

  • Runtime code: Python worker startup only.
  • Tests and snapshots: Python help integration coverage, cross-version files-mode help snapshots, approval snapshot normalization, and worker-process test stabilization.
  • Docs/plans: moves the Python help contract plan from active to completed.
  • Public API/schema: unchanged.

Finding:
- [P1] Gate the TestResult import on non-Windows — /Users/tomasz/.codex/worktrees/24fc/mcp-repl/tests/python_help_snapshots.rs:5-5
  On Windows, every item that references `TestResult` in this file is behind `#[cfg(not(windows))]`, but this import is unconditional. The Windows CI clippy step uses `-D warnings`, so the new test target fails with an unused-import warning before any tests run; put this import under the same cfg or gate the whole file.

Response:
- Moved the `TestResult` import under the existing `#[cfg(not(windows))]` gate alongside `McpSnapshot`, and gated the new `Regex` import the same way.

Finding:
- [P2] Normalize the Python help banner before snapshotting — /Users/tomasz/.codex/worktrees/24fc/mcp-repl/tests/snapshots/python_help_snapshots__python_help_contract.snap:70-70
  When this snapshot runs with any `python3` other than 3.13, `help()` emits that interpreter's minor version in both the welcome text and docs URL, while the test only checks `common::python_available()` and does not pin Python. This makes the new snapshot fail on non-3.13 environments; normalize or redact the versioned banner before asserting the snapshot.

Response:
- Normalized the Python help welcome line and docs URL before asserting both rendered and transcript snapshots, then updated the stored snapshots to use `<VERSION>`.
Finding:
- [P2] Isolate Python help tests from pager environment — /Users/tomasz/.codex/worktrees/24fc/mcp-repl/tests/python_backend.rs:732-735
  When the test process has `PAGER` or `MANPAGER` set to an interactive pager, e.g. `PAGER=less TERM=xterm`, this session inherits it; because the Python worker runs on a PTY, `pydoc` launches the pager for `help(len)` and the call times out as busy instead of returning inline text. The snapshot test uses the same inherited environment, so the new coverage is environment-dependent unless the test sanitizes the pager env or the runtime forces pydoc's plain in-band pager.

Response:
- Routed the Python help regression sessions through a test-only environment that clears `PAGER` and `MANPAGER` and sets `TERM=dumb`, which keeps pydoc on its plain in-band pager path. The snapshot normalization now also strips Python's overstrike bolding so snapshots stay stable across terminal settings.
Finding: [P2] Test help without disabling pydoc's pager

When the server inherits a normal TTY environment, such as `TERM=xterm` with `PAGER`/`MANPAGER` set to `less` or with `less` available, stdlib `pydoc` will launch an external pager because the Python worker only sets `PYTHON_BASIC_REPL`. This helper forces `PAGER`/`MANPAGER` empty and `TERM=dumb` for the new help tests, which disables pydoc's pager selection, so the regression tests can pass while the documented public help contract still fails for those users.

Response: Python worker startup now overrides `pydoc.pager` with the plain in-band pager and waits briefly for the request-start signal before returning from pydoc output, so the normal prompt path can complete the request. The Python help tests now run with `PAGER=less`, `MANPAGER=less`, and `TERM=xterm`, so they exercise the inherited interactive-pager environment instead of masking it.
Normalize current Codex CLI fields that are not part of the approval snapshots, and make the idle guardrail pager unit test assert against a full in-memory page so parallel test-worker output cannot push the notice off the first page.

This keeps the required plain cargo test command passing without accepting environment-specific snapshot churn.
Finding: [P1] Call plainpager with one argument

When the selected interpreter is Python 3.12 or older, `pydoc.plainpager` only accepts `text`; passing `title` raises `TypeError` for `help(len)` and `pydoc.help(len)`. Since the Python backend resolves whatever `python3`/venv Python is available, this breaks the new help contract on common supported installations; calling `pydoc.plainpager(text)` is compatible with both older Python and 3.13+.

Response: The pydoc wrapper now calls `pydoc.plainpager(text)` with one argument. Local Python 3.13 accepts the one-argument call, and the focused Python help tests plus the full required check set pass.
Finding: round 5 review stderr reported `git diff --check` failures in `tests/snapshots/python_help_snapshots__python_help_contract@transcript.snap`:

tests/snapshots/python_help_snapshots__python_help_contract@transcript.snap:10: trailing whitespace.
+<<<
tests/snapshots/python_help_snapshots__python_help_contract@transcript.snap:18: trailing whitespace.
+<<<
tests/snapshots/python_help_snapshots__python_help_contract@transcript.snap:28: trailing whitespace.
+<<<
tests/snapshots/python_help_snapshots__python_help_contract@transcript.snap:33: trailing whitespace.
+<<<
tests/snapshots/python_help_snapshots__python_help_contract@transcript.snap:37: trailing whitespace.
+<<<
tests/snapshots/python_help_snapshots__python_help_contract@transcript.snap:40: trailing whitespace.
+<<< help>
tests/snapshots/python_help_snapshots__python_help_contract@transcript.snap:46: trailing whitespace.
+<<<
tests/snapshots/python_help_snapshots__python_help_contract@transcript.snap:49: trailing whitespace.
+<<< help>
tests/snapshots/python_help_snapshots__python_help_contract@transcript.snap:54: trailing whitespace.
+<<<

Response: The Python help snapshot normalizer now trims trailing whitespace from rendered snapshot lines before assertion, and the transcript snapshot has been updated to remove prompt and blank-line trailing spaces while preserving the visible transcript content.
The plain cargo test check exposed two test-harness races:

- timed_out_request_end_with_exited_worker_reports_session_end_immediately assumed a short sleep was enough for the test child to exit and IPC request_end to be observed.

- pager_empty_input_preserves_idle_guardrail_notice could lose its guardrail notice when another parallel test reset the shared output ring mid-assertion.

Response: make the timed-out request test wait for the test child and use a bounded IPC completion wait; make the pager guardrail test retry its isolated setup when the shared output ring is reset concurrently.
The Linux CI cargo test job failed in python_help_contract_snapshot because the rendered help banner and prompt-only entries vary across Python installations.

Response: normalize the help() intro banner to a sentinel and remove incidental primary/continuation prompt-only entries from the rendered snapshot while keeping the direct behavior test responsible for prompt readiness and follow-up execution.
Linux CI showed Python help() can emit a blank transcript line before the help utility banner.

Response: treat that blank line as part of the normalized help banner prologue so the snapshot remains focused on the help contract instead of interpreter banner formatting.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant