leanprover · mhuisi · Apr 14, 2026 · Apr 29, 2026 · Apr 15, 2026
diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
@@ -11,26 +11,66 @@ npm install
 # Build all packages
 npm run build
 
-# Run all tests (from repo root or vscode-lean4/ subdirectory)
-# Tests should not be run locally, only in CI
-npm run test
-
 # Watch mode for development (rebuilds on file changes)
 npm run watch
 
-# Watch mode including test compilation
-npm run watchTest
-
 # Lint all TypeScript/TSX files
 npm run lint
 
 # Package as .vsix for local install testing (from vscode-lean4/)
 cd vscode-lean4 && npm run package
 ```
 
-Tests require `leanprover/lean4:nightly`, `leanprover/lean4:stable`, and a `default` Lean 4 toolchain installed via elan. They download a separate VS Code instance into `.vscode-test/` on first run.
+## Testing
+
+The full test framework is documented in `vscode-lean4/test/README.md`. Read it once before authoring or debugging tests — it covers prompt stubbing, the helper map, and the per-file VS Code instance model. The summary below covers the day-to-day commands and how to diagnose failures.
+
+### Running tests locally
+
+Always use the `test:headless*` variants — they route through `scripts/headless.mjs`, which wraps in `xvfb-run -a` on Linux (avoiding focus-stealing for the duration of the suite; requires `xorg-x11-server-Xvfb` / `xvfb`) and runs the inner command directly on Windows / macOS. Plain `npm test` opens real VS Code windows on Linux.
+
+```bash
+# Full suite: vitest unit tests + infoview tsc check + grammar + vscode-test-cli + wdio.
+npm run test:headless
+
+# Just the wdio (UI/InfoView) subset, from vscode-lean4/.
+npm run test:headless:wdio --workspace=lean4
+
+# Network suite (real network: elan install, lake clone, mathlib cache).
+npm run test:headless:network --workspace=lean4
+
+# Vitest only — pure-TypeScript tests, no VS Code, no xvfb needed.
+npm run test:unit
+
+# Single vscode-test-cli file (replace <name>):
+cd vscode-lean4 && node scripts/headless.mjs npx vscode-test --label cli-<name>
+# e.g. cli-abbreviation, cli-launch-modes, cli-project-actions
+# (label list comes from filenames under test/vscode-test-cli/)
+```
+
+`npm ci` does not build workspace `dist/` directories; both type-aware lint and the `lean4-infoview` tsc check require them. Run `npm run build:dev` (or `npm run build`) first if you've just cloned or run `npm ci`.
+
+### Diagnosing test failures
+
+The framework writes failure artifacts to two gitignored directories under `vscode-lean4/`. Both are wiped at the start of each run.
+
+- **`vscode-test-cli-output/`** — populated by every `vscode-test-cli` and `nightly` test. On failure, `helpers/teardown.ts:dumpStateIfFailed` writes `<safe-test-name>.json` with the workbench's textual state at the failure point: diagnostics for every URI, active editor (uri/cursor/text), visible/open editors, and per-channel transcripts of every write the extension made via `window.createOutputChannel`. The output channels are also mirrored to the test runner's stdout in real time as `[output:<channel name>] <text>` — so the CI transcript reads like a live session.
+- **`wdio-output/`** — populated by the wdio suite. Always: `wdio-junit-<cid>.xml` (junit), `vscode-logs/` (extension host + renderer logs from `--logs-path`). On failure: `<safe-test-name>.png` (workbench screenshot from `afterTest`).
+
+`runCliTests.mjs` re-emits the failure detail of any failed `cli-*` label at the bottom of the run, so a single assertion is the last thing in the scrollback rather than scrolled off behind nine labels' output.
+
+`scripts/runAllTests.mjs` orchestrates the full top-level suite (unit / infoview tsc / grammar / vscode-test-cli / wdio) and **continues past failures** rather than `&&`-bailing on the first one — so a vitest regression doesn't hide a wdio regression in the same CI run. Final exit code is non-zero if any suite failed; the per-suite pass/fail summary lands at the bottom of the log.
+
+### Diagnosing CI failures
+
+CI runs three test jobs in `.github/workflows/test.yml` (`test`, `test-nightly`, `network`), each as an OS matrix over `ubuntu-latest` × `windows-latest`, plus `lint`/`package` in `.github/workflows/on-push.yml`. All test-job artifacts (`vscode-test-cli-output/`, `wdio-output/`, plus the raw stdout) are uploaded for inspection.
+
+For a failure, follow this order:
 
-To run a specific test suite, set `LEAN4_TEST_FOLDER` to one of: `bootstrap`, `info`, `simple`, `restarts`, `multi`, `lakefileTomlSchema`, `toolchains`.
+1. **Read the captured `[output:<channel>] …` lines** in the job log around the failing test. Lake stdout, LSP server output, and the extension's own diagnostics flow through here. Often the assertion is downstream of an earlier `error:` line that names the actual cause (e.g., a network failure or a missing toolchain step).
+2. **Download the artifact zip** via the GitHub Actions UI and unzip locally. The `vscode-test-cli-output/<safe-test-name>.json` for a failed test contains the post-failure workbench state in JSON form.
+3. **For wdio failures**, look at the saved screenshot first — it shows what the workbench looked like at failure time, including any unexpected modal, focus state, or partial render. Pair it with `vscode-logs/` for renderer / extension-host errors.
+4. **For "doesn't reproduce locally"**, the most common causes are: workspace `dist/` not built before the failing step (CI's fresh runner has empty `dist/`; `npm ci` does not build); `LEAN4_TEST_HOME_OVERRIDE` set to a path inside an outer git repo (CI sets it to `${{ github.workspace }}/.lean4-test-home`, which is inside the actions/checkout clone — `setupTestHome.mjs` sets `GIT_CEILING_DIRECTORIES` to defang this); GitHub-hosted runner image changes; or a Lean stable release that broke a fixture (the toolchain pin policy is documented in `test/README.md`).
 
 ## Architecture
 

diff --git a/.eslintrc.js b/.eslintrc.js
@@ -31,13 +31,16 @@ module.exports = {
             './lean4-unicode-input/tsconfig.json',
             './lean4-unicode-input-component/tsconfig.json',
             './vscode-lean4/tsconfig.json',
+            './vscode-lean4/tsconfig.test.json',
             './vscode-lean4/webview/tsconfig.json',
             './vscode-lean4/loogleview/tsconfig.json',
             './vscode-lean4/abbreviationview/tsconfig.json',
         ],
         sourceType: 'module',
     },
-    ignorePatterns: ['**/dist/**/*', '**/out/**/*', '*.js'],
+    // ESLint v8 ignores files starting with `.` by default; the negation
+    // re-includes our dotfile-prefixed configs (`.vscode-test.mjs`).
+    ignorePatterns: ['**/dist/**/*', '**/out/**/*', '*.js', '!.vscode-test.mjs'],
     plugins: ['@typescript-eslint', '@typescript-eslint/eslint-plugin'],
     rules: {
         '@typescript-eslint/adjacent-overload-signatures': 'error',
@@ -140,4 +143,16 @@ module.exports = {
         radix: 'off',
         'no-shadow': 'off',
     },
+    overrides: [
+        {
+            // .mjs config/scripts (`.vscode-test.mjs`, `wdio.conf.mjs`, `scripts/*.mjs`)
+            // are not part of any tsconfig, so the type-aware ruleset can't run on them.
+            // `disable-type-checked` turns off every rule from `recommended-requiring-type-checking`
+            // (which is extended at the top level) and `parserOptions.project: null` keeps the
+            // parser from trying to look the file up in a project.
+            files: ['*.mjs'],
+            parserOptions: { project: null },
+            extends: ['plugin:@typescript-eslint/disable-type-checked'],
+        },
+    ],
 }
diff --git a/.github/actions/cache-vscode/action.yml b/.github/actions/cache-vscode/action.yml
@@ -0,0 +1,38 @@
+name: Cache VS Code test binaries
+description: >
+  Resolve the current stable VS Code version and restore/save the VS Code
+  test binary cache keyed on that version. Caches both `.vscode-test`
+  (vscode-test-cli) and `.wdio-vscode-service` (wdio); jobs that don't
+  run wdio tolerate the second path being absent harmlessly.
+
+runs:
+  using: composite
+  steps:
+    # Resolve `:stable` to a concrete VS Code version so the cache key rotates
+    # automatically when stable bumps. The previous keying — bare
+    # `vscode-test-${{ runner.os }}-stable` — never changed when stable
+    # moved, so each runner kept reusing the version that was first cached.
+    - name: Resolve VS Code stable version
+      id: vscode
+      uses: actions/github-script@v7
+      with:
+        result-encoding: string
+        script: |
+          const r = await fetch('https://update.code.visualstudio.com/api/releases/stable')
+          const versions = await r.json()
+          core.setOutput('version', versions[0])
+
+    - name: Cache VS Code test binaries
+      uses: actions/cache@v4
+      with:
+        path: |
+          vscode-lean4/.vscode-test
+          vscode-lean4/.wdio-vscode-service
+        # Include the resolved version so a new stable triggers a fresh
+        # download. `restore-keys:` falls back to the most recent cache for
+        # this OS so the new run still benefits from partial reuse (the
+        # `.wdio-vscode-service/` extensions dir, etc.) while it pulls the
+        # new VS Code binary.
+        key: vscode-test-${{ runner.os }}-${{ steps.vscode.outputs.version }}
+        restore-keys: |
+          vscode-test-${{ runner.os }}-
diff --git a/.github/actions/setup-elan/action.yml b/.github/actions/setup-elan/action.yml
@@ -0,0 +1,94 @@
+name: Set up elan + Lean toolchain
+description: >
+  Install elan into the test-home directory (if not cached), prepend its
+  `bin/` to `$GITHUB_PATH`, and pre-install a Lean toolchain so the
+  download surfaces as a named CI step rather than blocking the first
+  test invocation. Optionally sets that toolchain as the elan default.
+  The caller is responsible for restoring/saving the elan cache around
+  this action; cache-key choice is job-specific (resolved version for
+  `:stable`, UTC date for `:nightly`).
+
+inputs:
+  channel:
+    description: Lean toolchain channel (e.g. leanprover/lean4:stable, leanprover/lean4:nightly).
+    required: true
+  default:
+    description: If `'true'`, set the channel as elan's default toolchain after install.
+    required: false
+    default: 'false'
+
+runs:
+  using: composite
+  steps:
+    # Run the same install scripts the extension itself executes (see
+    # `vscode-lean4/scripts/elan-install/install-{unix.sh,windows.ps1}`,
+    # imported by `src/utils/leanInstaller.ts` via webpack `asset/source`).
+    # Single source of truth — edit the script files and both production
+    # and CI pick up the change.
+    #
+    # The scripts default-toolchain to `leanprover/lean4:stable`; callers
+    # that need a different default (e.g. the nightly-Lean job) pass
+    # `default: true` so the pre-install step below flips it.
+    - name: Install elan (Linux/macOS, if not cached)
+      if: runner.os != 'Windows'
+      shell: bash
+      run: |
+        HOME_OVERRIDE="$GITHUB_WORKSPACE/.lean4-test-home"
+        mkdir -p "$HOME_OVERRIDE"
+        if [ ! -x "$HOME_OVERRIDE/.elan/bin/elan" ]; then
+          HOME="$HOME_OVERRIDE" ELAN_HOME="$HOME_OVERRIDE/.elan" \
+            bash vscode-lean4/scripts/elan-install/install-unix.sh
+        fi
+        echo "$HOME_OVERRIDE/.elan/bin" >> "$GITHUB_PATH"
+
+    - name: Install elan (Windows, if not cached)
+      if: runner.os == 'Windows'
+      shell: pwsh
+      run: |
+        $homeOverride = Join-Path $env:GITHUB_WORKSPACE '.lean4-test-home'
+        New-Item -ItemType Directory -Force -Path $homeOverride | Out-Null
+        $elanBin = Join-Path $homeOverride '.elan\bin\elan.exe'
+        if (-not (Test-Path $elanBin)) {
+          $env:USERPROFILE = $homeOverride
+          $env:ELAN_HOME = (Join-Path $homeOverride '.elan')
+          & "$env:GITHUB_WORKSPACE\vscode-lean4\scripts\elan-install\install-windows.ps1"
+          if ($LASTEXITCODE -ne 0) { exit $LASTEXITCODE }
+        }
+        Add-Content -Path $env:GITHUB_PATH -Value (Join-Path $homeOverride '.elan\bin')
+
+    # `elan-init` only installs the elan binary and records `:stable` as the
+    # default toolchain — it does *not* fetch the toolchain itself. Without
+    # this step, the first test that resolves the channel eats the full
+    # toolchain download and a network blip there surfaces as a confusing
+    # test failure rather than a named CI step. `elan toolchain install`
+    # eagerly re-resolves the channel, so it also covers the cache-hit /
+    # channel-rotated case (cache restored from a previous run with an older
+    # resolution → this step downloads the new toolchain).
+    #
+    # `elan toolchain install` is *not* idempotent: it errors with
+    # `'leanprover/lean4:vX.Y.Z' is already installed` when the cache
+    # already holds the resolved version. Treat that specific message as
+    # success — we wanted the toolchain installed and it is — while
+    # propagating any other non-zero exit (network failure, bad channel,
+    # etc.).
+    #
+    # `shell: bash` works on Windows too because GitHub-hosted Windows
+    # runners ship Git Bash; `elan` is the same idempotency contract on
+    # both OSes once on PATH.
+    - name: Pre-install ${{ inputs.channel }}${{ inputs.default == 'true' && ' + set as elan default' || '' }}
+      shell: bash
+      env:
+        ELAN_HOME: ${{ github.workspace }}/.lean4-test-home/.elan
+        CHANNEL: ${{ inputs.channel }}
+        SET_DEFAULT: ${{ inputs.default }}
+      run: |
+        if out=$(elan toolchain install "$CHANNEL" 2>&1); then
+          echo "$out"
+        else
+          ec=$?
+          echo "$out"
+          echo "$out" | grep -q 'is already installed' || exit "$ec"
+        fi
+        if [ "$SET_DEFAULT" = "true" ]; then
+          elan default "$CHANNEL"
+        fi
diff --git a/.github/workflows/on-push.yml b/.github/workflows/on-push.yml
@@ -14,12 +14,26 @@ on:
   pull_request:
     branches:
     - '*'
+  # Restrict `push` to maintainer-curated refs. Feature branches with an open
+  # PR fire `pull_request` instead, so this avoids the duplicate run while
+  # still gating direct merges to `master` / release branches and tag pushes
+  # (the publish-vsce / release jobs below only run on tag pushes anyway).
   push:
     branches:
-      - '*'
+      - master
+      - 'release/**'
     tags:
       - '*'
 
+# Cancel in-progress runs of this workflow when a newer commit lands on the
+# same ref. Rapid-fire pushes (or a `push` and `pull_request` event for the
+# same ref) coalesce — only the latest run survives. `workflow_dispatch`
+# runs get their own group via `github.run_id` so a manual publish/release
+# never cancels another in-flight publish/release.
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}-${{ github.event_name == 'workflow_dispatch' && github.run_id || '' }}
+  cancel-in-progress: true
+
 permissions:
   contents: write
   id-token: write
@@ -83,65 +97,33 @@ jobs:
           name: vscode-lean4
           path: 'vscode-lean4/lean4-*.vsix'
 
-  build-and-test:
+  lint:
     if: github.event_name != 'workflow_dispatch'
-    strategy:
-      fail-fast: false
-      matrix:
-        include:
-          - name: build-and-test-linux
-            os: ubuntu-latest
-            artifact: build-Linux release
-          # - name: build-and-test-macos
-          #   os: macos-latest
-          #   artifact: build-macOS
-          - name: build-and-test-windows
-            os: windows-latest
-            artifact: build-Windows
-    name: ${{ matrix.name }}
-    runs-on: ${{ matrix.os }}
-
+    runs-on: ubuntu-latest
     steps:
       - name: Checkout
         uses: actions/checkout@v2
-        with:
-          fetch-depth: 0
 
       - name: Setup Node.js
         uses: actions/setup-node@v4
         with:
           node-version: '24'
-          registry-url: 'https://registry.npmjs.org'
 
-      - name: Build
-        run: |
-          npm ci
-          npm run build
+      - name: Install
+        run: npm ci
+
+      # Type-aware lint rules (`no-redundant-type-constituents`,
+      # `no-unnecessary-type-assertion`) need workspace packages' `dist/`
+      # populated so cross-package imports resolve to real types instead of
+      # implicit `any`. Without this, `T | undefined` looks like
+      # `any | undefined` to typescript-eslint and the rule fires
+      # spuriously. Same rationale as the build step in `test.yml`.
+      - name: Build (for type-aware lint)
+        run: npm run build:dev
 
       - name: Lint
         run: npm run lint
 
-      - name: Install Brew Packages
-        run: |
-          brew install ccache tree zstd coreutils
-        if: matrix.os == 'macos-latest'
-
-      - name: Set path to elan on Linux or macOS
-        if: matrix.os == 'ubuntu-latest' || matrix.os == 'macos-latest'
-        run: |
-          echo "$HOME/.elan/bin" >> $GITHUB_PATH
-
-      - name: Set path to elan on Windows
-        shell: pwsh
-        if: matrix.os == 'windows-latest'
-        run: |
-          echo "$HOME\.elan\bin" >> $env:GITHUB_PATH
-
-      - name: Run tests
-        uses: GabrielBB/xvfb-action@v1.0
-        with:
-          run: npm run test
-
   publish-vsce:
     needs: package
     if: startsWith(github.ref, 'refs/tags/v')
@@ -193,9 +175,7 @@ jobs:
   github-release:
     needs: [package, publish-vsce, publish-ovsx]
     # Run even if one of the publish jobs failed, so a GitHub Release with notes
-    # is always created on a tag build. Tests are intentionally not a dependency:
-    # releases ship in parallel with the build-and-test matrix, matching the
-    # previous single-job behavior where publish ran before tests.
+    # is always created on a tag build.
     if: |
       !cancelled() &&
       needs.package.result == 'success' &&