Skip to content

Harden bundled Copilot CLI extraction against corrupt/partial installs#1635

Open
jmoseley wants to merge 1 commit into
mainfrom
jmoseley/harden-cli-extraction
Open

Harden bundled Copilot CLI extraction against corrupt/partial installs#1635
jmoseley wants to merge 1 commit into
mainfrom
jmoseley/harden-cli-extraction

Conversation

@jmoseley

Copy link
Copy Markdown
Contributor

Summary

On a Windows user's laptop, the GitHub Copilot desktop app's bundled gh.exe failed to launch with ERROR_BAD_EXE_FORMAT ("The specified executable is not a valid application for this OS platform"). Investigation traced the class of bug to non-atomic binary extraction that returns an unverified file. This PR applies the equivalent fix for the Copilot CLI binary that copilot-sdk bundles/extracts (the bundled-cli path in the Rust crate).

This is the "CLI in copilot-sdk" half of Tim Clem's ask. The "gh in the app" half is a separate PR in github/github-app. The safe-publish pattern here mirrors src-tauri/src/embedded_git.rs (stage in same-dir temp → write integrity marker → atomic fs::rename, with explicit Windows non-atomic-overwrite handling).

Root cause

rust/src/embeddedcli.rs extracted the embedded archive and wrote the binary with a non-atomic open → truncate → write directly on the live final path, then returned without re-verifying the written bytes. The "already installed?" fast path trusted a bare is_file() check. So an interrupted write, a multi-process race, or Windows Defender / ASR quarantining the freshly-written executable could leave a truncated or corrupt image that gets handed back as "good" — and fails to launch.

Fix

  • Atomic write — extract to a unique temp file (pid + counter + nanos) in the same target directory, write_allflushsync_all, set 0o755 on unix, then fs::rename onto the final path. rename overwrites atomically on POSIX; on Windows (which won't atomically overwrite) we remove-then-rename, with the race window guarded by re-verification and by accepting a peer's identical install.
  • Verify after publish — the bytes extracted from the trusted, signed embedded archive are the known-good reference. We verify the staged temp file and re-verify the published file byte-for-byte (size checked first), catching truncation or AV tampering between staging and publishing.
  • Trustworthy fast path — an existing install is reused only after a cheap re-check: an integrity marker recording the expected size plus a platform executable-image header check (PE MZ / Mach-O / ELF). A truncated, empty, or quarantined binary is re-extracted instead of trusted.
  • Retry + actionable fallback — re-extract/re-publish up to 3 times to ride out transient AV interference; if it still fails, surface a clear error ("bundled CLI appears blocked or corrupt … possibly quarantined by antivirus") instead of returning a broken path.

The change is scoped to the CLI extraction/install path. Public behavior is unchanged for the happy path; the installer is just no longer able to return a broken executable.

Tests

New unit tests in embeddedcli.rs cover:

  • temp + atomic-rename publish writes the exact bytes, records the marker, and leaves no temp files;
  • publish overwrites a stale existing binary;
  • corrupt / truncated / empty / unmarked / invalid-image installs are detected and rejected by the fast-path re-check;
  • post-write verification rejects both size and content mismatches.

The existing install_bundled_cli_* integration tests (which exercise the full extract → publish → verify → marker flow against the real downloaded archive and assert idempotency) continue to pass.

Validation

  • cargo +nightly fmt --check
  • cargo clippy --all-targets --features test-support -- -D warnings -D clippy::unwrap_used … ✅ (both the skip-download and real-bundled paths)
  • cargo doc --no-deps --all-features with -D warnings
  • cargo build (bundled, downloads real CLI) ✅
  • cargo test --lib embeddedcli + install_bundled_cli integration tests ✅

The embedded-CLI installer opened the final binary path with
open→truncate→write and returned without re-verifying the written bytes.
An interrupted write, a multi-process race, or Windows Defender/ASR
quarantining the freshly-written executable could leave an invalid image
that was handed back as "good" — surfacing downstream as
ERROR_BAD_EXE_FORMAT ("not a valid application for this OS platform").

Replace the in-place write with a safe publish pattern:

- Atomic write: extract to a unique temp file (pid + counter + nanos) in
  the *same* target directory, flush + fsync, set 0o755 on unix, then
  fs::rename onto the final path. rename overwrites atomically on POSIX;
  on Windows (no atomic overwrite) we remove-then-rename, with the race
  guarded by re-verification and peer-install acceptance.
- Verify after publish: the bytes extracted from the trusted embedded
  archive are the known-good reference. Verify the staged temp file and
  again re-verify the published file byte-for-byte (size first), catching
  truncation or AV tampering between staging and publishing.
- Trustworthy fast path: an existing install is reused only after a cheap
  re-check (integrity-marker size + executable-image header); a truncated,
  empty, or quarantined binary is re-extracted instead of trusted.
- Retry + fallback: re-extract/re-publish up to 3 times to ride out
  transient AV interference, then surface a clear, actionable error
  ("appears blocked or corrupt ... possibly quarantined by antivirus")
  rather than returning a broken path.

Adds unit tests covering the temp+rename publish, marker recording,
detection of corrupt/truncated/unmarked/invalid-image installs, and
post-write size/content verification.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jmoseley jmoseley requested a review from a team as a code owner June 11, 2026 21:00
Copilot AI review requested due to automatic review settings June 11, 2026 21:00

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the Rust SDK’s bundled Copilot CLI extraction path (bundled-cli) so it does not return a truncated/corrupt executable after partial installs, races, or antivirus interference by staging to a temp file, verifying bytes, publishing via rename, and recording an integrity marker for a safer fast-path.

Changes:

  • Adds an integrity marker + executable-header check to validate an existing installed bundled CLI before reusing it.
  • Implements a staged write + fsync + atomic publish + post-publish verification flow with retries and clearer failure reporting.
  • Adds unit tests covering publish/overwrite behavior, fast-path rejection of corrupt installs, and verification failures.
Show a summary per file
File Description
rust/src/lib.rs Updates public docs for install_bundled_cli() to reflect integrity re-check behavior.
rust/src/embeddedcli.rs Reworks bundled CLI install to stage+verify+publish with marker-based fast path and unit tests.

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 1

Comment thread rust/src/embeddedcli.rs
Comment on lines +348 to +358
fn publish(tmp: &Path, final_path: &Path) -> Result<(), EmbeddedCliError> {
match fs::rename(tmp, final_path) {
Ok(()) => Ok(()),
Err(_) if final_path.exists() => {
let _ = fs::remove_file(final_path);
fs::rename(tmp, final_path)
.map_err(|e| EmbeddedCliError::new(EmbeddedCliErrorKind::Publish, e))
}
Err(e) => Err(EmbeddedCliError::new(EmbeddedCliErrorKind::Publish, e)),
}
}
Comment thread rust/src/embeddedcli.rs
//! The embedded bytes are part of the consumer's signed binary and therefore
//! trusted *as the source of truth* — but the bytes that land on disk are not.
//! A non-atomic write, a multi-process race, or antivirus quarantining the
//! freshly-written executable can leave a truncated or corrupt image that, if

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be wrong but I thought AV quarantining was generally done by setting some other flag in filesystem metadata, not modifying the bytes of the executable. Maybe some AV software differs.

Doesn't change the validity of the PR though

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: looks like Windows Defender will actually move the file into a different dir if it quarantines it.

Comment thread rust/src/embeddedcli.rs
/// image header for the current platform (PE on Windows, Mach-O on macOS,
/// ELF elsewhere). Returns `false` on any I/O error or unrecognized header.
#[cfg(any(has_bundled_cli, test))]
fn looks_like_valid_image(path: &Path) -> bool {

@SteveSandersonMS SteveSandersonMS Jun 12, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we wanted a more end-to-end problem detector we could consider something like detecting whether the spawn fails (that is, we never get to the point of having a valid handshake) and if so track a flag so that next time the app is launched, we do a full re-extraction. Then we wouldn't need so many other checks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants