
fix(gix-pack)!: accept non-canonical pack entry size headers #2616

Open
_WD_ (0WD0) wants to merge 1 commit into GitoxideLabs:main from 0WD0:wd/pack

@0WD0

Summary

Accept non-canonical pack entry size headers in gix-pack, matching C Git's read behavior.

Details

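C Git accepts overlong pack entry size encodings. For example, the header bytes b3 00 decode to a blob of size 3 (the trailing 00 continuation byte contributes no size bits), even though the canonical encoding of that header is the single byte 33.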
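A minimal Rust sketch of the header decoding described above, written for illustration and not taken from gix-pack. It assumes the usual pack layout: the first byte holds a continuation bit, a 3-bit type and the low 4 size bits, and each continuation byte adds 7 more size bits. The names decode_header and Kind are hypothetical. Nothing here checks that the encoding is minimal, so the overlong b3 00 and the canonical 33 decode to the same blob size and differ only in how many header bytes were consumed.

```rust
/// Object types as encoded in the 3 type bits of a pack entry header (5 is reserved).
#[derive(Debug, PartialEq, Eq)]
enum Kind {
    Commit,
    Tree,
    Blob,
    Tag,
    OfsDelta,
    RefDelta,
}

fn kind_from_id(id: u8) -> Option<Kind> {
    match id {
        1 => Some(Kind::Commit),
        2 => Some(Kind::Tree),
        3 => Some(Kind::Blob),
        4 => Some(Kind::Tag),
        6 => Some(Kind::OfsDelta),
        7 => Some(Kind::RefDelta),
        _ => None,
    }
}

/// Hypothetical decoder: returns (kind, decompressed size, header bytes consumed).
/// Overlong (non-canonical) size encodings are accepted rather than rejected.
fn decode_header(data: &[u8]) -> Option<(Kind, u64, usize)> {
    let first = *data.first()?;
    let kind = kind_from_id((first >> 4) & 0b111)?;
    let mut size = u64::from(first & 0b1111); // low 4 size bits
    let mut shift = 4u32;
    let mut consumed = 1usize;
    let mut byte = first;
    // While the continuation bit is set, the next byte carries 7 more size bits.
    while byte & 0x80 != 0 {
        byte = *data.get(consumed)?; // truncated header -> None
        consumed += 1;
        if shift >= 64 {
            return None; // guard against shifting past 64 bits
        }
        size |= u64::from(byte & 0x7f) << shift;
        shift += 7;
    }
    Some((kind, size, consumed))
}
```

For example, decode_header(&[0xb3, 0x00]) returns Some((Kind::Blob, 3, 2)) and decode_header(&[0x33]) returns Some((Kind::Blob, 3, 1)); the third value is the header length that has to be preserved.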

This can appear in real fetch/clone traffic when a server reuses existing pack data. A real-world example is web3infra-foundation/mega (tested on commit d2d797b333e8aa8a78fc1d8f5c6cae33360bc9bc): GitHub sends a pack entry for moon/.nvmrc whose header starts with the bytes b3 00. C Git can read it, and git fsck --full accepts it, but gix-pack currently rejects it with:

Pack entry is truncated: pack entry header uses a non-canonical size encoding

This change removes the check that rejected headers longer than the canonical encoding of their size, and records the number of header bytes actually consumed.

Preserving the actual header length is important: accepting non-canonical headers while recomputing a canonical length would break pack_offset() reconstruction and ofs-delta base offset calculations.
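To illustrate why the consumed length matters, here is a hedged sketch of the offset arithmetic using plain u64 values rather than gix-pack's types. It assumes, for this sketch only, that the data offset points just past every header byte (the size field and, for ofs-deltas, the base-distance field), and that an ofs-delta's base distance is subtracted from the delta entry's own pack offset. The helper names are hypothetical.

```rust
/// Offset of an entry's first header byte, reconstructed from where its
/// compressed data starts and how many header bytes preceded it.
fn pack_offset(data_offset: u64, header_len: u64) -> u64 {
    data_offset - header_len
}

/// Byte length of the *canonical* encoding of the size part of a header.
fn canonical_size_header_len(size: u64) -> u64 {
    let mut len = 1; // the first byte carries 4 size bits
    let mut rest = size >> 4;
    while rest != 0 {
        len += 1; // each further byte carries 7 size bits
        rest >>= 7;
    }
    len
}

/// An ofs-delta names its base as a distance back from the delta entry's own
/// pack offset, so any error in that offset shifts the base by the same amount.
fn ofs_delta_base(delta_pack_offset: u64, distance: u64) -> Option<u64> {
    delta_pack_offset.checked_sub(distance)
}
```

Take a delta entry at offset 100 whose size field uses two bytes (non-canonically), followed by a one-byte base distance of 40, so its data starts at 103. Subtracting the real 3 header bytes gives back offset 100 and base 60; recomputing a canonical 1-byte size field instead gives 101 and base 61, one byte past the start of the real base entry.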

Pack writing still emits canonical headers.
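The writer side can stay canonical. A sketch of a shortest-form encoder follows; encode_canonical_header is a hypothetical name and this is not gix-pack's writer. It only sets the continuation bit while size bits remain, so it never emits a trailing 00 continuation byte like the b3 00 example.

```rust
/// Hypothetical encoder producing the canonical (shortest) pack entry header.
/// `type_id` is the 3-bit object type, e.g. 3 for a blob.
fn encode_canonical_header(type_id: u8, size: u64) -> Vec<u8> {
    let mut out = Vec::new();
    // First byte: 3 type bits in bits 4..=6, low 4 size bits in bits 0..=3.
    let mut byte = ((type_id & 0b111) << 4) | (size & 0b1111) as u8;
    let mut rest = size >> 4;
    while rest != 0 {
        out.push(byte | 0x80); // set the continuation bit: more size bits follow
        byte = (rest & 0x7f) as u8;
        rest >>= 7;
    }
    out.push(byte); // last byte has the continuation bit clear
    out
}
```

With this, encode_canonical_header(3, 3) yields [0x33] and encode_canonical_header(3, 16) yields [0xb0, 0x01].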

Copilot AI review requested due to automatic review settings May 22, 2026 07:22

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adjusts pack entry parsing to accept Git-compatible non-canonical pack entry size encodings by preserving the actual encoded header byte length, preventing offset math from relying on recomputed canonical header lengths.

Changes:

  • Store the actual consumed pack entry header byte length on data::Entry and use it for offset computations.
  • Stop rejecting non-canonical size encodings during pack entry header parsing.
  • Add regression tests to ensure non-canonical encodings decode correctly and keep delta base offsets correct.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Files and descriptions:

  • gix-pack/tests/pack/data/fuzzed.rs: Adds an integration-style regression test for decoding a pack with a non-canonical entry header.
  • gix-pack/src/data/mod.rs: Extends data::Entry to persist the actual encoded header length (needed to accept non-canonical encodings safely).
  • gix-pack/src/data/entry/mod.rs: Updates offset calculations to rely on the stored header length and fall back when unknown.
  • gix-pack/src/data/entry/decode.rs: Removes canonical-encoding rejection, records consumed header bytes, and updates tests accordingly.
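The fallback described for gix-pack/src/data/entry/mod.rs could look roughly like this. It is a hedged sketch: the struct is a simplified stand-in for data::Entry, and modelling an unknown header length as Option<u64> (None for entries not decoded from pack bytes) is an assumption, not necessarily how the PR represents it.

```rust
/// Hypothetical, simplified stand-in for gix-pack's data::Entry.
struct Entry {
    /// Where the compressed data starts in the pack.
    data_offset: u64,
    /// Decompressed object size as decoded from the header.
    decompressed_size: u64,
    /// Header bytes actually consumed, if the entry was decoded from pack bytes.
    encoded_header_size: Option<u64>,
}

/// Byte length of the canonical encoding of the size part of a header.
fn canonical_size_header_len(size: u64) -> u64 {
    let mut len = 1;
    let mut rest = size >> 4;
    while rest != 0 {
        len += 1;
        rest >>= 7;
    }
    len
}

impl Entry {
    /// Prefer the recorded length; recompute a canonical one only when unknown.
    fn header_size(&self) -> u64 {
        self.encoded_header_size
            .unwrap_or_else(|| canonical_size_header_len(self.decompressed_size))
    }

    /// Offset of the entry's first header byte.
    fn pack_offset(&self) -> u64 {
        self.data_offset - self.header_size()
    }
}
```

An entry decoded from b3 00 at offset 12 (data at 14, recorded size Some(2)) and a synthesized canonical entry (data at 13, None) both report pack offset 12.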


C Git accepts overlong pack entry size encodings, and real servers can send
them when reusing existing pack data. Accept these headers while recording
the actual header length consumed from the pack.

Keeping the actual header length avoids recomputing a canonical length from
the decoded size, which would break pack offset reconstruction and
ofs-delta base offset calculations for non-canonical entries.

BREAKING CHANGE: gix_pack::data::Entry now has a public encoded_header_size field.