Release prep: exgit 0.1.0 (hex.pm + open source)#7

Merged
ivarvong merged 13 commits into main from release-prep-0.1.0 on Jul 1, 2026

ivarvong (Owner) commented Jul 1, 2026

Release prep for exgit 0.1.0 (hex.pm + open source)

Gets the library clean, packaged, and hardened for its first public release. Six focused commits, full quality gauntlet green, and a red-team pass on the surfaces that matter for a git client that parses untrusted network data and handles credentials.

What's in it

  • feat: size-aware reads. Exgit.FS.size/3 and the ObjectStore.object_size/2 protocol callback let callers learn a blob's size without materializing it, which is the guard an agent needs before pulling a large file into memory. The lookup is O(1) for the in-memory store, header-only for on-disk loose objects, and fetch-free ({:error, :not_local}) for un-hydrated promisor objects.
  • security: credential + DoS hardening. Tokens embedded in a remote URL (https://token@host/…) are now redacted before entering any telemetry span or security event; ls-refs responses are capped so a hostile server can't stream unbounded refs into client memory.
  • chore: hex.pm packaging. vfs moved to the published ~> 0.1.0 hex release, ex_doc added for HexDocs, docs/PERFORMANCE.md shipped so the README link resolves, plus a consumer-install CI smoke test that builds the package as it ships and clones and reads a real repo from a clean :prod project, catching missing files: entries and dev/test compile coupling that the in-repo suite can't see.
  • fix: Dialyzer clean under Elixir 1.20 (MapSet opacity false-positives resolved at the source, not suppressed).
  • chore: Elixir 1.20 toolchain migration (pin-operator bitstring sizes). Minimum-supported version stays 1.17.

Red-team results (hardened before release)

A three-way review (untrusted-input parsing, credential handling, release readiness) found no release blockers. Confirmed solid: pack path has real DoS bounds (max_object_bytes, max_objects, inflate-ratio, delta-depth), path traversal is well-defended, SECURITY.md threat model is thorough, licensing is clean. The two genuine code findings — telemetry credential leak and the unbounded ls-refs path — are fixed in this PR with tests. Two "critical" parsing findings were investigated and disproven (the pkt-line length is 4 hex digits, not 4GB; the varint decoder is tail-recursive, no stack blow-up).
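
To illustrate why a 4-hex-digit length prefix cannot describe a multi-gigabyte packet (the largest value it can encode is `ffff`, i.e. 65535), here is a minimal sketch of pkt-line framing. The module name `PktLineSketch` and its return shapes are hypothetical and are not exgit's `PktLine` API; the sketch handles only data packets and the `0000` flush packet, and ignores the special `0001`/`0002` packets.

```elixir
defmodule PktLineSketch do
  # Hypothetical illustration; not exgit's PktLine module.
  # Git caps a pkt-line payload at 65516 bytes (65520 with the 4-byte prefix).
  @max_payload 65_516

  # Frame a payload: 4 lowercase hex digits giving total length (prefix included).
  def encode(payload) when is_binary(payload) do
    if byte_size(payload) > @max_payload do
      raise ArgumentError, "pkt-line payload exceeds #{@max_payload} bytes"
    end

    len = byte_size(payload) + 4
    prefix = len |> Integer.to_string(16) |> String.downcase() |> String.pad_leading(4, "0")
    prefix <> payload
  end

  # "0000" is the flush packet and carries no payload.
  def decode(<<"0000", rest::binary>>), do: {:ok, :flush, rest}

  # Return an error tuple instead of raising on hostile or truncated bytes.
  def decode(<<hex::binary-size(4), rest::binary>>) do
    with true <- String.match?(hex, ~r/\A[0-9a-fA-F]{4}\z/),
         {len, ""} <- Integer.parse(hex, 16),
         true <- len >= 4 and byte_size(rest) >= len - 4 do
      n = len - 4
      {:ok, binary_part(rest, 0, n), binary_part(rest, n, byte_size(rest) - n)}
    else
      _ -> {:error, :malformed_pkt_line}
    end
  end

  def decode(_too_short), do: {:error, :malformed_pkt_line}
end
```

Because the prefix is read as exactly four hex characters, a decoder of this shape never allocates based on an attacker-chosen 32-bit length.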

Verification

  • mix format --check-formatted
  • mix compile --warnings-as-errors
  • mix credo --strict ✅ (no issues)
  • mix dialyzer ✅ (0 errors)
  • mix test --warnings-as-errors ✅ (702 tests + 52 properties, 0 failures)
  • New telemetry-redaction test verified stable across seeds; PR diff scanned — no secrets, no internal notes.

Out of scope / maintainer follow-ups

  • Rotate the local dev credentials in .env (never committed; gitignored) as routine hygiene.
  • Run mix hex.publish when ready — this PR makes the package build cleanly (mix hex.build verified), it does not publish.
  • Internal notes / third-party critique live under a gitignored scratch/ and are not part of this diff.

🤖 Generated with Claude Code

https://claude.ai/code/session_01YVcfxkfN7A7KYLmNYt9UcH

ivarvong and others added 13 commits July 1, 2026 09:21
Adopt 1.20's explicit pin-operator requirement for computed bitstring sizes (`binary-size(^n)`) across the object, pack, pkt-line, and workspace modules, and bump .tool-versions to 1.20.0-rc.3. The mix.exs floor stays at `~> 1.17`: pinned bitstring sizes have been valid since 1.14, so the minimum-supported version is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01YVcfxkfN7A7KYLmNYt9UcH
Add `Exgit.FS.size/3` and the `Exgit.ObjectStore.object_size/2` protocol callback so callers can learn a blob's size WITHOUT materializing it — the guard an agent needs before pulling a large file into memory. Memory keeps a parallel `sha => size` index (O(1), no decompression); Disk inflates only the loose-object header; Promisor answers from cache or returns `{:error, :not_local}` without triggering a fetch. `FS.size/3` resolves the path (trees only, on a lazy clone) and never fetches the blob.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01YVcfxkfN7A7KYLmNYt9UcH
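
As a rough illustration of the header-only idea in the commit above: an inflated git loose object begins with `<type> <size>\0`, so the size can be read from the first few inflated bytes without touching the content. The sketch below parses such a prefix. `LooseHeaderSketch` is a hypothetical name, and how exgit actually streams and bounds the inflation is not shown here.

```elixir
defmodule LooseHeaderSketch do
  # Hypothetical illustration; not exgit's Disk.object_size/2.
  # Input: the first inflated bytes of a loose object, e.g. "blob 11\0hello...".
  @spec parse(binary()) :: {:ok, {String.t(), non_neg_integer()}} | {:error, :bad_header}
  def parse(prefix) when is_binary(prefix) do
    with [header, _content] <- :binary.split(prefix, <<0>>),
         [type, size_str] <- String.split(header, " ", parts: 2),
         {size, ""} <- Integer.parse(size_str),
         true <- size >= 0 do
      {:ok, {type, size}}
    else
      # No NUL terminator, no space, or a non-numeric size.
      _ -> {:error, :bad_header}
    end
  end
end
```

A caller could compare the returned size against its own memory budget before deciding to read the blob.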
Two hardening fixes on the HTTP transport:

- A token embedded in a remote URL (`https://token@host/…`) is now redacted to `***` before the URL enters any :telemetry span metadata (ls_refs/fetch/push) or the ref_rejected security event. Previously such a token could reach telemetry exporters and log aggregators.
- ls-refs responses are capped at 1,000,000 refs; the transport stream halts once the cap trips, so a hostile server can no longer stream unbounded refs into client memory.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01YVcfxkfN7A7KYLmNYt9UcH
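
The URL redaction described in the commit above might look roughly like the following, using only the standard `URI` module. `RedactSketch.redact_url/1` is a hypothetical helper and not exgit's actual function; it assumes the credential sits in the userinfo part of the URL.

```elixir
defmodule RedactSketch do
  # Hypothetical helper; not exgit's implementation.
  # Replaces any userinfo ("token" or "user:pass") with "***" before the
  # URL is attached to telemetry metadata or log output.
  def redact_url(url) when is_binary(url) do
    case URI.parse(url) do
      %URI{userinfo: nil} -> url
      %URI{} = uri -> URI.to_string(%{uri | userinfo: "***"})
    end
  end
end
```

Redacting at the point where metadata is built, rather than in each exporter, keeps the token out of every downstream consumer.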
Under the 1.20 toolchain, two MapSet opacity false-positives surfaced. Replace the internal `seen` membership set in the reachability walk with a plain map (not opaque, semantically identical, marginally faster), and fold the registry's EXIT cleanup with MapSet.delete instead of MapSet.difference (which trips the warning on the untyped GenServer state). Dialyzer is now clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01YVcfxkfN7A7KYLmNYt9UcH
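
A minimal sketch of a reachability walk that tracks visited nodes in a plain map rather than a `MapSet`, in the spirit of the change described above. The graph shape (a map from node to parent list) and the module name are assumptions for illustration; exgit's real walk operates on git objects.

```elixir
defmodule ReachabilitySketch do
  # Hypothetical illustration; the graph is a map of node => list of parents.
  def reachable(graph, roots) do
    roots |> walk(graph, %{}) |> Map.keys() |> Enum.sort()
  end

  defp walk([], _graph, seen), do: seen

  # Already visited: a plain-map membership test via the is_map_key/2 guard.
  defp walk([node | rest], graph, seen) when is_map_key(seen, node) do
    walk(rest, graph, seen)
  end

  defp walk([node | rest], graph, seen) do
    parents = Map.get(graph, node, [])
    walk(parents ++ rest, graph, Map.put(seen, node, true))
  end
end
```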
Ready the package for its first hex.pm release:

- Depend on the published `vfs ~> 0.1.0` (was a git ref) and add ex_doc so `mix hex.publish` can build HexDocs; ship docs/PERFORMANCE.md so the README link resolves on hex.
- Add a consumer-install smoke test (scripts/consumer_smoke.sh + a CI job) that builds the package as it ships, depends on it from a throwaway :prod project, and clones+reads a public repo — catching missing files() entries and dev/test compile-coupling the in-repo suite can't see.
- README: flag Exgit.Index as experimental. CHANGELOG: record the size-aware and security work, and note the vfs integration is pre-release. gitignore the local scratch/ and hex.build tarball.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01YVcfxkfN7A7KYLmNYt9UcH
Bootstrap-clone and first-touch-latency benchmark across exgit's clone modes against CF Artifacts. Tagged :cloudflare_perf and excluded by default (and when CF creds are absent), so it never runs in ordinary CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01YVcfxkfN7A7KYLmNYt9UcH
Promote newest-stable Elixir 1.20 on OTP 29 to primary (owns format, credo, dialyzer, integration gates); keep 1.19/OTP-28 as a compile+test smoke and 1.17/OTP-27 as the minimum-supported tier. Elixir 1.20 supports OTP 27-29, so the pair resolves cleanly in setup-beam.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01YVcfxkfN7A7KYLmNYt9UcH
Move .tool-versions off the 1.20.0-rc.3 release candidate to stable 1.20.2 on OTP 29 — matching the CI primary tier. Verified the full gauntlet on this exact combo locally: format, credo --strict, dialyzer (0 errors, PLT rebuilt for OTP 29), and 702 tests + 52 properties all green. Also drop a dead `mean([])` clause in the CF perf test that 1.20.2's stricter unused-clause analysis flagged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01YVcfxkfN7A7KYLmNYt9UcH
Upgrade all dependencies to latest allowed by the mix.exs constraints. Notably req 0.5.17 -> 0.6.2 and mint 1.7.1 -> 1.9.0, which resolve the HIGH decompression-bomb DoS (CVE-2026-49755), a multipart header-injection (CVE-2026-49756), and the mint HTTP/2 CONTINUATION flood (CVE-2026-49754) in the production HTTP path. req 0.6 is a 0.x-minor bump; the full suite — including the cross-origin credential-leak tests SECURITY.md calls out — passes, and dialyzer is clean. The only advisories mix hex.audit still reports are in cowlib, reached solely via the only: :test bypass dep, so they never ship in the package or reach a consumer's runtime; cowlib 2.17.1 is already the latest.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01YVcfxkfN7A7KYLmNYt9UcH
Review findings for 0.1.0:

- PktLine.decode_all/decode_stream return
  {:error, {:malformed_pkt_line, snippet}} instead of raising, so
  Transport.HTTP.capabilities/1 and push/4 no longer crash on
  hostile or broken server bytes ({:error, {:malformed_response, _}}).
- PktLine.encode/1 raises ArgumentError past git's 65516-byte pkt
  payload max instead of silently emitting corrupt framing.
- Stream finalization checks accumulated domain errors before the
  decoder's truncation artifact, so the ls-refs ref-cap
  ({:too_many_refs, cap}), sideband channel-3 errors, and
  StreamParser rejections surface instead of {:truncated, n}.
- do_fetch propagates non-binary StreamParser errors.
- New :max_refs option on Transport.HTTP.new/2 (default 1,000,000)
  makes the ls-refs cap tunable and testable; cap + boundary now
  covered by chunked-streaming regression tests.
- Push-span and ref_rejected credential redaction now tested.
- @doc/@spec on capabilities/ls_refs/fetch/push.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VEo6srWr3aFwhtwaXDTCZZ
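
The ref cap described above can be sketched as a reduction that halts as soon as the limit trips, so a stream of unbounded length is never fully consumed. `RefCapSketch.collect/2` is a hypothetical function; only the `{:too_many_refs, cap}` error shape is taken from the commit message.

```elixir
defmodule RefCapSketch do
  # Hypothetical illustration; not exgit's transport code.
  # Collects refs from a (possibly infinite) stream, stopping at max_refs.
  def collect(stream, max_refs) when is_integer(max_refs) and max_refs >= 0 do
    result =
      Enum.reduce_while(stream, {0, []}, fn ref, {count, acc} ->
        if count >= max_refs do
          # Halting here stops pulling from the underlying stream.
          {:halt, {:error, {:too_many_refs, max_refs}}}
        else
          {:cont, {count + 1, [ref | acc]}}
        end
      end)

    case result do
      {:error, _reason} = error -> error
      {_count, acc} -> {:ok, Enum.reverse(acc)}
    end
  end
end
```

Making the cap an argument, as the `:max_refs` option does, lets tests exercise the boundary with a handful of refs instead of a million.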
…ity, size-index sync

Review findings for 0.1.0:

- Tree.canonical_mode/2 no longer coerces zero-padded directory/
  symlink/gitlink modes (040000, 0120000, 0160000) into blob modes;
  canonicalization now dispatches on the file-type bits. Regression
  SHAs pinned against real git.
- Tag.decode/1 preserves unknown headers and continuation lines in
  order (headers list like Commit), so decode |> encode is byte-exact
  and re-encoding cannot change a tag's SHA. Accessors type/1, tag/1,
  tagger/1 replace the removed struct fields.
- Memory.delete_object/2 keeps the objects and sizes indexes in
  lockstep; promisor cache eviction uses it instead of reaching into
  Memory internals, fixing stale sizes for evicted objects.
- Disk.object_size/2 streams loose-object headers in bounded chunks
  (constant memory AND constant I/O) instead of File.read on the
  whole compressed file; packed fallback measures the inflated
  content directly. Truncated/corrupt/oversized-header inputs now
  covered by tests, as is the packed path.
- object_size instrumented with the [:exgit, :object_store,
  :object_size] telemetry span in all three defimpls; protocol
  contract promoted from a code comment to real @doc.
- Repository.memory_report/1 reports :unknown (per its doc) for
  non-introspected backends instead of a misleading 0.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VEo6srWr3aFwhtwaXDTCZZ
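
To make the "dispatch on the file-type bits" point concrete, here is a small sketch of mode canonicalization that masks the type bits first and only then looks at permissions. `ModeSketch` and its one-argument functions are hypothetical (exgit's `Tree.canonical_mode/2` has a different arity); the blob rule follows git's convention of 100755 when the owner-execute bit is set and 100644 otherwise.

```elixir
defmodule ModeSketch do
  # Hypothetical illustration; not exgit's Tree.canonical_mode/2.
  import Bitwise

  # The upper bits of a mode encode the entry type.
  @type_mask 0o170000

  def canonical(mode) when is_integer(mode) do
    case band(mode, @type_mask) do
      0o040000 -> {:ok, 0o040000}
      0o120000 -> {:ok, 0o120000}
      0o160000 -> {:ok, 0o160000}
      0o100000 -> {:ok, if(band(mode, 0o100) != 0, do: 0o100755, else: 0o100644)}
      _ -> {:error, :unknown_mode}
    end
  end

  # Zero-padded text such as "040000" parses to the same integer as "40000".
  def parse(octal) when is_binary(octal) do
    case Integer.parse(octal, 8) do
      {mode, ""} -> canonical(mode)
      _ -> {:error, :unknown_mode}
    end
  end
end
```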
…sistency

Review findings for 0.1.0:

- push/3 defaults to HEAD's symref target instead of hardcoded
  refs/heads/main; a repo on another branch no longer silently
  pushes nothing ({:error, :no_refspecs} when HEAD is detached,
  {:error, {:local_ref_not_found, name}} for unresolvable named
  refspecs).
- String.replace_prefix replaces misused String.trim_leading in
  tracking-ref mapping and file:// URL handling, so a branch
  literally named refs/heads/foo maps to the right remote ref.
- RepoHandle.fetch_once/4 monitors its fetch task: a task that dies
  via throw/exit/kill now fails waiters with {:error,
  {:fetch_crashed, reason}} and clears the in-flight entry instead
  of poisoning the key forever; safe_fetch catches all three kinds.
- RepoHandle.get/1 renamed to fetch!/1, restoring the
  get/fetch/fetch! convention. fetch_once @spec covers its timeout.
- FS.glob/3 returns the plain sorted list (no error path existed);
  :resolve_lfs_pointers renamed to :detect_lfs_pointers (it never
  fetched LFS content); gitlink entries short-circuit as
  {:error, :submodule} in read_path/size and stat as
  %{type: :submodule} instead of doomed object lookups.
- init/1 wraps disk failures as {:error, {:init_failed, reason}};
  @doc added to init/open/fetch; shared remote() type across
  clone/fetch/push specs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VEo6srWr3aFwhtwaXDTCZZ
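
The `String.trim_leading` pitfall mentioned above is easy to reproduce: `trim_leading/2` strips every repeated leading occurrence of its argument, while `replace_prefix/3` removes at most one. The sketch below uses hypothetical function names and an assumed `refs/remotes/<remote>/` layout to show the difference for a branch literally named `refs/heads/foo`.

```elixir
defmodule TrackingRefSketch do
  # Hypothetical illustration; not exgit's tracking-ref mapping.
  @prefix "refs/heads/"

  # Buggy: trim_leading removes *all* leading repetitions of the prefix.
  def buggy(ref, remote) do
    "refs/remotes/#{remote}/" <> String.trim_leading(ref, @prefix)
  end

  # Fixed: replace_prefix removes the prefix once, if present.
  def fixed(ref, remote) do
    "refs/remotes/#{remote}/" <> String.replace_prefix(ref, @prefix, "")
  end
end
```

For ordinary branch names the two agree, which is why a bug of this kind can go unnoticed until a ref name happens to repeat the prefix.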
…ease

- Fold [Unreleased] into the [0.1.0] entry dated 2026-07-01 — nothing
  was ever published, so the first hex release carries it all.
- README: real test counts (870 across default/slow/real_git), the
  actual three-tier CI matrix, and the consumer-install smoke job.
- mix.exs: docs/PERFORMANCE.md added to ex_doc extras so the README
  link resolves on hexdocs.pm instead of 404ing; stale CI-matrix
  comment corrected.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VEo6srWr3aFwhtwaXDTCZZ
@ivarvong ivarvong merged commit 768bbc1 into main Jul 1, 2026
4 checks passed