feat(inference): QuaRot on-disk byte accounting + dry-run quant event #201

Merged: ohdearquant merged 3 commits into main from pr/eng-5-quarot-bytes on Jun 22, 2026
ohdearquant commented Jun 22, 2026

What

QuaRot quantization: on-disk SafetensorsFile byte accounting, a GB size report, and a dry-run quant_done event with an honest-nil ratio.

Why

The Quantize surface reports how large the quantized artifact is and whether a dry run would shrink it. Byte accounting reads the on-disk SafetensorsFile rather than estimating. When a real ratio is not computable (dry run), the event omits it rather than fabricating a number (honest-nil).
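
The honest-nil behaviour can be sketched as an optional ratio that is left out of the event rather than zero-filled. This is a minimal sketch, not the PR's code: the function names, the JSON field names, the direction of the ratio (input over output), and the decimal GB definition are all assumptions made for illustration.

```rust
/// Bytes to decimal gigabytes (1 GB = 10^9 bytes).
/// Assumption: the real report might use GiB (2^30) instead.
pub fn to_gb(bytes: u64) -> f64 {
    bytes as f64 / 1e9
}

/// Input-over-output size ratio, or `None` when it cannot be computed
/// (for example when the output size is unknown and reported as 0).
pub fn compression_ratio(bytes_in: u64, bytes_out: u64) -> Option<f64> {
    if bytes_in == 0 || bytes_out == 0 {
        return None;
    }
    Some(bytes_in as f64 / bytes_out as f64)
}

/// Hypothetical `quant_done` payload: the `ratio` key is omitted entirely
/// when no real ratio exists, instead of emitting a made-up number.
pub fn quant_done_json(bytes_in: u64, bytes_out: u64) -> String {
    let mut fields = vec![
        "\"event\":\"quant_done\"".to_string(),
        format!("\"bytes_in\":{bytes_in}"),
        format!("\"bytes_out\":{bytes_out}"),
    ];
    if let Some(r) = compression_ratio(bytes_in, bytes_out) {
        fields.push(format!("\"ratio\":{r:.3}"));
    }
    format!("{{{}}}", fields.join(","))
}
```

A consumer of such an event would treat a missing `ratio` as "unknown" and should not default it to 1.0 or 0.0.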

Files

  • crates/inference/src/quant/quarot/convert.rs
  • crates/inference/src/quant/quarot/io.rs
  • crates/inference/src/bin/quantize_quarot.rs

(+95/-15 across the three)

Verification

`cargo build --release -p lattice-inference --bin quantize_quarot` builds clean, and the build was green in the integrated-tree gate.

Bench

QuaRot conversion is not on the decode hot path, and no Criterion harness covers it. The comparator in `make bench-compare` errored while assembling the delta (a known two-worktree fragility; the base benches ran clean). The change is bench-neutral by construction.

Series

Part of the PR #193 engine-slice (finest split). All engine code lands on main; the macOS app surfaces a subset (Models + Chat) for v0.0.1.

…event

Account for actual on-disk SafetensorsFile byte sizes during QuaRot
conversion and report compression in GB, and emit a quant_done event
(honest-nil ratio when unknown) for the Lattice Studio quantize surface.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions Bot commented Jun 22, 2026

E2E Parity Report

PASS: all 3 prompts match within first 3 tokens

| Prompt | Agreement | First Diff | HF tok/s | Lattice tok/s | Verdict |
|---|---|---|---|---|---|
| The capital of France is | 3/15 | pos 3 | 0.5 | 2.7 | PASS |
| In the year 2024, artificial intelligence | 10/15 | pos 9 | 0.4 | 0.9 | PASS |
| `def fibonacci(n): if n <= 1: return n return` | 15/15 | none | 0.3 | 1.4 | PASS |

The capital of France is

  • HF: Paris.
    The capital of France is Paris.
    The capital of France
  • Lattice: Paris.
    A: Yes, the capital of France is Paris.

In the year 2024, artificial intelligence

  • HF: (AI) has become a significant part of the global economy. It is
  • Lattice: (AI) has become a significant part of our daily lives. From personal

def fibonacci(n): if n <= 1: return n return

  • HF: fibonacci(n-1) + fibonacci(n-2)

print(fib

  • Lattice: fibonacci(n-1) + fibonacci(n-2)

print(fib

ohdearquant and others added 2 commits June 22, 2026 15:38
Dry-run returned total_bytes_out = 0 (and zero tensor counts), so the
Studio could not preview the compression ratio before committing a write.
Remove the early-return and run the full tensor loop in both modes,
computing each tensor's footprint from shape/numel with the same formula
the writer applies (Q4: header + data.len().div_ceil(32)*20; f16: header +
numel*2) and gating only the disk writes, dir creation, and index/config
emission on !dry_run. Dry-run now reports the identical planned_quantized,
kept_f16, and total_bytes_out a real write produces.

Adds parity tests (tied + untied) asserting dry == real, plus a
non-circular guard that sums the actual on-disk .q4/.f16 file sizes and
asserts they equal the reported total — catching future drift between the
byte formula and write_f16_file / save_q4_file.
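
A non-circular guard of this kind can be sketched by measuring file sizes through the filesystem instead of reusing the formula. The helper below is illustrative only: `on_disk_total` is a hypothetical name, and it assumes the `.q4` and `.f16` files sit directly in one output directory (no recursion).

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Sums the sizes of all `.q4` and `.f16` files directly inside `dir`,
/// as reported by filesystem metadata. Other files (index, config) are skipped.
pub fn on_disk_total(dir: &Path) -> io::Result<u64> {
    let mut total = 0u64;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let ext = path.extension().and_then(|e| e.to_str());
        if matches!(ext, Some("q4") | Some("f16")) {
            total += fs::metadata(&path)?.len();
        }
    }
    Ok(total)
}
```

A test would then assert that this measured total equals the reported `total_bytes_out`, so a change to the writers that is not mirrored in the byte formula fails loudly.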

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ohdearquant ohdearquant enabled auto-merge (squash) June 22, 2026 23:21
@ohdearquant ohdearquant merged commit dc059e8 into main Jun 22, 2026
10 checks passed
@ohdearquant ohdearquant deleted the pr/eng-5-quarot-bytes branch June 22, 2026 23:26