feat(tune): stream generate_lora output via generate_streaming by ohdearquant · Pull Request #198 · ohdearquant/lattice

ohdearquant · 2026-06-22T18:23:46Z

What

Stream tokens from generate_lora via generate_streaming, with sampler controls.

Why

generate_lora (the LoRA-adapter generation binary) previously buffered the full output. It now uses the streaming entry point from PR-1 so adapter-driven generation streams token-by-token like chat_metal.
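As a rough illustration of the buffered-versus-streamed difference described above, here is a minimal sketch. The PR does not show the real `generate_streaming` signature, so the function below is a hypothetical stand-in that takes pre-decoded pieces and a per-token callback; `stream_to` is likewise an invented helper name.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for the streaming entry point. The real
/// `generate_streaming` signature is not shown in this PR; this shape is assumed.
fn generate_streaming<F>(pieces: &[&str], mut on_token: F) -> String
where
    F: FnMut(&str),
{
    let mut full = String::new();
    for &piece in pieces {
        // Hand each decoded piece to the caller as soon as it is available.
        on_token(piece);
        full.push_str(piece);
    }
    full
}

/// Writes every piece to `out` and flushes after each one, so the text
/// appears token by token instead of after the whole generation finishes.
fn stream_to<W: Write>(pieces: &[&str], out: &mut W) -> io::Result<String> {
    let mut err: Option<io::Error> = None;
    let full = generate_streaming(pieces, |piece| {
        // Remember the first I/O error and skip further writes after it.
        if err.is_none() {
            err = out
                .write_all(piece.as_bytes())
                .and_then(|_| out.flush())
                .err();
        }
    });
    match err {
        Some(e) => Err(e),
        None => Ok(full),
    }
}
```

Passing `&mut std::io::stdout()` as `out` would print pieces as they arrive; the per-token flush is what makes the output visibly incremental on a line-buffered terminal.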

Files

crates/tune/src/bin/generate_lora.rs (+119/-24)

Verification

`cargo build --release -p lattice-tune --bin generate_lora --features safetensors,inference-hook` builds clean. This binary's compile break (E0599: generate_streaming not in scope) was one of the two regressions that motivated the PR-G gate. It built green in the integrated-tree gate.

Base

Stacked on pr/eng-1-streaming-detok (depends on generate_streaming). Review/merge PR-1 first.

Series

Part of the PR #193 engine-slice (finest split). All engine code lands on main; the macOS app surfaces a subset (Models + Chat) for v0.0.1.

Add token-by-token streaming via generate_streaming plus sampler controls to the LoRA generation CLI, mirroring chat_metal. Used by the Lattice Studio A/B compare surface.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
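The description does not list which sampler controls the CLI exposes. As a sketch only, assuming temperature and top-k (two common controls), the following shows how such settings might reshape a logit vector into sampling probabilities. `SamplerConfig`, its fields, and `token_probs` are hypothetical names, not taken from the lattice codebase.

```rust
/// Hypothetical sampler settings; the field names are assumptions.
struct SamplerConfig {
    /// 0.0 (or below) means greedy decoding.
    temperature: f32,
    /// Keep only the `top_k` highest logits; 0 disables the filter.
    top_k: usize,
}

/// Returns a probability per token id after top-k filtering and
/// temperature scaling. Filtered-out tokens get probability 0.
fn token_probs(logits: &[f32], cfg: &SamplerConfig) -> Vec<f32> {
    // Token ids sorted by descending logit.
    let mut order: Vec<usize> = (0..logits.len()).collect();
    order.sort_by(|&a, &b| logits[b].total_cmp(&logits[a]));

    let k = if cfg.top_k == 0 {
        logits.len()
    } else {
        cfg.top_k.min(logits.len())
    };
    let kept = &order[..k];

    let mut probs = vec![0.0; logits.len()];
    if kept.is_empty() {
        return probs;
    }
    if cfg.temperature <= 0.0 {
        // Greedy: all probability mass on the best token.
        probs[kept[0]] = 1.0;
        return probs;
    }

    // Softmax over the kept logits, shifted by the max for numerical stability.
    let max = logits[kept[0]];
    let mut sum = 0.0;
    for &i in kept {
        let w = ((logits[i] - max) / cfg.temperature).exp();
        probs[i] = w;
        sum += w;
    }
    for &i in kept {
        probs[i] /= sum;
    }
    probs
}
```

Lower temperatures concentrate probability on the top token, while `top_k` caps how many candidates can be drawn at all; a real sampler would then draw a token id from the returned distribution.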

github-actions · 2026-06-22T19:56:53Z

E2E Parity Report

PASS: all 3 prompts match within first 3 tokens

| Prompt | Agreement | First Diff | HF tok/s | Lattice tok/s | Verdict |
| --- | --- | --- | --- | --- | --- |
| `The capital of France is` | 3/15 | pos 3 | 0.5 | 4.2 | PASS |
| `In the year 2024, artificial intelligence` | 10/15 | pos 9 | 0.4 | 3.7 | PASS |
| `def fibonacci(n): if n <= 1: return n return` | 15/15 | none | 0.4 | 2.9 | PASS |

The capital of France is

HF: Paris.
The capital of France is Paris.
The capital of France
Lattice: Paris.
A: Yes, the capital of France is Paris.

In the year 2024, artificial intelligence

HF: (AI) has become a significant part of the global economy. It is
Lattice: (AI) has become a significant part of our daily lives. From personal

def fibonacci(n): if n <= 1: return n return

HF: fibonacci(n-1) + fibonacci(n-2)

print(fib

Lattice: fibonacci(n-1) + fibonacci(n-2)

print(fib
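The Agreement and First Diff columns in the parity table above can be read as a position-by-position comparison of the two token sequences. The sketch below is one way to compute them, not the bot's actual implementation. It assumes 0-based positions (which fits the 3/15 row with its first diff at pos 3) and a pass rule requiring the first three tokens to match; `Parity`, `compare_tokens`, and `passes` are hypothetical names.

```rust
/// Result of comparing two token-id sequences position by position.
#[derive(Debug, PartialEq)]
struct Parity {
    /// Number of positions where both sequences hold the same token.
    agreement: usize,
    /// Number of positions compared (length of the shorter sequence).
    compared: usize,
    /// 0-based index of the first mismatch, or None if all positions agree.
    first_diff: Option<usize>,
}

fn compare_tokens(reference: &[u32], candidate: &[u32]) -> Parity {
    let compared = reference.len().min(candidate.len());
    let mut agreement = 0;
    let mut first_diff = None;
    for i in 0..compared {
        if reference[i] == candidate[i] {
            agreement += 1;
        } else if first_diff.is_none() {
            first_diff = Some(i);
        }
    }
    Parity { agreement, compared, first_diff }
}

/// Assumed pass rule: no mismatch before position `min_prefix`.
fn passes(p: &Parity, min_prefix: usize) -> bool {
    p.first_diff.map_or(true, |pos| pos >= min_prefix)
}
```

Under this reading, a sequence can agree at later positions after the first divergence, which is how a row can show 10/15 agreement with a first diff at pos 9.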

ohdearquant changed the base branch from pr/eng-1-streaming-detok to main June 22, 2026 19:50

ohdearquant closed this Jun 22, 2026

ohdearquant reopened this Jun 22, 2026

Merge branch 'main' into pr/eng-2b-generate-lora

dd64b54

ohdearquant enabled auto-merge June 22, 2026 23:31

ohdearquant merged commit d473231 into main Jun 22, 2026
10 checks passed

ohdearquant deleted the pr/eng-2b-generate-lora branch June 22, 2026 23:45

ohdearquant merged 2 commits into main from pr/eng-2b-generate-lora
