feat(tune): stream generate_lora output via generate_streaming #198

Merged
ohdearquant merged 2 commits into main from pr/eng-2b-generate-lora on Jun 22, 2026
@ohdearquant

Owner

What

Stream tokens from generate_lora via generate_streaming, with sampler controls.

Why

generate_lora (the LoRA-adapter generation binary) previously buffered the full output. It now uses the streaming entry point from PR-1 so adapter-driven generation streams token-by-token like chat_metal.
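To make the buffered-versus-streaming difference concrete, here is a minimal sketch of a callback-driven streaming loop. It is not the code from this PR: the `SamplerConfig` fields, the callback signature, and the body of `generate_streaming` below are all assumptions made for illustration, and the stand-in "model" simply replays words from the prompt.

```rust
use std::io::{self, Write};

/// Hypothetical sampler controls; the real flag and field names are not shown in this PR.
#[allow(dead_code)] // the stub below only reads `max_tokens`
#[derive(Debug, Clone)]
struct SamplerConfig {
    temperature: f32,
    top_k: usize,
    max_tokens: usize,
}

/// Stand-in for the engine's streaming entry point (actual signature unknown).
/// It calls `on_token` once per generated piece and returns the number of pieces.
fn generate_streaming<F>(prompt: &str, cfg: &SamplerConfig, mut on_token: F) -> usize
where
    F: FnMut(&str),
{
    let mut emitted = 0;
    // Placeholder "model": replays whitespace-separated words of the prompt.
    for piece in prompt.split_whitespace().take(cfg.max_tokens) {
        on_token(piece);
        emitted += 1;
    }
    emitted
}

fn main() {
    let cfg = SamplerConfig { temperature: 0.7, top_k: 40, max_tokens: 4 };
    let stdout = io::stdout();
    let mut out = stdout.lock();

    let n = generate_streaming("the capital of France is Paris", &cfg, |tok| {
        // Write and flush per token so text appears as it is produced,
        // instead of after the whole completion has been buffered.
        write!(out, "{tok} ").unwrap();
        out.flush().unwrap();
    });

    writeln!(out).unwrap();
    eprintln!("streamed {n} tokens");
}
```

The per-token flush is what makes the output visibly incremental; a buffered variant would collect the pieces into a `String` and print once at the end.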

Files

  • crates/tune/src/bin/generate_lora.rs (+119/-24)

Verification

cargo build --release -p lattice-tune --bin generate_lora --features safetensors,inference-hook completes cleanly. This binary's compile break (E0599: generate_streaming not in scope) was one of the two regressions that motivated the PR-G gate. It built green in the integrated-tree gate.

Base

Stacked on pr/eng-1-streaming-detok (depends on generate_streaming). Review/merge PR-1 first.

Series

Part of the PR #193 engine-slice (finest split). All engine code lands on main; the macOS app surfaces a subset (Models + Chat) for v0.0.1.

Add token-by-token streaming via generate_streaming plus sampler controls
to the LoRA generation CLI, mirroring chat_metal. Used by the Lattice Studio
A/B compare surface.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ohdearquant ohdearquant changed the base branch from pr/eng-1-streaming-detok to main June 22, 2026 19:50
@ohdearquant ohdearquant reopened this Jun 22, 2026
@github-actions


E2E Parity Report

PASS: all 3 prompts match within first 3 tokens

| Prompt | Agreement | First Diff | HF tok/s | Lattice tok/s | Verdict |
|---|---|---|---|---|---|
| The capital of France is | 3/15 | pos 3 | 0.5 | 4.2 | PASS |
| In the year 2024, artificial intelligence | 10/15 | pos 9 | 0.4 | 3.7 | PASS |
| `def fibonacci(n): if n <= 1: return n return` | 15/15 | none | 0.4 | 2.9 | PASS |

The capital of France is

  • HF: Paris.
    The capital of France is Paris.
    The capital of France
  • Lattice: Paris.
    A: Yes, the capital of France is Paris.

In the year 2024, artificial intelligence

  • HF: (AI) has become a significant part of the global economy. It is
  • Lattice: (AI) has become a significant part of our daily lives. From personal

def fibonacci(n): if n <= 1: return n return

  • HF: fibonacci(n-1) + fibonacci(n-2)

print(fib

  • Lattice: fibonacci(n-1) + fibonacci(n-2)

print(fib
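The Agreement and First Diff columns in the table above are consistent with a position-wise comparison of the two token sequences: Agreement counts positions where both outputs hold the same token, and First Diff is the zero-based index of the first mismatch. That reading is inferred from the reported numbers, not taken from the parity script itself, so the sketch below is an assumption; `compare`, `passes`, and the token IDs are hypothetical.

```rust
/// Position-wise comparison of two token-id sequences.
#[derive(Debug, PartialEq)]
struct Parity {
    agreement: usize,          // positions where both sequences hold the same token
    total: usize,              // number of positions compared
    first_diff: Option<usize>, // zero-based index of the first mismatch, if any
}

/// Compare the overlapping prefix of `reference` and `candidate`.
fn compare(reference: &[u32], candidate: &[u32]) -> Parity {
    let total = reference.len().min(candidate.len());
    let mut agreement = 0;
    let mut first_diff = None;
    for i in 0..total {
        if reference[i] == candidate[i] {
            agreement += 1;
        } else if first_diff.is_none() {
            first_diff = Some(i);
        }
    }
    Parity { agreement, total, first_diff }
}

/// Assumed pass rule: the first `min_prefix` tokens must match.
fn passes(p: &Parity, min_prefix: usize) -> bool {
    p.first_diff.map_or(true, |pos| pos >= min_prefix)
}

fn main() {
    // Made-up token ids, not taken from the report.
    let hf: [u32; 5] = [1, 2, 3, 4, 5];
    let lattice: [u32; 5] = [1, 2, 3, 9, 5];
    let p = compare(&hf, &lattice);
    println!(
        "agreement {}/{} first diff {:?} pass {}",
        p.agreement,
        p.total,
        p.first_diff,
        passes(&p, 3)
    );
}
```

Under this reading, a row such as 10/15 with First Diff at pos 9 means nine matching tokens, a divergence at index 9, and one later position that happens to agree again.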

@ohdearquant ohdearquant enabled auto-merge June 22, 2026 23:31
@ohdearquant ohdearquant merged commit d473231 into main Jun 22, 2026
10 checks passed
@ohdearquant ohdearquant deleted the pr/eng-2b-generate-lora branch June 22, 2026 23:45