WIP: Tinker FT: per-step metrics logging (local jsonl + optional W&B) + load_run helper by Butanium · Pull Request #73 · johny-b/llmcomp

Butanium · 2026-05-06T05:57:00Z

⚠️ NOT YET REVIEWED BY ME (@Butanium). This branch was written by Claude (Anthropic's Claude Code agent) on my machine. I'm opening this as a draft so I can self-review before any merge. Treat my participation as authorial-but-unverified until I leave a self-review on the diff. (I tried adding myself as assignee/reviewer but the upstream rejected it — I'm not a collaborator. Self-review will land as a normal review comment thread.)

Motivation

llmcomp's Tinker finetuning loop currently print(...)s the per-step loss and nothing else. There is no way to recover the training curve after a run completes — no metrics.jsonl, no W&B integration, and tinker.RestClient.get_training_run does not return any per-step events. So if you launch a run and your shell scrollback rolls over, the curve is gone.

What this changes

llmcomp/finetuning/tinker_finetune.py:

- Per-step local logging (always on) to $XDG_CACHE_HOME/llmcomp/<sanitised-training_run_id>/ (default ~/.cache/llmcomp/...):
  - metrics.jsonl (one row per step): {step, epoch, loss, lr, elapsed_s, wall_time}
  - params.json: full TinkerTrainingParams dump (api_key stripped) + resolved seed + training_run_id
  - summary.json: status, final_loss, total_steps, final_path, checkpoints
- Optional W&B integration gated on three new fields on TinkerTrainingParams:
  - enable_wandb: bool = False
  - wandb_run_name: str | None = None (default: params.suffix)
  - wandb_tags: list[str] | None = None
  - project / entity are intentionally not exposed; wandb.init() resolves them from WANDB_PROJECT / WANDB_ENTITY. enable_wandb=True with wandb not installed raises ImportError; there is no silent fallback.
- load_run(key, *, data_dir="llmcomp_models") convenience function returning a RunInfo(cache_dir, run_id, metrics: pd.DataFrame, params, summary). key accepts a training_run_id (<uuid>:train:N), a tinker:// URI, or a suffix (resolved via <data_dir>/tinker_models.jsonl; newest non-checkpoint match wins). Exported alongside FinetuningManager from llmcomp.finetuning.
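
For readers who want to picture the W&B gating, here is a minimal sketch. It is not the code from this branch: `WandbOptions` and `maybe_init_wandb` are hypothetical stand-ins for the three new TinkerTrainingParams fields and for whatever the training loop actually calls. The only real library call is `wandb.init(name=..., tags=...)`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WandbOptions:
    """Hypothetical stand-in for the three W&B fields on TinkerTrainingParams."""

    enable_wandb: bool = False
    wandb_run_name: str | None = None
    wandb_tags: list[str] | None = None


def maybe_init_wandb(opts: WandbOptions, suffix: str):
    """Start a W&B run only when explicitly enabled; return None otherwise."""
    if not opts.enable_wandb:
        return None
    # Imported lazily so users who never enable W&B do not need it installed.
    # If it is missing, the ImportError propagates: no silent fallback.
    import wandb

    # project / entity are deliberately not passed; wandb resolves them
    # from the WANDB_PROJECT / WANDB_ENTITY environment variables.
    return wandb.init(name=opts.wandb_run_name or suffix, tags=opts.wandb_tags)
```

The lazy import keeps the default path free of any W&B dependency, which matches the "always-on local logging, opt-in W&B" split described in the list.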

The cache key is training_client.model_id (the Tinker training_run_id) rather than the user-supplied suffix. It is globally unique, it is the same id that RestClient queries use, and re-submitting the same suffix never collides.
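
As an illustration of how a training_run_id could map to a cache directory, here is a small sketch. The sanitisation rule (replace anything outside `[A-Za-z0-9._-]` with `_`) is an assumption; the PR only says the id is "sanitised". `sanitise_run_id` and `run_cache_dir` are hypothetical names.

```python
import os
import re
from pathlib import Path


def sanitise_run_id(run_id: str) -> str:
    """Make a run id safe as a directory name (assumed rule, not the PR's)."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", run_id)
    # Reject empty names and the special entries "." and "..".
    if not cleaned.strip("."):
        raise ValueError(f"run id {run_id!r} has no usable characters")
    return cleaned


def run_cache_dir(run_id: str) -> Path:
    """Return $XDG_CACHE_HOME/llmcomp/<sanitised id>, defaulting to ~/.cache."""
    base = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / "llmcomp" / sanitise_run_id(run_id)
```

With this rule, an id of the form `<uuid>:train:0` becomes `<uuid>_train_0`, so two runs that share a suffix but have different run ids land in different directories.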

Verification

Smoke-tested with a real Qwen3.5-4B LoRA training run (64 toy chat samples, batch=32, 1 epoch → 2 steps). All three artifacts materialised; load_run resolved correctly via run_id, tinker:// URI, and suffix; unknown suffix raised KeyError.

Out of scope / not done

- No CHANGELOG entry or version bump; left for the maintainer.
- No new tests in tests/. Behaviour is exercised by an end-to-end smoke test against real Tinker; happy to convert it to a pytest with a stub TrainingClient if you'd like.
- The detached worker path (tinker_spawn / tinker_worker) inherits the metrics logging automatically because it calls the same _run_training. This path has not been separately verified end-to-end.

🤖 Generated with Claude Code

Adds a local jsonl + summary.json under ~/.cache/llmcomp/<training_run_id>/ written from the Tinker LoRA training loop. Per-step metrics: step, epoch, loss, lr, elapsed_s, wall_time. Optional W&B integration gated on TinkerTrainingParams.enable_wandb (project / entity resolved by wandb itself from WANDB_PROJECT / WANDB_ENTITY env vars; run name defaults to suffix).
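
The per-step rows named in this commit message could be appended along the following lines. This is a sketch under assumptions, not the branch's implementation: `StepLogger` is a hypothetical helper, and `wall_time` is assumed to be seconds since the Unix epoch (the PR does not state its format).

```python
import json
import time
from pathlib import Path


class StepLogger:
    """Append one JSON row per training step to metrics.jsonl (hypothetical)."""

    def __init__(self, run_dir: Path) -> None:
        run_dir.mkdir(parents=True, exist_ok=True)
        self.path = run_dir / "metrics.jsonl"
        self._start = time.monotonic()

    def log(self, step: int, epoch: int, loss: float, lr: float) -> dict:
        row = {
            "step": step,
            "epoch": epoch,
            "loss": loss,
            "lr": lr,
            # Monotonic clock, so elapsed time is immune to wall-clock jumps.
            "elapsed_s": time.monotonic() - self._start,
            # Assumed format: seconds since the Unix epoch.
            "wall_time": time.time(),
        }
        # Open, append, and close on every step so the rows written so far
        # survive if the process dies mid-run.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(row) + "\n")
        return row
```

Appending line by line is what makes the curve recoverable after an interrupted run: each completed step is already on disk before the next one starts.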

load_run resolves a Tinker run by training_run_id, tinker:// URI, or suffix (via tinker_models.jsonl) and returns metrics (DataFrame) + params + summary. It is exported from llmcomp.finetuning alongside the existing manager/params API.
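
The three accepted key forms suggest a resolution order like the one sketched here. Several details are assumptions and not taken from the branch: the function name `resolve_run_id`, the URI layout `tinker://<training_run_id>/...`, the registry field names `suffix`, `training_run_id`, and `is_checkpoint`, and the idea that rows in tinker_models.jsonl are appended chronologically so the last match is the newest.

```python
import json
from pathlib import Path


def resolve_run_id(key: str, data_dir: str = "llmcomp_models") -> str:
    """Map a run id, tinker:// URI, or suffix to a training_run_id (sketch)."""
    if key.startswith("tinker://"):
        # Assumed layout: tinker://<training_run_id>/<anything else>
        return key[len("tinker://"):].split("/", 1)[0]
    if ":train:" in key:
        # Already looks like <uuid>:train:N.
        return key

    # Otherwise treat the key as a suffix and look it up in the registry.
    registry = Path(data_dir) / "tinker_models.jsonl"
    matches = []
    if registry.exists():
        with registry.open(encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                record = json.loads(line)
                # Field names here are hypothetical.
                if record.get("suffix") == key and not record.get("is_checkpoint", False):
                    matches.append(record)
    if not matches:
        raise KeyError(f"no Tinker run found for suffix {key!r}")
    # Assumes append order is chronological, so the last match is the newest.
    return matches[-1]["training_run_id"]
```

Raising KeyError for an unknown suffix mirrors the behaviour reported in the Verification section.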

johny-b · 2026-05-06T13:38:58Z

FYI, I have 0 opinions on how this should work, and I'm happy for you to implement the thing you will find most convenient (as long as it doesn't require anything from people not using it ...).

Butanium added 2 commits May 6, 2026 05:47

Butanium changed the title ~~Tinker FT: per-step metrics logging (local jsonl + optional W&B) + load_run helper~~ WIP: Tinker FT: per-step metrics logging (local jsonl + optional W&B) + load_run helper May 6, 2026

Butanium self-assigned this May 6, 2026

Butanium wants to merge 2 commits into johny-b:main from Butanium:metrics-logging
