
WIP: Tinker FT: per-step metrics logging (local jsonl + optional W&B) + load_run helper #73

Draft
Butanium wants to merge 2 commits into johny-b:main from Butanium:metrics-logging

Butanium (Collaborator) commented May 6, 2026

⚠️ NOT YET REVIEWED BY ME (@Butanium). This branch was written by Claude (Anthropic's Claude Code agent) on my machine. I'm opening it as a draft so I can self-review before any merge. Treat my participation as authorial but unverified until I leave a self-review on the diff. (I tried adding myself as assignee/reviewer, but upstream rejected it because I'm not a collaborator. The self-review will land as a normal review comment thread.)

Motivation

llmcomp's Tinker finetuning loop currently only prints the per-step loss (via print(...)) and records nothing else. There is no metrics.jsonl, no W&B integration, and tinker.RestClient.get_training_run does not return any per-step events, so the training curve cannot be recovered after a run completes. If you launch a run and your shell scrollback rolls over, the curve is gone.

What this changes

llmcomp/finetuning/tinker_finetune.py:

  • Per-step local logging (always on) to $XDG_CACHE_HOME/llmcomp/<sanitised-training_run_id>/ (default ~/.cache/llmcomp/...):
    • metrics.jsonl — one row per step: {step, epoch, loss, lr, elapsed_s, wall_time}
    • params.json — full TinkerTrainingParams dump (api_key stripped) + resolved seed + training_run_id
    • summary.json: status, final_loss, total_steps, final_path, checkpoints
  • Optional W&B integration gated on three new fields on TinkerTrainingParams:
    • enable_wandb: bool = False
    • wandb_run_name: str | None = None (default: params.suffix)
    • wandb_tags: list[str] | None = None
    • project / entity are intentionally not exposed; wandb.init() resolves them from WANDB_PROJECT / WANDB_ENTITY. Setting enable_wandb=True without wandb installed raises ImportError; there is no silent fallback.
  • load_run(key, *, data_dir="llmcomp_models") convenience function returning a RunInfo(cache_dir, run_id, metrics: pd.DataFrame, params, summary). key accepts a training_run_id (<uuid>:train:N), a tinker:// URI, or a suffix (resolved via <data_dir>/tinker_models.jsonl; the newest non-checkpoint match wins). Exported alongside FinetuningManager from llmcomp.finetuning.
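For reviewers, here is a minimal sketch of the logging contract in the list above: three local artifacts plus optional W&B mirroring. It assumes params arrive as a plain dict, and `MetricsLogger`, `log_step`, and `finish` are hypothetical names, not the code in this diff. The W&B calls (`wandb.init(name=, tags=, config=)`, `Run.log(..., step=)`, `Run.finish()`) are standard wandb API and only run when `enable_wandb=True`.

```python
from __future__ import annotations

import json
import time
from pathlib import Path
from typing import Any


class MetricsLogger:
    """Hypothetical sketch: params.json + metrics.jsonl + summary.json, optional W&B."""

    def __init__(
        self,
        cache_dir: Path,
        params: dict[str, Any],
        *,
        enable_wandb: bool = False,
        wandb_run_name: str | None = None,
        wandb_tags: list[str] | None = None,
    ) -> None:
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self._t0 = time.monotonic()
        self._steps = 0
        self._last_loss: float | None = None
        # Drop the API key before params reach disk or W&B.
        clean = {k: v for k, v in params.items() if k != "api_key"}
        self._write_json("params.json", clean)
        self._run = None
        if enable_wandb:
            # No try/except: a missing wandb install surfaces as ImportError.
            import wandb

            # project/entity are not passed, so wandb falls back to
            # WANDB_PROJECT / WANDB_ENTITY from the environment.
            self._run = wandb.init(
                name=wandb_run_name or clean.get("suffix"), tags=wandb_tags, config=clean
            )

    def _write_json(self, name: str, obj: Any) -> None:
        (self.cache_dir / name).write_text(json.dumps(obj, indent=2, default=str))

    def log_step(self, step: int, epoch: int, loss: float, lr: float) -> None:
        row = {
            "step": step,
            "epoch": epoch,
            "loss": loss,
            "lr": lr,
            "elapsed_s": time.monotonic() - self._t0,
            "wall_time": time.time(),
        }
        # Re-open in append mode each step; closing the file flushes the row,
        # so a crashed training process still leaves every completed step on disk.
        with open(self.cache_dir / "metrics.jsonl", "a") as f:
            f.write(json.dumps(row) + "\n")
        self._steps += 1
        self._last_loss = loss
        if self._run is not None:
            self._run.log({"loss": loss, "lr": lr, "epoch": epoch}, step=step)

    def finish(
        self, status: str, final_path: str | None = None, checkpoints: list[str] | None = None
    ) -> None:
        # final_loss / total_steps come from what was actually logged.
        self._write_json(
            "summary.json",
            {
                "status": status,
                "final_loss": self._last_loss,
                "total_steps": self._steps,
                "final_path": final_path,
                "checkpoints": checkpoints or [],
            },
        )
        if self._run is not None:
            self._run.finish()
```

Writing one jsonl line per step, rather than dumping the curve at the end, is what lets the curve survive a crash or a lost terminal.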

The cache key is training_client.model_id (the Tinker training_run_id) rather than the user-supplied suffix: it is globally unique, it is the same id RestClient queries with, and re-submitting the same suffix never collides with an earlier run's cache.
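A minimal sketch of how the cache directory could be derived from the training_run_id, and how load_run's three key forms could be told apart. Everything below the standard library is an assumption for illustration: the function names, the sanitisation rule (unsafe characters such as `:` become `_`), treating the first path segment of a tinker:// URI as the run id, and the tinker_models.jsonl field names (`suffix`, `training_run_id`, `is_checkpoint`). "Newest" is approximated as the last matching row, assuming the registry is append-only.

```python
from __future__ import annotations

import json
import os
import re
from pathlib import Path

# <uuid>:train:N, e.g. "123e4567-e89b-12d3-a456-426614174000:train:0"
RUN_ID_RE = re.compile(r"[0-9a-fA-F-]{36}:train:\d+")


def cache_root() -> Path:
    # An unset or empty XDG_CACHE_HOME falls back to ~/.cache.
    xdg = os.environ.get("XDG_CACHE_HOME")
    return (Path(xdg) if xdg else Path.home() / ".cache") / "llmcomp"


def sanitise(run_id: str) -> str:
    # Hypothetical rule: keep filename-safe characters, map the rest to "_".
    return re.sub(r"[^A-Za-z0-9._-]", "_", run_id)


def resolve_run_id(key: str, data_dir: str | Path = "llmcomp_models") -> str:
    if RUN_ID_RE.fullmatch(key):
        return key
    if key.startswith("tinker://"):
        # Assumption: the run id is the first path segment of the URI.
        return key[len("tinker://"):].split("/", 1)[0]
    # Otherwise treat key as a suffix and look it up in the local registry.
    registry = Path(data_dir) / "tinker_models.jsonl"
    match = None
    if registry.exists():
        with open(registry) as f:
            for line in f:
                if not line.strip():
                    continue
                row = json.loads(line)
                # Hypothetical schema; later rows overwrite earlier ones (newest wins).
                if row.get("suffix") == key and not row.get("is_checkpoint", False):
                    match = row["training_run_id"]
    if match is None:
        raise KeyError(f"no Tinker run found for {key!r}")
    return match


def run_cache_dir(key: str, data_dir: str | Path = "llmcomp_models") -> Path:
    return cache_root() / sanitise(resolve_run_id(key, data_dir))
```

Keying on the run id means the suffix is only ever a lookup alias: two runs with the same suffix get distinct directories, and the registry decides which one a bare suffix refers to.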

Verification

Smoke-tested with a real Qwen3.5-4B LoRA training run (64 toy chat samples, batch=32, 1 epoch → 2 steps). All three artifacts materialised; load_run resolved correctly via run_id, tinker:// URI, and suffix; unknown suffix raised KeyError.

Out of scope / not done

  • No CHANGELOG entry or version bump; left for the maintainer.
  • No new tests in tests/. The behaviour is exercised only by an end-to-end smoke test against real Tinker; happy to convert it to a pytest with a stub TrainingClient if you'd like.
  • Detached worker path (tinker_spawn / tinker_worker) inherits the metrics logging automatically because it calls the same _run_training. Not separately verified end-to-end.

🤖 Generated with Claude Code

Butanium added 2 commits May 6, 2026 05:47
Adds a local jsonl + summary.json under ~/.cache/llmcomp/<training_run_id>/
written from the Tinker LoRA training loop. Per-step metrics: step, epoch,
loss, lr, elapsed_s, wall_time. Optional W&B integration gated on
TinkerTrainingParams.enable_wandb (project / entity resolved by wandb itself
from WANDB_PROJECT / WANDB_ENTITY env vars; run name defaults to suffix).

Resolves a Tinker run by training_run_id, tinker:// URI, or suffix
(via tinker_models.jsonl) and returns metrics (DataFrame) + params
+ summary. Exported from llmcomp.finetuning alongside the existing
manager/params API.
Butanium changed the title from "Tinker FT: per-step metrics logging (local jsonl + optional W&B) + load_run helper" to "WIP: Tinker FT: per-step metrics logging (local jsonl + optional W&B) + load_run helper" on May 6, 2026
johny-b (Owner) commented May 6, 2026

FYI, I have 0 opinions on how this should work, and I'm happy for you to implement the thing you will find most convenient (as long as it doesn't require anything from people not using it ...).

Butanium self-assigned this on May 6, 2026