WIP: Tinker FT: per-step metrics logging (local jsonl + optional W&B) + load_run helper #73
Draft
Butanium wants to merge 2 commits into
Adds a local jsonl + summary.json under ~/.cache/llmcomp/<training_run_id>/ written from the Tinker LoRA training loop. Per-step metrics: step, epoch, loss, lr, elapsed_s, wall_time. Optional W&B integration gated on TinkerTrainingParams.enable_wandb (project / entity resolved by wandb itself from WANDB_PROJECT / WANDB_ENTITY env vars; run name defaults to suffix).
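As a rough illustration of the per-step logging described above, here is a minimal sketch. The names `cache_dir_for` and `StepMetricsLogger`, and the sanitisation rule for the run id, are assumptions made for this example and are not taken from the PR's code; only the row fields and the cache location come from the description.

```python
import json
import os
import re
import time
from pathlib import Path


def cache_dir_for(run_id: str) -> Path:
    """Hypothetical helper: per-run directory under the user cache."""
    base = Path(os.environ.get("XDG_CACHE_HOME") or Path.home() / ".cache")
    # Assumed sanitisation rule: replace anything outside [A-Za-z0-9._-] with "_".
    safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", run_id)
    return base / "llmcomp" / safe_id


class StepMetricsLogger:
    """Hypothetical logger that appends one JSON row per training step."""

    def __init__(self, run_dir: Path) -> None:
        run_dir.mkdir(parents=True, exist_ok=True)
        self.path = run_dir / "metrics.jsonl"
        self._start = time.monotonic()

    def log(self, step: int, epoch: int, loss: float, lr: float) -> dict:
        row = {
            "step": step,
            "epoch": epoch,
            "loss": loss,
            "lr": lr,
            "elapsed_s": time.monotonic() - self._start,
            "wall_time": time.time(),
        }
        # Open, append, and close on every step so rows reach disk as they are logged.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(row) + "\n")
        return row
```

Appending one line per step keeps the file readable while the run is still in progress, which is the main reason to prefer jsonl over a single JSON document here.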
The load_run helper resolves a Tinker run by training_run_id, tinker:// URI, or suffix (via tinker_models.jsonl) and returns metrics (DataFrame) + params + summary. It is exported from llmcomp.finetuning alongside the existing manager/params API.
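To make the three accepted key forms concrete, here is a small sketch of how a key could be classified before lookup. The function name `classify_key` and the regular expression are assumptions for illustration; the helper in this PR may tell the forms apart differently, and the suffix lookup in tinker_models.jsonl is not shown because its schema is not given here.

```python
import re

# Assumed shape of a training_run_id, based on the "<uuid>:train:N" form.
_RUN_ID_RE = re.compile(
    r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}:train:\d+$"
)


def classify_key(key: str) -> str:
    """Return "uri", "run_id", or "suffix" for a load_run-style key."""
    if key.startswith("tinker://"):
        return "uri"
    if _RUN_ID_RE.match(key):
        return "run_id"
    # Anything else is treated as a suffix to be looked up separately.
    return "suffix"
```

Checking the URI prefix first keeps the run-id pattern strict, so a mistyped id falls through to the suffix lookup rather than silently matching.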
Owner
FYI, I have 0 opinions on how this should work, and I'm happy for you to implement the thing you will find most convenient (as long as it doesn't require anything from people not using it ...).
Motivation
llmcomp's Tinker finetuning loop currently `print(...)`s the per-step loss and nothing else. There is no way to recover the training curve after a run completes: no `metrics.jsonl`, no W&B integration, and `tinker.RestClient.get_training_run` does not return any per-step events. So if you launch a run and your shell scrollback rolls over, the curve is gone.

What this changes
- `llmcomp/finetuning/tinker_finetune.py`: writes to `$XDG_CACHE_HOME/llmcomp/<sanitised-training_run_id>/` (default `~/.cache/llmcomp/...`):
  - `metrics.jsonl`: one row per step, `{step, epoch, loss, lr, elapsed_s, wall_time}`
  - `params.json`: full `TinkerTrainingParams` dump (`api_key` stripped) + resolved `seed` + `training_run_id`
  - `summary.json`: `status`, `final_loss`, `total_steps`, `final_path`, `checkpoints`
- `TinkerTrainingParams`:
  - `enable_wandb: bool = False`
  - `wandb_run_name: str | None = None` (default: `params.suffix`)
  - `wandb_tags: list[str] | None = None`
  - `project` / `entity` are intentionally not exposed; `wandb.init()` resolves them from `WANDB_PROJECT` / `WANDB_ENTITY`.
  - `enable_wandb=True` with `wandb` not installed raises `ImportError`; there is no silent fallback.
- `load_run(key, *, data_dir="llmcomp_models")`: convenience function returning a `RunInfo(cache_dir, run_id, metrics: pd.DataFrame, params, summary)`. `key` accepts a training_run_id (`<uuid>:train:N`), a `tinker://` URI, or a suffix (resolved via `<data_dir>/tinker_models.jsonl`; newest non-checkpoint match wins). Exported alongside `FinetuningManager` from `llmcomp.finetuning`.

Cache key is `training_client.model_id` (the Tinker training_run_id) rather than the user-supplied suffix. It is globally unique, matches the id `RestClient` queries with, and means re-submitting the same suffix never collides.

Verification
Smoke-tested with a real Qwen3.5-4B LoRA training run (64 toy chat samples, batch=32, 1 epoch → 2 steps). All three artifacts materialised; `load_run` resolved correctly via run_id, `tinker://` URI, and suffix; unknown suffix raised `KeyError`.

Out of scope / not done
- No new tests under `tests/`. Behaviour is exercised by an end-to-end smoke test against real Tinker; happy to convert to a pytest with a stub `TrainingClient` if you'd like.
- The `tinker_spawn` / `tinker_worker` path inherits the metrics logging automatically because it calls the same `_run_training`. Not separately verified end-to-end.

🤖 Generated with Claude Code