Conversation
Code Review
This pull request introduces two major features: multi-GPU training support using DistributedDataParallel (DDP) and experiment tracking with Weights & Biases. The DDP implementation is well-structured, handling process group initialization, data sampling, and metric synchronization correctly. The introduction of a logging protocol with a W&B implementation is a great addition for experiment tracking. I've identified a critical issue regarding DDP support for multiple models, which will cause a crash. I've also made a couple of medium-severity suggestions to improve code clarity and documentation.
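The torchrun auto-detection the summary refers to could be sketched roughly as follows (the function name, backend choice, and environment-variable handling here are illustrative assumptions, not code from the PR; only the fact that torchrun launch is auto-detected comes from the review):

```python
import os


def maybe_init_ddp() -> bool:
    """Initialize a DDP process group iff launched via torchrun.

    torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK in the environment, so
    their presence signals a distributed launch; a plain `python` invocation
    falls through to single-process mode with no code changes needed.
    """
    if "RANK" not in os.environ or "WORLD_SIZE" not in os.environ:
        return False

    # Imported lazily so single-process runs do not require distributed setup.
    import torch
    import torch.distributed as dist

    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend=backend)
    if torch.cuda.is_available():
        # Pin this process to the GPU torchrun assigned it.
        torch.cuda.set_device(int(os.environ.get("LOCAL_RANK", 0)))
    return True
```

Callers can then wrap the model in `DistributedDataParallel` only when this returns `True`, keeping the single-GPU path untouched.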
```python
self.finetuned_estimator_.model_.recompute_layer = True  # type: ignore

# --- DDP model wrapping ---
model_for_optimization = self.finetuned_estimator_.model_
```
This line assumes that self.finetuned_estimator_ has a single model, as it accesses the .model_ property. However, TabPFNClassifier can be initialized with multiple models, in which case len(self.finetuned_estimator_.models_) > 1 and accessing .model_ will raise a ValueError. Finetuning, especially with DDP, seems designed for a single model. It would be safer to explicitly check for this and raise a more informative error if multiple models are provided for finetuning.
```diff
@@ -0,0 +1 @@
+Add multi-GPU DDP support for finetuning via torchrun (auto-detected, no code changes needed)
```
The changelog entry only mentions the multi-GPU DDP support. This pull request also introduces W&B logging support, which is a significant feature. It would be beneficial to also include this in the changelog to accurately reflect all the changes.
```diff
-Add multi-GPU DDP support for finetuning via torchrun (auto-detected, no code changes needed)
+Add multi-GPU DDP support for finetuning via torchrun and W&B logging support.
```
```python
# Store the original training size for checkpoint naming
train_size = X.shape[0]
start_time = time.monotonic()
```
Issue
Closes #810
Motivation and Context
Tracking training runs is important, so this PR implements a logging class that can be extended to support additional loggers. The first supported backend is W&B.
This PR builds on top of #812 and should only be merged afterwards.
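The logging abstraction described above might look roughly like this (a sketch with invented class and method names; only the idea of an extensible logger interface with a W&B implementation comes from the PR):

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class TrainingLogger(Protocol):
    """Minimal interface a logger backend must implement."""

    def log_metrics(self, metrics: dict, step: int) -> None: ...

    def finish(self) -> None: ...


class WandbLogger:
    """W&B-backed implementation of the logger protocol."""

    def __init__(self, project: str, config: Optional[dict] = None):
        # Imported lazily so wandb stays an optional dependency.
        import wandb

        self._run = wandb.init(project=project, config=config)

    def log_metrics(self, metrics: dict, step: int) -> None:
        self._run.log(metrics, step=step)

    def finish(self) -> None:
        self._run.finish()
```

New backends (e.g. a TensorBoard or CSV logger) only need to satisfy the protocol, so the training loop can stay agnostic about where metrics go.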
Public API Changes
How Has This Been Tested?
Checklist
changelog/README.md), or "no changelog needed" label requested.