Conversation

@Kovbo Kovbo commented Jan 22, 2026

No description provided.

@Kovbo Kovbo requested a review from angkywilliam January 22, 2026 02:48
@Kovbo Kovbo marked this pull request as ready for review January 22, 2026 21:42
src/art/types.py Outdated


class SFTConfig(pydantic.BaseModel):
learning_rate: float = 1e-4

Remove custom_lr_schedule
Make learning_rate: float | list[float]
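
A minimal sketch of what that field change could look like, assuming pydantic v2 union syntax (the class name and default come from the diff above):

```python
import pydantic


class SFTConfig(pydantic.BaseModel):
    # A single float applies one LR to every batch; a list supplies a
    # per-batch schedule directly, making custom_lr_schedule redundant.
    learning_rate: float | list[float] = 1e-4
```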

Used to identify where assistant turns begin (train on responses only).
"""

instruction_part: str

Can we keep this class empty for now?
I'm unsure instruction_part and response_part are a good fit for an experimental feature.

batch_size = 2 # Default to 2 for SFT

# Determine learning rates
if config.custom_lr_schedule and len(config.custom_lr_schedule) > 0:

  1. Refactor: remove custom_lr_schedule and make learning_rate a float | list[float].
  2. Add validation that the number of learning rates matches the number of batches (sketch below).
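
A sketch of how both points could land together; resolve_learning_rates is a hypothetical helper standing in for the custom_lr_schedule branch above:

```python
def resolve_learning_rates(
    learning_rate: float | list[float], num_batches: int
) -> list[float]:
    """Expand a scalar LR, or validate a per-batch LR list."""
    if isinstance(learning_rate, float):
        return [learning_rate] * num_batches
    if len(learning_rate) != num_batches:
        raise ValueError(
            f"learning_rate has {len(learning_rate)} entries "
            f"but there are {num_batches} batches"
        )
    return learning_rate
```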


# Save checkpoint after training
# Name checkpoint by final training step: starting_step + num_batches
final_step = get_step_from_dir(self.output_dir) + len(sft_batches)

The checkpoint step should still be incremented by 1.
Checkpoint step != gradient step.
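
A sketch of the fix, reusing the names from the diff; one train_sft call produces one checkpoint regardless of how many gradient steps it runs:

```python
# Save checkpoint after training. The checkpoint step advances by 1
# per training call, not by the number of batches trained on.
final_step = get_step_from_dir(self.output_dir) + 1
```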

response_part="<|im_start|>assistant\n",
),
# Qwen 3 models (with thinking tokens)
"Qwen/Qwen3-8B": ModelConfig(

  1. How did we decide to support all of these models?
  2. Prefer to keep it simple and start with the models that are widely used in the OpenPipe Platform and ART?
  3. Research the Qwen chat template; IIRC <think></think> only shows up in the last turn, so we may need to remove <think></think> from response_part for Qwen (sketch below).
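
If point 3 holds, the Qwen 3 entry could reuse the plain assistant header; the instruction_part value here is an assumption and the actual template should be verified:

```python
# Hypothetical entry: Qwen3's chat template strips <think>...</think>
# from all but the final assistant turn, so matching the bare assistant
# header is safer than matching the thinking tokens.
"Qwen/Qwen3-8B": ModelConfig(
    instruction_part="<|im_start|>user\n",
    response_part="<|im_start|>assistant\n",
),
```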

progress_bar.close()


def iterate_file(

Have iterate_file take in an epoch parameter.
See the following PR for reference.
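
A sketch of the epoch-aware signature; the epochs parameter name is an assumption about the referenced PR, and _parse_jsonl_line comes from this diff:

```python
from typing import Iterator


def iterate_file(path: str, epochs: int = 1) -> Iterator[dict]:
    """Yield parsed JSONL lines, re-reading the file once per epoch."""
    for _ in range(epochs):
        with open(path) as f:
            for line in f:
                if line.strip():
                    yield _parse_jsonl_line(line)
```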

yield _parse_jsonl_line(line)


async def train_sft_from_file(

Modify this so the user can keep training running after closing their laptop (rough sketch after this list):

  1. iterate_file(file, epoch)
  2. Write batches to local disk
  3. Upload them as a wandb artifact
  4. Calculate the learning rate(s)
  5. Call train_sft(url, lr)
  6. Monitor training status
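
A rough shape for that flow; every helper and attribute below (write_batches_to_disk, upload_wandb_artifact, monitor_training_status, config.epochs) is a placeholder, not the actual ART API:

```python
async def train_sft_from_file(self, file_path: str, config: SFTConfig) -> None:
    # 1. Read the dataset, repeating once per epoch.
    batches = list(iterate_file(file_path, epochs=config.epochs))
    # 2-3. Persist locally, then upload so the server owns the data
    #      and the client can disconnect mid-run.
    local_path = write_batches_to_disk(batches)
    url = upload_wandb_artifact(local_path)
    # 4. Resolve one learning rate per batch.
    lrs = resolve_learning_rates(config.learning_rate, len(batches))
    # 5-6. Kick off server-side training and poll until it finishes.
    job = await self.train_sft(url, lrs)
    await monitor_training_status(job)
```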
