General-purpose Large Language Models (LLMs) are frequently fine-tuned through supervised fine-tuning (SFT) to improve performance in specific domains. Better results can be achieved by distilling the chain of thought of a larger model, at the cost of numerous expensive API calls and substantially more data. We propose a novel blueprint for efficient fine-tuning that applies reasoning only to complex examples, identified via entropy. Specifically, we evaluate the approach across two small open models.
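The entropy-based selection of complex examples can be sketched as follows. This is an illustrative sketch, not the repository's implementation: the `token_entropy` helper and the threshold value are assumptions chosen for the example.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def is_complex(prob_seqs, threshold=1.0):
    """Flag an example as 'complex' when its mean per-token entropy exceeds
    a threshold. The threshold of 1.0 nat is a hypothetical value; in
    practice it would be tuned on held-out data."""
    entropies = [token_entropy(p) for p in prob_seqs]
    return sum(entropies) / len(entropies) > threshold

# A near-uniform distribution (model is uncertain) routes to reasoning;
# a peaked distribution (model is confident) stays on the cheap SFT path.
print(is_complex([[0.25, 0.25, 0.25, 0.25]]))      # high entropy
print(is_complex([[0.97, 0.01, 0.01, 0.01]]))      # low entropy
```

Under this scheme, only the flagged examples are sent to the larger model for chain-of-thought distillation, keeping the number of expensive calls small.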
- Download CoT entropy data for MMLU to `data/out/cot_entropy`
- Download reasoning data for MMLU to `data/out/reasoning_entropy`
Other datasets are included in the repository and are also published on Hugging Face.
- Main training pipeline: `src/experiments/pipeline/pipeline`
- Alternative baseline: `src/experiments/pipeline/alternative_baseline`
- Full distillation baseline: `src/experiments/pipeline/full_distill`
- Full SFT baseline: `src/experiments/pipeline/sft_baseline`
- Curriculum SFT baseline: `src/experiments/pipeline/sft_curriculum_baseline`
Run an experiment with `uv`, replacing `REPLACE_ME` with the path of the desired experiment script:

```shell
uv run src/experiments/REPLACE_ME.py
```