General-purpose Large Language Models (LLMs) are frequently fine-tuned through supervised fine-tuning (SFT) to improve performance in specific domains. Better results can be achieved by distilling the chain of thought of a larger model, at the cost of numerous expensive API calls and substantially more data. We propose a novel blueprint for efficient fine-tuning that applies reasoning only to complex examples, identified via entropy. Specifically, we evaluate the approach across two small open models.
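The entropy-based selection of complex examples can be sketched as follows. This is an illustrative sketch, not the repository's implementation: the `token_entropy` helper and the threshold value are assumptions chosen for the example.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def is_complex(prob_seqs, threshold=1.0):
    """Flag an example as 'complex' when its mean per-token entropy exceeds
    a threshold. The threshold of 1.0 nat is a hypothetical value; in
    practice it would be tuned on held-out data."""
    entropies = [token_entropy(p) for p in prob_seqs]
    return sum(entropies) / len(entropies) > threshold

# A near-uniform distribution (model is uncertain) routes to reasoning;
# a peaked distribution (model is confident) stays on the cheap SFT path.
print(is_complex([[0.25, 0.25, 0.25, 0.25]]))      # high entropy
print(is_complex([[0.97, 0.01, 0.01, 0.01]]))      # low entropy
```

Under this scheme, only the flagged examples are sent to the larger model for chain-of-thought distillation, keeping the number of expensive calls small.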
- Download CoT entropy data for MMLU to `data/out/cot_entropy`
- Download reasoning data for MMLU to `data/out/reasoning_entropy`
Other datasets are included in the repository and are also published on Hugging Face.
- Main training pipeline: `src/experiments/pipeline/pipeline`
- Alternative baseline: `src/experiments/pipeline/alternative_baseline`
- Full distillation baseline: `src/experiments/pipeline/full_distill`
- Full SFT baseline: `src/experiments/pipeline/sft_baseline`
- Curriculum SFT baseline: `src/experiments/pipeline/sft_curriculum_baseline`
Run an experiment with `uv`, replacing `REPLACE_ME` with the path of the desired experiment script:

```shell
uv run src/experiments/REPLACE_ME.py
```