A comprehensive training framework for fine-tuning and pre-training large language models (LLMs) and multimodal large language models (MLLMs) with specialized support for driving-related tasks. This project is built upon the excellent LLaMAFactory framework and extends it with custom data processing, training workflows, and model optimization capabilities.
DriveFusion Training provides a flexible and scalable solution for training vision-language models with a focus on autonomous driving applications. It supports multiple training paradigms, including supervised fine-tuning (SFT) and pre-training.
This project extends LLaMAFactory, a powerful framework for training large language models. We've customized and extended it with:
- Vision-Language Integration: Enhanced support for multimodal (vision + text) model training
- Driving Dataset Support: Specialized data processors for autonomous driving datasets
- Advanced Optimization: Integration with DeepSpeed, Liger Kernel, and other performance optimization techniques
- Custom Training Workflows: Tailored training pipelines for different use cases
Multiple Training Modes:
- Supervised Fine-Tuning (SFT) with 4D attention masks
- Pre-training for foundation models
- Pairwise and Feedback-based training
Model Support:
- Compatible with transformer-based models (Qwen, LLaMA, etc.)
- Parameter-efficient fine-tuning with LoRA and QLoRA
- Mixed precision training (BF16, FP16, FP32)
- Quantization support (4-bit, 8-bit)
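As a rough illustration of why LoRA is parameter-efficient, the sketch below counts the trainable parameters a rank-`r` adapter adds to a single weight matrix (this is generic arithmetic, not this project's code; the 4096-dimensional layer size is an assumed example):

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters a LoRA adapter adds to one d_in x d_out weight:
    a down-projection (d_in x rank) plus an up-projection (rank x d_out)."""
    return d_in * rank + rank * d_out

# A 4096 x 4096 projection has ~16.8M frozen weights; a rank-16 adapter
# trains only 131072 of them (under 1% of the full matrix).
full = 4096 * 4096
adapter = lora_trainable_params(4096, 4096, 16)
print(adapter, full)
```

Raising `lora_rank` increases adapter capacity (and memory) linearly, which is why small ranks like 8-16 are a common starting point.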
Data Processing:
- Flexible data loading with multiple format support
- Custom data converters for different dataset formats
- Template-based formatting for consistent prompt structures
- Efficient batch collation with optimized memory usage
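A custom data converter in this spirit might look like the following sketch, which maps a driving QA record into the conversations format used for training (the `question`/`answer` field names are hypothetical, not a real dataset schema):

```python
def convert_record(record: dict) -> dict:
    """Map a hypothetical driving QA record into the conversations format."""
    return {
        "conversations": [
            {"from": "human", "value": record["question"]},
            {"from": "gpt", "value": record["answer"]},
        ]
    }

sample = {"question": "Is the ego lane clear?", "answer": "Yes, no obstacles ahead."}
print(convert_record(sample))
```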
Performance Optimization:
- Distributed training with torchrun and Ray
- DeepSpeed integration for large-scale training
- Liger Kernel for efficient kernel operations
- LongLoRA for extended context windows
- Gradient checkpointing and activation checkpointing
Evaluation & Monitoring:
- Built-in metrics computation (accuracy, similarity)
- Wandb integration for experiment tracking
- Loss plotting and visualization
- Model card generation and HuggingFace Hub integration
Requirements:

- Python 3.9 or higher
- CUDA 11.8+ (for GPU training)
- PyTorch 2.0+
Installation:

- Clone the repository:

```bash
git clone https://github.com/DriveFusion/train.git
cd train
```

- Create a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -e .
```

For additional features, install optional dependencies:

```bash
pip install -e ".[torch,metrics,deepspeed]"
```

Training is configured via YAML files in the config/ directory:

- `config/train/drivefusion_pretrain.yaml` - Pre-training configuration
- `config/train/drivefusion_finetune.yaml` - Fine-tuning configuration
- `config/train/drivefusion.yaml` - Standard training configuration
- `config/train/qwen2_5_vl.yaml` - Qwen 2.5 Vision-Language model configuration
Use the CLI tool to launch training:

```bash
# Start training with a configuration file
drivefusion-train train config/train/drivefusion_finetune.yaml

# Or using the short alias
lmf train config/train/drivefusion_finetune.yaml
```

For multi-GPU training:

```bash
# Automatic multi-GPU training (uses torchrun)
drivefusion-train train config/train/drivefusion.yaml

# Manual distributed training
NPROC_PER_NODE=4 drivefusion-train train config/train/drivefusion.yaml
```

Export trained models for inference:

```bash
drivefusion-train export config/merge/drivefusion_qa_merge.yaml
```

Check system compatibility:

```bash
drivefusion-train env
```

Project structure:

```
drivefusion_train/
├── cli.py                 # Command-line interface
├── launcher.py            # Training launcher
├── data/                  # Data handling modules
│   ├── loader.py          # Dataset loading
│   ├── processor/         # Data processors for different training modes
│   │   ├── supervised.py  # SFT processor
│   │   ├── pretrain.py    # Pre-training processor
│   │   ├── feedback.py    # RLHF feedback processor
│   │   └── pairwise.py    # Pairwise preference processor
│   ├── collator.py        # Batch collation with attention masks
│   ├── template.py        # Prompt templates
│   └── formatter.py       # Data formatting utilities
├── model/                 # Model loading and configuration
│   ├── loader.py          # Model and tokenizer loading
│   ├── patcher.py         # Model patching utilities
│   └── model_utils/       # Optimization utilities
│       ├── quantization.py  # Quantization support
│       ├── liger_kernel.py  # Liger kernel integration
│       ├── longlora.py      # LongLoRA support
│       ├── rope.py          # RoPE position embedding
│       └── attention.py     # Attention optimizations
├── train/                 # Training logic
│   ├── tuner.py           # Training orchestration
│   └── sft/               # Supervised fine-tuning
│       ├── trainer.py     # Custom trainer class
│       ├── workflow.py    # SFT workflow
│       └── metric.py      # Evaluation metrics
└── hparams/               # Hyperparameter definitions
    ├── model_args.py      # Model configuration
    ├── data_args.py       # Data configuration
    ├── training_args.py   # Training configuration
    └── parser.py          # Argument parsing
```
Configuration files use YAML format. Key sections:
### Model Configuration

```yaml
model_name_or_path: path/to/model
model_type: qwen  # or llama, etc.
```

### Data Configuration

```yaml
dataset_dir: data/
dataset:
  - dataset_name:
      file_name: path/to/file.json
      formatting: alpaca  # or other formatting options
```

### Training Arguments

```yaml
output_dir: output/checkpoint
per_device_train_batch_size: 4
num_train_epochs: 3
learning_rate: 5.0e-5
```

### Fine-tuning Configuration

```yaml
lora_rank: 16
lora_alpha: 32
lora_dropout: 0.05
```

Input data should be in JSON format. Example dataset structure:

```json
{
  "conversations": [
    {
      "from": "human",
      "value": "What is autonomous driving?"
    },
    {
      "from": "gpt",
      "value": "Autonomous driving is..."
    }
  ]
}
```

See data/drivefusion_example.json and data/mllm_example.json for complete examples.
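A quick way to sanity-check your own data against this schema is a small validator like the sketch below (an illustration, not the project's actual loader; it only checks that turns alternate human/gpt and carry string values):

```python
import json

def validate(record: dict) -> bool:
    """Check that a record follows the conversations schema:
    non-empty, alternating human/gpt turns, each with a string value."""
    turns = record.get("conversations", [])
    if not turns:
        return False
    expected = ["human", "gpt"]
    return all(
        t.get("from") == expected[i % 2] and isinstance(t.get("value"), str)
        for i, t in enumerate(turns)
    )

record = json.loads(
    '{"conversations": [{"from": "human", "value": "Q"}, {"from": "gpt", "value": "A"}]}'
)
print(validate(record))  # True
```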
- LoRA: Low-Rank Adaptation for parameter-efficient training
- QLoRA: Quantized LoRA for reduced memory usage
- DoRA: Weight-Decomposed Low-Rank Adaptation
- Full Fine-tuning: Training all model parameters
- Mixed Precision Training: Reduce memory and computation
- Gradient Accumulation: Train with effective larger batch sizes
- Gradient Checkpointing: Further reduce memory consumption
- Distributed Training: Scale across multiple GPUs/nodes
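Gradient accumulation and data parallelism combine multiplicatively: the effective global batch size is the per-device batch times the accumulation steps times the number of GPUs. A minimal sketch (the example values are assumptions, not defaults of this project):

```python
def effective_batch_size(per_device: int, accum_steps: int, num_gpus: int) -> int:
    """Global batch size seen by the optimizer per update step."""
    return per_device * accum_steps * num_gpus

# e.g. per_device_train_batch_size: 4 with 8 accumulation steps on 4 GPUs
print(effective_batch_size(4, 8, 4))  # 128
```

This is why reducing `per_device_train_batch_size` for memory reasons can be offset by raising the accumulation steps without changing the optimization behavior.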
Training supports multiple evaluation metrics:
- Accuracy computation
- Semantic similarity (with generation)
- Custom metric functions
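A custom metric function might look like the following sketch (an assumed shape for illustration, not the project's `metric.py`): token-level accuracy that skips positions labeled `-100`, the conventional mask value for prompt and padding tokens in HF-style training:

```python
def token_accuracy(predictions, labels, ignore_index=-100):
    """Fraction of non-masked label positions predicted exactly."""
    correct = total = 0
    for p, l in zip(predictions, labels):
        if l == ignore_index:
            continue
        total += 1
        correct += int(p == l)
    return correct / total if total else 0.0

print(token_accuracy([5, 7, 9, 2], [5, 7, -100, 3]))  # 2 correct of 3 scored
```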
If you run out of GPU memory:
- Reduce `per_device_train_batch_size`
- Enable gradient checkpointing: `gradient_checkpointing: true`
- Use QLoRA instead of LoRA for reduced memory
- Enable a `deepspeed` configuration

If training is slow:
- Increase `per_device_train_batch_size` if memory allows
- Enable multi-GPU training
- Use DeepSpeed ZeRO optimizations
- Install `liger-kernel` for faster kernels
Contributions are welcome! Please ensure:
- Code follows the project's style guide (enforced by Ruff)
- All tests pass
- Documentation is updated
- Commits are clear and descriptive
For issues and questions:
- Check existing GitHub issues
- Review configuration examples in `config/`
- Consult the LLaMAFactory documentation: https://github.com/hiyouga/LLaMA-Factory
This project is licensed under the Apache License 2.0. See the LICENSE file for the full license text.
This project extends LLaMAFactory, which is also licensed under Apache License 2.0, ensuring compatibility between the projects.
This project is built upon LLaMAFactory by hiyouga and contributors. We extend our gratitude for the excellent foundation that made this project possible.
Special thanks to:
- The Hugging Face transformers and datasets teams
- The PEFT library developers for LoRA implementations
- Contributors of DeepSpeed and other optimization libraries