
DriveFusion Training

A comprehensive training framework for fine-tuning and pre-training large language models (LLMs) and multimodal large language models (MLLMs) with specialized support for driving-related tasks. This project is built upon the excellent LLaMAFactory framework and extends it with custom data processing, training workflows, and model optimization capabilities.



Overview

DriveFusion Training provides a flexible and scalable solution for training vision-language models with a focus on autonomous driving applications. It supports multiple training paradigms, including supervised fine-tuning (SFT) and pre-training.

Built Upon LLaMAFactory

This project extends LLaMAFactory, a powerful framework for training large language models. We've customized and extended it with:

  • Vision-Language Integration: Enhanced support for multimodal (vision + text) model training
  • Driving Dataset Support: Specialized data processors for autonomous driving datasets
  • Advanced Optimization: Integration with DeepSpeed, Liger Kernel, and other performance optimization techniques
  • Custom Training Workflows: Tailored training pipelines for different use cases

Features

  • Multiple Training Modes:

    • Supervised Fine-Tuning (SFT) with 4D attention masks
    • Pre-training for foundation models
    • Pairwise and Feedback-based training
  • Model Support:

    • Compatible with transformer-based models (Qwen, LLaMA, etc.)
    • Parameter-efficient fine-tuning with LoRA and QLoRA
    • Mixed precision training (BF16, FP16, FP32)
    • Quantization support (4-bit, 8-bit)
  • Data Processing:

    • Flexible data loading with multiple format support
    • Custom data converters for different dataset formats
    • Template-based formatting for consistent prompt structures
    • Efficient batch collation with optimized memory usage
  • Performance Optimization:

    • Distributed training with torchrun and Ray
    • DeepSpeed integration for large-scale training
    • Liger Kernel for efficient kernel operations
    • LongLoRA for extended context windows
    • Gradient checkpointing and activation checkpointing
  • Evaluation & Monitoring:

    • Built-in metrics computation (accuracy, similarity)
    • Wandb integration for experiment tracking
    • Loss plotting and visualization
    • Model card generation and HuggingFace Hub integration
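
Several of the features above are enabled directly from the training YAML. A minimal sketch combining LoRA, 4-bit quantization (QLoRA-style), and BF16 mixed precision — key names follow upstream LLaMA-Factory conventions and may differ in this fork, so check the files under config/train/ for the authoritative names:

```yaml
# Sketch only -- key names assumed from upstream LLaMA-Factory
finetuning_type: lora
lora_rank: 16
lora_alpha: 32
quantization_bit: 4          # QLoRA-style 4-bit base weights
bf16: true                   # mixed-precision training
gradient_checkpointing: true # trade compute for memory
```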

Installation

Requirements

  • Python 3.9 or higher
  • CUDA 11.8+ (for GPU training)
  • PyTorch 2.0+

Setup

  1. Clone the repository:
git clone https://github.com/DriveFusion/train.git
cd train
  2. Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -e .

For additional features, install optional dependencies:

pip install -e ".[torch,metrics,deepspeed]"

Quick Start

Configuration

Training is configured via YAML files in the config/ directory:

  • config/train/drivefusion_pretrain.yaml - Pre-training configuration
  • config/train/drivefusion_finetune.yaml - Fine-tuning configuration
  • config/train/drivefusion.yaml - Standard training configuration
  • config/train/qwen2_5_vl.yaml - Qwen 2.5 Vision-Language model configuration

Running Training

Use the CLI tool to launch training:

# Start training with a configuration file
drivefusion-train train config/train/drivefusion_finetune.yaml

# Or using the short alias
lmf train config/train/drivefusion_finetune.yaml

Distributed Training

For multi-GPU training:

# Automatic multi-GPU training (uses torchrun)
drivefusion-train train config/train/drivefusion.yaml

# Manual distributed training
NPROC_PER_NODE=4 drivefusion-train train config/train/drivefusion.yaml

Exporting Models

Export trained models for inference:

drivefusion-train export config/merge/drivefusion_qa_merge.yaml

Environment Information

Check system compatibility:

drivefusion-train env

Project Structure

drivefusion_train/
├── cli.py                 # Command-line interface
├── launcher.py            # Training launcher
├── data/                  # Data handling modules
│   ├── loader.py         # Dataset loading
│   ├── processor/        # Data processors for different training modes
│   │   ├── supervised.py # SFT processor
│   │   ├── pretrain.py   # Pre-training processor
│   │   ├── feedback.py   # RLHF feedback processor
│   │   └── pairwise.py   # Pairwise preference processor
│   ├── collator.py       # Batch collation with attention masks
│   ├── template.py       # Prompt templates
│   └── formatter.py      # Data formatting utilities
├── model/                 # Model loading and configuration
│   ├── loader.py         # Model and tokenizer loading
│   ├── patcher.py        # Model patching utilities
│   └── model_utils/      # Optimization utilities
│       ├── quantization.py    # Quantization support
│       ├── liger_kernel.py    # Liger kernel integration
│       ├── longlora.py        # Long LoRA support
│       ├── rope.py           # RoPE position embedding
│       └── attention.py       # Attention optimizations
├── train/                 # Training logic
│   ├── tuner.py          # Training orchestration
│   └── sft/              # Supervised fine-tuning
│       ├── trainer.py    # Custom trainer class
│       ├── workflow.py   # SFT workflow
│       └── metric.py     # Evaluation metrics
└── hparams/              # Hyperparameter definitions
    ├── model_args.py     # Model configuration
    ├── data_args.py      # Data configuration
    ├── training_args.py  # Training configuration
    └── parser.py         # Argument parsing

Configuration Guide

Configuration files use YAML format. Key sections:

### Model Configuration
model_name_or_path: path/to/model
model_type: qwen  # or llama, etc.

### Data Configuration
dataset_dir: data/
dataset:
  - dataset_name:
      file_name: path/to/file.json
      formatting: alph  # or other formatting options

### Training Arguments
output_dir: output/checkpoint
per_device_train_batch_size: 4
num_train_epochs: 3
learning_rate: 5.0e-5

### Fine-tuning Configuration
lora_rank: 16
lora_alpha: 32
lora_dropout: 0.05
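
As rough intuition for the LoRA numbers above: lora_rank controls the adapter size, and lora_alpha / lora_rank is the scaling applied to the adapter output. A back-of-envelope using standard LoRA math (the 4096 hidden size is a made-up example, not a value from this project):

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer: A (d_in x r) plus B (r x d_out)."""
    return rank * (d_in + d_out)

rank, alpha = 16, 32           # values from the config above
scaling = alpha / rank         # adapter output is scaled by alpha / rank -> 2.0
d = 4096                       # hypothetical hidden size
dense = d * d                  # params in one dense d x d projection
lora = lora_trainable_params(d, d, rank)
print(scaling)                 # 2.0
print(lora, dense)             # 131072 16777216 -- under 1% of the dense layer
```

This is why LoRA is called parameter-efficient: only the low-rank A and B matrices are trained while the dense base weights stay frozen.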

Data Format

Input data should be in JSON format. Example dataset structure:

{
  "conversations": [
    {
      "from": "human",
      "value": "What is autonomous driving?"
    },
    {
      "from": "gpt",
      "value": "Autonomous driving is..."
    }
  ]
}

See data/drivefusion_example.json and data/mllm_example.json for complete examples.
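
A small sanity check for records in this format can catch malformed data before training. The field names below come from the example above; the "conversation starts with a human turn" rule is an assumption based on that example, not a documented requirement:

```python
import json

VALID_ROLES = {"human", "gpt"}

def validate_record(record: dict) -> list:
    """Return a list of problems found in one conversation record (empty list = OK)."""
    problems = []
    turns = record.get("conversations")
    if not isinstance(turns, list) or not turns:
        return ["missing or empty 'conversations' list"]
    for i, turn in enumerate(turns):
        if turn.get("from") not in VALID_ROLES:
            problems.append(f"turn {i}: unknown role {turn.get('from')!r}")
        if not isinstance(turn.get("value"), str) or not turn["value"].strip():
            problems.append(f"turn {i}: missing or empty 'value'")
    if turns[0].get("from") != "human":
        problems.append("conversation should start with a 'human' turn")
    return problems

record = json.loads('''{"conversations": [
  {"from": "human", "value": "What is autonomous driving?"},
  {"from": "gpt", "value": "Autonomous driving is..."}]}''')
print(validate_record(record))  # []
```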

Advanced Features

Fine-tuning Methods

  • LoRA: Low-Rank Adaptation for parameter-efficient training
  • QLoRA: Quantized LoRA for reduced memory usage
  • DoRA: Weight-Decomposed Low-Rank Adaptation, a LoRA variant that decomposes weights into magnitude and direction
  • Full Fine-tuning: Training all model parameters

Training Techniques

  • Mixed Precision Training: Reduce memory and computation
  • Gradient Accumulation: Train with effective larger batch sizes
  • Gradient Checkpointing: Further reduce memory consumption
  • Distributed Training: Scale across multiple GPUs/nodes
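
Gradient accumulation and multi-GPU training multiply together into the effective batch size seen by the optimizer. The formula is standard; the numbers below are illustrative:

```python
def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int) -> int:
    """Samples contributing to one optimizer step across all devices."""
    return per_device * grad_accum * num_gpus

# e.g. per_device_train_batch_size=4, gradient_accumulation_steps=8, 4 GPUs
print(effective_batch_size(per_device=4, grad_accum=8, num_gpus=4))  # 128
```

This is why reducing per_device_train_batch_size for memory reasons can be compensated by raising gradient accumulation steps without changing optimization behavior (aside from batch-norm-free statistics).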

Evaluation

Training supports multiple evaluation metrics:

  • Accuracy computation
  • Semantic similarity (with generation)
  • Custom metric functions
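
Token-level accuracy is typically computed while skipping label positions masked out of the loss; -100 as the ignore index is the Hugging Face convention. A dependency-free sketch of that idea (not this project's exact metric code):

```python
def token_accuracy(preds, labels, ignore_index=-100):
    """Fraction of non-ignored label positions where the predicted token matches."""
    scored = [(p, l) for p, l in zip(preds, labels) if l != ignore_index]
    if not scored:
        return 0.0
    return sum(p == l for p, l in scored) / len(scored)

# -100 marks prompt/padding tokens excluded from the loss (HF convention);
# here 2 of the 3 scored positions match.
print(token_accuracy([5, 7, 2, 9], [5, -100, 2, 1]))
```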

Troubleshooting

CUDA Out of Memory

  1. Reduce per_device_train_batch_size
  2. Enable gradient checkpointing: gradient_checkpointing: true
  3. Use QLoRA instead of LoRA for reduced memory
  4. Enable a DeepSpeed configuration (e.g., ZeRO with optimizer/parameter offloading)
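
A rough way to reason about the QLoRA suggestion above: base-model weight memory scales with bytes per parameter, so 4-bit quantization cuts it roughly 4x versus BF16. This back-of-envelope ignores optimizer state, gradients, and activations, which often dominate in full fine-tuning:

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3

params = 7e9  # a hypothetical 7B-parameter model
print(round(weight_memory_gib(params, 2.0), 1))  # BF16 (2 bytes/param): ~13.0 GiB
print(round(weight_memory_gib(params, 0.5), 1))  # 4-bit (0.5 bytes/param): ~3.3 GiB
```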

Slow Training

  1. Increase per_device_train_batch_size if memory allows
  2. Enable multi-GPU training
  3. Use DeepSpeed ZeRO optimizations
  4. Install liger-kernel for faster kernels

Contributing

Contributions are welcome! Please ensure:

  1. Code follows the project's style guide (enforced by Ruff)
  2. All tests pass
  3. Documentation is updated
  4. Commits are clear and descriptive

Support

For issues and questions, please open an issue on this repository's issue tracker.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for the full license text.

This project extends LLaMAFactory, which is also licensed under Apache License 2.0, ensuring compatibility between the projects.

Acknowledgments

This project is built upon LLaMAFactory by hiyouga and contributors. We extend our gratitude for the excellent foundation that made this project possible.

Special thanks to:

  • The Hugging Face transformers and datasets teams
  • The PEFT library developers for LoRA implementations
  • Contributors of DeepSpeed and other optimization libraries
