A comprehensive training framework for fine-tuning and pre-training large language models (LLMs) and multimodal large language models (MLLMs) with specialized support for driving-related tasks. This project is built upon the excellent LLaMAFactory framework and extends it with custom data processing, training workflows, and model optimization capabilities.
DriveFusion Training provides a flexible and scalable solution for training vision-language models with a focus on autonomous driving applications. It supports multiple training paradigms, including supervised fine-tuning (SFT) and pre-training.
This project extends LLaMAFactory, a powerful framework for training large language models. We've customized and extended it with:
- Vision-Language Integration: Enhanced support for multimodal (vision + text) model training
- Driving Dataset Support: Specialized data processors for autonomous driving datasets
- Advanced Optimization: Integration with DeepSpeed, Liger Kernel, and other performance optimization techniques
- Custom Training Workflows: Tailored training pipelines for different use cases
Multiple Training Modes:
- Supervised Fine-Tuning (SFT) with 4D attention masks
- Pre-training for foundation models
- Pairwise and Feedback-based training
Model Support:
- Compatible with transformer-based models (Qwen, LLaMA, etc.)
- Parameter-efficient fine-tuning with LoRA and QLoRA
- Mixed precision training (BF16, FP16, FP32)
- Quantization support (4-bit, 8-bit)
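As a rough illustration of why LoRA is parameter-efficient, the sketch below counts the trainable parameters a rank-`r` adapter adds to a single weight matrix (this is generic arithmetic, not this project's code; the 4096-dimensional layer size is an assumed example):

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters a LoRA adapter adds to one d_in x d_out weight:
    a down-projection (d_in x rank) plus an up-projection (rank x d_out)."""
    return d_in * rank + rank * d_out

# A 4096 x 4096 projection has ~16.8M frozen weights; a rank-16 adapter
# trains only 131072 of them (under 1% of the full matrix).
full = 4096 * 4096
adapter = lora_trainable_params(4096, 4096, 16)
print(adapter, full)
```

Raising `lora_rank` increases adapter capacity (and memory) linearly, which is why small ranks like 8-16 are a common starting point.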
Data Processing:
- Flexible data loading with multiple format support
- Custom data converters for different dataset formats
- Template-based formatting for consistent prompt structures
- Efficient batch collation with optimized memory usage
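A custom data converter in this spirit might look like the following sketch, which maps a driving QA record into the conversations format used for training (the `question`/`answer` field names are hypothetical, not a real dataset schema):

```python
def convert_record(record: dict) -> dict:
    """Map a hypothetical driving QA record into the conversations format."""
    return {
        "conversations": [
            {"from": "human", "value": record["question"]},
            {"from": "gpt", "value": record["answer"]},
        ]
    }

sample = {"question": "Is the ego lane clear?", "answer": "Yes, no obstacles ahead."}
print(convert_record(sample))
```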
Performance Optimization:
- Distributed training with torchrun and Ray
- DeepSpeed integration for large-scale training
- Liger Kernel for efficient kernel operations
- LongLoRA for extended context windows
- Gradient checkpointing and activation checkpointing
Evaluation & Monitoring:
- Built-in metrics computation (accuracy, similarity)
- Wandb integration for experiment tracking
- Loss plotting and visualization
- Model card generation and HuggingFace Hub integration
Requirements:

- Python 3.9 or higher
- CUDA 11.8+ (for GPU training)
- PyTorch 2.0+
Installation:

- Clone the repository:

```bash
git clone https://github.com/DriveFusion/train.git
cd train
```

- Create a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -e .
```

For additional features, install optional dependencies:

```bash
pip install -e ".[torch,metrics,deepspeed]"
```

Training is configured via YAML files in the config/ directory:

- `config/train/drivefusion_pretrain.yaml` - Pre-training configuration
- `config/train/drivefusion_finetune.yaml` - Fine-tuning configuration
- `config/train/drivefusion.yaml` - Standard training configuration
- `config/train/qwen2_5_vl.yaml` - Qwen 2.5 Vision-Language model configuration
Use the CLI tool to launch training:

```bash
# Start training with a configuration file
drivefusion-train train config/train/drivefusion_finetune.yaml

# Or using the short alias
lmf train config/train/drivefusion_finetune.yaml
```

For multi-GPU training:

```bash
# Automatic multi-GPU training (uses torchrun)
drivefusion-train train config/train/drivefusion.yaml

# Manual distributed training
NPROC_PER_NODE=4 drivefusion-train train config/train/drivefusion.yaml
```

Export trained models for inference:

```bash
drivefusion-train export config/merge/drivefusion_qa_merge.yaml
```

Check system compatibility:

```bash
drivefusion-train env
```

Project structure:

```
drivefusion_train/
├── cli.py                 # Command-line interface
├── launcher.py            # Training launcher
├── data/                  # Data handling modules
│   ├── loader.py          # Dataset loading
│   ├── processor/         # Data processors for different training modes
│   │   ├── supervised.py  # SFT processor
│   │   ├── pretrain.py    # Pre-training processor
│   │   ├── feedback.py    # RLHF feedback processor
│   │   └── pairwise.py    # Pairwise preference processor
│   ├── collator.py        # Batch collation with attention masks
│   ├── template.py        # Prompt templates
│   └── formatter.py       # Data formatting utilities
├── model/                 # Model loading and configuration
│   ├── loader.py          # Model and tokenizer loading
│   ├── patcher.py         # Model patching utilities
│   └── model_utils/       # Optimization utilities
│       ├── quantization.py  # Quantization support
│       ├── liger_kernel.py  # Liger kernel integration
│       ├── longlora.py      # LongLoRA support
│       ├── rope.py          # RoPE position embedding
│       └── attention.py     # Attention optimizations
├── train/                 # Training logic
│   ├── tuner.py           # Training orchestration
│   └── sft/               # Supervised fine-tuning
│       ├── trainer.py     # Custom trainer class
│       ├── workflow.py    # SFT workflow
│       └── metric.py      # Evaluation metrics
└── hparams/               # Hyperparameter definitions
    ├── model_args.py      # Model configuration
    ├── data_args.py       # Data configuration
    ├── training_args.py   # Training configuration
    └── parser.py          # Argument parsing
```
Configuration files use YAML format. Key sections:
### Model Configuration

```yaml
model_name_or_path: path/to/model
model_type: qwen  # or llama, etc.
```

### Data Configuration

```yaml
dataset_dir: data/
dataset:
  - dataset_name:
      file_name: path/to/file.json
      formatting: alpaca  # or other formatting options
```

### Training Arguments

```yaml
output_dir: output/checkpoint
per_device_train_batch_size: 4
num_train_epochs: 3
learning_rate: 5.0e-5
```

### Fine-tuning Configuration

```yaml
lora_rank: 16
lora_alpha: 32
lora_dropout: 0.05
```

Input data should be in JSON format. Example dataset structure:

```json
{
  "conversations": [
    {
      "from": "human",
      "value": "What is autonomous driving?"
    },
    {
      "from": "gpt",
      "value": "Autonomous driving is..."
    }
  ]
}
```

See data/drivefusion_example.json and data/mllm_example.json for complete examples.
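A quick way to sanity-check your own data against this schema is a small validator like the sketch below (an illustration, not the project's actual loader; it only checks that turns alternate human/gpt and carry string values):

```python
import json

def validate(record: dict) -> bool:
    """Check that a record follows the conversations schema:
    non-empty, alternating human/gpt turns, each with a string value."""
    turns = record.get("conversations", [])
    if not turns:
        return False
    expected = ["human", "gpt"]
    return all(
        t.get("from") == expected[i % 2] and isinstance(t.get("value"), str)
        for i, t in enumerate(turns)
    )

record = json.loads(
    '{"conversations": [{"from": "human", "value": "Q"}, {"from": "gpt", "value": "A"}]}'
)
print(validate(record))  # True
```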
- LoRA: Low-Rank Adaptation for parameter-efficient training
- QLoRA: Quantized LoRA for reduced memory usage
- DoRA: Weight-Decomposed Low-Rank Adaptation
- Full Fine-tuning: Training all model parameters
- Mixed Precision Training: Reduce memory and computation
- Gradient Accumulation: Train with effective larger batch sizes
- Gradient Checkpointing: Further reduce memory consumption
- Distributed Training: Scale across multiple GPUs/nodes
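Gradient accumulation and data parallelism combine multiplicatively: the effective global batch size is the per-device batch times the accumulation steps times the number of GPUs. A minimal sketch (the example values are assumptions, not defaults of this project):

```python
def effective_batch_size(per_device: int, accum_steps: int, num_gpus: int) -> int:
    """Global batch size seen by the optimizer per update step."""
    return per_device * accum_steps * num_gpus

# e.g. per_device_train_batch_size: 4 with 8 accumulation steps on 4 GPUs
print(effective_batch_size(4, 8, 4))  # 128
```

This is why reducing `per_device_train_batch_size` for memory reasons can be offset by raising the accumulation steps without changing the optimization behavior.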
Training supports multiple evaluation metrics:
- Accuracy computation
- Semantic similarity (with generation)
- Custom metric functions
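A custom metric function might look like the following sketch (an assumed shape for illustration, not the project's `metric.py`): token-level accuracy that skips positions labeled `-100`, the conventional mask value for prompt and padding tokens in HF-style training:

```python
def token_accuracy(predictions, labels, ignore_index=-100):
    """Fraction of non-masked label positions predicted exactly."""
    correct = total = 0
    for p, l in zip(predictions, labels):
        if l == ignore_index:
            continue
        total += 1
        correct += int(p == l)
    return correct / total if total else 0.0

print(token_accuracy([5, 7, 9, 2], [5, 7, -100, 3]))  # 2 correct of 3 scored
```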
If you run out of GPU memory:
- Reduce `per_device_train_batch_size`
- Enable gradient checkpointing: `gradient_checkpointing: true`
- Use QLoRA instead of LoRA for reduced memory
- Enable a `deepspeed` configuration

If training is slow:
- Increase `per_device_train_batch_size` if memory allows
- Enable multi-GPU training
- Use DeepSpeed ZeRO optimizations
- Install `liger-kernel` for faster kernels
Contributions are welcome! Please ensure:
- Code follows the project's style guide (enforced by Ruff)
- All tests pass
- Documentation is updated
- Commits are clear and descriptive
For issues and questions:
- Check existing GitHub issues
- Review configuration examples in `config/`
- Consult the LLaMAFactory documentation: https://github.com/hiyouga/LLaMA-Factory
This project is licensed under the Apache License 2.0. See the LICENSE file for the full license text.
This project extends LLaMAFactory, which is also licensed under Apache License 2.0, ensuring compatibility between the projects.
This project is built upon LLaMAFactory by hiyouga and contributors. We extend our gratitude for the excellent foundation that made this project possible.
Special thanks to:
- The Hugging Face transformers and datasets teams
- The PEFT library developers for LoRA implementations
- Contributors of DeepSpeed and other optimization libraries