# Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining

This is the repository for the ACL 2025 paper *Efficient Pretraining Data Selection for Language Models via Multi-Actor Collaboration*.

(Figure: Illustration of the multi-actor collaborative framework.)

## Updates

## Release plan

TODOs:

- [ ] Model checkpoints
- [ ] BERTopic model checkpoint
- [ ] Labeled SlimPajama-670B datasets
- [ ] Code for methods