I am currently researching how to fine-tune Google's newly released FunctionGemma with the latest TRL (v0.26+) capabilities, focusing on GRPO for agent training.
The goal is to fine-tune an agent in an interactive environment by leveraging the new tool-use support in GRPOTrainer.
Which environment? I don't know yet, lol.
The candidates range from a simple game to a coding or research agent.
It might be useful, maybe not, but it will for sure be fun :)
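
For a rough idea of what GRPO does under the hood, here is a minimal pure-Python sketch of its group-relative advantage step: several rollouts are sampled for the same prompt, each gets a reward, and the rewards are normalized within the group. This is not the code from this project and not TRL's implementation. The function name, the `eps` value, the use of the population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of rollouts for the same prompt.

    Hypothetical helper: rollouts that scored above the group mean get a
    positive advantage, those below get a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four rollouts of one prompt,
# e.g. 1.0 if the agent's tool call solved the task, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

If every rollout in a group gets the same reward, all advantages come out as zero, so that group contributes no learning signal. That is one reason the choice of environment and reward matters so much here.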
📅 ETA: A full blog post and code breakdown will be published here in Q1 2026.
⭐ Star or Watch this repository to get notified when the post goes live!