Fine-tuning a small language model with GRPO and child-likeness rewards to generate toddler-style utterances, then deploying the model on a Cozmo robot for human-subject evaluation.
reinforcement-learning robots rewards language-model human-robot-interaction cozmo-sdk fine-tuning huggingface-transformers llms human-subjects-study grpo
Updated Jan 30, 2026 - Python