🚀 Predefined Training Examples

This document lists the predefined training examples in the verl/run_agents/ folder. Each example is configured and tested for a specific agent type and task.

📋 Training Examples Overview

| Example Name | Checkpoint | Model | Agent Type | Dataset | Tools | Reward Function | Max Steps | Training Steps | Batch Size | Learning Rate | Advantage Estimator |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 🖥️ GUI Agent | checkpoint | Qwen2.5-VL-3B-Instruct | GUI | GUI R1 Train/Test | pyautogui_code_generator | gui_reward | 4 | 200 | 64 | 4e-7 | GRPO |
| 🔍 VLM QA Agent | checkpoint | Qwen2.5-VL-7B-Instruct | React | InfoSeek Train/Val | asyncdense_retrieve, answer_qa | infoseek_reward | 5 | 200 | 128 | 5e-7 | Reinforce++ |
| 💻 Code Agent | checkpoint | Qwen2.5-3B-Instruct | Code | Orz Math 57K Train | code_interpreter | math_reward_tool | 8 | 200 | 64 | 5e-7 | GRPO |
| 🛒 Webshop Agent | checkpoint | Qwen2.5-3B-Instruct | React | Webshop Goals Train/Val | webshop_browser | webshop_reward | 8 | 200 | 128 | 4e-7 | GRPO |
| 🔎 Search Agent | checkpoint | Qwen2.5-3B-Instruct | React | HotpotQA Train | asyncdense_retrieve, answer_qa | qa_f1_reward_format | 4 | 100 | 128 | 5e-7 | Reinforce++ |
| 🧪 Science World Agent | checkpoint | Qwen2.5-7B-Instruct | React | ScienceWorld Train/Val | scienceworld_explorer | scienceworld_reward | 20 | 100 | 128 | 4e-7 | Reinforce++ |
| 🏠 ALFWorld Agent | checkpoint | Qwen2.5-3B-Instruct | React | ALFWorld Train/Val | alfworld_step, alfworld_get_admissible_commands, alfworld_get_task_objective | alfworld_episode_reward | 10 | 150 | 64 | 1e-6 | Reinforce++ |
| 📚 Retrieve Agent | checkpoint | Qwen2.5-3B-Instruct | React | HotpotQA Train | asyncdense_retrieve, answer_qa | qa_f1_reward_format | 4 | 100 | 128 | 5e-7 | Reinforce++ |
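
When comparing examples, it can help to treat the overview as data. The following is a minimal sketch, not part of AgentFly: the `ExampleConfig` class, its field names, and `group_by_estimator` are hypothetical, and the values are copied by hand from the table above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ExampleConfig:
    """One row of the overview table (hypothetical field names)."""
    name: str
    model: str
    max_steps: int
    training_steps: int
    batch_size: int
    learning_rate: float
    estimator: str


# Values transcribed from the overview table; not read from the training scripts.
EXAMPLES = [
    ExampleConfig("GUI Agent", "Qwen2.5-VL-3B-Instruct", 4, 200, 64, 4e-7, "GRPO"),
    ExampleConfig("VLM QA Agent", "Qwen2.5-VL-7B-Instruct", 5, 200, 128, 5e-7, "Reinforce++"),
    ExampleConfig("Code Agent", "Qwen2.5-3B-Instruct", 8, 200, 64, 5e-7, "GRPO"),
    ExampleConfig("Webshop Agent", "Qwen2.5-3B-Instruct", 8, 200, 128, 4e-7, "GRPO"),
    ExampleConfig("Search Agent", "Qwen2.5-3B-Instruct", 4, 100, 128, 5e-7, "Reinforce++"),
    ExampleConfig("Science World Agent", "Qwen2.5-7B-Instruct", 20, 100, 128, 4e-7, "Reinforce++"),
    ExampleConfig("ALFWorld Agent", "Qwen2.5-3B-Instruct", 10, 150, 64, 1e-6, "Reinforce++"),
    ExampleConfig("Retrieve Agent", "Qwen2.5-3B-Instruct", 4, 100, 128, 5e-7, "Reinforce++"),
]


def group_by_estimator(examples):
    """Map each advantage estimator to the example names that use it, in table order."""
    groups = defaultdict(list)
    for example in examples:
        groups[example.estimator].append(example.name)
    return dict(groups)


if __name__ == "__main__":
    for estimator, names in group_by_estimator(EXAMPLES).items():
        print(f"{estimator}: {', '.join(names)}")
```

The training scripts remain the source of truth; a hand-copied structure like this is only useful for quick comparisons.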

⚙️ Detailed Configurations

For detailed configurations, please refer to the training scripts.

Using Trained Agents

The trained agents can be tested and used through the provided test examples. Each agent type listed below has a corresponding test function that demonstrates its capabilities.

Available Agent Demos

| Agent Type | Test Function | Command |
|---|---|---|
| Code Agent | test_code_agent | pytest agentfly/tests/docs/examples/test_react_agents.py::test_code_agent -s |
| VLM QA Agent | test_react_vqa_agent | pytest agentfly/tests/docs/examples/test_react_agents.py::test_react_vqa_agent -s |
| VLM Retrieval Agent | test_react_vqa_retrieval_agent | pytest agentfly/tests/docs/examples/test_react_agents.py::test_react_vqa_retrieval_agent -s |
| Science World Agent | test_react_scienceworld_agent | pytest agentfly/tests/docs/examples/test_react_agents.py::test_react_scienceworld_agent -s |
| Webshop Agent | test_react_webshop_agent | pytest agentfly/tests/docs/examples/test_react_agents.py::test_react_webshop_agent -s |

Running the Demos

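To run any of the agent demos, pass the test's node ID to pytest together with the -s flag. The -s flag disables pytest's output capture, so the agent's output streams to the terminal as it is produced: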

# Example: Run the Code Agent demo
pytest agentfly/tests/docs/examples/test_react_agents.py::test_code_agent -s

# Example: Run the VLM QA Agent demo
pytest agentfly/tests/docs/examples/test_react_agents.py::test_react_vqa_agent -s
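
The same commands can also be assembled from Python, for instance to run several demos from one script. This is a minimal sketch under stated assumptions: `DEMOS`, `demo_command`, and `run_demo` are hypothetical helpers (not part of AgentFly), `pytest` is assumed to be on the PATH, and the working directory is assumed to be the repository root.

```python
import shlex
import subprocess

# Test file and function names are taken from the demo table above.
TEST_FILE = "agentfly/tests/docs/examples/test_react_agents.py"

DEMOS = {
    "Code Agent": "test_code_agent",
    "VLM QA Agent": "test_react_vqa_agent",
    "VLM Retrieval Agent": "test_react_vqa_retrieval_agent",
    "Science World Agent": "test_react_scienceworld_agent",
    "Webshop Agent": "test_react_webshop_agent",
}


def demo_command(agent_type):
    """Return the pytest argument list for one demo (hypothetical helper)."""
    test_function = DEMOS[agent_type]  # raises KeyError for unknown agent types
    return ["pytest", f"{TEST_FILE}::{test_function}", "-s"]


def run_demo(agent_type):
    """Run one demo as a subprocess and return pytest's exit code."""
    return subprocess.run(demo_command(agent_type), check=False).returncode


if __name__ == "__main__":
    # Print the commands instead of running them; call run_demo() to execute one.
    for agent_type in DEMOS:
        print(shlex.join(demo_command(agent_type)))
```

Passing the command as a list avoids shell quoting issues; `shlex.join` is used only to display it.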

Demo Descriptions

  • Code Agent: Solves mathematical problems using code interpretation
  • VLM QA Agent: Answers questions about images using vision-language understanding
  • VLM Retrieval Agent: Retrieves relevant information and answers questions about images
  • Science World Agent: Performs scientific reasoning tasks in a virtual environment
  • Webshop Agent: Navigates and shops in an e-commerce environment

Each demo shows the agent's reasoning process, tool usage, and final results in real time.

Training Curves

Training curves and metrics are logged to WandB for each experiment.

View Training Curves on WandB