🚀 Predefined Training Examples

This document summarizes the predefined training examples available in the `verl/run_agents/` folder. Each example has been tested and is configured for a specific agent type and task.

📋 Training Examples Overview

| Example Name | Checkpoint | Model | Agent Type | Dataset | Tools | Reward Function | Max Steps | Training Steps | Batch Size | Learning Rate | Advantage Estimator |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 🖥️ GUI Agent | checkpoint | Qwen2.5-VL-3B-Instruct | GUI | GUI R1 Train/Test | `pyautogui_code_generator` | `gui_reward` | 4 | 200 | 64 | 4e-7 | GRPO |
| 🔍 VLM QA Agent | checkpoint | Qwen2.5-VL-7B-Instruct | React | InfoSeek Train/Val | `asyncdense_retrieve`, `answer_qa` | `infoseek_reward` | 5 | 200 | 128 | 5e-7 | Reinforce++ |
| 💻 Code Agent | checkpoint | Qwen2.5-3B-Instruct | Code | Orz Math 57K Train | `code_interpreter` | `math_reward_tool` | 8 | 200 | 64 | 5e-7 | GRPO |
| 🛒 Webshop Agent | checkpoint | Qwen2.5-3B-Instruct | React | Webshop Goals Train/Val | `webshop_browser` | `webshop_reward` | 8 | 200 | 128 | 4e-7 | GRPO |
| 🔎 Search Agent | checkpoint | Qwen2.5-3B-Instruct | React | HotpotQA Train | `asyncdense_retrieve`, `answer_qa` | `qa_f1_reward_format` | 4 | 100 | 128 | 5e-7 | Reinforce++ |
| 🧪 Science World Agent | checkpoint | Qwen2.5-7B-Instruct | React | ScienceWorld Train/Val | `scienceworld_explorer` | `scienceworld_reward` | 20 | 100 | 128 | 4e-7 | Reinforce++ |
| 🏠 ALFWorld Agent | checkpoint | Qwen2.5-3B-Instruct | React | ALFWorld Train/Val | `alfworld_step`, `alfworld_get_admissible_commands`, `alfworld_get_task_objective` | `alfworld_episode_reward` | 10 | 150 | 64 | 1e-6 | Reinforce++ |
| 📚 Retrieve Agent | checkpoint | Qwen2.5-3B-Instruct | React | HotpotQA Train | `asyncdense_retrieve`, `answer_qa` | `qa_f1_reward_format` | 4 | 100 | 128 | 5e-7 | Reinforce++ |
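
The Advantage Estimator column names the method used to turn rewards into training signals. As a rough illustration of the group-relative idea behind GRPO, the sketch below normalizes the rewards of several responses sampled for the same prompt by the group's mean and standard deviation. It is a simplified, standalone example, not the code used by the training scripts; the function name `group_relative_advantages`, the `eps` value, and the choice of population standard deviation are assumptions made for illustration. Reinforce++ is not covered here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of responses sampled for one prompt.

    Hypothetical helper for illustration only: each advantage is the
    reward minus the group mean, divided by the group standard
    deviation (eps avoids division by zero when all rewards match).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled responses for one prompt: two rewarded, two not.
    advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
    print(advantages)
```

Responses that score above the group average receive a positive advantage and those below receive a negative one; when every response in the group earns the same reward, all advantages are zero and the group contributes no learning signal.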

⚙️ Detailed Configurations

For the full configuration of each example, refer to its training script in the `verl/run_agents/` folder.

Using Trained Agents

The trained agents can be tested and used through the provided test examples. Each agent type listed below has a corresponding test function that demonstrates its capabilities.

Available Agent Demos

| Agent Type | Test Function | Command |
|---|---|---|
| Code Agent | `test_code_agent` | `pytest agentfly/tests/docs/examples/test_react_agents.py::test_code_agent -s` |
| VLM QA Agent | `test_react_vqa_agent` | `pytest agentfly/tests/docs/examples/test_react_agents.py::test_react_vqa_agent -s` |
| VLM Retrieval Agent | `test_react_vqa_retrieval_agent` | `pytest agentfly/tests/docs/examples/test_react_agents.py::test_react_vqa_retrieval_agent -s` |
| Science World Agent | `test_react_scienceworld_agent` | `pytest agentfly/tests/docs/examples/test_react_agents.py::test_react_scienceworld_agent -s` |
| Webshop Agent | `test_react_webshop_agent` | `pytest agentfly/tests/docs/examples/test_react_agents.py::test_react_webshop_agent -s` |

Running the Demos

To run any of the agent demos, invoke pytest with the `-s` flag, which disables output capturing so that the agent's streaming output is shown in the terminal:

# Example: Run the Code Agent demo
pytest agentfly/tests/docs/examples/test_react_agents.py::test_code_agent -s

# Example: Run the VLM QA Agent demo
pytest agentfly/tests/docs/examples/test_react_agents.py::test_react_vqa_agent -s
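
The same commands can also be assembled from a script, which may be convenient when running several demos in a row. The sketch below maps each agent type from the demo table to its test function and builds the matching pytest command. The names `DEMOS`, `build_demo_command`, and `run_demo` are hypothetical helpers introduced for this example, and it assumes `pytest` is on the `PATH` and the script runs from the repository root.

```python
import subprocess

# Test file and function names are taken from the demo table above.
TEST_FILE = "agentfly/tests/docs/examples/test_react_agents.py"

DEMOS = {
    "Code Agent": "test_code_agent",
    "VLM QA Agent": "test_react_vqa_agent",
    "VLM Retrieval Agent": "test_react_vqa_retrieval_agent",
    "Science World Agent": "test_react_scienceworld_agent",
    "Webshop Agent": "test_react_webshop_agent",
}


def build_demo_command(agent_type):
    """Return the pytest command (as an argument list) for one demo."""
    test_function = DEMOS[agent_type]
    # -s disables output capturing so the streaming output is visible.
    return ["pytest", f"{TEST_FILE}::{test_function}", "-s"]


def run_demo(agent_type):
    """Run one demo and return pytest's exit code (0 means success)."""
    return subprocess.run(build_demo_command(agent_type), check=False).returncode


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe to try.
    print(" ".join(build_demo_command("Code Agent")))
```

Calling `run_demo("Webshop Agent")` would launch the corresponding demo; an unknown agent type raises `KeyError`.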

Demo Descriptions

Code Agent: Solves mathematical problems using code interpretation
VLM QA Agent: Answers questions about images using vision-language understanding
VLM Retrieval Agent: Retrieves relevant information and answers questions about images
Science World Agent: Performs scientific reasoning tasks in a virtual environment
Webshop Agent: Navigates and shops in an e-commerce environment

Each demo shows the agent's reasoning process, tool usage, and final results in real time.

Training Curves

Training curves and metrics are logged to WandB for each experiment.

🦋 AgentFly
📊 Curves & Logs

View Training Curves on WandB
