An open-source web automation framework powered by large multimodal models.
Developed under Nexus Labs
This project is based on Avenir-Web, which itself builds upon the SeeAct framework developed by the OSU NLP Group.
OpenFlo enables autonomous web agents to perform tasks on any website using vision-language models. The system combines robust browser automation with intelligent action prediction to execute complex workflows.
- src/seeact/: core agent implementation (SeeActAgent)
  - agent/: main agent logic
    - agent.py: central agent class and execution flow
    - config.py: configuration loading and validation
    - reporting.py: result saving and summary generation
    - evaluation.py: task success evaluation and termination logic
    - executor.py: action execution logic
    - predictor.py: LLM interaction and action prediction
- src/run_agent.py: single-process runner (demo + batch)
- src/config/*.toml: sample configs
- data/: example data and task files
- Python >=3.9 (src/pyproject.toml)
- A browser for Playwright (Chromium recommended)
- An API key for your chosen provider (OpenRouter preferred)
From the repository root:
# Create a conda environment
conda create -n seeact python=3.11
conda activate seeact
# Install the package in editable mode
pip install -e src
# Set up Playwright and install the browser binaries
playwright install
Set an API key:
export OPENROUTER_API_KEY="your-key"
You can also put keys in src/config/*.toml under [api_keys]. Environment variables take precedence.
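The precedence rule above (environment variable first, then the [api_keys] table) can be illustrated with a minimal sketch. The helper name resolve_api_key is hypothetical and is not part of OpenFlo; it assumes the config has already been parsed into a dictionary and only mirrors the documented behavior.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(config: Mapping, env: Mapping = os.environ) -> Optional[str]:
    """Hypothetical helper: return the OpenRouter key, preferring the environment.

    `config` is assumed to be the parsed TOML config as a dict.
    """
    # The environment variable wins when it is set and non-empty.
    env_key = env.get("OPENROUTER_API_KEY")
    if env_key:
        return env_key
    # Otherwise fall back to [api_keys].openrouter_api_key, if present.
    return config.get("api_keys", {}).get("openrouter_api_key") or None


# Usage: both sources set, so the environment value is chosen.
config = {"api_keys": {"openrouter_api_key": "from-config"}}
print(resolve_api_key(config, env={"OPENROUTER_API_KEY": "from-env"}))  # prints from-env
```

Passing the environment as a parameter keeps the sketch easy to test without touching the real process environment.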
Run scripts from src/ (paths in configs are written relative to src/):
cd src
In your config (src/config/auto_mode.toml), set experiment.task_file_path, then run:
python run_agent.py -c config/auto_mode.toml
Batch mode expects a JSON array of tasks like:
[
  {
    "task_id": "task_001",
    "confirmed_task": "Find the official API docs for X",
    "website": "https://example.com/"
  }
]
Configs are TOML files; see src/config/auto_mode.toml.
- [basic]
  - save_file_dir: output root directory
  - default_task, default_website: defaults for single-task runs
- [experiment]
  - task_file_path: JSON tasks list for batch mode
  - overwrite: skip or overwrite existing task output folders
  - max_op, max_continuous_no_op, highlight
- [model]
  - name: model identifier (commonly openrouter/...)
  - temperature, rate_limit
  - optional: reasoning_model, checklist_model, completion_eval_model
- [api_keys]
  - openrouter_api_key (or set OPENROUTER_API_KEY)
  - optional: gemini_api_key (or set GEMINI_API_KEY)
- [playwright]
  - headless, viewport, tracing, save_video, locale, geolocation
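Since experiment.task_file_path points at a JSON task list, it can help to sanity-check that file before a long batch run. The sketch below is not part of OpenFlo: validate_tasks is a hypothetical helper, and it assumes that all three fields from the batch example (task_id, confirmed_task, website) are required non-empty strings. It also flags duplicate task_id values, because each task writes to a folder named after its task_id.

```python
import json

# Assumption: these three fields, taken from the batch example, are all required.
REQUIRED_FIELDS = ("task_id", "confirmed_task", "website")


def validate_tasks(tasks):
    """Return a list of human-readable problems; an empty list means no issues found."""
    if not isinstance(tasks, list):
        return ["top-level JSON value must be an array of task objects"]
    problems = []
    seen_ids = set()
    for index, task in enumerate(tasks):
        if not isinstance(task, dict):
            problems.append(f"task {index}: expected an object")
            continue
        for field in REQUIRED_FIELDS:
            value = task.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"task {index}: missing or empty '{field}'")
        task_id = task.get("task_id")
        if isinstance(task_id, str):
            # Duplicate ids would map two tasks onto the same output folder.
            if task_id in seen_ids:
                problems.append(f"task {index}: duplicate task_id '{task_id}'")
            seen_ids.add(task_id)
    return problems


example = json.loads(
    '[{"task_id": "task_001",'
    ' "confirmed_task": "Find the official API docs for X",'
    ' "website": "https://example.com/"}]'
)
print(validate_tasks(example))  # prints []
```

For a real file, the parsed content of the path configured in experiment.task_file_path would be passed in place of example.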
Each task writes to basic.save_file_dir/<task_id>/:
- agent.log: per-task execution log
- result.json: final summary (handled by src/seeact/agent/reporting.py)
- config.toml: resolved config snapshot
- all_predictions.json: recorded LLM I/O for the task
- screenshots/: screen_<step>.png and sometimes screen_<step>_labeled.png
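Given this layout, a batch run can be inspected by walking the output root and loading each task's result.json. The following is a minimal sketch under that assumption only: collect_results is a hypothetical helper, and because the fields inside result.json are not described here, it returns the parsed JSON without interpreting it.

```python
import json
from pathlib import Path


def collect_results(save_file_dir):
    """Map each task folder name to its parsed result.json, or None if absent or unreadable."""
    results = {}
    root = Path(save_file_dir)
    # Each immediate subdirectory is assumed to be one <task_id>/ folder.
    for task_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        result_path = task_dir / "result.json"
        try:
            results[task_dir.name] = json.loads(result_path.read_text(encoding="utf-8"))
        except (FileNotFoundError, json.JSONDecodeError):
            # Task did not finish, or the summary file is malformed.
            results[task_dir.name] = None
    return results


# Usage (the path is a placeholder for your basic.save_file_dir):
# results = collect_results("path/to/save_file_dir")
# unfinished = [task_id for task_id, result in results.items() if result is None]
```

Tasks that map to None are candidates for a rerun, which is where the experiment.overwrite setting becomes relevant.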
The runners also write run-level logs to src/logs/.
- Missing API key: set OPENROUTER_API_KEY (preferred) or configure [api_keys]
- Playwright browser not found: run python -m playwright install chromium
- Want to watch the browser: set playwright.headless = false
- Config paths look wrong: run from src/ or pass an absolute -c config path
OpenFlo is built upon Avenir-Web, which extends the original SeeAct framework by the OSU NLP Group.
If you use this work, please cite the original SeeAct paper:
@article{zheng2024seeact,
  title={GPT-4V(ision) is a Generalist Web Agent, if Grounded},
  author={Zheng, Boyuan and Gou, Boyu and Kil, Jihyung and Sun, Huan and Su, Yu},
  journal={arXiv preprint arXiv:2401.01614},
  year={2024}
}
This project maintains the same license as the original SeeAct framework. See the LICENSE file for details.