An implementation of the Human Goal Stack (HGS) framework with Domain-Specific Language (DSL) layers for web automation and task execution.
Agent_Zero is a modular system that bridges high-level human goals with low-level execution primitives. It uses LLM-powered reasoning to decompose goals into executable action trees following the Human Goal Stack (HGS) framework. The system currently operates in visualization-only mode: it focuses on tree generation and rendering, with execution infrastructure preserved for future use. It comprises a comprehensive React frontend for visualization and control, a FastAPI backend for reasoning and tree generation, and a Chrome extension for browser interaction and demonstration recording.
- Goal Decomposition: LLM-powered decomposition of high-level goals into executable action trees (HGS trees)
- Tree Visualization: Interactive tree visualization with minimalist Graphviz SVG rendering (visualization-only mode)
- LLM-Based Generation: Pure LLM-based tree generation for flexible goal decomposition
- Interactive Chat: Context-aware conversation interface for refining goals and tasks
- Chrome Extension: Browser extension for demonstration recording and DOM event capture
- Records user interactions (clicks, inputs, navigation) on any website
- Captures DOM events, screenshots, and optional video recordings
- Single-tab recording (records only the tab where recording started)
- Real-time event streaming to backend via WebSocket
- Demonstration Learning: Record user demonstrations and learn reusable workflows (infrastructure available)
- Data Cleaning Pipeline: Automated cleaning and enrichment of demonstration data
- Skill Library: Persistent storage and retrieval of learned skills
- Training Interface: Train on datasets like Mind2Web for procedure synthesis
- Real-Time Communication: WebSocket support for live recording and monitoring
Mode: Visualization-only (execution disabled)
- Tree generation and visualization are fully functional
- Execution workflows are temporarily disabled for visualization focus
Active Tree Type: HGS Tree
- Linear goal decomposition: Goal → Task → SubGoal → Action
- Only HGS trees are currently enabled in the UI
- AND/OR trees and Decision trees are available via API but commented out in the frontend
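The linear Goal → Task → SubGoal → Action decomposition can be sketched as nested dataclasses. This is an illustrative model only; the repository's actual tree representation lives in `backend/models` and may use different field names.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative sketch of the linear HGS decomposition
# (Goal -> Task -> SubGoal -> Action). Field names are assumptions,
# not the repo's actual model definitions.

@dataclass
class Action:
    primitive: str  # a DSL_sys primitive, e.g. "click" or "navigate"
    target: str     # the element or URL the primitive acts on

@dataclass
class SubGoal:
    description: str
    actions: List[Action] = field(default_factory=list)

@dataclass
class Task:
    description: str
    subgoals: List[SubGoal] = field(default_factory=list)

@dataclass
class Goal:
    description: str
    tasks: List[Task] = field(default_factory=list)

goal = Goal(
    description="Research AI agents",
    tasks=[Task(
        description="Search arXiv for recent papers",
        subgoals=[SubGoal(
            description="Open the arXiv search page",
            actions=[Action(primitive="navigate", target="https://arxiv.org")],
        )],
    )],
)
```

Because the decomposition is linear, walking a generated tree is a straightforward depth-first traversal of these four levels.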
Generation Method: LLM Only
- Pure LLM-based generation (flexible but requires API key)
- Demo-based, frequency analysis, and hybrid methods are temporarily disabled
- All generation uses OpenAI API or compatible LLM endpoints
Chrome Extension Status: Active and Fully Functional
- Extension Name: Agent Zero - DOM Recorder (Version 2.0.1)
- Manifest Version: 3 (Chrome's latest standard)
- Recording Features:
- DOM event capture (clicks, inputs, scrolls, navigation, form submissions)
- Screenshot capture linked to events (throttled to prevent excessive captures)
- Optional video recording (WebM format) of browser tab
- Single-tab recording mode (only records from the tab where recording started)
- User Interface:
- Extension popup with start/stop controls
- Live event counter during recording
- Real-time status updates
- Integration:
- Works with all websites (http/https protocols)
- Integrated with main UI for session management
- Backend API integration for data storage
- WebSocket support for real-time event streaming
- Data Storage: Recording data stored in `data/sessions/YYYY-MM-DD/{session_id}/`, organized by date
- Status: Extension is production-ready and actively used for demonstration recording
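The screenshot capture mentioned above is throttled to avoid excessive captures. The real throttling is implemented in the extension's JavaScript (`content_script.js`); the Python sketch below, with an assumed two-second interval, just illustrates the idea.

```python
import time
from typing import Optional

class ScreenshotThrottle:
    """Allow at most one capture per `min_interval` seconds.

    Illustrative only: the real logic lives in the extension's
    JavaScript, and the interval here is an assumption.
    """

    def __init__(self, min_interval: float = 2.0):
        self.min_interval = min_interval
        self._last_capture = float("-inf")

    def should_capture(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if now - self._last_capture >= self.min_interval:
            self._last_capture = now  # record the capture time
            return True
        return False

throttle = ScreenshotThrottle(min_interval=2.0)
# Simulated event timestamps (seconds): only events at least 2s after
# the previous capture trigger a new screenshot.
results = [throttle.should_capture(now=t) for t in (0.0, 0.5, 1.9, 2.0, 4.5)]
```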
Agent_Zero is open source and available on GitHub:
- Repository: https://github.com/jeffelin/Agent_Zero
- Author: Jeff Lin (@jeffelin)
git clone https://github.com/jeffelin/Agent_Zero.git
cd Agent_Zero

Agent_Zero consists of a Python FastAPI backend for reasoning and execution, and a React frontend for visualization and control.
- Human Goal Stack (HGS)
  - Goal: The high-level objective (e.g., "Research AI agents")
  - Task: A specific unit of work derived from the Goal
  - SubGoal: An actionable step that maps to a platform-specific action
- Domain-Specific Languages (DSL)
  - DSL_sys (System Primitives): Universal instruction set for digital agents. Atomic operations like `click`, `type`, `scroll`, `navigate`, and `extract`. Platform-independent.
  - DSL_p (Platform Actions): Compositions of `DSL_sys` primitives tailored for specific platforms (e.g., `ArxivSearch`, `HandshakeLogin`)
- Stepper & Execution Engine (Infrastructure Available, Execution Disabled)
  - Stepper: Reasoning engine that prunes the search space and decides which `DSL_p` action to take based on the current `SubGoal`
  - Mapping Function M: Logic that maps a `SubGoal` to a specific `DSL_p` selection using heuristic inference or LLM reasoning
  - Note: Execution workflows are temporarily disabled; the components remain in the codebase for future use
- Programming by Example (PBE)
  - Learning module (`backend/pbe`) that generalizes new `DSL_p` actions from human demonstrations
  - Synthesizes reusable procedures from multiple demonstrations
  - Supports anti-unification and pattern extraction
- State Management
  - State detection and matching for robust execution
  - Context-aware execution that adapts to UI changes
  - State snapshots for replay and debugging
- Recording System
  - Records user demonstrations for learning
  - Analyzes demonstrations to extract workflows
  - Replays recorded workflows for similar tasks
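To make the mapping function M concrete, here is a toy keyword-matching version of the heuristic path. The real Stepper (`backend/core/stepper.py`) can also fall back to LLM reasoning; the platform actions and keywords below are invented for illustration.

```python
# Toy illustration of the mapping function M: SubGoal -> DSL_p action.
# The action names and keyword lists are made up; the real Stepper uses
# heuristic inference or LLM reasoning over the actual DSL_p library.

DSL_P_ACTIONS = {
    "ArxivSearch": {"keywords": ["arxiv", "paper", "search"]},
    "HandshakeLogin": {"keywords": ["handshake", "login", "sign in"]},
}

def map_subgoal(subgoal: str) -> str:
    """Pick the DSL_p action whose keywords best match the subgoal text."""
    text = subgoal.lower()
    best, best_hits = None, 0
    for action, spec in DSL_P_ACTIONS.items():
        hits = sum(1 for kw in spec["keywords"] if kw in text)
        if hits > best_hits:
            best, best_hits = action, hits
    # No heuristic match: defer to LLM reasoning (placeholder name).
    return best or "fallback_llm"
```

A subgoal like "Search arXiv for agent papers" matches `ArxivSearch` on several keywords, while an unmatched subgoal falls through to the LLM path.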
Agent_Zero/
├── backend/ # FastAPI backend application
│ ├── api/ # REST API endpoints (routes.py)
│ ├── core/ # Core logic (DSL, Stepper, Execution Controller)
│ │ ├── dsl_sys.py # System primitives
│ │ ├── dsl_p.py # Platform actions
│ │ ├── stepper.py # Action selection logic
│ │ └── execution_controller.py
│ ├── pbe/ # Programming by Example learning module
│ │ ├── synthesizer.py # Procedure synthesis from demos
│ │ ├── generalizer.py # Pattern generalization
│ │ └── evaluator.py # Evaluation metrics
│ ├── automation/ # Browser and desktop automation
│ │ ├── backends/ # Playwright, visual, and hybrid executors
│ │ └── service.py # Backend selector/orchestrator
│ ├── recording/ # Demonstration recording and analysis
│ │ ├── recorder.py # Records user actions
│ │ ├── analyzer.py # Analyzes demonstrations
│ │ └── replayer.py # Replays recorded workflows
│ ├── memory/ # Persistent conversation + execution memory
│ │ ├── conversation_store.py # Writes transcripts to data/conversations
│ │ └── memory_retrieval.py # Fetches relevant memories for new runs
│ ├── state/ # State detection and matching
│ │ ├── state_detector.py # Detects UI state
│ │ ├── state_snapshot.py # Captures DOM/visual diffs
│ │ └── state_context.py # Manages execution context
│ ├── services/ # Business logic services
│ │ ├── workflow_service.py # HGS tree generation
│ │ ├── execution_service.py # Tree execution + subgoal planning
│ │ ├── data_cleaning_service.py # Data cleaning pipeline
│ │ ├── session_processor.py # Session processing and consolidation
│ │ ├── tree_visualization_service.py # Tree visualization (Graphviz)
│ │ ├── feature_extractor_service.py # Feature extraction for ML
│ │ ├── multi_strategy_element_finder.py # Multi-strategy element finding
│ │ ├── storage_manager.py # Storage management utilities
│ │ ├── storage.py # Skill library storage
│ │ └── telemetry.py # Structured logging
│ ├── execution/ # Controllers/orchestrators + retry policies
│ ├── pflow/ # Plan/execute/reflect helper nodes
│ ├── data/ # Storage for skills, procedures, and state
│ │ ├── procedures/ # Synthesized procedures
│ │ ├── mind2web_loader.py # Mind2Web dataset loader
│ │ └── procedure_storage.py # Procedure storage utilities
│ ├── tools/ # External tool integrations
│ ├── scripts/ # Utilities (e.g., convert_sessions_to_demos.py)
│ ├── clients/ # LLM client integrations
│ ├── config/ # Configuration management
│ ├── models/ # Data models
│ ├── utils/ # Utility functions
│ ├── validators/ # Validation logic
│ └── results/ # Runtime artifacts (git-ignored)
├── frontend/ # React + TypeScript frontend
│ ├── src/ # Source code
│ │ ├── components/ # React components
│ │ ├── pages/ # Page components
│ │ └── hooks/ # Custom React hooks
│ ├── extension/ # Chrome Extension for demonstration recording
│ │ ├── manifest.json # Extension manifest (Manifest v3)
│ │ ├── content_script.js # DOM event capture (~1,800 lines)
│ │ ├── background.js # Service worker for state management
│ │ ├── popup.html # Extension popup UI
│ │ └── popup.js # Popup logic and controls
│ └── dist/ # Production build artifacts
├── data/ # Storage for skills and demonstrations
│ ├── sessions/ # Browser recording sessions (organized by date)
│ │ └── YYYY-MM-DD/ # Date-organized session folders
│ │ └── {session_id}/ # Individual session data
│ │ ├── jsons/ # JSON files (session.json, events.jsonl, processed.json)
│ │ ├── videos/ # Video recordings (video.webm)
│ │ └── screenshots/ # Screenshots linked to events
│ ├── conversations/ # Session conversation transcripts
│ ├── demonstrations/ # Recorded demonstrations (legacy)
│ ├── examples/ # Curated example workflows
│ └── skills.json # Skill library
├── requirements.txt # Python dependencies
├── SETUP_AND_RUN.md # Additional set up and run directions
└── README.md # This file
The frontend is a dual-interface system:
- Main UI (React + TypeScript): Visualizes HGS trees using React Flow plus backend-generated SVG rendering with minimalist styling
  - Three-panel layout: Input (left), Visualization (center), Chat/Logs (right)
  - Interactive tree visualization with scrolling and zoom
  - Chat interface for goal refinement and clarification
  - PNG export of the current tree visualization (via the Export dialog)
  - Demonstration recording controls integrated into the UI
- Chrome Extension (`frontend/extension/`): Browser extension for demonstration recording (fully functional)
  - Manifest Version 3 (Chrome's latest extension standard)
  - Components:
    - `content_script.js`: Injected into web pages to capture DOM events and user interactions (~1,800 lines)
    - `background.js`: Service worker managing recording state and backend communication
    - `popup.html` / `popup.js`: Extension popup UI with start/stop controls and a live event counter
  - Features:
    - Records clicks, inputs, scrolls, navigation, and form submissions
    - Captures screenshots linked to important events
    - Optional video recording of the browser tab (WebM format)
    - Single-tab recording mode (only records events from the tab where recording started)
    - Real-time event streaming to the backend via WebSocket
    - Session management and state persistence
  - Status: Fully functional for demonstration recording
  - Permissions: `<all_urls>`, `activeTab`, `scripting`, `storage`, `tabs`
- Python 3.9+ (Python 3.10+ recommended)
- Node.js 18+
- OpenAI API key (or compatible LLM API)
- Playwright browsers (installed automatically with dependencies)
- Graphviz (for tree visualization)
  - macOS: `brew install graphviz`
  - Ubuntu/Debian: `sudo apt-get install graphviz`
  - Windows: download from graphviz.org
- Google Chrome (for extension and browser automation)
Use the provided script:
./RUN.sh

This will:
- Activate the virtual environment (if available)
- Check for API keys
- Build the frontend if needed
- Start the server on http://localhost:8000
Note: After starting the server, you still need to:
- Load the Chrome extension (see "Chrome Extension Setup" below)
- Optionally set up Mind2Web dataset (see "Mind2Web Dataset Setup" below)
1. Navigate to the project root:
   cd Agent_Zero
2. Create and activate a virtual environment:
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
3. Install dependencies:
   pip install -r requirements.txt
4. Install Playwright browsers:
   playwright install
5. Install Graphviz (required for tree visualization):
   # macOS
   brew install graphviz
   # Ubuntu/Debian
   sudo apt-get install graphviz
   # Windows: download from https://graphviz.org/download/
6. Set environment variables:
   export OPENAI_API_KEY='your-key-here'
   Or create a `.env` file in the project root:
   echo "OPENAI_API_KEY=your-key-here" > .env
7. Run the server:
   uvicorn backend.api.routes:app --reload --port 8000
   Or use the start script:
   ./backend/START_SERVER.sh
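After installing, a quick sanity check can confirm the two external dependencies the backend needs at runtime. This helper is not part of the repository; it is a hypothetical snippet that checks for the API key and for Graphviz's `dot` binary on the PATH.

```python
import os
import shutil

# Hypothetical post-install check (not part of the repo): verifies the
# LLM API key is set and the Graphviz `dot` binary is reachable.
def check_environment() -> dict:
    return {
        "openai_key_set": bool(os.environ.get("OPENAI_API_KEY")),
        "graphviz_dot_on_path": shutil.which("dot") is not None,
    }

status = check_environment()
for name, ok in status.items():
    print(f"{name}: {'OK' if ok else 'MISSING'}")
```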
The data/ directory is automatically created when you first use the application. It stores:
- Session recordings (`data/sessions/`): Browser demonstration sessions organized by date
- Conversation transcripts (`data/conversations/`): Session conversation history
- Example workflows (`data/examples/`): Saved example workflows
- Demonstrations (`data/demonstrations/`): Processed demonstration data (legacy)
Directory Structure:
data/
├── sessions/
│ └── YYYY-MM-DD/ # Organized by date
│ └── {session_id}/ # Individual session folders
│ ├── jsons/ # JSON files (session.json, events.jsonl, processed.json)
│ ├── videos/ # Video recordings (optional)
│ └── screenshots/ # Screenshots linked to events
├── conversations/ # Session conversation transcripts
└── examples/ # Curated example workflows
The directory is created automatically on first use. No manual setup required.
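The date-organized layout above can be reproduced with a few lines of path handling. The helpers below are illustrative, not the repository's actual storage code (see `backend/services/storage_manager.py` for that); they simply show how a session folder and its `events.jsonl` file fit together.

```python
import json
import tempfile
from datetime import date
from pathlib import Path

# Illustrative helpers mirroring the documented layout
# data/sessions/YYYY-MM-DD/{session_id}/ -- not the repo's real code.

def session_dir(root: Path, session_id: str) -> Path:
    """Create (if needed) and return a date-organized session folder."""
    d = root / "sessions" / date.today().isoformat() / session_id
    for sub in ("jsons", "videos", "screenshots"):
        (d / sub).mkdir(parents=True, exist_ok=True)
    return d

def append_event(session: Path, event: dict) -> None:
    """Append one DOM event as a JSON line to events.jsonl."""
    with open(session / "jsons" / "events.jsonl", "a") as f:
        f.write(json.dumps(event) + "\n")

with tempfile.TemporaryDirectory() as tmp:
    session = session_dir(Path(tmp), "demo-session-001")
    append_event(session, {"type": "click", "selector": "#submit"})
    lines = (session / "jsons" / "events.jsonl").read_text().splitlines()
```

JSONL (one JSON object per line) suits recording because events can be appended incrementally without rewriting the whole file.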
Mind2Web is a dataset containing 2,350+ real-world web tasks from 137 websites. It's used for training and evaluation.
Location: Agent_Zero/datasets/Mind2Web/data/
1. Navigate to the Mind2Web directory:
   cd Agent_Zero/datasets/Mind2Web
2. Clone the training data from Hugging Face:
   git clone https://huggingface.co/datasets/osunlp/Mind2Web data
3. Download and extract the test splits (password: `mind2web`):
   cd data
   # Download these files manually from:
   # https://huggingface.co/datasets/osunlp/Mind2Web/tree/main
   # Then extract:
   unzip -P mind2web test_task.zip
   unzip -P mind2web test_website.zip
   unzip -P mind2web test_domain.zip
4. Verify the structure:
   ls -la
   # Should see: train/, test_task/, test_website/, test_domain/
5. Test the loader:
   cd Agent_Zero
   python3 scripts/testing/test_mind2web.py
Expected Output:
✅ Loader available: True
✅ Dataset root: /path/to/Agent_Zero/datasets/Mind2Web/data
✅ Available splits: ['train', 'test_task', 'test_website', 'test_domain']
✅ Sample task loaded
Note: Mind2Web setup is optional. The application works without it, but you won't be able to use the training interface with Mind2Web tasks.
For detailed setup instructions, see datasets/Mind2Web/SETUP.md.
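Once loaded, each Mind2Web record pairs a natural-language task with a sequence of ground-truth actions. The sketch below uses a synthetic record; the field names (`annotation_id`, `confirmed_task`, `actions`, `operation`) follow the dataset's published schema as best understood, so verify them against the real files and the loader in `backend/data/mind2web_loader.py`.

```python
import json

# Synthetic Mind2Web-style record; field names are assumptions based on
# the dataset's published schema, not loaded from the real files.
sample = json.loads("""
{
  "annotation_id": "demo-0001",
  "website": "example",
  "confirmed_task": "Find the cheapest flight from NYC to SF",
  "actions": [{"operation": {"op": "CLICK"}}, {"operation": {"op": "TYPE"}}]
}
""")

def summarize(task: dict) -> str:
    """One-line summary: task text plus its ground-truth operation sequence."""
    ops = [a["operation"]["op"] for a in task.get("actions", [])]
    return f"{task['confirmed_task']} ({len(ops)} steps: {', '.join(ops)})"

print(summarize(sample))
```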
The Chrome extension is required for demonstration recording and browser interaction. It's a Manifest v3 extension that captures user interactions for learning workflows.
Installation Steps:
1. Open Google Chrome and navigate to `chrome://extensions/`
2. Enable "Developer mode" (toggle switch in the top-right corner)
3. Click "Load unpacked"
4. Navigate to and select the extension directory: `Agent_Zero/frontend/extension/`
5. The extension "Agent Zero - DOM Recorder" should appear in your extensions list with version 2.0.1
6. Verify installation: Click the extension icon in Chrome's toolbar; you should see the popup with a "Start Recording" button
Extension Features:
- DOM Event Recording: Captures clicks, inputs, scrolls, form submissions, and navigation
- Screenshot Capture: Takes screenshots linked to important events (throttled to prevent excessive captures)
- Video Recording: Optional tab video recording (WebM format) when enabled
- Single-Tab Mode: Records only from the tab where recording started (ignores other tabs)
- Live Event Counter: Popup shows real-time event count during recording
- Session Management: Tracks recording sessions and manages state persistence
Extension Components:
- `manifest.json`: Extension configuration (Manifest v3)
- `content_script.js`: Injected into web pages to capture DOM events (~1,800 lines)
- `background.js`: Service worker managing state and backend communication
- `popup.html` / `popup.js`: Extension popup UI with controls
Using the Extension:
1. From the extension popup:
   - Click the extension icon in Chrome's toolbar
   - Click the "Start Recording" button
   - Perform actions on the webpage
   - Click "Stop Recording" when done
   - View the event count in the popup
2. From the main UI:
   - Use the "Demonstrate" source option in the left panel
   - Click "Start Recording"; this communicates with the extension
   - Record your demonstration
   - Stop recording from the UI
Extension Permissions (Required):
- `activeTab`: Access to the current tab for recording
- `scripting`: Inject content scripts into web pages
- `storage`: Store recording session state locally
- `tabs`: Manage and track browser tabs
- `<all_urls>`: Record demonstrations on any website (http/https)
Current Status:
- Extension is fully functional for demonstration recording
- Recording infrastructure works end-to-end
- Data is stored in `data/sessions/YYYY-MM-DD/{session_id}/`
- Integration with the backend API for session management
- WebSocket support for real-time event streaming
Troubleshooting:
- Reload the extension in `chrome://extensions/` after code changes
- Check the browser console (F12) for extension errors
- Verify the backend is running on `http://localhost:8000`
- Ensure the extension has the proper permissions enabled
- Check that the website is not a Chrome internal page (chrome://, about:, etc.)
Note: The extension must be loaded before starting recording sessions. After making changes to extension files, reload it in Chrome.
The backend exposes a comprehensive REST API and WebSocket endpoints. See backend/README.md for detailed API documentation.
- `GET /api` - Health check
- `POST /api/generate` - Generate an HGS tree from a prompt
- `POST /api/generate_mvp` - Generate a minimal viable tree with visualization
- `POST /api/execute` - Execute an HGS tree (endpoint available but execution temporarily disabled)
- `POST /api/generate_and_execute` - Generate and execute in one call (execution temporarily disabled)
- `POST /api/plan_primitives` - Plan primitive actions
- `POST /api/execute_plan_stream` - Stream execution of a plan (execution temporarily disabled)
- `POST /api/execute_plan` - Execute a plan synchronously (execution temporarily disabled)
- `POST /api/tree/visualize` - Generate a tree visualization (SVG)
- `POST /api/and_or_tree/add_or_branch` - Add an OR branch to a tree (AND/OR trees); API available but frontend disabled
- `POST /api/and_or_tree/add_condition` - Add a condition to a tree (AND/OR trees); API available but frontend disabled
Current Status:
- Tree Type: Only HGS trees are enabled in the frontend UI
- Generation Method: Only LLM-based generation is enabled
- Execution: Temporarily disabled (visualization-only mode)
- Demo-based and frequency analysis methods are temporarily disabled in the UI (code preserved for future use)
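A minimal client call to the tree-generation endpoint might look like the sketch below. The payload field name (`"prompt"`) is an assumption; check `backend/README.md` for the actual request schema before relying on it.

```python
import json
from urllib.parse import urljoin

BASE_URL = "http://localhost:8000"

# Sketch of calling POST /api/generate. The "prompt" field name is an
# assumption -- verify against backend/README.md.
def build_generate_request(goal: str):
    url = urljoin(BASE_URL, "/api/generate")
    payload = {"prompt": goal}
    return url, json.dumps(payload)

url, body = build_generate_request("Research AI agents")
print(url)
# To actually send it (requires a running backend):
#   import requests
#   resp = requests.post(url, data=body,
#                        headers={"Content-Type": "application/json"})
```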
- `POST /api/recording/start_session` - Start a new recording session
- `POST /api/recording/stop_session` - Stop a recording session
- `POST /api/recording/event` - Record a DOM event
- `POST /api/recording/video` - Upload a video recording
- `POST /api/recording/screenshot` - Upload a screenshot
- `GET /api/recording/sessions` - List all recording sessions
- `GET /api/recording/sessions/{session_id}` - Get session details
- `GET /api/recording_status` - Get the current recording status
- `POST /api/recording/clean` - Clean recording data
- `POST /api/start_demonstration` - Start demonstration recording
- `POST /api/stop_demonstration` - Stop demonstration recording
- `WebSocket /ws/recording_stream` - WebSocket stream for live recording
- `GET /api/demonstrations` - List all demonstrations
- `GET /api/demonstrations/{demo_id}` - Get a specific demonstration
- `POST /api/execute_demo` - Execute a demonstration
- `POST /api/execute_demonstration` - Execute a demonstration with options
- `POST /api/execute_demonstration_template` - Execute a demonstration template
- `GET /api/demonstrations/{demo_id}/template` - Get a demonstration template
- `DELETE /api/demonstrations/{demo_id}` - Delete a demonstration
- `POST /api/analyze_demonstration` - Analyze a demonstration
- `GET /api/skills` - List saved skills
- `GET /api/skills/{skill_id}` - Get a specific skill
- `POST /api/save_skill` - Save a skill
- `GET /api/examples` - List examples
- `GET /api/examples/{example_id}` - Get a specific example
- `POST /api/save_example` - Save an example
- `POST /api/execute_example` - Execute an example
- `DELETE /api/examples/{example_id}` - Delete an example
- `GET /api/train/datasets` - List available datasets
- `GET /api/train/tasks` - List training tasks
- `GET /api/train/tasks/{task_id}` - Get a specific task
- `GET /api/train/tasks/synthesized` - Get synthesized tasks
- `GET /api/train/tasks/stats/summary` - Get task statistics
- `POST /api/train/tasks/similarity` - Find similar tasks
- `POST /api/train/sample-task` - Sample a task
- `POST /api/train/run` - Start a training job
- `GET /api/train/status/{job_id}` - Get training job status
- `GET /api/train/subsets` - Get training subsets
- `GET /api/train/procedures` - List synthesized procedures
- `GET /api/train/procedures/{procedure_id}` - Get a specific procedure
- `DELETE /api/train/procedures/{procedure_id}` - Delete a procedure
- `POST /api/train/estimate-cost` - Estimate training costs
- `POST /api/automation/run` - Run an automation action
- `GET /api/automation/screenshots/latest` - Get the latest screenshot
- `POST /api/detect_state` - Detect the current browser/desktop state
1. Create a new file in `backend/tools/` (e.g., `my_tool.py`)
2. Define your tool class/function
3. The `ToolRegistry` will automatically discover it (if configured)
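A minimal tool module might look like the following. This is a hypothetical sketch: the real `ToolRegistry` interface in `backend/tools/` may use a different base class and discovery mechanism, so treat the registration decorator below as an illustration of the pattern, not the repo's API.

```python
# Hypothetical sketch of a tool for backend/tools/; the real
# ToolRegistry interface may differ, so base class and registration
# mechanics here are assumptions.

class Tool:
    name: str = "base"

    def run(self, **kwargs):
        raise NotImplementedError

class ToolRegistry:
    """Toy registry: tools register themselves by name via a decorator."""

    def __init__(self):
        self._tools = {}

    def register(self, tool_cls):
        self._tools[tool_cls.name] = tool_cls()
        return tool_cls

    def get(self, name: str) -> Tool:
        return self._tools[name]

registry = ToolRegistry()

@registry.register
class EchoTool(Tool):
    """Example tool: returns its input unchanged."""
    name = "echo"

    def run(self, text: str = "") -> str:
        return text

print(registry.get("echo").run(text="hello"))
```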
pytest tests/

- Backend: FastAPI application with modular services
- Frontend: React application with TypeScript
- Extension: Chrome extension for browser automation
- Experiments: Research and experimental architectures
1. Start the backend (Terminal 1):
   cd Agent_Zero
   source venv/bin/activate  # or your virtual environment
   uvicorn backend.api.routes:app --reload --port 8000
   The backend will be available at `http://localhost:8000`.
2. Start the frontend (Terminal 2):
   cd Agent_Zero/frontend
   npm run dev
   The frontend will be available at `http://localhost:5173`.
3. Load the Chrome extension:
   - Open Chrome and go to `chrome://extensions/`
   - Enable "Developer mode"
   - Click "Load unpacked"
   - Select `Agent_Zero/frontend/extension/`
4. Access the application:
   - Open `http://localhost:5173` in Chrome
   - The extension will be active and ready to record demonstrations
1. Build the frontend:
   cd frontend
   npm run build
2. Start the backend (serves both the API and the frontend):
   uvicorn backend.api.routes:app --port 8000
3. Access both the API and the frontend at `http://localhost:8000`.
Configuration is managed through:
- Environment variables (`.env` file in the project root)
- `backend/config/settings.py` - Settings loader
- `backend/config/env.py` - Environment configuration
Key settings:
- `OPENAI_API_KEY` - LLM API key (required for tree generation)
- `LOG_LEVEL` - Logging level (DEBUG, INFO, WARNING, ERROR)
- `ENABLE_TELEMETRY` - Enable telemetry logging
Create a .env file in the project root:
OPENAI_API_KEY=your-key-here
LOG_LEVEL=INFO

- Backend Documentation - Comprehensive backend API and architecture documentation
- Frontend Documentation - Frontend components, pages, and development guide
- Experiments Documentation - Experimental framework and research implementations
- Data Documentation - Data storage structure and file formats
- Setup Guide - Detailed setup and running instructions
When contributing to Agent_Zero:
- Backend Changes: Follow the service-oriented architecture. Keep business logic in `services/`, API endpoints in `api/routes.py`, and models in `models/`.
- Frontend Changes: Use TypeScript, follow the component structure in `src/components/`, and maintain type safety.
- New Features: Document in the appropriate README and update the API documentation.
- Testing: Add tests for new functionality and ensure existing tests pass.
Backend won't start:
- Check Python version (3.9+ required)
- Verify virtual environment is activated
- Ensure all dependencies are installed: `pip install -r requirements.txt`
- Check for an API key: `export OPENAI_API_KEY='your-key-here'`
Frontend build errors:
- Clear node_modules and reinstall: `rm -rf node_modules && npm install`
- Check the Node.js version (18+ required)
- Verify TypeScript compilation: `npm run build`
Extension not working:
- Reload the extension in Chrome after code changes (`chrome://extensions/` → reload icon)
- Check the browser console for errors (F12 → Console tab)
- Verify `manifest.json` is valid JSON
- Ensure the backend is running on `http://localhost:8000`
- Check that the extension permissions are enabled
- Verify the extension is loaded from the correct directory: `Agent_Zero/frontend/extension/`
Recording issues:
- Extension not recording:
- Check that extension popup shows "Start Recording" button
- Verify backend is running and accessible
- Check browser console for connection errors
- Ensure WebSocket connection to backend is working
- No events captured:
- Verify you're on a webpage (not Chrome internal pages like chrome://)
- Check that recording was started from the active tab
- Ensure content script is injected (check browser console)
- Events from wrong tab:
- Extension uses single-tab recording mode
- Only records from the tab where recording started
- Open a new tab before starting recording if needed
- Backend connection errors:
  - Verify the backend is running on `http://localhost:8000`
  - Check CORS settings in the backend
  - Verify the WebSocket endpoint is accessible
  - Check backend logs for recording endpoint errors
- Storage issues:
- Check browser storage permissions
  - Verify the `data/sessions/` directory is writable
  - Ensure sufficient disk space for recordings
- Check backend logs for file write errors
For more detailed troubleshooting, see SETUP_AND_RUN.md.