A comprehensive AI-powered web automation testing system similar to MAA, designed for testing job assistant applications. This system can simulate user interactions, automatically analyze UI elements, generate test flows, and verify system responses with the help of Ollama's qwen3-vl:8b multimodal model.
- Webpage Opening: Opens specified URLs in the default browser
- Screen Capture: Captures screenshots of the current screen
- AI UI Analysis: Uses qwen3-vl:8b to analyze UI elements from screenshots
- Test Flow Generation: Automatically generates comprehensive test flows based on UI analysis
- Automated Execution: Simulates clicks, typing, and other user actions
- AI Validation: Validates operation results using AI analysis
- Error Handling: Handles various exception scenarios
- Test Reporting: Generates detailed test reports
- OCR Integration: Includes OCR functionality for text recognition
- Clone the repository (or download the files):

  ```bash
  # Navigate to your desired directory
  cd e:\agent自动测试web
  ```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Ensure Ollama is running:
  - Install Ollama from ollama.com
  - Pull the qwen3-vl:8b model:

    ```bash
    ollama pull qwen3-vl:8b
    ```

  - Start the Ollama server (it should run automatically after installation)
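Before running tests, it can help to confirm the server is actually reachable. A minimal sketch using only the standard library (the helper name is mine; `/api/tags` is Ollama's public model-listing endpoint):

```python
import json
import urllib.request
from urllib.error import URLError

def ollama_available(base_url="http://localhost:11434"):
    """Return True if an Ollama server answers and a qwen3-vl model is pulled."""
    try:
        with urllib.request.urlopen(base_url + "/api/tags", timeout=2) as resp:
            data = json.load(resp)
    except (URLError, OSError, ValueError):
        return False
    return any(m.get("name", "").startswith("qwen3-vl")
               for m in data.get("models", []))
```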
- Edit `run_test.py`:
  - Set `BASE_URL` to your job assistant application URL
  - There is no need to manually set function positions; the AI automatically detects UI elements
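A minimal sketch of what the configuration can look like; the URL is a placeholder, and the calls assume `WebAutomationTester` exposes the methods described in this README:

```python
# Placeholder URL - point this at your job assistant application
BASE_URL = "http://localhost:3000"

def main():
    # Imported lazily; this module ships with the project
    from web_automation_tester import WebAutomationTester

    tester = WebAutomationTester()
    tester.open_webpage(BASE_URL)
    test_flow = tester.generate_test_flow()  # AI analyzes the screenshot
    tester.execute_test_flow(test_flow)      # clicks/typing per the AI plan
    tester.generate_test_report()
```

Call `main()` from your entry point once `BASE_URL` is set.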
- Run the test:

  ```bash
  python run_test.py
  ```
- Test process:
  - The agent opens the specified URL
  - Captures a screenshot of the initial page
  - Sends the screenshot to qwen3-vl:8b for analysis
  - The AI generates a comprehensive test flow
  - The agent executes each test step automatically
  - Validates each step with AI analysis
  - Generates a detailed test report
- View results:
  - Test reports will be generated in the `test_results` directory
  - Screenshots for each test step will be saved
  - Detailed logs will be available in `test_report.txt`
```
web-automation-tester/
├── web_automation_tester.py   # Main test class with AI integration
├── ocr_module.py              # OCR functionality for text recognition
├── run_test.py                # Test runner with configuration
├── requirements.txt           # Dependencies
├── README.md                  # This documentation
└── test_results/              # Generated test results (auto-created)
    ├── initial_page.png       # Initial page screenshot for AI analysis
    ├── step_1_[element].png   # Screenshots for each test step
    └── test_report.txt        # Detailed test report
```
- `open_webpage(url)`: Opens a webpage
- `get_screenshot(filename)`: Captures and saves a screenshot
- `load_screenshot(filename)`: Loads a saved screenshot
- `click_position(x, y, description)`: Clicks at the specified coordinates
- `encode_image(image_path)`: Encodes an image to base64 for AI analysis
- `validate_with_ollama(prompt, image_path)`: Validates results using Ollama, with an optional image
- `generate_test_flow()`: Generates a test flow using AI, based on a screenshot
- `execute_test_flow(test_flow)`: Executes the AI-generated test flow
- `test_all_functions(function_positions)`: Tests all functions (AI-generated or manual positions)
- `generate_test_report()`: Generates a comprehensive report
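To illustrate how `encode_image` and `validate_with_ollama` fit together, here is a sketch of building the request body that Ollama's `/api/generate` endpoint expects (the field names follow the public Ollama API; the helper names are illustrative, not the project's):

```python
import base64

def encode_image_bytes(data):
    """Base64-encode raw image bytes for Ollama's `images` field."""
    return base64.b64encode(data).decode("ascii")

def build_generate_payload(prompt, image_bytes=None, model="qwen3-vl:8b"):
    """Build the JSON body for a POST to /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if image_bytes is not None:
        payload["images"] = [encode_image_bytes(image_bytes)]
    return payload
```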
The system includes an OCR module (ocr_module.py) that provides text recognition capabilities, similar to the reference OCRer code. This module can be used to extract text from images for additional analysis.
- Text recognition from images
- Post-processing of OCR results
- Text replacement based on rules
- Filtering and validation of OCR results
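The post-processing and filtering steps can be sketched like this (the rule format and threshold are illustrative; see `ocr_module.py` for the actual implementation):

```python
def postprocess_ocr(lines, replacements, min_length=1):
    """Apply rule-based text replacements, then filter out too-short results."""
    cleaned = []
    for raw in lines:
        text = raw.strip()
        # Replacement rules fix common OCR confusions, e.g. "1" read for "l"
        for old, new in replacements.items():
            text = text.replace(old, new)
        if len(text) >= min_length:
            cleaned.append(text)
    return cleaned
```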
- Change the AI model:
  - Update the `model` parameter in the `__init__` method of `WebAutomationTester`
  - Supported models: qwen3-vl:8b (recommended), other multimodal models
- Customize test flow generation:
  - Modify the prompt in the `generate_test_flow` method to adjust the AI instructions
  - Add specific test requirements to the prompt
- Adjust execution speed:
  - Modify the sleep times in the `click_position` and `execute_test_flow` methods
- Add a manual fallback:
  - If AI test flow generation fails, you can provide manual function positions
  - Pass a `function_positions` dictionary to `test_all_functions`
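The fallback dictionary maps element names to screen coordinates. A sketch with hypothetical element names and coordinates — read the real values off your own layout:

```python
# Hypothetical element names and coordinates for a job assistant UI
function_positions = {
    "search_input": (512, 180),
    "upload_resume_button": (640, 420),
    "submit_button": (640, 560),
}

# tester.test_all_functions(function_positions)  # manual fallback path
for name, (x, y) in function_positions.items():
    print(f"{name}: click at ({x}, {y})")
```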
- Ollama connection issues:
  - Ensure the Ollama server is running
  - Verify the URL in `ollama_url` (default: `http://localhost:11434/api/generate`)
  - Check that the qwen3-vl:8b model has been pulled
- AI test flow generation issues:
  - Ensure the screenshot is clear and all UI elements are visible
  - Check that the browser window is not minimized or obscured
  - Try restarting Ollama if the API is unresponsive
- Execution issues:
  - Ensure no other applications are interfering with mouse/keyboard input
  - Check that the browser window is in focus during testing
  - Adjust coordinates in the AI-generated test flow if clicks are inaccurate
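Since the model only estimates coordinates, a defensive clamp before each click can prevent out-of-bounds mouse moves. This helper is an addition for illustration, not part of the project:

```python
def clamp_to_screen(x, y, width=1920, height=1080):
    """Keep an AI-estimated click point inside the visible screen area."""
    return (min(max(int(x), 0), width - 1),
            min(max(int(y), 0), height - 1))
```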
- Performance issues:
  - Adjust the sleep times in the code if tests run too fast or too slow
  - Large screenshots take longer to process; consider resizing them before analysis
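One way to bound analysis time is to cap the screenshot's longest side before sending it to the model. The 1024-px target here is an arbitrary example:

```python
def downscale_size(width, height, max_side=1024):
    """Return (w, h) with the longest side at most max_side, keeping aspect ratio."""
    scale = min(1.0, max_side / max(width, height))
    return (max(1, round(width * scale)), max(1, round(height * scale)))
```

With Pillow, `img.resize(downscale_size(*img.size))` would apply it to a screenshot.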
The system leverages qwen3-vl:8b's multimodal capabilities for:
- UI Analysis: Identify buttons, links, input fields, and other UI elements
- Coordinate Detection: Estimate screen coordinates for interactive elements
- Test Flow Generation: Create logical test sequences based on UI structure
- Result Validation: Analyze screenshots to verify successful operations
- Error Detection: Identify and report issues in the application
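Model output often wraps the generated test flow in prose, so the executor needs a tolerant parser. A sketch, assuming the flow comes back as a JSON array of step objects (the exact step schema is defined by the prompt in `generate_test_flow`):

```python
import json

def parse_test_flow(ai_response):
    """Extract the first JSON array in a model response, tolerating prose around it."""
    start = ai_response.find("[")
    end = ai_response.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        return json.loads(ai_response[start:end + 1])
    except json.JSONDecodeError:
        return []
```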
- AI Processing Time: Each screenshot analysis may take 5-15 seconds
- Network Usage: Sending images to Ollama requires network bandwidth
- System Resources: Running qwen3-vl:8b requires significant RAM (16GB+ recommended)
- Test Duration: Complex applications may take several minutes to test completely
This project is open for personal and commercial use. Modify and extend it as needed for your specific testing requirements.