A comprehensive AI-powered web automation testing system similar to MAA, designed for testing job assistant applications. This system can simulate user interactions, automatically analyze UI elements, generate test flows, and verify system responses with the help of Ollama's qwen3-vl:8b multimodal model.
- Webpage Opening: Opens specified URLs in the default browser
- Screen Capture: Captures screenshots of the current screen
- AI UI Analysis: Uses qwen3-vl:8b to analyze UI elements from screenshots
- Test Flow Generation: Automatically generates comprehensive test flows based on UI analysis
- Automated Execution: Simulates clicks, typing, and other user actions
- AI Validation: Validates operation results using AI analysis
- Error Handling: Handles various exception scenarios
- Test Reporting: Generates detailed test reports
- OCR Integration: Includes OCR functionality for text recognition
- Clone the repository (or download the files):

  ```bash
  # Navigate to your desired directory
  cd e:\agent自动测试web
  ```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Ensure Ollama is running:
  - Install Ollama from ollama.com
  - Pull the qwen3-vl:8b model:

    ```bash
    ollama pull qwen3-vl:8b
    ```

  - Start the Ollama server (it should run automatically after installation)
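Before running tests, it can help to confirm the server is actually reachable. A minimal sketch using only the standard library (the helper name is mine; `/api/tags` is Ollama's public model-listing endpoint):

```python
import json
import urllib.request
from urllib.error import URLError

def ollama_available(base_url="http://localhost:11434"):
    """Return True if an Ollama server answers and a qwen3-vl model is pulled."""
    try:
        with urllib.request.urlopen(base_url + "/api/tags", timeout=2) as resp:
            data = json.load(resp)
    except (URLError, OSError, ValueError):
        return False
    return any(m.get("name", "").startswith("qwen3-vl")
               for m in data.get("models", []))
```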
- Edit `run_test.py`:
  - Set `BASE_URL` to your job assistant application URL
  - There is no need to manually set function positions; the AI automatically detects UI elements
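A minimal sketch of what the configuration can look like; the URL is a placeholder, and the calls assume `WebAutomationTester` exposes the methods described in this README:

```python
# Placeholder URL - point this at your job assistant application
BASE_URL = "http://localhost:3000"

def main():
    # Imported lazily; this module ships with the project
    from web_automation_tester import WebAutomationTester

    tester = WebAutomationTester()
    tester.open_webpage(BASE_URL)
    test_flow = tester.generate_test_flow()  # AI analyzes the screenshot
    tester.execute_test_flow(test_flow)      # clicks/typing per the AI plan
    tester.generate_test_report()
```

Call `main()` from your entry point once `BASE_URL` is set.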
- Run the test:

  ```bash
  python run_test.py
  ```
- Test process:
  - The agent opens the specified URL
  - Captures a screenshot of the initial page
  - Sends the screenshot to qwen3-vl:8b for analysis
  - The AI generates a comprehensive test flow
  - The agent executes each test step automatically
  - Validates each step with AI analysis
  - Generates a detailed test report
- View results:
  - Test reports will be generated in the `test_results` directory
  - Screenshots for each test step will be saved
  - Detailed logs will be available in `test_report.txt`
```
web-automation-tester/
├── web_automation_tester.py   # Main test class with AI integration
├── ocr_module.py              # OCR functionality for text recognition
├── run_test.py                # Test runner with configuration
├── requirements.txt           # Dependencies
├── README.md                  # This documentation
└── test_results/              # Generated test results (auto-created)
    ├── initial_page.png       # Initial page screenshot for AI analysis
    ├── step_1_[element].png   # Screenshots for each test step
    └── test_report.txt        # Detailed test report
```
- `open_webpage(url)`: Opens a webpage
- `get_screenshot(filename)`: Captures and saves a screenshot
- `load_screenshot(filename)`: Loads a saved screenshot
- `click_position(x, y, description)`: Clicks at the specified coordinates
- `encode_image(image_path)`: Encodes an image to base64 for AI analysis
- `validate_with_ollama(prompt, image_path)`: Validates results using Ollama, with an optional image
- `generate_test_flow()`: Generates a test flow using AI, based on a screenshot
- `execute_test_flow(test_flow)`: Executes the AI-generated test flow
- `test_all_functions(function_positions)`: Tests all functions (AI-generated or manual positions)
- `generate_test_report()`: Generates a comprehensive report
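To illustrate how `encode_image` and `validate_with_ollama` fit together, here is a sketch of building the request body that Ollama's `/api/generate` endpoint expects (the field names follow the public Ollama API; the helper names are illustrative, not the project's):

```python
import base64

def encode_image_bytes(data):
    """Base64-encode raw image bytes for Ollama's `images` field."""
    return base64.b64encode(data).decode("ascii")

def build_generate_payload(prompt, image_bytes=None, model="qwen3-vl:8b"):
    """Build the JSON body for a POST to /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if image_bytes is not None:
        payload["images"] = [encode_image_bytes(image_bytes)]
    return payload
```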
The system includes an OCR module (ocr_module.py) that provides text recognition capabilities, similar to the reference OCRer code. This module can be used to extract text from images for additional analysis.
- Text recognition from images
- Post-processing of OCR results
- Text replacement based on rules
- Filtering and validation of OCR results
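The post-processing and filtering steps can be sketched like this (the rule format and threshold are illustrative; see `ocr_module.py` for the actual implementation):

```python
def postprocess_ocr(lines, replacements, min_length=1):
    """Apply rule-based text replacements, then filter out too-short results."""
    cleaned = []
    for raw in lines:
        text = raw.strip()
        # Replacement rules fix common OCR confusions, e.g. "1" read for "l"
        for old, new in replacements.items():
            text = text.replace(old, new)
        if len(text) >= min_length:
            cleaned.append(text)
    return cleaned
```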
- Change the AI model:
  - Update the `model` parameter in the `__init__` method of `WebAutomationTester`
  - Supported models: qwen3-vl:8b (recommended), other multimodal models
- Customize test flow generation:
  - Modify the prompt in the `generate_test_flow` method to adjust the AI instructions
  - Add specific test requirements to the prompt
- Adjust execution speed:
  - Modify the sleep times in the `click_position` and `execute_test_flow` methods
- Add a manual fallback:
  - If AI test flow generation fails, you can provide manual function positions
  - Pass a `function_positions` dictionary to `test_all_functions`
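The fallback dictionary maps element names to screen coordinates. A sketch with hypothetical element names and coordinates — read the real values off your own layout:

```python
# Hypothetical element names and coordinates for a job assistant UI
function_positions = {
    "search_input": (512, 180),
    "upload_resume_button": (640, 420),
    "submit_button": (640, 560),
}

# tester.test_all_functions(function_positions)  # manual fallback path
for name, (x, y) in function_positions.items():
    print(f"{name}: click at ({x}, {y})")
```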
- Ollama connection issues:
  - Ensure the Ollama server is running
  - Verify the URL in `ollama_url` (default: `http://localhost:11434/api/generate`)
  - Check that the qwen3-vl:8b model has been pulled
- AI test flow generation issues:
  - Ensure the screenshot is clear and all UI elements are visible
  - Check that the browser window is not minimized or obscured
  - Try restarting Ollama if the API is unresponsive
- Execution issues:
  - Ensure no other applications are interfering with mouse/keyboard input
  - Check that the browser window is in focus during testing
  - Adjust coordinates in the AI-generated test flow if clicks are inaccurate
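Since the model only estimates coordinates, a defensive clamp before each click can prevent out-of-bounds mouse moves. This helper is an addition for illustration, not part of the project:

```python
def clamp_to_screen(x, y, width=1920, height=1080):
    """Keep an AI-estimated click point inside the visible screen area."""
    return (min(max(int(x), 0), width - 1),
            min(max(int(y), 0), height - 1))
```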
- Performance issues:
  - Adjust the sleep times in the code if tests run too fast or too slow
  - Large screenshots take longer to process; consider resizing them before analysis
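One way to bound analysis time is to cap the screenshot's longest side before sending it to the model. The 1024-px target here is an arbitrary example:

```python
def downscale_size(width, height, max_side=1024):
    """Return (w, h) with the longest side at most max_side, keeping aspect ratio."""
    scale = min(1.0, max_side / max(width, height))
    return (max(1, round(width * scale)), max(1, round(height * scale)))
```

With Pillow, `img.resize(downscale_size(*img.size))` would apply it to a screenshot.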
The system leverages qwen3-vl:8b's multimodal capabilities for:
- UI Analysis: Identify buttons, links, input fields, and other UI elements
- Coordinate Detection: Estimate screen coordinates for interactive elements
- Test Flow Generation: Create logical test sequences based on UI structure
- Result Validation: Analyze screenshots to verify successful operations
- Error Detection: Identify and report issues in the application
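Model output often wraps the generated test flow in prose, so the executor needs a tolerant parser. A sketch, assuming the flow comes back as a JSON array of step objects (the exact step schema is defined by the prompt in `generate_test_flow`):

```python
import json

def parse_test_flow(ai_response):
    """Extract the first JSON array in a model response, tolerating prose around it."""
    start = ai_response.find("[")
    end = ai_response.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        return json.loads(ai_response[start:end + 1])
    except json.JSONDecodeError:
        return []
```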
- AI Processing Time: Each screenshot analysis may take 5-15 seconds
- Network Usage: Sending images to Ollama requires network bandwidth
- System Resources: Running qwen3-vl:8b requires significant RAM (16GB+ recommended)
- Test Duration: Complex applications may take several minutes to test completely
This project is open for personal and commercial use. Modify and extend it as needed for your specific testing requirements.