Merged
22 commits
e906570
feat!(agent): enable `askui` model + response format with `get()`
adi-wan-askui Apr 14, 2025
ba524ec
feat!(agent): raise error with get() if response schema is not implem…
adi-wan-askui Apr 15, 2025
3d805c0
refactor(utils): clean up where functions, classes etc. are defined
adi-wan-askui Apr 15, 2025
9fa65c2
refactor: remove obsolete code
adi-wan-askui Apr 15, 2025
7c5c061
feat!(agent): switch `get()` from dictionary to pydantic.BaseModel fo…
adi-wan-askui Apr 15, 2025
47da12e
docs: document `VisionAgent.get()`
adi-wan-askui Apr 15, 2025
e80fc50
refactor!(agent): rename `model_name` parameter to `model`
adi-wan-askui Apr 15, 2025
b6f7837
feat!(agent): enable selecting models using composition / for whole a…
adi-wan-askui Apr 16, 2025
b4a584a
refactor!(locators): rename `Class` to `Element`
adi-wan-askui Apr 17, 2025
0e3238c
feat(agent): support primitive types as response_schema in get method
adi-wan-askui Apr 17, 2025
469058c
refactor: validate all public methods & make locators non-pydantic based
adi-wan-askui Apr 17, 2025
5fb40b4
fix(reporting): fix reports overriding each other
adi-wan-askui Apr 17, 2025
c04b4b9
refactor(agent): make agent more modular / better testable
adi-wan-askui Apr 17, 2025
e4bbf11
feat(agent): make it easier to pass image to locate() and get()
adi-wan-askui Apr 17, 2025
30bac04
docs(locators): improve docs of relations
adi-wan-askui Apr 22, 2025
5406f26
feat(reporting): add image for get() to report
adi-wan-askui Apr 22, 2025
3712cfd
refactor(locators)!: rename Description to Prompt
adi-wan-askui Apr 22, 2025
5ff4774
feat(locators): change default reference point to center for right_of…
adi-wan-askui Apr 22, 2025
a256e5e
docs(locators): document all parameters
adi-wan-askui Apr 22, 2025
b7e9c57
feat(reporting): add image of Image / AIElement to report
adi-wan-askui Apr 23, 2025
2768389
Merge pull request #42 from askui/general-optimisations
adi-wan-askui Apr 23, 2025
ae4ae46
Merge pull request #41 from askui/model-selection
adi-wan-askui Apr 23, 2025
94 changes: 86 additions & 8 deletions README.md
@@ -88,7 +88,7 @@ pip install askui
| | AskUI [INFO](https://hub.askui.com/) | Anthropic [INFO](https://console.anthropic.com/settings/keys) |
|----------|----------|----------|
| ENV Variables | `ASKUI_WORKSPACE_ID`, `ASKUI_TOKEN` | `ANTHROPIC_API_KEY` |
| Supported Commands | `click()`, `locate()`, `mouse_move()` | `act()`, `get()`, `click()`, `locate()`, `mouse_move()` |
| Supported Commands | `click()`, `get()`, `locate()`, `mouse_move()` | `act()`, `click()`, `get()`, `locate()`, `mouse_move()` |
| Description | Faster Inference, European Server, Enterprise Ready | Supports complex actions |

To get started, set the environment variables required to authenticate with your chosen model provider.
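
Before starting an agent, it can help to check which provider credentials are actually present. The following is a minimal sketch, not part of AskUI: the helper `configured_providers` and the provider labels it returns are hypothetical, and only the environment variable names are taken from the table above.

```python
import os
from typing import List, Mapping, Optional

# Environment variables required per provider, as listed in the table above.
REQUIRED_ENV_VARS = {
    "askui": ("ASKUI_WORKSPACE_ID", "ASKUI_TOKEN"),
    "anthropic": ("ANTHROPIC_API_KEY",),
}


def configured_providers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the providers whose required variables are all set and non-empty.

    Hypothetical helper for illustration; it is not part of the askui package.
    """
    env = os.environ if env is None else env
    return [
        provider
        for provider, names in REQUIRED_ENV_VARS.items()
        if all(env.get(name) for name in names)
    ]
```

Calling `configured_providers()` without arguments inspects the current process environment; passing a dictionary makes the check easy to exercise in isolation.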
@@ -130,7 +130,7 @@ You can test the Vision Agent with Huggingface models via their Spaces API. Plea

**Example Code:**
```python
agent.click("search field", model_name="OS-Copilot/OS-Atlas-Base-7B")
agent.click("search field", model="OS-Copilot/OS-Atlas-Base-7B")
```

### 3c. Host your own **AI Models**
@@ -143,7 +143,7 @@ You can use Vision Agent with UI-TARS if you provide your own UI-TARS API endpoi

2. Step: Provide the `TARS_URL` and `TARS_API_KEY` environment variables to Vision Agent.

3. Step: Use the `model_name="tars"` parameter in your `click()`, `get()` and `act()` etc. commands.
3. Step: Pass `model="tars"` to individual commands such as `click()`, `get()` and `act()`, or to the `VisionAgent` when initializing it.


## ▶️ Start Building
@@ -171,18 +171,34 @@ with VisionAgent() as agent:

### 🎛️ Model Selection

Instead of relying on the default model for the entire automation script, you can specify a model for each `click()` (or `act()`, `get()` etc.) command using the `model_name` parameter.
Instead of relying on the default model for the entire automation script, you can select a model per command by passing the `model` parameter to `click()` (or `act()`, `get()` etc.), or for the whole agent by passing `model` when initializing the `VisionAgent`. A `model` passed to an individual command overrides the one set on the agent.

| | AskUI | Anthropic |
|----------|----------|----------|
| `act()` | | `anthropic-claude-3-5-sonnet-20241022` |
| `click()` | `askui`, `askui-combo`, `askui-pta`, `askui-ocr`, `askui-ai-element` | `anthropic-claude-3-5-sonnet-20241022` |
| `get()` | | `anthropic-claude-3-5-sonnet-20241022` |
| `get()` | `askui` | `anthropic-claude-3-5-sonnet-20241022` |
| `locate()` | `askui`, `askui-combo`, `askui-pta`, `askui-ocr`, `askui-ai-element` | `anthropic-claude-3-5-sonnet-20241022` |
| `mouse_move()` | `askui`, `askui-combo`, `askui-pta`, `askui-ocr`, `askui-ai-element` | `anthropic-claude-3-5-sonnet-20241022` |


**Example:** `agent.click("Preview", model_name="askui-combo")`
**Example:**

```python
from askui import VisionAgent

with VisionAgent() as agent:
    # Uses the default model (depending on the environment variables set, see above)
    agent.click("Next")

with VisionAgent(model="askui-combo") as agent:
    # Uses the "askui-combo" model because it was specified when initializing the agent
    agent.click("Next")
    # Uses the "anthropic-claude-3-5-sonnet-20241022" model
    agent.click("Previous", model="anthropic-claude-3-5-sonnet-20241022")
    # Uses the "askui-combo" model again as no model was specified
    agent.click("Next")
```
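
The precedence used in the example above (per-command `model`, then agent-level `model`, then the default) can be summarised in a few lines. This is a sketch of the rule only, assuming it behaves as a simple fallback chain; `resolve_model` is a hypothetical function and does not reflect how `VisionAgent` is implemented.

```python
from typing import Optional


def resolve_model(
    command_model: Optional[str],
    agent_model: Optional[str],
    default_model: str,
) -> str:
    """Pick the model for one command: command-level, then agent-level, then default.

    Hypothetical illustration of the documented precedence, not AskUI code.
    """
    if command_model is not None:
        return command_model
    if agent_model is not None:
        return agent_model
    return default_model


# "askui" stands in for whatever default the environment variables select.
print(resolve_model(None, "askui-combo", "askui"))
print(resolve_model("anthropic-claude-3-5-sonnet-20241022", "askui-combo", "askui"))
print(resolve_model(None, None, "askui"))
```

The three calls correspond to a command without `model` on an agent created with `model="askui-combo"`, a command that overrides it, and a command on an agent created without any `model`.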

<details>
<summary>AskUI AI Models</summary>
@@ -353,20 +369,82 @@ agent.type("********")

you can build more sophisticated locators.

**⚠️ Warning:** Support can vary depending on the model you are using. Currently, only, the `askui` model provides best support for locators. This model is chosen by default if `ASKUI_WORKSPACE_ID` and `ASKUI_TOKEN` environment variables are set and it is not overridden using the `model_name` parameter.
**⚠️ Warning:** Support can vary depending on the model you are using. Currently, the `askui` model provides the best support for locators. It is chosen by default if the `ASKUI_WORKSPACE_ID` and `ASKUI_TOKEN` environment variables are set and no other model is selected using the `model` parameter.

Example:

```python
from askui import locators as loc

password_textfield_label = loc.Text("Password")
password_textfield = loc.Class("textfield").right_of(password_textfield_label)
password_textfield = loc.Element("textfield").right_of(password_textfield_label)

agent.click(password_textfield)
agent.type("********")
```

### 📊 Extracting information

The `get()` method allows you to extract information from the screen. You can use it to:

- Get text or data from the screen
- Check the state of UI elements
- Make decisions based on screen content
- Analyze static images

#### Basic usage

```python
# Get text from screen
url = agent.get("What is the current url shown in the url bar?")
print(url) # e.g., "github.com/login"

# Check UI state
# Illustrative only: comparing free-text answers can be flaky; prefer a response schema with a boolean field (see below)
is_logged_in = agent.get("Is the user logged in? Answer with 'yes' or 'no'.") == "yes"
if is_logged_in:
    agent.click("Logout")
else:
    agent.click("Login")
```
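
Comparing the raw answer to `"yes"` fails as soon as the model replies `"Yes."` or `"yes, the user is logged in"`. Where a response schema is not an option, the answer can be normalised before it is compared. The helper below is a hypothetical sketch, not part of AskUI, and assumes `get()` returned a plain string.

```python
import re


def parse_yes_no(answer: str) -> bool:
    """Interpret a free-text answer as a boolean.

    Accepts answers whose first word is "yes" or "no" (any case, optionally
    preceded by quotes or whitespace). Raises ValueError for anything else
    instead of guessing.
    """
    match = re.match(r"\W*(yes|no)\b", answer, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Cannot interpret answer as yes/no: {answer!r}")
    return match.group(1).lower() == "yes"
```

Raising on unexpected answers makes a misread screen fail loudly rather than silently taking the `else` branch.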

#### Using custom images

Instead of taking a screenshot, you can analyze specific images:

```python
from PIL import Image

# From PIL Image
image = Image.open("screenshot.png")
result = agent.get("What's in this image?", image)

# From file path
result = agent.get("What's in this image?", "screenshot.png")
```

#### Using response schemas

For structured data extraction, use Pydantic models extending `ResponseSchemaBase`:

```python
from askui import ResponseSchemaBase

class UserInfo(ResponseSchemaBase):
    username: str
    is_online: bool

# Get structured data
user_info = agent.get(
    "What is the username and online status?",
    response_schema=UserInfo
)
print(f"User {user_info.username} is {'online' if user_info.is_online else 'offline'}")
```

**⚠️ Limitations:**
- Nested Pydantic schemas are not currently supported
- Response schemas are currently only supported by the `askui` model (the default model if `ASKUI_WORKSPACE_ID` and `ASKUI_TOKEN` are set)
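
Because nested schemas are not supported, one possible workaround is to request a flat schema and rebuild the nested shape afterwards. The sketch below uses standard-library dataclasses as stand-ins so that it runs without AskUI; `FlatUserInfo`, `Address`, `User` and `to_nested` are all hypothetical names, and in real use the flat class would be the response schema passed to `get()`.

```python
from dataclasses import dataclass


@dataclass
class FlatUserInfo:
    """Flat shape: every field is a primitive, so no nesting is needed."""
    username: str
    address_city: str
    address_country: str


@dataclass
class Address:
    city: str
    country: str


@dataclass
class User:
    username: str
    address: Address


def to_nested(flat: FlatUserInfo) -> User:
    """Rebuild the nested structure from the prefixed flat fields."""
    return User(
        username=flat.username,
        address=Address(city=flat.address_city, country=flat.address_country),
    )
```

Prefixing the flat field names (`address_city`, `address_country`) keeps the mapping back to the nested structure mechanical.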

## What is AskUI Vision Agent?

5 changes: 5 additions & 0 deletions src/askui/__init__.py
@@ -3,14 +3,19 @@
__version__ = "0.2.4"

from .agent import VisionAgent
from .models.router import ModelRouter
from .models.types.response_schemas import ResponseSchema, ResponseSchemaBase
from .tools.toolbox import AgentToolbox
from .tools.agent_os import AgentOs, ModifierKey, PcKey


__all__ = [
    "AgentOs",
    "AgentToolbox",
    "ModelRouter",
    "ModifierKey",
    "PcKey",
    "ResponseSchema",
    "ResponseSchemaBase",
    "VisionAgent",
]