Merged
2 changes: 1 addition & 1 deletion .vscode/launch.json
@@ -28,7 +28,7 @@
"preLaunchTask": "Create .env.tmp file",
"postDebugTask": "Delete .env.tmp file",
"module": "uvicorn",
"args": ["src.chat.api.app:app","--reload","--port","8000"],
"args": ["src.askui.chat.api.app:app","--reload","--port","9261"],
"envFile": "${workspaceFolder}/.env.tmp",
"env": {
"ASKUI_WORKSPACES__LOG__FORMAT": "logfmt",
28 changes: 13 additions & 15 deletions README.md
@@ -775,34 +775,39 @@ If you would like to disable the recording of usage data, set the `ASKUI__VA__TE
### AskUI Chat

AskUI Chat is a web application that allows interacting with an AskUI Vision Agent similar to how it can be
done with `VisionAgent.act()` but in a more interactive manner that involves less code. Aside from
telling the AskUI Vision Agent what to do, the user can also demonstrate what to do (currently, only
done with `VisionAgent.act()` or `AndroidVisionAgent.act()` but in a more interactive manner that involves less code. Aside from
telling the agent what to do, the user can also demonstrate what to do (currently, only
clicking is supported).

**⚠️ Warning:** AskUI Chat is currently in an experimental stage and has several limitations (see below).

#### Architecture

This repository only includes the AskUI Chat API (`src/askui/chat`). The AskUI Chat UI can be accessed through the [AskUI Hub](https://hub.askui.com/) and connects to the local Chat API after it has been started.

#### Configuration

To use the chat, configure the following environment variables:

- `ASKUI_TOKEN`: The AskUI Vision Agent behind the chat currently uses the AskUI API
- `ASKUI_WORKSPACE_ID`: The AskUI Vision Agent behind the chat currently uses the AskUI API
- `ASKUI__CHAT_API__DATA_DIR` (optional, defaults to `$(pwd)/chat`): Currently, the AskUI chat stores its data in a directory locally. You can change the default directory by setting this environment variable.
- `ASKUI__CHAT_API__DATA_DIR` (optional, defaults to `$(pwd)/chat`): Currently, the AskUI chat stores all data in a directory locally. You can change the default directory by setting this environment variable.
- `ASKUI__CHAT_API__HOST` (optional, defaults to `127.0.0.1`): The host to bind the chat API to.
- `ASKUI__CHAT_API__PORT` (optional, defaults to `9261`): The port to bind the chat API to.
- `ASKUI__CHAT_API__LOG_LEVEL` (optional, defaults to `info`): The log level to use for the chat API.
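
The defaults listed above can be summarized in a short sketch. This is an illustration only, not the code the Chat API uses: `ChatApiSettings`, `resolve_chat_api_settings` and `missing_required_variables` are hypothetical names, and only the variable names and default values are taken from the list.

```python
import os
from dataclasses import dataclass
from pathlib import Path

# The two variables the list above does not mark as optional.
REQUIRED_VARIABLES = ("ASKUI_TOKEN", "ASKUI_WORKSPACE_ID")


@dataclass(frozen=True)
class ChatApiSettings:
    """Hypothetical container for the optional settings (not askui's own class)."""

    data_dir: Path
    host: str
    port: int
    log_level: str


def resolve_chat_api_settings(env=None):
    """Read the optional variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    default_data_dir = str(Path.cwd() / "chat")  # mirrors `$(pwd)/chat`
    return ChatApiSettings(
        data_dir=Path(env.get("ASKUI__CHAT_API__DATA_DIR", default_data_dir)),
        host=env.get("ASKUI__CHAT_API__HOST", "127.0.0.1"),
        port=int(env.get("ASKUI__CHAT_API__PORT", "9261")),
        log_level=env.get("ASKUI__CHAT_API__LOG_LEVEL", "info"),
    )


def missing_required_variables(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]
```

Calling `missing_required_variables()` before starting the chat is one way to notice an unset `ASKUI_TOKEN` or `ASKUI_WORKSPACE_ID` early.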

#### Installation

```bash
pdm install # is going to install the dependencies of the api
pdm run chat:ui:install # is going to install the dependencies of the ui
pip install askui[chat]
```

You may need to grant permissions on the first run of the Chat UI to demonstrate actions (i.e., record clicks).

#### Usage

```bash
pdm run chat:api # is going to start the api at port 8000
pdm run chat:ui # is going to start the ui at port 3000
python -m askui.chat
```
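
To check that the Chat API is listening before opening the Chat UI, a small connectivity probe may help. The sketch below assumes the default host and port from the configuration section (`127.0.0.1`, `9261`); `is_chat_api_reachable` is a hypothetical helper, not part of `askui`, and it only tests that a TCP connection can be opened, not that the API is healthy.

```python
import socket


def is_chat_api_reachable(host="127.0.0.1", port=9261, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "reachable" if is_chat_api_reachable() else "not reachable"
    print(f"Chat API on 127.0.0.1:9261 is {state}")
```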

You can use the chat to record a workflow and redo it later. For that, just tell the agent to redo all previous steps.
@@ -815,7 +820,7 @@ You can use the chat to record a workflow and redo it later. For that, just tell
#### Limitations

- A lot of errors are not handled properly and we allow the user to do a lot of actions that can lead to errors instead of properly guiding the user.
- The chat currently only allows rerunning actions through `VisionAgent.act()` which can be expensive, slow and is not necessarily the most reliable way to do it.
- The chat currently only allows rerunning actions through `VisionAgent.act()` (or `AndroidVisionAgent.act()` or `WebVisionAgent.act()`) which can be expensive, slow and is not necessarily the most reliable way to do it.
- A lot of quirks in the UI and API.
- Currently, api and ui need to be run in dev mode.
- When demonstrating actions, the corresponding screenshot may not reflect the correct state of the screen before the action. In this case, cancel demonstrating, delete messages and try again.
@@ -824,10 +829,3 @@ You can use the chat to record a workflow and redo it later. For that, just tell
- The agent is going to fail if there are no messages in the conversation, there is no tool use result message following the tool use message somewhere in the conversation, a message is too long etc.
Just adding or deleting the message in this case should fix the issue.
- You should not switch the conversation while waiting for an agent's answers or demonstrating actions.
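
Two of the failure conditions above (an empty conversation, and a tool use without a later tool result) can be detected with a simple scan. The sketch below is an assumption-heavy illustration: it supposes messages are dicts whose `content` is a list of blocks with `type` set to `tool_use` (carrying an `id`) or `tool_result` (carrying a `tool_use_id`). The README does not specify how AskUI Chat stores messages, and `find_conversation_problems` is a hypothetical helper.

```python
def find_conversation_problems(messages):
    """List problems that, per the limitations above, make the agent fail.

    Assumed (not confirmed) message shape:
    {"role": ..., "content": [{"type": "tool_use", "id": ...}, ...]}
    """
    if not messages:
        return ["conversation has no messages"]

    pending = set()  # ids of tool uses still waiting for a result
    for message in messages:
        content = message.get("content", [])
        if isinstance(content, str):
            continue  # plain text messages carry no tool blocks
        for block in content:
            if block.get("type") == "tool_use":
                pending.add(block["id"])
            elif block.get("type") == "tool_result":
                # Only a result that comes after its tool use clears it.
                pending.discard(block.get("tool_use_id"))

    return [f"tool use {tool_use_id} has no tool result" for tool_use_id in sorted(pending)]
```

An empty list means neither of the two conditions was found; it does not cover the other failure causes, such as overly long messages.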



#### Architecture

- The chat api/backend is a [FastAPI](https://fastapi.tiangolo.com/) application that provides a REST API similar to [OpenAI's Assistants API](https://platform.openai.com/docs/assistants/overview).
- The chat ui/frontend is a [Next.js](https://nextjs.org/) application that provides a web interface to the chat api.