feat: add ScreenSwitchTool and display information retrieval#100
feat: add ScreenSwitchTool and display information retrieval#100danyalxahid-askui wants to merge 8 commits intofeat/update-controller-grpcfrom
Conversation
- Introduced `ScreenSwitchTool` for cycling through multiple displays. - Added methods in `AgentOs` for retrieving display information and active display. - Updated `VisionAgent` to utilize the new `ScreenSwitchTool`. - Enhanced `AskUiControllerClient` with display information retrieval capabilities.
…gent-to-search-through-display
… indentation - Added import for `Coordinate` in `askui_controller.py`. - Adjusted the indentation of the `@telemetry.record_call()` decorator for the `get_process_list` method.
- Reformatted import statements in `agent.py`, `agent_os.py`, and `askui_controller.py` for improved readability. - Removed unnecessary blank lines to maintain consistent formatting across files.
- Reformatted comments and docstrings in `screen_switch_tool.py` for better clarity. - Cleaned up import statements in `askui_controller.py` for improved organization and readability.
- Improved formatting of import statements in `agent_os.py` and `askui_controller.py` for better readability. - Removed unnecessary comments to enhance code clarity.
There was a problem hiding this comment.
Solid for first contribution :)
- WDYT about adjusting the system prompt to include a line similar to:
"If you don't see something on one display, list all available displays and see if it can be seen on another screen"? - I think we may need to adapt the computer tool as currently it does not adapt the display resolution when another display is selected which may lead to the scaling of coordinates and screenshots not working.
|
|
||
| @override | ||
| def _get_default_tools_for_act(self, model_choice: str) -> list[Tool]: | ||
| self._tools.append(ScreenSwitchTool(agent_os=self.tools.os)) |
There was a problem hiding this comment.
WDYT about including it right next to the ExceptionTool in the constructor? Seems easier and more consistent.
|
|
||
| @override | ||
| def _get_default_tools_for_act(self, model_choice: str) -> list[Tool]: | ||
| self._tools.append(ScreenSwitchTool(agent_os=self.tools.os)) |
There was a problem hiding this comment.
Aren't we missing the tool for listing all available displays?
| from askui.tools.agent_os import AgentOs, DisplayInformation | ||
|
|
||
|
|
||
| class ScreenSwitchTool(Tool): |
There was a problem hiding this comment.
Can we
- call the file
display_tools.py - add the tool for listing displays
- rename the tool here to
SetDisplayTool
so that we are more consistent with our naming across the code base?
| # We need to determine the number of displays available to provide context | ||
| # to the agent indicating that screen switching can only be done this number | ||
| # of times. | ||
| displays: list[DisplayInformation] = agent_os.get_display_information().displays | ||
|
|
||
| super().__init__( | ||
| name="screen_switch", | ||
| description=f""" | ||
| This tool is useful for switching between multiple displays to find | ||
| information not present on the current active screen. | ||
| If more than one display is available, this tool cycles through them. | ||
| Number of displays available: {len(displays)}. | ||
| """, | ||
| ) | ||
| self._agent_os: AgentOs = agent_os | ||
| self._displays: list[DisplayInformation] = displays |
There was a problem hiding this comment.
Decent idea of listing the displays here and including it in the tool description although that does not match what is specified in the issue. The problem I see with this is that it is too static as (the number of) displays may change throughout the execution. So I think going with an extra tool for listing displays the better approach. WDYT?
| self._agent_os: AgentOs = agent_os | ||
| self._displays: list[DisplayInformation] = displays | ||
|
|
||
| def __call__(self) -> None: |
There was a problem hiding this comment.
I think the tool would be more versatile and easier if it was just forwarding to the set_display taking a the display id as a parameter as then we could just forward without additional logic and we could directly switch to one display without having to call this tool multiple times to switch to the correct display. WDYT?
| ) -> controller_v1_pbs.Response_GetProcessList: | ||
| """ | ||
| Get a list of running processes. | ||
|
|
There was a problem hiding this comment.
Why remove the empty lines across all the docstrings? 😆
| @telemetry.record_call() | ||
| @override | ||
| def get_active_display(self) -> int: | ||
| """ | ||
| Get the active display. | ||
| """ | ||
| return self._display |
There was a problem hiding this comment.
Well done :) I think we should either include this info in the display information returned by the listing tool / the GetDisplayInformationResponse or add an extra tool for this.
I think it would be great if this was included in the controller api as I think they maintain the source of truth about which display is currently active (see https://askyourui.slack.com/archives/C091TSQ6KP0/p1753431836333729).
adi-wan-askui
left a comment
There was a problem hiding this comment.
I actually wanted to request some changes 😆 Selected the wrong option by accident.
Co-authored-by: adi-wan-askui <105295410+adi-wan-askui@users.noreply.github.com>
ScreenSwitchToolfor cycling through multiple displays.AgentOsfor retrieving display information and active display.VisionAgentto utilize the newScreenSwitchTool.AskUiControllerClientwith display information retrieval capabilities.