Skip to content

feat: add ScreenSwitchTool and display information retrieval#100

Closed
danyalxahid-askui wants to merge 8 commits intofeat/update-controller-grpcfrom
CL-1377-chat-enable-agent-to-search-through-display
Closed

feat: add ScreenSwitchTool and display information retrieval#100
danyalxahid-askui wants to merge 8 commits intofeat/update-controller-grpcfrom
CL-1377-chat-enable-agent-to-search-through-display

Conversation

@danyalxahid-askui
Copy link
Contributor

  • Introduced ScreenSwitchTool for cycling through multiple displays.
  • Added methods in AgentOs for retrieving display information and active display.
  • Updated VisionAgent to utilize the new ScreenSwitchTool.
  • Enhanced AskUiControllerClient with display information retrieval capabilities.

- Introduced `ScreenSwitchTool` for cycling through multiple displays.
- Added methods in `AgentOs` for retrieving display information and active display.
- Updated `VisionAgent` to utilize the new `ScreenSwitchTool`.
- Enhanced `AskUiControllerClient` with display information retrieval capabilities.
… indentation

- Added import for `Coordinate` in `askui_controller.py`.
- Adjusted the indentation of the `@telemetry.record_call()` decorator for the `get_process_list` method.
- Reformatted import statements in `agent.py`, `agent_os.py`, and `askui_controller.py` for improved readability.
- Removed unnecessary blank lines to maintain consistent formatting across files.
- Reformatted comments and docstrings in `screen_switch_tool.py` for better clarity.
- Cleaned up import statements in `askui_controller.py` for improved organization and readability.
- Improved formatting of import statements in `agent_os.py` and `askui_controller.py` for better readability.
- Removed unnecessary comments to enhance code clarity.
@danyalxahid-askui danyalxahid-askui marked this pull request as ready for review July 24, 2025 17:41
Copy link
Contributor

@adi-wan-askui adi-wan-askui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid for first contribution :)

  • WDYT about adjusting the system prompt to include a line similar to: "If you don't see something on one display, list all available displays and see if it can be seen on another screen"?
  • I think we may need to adapt the computer tool as currently it does not adapt the display resolution when another display is selected which may lead to the scaling of coordinates and screenshots not working.


@override
def _get_default_tools_for_act(self, model_choice: str) -> list[Tool]:
self._tools.append(ScreenSwitchTool(agent_os=self.tools.os))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about including it right next to the ExceptionTool in the constructor? Seems easier and more consistent.


@override
def _get_default_tools_for_act(self, model_choice: str) -> list[Tool]:
self._tools.append(ScreenSwitchTool(agent_os=self.tools.os))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't we missing the tool for listing all available displays?

from askui.tools.agent_os import AgentOs, DisplayInformation


class ScreenSwitchTool(Tool):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we

  • call the file display_tools.py
  • add the tool for listing displays
  • rename the tool here to SetDisplayTool

so that we are more consistent with our naming across the code base?

Comment on lines +11 to +26
# We need to determine the number of displays available to provide context
# to the agent indicating that screen switching can only be done this number
# of times.
displays: list[DisplayInformation] = agent_os.get_display_information().displays

super().__init__(
name="screen_switch",
description=f"""
This tool is useful for switching between multiple displays to find
information not present on the current active screen.
If more than one display is available, this tool cycles through them.
Number of displays available: {len(displays)}.
""",
)
self._agent_os: AgentOs = agent_os
self._displays: list[DisplayInformation] = displays
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decent idea of listing the displays here and including it in the tool description although that does not match what is specified in the issue. The problem I see with this is that it is too static as (the number of) displays may change throughout the execution. So I think going with an extra tool for listing displays the better approach. WDYT?

self._agent_os: AgentOs = agent_os
self._displays: list[DisplayInformation] = displays

def __call__(self) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the tool would be more versatile and easier if it was just forwarding to the set_display taking a the display id as a parameter as then we could just forward without additional logic and we could directly switch to one display without having to call this tool multiple times to switch to the correct display. WDYT?

) -> controller_v1_pbs.Response_GetProcessList:
"""
Get a list of running processes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove the empty lines across all the docstrings? 😆

Comment on lines +733 to +739
@telemetry.record_call()
@override
def get_active_display(self) -> int:
"""
Get the active display.
"""
return self._display
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done :) I think we should either include this info in the display information returned by the listing tool / the GetDisplayInformationResponse or add an extra tool for this.

I think it would be great if this was included in the controller api as I think they maintain the source of truth about which display is currently active (see https://askyourui.slack.com/archives/C091TSQ6KP0/p1753431836333729).

Copy link
Contributor

@adi-wan-askui adi-wan-askui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually wanted to request some changes 😆 Selected the wrong option by accident.

Co-authored-by: adi-wan-askui <105295410+adi-wan-askui@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants