the timeouts are magic numbers 🤣 based on the old code.
programminx-askui
left a comment
Did you try it without time.sleep?
And can we update the Readme.md to state that we now support UI-Tars 1.5?
| def execute(self, _agent_os: AgentOs) -> None: | ||
| raise Exception("Call user action executed. This should be handled by the agent's logic, not directly here.") |
Just thinking out loud about where we could take this in the future (not now): if we handed over other tools instead of only the AgentOs, we would be able to handle this here. For example, one tool could collect data from the user through a console, or simply hand control over to the user and busy-wait until the user confirms that they have helped out (potentially with some explanation of how they helped).
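To make the idea a bit more concrete, here is a rough sketch of what handing over a bundle of tools could look like. All names (`UserInteraction`, `ConsoleUserInteraction`, `ActionTools`, `CallUserAction`) are hypothetical and only for illustration; `agent_os` is typed as a plain `object` because the real `AgentOs` interface is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class UserInteraction(Protocol):
    """Hypothetical tool for handing control over to the user."""

    def ask_for_help(self, reason: str) -> str: ...


class ConsoleUserInteraction:
    """Blocks on the console until the user confirms that they helped."""

    def __init__(self, read_line: Callable[[str], str] = input) -> None:
        # read_line is injectable so the class can be exercised without a terminal.
        self._read_line = read_line

    def ask_for_help(self, reason: str) -> str:
        prompt = f"Agent needs help: {reason}\nDescribe what you did, then press Enter: "
        return self._read_line(prompt)


@dataclass
class ActionTools:
    """Hypothetical bundle passed to execute() instead of only the AgentOs."""

    agent_os: object  # stand-in for the real AgentOs
    user: UserInteraction


@dataclass
class CallUserAction:
    reason: str = "The agent is stuck."

    def execute(self, tools: ActionTools) -> str:
        # Return the user's explanation so the caller could add it to the history.
        return tools.user.ask_for_help(self.reason)
```

With something like this, the action would no longer need to raise and rely on the agent loop to special-case it.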
| ) | ||
| raw_message = chat_completion.choices[-1].message.content | ||
| print(raw_message) | ||
| logger.debug(f"Raw message: {raw_message}") |
| self.execute_act(self.act_history) | ||
|
|
||
| def add_screenshot_to_history(self, message_history): | ||
| time.sleep(0.5) |
Why was this necessary?
| if isinstance(action, FinishedAction): | ||
| return | ||
|
|
||
| action.execute(self._agent_os) |
Beautiful refactoring :)
Using pydantic for parsing + using the strategy pattern
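For anyone skimming the thread, a stripped-down sketch of the pattern: each action type carries its own `execute`, so the loop only dispatches. This is not the PR's code. It uses stdlib dataclasses instead of pydantic to stay dependency-free, and `FakeAgentOs`, `MoveAction`, and `run` are made-up names; only `FinishedAction`, `execute`, and the `mouse(x=..., y=...)` call are taken from the diff.

```python
from dataclasses import dataclass


class FakeAgentOs:
    """Stand-in for AgentOs that only records the calls it receives."""

    def __init__(self) -> None:
        self.calls = []

    def mouse(self, x: int, y: int) -> None:
        self.calls.append(("mouse", x, y))


@dataclass
class MoveAction:
    """Hypothetical action: move the mouse to a position."""

    x: int
    y: int

    def execute(self, agent_os: FakeAgentOs) -> None:
        agent_os.mouse(x=self.x, y=self.y)


@dataclass
class FinishedAction:
    """Marks the end of the run; has nothing to execute."""

    def execute(self, _agent_os: FakeAgentOs) -> None:
        pass


def run(actions, agent_os: FakeAgentOs) -> None:
    # The loop knows nothing about individual action types
    # except the terminal one.
    for action in actions:
        if isinstance(action, FinishedAction):
            return
        action.execute(agent_os)
```

Adding a new action then means adding one class, without touching the loop.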
| class FinishedAction(BaseModel): | ||
| """Finished action.""" | ||
| def execute(self, _agent_os: AgentOs) -> None: | ||
| time.sleep(5) |
Just thinking out loud about where we could take this in the future (not now): if we handed over other tools instead of only the AgentOs, e.g., one for waiting, we would be able to handle this here.
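A minimal sketch of such a waiting tool, assuming the action receives it instead of calling `time.sleep` itself. `WaitTool` and `WaitAction` are hypothetical names, and the 5-second default simply mirrors the value in the diff.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class WaitTool:
    """Hypothetical tool that owns waiting, so actions do not sleep directly."""

    sleep: Callable[[float], None] = time.sleep

    def wait(self, seconds: float) -> None:
        self.sleep(seconds)


@dataclass
class WaitAction:
    """Hypothetical wait action; the duration is data, the waiting is delegated."""

    seconds: float = 5.0

    def execute(self, wait_tool: WaitTool) -> None:
        wait_tool.wait(self.seconds)
```

A side benefit is that tests could pass in a recording function instead of actually sleeping.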
| """Hotkey action with key combination.""" | ||
| def execute(self, agent_os: AgentOs) -> None: | ||
| agent_os.mouse(x=self.start_box.x, y=self.start_box.y) | ||
| time.sleep(0.2) |
I would move this into the AskUiControllerClient. From my perspective it depends on the underlying AgentOs implementation: it is the general sleep/wait time between actions that gives the AgentOs implementation, the OS, the application, etc. time to react. It is generally model-independent.
I would also make this configurable through the constructor of AskUiControllerClient, as it depends heavily on the OS, the application being automated, etc.
FYI: Just saw that there are already 2 properties (pre_action_wait and post_action_wait) that may just need their values adjusted and to be exposed through the constructor.
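Roughly what I have in mind, as a sketch only. `ControllerClientSketch` is a hypothetical stand-in and not the real `AskUiControllerClient`; the default values, the `sleep` parameter, and the `_run_action` helper are assumptions. Only the names `pre_action_wait` and `post_action_wait` come from the existing code.

```python
import time
from typing import Callable


class ControllerClientSketch:
    """Hypothetical stand-in for AskUiControllerClient; only wait handling is shown."""

    def __init__(
        self,
        pre_action_wait: float = 0.0,   # assumed default
        post_action_wait: float = 0.2,  # assumed default
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.pre_action_wait = pre_action_wait
        self.post_action_wait = post_action_wait
        self._sleep = sleep
        self.log = []

    def _run_action(self, name: str) -> None:
        # Every action is wrapped in the same pre/post waits,
        # so the model-specific action classes stay free of sleeps.
        self._sleep(self.pre_action_wait)
        self.log.append(name)  # the real client would talk to the controller here
        self._sleep(self.post_action_wait)

    def mouse(self, x: int, y: int) -> None:
        self._run_action(f"mouse({x}, {y})")
```

Callers automating a slow application could then pass larger values without changing any action code.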
| action_type: str | ||
|
|
||
| def execute(self, agent_os: AgentOs) -> None: | ||
| raise NotImplementedError(f"Action '{self.action_type}' must implement execute method.") |
Suggested change:
| - raise NotImplementedError(f"Action '{self.action_type}' must implement execute method.") |
| + raise NotImplementedError(f"Action '{self.action_type}' not implemented yet") |