Migrate act() to conversation-based architecture with Speaker pattern and add caching v2 features.#236
Migrate act() to conversation-based architecture with Speaker pattern and add caching v2 features.#236philipph-askui wants to merge 39 commits intomainfrom
Conversation
…ache settings to new format
…oprietary fields first (e.g. usage_param)
…ding with base64 image strings
…ng caching features
…agent occasionaly provides the values as strigns
… 1.0 to give UIs time to materialize
programminx-askui
left a comment
There was a problem hiding this comment.
I reached only until the cache_executor.py. But here are alreay some comments.
docs/06_caching.md
Outdated
| - **`None`** (default): No caching is used. The agent executes normally without recording or replaying actions. | ||
| - **`"record"`**: Records all agent actions to a cache file for future replay. | ||
| - **`"execute"`**: Provides tools to the agent to list and execute previously cached trajectories. | ||
| - **`"both"`**: Combines execute and record modes - the agent can use existing cached trajectories and will also record new ones. |
There was a problem hiding this comment.
I don't like both.
There was a problem hiding this comment.
How about "auto", as we automatically infer whether to execute or record?
| # Remove visual_representation from tool_use blocks in content | ||
| if isinstance(msg_dict.get("content"), list): | ||
| for block in msg_dict["content"]: | ||
| if isinstance(block, dict) and block.get("type") == "tool_use": | ||
| block.pop("visual_representation", None) |
There was a problem hiding this comment.
Can we wrap this in a function self naming function remove_images_form_tool_use?
I still wondering why the MaessageParam
There was a problem hiding this comment.
it is already wrapper in a function named _sanitize_message_for_api, or what do you mean?
| return isinstance(exception, (APIConnectionError, APITimeoutError, APIError)) | ||
|
|
||
|
|
||
| def _sanitize_message_for_api(message: MessageParam) -> dict[str, Any]: |
There was a problem hiding this comment.
Why is the function sanitize named?
There was a problem hiding this comment.
what do you mean by 'named'? a lambda function?
There was a problem hiding this comment.
Can we find a way to Combine this an place everthing in one location?
https://github.com/askui/python-sdk/pull/236/changes#r2863175394
| # Log response | ||
| logger.debug("Agent response: %s", response.model_dump(mode="json")) | ||
|
|
||
| except Exception: |
There was a problem hiding this comment.
Don't catch generic exceptions? Are you sure that ruff is enabled?
| if message.stop_reason == "max_tokens": | ||
| raise MaxTokensExceededError(max_tokens) | ||
| if message.stop_reason == "refusal": | ||
| raise ModelRefusalError |
There was a problem hiding this comment.
Which stop_reasons are defined in the API ? can we link to it and extend it?
| # Determine status based on whether there are tool calls | ||
| # If there are tool calls, conversation will execute them and loop back | ||
| # If no tool calls, conversation is done | ||
| has_tool_calls = self._has_tool_calls(response) | ||
| status = "continue" if has_tool_calls else "done" |
There was a problem hiding this comment.
I'm a little bit unsure what this block is doing.
There was a problem hiding this comment.
it checks if the message by the agent contains tool use blocks and informs the conversation if any tools need to be executed after the message.
There was a problem hiding this comment.
Who is now responsible to call the tools? The Speaker or the Conversation?
if the conversation, then we should move this code snippet to the conversation.
Othersie remove the tool callbacks from the conversation.
| # Determine status based on whether there are tool calls | ||
| # If there are tool calls, conversation will execute them and loop back | ||
| # If no tool calls, conversation is done | ||
| has_tool_calls = self._has_tool_calls(response) | ||
| status = "continue" if has_tool_calls else "done" |
There was a problem hiding this comment.
I assume here are you controlling the control flow.
There was a problem hiding this comment.
this is just setting a flag so that the control flow in conversation.py knows what to do next
|
|
||
| # Cache execution state | ||
| self._executing_from_cache: bool = False | ||
| self._cache_verification_pending: bool = False |
There was a problem hiding this comment.
When you have pending-flags, then you should consider to use a State Machine
| Tool execution is handled by the Conversation class, not by this speaker. | ||
| """ | ||
|
|
||
| def __init__( |
There was a problem hiding this comment.
So many internal parameters indicate, that the class has to many responsibilities.
We need to check, how we can split this up.
| return self._handle_needs_agent(result) | ||
| if result.status == "COMPLETED": | ||
| return self._handle_completed(result) | ||
| # FAILED |
| ) | ||
|
|
||
| # Add failure message to inform the agent about what happened | ||
| failure_message = MessageParam( |
There was a problem hiding this comment.
I need an deep dive about the MessageParam
| message_history=[assistant_message], | ||
| ) | ||
|
|
||
| except Exception as e: |
There was a problem hiding this comment.
generic exception!
| if method and callable(method): | ||
| method(self, *args, **kwargs) |
There was a problem hiding this comment.
Are we sure, that an exception in a callback is failint the complete loop.
Who is responsible for exception handling the Callback or the Conversation Loop?
|
|
||
| # Infrastructure | ||
| self._reporter = reporter | ||
| self.cache_manager = cache_manager |
There was a problem hiding this comment.
CacheManager as Callback
There was a problem hiding this comment.
the CacheManager has to stay in the conversation as it is required by the speakers
… the conversation
programminx-askui
left a comment
There was a problem hiding this comment.
I achived only reviewing until cache_executor
| import time | ||
| from askui import ComputerAgent, ConversationCallback | ||
|
|
||
| class TimingCallback(ConversationCallback): |
| image_qa_provider: Image Q&A provider (optional) | ||
| detection_provider: Detection provider (optional) |
There was a problem hiding this comment.
Is there a reason, why do we need the image_qa_provider and the detection_provider?
| agent.act("Open the settings menu") | ||
| ``` | ||
|
|
||
| ## Available Hooks |
There was a problem hiding this comment.
Missing switch_callback
| return isinstance(exception, (APIConnectionError, APITimeoutError, APIError)) | ||
|
|
||
|
|
||
| def _sanitize_message_for_api(message: MessageParam) -> dict[str, Any]: |
There was a problem hiding this comment.
Can we find a way to Combine this an place everthing in one location?
https://github.com/askui/python-sdk/pull/236/changes#r2863175394
| class ConversationException(Exception): | ||
| """Exception raised during conversation execution.""" | ||
|
|
||
| def __init__(self, msg: str) -> None: | ||
| super().__init__(msg) | ||
| self.msg = msg |
There was a problem hiding this comment.
I think we need to split this down later into e.g. ToolcallFailedConverstaionExecption and so on
| self._accumulated_usage.cache_read_input_tokens or 0 | ||
| ) + (step_usage.cache_read_input_tokens or 0) | ||
|
|
||
| current_span = trace.get_current_span() |
There was a problem hiding this comment.
What's happend when we don' have a current_span? The get_current_span shoudl return a optional/None | Attributes
| * You will be able to operate 2 devices: an android device, and a computer device. | ||
| * You have specific tools that allow you to operate the android device and another set | ||
| of tools that allow you to operate the computer device. | ||
| * The tool names have a prefix of either 'computer_' or 'android_'. The | ||
| 'computer_' tools will operate the computer, the 'android_' tools will | ||
| operate the android device. For example, when taking a screenshot, | ||
| you will have to use 'computer_screenshot' for taking a screenshot from the | ||
| computer, and 'android_screenshot' for taking a screenshot from the android | ||
| device. | ||
| * Use the most direct and efficient tool for each task | ||
| * Combine tools strategically for complex operations | ||
| * Prefer built-in tools over shell commands when possible |
There was a problem hiding this comment.
Is this true, if we have multiple AgentOS?
| * Platform: {sys.platform} | ||
| * Architecture: {platform.machine()} |
There was a problem hiding this comment.
please use the AgentOS getPlattform functionality.
| # Determine status based on whether there are tool calls | ||
| # If there are tool calls, conversation will execute them and loop back | ||
| # If no tool calls, conversation is done | ||
| has_tool_calls = self._has_tool_calls(response) | ||
| status = "continue" if has_tool_calls else "done" |
There was a problem hiding this comment.
Who is now responsible to call the tools? The Speaker or the Conversation?
if the conversation, then we should move this code snippet to the conversation.
Othersie remove the tool callbacks from the conversation.
| @override | ||
| def get_description(self) -> str: | ||
| """AgentSpeaker is the default coordinator and not a handoff target. | ||
|
|
||
| Returns: | ||
| Empty string. | ||
| """ | ||
| return "" |
There was a problem hiding this comment.
Do we need the name and the description?
This PR merges two key concepts from the feat/conversation_based_architecture and the feat/caching_v02 branches and makes them ready for main:
-- visual validation using imagehash (phash/ahash)
-- cache invalidation or validation, Parameters in cache files (identified through LLM)
-- non-cacheable tools through is_cacheable flag
-- usage params in reports
all adapted to the new act() architecture
Things that might be worth testing that should work:
and: sorry for yet another massive PR...
For design docs that outline the concept please see here:
Here is a minimal example to test: