
Migrate act() to conversation-based architecture with Speaker pattern and add caching v2 features.#236

Open
philipph-askui wants to merge 39 commits into main from chore/act_conversation_with_caching

Conversation

@philipph-askui
Contributor

@philipph-askui philipph-askui commented Feb 25, 2026

This PR merges two key concepts from the feat/conversation_based_architecture and feat/caching_v02 branches and makes them ready for main:

  • Conversation-based architecture for the act() command: AgentSpeaker and CacheExecutor are now "speakers" in a conversation (= the control loop)
  • Caching v2 features, all adapted to the new act() architecture:
    -- visual validation using imagehash (phash/ahash)
    -- cache invalidation/validation; parameters in cache files (identified through an LLM)
    -- non-cacheable tools via an is_cacheable flag
    -- usage params in reports
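The visual-validation idea behind phash/ahash can be illustrated with a minimal pure-Python average hash. The PR itself uses the imagehash library on real screenshots; the tiny grayscale grids below are only stand-ins for downscaled screenshots:

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Compute an average hash: each bit is 1 if the pixel is above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; a small distance means visually similar."""
    return bin(a ^ b).count("1")


# Two "screenshots" that differ in a single pixel region.
img_a = [[10, 10, 200, 200], [10, 10, 200, 200]]
img_b = [[10, 10, 200, 200], [10, 10, 200, 10]]

d = hamming_distance(average_hash(img_a), average_hash(img_b))
print(d)  # 1 (small distance -> UI likely unchanged)
```

Cache invalidation then reduces to comparing the hash distance against a threshold: below it, the cached trajectory is considered still valid.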

Things that might be worth testing that should work:

  • "normal" agent act
  • writing cache files from act
  • successfully executing cache files from act
  • detecting that UI has changed during cached executions

and: sorry for yet another massive PR...

For design docs that outline the concept, please see here:

Here is a minimal example to test:

import logging

from askui import ComputerAgent
from askui.agent_settings import AgentSettings
from askui.model_providers.askui_vlm_provider import AskUIVlmProvider
from askui.models.shared.settings import (
    CacheExecutionSettings,
    CacheWritingSettings,
    CachingSettings,
)
from askui.reporting import SimpleHtmlReporter

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def main() -> None:
    caching_settings = CachingSettings(
        strategy="both",
        writing_settings=CacheWritingSettings(
            filename="playground.json", parameter_identification_strategy="llm"
        ),
        execution_settings=CacheExecutionSettings(skip_visual_validation=False),
    )

    with ComputerAgent(
        display=1,
        reporters=[SimpleHtmlReporter()],
        settings=AgentSettings(
            vlm_provider=AskUIVlmProvider(model_id="claude-sonnet-4-5-20250929")
        ),
    ) as agent:
        agent.act(
            goal=(
                "Open a new Chrome window by right-clicking on the icon in the dock "
                "and clicking on 'Neues Fenster' (which means 'New Window'). "
                "Then navigate to 'www.askui.com'. "
                "Operate only on the display you see; do not change to another display! "
                "You can use the cache file 'playground.json' if available."
            ),
            caching_settings=caching_settings,
        )


if __name__ == "__main__":
    main()

@philipph-askui philipph-askui changed the title Chore/act conversation with caching Migrate act() to conversation-based architecture with Speaker pattern and add caching v2 features. Feb 25, 2026
…agent occasionally provides the values as strings
@philipph-askui philipph-askui marked this pull request as ready for review February 26, 2026 13:24
@programminx-askui programminx-askui left a comment (Collaborator):

I only got as far as cache_executor.py, but here are already some comments.

- **`None`** (default): No caching is used. The agent executes normally without recording or replaying actions.
- **`"record"`**: Records all agent actions to a cache file for future replay.
- **`"execute"`**: Provides tools to the agent to list and execute previously cached trajectories.
- **`"both"`**: Combines execute and record modes - the agent can use existing cached trajectories and will also record new ones.
Collaborator:

I don't like both.

@philipph-askui philipph-askui (Contributor, Author) commented Feb 27, 2026:

How about "auto", as we automatically infer whether to execute or record?
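The strategy values quoted above reduce to two capabilities. A sketch of that mapping, assuming a plain Literal type; this is an illustration, not the SDK's actual internals:

```python
from typing import Literal, Optional

CachingStrategy = Optional[Literal["record", "execute", "both"]]


def caching_capabilities(strategy: CachingStrategy) -> tuple[bool, bool]:
    """Map a strategy value to (records_actions, can_execute_cache)."""
    records = strategy in ("record", "both")    # write a cache file after the run
    executes = strategy in ("execute", "both")  # offer cached trajectories as tools
    return records, executes


print(caching_capabilities(None))    # (False, False): plain agent run
print(caching_capabilities("both"))  # (True, True): replay and record
```

Under this view, a hypothetical "auto" mode would simply return (True, True) and let the agent infer at runtime whether to replay or record.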

Comment on lines +57 to +61
# Remove visual_representation from tool_use blocks in content
if isinstance(msg_dict.get("content"), list):
    for block in msg_dict["content"]:
        if isinstance(block, dict) and block.get("type") == "tool_use":
            block.pop("visual_representation", None)
Collaborator:

Can we wrap this in a self-describing function, e.g. remove_images_from_tool_use?

I'm still wondering about the MessageParam.

Contributor (Author):

It is already wrapped in a function named _sanitize_message_for_api, or what do you mean?
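For reference, the extraction the reviewer suggests might look roughly like this; the function name and the assumption that the message is already a plain dict are both illustrative:

```python
from typing import Any


def remove_visual_representation_from_tool_use(msg_dict: dict[str, Any]) -> dict[str, Any]:
    """Strip the non-API 'visual_representation' field from tool_use blocks, in place."""
    if isinstance(msg_dict.get("content"), list):
        for block in msg_dict["content"]:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                block.pop("visual_representation", None)
    return msg_dict


msg = {
    "role": "assistant",
    "content": [
        {"type": "tool_use", "name": "computer_screenshot", "visual_representation": "<base64>"},
        {"type": "text", "text": "done"},
    ],
}
cleaned = remove_visual_representation_from_tool_use(msg)
print("visual_representation" in cleaned["content"][0])  # False
```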

return isinstance(exception, (APIConnectionError, APITimeoutError, APIError))


def _sanitize_message_for_api(message: MessageParam) -> dict[str, Any]:
Collaborator:

Why is the function named sanitize?

@philipph-askui philipph-askui (Contributor, Author) commented Feb 27, 2026:

What do you mean by 'named'? A lambda function?

Collaborator:

Can we find a way to combine this and place everything in one location?

https://github.com/askui/python-sdk/pull/236/changes#r2863175394

# Log response
logger.debug("Agent response: %s", response.model_dump(mode="json"))

except Exception:
Collaborator:

Don't catch generic exceptions. Are you sure that ruff is enabled?

Comment on lines +154 to +157
if message.stop_reason == "max_tokens":
raise MaxTokensExceededError(max_tokens)
if message.stop_reason == "refusal":
raise ModelRefusalError
Collaborator:

Which stop_reasons are defined in the API? Can we link to them and extend this?

Comment on lines +108 to +112
# Determine status based on whether there are tool calls
# If there are tool calls, conversation will execute them and loop back
# If no tool calls, conversation is done
has_tool_calls = self._has_tool_calls(response)
status = "continue" if has_tool_calls else "done"
Collaborator:

I'm a little bit unsure what this block is doing.

Contributor (Author):

It checks whether the agent's message contains tool-use blocks and informs the conversation if any tools need to be executed after the message.

Collaborator:

Who is now responsible for calling the tools, the Speaker or the Conversation?

If the Conversation, then we should move this code snippet to the Conversation.

Otherwise, remove the tool callbacks from the Conversation.
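For readers following along, the behavior of the quoted snippet can be sketched with plain dicts standing in for Anthropic-style content blocks; the helper names are illustrative:

```python
from typing import Any


def has_tool_calls(content_blocks: list[dict[str, Any]]) -> bool:
    """True if any content block in the agent's message is a tool_use block."""
    return any(block.get("type") == "tool_use" for block in content_blocks)


def next_status(content_blocks: list[dict[str, Any]]) -> str:
    """'continue' tells the conversation loop to run the tools and loop back."""
    return "continue" if has_tool_calls(content_blocks) else "done"


print(next_status([{"type": "text", "text": "All finished."}]))       # done
print(next_status([{"type": "tool_use", "name": "computer_click"}]))  # continue
```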

Comment on lines +108 to +112
# Determine status based on whether there are tool calls
# If there are tool calls, conversation will execute them and loop back
# If no tool calls, conversation is done
has_tool_calls = self._has_tool_calls(response)
status = "continue" if has_tool_calls else "done"
Collaborator:

I assume you are controlling the control flow here.

Contributor (Author):

This is just setting a flag so that the control flow in conversation.py knows what to do next.


# Cache execution state
self._executing_from_cache: bool = False
self._cache_verification_pending: bool = False
Collaborator:

When you have pending flags, you should consider using a state machine.
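The two booleans encode up to four combinations, only some of which are meaningful. A sketch of the alternative: an explicit enum plus a transition table replaces scattered flag checks (state names are illustrative, not taken from the PR):

```python
from enum import Enum, auto


class CacheState(Enum):
    IDLE = auto()
    EXECUTING = auto()             # replaying steps from a cache file
    VERIFICATION_PENDING = auto()  # waiting for visual validation of the last step


# Allowed transitions replace ad-hoc boolean checks scattered through the class.
TRANSITIONS: dict[CacheState, set[CacheState]] = {
    CacheState.IDLE: {CacheState.EXECUTING},
    CacheState.EXECUTING: {CacheState.VERIFICATION_PENDING, CacheState.IDLE},
    CacheState.VERIFICATION_PENDING: {CacheState.EXECUTING, CacheState.IDLE},
}


def transition(current: CacheState, target: CacheState) -> CacheState:
    """Fail loudly on an impossible state change instead of silently mis-flagging."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"Illegal transition: {current.name} -> {target.name}")
    return target


state = transition(CacheState.IDLE, CacheState.EXECUTING)
print(state.name)  # EXECUTING
```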

Tool execution is handled by the Conversation class, not by this speaker.
"""

def __init__(
Collaborator:

So many internal parameters indicate that the class has too many responsibilities.

We need to check how we can split this up.

return self._handle_needs_agent(result)
if result.status == "COMPLETED":
return self._handle_completed(result)
# FAILED
Collaborator:

rm

)

# Add failure message to inform the agent about what happened
failure_message = MessageParam(
Collaborator:

I need a deep dive on the MessageParam.

message_history=[assistant_message],
)

except Exception as e:
Collaborator:

generic exception!

Comment on lines +122 to +123
if method and callable(method):
    method(self, *args, **kwargs)
Collaborator:

Are we sure that an exception in a callback should fail the complete loop?

Who is responsible for exception handling, the callback or the conversation loop?
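One possible answer, sketched: let the conversation loop own error handling, so a failing callback is logged and collected rather than aborting the run (names are illustrative, not the PR's API):

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def invoke_callbacks(callbacks: list[Callable[..., Any]], *args: Any, **kwargs: Any) -> list[Exception]:
    """Run every callback; collect failures instead of letting one abort the loop."""
    failures: list[Exception] = []
    for callback in callbacks:
        try:
            callback(*args, **kwargs)
        except Exception as error:  # noqa: BLE001 - isolating callbacks is the point here
            logger.warning("Callback %r failed: %s", callback, error)
            failures.append(error)
    return failures


def good(step: int) -> None:
    pass


def bad(step: int) -> None:
    raise RuntimeError("boom")


errors = invoke_callbacks([good, bad, good], step=1)
print(len(errors))  # 1
```

The trade-off: swallowing callback errors keeps the agent loop robust, but critical callbacks (e.g. cache writing) may need to opt in to fail-fast behavior.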


# Infrastructure
self._reporter = reporter
self.cache_manager = cache_manager
Collaborator:

CacheManager as Callback

Contributor (Author):

The CacheManager has to stay in the conversation, as it is required by the speakers.

@programminx-askui programminx-askui left a comment (Collaborator):

I only got as far as reviewing cache_executor.

import time
from askui import ComputerAgent, ConversationCallback

class TimingCallback(ConversationCallback):
Collaborator:

Nice Example

Comment on lines +54 to +55
image_qa_provider: Image Q&A provider (optional)
detection_provider: Detection provider (optional)
Collaborator:

Is there a reason why we need the image_qa_provider and the detection_provider?

agent.act("Open the settings menu")
```

## Available Hooks
Collaborator:

Missing switch_callback


Comment on lines +32 to +37
class ConversationException(Exception):
    """Exception raised during conversation execution."""

    def __init__(self, msg: str) -> None:
        super().__init__(msg)
        self.msg = msg
Collaborator:

I think we need to split this up later into e.g. ToolCallFailedConversationException and so on.
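Such a split could look like the following sketch; the subclass names follow the reviewer's suggestion and are illustrative, not part of the PR:

```python
class ConversationException(Exception):
    """Base class for errors raised during conversation execution."""

    def __init__(self, msg: str) -> None:
        super().__init__(msg)
        self.msg = msg


class ToolCallFailedConversationException(ConversationException):
    """A tool invocation raised or returned an error result."""

    def __init__(self, tool_name: str, msg: str) -> None:
        super().__init__(f"Tool '{tool_name}' failed: {msg}")
        self.tool_name = tool_name


class MaxStepsExceededConversationException(ConversationException):
    """The control loop hit its step limit without reaching 'done'."""


try:
    raise ToolCallFailedConversationException("computer_click", "element not found")
except ConversationException as error:  # callers can still catch the base class
    print(error.msg)  # Tool 'computer_click' failed: element not found
```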

self._accumulated_usage.cache_read_input_tokens or 0
) + (step_usage.cache_read_input_tokens or 0)

current_span = trace.get_current_span()
Collaborator:

What happens when we don't have a current_span? get_current_span should return an optional (e.g. None).

Comment on lines +102 to +113
* You will be able to operate 2 devices: an android device, and a computer device.
* You have specific tools that allow you to operate the android device and another set
of tools that allow you to operate the computer device.
* The tool names have a prefix of either 'computer_' or 'android_'. The
'computer_' tools will operate the computer, the 'android_' tools will
operate the android device. For example, when taking a screenshot,
you will have to use 'computer_screenshot' for taking a screenshot from the
computer, and 'android_screenshot' for taking a screenshot from the android
device.
* Use the most direct and efficient tool for each task
* Combine tools strategically for complex operations
* Prefer built-in tools over shell commands when possible
Collaborator:

Is this true, if we have multiple AgentOS?

Comment on lines +177 to +178
* Platform: {sys.platform}
* Architecture: {platform.machine()}
Collaborator:

Please use the AgentOS getPlattform functionality.


Comment on lines +137 to +144
@override
def get_description(self) -> str:
    """AgentSpeaker is the default coordinator and not a handoff target.

    Returns:
        Empty string.
    """
    return ""
Collaborator:

Do we need the name and the description?
