Releases: askui/python-sdk
v0.25.1
What's Changed
the value of model_id for vlm_providers can now be set as env variable by @philipph-askui in #239
Full Changelog: v0.25.0...v0.25.1
v0.25.0
🎉 Overview
v0.25.0 is a major release that introduces the new conversation-based agent architecture with a pluggable speaker system, a complete overhaul of the caching system, and a new callback system for observing and extending agent behavior.
The agent's control loop has been redesigned around the speaker-conversation pattern, where different "speakers" generate messages while the Conversation class handles tool execution and state management. This enables modular execution strategies — the default AgentSpeaker handles LLM interactions, while the new CacheExecutor speaker replays cached trajectories.
🚨 Breaking Changes
Caching Settings Restructured
The CachingSettings model has been restructured with renamed strategies, nested settings objects, and new defaults.
Strategy renames:
| Before | After |
|---|---|
| `"read"` | `"execute"` |
| `"write"` | `"record"` |
| `"both"` | `"auto"` |
| `"no"` | `None` |
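For configuration that still stores the old strategy names, the renames in the table can be applied mechanically. The following is a minimal sketch; `migrate_strategy` is a hypothetical helper for illustration and is not part of the SDK.

```python
# Old (0.24.x) strategy name -> new (0.25.0) strategy name, as listed in the table.
STRATEGY_RENAMES = {
    "read": "execute",
    "write": "record",
    "both": "auto",
    "no": None,
}
NEW_STRATEGIES = {"execute", "record", "auto", None}


def migrate_strategy(strategy):
    """Hypothetical helper: translate a 0.24.x strategy value to its 0.25.0 name."""
    if strategy in STRATEGY_RENAMES:
        return STRATEGY_RENAMES[strategy]
    if strategy in NEW_STRATEGIES:
        return strategy  # already a 0.25.0 value
    raise ValueError(f"Unknown caching strategy: {strategy!r}")
```

The returned value could then be passed as `strategy=` when building the new `CachingSettings`.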
Settings structure:

    # Before (0.24.x)
    from askui.models.shared.settings import CachingSettings

    CachingSettings(
        strategy="write",
        cache_dir=".cache",
        filename="login_flow.json",
        execute_cached_trajectory_tool_settings=CachedExecutionToolSettings(
            delay_time_between_action=0.5,
        ),
    )

    # After (0.25.0)
    from askui.models.shared.settings import (
        CachingSettings,
        CacheWritingSettings,
        CacheExecutionSettings,
    )

    CachingSettings(
        strategy="record",
        cache_dir=".askui_cache",  # default changed
        writing_settings=CacheWritingSettings(
            filename="login_flow.json",
        ),
        execution_settings=CacheExecutionSettings(
            delay_time_between_actions=1.0,  # default changed from 0.5
        ),
    )

Default changes:

- `cache_dir` default: `".cache"` → `".askui_cache"`
- `delay_time_between_actions` default: `0.5` → `1.0` seconds (to give UIs time to materialize)
- `visual_validation_threshold` default: changed to `10`
act() Parameter Renamed
The settings parameter on act() has been renamed to act_settings:

    # Before
    agent.act("Open settings", settings=ActSettings(...))

    # After
    agent.act("Open settings", act_settings=ActSettings(...))

on_message Callback Removed
The on_message parameter on act() has been removed. Use the new ConversationCallback system instead (see New Features below).

    # Before
    agent.act("Open settings", on_message=my_callback)

    # After: use ConversationCallback
    from askui import ComputerAgent, ConversationCallback

    class LoggingCallback(ConversationCallback):
        def on_step_end(self, conversation, step_index, result):
            for msg in result.messages_to_add:
                print(msg.model_dump_json())

    with ComputerAgent(callbacks=[LoggingCallback()]) as agent:
        agent.act("Open settings")

CustomAgent Removed
The internal CustomAgent class has been removed as part of the architecture refactoring.
Default Model Updated to claude-sonnet-4-6
The default model for both AnthropicVlmProvider and AskUIVlmProvider has been changed to claude-sonnet-4-6 (previously claude-sonnet-4-5-20251101 / claude-sonnet-4-5-20250929).
MessageSettings Type Changes
MessageSettings fields now use provider-agnostic types instead of Anthropic-specific ones:
- `betas` field removed: use `provider_options={"betas": [...]}` instead
- `thinking`, `tool_choice`: now `ThinkingConfigParam | None` / `ToolChoiceParam | None` (instead of Anthropic `Beta*` types with `Omit`)
- `temperature`: now `float | None` (instead of `float | Omit`)
- New `provider_options: dict[str, Any] | None` field for provider-specific options
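If existing code builds `MessageSettings` from a dictionary of keyword arguments, the `betas` move can be handled before construction. This is a rough sketch on plain dictionaries; `migrate_message_settings_kwargs` and the beta name in the usage note are hypothetical and only illustrate the shape of the change.

```python
from typing import Any


def migrate_message_settings_kwargs(old: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: move a top-level `betas` entry into `provider_options`."""
    new = dict(old)  # shallow copy so the caller's dict is left untouched
    betas = new.pop("betas", None)
    if betas is not None:
        provider_options = dict(new.get("provider_options") or {})
        provider_options["betas"] = list(betas)
        new["provider_options"] = provider_options
    return new
```

For example, `{"betas": ["example-beta"], "temperature": 0.2}` would become `{"temperature": 0.2, "provider_options": {"betas": ["example-beta"]}}`.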
✨ New Features
Conversation-Based Architecture
The agent's control loop has been redesigned around the speaker-conversation pattern. The Conversation class orchestrates the flow, managing speaker switching, tool execution, message history, and truncation strategies. Speakers are pluggable message generators that produce responses without executing tools.
Speaker Handoff System
A new speaker handoff pattern enables dynamic switching between execution strategies. Speakers describe their capabilities in the system prompt, and the LLM can initiate handoffs via a switch_speaker tool. This replaces the hardcoded tool-result checks from previous versions.
- `Speaker` abstract base class with `can_handle()`, `handle_step()`, and `on_activate()` hooks
- `Speakers` registry for managing and looking up speakers
- `SwitchSpeakerTool` automatically added when non-default speakers are registered
- Speaker descriptions injected into system prompt for LLM-initiated handoffs
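To make the handoff pattern concrete, here is a stdlib-only sketch of a speaker registry. The class names mirror the ones listed here, but these are illustrative stand-ins rather than the SDK classes: the method signatures, the `state` dictionary, `EchoSpeaker`, and `switch_speaker` are all assumptions.

```python
from abc import ABC, abstractmethod


class Speaker(ABC):
    """Illustrative stand-in, not the SDK class; method signatures are assumptions."""

    def __init__(self, name: str, description: str) -> None:
        if not name or not description:
            raise ValueError("name and description must be non-empty")
        self.name = name
        self.description = description

    @abstractmethod
    def can_handle(self, state: dict) -> bool: ...

    @abstractmethod
    def handle_step(self, state: dict) -> str: ...

    def on_activate(self, state: dict) -> None:
        """Optional hook that runs when this speaker takes over."""


class Speakers:
    """Minimal registry with a default speaker and name-based lookup."""

    def __init__(self, default: Speaker) -> None:
        self._speakers = {default.name: default}
        self.default_name = default.name

    def register(self, speaker: Speaker) -> None:
        self._speakers[speaker.name] = speaker

    def get(self, name: str) -> Speaker:
        return self._speakers[name]

    def describe(self) -> str:
        # Text like this could be injected into a system prompt.
        return "\n".join(f"- {s.name}: {s.description}" for s in self._speakers.values())


class EchoSpeaker(Speaker):
    """Hypothetical concrete speaker used only for the demonstration."""

    def can_handle(self, state: dict) -> bool:
        return True

    def handle_step(self, state: dict) -> str:
        return f"{self.name} handled step {state.get('step', 0)}"


def switch_speaker(registry: Speakers, target: str, state: dict) -> Speaker:
    """Hand off to `target` if it can handle the state, else fall back to the default."""
    speaker = registry.get(target)
    if not speaker.can_handle(state):
        speaker = registry.get(registry.default_name)
    speaker.on_activate(state)
    return speaker
```

In this sketch the registry owns the lookup and the fallback to the default speaker, so individual speakers only have to answer whether they can handle the current state.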
Callback System
A new PyTorch Lightning-style callback system provides hooks into the conversation lifecycle:

    import time

    from askui import ComputerAgent, ConversationCallback

    class TimingCallback(ConversationCallback):
        def on_conversation_start(self, conversation):
            self.start = time.time()

        def on_step_end(self, conversation, step_index, result):
            print(f"Step {step_index}: {result.status}")

        def on_conversation_end(self, conversation):
            print(f"Total: {time.time() - self.start:.2f}s")

    with ComputerAgent(callbacks=[TimingCallback()]) as agent:
        agent.act("Open settings")

Available hooks: on_conversation_start, on_conversation_end, on_control_loop_start, on_control_loop_end, on_step_start, on_step_end, on_speaker_switch, on_tool_execution_start, on_tool_execution_end.
Caching V2: CacheManager and CacheExecutor
The caching system has been rebuilt from the ground up:
- `CacheManager` replaces the old `CacheWriter` with support for cache metadata, validation, parameter handling, and recording
- `CacheExecutor` speaker replays cached trajectories as a dedicated speaker, enabling the agent to verify results after playback and handle non-cacheable tools gracefully
- Visual validation during cache replay using perceptual hashing (pHash/aHash) to detect UI state changes
- Cache parameters — dynamic values in trajectories can be parameterized and substituted at replay time
- Cache metadata — cache files now include metadata (validity, token usage, visual validation config)
- Usage statistics for cached executions shown in HTML reports, including original token consumption from when the cache was recorded
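The idea behind hash-based visual validation can be sketched with an average hash (aHash) over a small grayscale grid. This is not the SDK's implementation: the grid input, the function names, and the rule that a distance at or below the threshold counts as "unchanged" are assumptions made for illustration.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Average hash of a small grayscale grid: one bit per pixel, set if >= mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value >= mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def looks_unchanged(recorded: list[list[int]], current: list[list[int]], threshold: int = 10) -> bool:
    """Assumed rule: the region is unchanged if the hashes differ by at most `threshold` bits."""
    return hamming_distance(average_hash(recorded), average_hash(current)) <= threshold
```

Because each bit only records whether a pixel is above or below the mean, small brightness shifts leave the hash intact while a structural change flips many bits, which is what makes a bit-count threshold a workable tolerance.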
CacheWritingSettings and CacheExecutionSettings
New dedicated settings classes for fine-grained control over cache recording and playback:
- `CacheWritingSettings`: `filename`, `parameter_identification_strategy`, `visual_verification_method`, `visual_validation_region_size`
- `CacheExecutionSettings`: `delay_time_between_actions`, `skip_visual_validation`, `visual_validation_threshold`
Conversation ID
Each conversation execution is now assigned a unique conversation_id (UUID), accessible via conversation.conversation_id.
🔧 Improvements
Speaker Architecture Refinements
- Speaker `name` and `description` are now validated member variables (non-empty enforced at init)
- `AgentSpeaker` now returns `"continue"` status when tool calls are present (semantically accurate)
- Speaker switch logic extracted into dedicated `_switch_speaker_if_needed()` method
- `_handle_result_status` renamed to `_handle_continue_conversation` for clarity
Anthropic Messages API Refactoring
- `_sanitize_message_for_api` refactored into `from_message_param` and `from_content_block` functions
- `ToolUseBlockParam.visual_representation` (internal cache field) is now properly excluded when sending to the Anthropic API
Code Quality
- Removed unnecessary try/except blocks in `AgentSpeaker` and `CacheExecutor`
- Added `isEnabledFor(logging.DEBUG)` guard for costly `model_dump()` log calls
- Tracing span names made consistent with function names
- Usage tracking refactored to integrate via callback pattern
- Default `.askui_cache` directory added to `.gitignore`
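The `isEnabledFor(logging.DEBUG)` guard is a standard-library pattern: it skips building an expensive log argument when the record would be discarded anyway. A minimal sketch follows, with `ExpensiveModel` and `log_model` as hypothetical stand-ins for the SDK's objects.

```python
import logging

logger = logging.getLogger("example")


class ExpensiveModel:
    """Hypothetical object whose serialization is costly."""

    def __init__(self) -> None:
        self.dump_calls = 0

    def model_dump(self) -> dict:
        self.dump_calls += 1
        return {"payload": "..."}


def log_model(model: ExpensiveModel) -> None:
    # Only pay for model_dump() when DEBUG records would actually be emitted.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("model state: %s", model.model_dump())
```

Without the guard, `model.model_dump()` would be evaluated on every call even when the logger level filters the message out, because arguments are computed before `logger.debug` runs.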
HTML Reporter
- Truncation of long content (base64 image strings) to prevent report flooding
- Cache execution statistics section with original token usage hints
- Fixed crash for non-cached executions
🐛 Bug Fixes
- Fixed bug where cached executions were reported as success when they failed
- Fixed bug in visual validation during cached execution
- Fixed duplicate clipping of coordinates in mouse move
- Fixed agent occasionally providing coordinates as strings instead of integers
- Fixed bug where `AgentMessages` did not show up in reports
- Fixed agent response status not accurately reflecting tool-call state
- Fixed telemetry error when `AgentSettings` could not be converted to dict
📚 Documentation
- Updated `docs/06_caching.md` with new `CachingSettings` structure, renamed strategies, and corrected defaults
- Updated `docs/11_callbacks.md` with new callback system documentation
- Fixed outdated references across all docs:
  - `act()` parameter `settings` → `act_settings`
  - `on_message` callback removed from MCP example
  - Typo `AntrhopicImageQAProvider` → `AnthropicImageQAProvider`
  - `vlm_provider=GoogleImageQAProvider` → `image_qa_provider=GoogleImageQAProvider`
  - Broken link to non-existent `11_file_support.md` fixed
  - `_on_speaker_switch` → `on_speaker_switch` in callback hooks table
📈 Statistics
- 73 files changed
- 5,136 additions
- 1,500 deletions
Dependencies
- `opentelemetry-sdk` made a default dependency (previously optional)
- OpenTelemetry instrumentors imported safely with fallback
Full Changelog: v0.24.1...v0.25.0
v0.24.1
Fixes a bug where `AgentMessages` did not show up in reports anymore (7514591)
Full Changelog: v0.24.0...v0.24.1
v0.24.0
🎉 Overview
v0.24.0 is a major architectural release that fundamentally refactors how custom models can be integrated into the AskUI SDK. The centerpiece is the new Bring-Your-Own-Model-Provider system: instead of configuring models via string identifiers and using a complicated ModelRouter abstraction, you can now simply plug typed provider instances directly into AgentSettings. Three clean interfaces, VlmProvider, ImageQAProvider, and DetectionProvider, make it straightforward to swap in your own model backends for acting, querying, and locating UI elements. Built-in providers for Anthropic and Google are included alongside the AskUI defaults.
🚨 Breaking Changes
- Model Provider Overhaul: We removed the `ModelRouter` and `ModelRegistry` and replaced them with a new `model_providers` architecture. You can now bring your own model providers through three typed interfaces: `VlmProvider` (for `act`), `ImageQAProvider` (for `get`), and `DetectionProvider` (for `locate`). Built-in providers include `AnthropicVlmProvider`, `GoogleImageQAProvider`, `AskUIVlmProvider`, and more. Please see the new Bring Your Own Model Provider docs for detailed instructions.

  Migration: Replace `model`/`models` constructor parameters with the new `settings: AgentSettings` parameter:

      # Before
      agent = VisionAgent(model="claude-sonnet-4-20250514")

      # After
      from askui import ComputerAgent, AgentSettings
      from askui.model_providers import AnthropicVlmProvider

      agent = ComputerAgent(settings=AgentSettings(
          vlm_provider=AnthropicVlmProvider(model_id="claude-sonnet-4-20250514"),
      ))

- `VisionAgent` renamed to `ComputerAgent`: The main agent class is now `ComputerAgent`. `VisionAgent` still works but emits a `DeprecationWarning`. Similarly, `AndroidVisionAgent` is now `AndroidAgent`.
- `click()`/`mouse_move()` `model` parameter replaced: The `model` parameter on `click()`, `mouse_move()`, and `locate()` has been replaced by `locate_settings: LocateSettings` for controlling resolution and other locate options.
- `betas` parameter removed from `MessageSettings`: The Anthropic-specific `betas` parameter was replaced with a generic `provider_options: dict[str, Any]` field. To pass betas, use `provider_options={"betas": [...]}`.
- Chat API removed: The Chat API (`src/askui/chat/`) has been removed from the package along with its dependencies (`sqlalchemy`, `alembic`, `fastapi`, `uvicorn`, `apscheduler`, etc.).
- `pynput` AgentOs backend removed: The `PynputAgentOs` implementation and the `askui[pynput]` optional dependency group have been removed. Use the default `AskUiControllerClient` (gRPC) backend instead.
- UITars model removed: The `UITars` model integration (`src/askui/models/ui_tars_ep/`) has been removed.
- OpenAI integration removed: The OpenAI-compatible model provider (`src/askui/models/openai/`) has been removed. Use the new provider interfaces for custom model integrations.
- `ModelComposition` and `ModelDefinition` removed: These classes have been replaced by the new provider system.
✨ New Features
- `AgentSettings` for centralized configuration: A new `AgentSettings` class provides a clean, typed configuration surface for agents with three provider slots: `vlm_provider`, `image_qa_provider`, and `detection_provider`, each with sensible AskUI defaults.
- Bring-Your-Own-Model-Provider: Three abstract provider interfaces (`VlmProvider`, `ImageQAProvider`, `DetectionProvider`) allow users to plug in their own models. Built-in implementations:
  - `AskUIVlmProvider`, `AskUIImageQAProvider`, `AskUIDetectionProvider` (defaults)
  - `AnthropicVlmProvider`, `AnthropicImageQAProvider` (direct Anthropic API)
  - `GoogleImageQAProvider` (direct Google Gemini API)
- `mouse_movement` accepts a `duration` parameter to control mouse movement speed (in milliseconds, default: 500ms) by @philipph-askui in #233
- Time and wait tools added to universal tool store by @mlikasam-askui in #234:
  - `GetCurrentTimeTool`: returns current date/time for time-aware agent decisions
  - `WaitTool`: pauses execution for a specified duration
  - `WaitWithProgressTool`: wait with a visual progress bar
  - `WaitUntilConditionTool`: polls a condition with configurable interval and timeout
- `LocateSettings` and `GetSettings` exposed in public API: Users can now control per-call locate/get behavior including `resolution`, `max_tokens`, `temperature`, and `system_prompt`.
- `FallbackLocateModel` and `FallbackGetModel`: New utility classes that try multiple models in sequence until one succeeds, replacing the old `ModelComposition` pattern.
- `get` and `locate` tools in act loop: The LLM can now use `get` and `locate` as tools during `act()` calls (only when an `AgentOs` is available).
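The "try multiple models in sequence until one succeeds" behavior described for `FallbackLocateModel` and `FallbackGetModel` follows a common pattern, sketched here on plain callables. `first_success`, `failing_model`, and `working_model` are hypothetical names; the real classes and their constructor arguments are not shown in these notes.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_success(candidates: Sequence[Callable[[], T]]) -> T:
    """Call each candidate in order and return the first result that does not raise."""
    errors: list[Exception] = []
    for candidate in candidates:
        try:
            return candidate()
        except Exception as error:  # broad on purpose: any failure moves on to the next one
            errors.append(error)
    raise RuntimeError(f"All {len(errors)} candidates failed: {errors}")


def failing_model() -> tuple[int, int]:
    """Hypothetical model call that cannot locate the element."""
    raise ValueError("element not found")


def working_model() -> tuple[int, int]:
    """Hypothetical model call that returns a coordinate pair."""
    return (10, 20)
```

Calling `first_success([failing_model, working_model])` skips the failing candidate and returns the second one's result; if every candidate raises, the collected errors are reported together.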
🐛 Bug Fixes
- Fixed agent crash without AgentOs: `get` and `locate` tools are now only added to the act loop when `agent_os` is set. Agents used without an AgentOs (e.g., pure LLM pipelines) no longer crash on `act()`. by @philipph-askui in #237
- Fixed OpenTelemetry import errors: `opentelemetry-sdk` is now a default dependency. Instrumentor imports (`FastAPIInstrumentor`, `HTTPXClientInstrumentor`, etc.) are safely guarded with `try/except` so installing without `[otel]` extras no longer causes import failures. by @philipph-askui in #238
- Fixed typechecking issue in `not_given.py`: added `@final` decorator to resolve mypy ambiguity.
- Fixed `Display` default value for `name` parameter in AgentOS (was raising an error when executing from cache).
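Guarding optional imports, as described for the OpenTelemetry instrumentors, can be sketched with the standard library alone. The module name below is a deliberately hypothetical placeholder, and `optional_import` and `instrument_if_available` are illustrative helpers rather than SDK functions.

```python
import importlib


def optional_import(module_name: str):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Hypothetical module name used only to show the fallback path.
instrumentor_module = optional_import("some_optional_instrumentor_package")


def instrument_if_available() -> bool:
    """Skip instrumentation instead of failing when the optional package is missing."""
    if instrumentor_module is None:
        return False
    return True
```

The point of the pattern is that a missing extra degrades to "no instrumentation" at call time rather than raising at import time.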
📚 Documentation
- Complete restructuring of docs (`00_overview.md` through `10_extracting_data.md`)
- Removed outdated docs for chat API, MCP, and direct tool use
- New Bring Your Own Model Provider guide
- Updated reporting docs to distinguish between execution reports and test reports
- Updated README to reflect new `ComputerAgent` class name, corrected Python version requirement (>=3.10, <3.14), and fixed broken links
Dependencies
Removed: openai, fastapi, uvicorn, sqlalchemy, alembic, apscheduler, pynput, mss, structlog, asgi-correlation-id, starlette-context, anyio, bson, aiofiles
Added to core: opentelemetry-sdk>=1.38.0 (promoted from optional chat extras)
Optional extras changed:
- `askui[chat]`: removed
- `askui[pynput]`: removed
- `askui[otel]`: now contains only the instrumentor packages (the base SDK is always available)
- `askui[all]`: now includes `android`, `bedrock`, `tracing`, `vertex`, `web`
📝 Full Changelog: v0.23.1...v0.24.0
v0.23.1
Release Notes - v0.23.1
🎉 Overview
This release adds the new LoadImageTool, improves Tool Store tools with flexible paths and encodings, introduces unique instance identifiers for tools, renames the project to AskUI Python SDK, and includes refactoring and documentation fixes. With 44 files changed, 353 additions, and 271 deletions, this is a focused update building on the v0.23.0 foundation.
✨ New Features
LoadImageTool
New universal tool for loading images from the filesystem:
- Location: `askui.tools.store.universal.LoadImageTool`
- Loads images from a configurable base directory for analysis, comparison, or visual inspection
- Supports common formats (PNG, JPEG, etc.)
- Use cases: analyzing screenshots, comparing visual elements, verifying image content, providing visual context during execution
Use with agent.act() or via act_tools on the agent constructor.
Unique Tool Instance Identifiers
Tools now get unique instance identifiers so that multiple instances of the same tool (e.g. different base_dir) can be used together without name collisions. Tool names are derived from base name, optional tags, and a unique ID, and are sanitized and truncated to 64 characters for model compatibility.
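The naming scheme described here (base name, optional tags, unique ID, sanitized, capped at 64 characters) could look roughly like the sketch below. `build_tool_name`, the `_` separator, the allowed character set, and the 8-character ID are assumptions; only the 64-character limit comes from these notes.

```python
import re
import uuid
from typing import Optional, Sequence

MAX_TOOL_NAME_LENGTH = 64  # limit stated in the release notes


def build_tool_name(base_name: str, tags: Sequence[str] = (), unique_id: Optional[str] = None) -> str:
    """Hypothetical name builder: base name + tags + unique ID, sanitized and truncated."""
    unique_id = unique_id or uuid.uuid4().hex[:8]
    # Assumed allowed characters: letters, digits, underscore, hyphen.
    prefix = re.sub(r"[^A-Za-z0-9_-]", "_", "_".join([base_name, *tags]))
    suffix = "_" + re.sub(r"[^A-Za-z0-9_-]", "_", unique_id)
    # Truncate the prefix rather than the suffix so the unique part survives.
    name = prefix[: max(0, MAX_TOOL_NAME_LENGTH - len(suffix))] + suffix
    return name[:MAX_TOOL_NAME_LENGTH]
```

Truncating the prefix instead of the whole string is a design choice in this sketch: cutting from the end would drop the unique ID first and reintroduce the collisions the identifier is meant to prevent.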
🔧 Improvements
Tool Store – File Tools Flexibility
- ListFilesTool, ReadFromFileTool, WriteToFileTool: `base_dir` now accepts `str | Path` (previously `str` only). Paths are normalized to absolute for consistent descriptions and behavior.
- ReadFromFileTool: New optional `encodings` parameter (default: `["utf-8", "latin-1"]`) to try multiple encodings when reading text files. Improves handling of files that are not UTF-8.
- WriteToFileTool: Base directory is no longer created in `__init__`; creation happens on write, aligning behavior with other file tools.
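The multi-encoding read can be illustrated with a short stdlib sketch. `read_text_with_fallback` is a hypothetical function, not the tool's actual code; it uses the same default encoding order as the `encodings` parameter described here.

```python
from pathlib import Path
from typing import Sequence, Union


def read_text_with_fallback(
    path: Union[str, Path], encodings: Sequence[str] = ("utf-8", "latin-1")
) -> str:
    """Try each encoding in order and return the first successful decode."""
    data = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError(f"Could not decode {path} with any of {list(encodings)}")
```

Order matters: `latin-1` maps every byte to a character and therefore never fails, so it only makes sense as the last entry, after stricter encodings such as `utf-8` have been tried.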
AskUI Controller Refactoring
- Stub checks in `AskUiControllerClient` are centralized in a `_get_stub()` helper, removing repeated assert blocks and clarifying error messages (e.g. "Call `connect()` first").
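Centralizing the stub check in one helper can be sketched as follows. `ControllerClient` is an illustrative class, not the SDK's `AskUiControllerClient`; the `click` method, the exception type, and the exact message text are assumptions.

```python
class ControllerClient:
    """Illustrative client, not the SDK class: one helper guards every stub access."""

    def __init__(self) -> None:
        self._stub = None

    def connect(self) -> None:
        self._stub = object()  # stand-in for a real gRPC stub

    def _get_stub(self):
        if self._stub is None:
            raise RuntimeError("Stub is not initialized. Call `connect()` first.")
        return self._stub

    def click(self) -> str:
        stub = self._get_stub()  # replaces a repeated `assert self._stub is not None`
        return f"clicked via {type(stub).__name__}"
```

Compared with per-method asserts, a single helper keeps the error message consistent and still fires when Python runs with assertions disabled.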
Documentation and Branding
- Project and docs renamed from "AskUI Vision Agent" to AskUI Python SDK (README, docs, and HTML report title).
- Fixed Custom Models link in README to point to `docs/using-models.md#using-custom-models` instead of `docs/custom-models.md`.
- Doc and README references updated for consistency.
🐛 Bug Fixes
- Fixed stub initialization error message in AskUI controller ("Connect" → "connect()").
- Improved file reading robustness via configurable encodings in `ReadFromFileTool` (avoids `UnicodeDecodeError` for non-UTF-8 files when an alternative encoding is suitable).
📚 Documentation
- README and docs updated for AskUI Python SDK naming and correct Custom Models link.
- `LoadImageTool` documented with examples in its docstring and exported from `askui.tools.store.universal`.
📊 Statistics
- 44 files changed
- 353 lines added
- 271 lines removed
- Net change: +82 lines
🚀 New Tools
Using LoadImageTool

    from askui import VisionAgent
    from askui.tools.store.universal import LoadImageTool

    with VisionAgent(act_tools=[LoadImageTool(base_dir="./images")]) as agent:
        agent.act("Describe the logo image called './images/logo.png'")

Or per-act:

    with VisionAgent() as agent:
        agent.act(
            "Describe the logo image called './images/logo.png'",
            tools=[LoadImageTool(base_dir="./images")]
        )

📝 Full Changelog
For a complete list of changes, see the git log.
Upgrade: pip install --upgrade askui
Documentation: docs.askui.com
v0.23.0
Release Notes - v0.23.0
🎉 Overview
This release introduces a major overhaul of the prompting system, a new Tool Store for extending agent capabilities, automatic AgentOS injection, and numerous improvements and bug fixes. With 106 files changed, 5,972 additions, and 1,767 deletions, this is one of the most significant updates to the Vision Agent.
✨ New Features
Advanced Prompting System
A completely redesigned prompting paradigm with a structured 5-fold (plus optional 6th) system prompt architecture:
- System Capabilities: Defines what the agent can do and how it should behave
- Device Information: Provides platform-specific context (desktop, mobile, web)
- UI Information: Custom information about your specific UI (strongly recommended)
- Report Format: Specifies how to format execution results
- Cache Use: New optional prompt part that specifies when and how the agent should use cache files
- Additional Rules: Optional special handling for edge cases or known issues
The new system provides better structure, flexibility, and control over agent behavior. A comprehensive prompting guide has been added to help you create effective custom prompts.
Breaking Change: System prompts passed as strings will now show a deprecation warning. Use ActSystemPrompt instead.
Tool Store
Introducing a new Tool Store that provides optional, extensible tools organized by category:
- Universal Tools (`askui.tools.store.universal`): Work with any agent type
  - `ListFilesTool`: List files in a directory
  - `ReadFromFileTool`: Read content from files
  - `WriteToFileTool`: Write content to files
  - `PrintToConsoleTool`: Print messages to console during execution
- Computer Tools (`askui.tools.store.computer`): Require ComputerAgentOs
  - `ComputerSaveScreenshotTool`: Save screenshots during execution
  - Experimental window management tools:
    - `AddWindowAsVirtualDisplayTool`
    - `ListProcessWindowsTool`
    - `ListProcessTool`
    - `SetProcessInFocusTool`
    - `SetWindowInFocusTool`
- Android Tools (`askui.tools.store.android`): Require AndroidAgentOs
  - `AndroidSaveScreenshotTool`: Save screenshots from Android devices
Tools can be passed to agent.act() or to the agent constructor as act_tools for persistent availability.
AgentOS Auto-Injection
Tools that require AgentOS (like computer or Android tools) now automatically receive the appropriate AgentOS instance. This simplifies tool usage and eliminates the need for manual AgentOS management in most cases.
Computer Tools Refactoring
Computer tools have been completely refactored and reorganized:
- Tools are now properly modularized in `askui.tools.computer`
- Removed deprecated `AskUiComputerBaseTool`
- Improved separation of concerns and better code organization
- New tools added:
  - `GetSystemInfoTool`: Retrieve system information
  - `GetActiveProcessTool`: Get information about the active process
  - `ConnectTool` and `DisconnectTool`: Manage computer connections in chat
Enhanced AgentOS Capabilities
- Updated AgentOS JSON schema with expanded capabilities
- New system information retrieval methods
- Improved window management capabilities
- Better error handling for gRPC invalid argument errors
SSL Verification Control
Added the ability to disable SSL verification for the user identification API via the ASKUI_HTTP_SSL_VERIFICATION environment variable. This is useful for development environments with self-signed certificates.
Note: SSL verification is enabled by default for security.
🔧 Improvements
Chat API Enhancements
- Added computer connect/disconnect tools for chat interface
- Improved chat history management
- Enhanced MCP server integration for computer operations
Agent Improvements
- Cleaned up agent and agent_base code
- Fixed typechecking bugs
- Improved reporter encoding handling
- Added more reporter messages for better observability
- Enhanced overlay support during e2e controller tests
Prompting Improvements
- Refined how system prompts are provided
- Introduced `cache_use` prompt part for better cache control
- Improved prompt structure and organization
- Better handling of prompt parts
Tool System Improvements
- Better tool organization and categorization
- Improved tool initialization and lifecycle management
- Enhanced tool tagging system
- Better support for tools with AgentOS requirements
🐛 Bug Fixes
- Fixed act prompts issues
- Fixed reporter encoding problems
- Fixed tool initialization bugs
- Fixed typechecking issues in agent
- Fixed linter issues across the codebase
- Fixed typos in documentation and code
📚 Documentation
- Added comprehensive System Prompts documentation
- Updated README with Tool Store examples
- Improved code examples and usage patterns
🔄 Code Quality
- Extensive code cleanup and refactoring
- Improved type hints and type safety
- Better code organization and structure
- Enhanced test coverage
📊 Statistics
- 106 files changed
- 5,972 lines added
- 1,767 lines removed
- Net change: +4,205 lines
⚠️ Breaking Changes
- System Prompt Format: System prompts should now use `ActSystemPrompt` instead of plain strings. Passing strings will show a deprecation warning.
- Computer Tools: The `AskUiComputerBaseTool` has been removed. Use tools from `askui.tools.computer` or `askui.tools.store.computer` instead.
- Tool Organization: Computer tools have been reorganized. If you were using tools directly from `askui.tools.computer`, check the new structure.
🚀 Migration Guide
Using the Tool Store

    from askui import VisionAgent
    from askui.tools.store.universal import PrintToConsoleTool, WriteToFileTool
    from askui.tools.store.computer import ComputerSaveScreenshotTool

    with VisionAgent(act_tools=[
        PrintToConsoleTool(),
        WriteToFileTool(base_dir="./output"),
        ComputerSaveScreenshotTool(base_dir="./screenshots")
    ]) as agent:
        agent.act("Take a screenshot and save it")

📝 Full Changelog
For a complete list of changes, see the git log.
Upgrade: pip install --upgrade askui
Documentation: docs.askui.com
v0.22.12
What's Changed
- feat: add SBOM generation and release workflow by @mlikasam-askui in #214
- Add SBOM generator by @mlikasam-askui in #215
- ci: ensure SBOM generation runs after PyPI publish by @mlikasam-askui in #216
- refactor: remove functools.cache decorator from create_api_client fun… by @danyalxahid-askui in #218
- Cl 1935 scheduling workflows which run on a specific time by @danyalxahid-askui in #217
🚀 New Features
- Background Scheduler for Executing Run
- Create SBOM
🐛 Bug Fixes
- Fixed unauthorized errors with long-running instances
Full Changelog: v0.22.11...v0.22.12
v0.22.11
What's Changed
- fix: add encoding='utf-8' to all file operations to prevent UnicodeEn… by @programminx-askui in #213
- feat(messages): add support for injecting cancelled tool results in m… by @danyalxahid-askui in #212
- Introduce io publisher for communicating events via stdio by @onur-askui in #211
🚀 New Features
- `io_publisher` for communicating events via stdio
🐛 Bug Fixes
- Added encoding='utf-8' to all file operations
- Handle messages without `tool_result` block correctly
Full Changelog: v0.22.10...v0.22.11
v0.22.10
What's Changed
- refactor(mcp_clients): improve argument passing in call_tool method by @danyalxahid-askui in #209
🐛 Bug Fixes
Pip Install
- Support for fastmcp 2.14.* installed via pip
Full Changelog: v0.22.9...v0.22.10
v0.22.9
Release Notes: v0.22.9
🚀 New Features
Android Agent
- Added `device` parameter to `AndroidVisionAgent` constructor for device selection by serial number or index
- Added `act_tools` parameter to `AndroidVisionAgent` (matching `VisionAgent` functionality)
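Selecting a device "by serial number or index" suggests a parameter that accepts either a string or an integer. The sketch below shows one way such a value could be resolved against a list of connected serials; `select_device`, its default of index `0`, and the example serials are hypothetical and do not reflect the SDK's internals.

```python
from typing import Sequence, Union


def select_device(serials: Sequence[str], device: Union[int, str] = 0) -> str:
    """Hypothetical resolver: an int picks by position, a str must match a serial number."""
    if isinstance(device, int):
        if not 0 <= device < len(serials):
            raise IndexError(f"Device index {device} out of range (found {len(serials)} devices)")
        return serials[device]
    if device not in serials:
        raise ValueError(f"No connected device with serial {device!r}")
    return device
```

With `serials = ["emulator-5554", "R58M123ABC"]`, both `select_device(serials, 1)` and `select_device(serials, "R58M123ABC")` resolve to the second device.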
Reporting
- Added theme toggle (light/dark) to HTML reports
- Updated HTML report styling with CSS variables and improved color scheme
- Added reporting messages for `key_combination()`, `shell()`, `drag_and_drop()`, and `swipe()` methods
- Moved reporting from `AndroidAgentOsFacade` to `PpadbAgentOs` to eliminate duplicate reporting
🐛 Bug Fixes
Android Agent
- Added `UnknownAndroidDisplay` class for handling cases where display information cannot be determined
Full Changelog: v0.22.8...v0.22.9