| title | OpenHome Ability SDK — Complete Reference |
|---|---|
| description | Single source of truth for everything available inside an OpenHome Ability SDK. |
This is the single source of truth for everything available inside an Ability. If a method or property isn't listed here, it either doesn't exist or hasn't been documented yet. Found something missing? Let us know on Discord.
Inside any Ability, you have access to two objects:
| Object | What it is | Access via |
|---|---|---|
| `self.capability_worker` | The SDK — all I/O, speech, audio, LLM, files, and flow control | `CapabilityWorker(self)` |
| `self.worker` | The Agent — logging, session management, memory, user connection info | Passed into `call()` |
| Runtime | Required file | `call()` signature | Lifecycle |
|---|---|---|---|
| Interactive Skill / Brain Skill | `main.py` | `call(self, worker)` | Triggered on demand, exits with `resume_normal_flow()` |
| Background Daemon | `background.py` | `call(self, worker, background_daemon_mode)` | Auto-starts on session begin, runs continuously |
The background daemon file must be named exactly `background.py` to be detected as a background daemon.
- Speaking / TTS
- Listening / User Input
- Combined Speak + Listen
- LLM / Text Generation
- Audio Playback
- Audio Recording
- Audio Streaming
- File Storage (User Data + Ability Directory)
- Ability Context Storage (Key-Value)
- WebSocket Communication
- Flow Control
- Logging
- Session Tasks
- User Connection Info
- Conversation Memory & History
- Music Mode
- Common Patterns
- Appendix: What You CAN'T Do (Yet)
- Appendix: Blocked Imports
Converts text to speech using the Agent's default voice. Streams audio to the user.
```python
await self.capability_worker.speak("Hello! How can I help?")
```

- Async: Yes (`await`)
- Voice: Uses whatever voice is configured on the Agent
- Tip: Keep it to 1-2 sentences. This is voice, not text.
Converts text to speech using a specific Voice ID (e.g., from ElevenLabs). Use when your Ability needs its own distinct voice.
```python
await self.capability_worker.text_to_speech("Welcome aboard.", "pNInz6obpgDQGcFmaJgB")
```

- Async: Yes (`await`)
- Voice: Overrides the Agent's default
- See: Voice ID catalog at the bottom of this doc
Waits for the user's next spoken or typed input. Returns it as a string.
```python
user_input = await self.capability_worker.user_response()
```

- Async: Yes (`await`)
- Returns: `str` — the transcribed user input
- Tip: Always check for empty strings (`if not user_input: continue`)
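For example, a small guard loop that skips empty transcriptions before acting on them. The stub class below is hypothetical and stands in for the real worker so the sketch runs standalone:

```python
import asyncio

class StubCapabilityWorker:
    """Stand-in that returns canned transcriptions (hypothetical)."""
    def __init__(self, scripted):
        self.scripted = list(scripted)

    async def user_response(self):
        return self.scripted.pop(0)

async def next_nonempty_input(cw):
    # Skip silence / empty STT results instead of acting on them
    while True:
        text = await cw.user_response()
        if text and text.strip():
            return text

cw = StubCapabilityWorker(["", "   ", "play some jazz"])
result = asyncio.run(next_nonempty_input(cw))
print(result)  # → play some jazz
```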
Waits until the user has completely finished speaking before returning the final transcription.
```python
full_input = await self.capability_worker.wait_for_complete_transcription()
```

- Async: Yes (`await`)
- Returns: `str` — the final transcribed input
- When to use:
  - Long-form input (descriptions, dictation, storytelling)
  - Cases where partial STT results may break your logic
  - Flows that need the entire spoken sentence before processing
  - The first step of an ability when capturing the trigger sentence
When a trigger word starts an ability immediately, this method still returns the full spoken sentence, including both the trigger phrase and the actual request.
Example trigger word: remind
User says: remind me to call Alex tomorrow at 6 PM
```python
import re

async def first_function(self):
    full_input = await self.capability_worker.wait_for_complete_transcription()
    reminder_text = re.sub(r"^\s*remind\b", "", full_input, flags=re.IGNORECASE).strip()
    await self.capability_worker.speak(f"Creating reminder: {reminder_text}")
```

In this flow:

- The ability is triggered by `remind`
- `wait_for_complete_transcription()` returns: `remind me to call Alex tomorrow at 6 PM`
- The extracted request becomes: `me to call Alex tomorrow at 6 PM`
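The trigger-stripping step generalizes to a small helper (illustrative, not part of the SDK):

```python
import re

def strip_trigger(trigger: str, utterance: str) -> str:
    """Remove a leading trigger word (case-insensitive) from a transcription."""
    return re.sub(rf"^\s*{re.escape(trigger)}\b", "", utterance, flags=re.IGNORECASE).strip()

print(strip_trigger("remind", "Remind me to call Alex tomorrow at 6 PM"))
# → me to call Alex tomorrow at 6 PM
print(strip_trigger("remind", "set a reminder"))  # no leading trigger, unchanged
# → set a reminder
```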
Speaks the text, then waits for the user's response. Returns the user's reply. A convenience wrapper around speak() + user_response().
```python
answer = await self.capability_worker.run_io_loop("What's your favorite color?")
```

- Async: Yes (`await`)
- Returns: `str` — user's reply
- Note: Uses the Agent's default voice (not a custom voice ID)
Speaks the text (appends "Please respond with 'yes' or 'no'"), then loops until the user clearly says yes or no.
```python
confirmed = await self.capability_worker.run_confirmation_loop("Should I send this email?")
if confirmed:
    # send it
    ...
```

- Async: Yes (`await`)
- Returns: `bool` — `True` for yes, `False` for no
Generates a text response using the configured LLM.
```python
response = self.capability_worker.text_to_text_response(
    "What's the capital of France?",
    history=[
        {"role": "user", "content": "Let's do geography trivia"},
        {"role": "assistant", "content": "Great, I'll ask you questions!"}
    ],
    system_prompt="You are a geography quiz host. Keep answers under 1 sentence."
)
```

- ⚠️ THIS IS THE ONLY SYNCHRONOUS METHOD. Do NOT use `await`.
- Returns: `str` — the LLM's response
- Parameters:
  - `prompt_text` (str): The current prompt/question
  - `history` (list): Optional conversation history for multi-turn context. Each item: `{"role": "user"|"assistant", "content": "..."}`
  - `system_prompt` (str): Optional system prompt to control LLM behavior
- Tip: LLMs often wrap JSON in markdown fences. Always strip them:

```python
clean = response.replace("```json", "").replace("```", "").strip()
```
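A slightly more robust version of that cleanup as a reusable helper (illustrative):

```python
import json
import re

def parse_llm_json(raw: str) -> dict:
    """Strip leading/trailing markdown code fences, then parse as JSON."""
    clean = re.sub(r"^```(?:json)?\s*|```\s*$", "", raw.strip())
    return json.loads(clean)

print(parse_llm_json('```json\n{"capital": "Paris"}\n```'))  # → {'capital': 'Paris'}
print(parse_llm_json('{"capital": "Paris"}'))  # plain output also works
```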
Plays audio directly from bytes or a file-like object.
```python
import requests

audio = requests.get("https://example.com/song.mp3")
await self.capability_worker.play_audio(audio.content)
```

- Async: Yes (`await`)
- Input: `bytes` or file-like object
- Tip: For anything longer than a TTS clip, use Music Mode
Plays an audio file stored in the Ability's directory (same folder as main.py).
```python
await self.capability_worker.play_from_audio_file("notification.mp3")
```

- Async: Yes (`await`)
- Input: Filename (string) — must be in the same folder as your `main.py`
Record audio from the user's microphone during a session.
Begins recording audio from the user's mic.
```python
self.capability_worker.start_audio_recording()
```

Stops the current audio recording.

```python
self.capability_worker.stop_audio_recording()
```

Returns the recorded audio as a .wav file.

```python
wav_data = self.capability_worker.get_audio_recording()
```

- Returns: `.wav` file data

Returns the length/duration of the current recording.

```python
length = self.capability_worker.get_audio_recording_length()
```

Clears the current recording buffer/file so the next recording starts fresh.

```python
self.capability_worker.flush_audio_recording()
```

- Async: No (synchronous)
```python
async def record_voice_note(self):
    await self.capability_worker.speak("I'll record a voice note. Start speaking.")
    self.capability_worker.start_audio_recording()
    await self.worker.session_tasks.sleep(10)  # Record for 10 seconds
    self.capability_worker.stop_audio_recording()
    duration = self.capability_worker.get_audio_recording_length()
    wav_file = self.capability_worker.get_audio_recording()
    await self.capability_worker.speak(f"Got it. Recorded {duration} of audio.")
    self.capability_worker.resume_normal_flow()
```

For streaming audio in chunks rather than loading it all into memory at once.
Initializes an audio streaming session.
```python
await self.capability_worker.stream_init()
```

Streams audio data in chunks. Handles mono conversion and resampling automatically.

```python
await self.capability_worker.send_audio_data_in_stream(audio_bytes, chunk_size=4096)
```

- Input: `bytes`, file-like object, or `httpx.Response`
- chunk_size: Bytes per chunk (default: 4096)

Ends the streaming session and cleans up.

```python
await self.capability_worker.stream_end()
```

```python
import requests

async def stream_long_audio(self):
    await self.capability_worker.stream_init()
    response = requests.get("https://example.com/long-audio.mp3")
    await self.capability_worker.send_audio_data_in_stream(response.content)
    await self.capability_worker.stream_end()
```

Use `in_ability_directory` to choose where the file operation runs:

- `in_ability_directory=False` (default): user data storage (shared across that user's abilities)
- `in_ability_directory=True`: current Ability directory
Allowed file types: .txt, .csv, .json, .md, .log, .yaml, .yml
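A small client-side guard (a hypothetical helper, not part of the SDK) can reject disallowed extensions before calling the file APIs:

```python
import os

# Mirrors the allowed storage extensions listed above
ALLOWED_EXTENSIONS = {".txt", ".csv", ".json", ".md", ".log", ".yaml", ".yml"}

def is_allowed_filename(name: str) -> bool:
    """Reject filenames whose extension isn't in the allowed set."""
    return os.path.splitext(name)[1].lower() in ALLOWED_EXTENSIONS

print(is_allowed_filename("user_prefs.json"))  # → True
print(is_allowed_filename("model.bin"))        # → False
```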
```python
exists = await self.capability_worker.check_if_file_exists(
    "user_prefs.json",
    in_ability_directory=False
)
```

- Async: Yes (`await`)
- Returns: `bool`
```python
await self.capability_worker.write_file(
    "user_prefs.json",
    '{"theme": "dark"}',
    False
)
```

- Async: Yes (`await`)
- Modes: `mode="a+"` (default, append) or `mode="w"` (overwrite)
- Default behavior (`a+`): Appends to existing file; creates file if it doesn't exist
```python
data = await self.capability_worker.read_file(
    "user_prefs.json",
    in_ability_directory=False
)
```

- Async: Yes (`await`)
- Returns: `str`
```python
await self.capability_worker.delete_file(
    "user_prefs.json",
    in_ability_directory=False
)
```

- Async: Yes (`await`)
Returns filenames in user data storage.
```python
files = await self.capability_worker.get_user_data_file_names()
```

- Async: Yes (`await`)
- Returns: `list[str]`
Because `write_file` defaults to append mode (`a+`), writing JSON to an existing file can silently produce invalid JSON (`{"a":1}{"a":1,"b":2}`) — no error is thrown, but your file is broken and unreadable. There are two safe ways to overwrite:

- Delete + Write — explicitly delete the file first, then write the new content
- `mode="w"` — passed as a parameter to `write_file`, it overwrites the file in place instead of appending
```python
# ✅ RECOMMENDED — delete + write (explicit, safe)
async def save_json(self, filename, data):
    if await self.capability_worker.check_if_file_exists(filename, False):
        await self.capability_worker.delete_file(filename, False)
    await self.capability_worker.write_file(filename, json.dumps(data), False)

# ✅ ALTERNATIVE — mode="w" (shorthand, but easy to forget)
await self.capability_worker.write_file("prefs.json", json.dumps(data), False, mode="w")

# ❌ WRONG — appending to JSON
await self.capability_worker.write_file("prefs.json", json.dumps(new_data), False)
# Result: {"old":"data"}{"new":"data"} ← broken JSON, no error thrown
```

CapabilityWorker includes a key-value context store for structured user/session state.
- Each key stores a JSON object (`dict`) as the value.
- These methods are synchronous (do not `await`).
- Great for conversation memory, user preferences, cart/session state, multi-step workflows, feature flags, and API cache metadata.
Creates a new key-value pair.
```python
result = self.capability_worker.create_key(
    key="user_preferences",
    value={
        "language": "en",
        "theme": "dark",
        "notifications": True
    }
)
```

- Async: No (synchronous)
- Parameters:
  - `key` (str): Unique key
  - `value` (dict): JSON object to store
- Note: If the key already exists, the backend may return an error.
Updates an existing key.
```python
result = self.capability_worker.update_key(
    key="user_preferences",
    value={
        "language": "en",
        "theme": "light",
        "notifications": False
    }
)
```

- Async: No (synchronous)
- Parameters: same as `create_key`
Deletes a key-value pair permanently.
```python
result = self.capability_worker.delete_key("user_preferences")
```

- Async: No (synchronous)
- Parameters:
  - `key` (str): Key to delete
Returns all stored key-value pairs.
```python
all_context = self.capability_worker.get_all_keys()
```

- Async: No (synchronous)
- Returns: Backend response containing all keys/values
Returns one key's stored value.
```python
preferences = self.capability_worker.get_single_key("user_preferences")
```

- Async: No (synchronous)
- Parameters:
  - `key` (str): Key to retrieve
```python
# 1) Create state
self.capability_worker.create_key(
    key="conversation_1234",
    value={
        "last_intent": "book_flight",
        "destination": "Dubai",
        "travel_date": "2026-04-01",
        "step": "awaiting_confirmation"
    }
)

# 2) Update state
self.capability_worker.update_key(
    key="conversation_1234",
    value={
        "last_intent": "book_flight",
        "destination": "Dubai",
        "travel_date": "2026-04-01",
        "step": "confirmed"
    }
)

# 3) Read state
context = self.capability_worker.get_single_key("conversation_1234")
```

- Use descriptive keys (for example `user_123_preferences`, `conversation_456_state`, `cart_session_789`).
- Always store structured JSON objects, not raw strings.
- Handle missing keys safely before update:

```python
existing = self.capability_worker.get_single_key("user_preferences")
if existing:
    self.capability_worker.update_key("user_preferences", updated_value)
else:
    self.capability_worker.create_key("user_preferences", updated_value)
```

Use file storage when you are producing human-readable artifacts (for example notes.md, activity.log, report.txt, data.csv, user_prefs.json) that a user or developer might open in an editor to read or export, and writes are infrequent or append-only.
Use ability context storage (key-value) when you need internal, structured JSON state that your code reads and writes frequently (conversation state, carts, workflows, feature flags), especially when multiple Abilities or processes might touch the same state.
| Aspect | File Storage | Ability Context Storage (Key-Value) |
|---|---|---|
| Data shape | Any allowed text format; you define the structure. | One JSON object (dict) stored under each key. |
| API style | Async file ops: `read_file`, `write_file`, `delete_file`, etc. | Sync key ops: `create_key`, `update_key`, `delete_key`, `get_single_key`. |
| Best for | Logs, notes, reports, markdown context, CSV/JSON exports. | Conversation/workflow state, carts, fast-changing preferences, feature flags. |
| Write pattern | Infrequent writes or append-only logs. | Frequent small reads/writes during interactions. |
| Concurrency / corruption | Be careful with JSON and multiple writers (delete-then-write or `mode="w"`). | Safer atomic key updates for concurrent access. |
| Rule of thumb | Use when you want a file a human might open in an editor. | Use when you want live structured state your code updates often. |
Sends structured data over WebSocket. Used for custom events (music mode, DevKit actions, etc.).
```python
await self.capability_worker.send_data_over_websocket("music-mode", {"mode": "on"})
```

- Async: Yes (`await`)
- Parameters:
  - `data_type` (str): Event type identifier
  - `data` (dict): Payload
Sends a hardware action to a connected DevKit device.
```python
await self.capability_worker.send_devkit_action("led_on")
```

- Async: Yes (`await`)
For `main.py` skills: you MUST call this when an interactive skill is done. It hands control back to the Agent. Without it, the Agent goes silent and the user has to restart the conversation.
```python
self.capability_worker.resume_normal_flow()
```

- Async: No (synchronous)
- When to call: On EVERY exit path:
  - End of your main logic (happy path)
  - After a `break` in a loop
  - Inside `except` blocks (error fallback)
  - After timeout
  - After user says "exit"/"stop"/"quit"
Checklist before shipping any Ability:

- Called after the main flow completes?
- Called after every `break` statement?
- Called in every `except` block that ends the ability?
- Called after timeout logic?
- Called after user exit detection?
Do not call this in background.py daemon loops. Background daemons are independent threads and should keep running until session end.
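One way to satisfy the whole checklist mechanically is a `try/finally` wrapper. The sketch below is runnable standalone; `FakeCapabilityWorker` is a stub standing in for the real object:

```python
import asyncio

class FakeCapabilityWorker:
    """Stub standing in for the real CapabilityWorker (hypothetical)."""
    def __init__(self):
        self.resumed = False

    async def speak(self, text):
        pass

    def resume_normal_flow(self):
        self.resumed = True

async def run_ability(cw):
    try:
        await cw.speak("Doing the thing")
        raise RuntimeError("simulated API failure")
    except Exception:
        await cw.speak("Something went wrong.")
    finally:
        # Fires on success, error, break, or early return alike
        cw.resume_normal_flow()

cw = FakeCapabilityWorker()
asyncio.run(run_ability(cw))
print(cw.resumed)  # → True
```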
Sends an interrupt event to stop the current assistant output (speech/audio) and switch back to user input.
```python
interrupt_signal = await self.capability_worker.send_interrupt_signal()
```

- Async: Yes (`await`)
- Use case: Manual cutoffs when your Ability needs to immediately stop ongoing output and listen for fresh input
- Background daemon rule: Call this before daemon `speak()`, `play_audio()`, or `play_from_audio_file()` to avoid audio overlap.
Always use this. Never use print().
```python
self.worker.editor_logging_handler.info("Something happened")
self.worker.editor_logging_handler.error("Something broke")
self.worker.editor_logging_handler.warning("Something suspicious")
self.worker.editor_logging_handler.debug("Debugging")
```

- Tip: Log before and after API calls so you can see what's happening in the Live Editor:

```python
self.worker.editor_logging_handler.info(f"Calling weather API for {city}...")
response = requests.get(url, timeout=10)
self.worker.editor_logging_handler.info(f"Weather API returned: {response.status_code}")
```
OpenHome's managed task system ensures async work is properly cancelled when a session ends. Raw asyncio tasks can outlive a session — if the user hangs up or switches abilities, the task keeps running as a ghost process. `session_tasks` guarantees everything is cleaned up.
Launches an async task within the agent's managed lifecycle.
```python
self.worker.session_tasks.create(self.my_async_method())
```

- Use instead of: `asyncio.create_task()` (which can leak tasks)
Pauses execution for the specified duration.
```python
await self.worker.session_tasks.sleep(5.0)
```

- Use instead of: `asyncio.sleep()` (which can't be cleanly cancelled)
- Daemon best practice: `background.py` loops should always use this for polling intervals.
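A minimal daemon-style polling loop using managed sleep might look like the sketch below. The stub classes stand in for the real worker so it runs standalone, and `session_active` is a hypothetical flag used only for this sketch:

```python
import asyncio

class StubSessionTasks:
    """Stand-in for worker.session_tasks (hypothetical stub)."""
    async def sleep(self, seconds):
        await asyncio.sleep(seconds)

class StubWorker:
    def __init__(self):
        self.session_tasks = StubSessionTasks()
        self.session_active = True  # hypothetical flag for this sketch

class Daemon:
    def __init__(self, worker):
        self.worker = worker
        self.ticks = 0

    async def call(self, worker, background_daemon_mode=True):
        # Poll on a managed sleep so cancellation at session end is clean
        while self.worker.session_active and self.ticks < 3:
            self.ticks += 1
            await self.worker.session_tasks.sleep(0.01)

worker = StubWorker()
daemon = Daemon(worker)
asyncio.run(daemon.call(worker))
print(daemon.ticks)  # → 3
```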
Returns the timezone for the active user/session when available.
```python
timezone = self.capability_worker.get_timezone()
```

- Async: No (synchronous)
- Returns: Timezone string (for example `America/Chicago`) or empty/`None` when unavailable
- Use case: Time-aware scheduling, local date/time formatting, reminders
- Common daemon use: Alarm/reminder checks aligned to the user's local timezone
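The returned IANA name can be fed to Python's standard `zoneinfo` module to format local time, falling back to UTC when the timezone is unavailable (a sketch, not an SDK method):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def local_now(tz_name: str) -> str:
    """Format the current time in the user's timezone, falling back to UTC."""
    tz = ZoneInfo(tz_name) if tz_name else ZoneInfo("UTC")
    return datetime.now(tz).strftime("%A %I:%M %p")

print(local_now("America/Chicago"))
print(local_now(""))  # timezone unavailable: UTC fallback
```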
Returns the linked account access token for the current user.
```python
token = self.capability_worker.get_token("google")
self.worker.editor_logging_handler.info(token)
```

- Async: No (synchronous)
- Parameters:
  - `platform` (str): Platform name. Supported values: Google (`"google"`), Slack (`"slack"`), Discord (`"discord"`)
- Returns: Access token string for that linked platform
- Use case: Calling Google/Slack/Discord APIs on behalf of the linked user account
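As a sketch of how the token is typically used, attach it as a Bearer credential on an outgoing request. The userinfo URL below is illustrative, not something this SDK guarantees:

```python
from urllib.request import Request

def build_userinfo_request(token: str) -> Request:
    """Attach a linked-account token as a Bearer credential (sketch only)."""
    return Request(
        "https://www.googleapis.com/oauth2/v3/userinfo",  # illustrative endpoint
        headers={"Authorization": f"Bearer {token}"},
    )

req = build_userinfo_request("example-token")
print(req.get_header("Authorization"))  # → Bearer example-token
```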
The user's public IP address at connection time.
```python
user_ip = self.worker.user_socket.client.host
self.worker.editor_logging_handler.info(f"User connected from: {user_ip}")
```

- Use case: IP-based geolocation, timezone detection, personalization
- Tip: Cloud/datacenter IPs won't give you useful location data. Check the ISP name for keywords like "amazon", "aws", "google cloud" before using for geolocation.
```python
import requests

def get_user_location(self):
    """Get user's city and timezone from their IP address."""
    try:
        ip = self.worker.user_socket.client.host
        resp = requests.get(f"http://ip-api.com/json/{ip}", timeout=5)
        if resp.status_code == 200:
            data = resp.json()
            if data.get("status") == "success":
                # Check for cloud/datacenter IPs
                isp = data.get("isp", "").lower()
                cloud_indicators = ["amazon", "aws", "google", "microsoft", "azure", "digitalocean"]
                if any(c in isp for c in cloud_indicators):
                    self.worker.editor_logging_handler.warning("Cloud IP detected, location may be inaccurate")
                    return None
                return {
                    "city": data.get("city"),
                    "region": data.get("regionName"),
                    "country": data.get("country"),
                    "timezone": data.get("timezone"),
                    "lat": data.get("lat"),
                    "lon": data.get("lon"),
                }
    except Exception as e:
        self.worker.editor_logging_handler.error(f"Geolocation error: {e}")
    return None
```

Access the full conversation message history from the current session through CapabilityWorker.
```python
history = self.capability_worker.get_full_message_history()
self.worker.editor_logging_handler.info(f"Messages so far: {len(history)}")
```

- Returns: The complete message history for the active session
- Use case: Building context-aware abilities that know what was said before the ability was triggered
- Common daemon use: Live conversation monitoring for note-taking, summarization, and event detection
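Assuming messages are role/content dicts (the exact shape is not guaranteed by this doc), a small helper can flatten history into a transcript for summarization prompts:

```python
def format_transcript(history):
    """Flatten session messages into a prompt-ready transcript (message shape assumed)."""
    return "\n".join(
        f"{msg.get('role', 'unknown')}: {msg.get('content', '')}" for msg in history
    )

history = [
    {"role": "user", "content": "What's the weather?"},
    {"role": "assistant", "content": "Sunny and 72."},
]
print(format_transcript(history))
```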
Append additional instructions/context to the active Agent personality prompt.
```python
self.capability_worker.update_personality_agent_prompt(
    "The user prefers concise answers and metric units."
)
```

- Async: No (synchronous)
- Use case: Persist behavior/context updates into the Agent's prompt for later turns
The text_to_text_response method accepts a history parameter. Use it to maintain multi-turn conversation context:
```python
self.history = []

async def main_loop(self):
    system = "You are a helpful cooking assistant. Keep answers under 2 sentences."
    while True:
        user_input = await self.capability_worker.user_response()
        if "exit" in user_input.lower():
            break
        self.history.append({"role": "user", "content": user_input})
        response = self.capability_worker.text_to_text_response(  # No await!
            user_input,
            history=self.history,
            system_prompt=system
        )
        self.history.append({"role": "assistant", "content": response})
        await self.capability_worker.speak(response)
    self.capability_worker.resume_normal_flow()
```

After an Ability finishes, you can carry context forward in a few ways. When `resume_normal_flow()` fires, direct execution returns to the Agent.
What you CAN do:

- Save to conversation history — Anything spoken during the Ability (via `speak()`) becomes part of the conversation history, which the Agent's LLM can see in subsequent turns.
- Update the Agent prompt — Use `update_personality_agent_prompt(prompt_addition)` to append durable instructions/context to the Agent's personality prompt.
- Use file storage — Write data to persistent files (see File Storage) that other Abilities can read later. The Agent itself won't read these files directly, but your Abilities can share data through them.
- Memory feature — OpenHome has a new memory feature that can persist user context. (Details TBD as this feature evolves.)

What you CANNOT do (yet):

- Silently inject hidden conversation-history entries without speaking them
- Inject arbitrary structured runtime objects directly into the Agent's LLM context without using prompt/history/file mechanisms
When playing audio that's longer than a TTS utterance (music, sound effects, long recordings), you need to signal the system to stop listening and not interrupt.
```python
async def play_track(self, audio_bytes):
    # 1. Enter music mode (system stops listening, won't interrupt)
    self.worker.music_mode_event.set()
    await self.capability_worker.send_data_over_websocket("music-mode", {"mode": "on"})

    # 2. Play the audio
    await self.capability_worker.play_audio(audio_bytes)

    # 3. Exit music mode (system resumes listening)
    await self.capability_worker.send_data_over_websocket("music-mode", {"mode": "off"})
    self.worker.music_mode_event.clear()
```

What happens if you skip Music Mode: The system may try to transcribe the audio playback as user speech, or interrupt the playback thinking the user is talking.
Use the LLM to classify user intent and route to different actions:
```python
import json

def classify_intent(self, user_input: str) -> dict:
    prompt = (
        "Classify this user input. Return ONLY valid JSON.\n"
        '{"intent": "weather|timer|music|chat", "confidence": 0.0-1.0}\n\n'
        f"User: {user_input}"
    )
    raw = self.capability_worker.text_to_text_response(prompt)  # No await!
    clean = raw.replace("```json", "").replace("```", "").strip()
    try:
        return json.loads(clean)
    except json.JSONDecodeError:
        return {"intent": "chat", "confidence": 0.0}
```

Always speak errors to the user and always resume:
```python
import requests

async def do_something(self):
    try:
        response = requests.get("https://api.example.com/data", timeout=10)
        if response.status_code == 200:
            data = response.json()
            await self.capability_worker.speak(f"Here's what I found: {data['result']}")
        else:
            await self.capability_worker.speak("Sorry, I couldn't get that information right now.")
    except Exception as e:
        self.worker.editor_logging_handler.error(f"API error: {e}")
        await self.capability_worker.speak("Something went wrong. Let me hand you back.")
    self.capability_worker.resume_normal_flow()  # ALWAYS called
```

```python
ABILITY_VOICE_ID = "pNInz6obpgDQGcFmaJgB"  # Deep, American, male narration voice

async def speak(self, text: str):
    await self.capability_worker.text_to_speech(text, ABILITY_VOICE_ID)
```

Use with `text_to_speech(text, voice_id)` to give your Ability its own voice.
| Voice ID | Accent | Gender | Tone | Good For |
|---|---|---|---|---|
| `21m00Tcm4TlvDq8ikWAM` | American | Female | Calm | Narration |
| `EXAVITQu4vr4xnSDxMaL` | American | Female | Soft | News |
| `XrExE9yKIg1WjnnlVkGX` | American | Female | Warm | Audiobook |
| `pMsXgVXv3BLzUgSXRplE` | American | Female | Pleasant | Interactive |
| `ThT5KcBeYPX3keUQqHPh` | British | Female | Pleasant | Children |
| `ErXwobaYiN019PkySvjV` | American | Male | Well-rounded | Narration |
| `GBv7mTt0atIp3Br8iCZE` | American | Male | Calm | Meditation |
| `TxGEqnHWrfWFTfGW9XjX` | American | Male | Deep | Narration |
| `pNInz6obpgDQGcFmaJgB` | American | Male | Deep | Narration |
| `onwK4e9ZLuTAKqWW03F9` | British | Male | Deep | News |
| `D38z5RcWu1voky8WS1ja` | Irish | Male | Sailor | Games |
| `IKne3meq5aSn9XLyUdCD` | Australian | Male | Casual | Conversation |
Full catalog with 40+ voices available in the OpenHome Dashboard.
Being explicit about limitations saves developers hours of guessing:
| You might want to... | Status |
|---|---|
| Directly replace the full Agent system prompt from an Ability | ❌ Not possible — use `update_personality_agent_prompt(prompt_addition)` to append instructions |
| Pass structured data back to the Agent after `resume_normal_flow()` | ❌ Not possible — use conversation history, prompt updates, or file storage as workarounds |
| Access other Abilities from within an Ability | ❌ Not supported |
| Run background tasks for the active session | ✅ Supported via `background.py` background daemons |
| Keep tasks alive after the session ends | ❌ Not supported — session tasks are cancelled on session end |
| Access a database directly (Redis, SQL, etc.) | ❌ Blocked — use File Storage API instead |
| Use `print()` | ❌ Blocked — use `editor_logging_handler` |
| Use `asyncio.sleep()` or `asyncio.create_task()` | ❌ Blocked — use `session_tasks` |
| Use `open()` for raw file access | ❌ Blocked — use File Storage API |
| Import `redis`, `user_config` | ❌ Blocked |
These will cause your Ability to be rejected by the sandbox:
| Import | Why | Use Instead |
|---|---|---|
| `redis` | Direct datastore coupling | File Storage API |
| `user_config` | Can leak global state | File Storage API |
Also avoid: `exec()`, `eval()`, `pickle`, `dill`, `shelve`, `marshal`, hardcoded secrets, MD5, and ECB cipher mode.
Found an undocumented method? Report it on Discord so we can add it here.