feat(core): desktop companion domain — session, pipeline, pointing (#1909) #2025
Merged: senamakel merged 13 commits into `tinyhumansai:main` from `yuvrxj-afk:feat/desktop-companion-core` on May 19, 2026.
13 commits, all by yuvrxj-afk:

- `5a955c9` feat(core): add desktop_companion domain with session lifecycle (#1909)
- `a48c3b3` feat(core): add companion pointing parser and coordinate mapping (#1909)
- `55bf6b6` feat(core): companion pipeline — STT, LLM, TTS orchestration (#1909)
- `6f5fe60` feat(core): wire provider-surface handoff into companion pipeline (#1…
- `f54e339` test(core): companion session lifecycle JSON-RPC E2E (#1909)
- `bbe242a` docs: desktop companion architecture and capability catalog (#1909)
- `45e6d8a` fix(core): regex caching, session lock cleanup, event publishing (#1909)
- `ce14d7c` fix(core): address review feedback on desktop companion (#1909)
- `898939e` fix(core): token-aware handoff matching, session error logs, doc cap …
- `c20235f` fix(core): extract testable handoff helper, add positive-path tests (…
- `e02d578` test(core): add session auto-expire and TTL overflow tests (#1909)
- `1996611` fix(core): TOCTOU race in session_status TTL expiry, document Listeni…
- `b2e8d16` ci: re-run — unrelated composio flaky test (#1909)
@@ -0,0 +1,129 @@ (new file)

---
description: Desktop companion domain — Clicky-style interaction loop tying hotkey, voice, screen intelligence, LLM, TTS, and visual pointing into a single product experience.
icon: robot
---

# Desktop Companion (`src/openhuman/desktop_companion/`)

The desktop companion orchestrates a Clicky-style interaction loop: hotkey activation, microphone capture, screen context, LLM reasoning, speech synthesis, and visual pointing. It reuses existing building blocks rather than reimplementing them.

## Building blocks

| Module | What it provides | Path |
|--------|-----------------|------|
| **screen_intelligence** | Permission-gated capture sessions, `capture_now()`, `VisionSummary`, `AppContextInfo` | `src/openhuman/screen_intelligence/` |
| **voice** | Hotkey listener (push/tap), audio capture, cloud STT (Whisper), TTS (`reply_speech`) | `src/openhuman/voice/` |
| **meet_agent** | LLM orchestration pattern (STT -> LLM -> TTS), WAV packing | `src/openhuman/meet_agent/` |
| **overlay** | Floating UI surface, attention events, typewriter bubbles | `src/openhuman/overlay/` |
| **provider_surfaces** | Connected-app event queue (`ingest_event`, `list_queue`) | `src/openhuman/provider_surfaces/` |
| **accessibility** | Foreground app context (`foreground_context()`) | `src/openhuman/accessibility/` |

## Module layout

```text
src/openhuman/desktop_companion/
  mod.rs       — module exports (light)
  types.rs     — CompanionState enum, CompanionConfig, ConversationTurn, session param/result types
  session.rs   — singleton session lifecycle, state machine, TTL, conversation history
  pipeline.rs  — STT -> screen context -> LLM -> TTS -> pointing orchestration
  pointing.rs  — [POINT:x,y:label:screenN] tag parser, multi-monitor coordinate mapping
  handoff.rs   — provider-surface queue matching for connected-app actions
  bus.rs       — broadcast channel for CompanionStateChangedEvent
  schemas.rs   — RPC controllers (companion_start_session, companion_stop_session, etc.)
```

## State machine

```text
Idle -> Listening -> Thinking -> Speaking -> Pointing -> Idle
                                    |           |
                                    v           v
                                Listening   Listening (interrupt)

Any state -> Error -> Idle (reset)
```

Valid transitions are enforced by `session::is_valid_transition()`. Key paths:

- **Happy path**: Idle -> Listening -> Thinking -> Speaking -> Pointing -> Idle
- **No pointing**: Thinking -> Speaking -> Idle (no POINT tags in response)
- **Interrupt**: Speaking/Pointing -> Listening (user re-activates hotkey)
- **Cancel**: Thinking -> Idle (user cancels mid-think)
- **Error recovery**: Any -> Error -> Idle
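
The documented paths can be collected into a single transition check. This sketch is reconstructed only from the list above, so treat it as an approximation of `session::is_valid_transition()`: the local `CompanionState` enum is a stand-in for the type in `types.rs`, and the real function may accept edges not listed here.

```rust
/// Stand-in for `CompanionState` from `types.rs` (variant names taken from the diagram).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompanionState {
    Idle,
    Listening,
    Thinking,
    Speaking,
    Pointing,
    Error,
}

/// Hypothetical transition table built only from the documented key paths.
pub fn is_valid_transition(from: CompanionState, to: CompanionState) -> bool {
    use CompanionState::*;
    match (from, to) {
        // Error recovery: any state may fail into Error; Error only resets to Idle.
        (_, Error) => true,
        (Error, Idle) => true,
        // Happy path.
        (Idle, Listening)
        | (Listening, Thinking)
        | (Thinking, Speaking)
        | (Speaking, Pointing)
        | (Pointing, Idle) => true,
        // No pointing: the response contained no POINT tags.
        (Speaking, Idle) => true,
        // Interrupt: the user re-activates the hotkey.
        (Speaking, Listening) | (Pointing, Listening) => true,
        // Cancel: the user cancels mid-think.
        (Thinking, Idle) => true,
        _ => false,
    }
}
```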

## Interaction pipeline

`pipeline.rs` orchestrates a single turn:

1. **Activation** — state transitions to Listening (to be driven by the Tauri shell's hotkey bridge in PR 2)
2. **STT** — audio samples are transcribed via `voice::cloud_transcribe` (Whisper)
3. **Screen context** — `accessibility::foreground_context()` supplies the app name and window title
4. **LLM** — chat completion via `BackendOAuthClient` with the system prompt, screen context, and rolling conversation history (the last 20 turns are sent as context)
5. **Parse response** — `[POINT:x,y:label:screenN]` tags are extracted via `pointing::parse_and_map()`
6. **Handoff check** — the response is scanned for provider keywords, which are matched against the `provider_surfaces` queue
7. **TTS** — speech is synthesized via `voice::reply_speech` (ElevenLabs)
8. **Pointing** — pointing targets are emitted for the overlay animation
9. **Return to Idle**

The pipeline supports cancellation via `CancellationToken`: the Tauri shell can cancel at any checkpoint between the STT, LLM, and TTS stages.

Text input is also supported via `run_text_turn()`, which skips STT.
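
A std-only sketch of the turn shape, covering the cancellation checkpoints and the text-input path. It assumes hypothetical stage hooks (`Stages`, `run_turn`, `TurnInput`) in place of the real STT, LLM, and TTS clients, and uses a small `CancelFlag` over `AtomicBool` as a stand-in for `CancellationToken` (from `tokio_util`, which the real pipeline would poll with `is_cancelled()`).

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical std-only stand-in for `tokio_util::sync::CancellationToken`.
#[derive(Clone, Default)]
pub struct CancelFlag(Arc<AtomicBool>);

impl CancelFlag {
    pub fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    pub fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Audio goes through STT; text skips it (the role `run_text_turn()` plays).
pub enum TurnInput {
    Audio(Vec<f32>),
    Text(String),
}

#[derive(Debug, PartialEq)]
pub enum TurnOutcome {
    Completed { reply: String },
    Cancelled { after_stage: &'static str },
}

/// Hypothetical stage hooks standing in for the voice and LLM clients.
pub struct Stages<'a> {
    pub transcribe: &'a dyn Fn(&[f32]) -> String,
    pub think: &'a dyn Fn(&str) -> String,
    pub speak: &'a dyn Fn(&str),
}

pub fn run_turn(input: TurnInput, stages: &Stages<'_>, cancel: &CancelFlag) -> TurnOutcome {
    let transcript = match input {
        TurnInput::Audio(samples) => (stages.transcribe)(samples.as_slice()),
        TurnInput::Text(text) => text,
    };
    // Checkpoint between STT and LLM.
    if cancel.is_cancelled() {
        return TurnOutcome::Cancelled { after_stage: "stt" };
    }
    let reply = (stages.think)(&transcript);
    // Checkpoint between LLM and TTS.
    if cancel.is_cancelled() {
        return TurnOutcome::Cancelled { after_stage: "llm" };
    }
    (stages.speak)(&reply);
    TurnOutcome::Completed { reply }
}
```

In this sketch the flag is only checked between stages, so a cancel issued during the LLM call takes effect once that call returns rather than aborting it mid-flight.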

## Session lifecycle

- **One session at a time** — enforced by a process-global `Mutex<Option<CompanionSessionInner>>`
- **Consent required** — `start_session` rejects `consent=false`
- **TTL enforcement** — sessions auto-expire when `status()` detects that the TTL has elapsed
- **Conversation history** — capped at 50 turns; the oldest turns are drained on overflow
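
The session rules above map onto a few small operations. This sketch is hypothetical: `Session` stands in for `CompanionSessionInner`, and `try_begin`/`status` are illustrative free functions over a `Mutex<Option<Session>>` slot, not the module's real API.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// History cap from the doc: at most 50 turns are kept.
pub const MAX_HISTORY_TURNS: usize = 50;

/// Hypothetical, simplified stand-in for `CompanionSessionInner`.
pub struct Session {
    started_at: Instant,
    ttl: Duration,
    history: VecDeque<String>,
}

impl Session {
    /// Mirrors the consent rule: starting without consent is rejected.
    pub fn start(consent: bool, ttl: Duration) -> Result<Self, &'static str> {
        if !consent {
            return Err("consent required");
        }
        Ok(Self { started_at: Instant::now(), ttl, history: VecDeque::new() })
    }

    /// Remaining TTL at `now`, or `None` once the TTL has elapsed.
    pub fn remaining_ttl(&self, now: Instant) -> Option<Duration> {
        let elapsed = now.saturating_duration_since(self.started_at);
        if elapsed >= self.ttl { None } else { Some(self.ttl - elapsed) }
    }

    /// Appends a turn, draining the oldest ones once the cap is exceeded.
    pub fn push_turn(&mut self, turn: String) {
        self.history.push_back(turn);
        while self.history.len() > MAX_HISTORY_TURNS {
            self.history.pop_front();
        }
    }

    /// The most recent `n` turns, oldest first (e.g. the 20 sent to the LLM).
    pub fn recent(&self, n: usize) -> Vec<&str> {
        let skip = self.history.len().saturating_sub(n);
        self.history.iter().skip(skip).map(String::as_str).collect()
    }
}

/// "One session at a time": refuse to start while the global slot is occupied.
pub fn try_begin(slot: &Mutex<Option<Session>>, session: Session) -> Result<(), &'static str> {
    let mut guard = slot.lock().map_err(|_| "session lock poisoned")?;
    if guard.is_some() {
        return Err("a session is already active");
    }
    *guard = Some(session);
    Ok(())
}

/// `status()`-style check: the TTL test and the expiry happen under one lock.
pub fn status(slot: &Mutex<Option<Session>>, now: Instant) -> Option<Duration> {
    let mut guard = slot.lock().ok()?;
    let remaining = guard.as_ref().and_then(|s| s.remaining_ttl(now));
    if remaining.is_none() {
        *guard = None; // auto-expire (a no-op when the slot was already empty)
    }
    remaining
}
```

Doing the TTL check and the clearing under a single lock acquisition avoids a check-then-act race, the kind of TOCTOU issue one of this PR's later commits addresses in `session_status`.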

## RPC surface

Namespace: `companion`. All methods go through the standard controller registry.

| Method | Description |
|--------|-------------|
| `companion_start_session` | Start a session with explicit consent + optional TTL |
| `companion_stop_session` | End the active session |
| `companion_status` | Current state, session info, remaining TTL |
| `companion_config_get` | Read companion configuration |
| `companion_config_set` | Update companion configuration |

## Event bus

`CompanionStateChangedEvent` is broadcast via a `tokio::sync::broadcast` channel (same pattern as `overlay::bus`). Three `DomainEvent` variants route to the `"companion"` domain:

- `CompanionSessionStarted { session_id }`
- `CompanionStateChanged { session_id, state, previous_state }`
- `CompanionSessionEnded { session_id, reason }`
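
For routing, what matters is that all three variants resolve to the same domain key and carry a `session_id`. The enum below is a hypothetical mirror (plain `String` fields and an illustrative `domain()` helper); the real `DomainEvent` also holds other domains' variants and is published on the broadcast channel, which this std-only sketch omits.

```rust
/// Hypothetical mirror of the three companion variants; field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum DomainEvent {
    CompanionSessionStarted { session_id: String },
    CompanionStateChanged { session_id: String, state: String, previous_state: String },
    CompanionSessionEnded { session_id: String, reason: String },
}

impl DomainEvent {
    /// Domain key used for routing; every companion variant maps to "companion".
    pub fn domain(&self) -> &'static str {
        match self {
            DomainEvent::CompanionSessionStarted { .. }
            | DomainEvent::CompanionStateChanged { .. }
            | DomainEvent::CompanionSessionEnded { .. } => "companion",
        }
    }

    /// Every companion event carries the session it belongs to.
    pub fn session_id(&self) -> &str {
        match self {
            DomainEvent::CompanionSessionStarted { session_id }
            | DomainEvent::CompanionStateChanged { session_id, .. }
            | DomainEvent::CompanionSessionEnded { session_id, .. } => session_id.as_str(),
        }
    }
}
```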

## Pointing system

LLM responses can embed `[POINT:x,y:label:screenN]` tags. `pointing.rs`:

- Parses tags via regex
- Maps screen-relative coordinates to absolute desktop coordinates using `ScreenGeometry`
- Clamps coordinates to screen bounds
- Falls back to screen 0 when the index is out of range
- Strips tags from display text
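
A std-only sketch of the tag handling. The real `pointing.rs` uses a regex; this version scans for `[POINT:` by hand so it needs no crates. It assumes `x,y` are pixel offsets within the target screen and that malformed or unterminated tags stay in the display text; `ScreenGeometry`'s fields, `PointTarget`, and both of those behaviors are assumptions, not the module's confirmed semantics.

```rust
/// Hypothetical monitor rectangle in absolute desktop coordinates.
#[derive(Debug, Clone, Copy)]
pub struct ScreenGeometry {
    pub x: i32,
    pub y: i32,
    pub width: i32,
    pub height: i32,
}

/// A mapped pointing target (field names are illustrative).
#[derive(Debug, Clone, PartialEq)]
pub struct PointTarget {
    pub x: i32,
    pub y: i32,
    pub label: String,
    pub screen: usize,
}

const OPEN: &str = "[POINT:";

/// Parses `x,y:label:screenN`, the text between `[POINT:` and `]`.
fn parse_body(body: &str, screens: &[ScreenGeometry]) -> Option<PointTarget> {
    let (coords, rest) = body.split_once(':')?;
    let (label, screen_part) = rest.rsplit_once(':')?; // the label may itself contain ':'
    let (xs, ys) = coords.split_once(',')?;
    let x: i32 = xs.trim().parse().ok()?;
    let y: i32 = ys.trim().parse().ok()?;
    let requested: usize = screen_part.strip_prefix("screen")?.parse().ok()?;
    // Fall back to screen 0 when the index is out of range.
    let screen = if requested < screens.len() { requested } else { 0 };
    let geo = screens.get(screen)?; // no screens at all: drop the target
    Some(PointTarget {
        // Clamp to the screen, then offset by that screen's desktop origin.
        x: geo.x + x.clamp(0, (geo.width - 1).max(0)),
        y: geo.y + y.clamp(0, (geo.height - 1).max(0)),
        label: label.trim().to_string(),
        screen,
    })
}

/// Returns the display text with well-formed tags stripped, plus the mapped targets.
pub fn parse_and_map(text: &str, screens: &[ScreenGeometry]) -> (String, Vec<PointTarget>) {
    let mut clean = String::with_capacity(text.len());
    let mut targets = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find(OPEN) {
        clean.push_str(&rest[..start]);
        let body_start = start + OPEN.len();
        let Some(len) = rest[body_start..].find(']') else {
            rest = &rest[start..]; // unterminated tag: keep it verbatim
            break;
        };
        let tag_end = body_start + len + 1; // index just past ']'
        match parse_body(&rest[body_start..body_start + len], screens) {
            Some(target) => targets.push(target),
            None => clean.push_str(&rest[start..tag_end]), // malformed: keep verbatim
        }
        rest = &rest[tag_end..];
    }
    clean.push_str(rest);
    (clean.trim().to_string(), targets)
}
```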

## Provider-surface handoff

`handoff.rs` scans the clean LLM response text for provider keywords (slack, discord, telegram, etc.) and matches them against items in the `provider_surfaces` queue. When matches are found, `HandoffEvent`s are included in `TurnResult` for the Tauri shell / overlay to surface.
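
A sketch of keyword matching on whole-word tokens. One commit in this PR is titled "token-aware handoff matching", but its exact rules are not shown, so this is an assumption about the approach. `QueueItem`, the `HandoffEvent` fields, and `match_handoffs` are hypothetical, and only the three providers the doc names are listed.

```rust
use std::collections::HashSet;

/// Hypothetical queue entry; the real `provider_surfaces` item type differs.
#[derive(Debug, Clone, PartialEq)]
pub struct QueueItem {
    pub provider: String,
    pub id: String,
}

/// Hypothetical handoff record surfaced in the turn result.
#[derive(Debug, Clone, PartialEq)]
pub struct HandoffEvent {
    pub provider: String,
    pub item_id: String,
}

/// The providers the doc names; the real list is longer ("etc.").
const PROVIDER_KEYWORDS: &[&str] = &["slack", "discord", "telegram"];

pub fn match_handoffs(response: &str, queue: &[QueueItem]) -> Vec<HandoffEvent> {
    // Token-aware: split on non-alphanumerics so "Slackware" never matches "slack".
    let tokens: HashSet<String> = response
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_lowercase)
        .collect();
    queue
        .iter()
        .filter(|item| {
            let provider = item.provider.to_lowercase();
            PROVIDER_KEYWORDS.contains(&provider.as_str()) && tokens.contains(&provider)
        })
        .map(|item| HandoffEvent {
            provider: item.provider.clone(),
            item_id: item.id.clone(),
        })
        .collect()
}
```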

## Platform scope

- **macOS**: Full support — hotkey, screen capture, pointing, TTS, overlay
- **Windows/Linux**: Partial — hotkey works (rdev), screen context stubbed, no pointing

Platform-specific code is gated with `#[cfg(target_os = "macos")]`.
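
A minimal illustration of the gating pattern: two `#[cfg]`-selected definitions of the same constant, so callers compile unchanged on every platform. `PlatformCapabilities` and `CAPABILITIES` are hypothetical names that only encode the scope list above; `cfg!(target_os = "macos")` expresses the same condition as a plain `bool`.

```rust
/// Hypothetical capability summary mirroring the platform-scope list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PlatformCapabilities {
    pub hotkey: bool,
    pub screen_context: bool,
    pub pointing: bool,
}

/// macOS: full support.
#[cfg(target_os = "macos")]
pub const CAPABILITIES: PlatformCapabilities = PlatformCapabilities {
    hotkey: true,
    screen_context: true,
    pointing: true,
};

/// Every other target (the doc lists Windows/Linux): hotkey via rdev,
/// screen context stubbed, no pointing.
#[cfg(not(target_os = "macos"))]
pub const CAPABILITIES: PlatformCapabilities = PlatformCapabilities {
    hotkey: true,
    screen_context: false,
    pointing: false,
};
```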

## Testing

| File | Coverage |
|------|----------|
| `session_tests.rs` | Session CRUD, state machine transitions, TTL, consent, conversation history |
| `pipeline_tests.rs` | Turn orchestration, cancellation, input validation, system prompt |
| `pointing_tests.rs` | Tag parsing, coordinate mapping, multi-monitor, edge cases |
| `handoff.rs` (inline) | Keyword matching, empty queue, provider coverage |
| `schemas.rs` (inline) | Controller count, schema field validation |
| `tests/json_rpc_e2e.rs` | Full RPC round-trip: start -> status -> config -> stop |
@@ -0,0 +1,81 @@ (new file)

```rust
//! Broadcast bus for companion state change events.
//!
//! Follows the same pattern as `overlay::bus`: a process-global
//! `tokio::sync::broadcast` channel so any module can subscribe.
//! The Socket.IO bridge (PR 2) will forward these to the overlay
//! as `companion:state_changed` events.

use once_cell::sync::Lazy;
use tokio::sync::broadcast;

use super::types::CompanionStateChangedEvent;

const LOG_PREFIX: &str = "[desktop_companion]";

// Capacity 64: a receiver that falls more than 64 events behind gets
// `RecvError::Lagged` on its next `recv()`.
static STATE_BUS: Lazy<broadcast::Sender<CompanionStateChangedEvent>> = Lazy::new(|| {
    let (tx, _rx) = broadcast::channel(64);
    tx
});

/// Subscribe to companion state change events.
pub fn subscribe_state_changed() -> broadcast::Receiver<CompanionStateChangedEvent> {
    STATE_BUS.subscribe()
}

/// Publish a state change event.
///
/// Fire-and-forget: if nobody is subscribed the event is dropped.
/// Returns the number of subscribers the event was delivered to (0 if none).
pub fn publish_state_changed(event: CompanionStateChangedEvent) -> usize {
    log::debug!(
        "{LOG_PREFIX} state_changed session={} {} -> {}",
        event.session_id,
        event.previous_state,
        event.state,
    );
    match STATE_BUS.send(event) {
        Ok(n) => n,
        Err(_) => {
            log::debug!("{LOG_PREFIX} no subscribers — state change dropped");
            0
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::openhuman::desktop_companion::types::CompanionState;

    #[tokio::test]
    async fn publish_is_received_by_subscriber() {
        // STATE_BUS is process-global — other tests may publish events.
        // We filter by session_id to avoid flakiness.
        let mut rx = subscribe_state_changed();
        let delivered = publish_state_changed(CompanionStateChangedEvent {
            session_id: "bus-test-unique".into(),
            state: CompanionState::Listening,
            previous_state: CompanionState::Idle,
            message: None,
        });
        assert!(delivered >= 1);
        // Drain until we find our specific event (others may have been published concurrently).
        loop {
            let event = rx.recv().await.expect("event delivered");
            if event.session_id == "bus-test-unique" {
                assert_eq!(event.state, CompanionState::Listening);
                assert_eq!(event.previous_state, CompanionState::Idle);
                break;
            }
        }
    }

    #[test]
    fn publish_with_no_subscribers_is_safe() {
        let _ = publish_state_changed(CompanionStateChangedEvent {
            session_id: "test".into(),
            state: CompanionState::Idle,
            previous_state: CompanionState::Error,
            message: Some("recovered".into()),
        });
    }
}
```