oxcl/cocoparrot

CocoParrot

A context-aware voice-to-text Chrome extension with LLM-powered rewriting profiles. Speak into any text input on the web, have your words transcribed, rewritten, and injected — all through your own OpenRouter API key.

Chrome Extension · TypeScript · Preact · WXT

What It Does

CocoParrot lets you hold a key, speak, and have your voice transcribed using Whisper via OpenRouter. Before injecting the text into the focused input field, it can optionally pass your words through an LLM — not to answer questions or act as an agent, but to rewrite: fix grammar, adjust tone, translate, match a professional register, or clean up casual speech into something polished. You review the result in a small corner popup before anything gets written.

Architecture

Core Pipeline

┌─────────────┐    ┌──────────────────┐    ┌─────────────────┐    ┌──────────────────┐
│   Audio     │    │   Transcription  │    │   LLM Rewrite   │    │   Text Injection │
│   Capture   │───▶│   (Whisper)      │───▶│   (Optional)    │───▶│   (Strategies)   │
└─────────────┘    └──────────────────┘    └─────────────────┘    └──────────────────┘
       │                    │                      │                      │
       │                    │                      │                      │
   Microphone          OpenRouter API         Profile-aware          Context-aware
   MediaRecorder       Streaming SSE          System prompts         Element targeting
   16kHz audio         Chunked upload         Page context           Multiple strategies

Event-Driven Architecture

The extension uses a custom event bus (BusService) for state management and cross-component communication:

// State phases
type Phase = 
  | 'idle'                    // No activity
  | 'pending-recording'       // Debounce period before recording starts
  | 'recording'               // Audio capture active
  | 'transcribing'            // Sending audio to Whisper API
  | 'processing'              // LLM rewriting in progress
  | 'awaiting-approval'       // User review before injection
  | 'editing-transcription'   // Editing raw transcription
  | 'editing-llm';            // Editing LLM output
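
The phases above read like a small state machine. A minimal sketch of how transitions between them could be guarded follows; the transition table, `canTransition`, and `transition` are assumptions made for illustration and are not taken from BusService.

```typescript
type Phase =
  | 'idle'
  | 'pending-recording'
  | 'recording'
  | 'transcribing'
  | 'processing'
  | 'awaiting-approval'
  | 'editing-transcription'
  | 'editing-llm';

// Hypothetical transition table: which phases may follow each phase.
// Every phase can fall back to 'idle' (cancel or error), except 'idle' itself.
const ALLOWED: Record<Phase, readonly Phase[]> = {
  'idle': ['pending-recording'],
  'pending-recording': ['recording', 'idle'],
  'recording': ['transcribing', 'idle'],
  'transcribing': ['processing', 'awaiting-approval', 'idle'],
  'processing': ['awaiting-approval', 'idle'],
  'awaiting-approval': ['editing-transcription', 'editing-llm', 'idle'],
  'editing-transcription': ['processing', 'awaiting-approval', 'idle'],
  'editing-llm': ['awaiting-approval', 'idle'],
};

// True when moving from `from` to `to` is listed in the table.
function canTransition(from: Phase, to: Phase): boolean {
  return ALLOWED[from].includes(to);
}

// Returns the next phase, or throws if the move is not allowed.
function transition(from: Phase, to: Phase): Phase {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal phase transition: ${from} -> ${to}`);
  }
  return to;
}
```

Keeping the table as data makes it easy to reject out-of-order events, for example an approval arriving while recording is still active.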

Service Layer

Each service is a singleton class, accessed through a static instance getter:

  • RecorderService — Microphone capture with MediaRecorder API, handles permissions, MIME type detection, and base64 encoding
  • TranscriptionService — Chunked streaming upload to OpenRouter Whisper API with retry logic and abort controllers
  • LlmService — Streaming SSE parsing for LLM responses with provider selection and reasoning mode support
  • ProfileService — Profile matching via rule-based system (hostname, URL patterns, DOM selectors, page title)
  • InserterService — Strategy pattern for text injection with 4 configurable strategies
  • BusService — Event-driven state management with watchers, notifications, and phase transitions
  • StorageService — Chrome extension storage abstraction with @wxt-dev/storage
  • KeyboardService — Hotkey handling with configurable keys and hold detection

Profile System

Profiles are the core abstraction. Each profile contains:

interface Profile {
  id: string;
  name: string;
  description?: string;
  rules: ProfileRule[];           // When to activate
  contextQueries: ProfileContextQuery[];  // What page data to extract
  systemPrompt: string;           // LLM instructions
  inputElementCssSelector?: string; // Target input element
  overridePreferences?: DeepPartial<SettingsPreferences>; // Per-profile overrides
}

Rule Types:

  • hostname — Exact domain match
  • url-contains — Substring match
  • url-prefix — URL prefix match
  • url-regex — Regular expression match
  • contains-element — CSS selector presence
  • page-title-contains — Page title match
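
A minimal sketch of how the rule types above could be evaluated against a page. The `{ type, value }` shape of `ProfileRule`, the `PageInfo` interface, and `matchesRule` are assumptions for illustration; the real ProfileService may be structured differently. The page is passed in as plain data so the function can be exercised outside a browser.

```typescript
type RuleType =
  | 'hostname'
  | 'url-contains'
  | 'url-prefix'
  | 'url-regex'
  | 'contains-element'
  | 'page-title-contains';

// Assumed rule shape: a type tag plus a single string value.
interface ProfileRule {
  type: RuleType;
  value: string;
}

// Snapshot of the page. In a content script these would come from
// location.href, document.title and document.querySelector.
interface PageInfo {
  url: string;
  title: string;
  hasElement: (cssSelector: string) => boolean;
}

function matchesRule(rule: ProfileRule, page: PageInfo): boolean {
  switch (rule.type) {
    case 'hostname':
      return new URL(page.url).hostname === rule.value; // exact match only
    case 'url-contains':
      return page.url.includes(rule.value);
    case 'url-prefix':
      return page.url.startsWith(rule.value);
    case 'url-regex':
      try {
        return new RegExp(rule.value).test(page.url);
      } catch {
        return false; // an invalid pattern never matches
      }
    case 'contains-element':
      return page.hasElement(rule.value);
    case 'page-title-contains':
      return page.title.includes(rule.value);
  }
}
```

In a content script, `hasElement` could be `(selector) => document.querySelector(selector) !== null`.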

Context Queries:

  • css-selector — Extract text from CSS selector
  • css-selector-all — Extract from multiple elements
  • xpath — XPath expression
  • meta-tag — Meta tag content
  • current-url — Current page URL
  • page-title — Page title
  • whole-page-content — Full page content via Mozilla Readability + Turndown (HTML→Markdown)

Inserter Strategies

The extension supports 4 text injection strategies:

  1. insert-text — Direct text insertion via document.execCommand
  2. human-simulation — Character-by-character typing with configurable:
    • Base delay and jitter
    • Typo generation (QWERTY neighbor-based)
    • Backspace correction
    • Newline handling (line-break, enter, shift-enter, ignore, space)
    • React compatibility mode
  3. clipboard-paste — Clipboard API-based paste
  4. direct-value — Direct value setting for controlled inputs
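
To illustrate the human-simulation strategy, here is a rough sketch that plans keystrokes with jittered delays, QWERTY-neighbor typos, and backspace corrections. The names (`planKeystrokes`, `TypingOptions`, `replay`), the partial neighbor map, and the option fields are hypothetical; the extension's actual implementation is not shown in this README.

```typescript
// Partial QWERTY adjacency map (assumption: only a few keys, for illustration).
const QWERTY_NEIGHBORS: Record<string, string> = {
  a: 'qwsz', s: 'awedxz', d: 'serfcx', e: 'wsdr',
  o: 'iklp', t: 'rfgy', n: 'bhjm',
};

type Keystroke =
  | { kind: 'type'; char: string; delayMs: number }
  | { kind: 'backspace'; delayMs: number };

interface TypingOptions {
  baseDelayMs: number;
  jitterMs: number;
  typoRate: number;      // probability in [0, 1]
  random: () => number;  // returns a value in [0, 1); injectable for tests
}

function planKeystrokes(text: string, opts: TypingOptions): Keystroke[] {
  // Base delay plus or minus up to jitterMs, never negative.
  const delay = (): number =>
    Math.max(0, opts.baseDelayMs + Math.round((opts.random() * 2 - 1) * opts.jitterMs));

  const plan: Keystroke[] = [];
  for (const char of text) {
    const neighbors = QWERTY_NEIGHBORS[char.toLowerCase()];
    if (neighbors && opts.random() < opts.typoRate) {
      // Type an adjacent key by mistake, then correct it with a backspace.
      const wrong = neighbors.charAt(Math.floor(opts.random() * neighbors.length));
      plan.push({ kind: 'type', char: wrong, delayMs: delay() });
      plan.push({ kind: 'backspace', delayMs: delay() });
    }
    plan.push({ kind: 'type', char, delayMs: delay() });
  }
  return plan;
}

// Applies a plan to an empty buffer; the result should equal the input text.
function replay(plan: Keystroke[]): string {
  let out = '';
  for (const step of plan) {
    out = step.kind === 'type' ? out + step.char : out.slice(0, -1);
  }
  return out;
}
```

Separating planning from dispatch keeps the randomness testable: a real strategy would walk the plan and fire the corresponding input events with the given delays, passing `Math.random` as `random`.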

Context Extraction

The profile service can extract context from web pages:

// Example: Extract job posting details
const contextQueries = [
  { type: 'css-selector', value: '.job-title', name: 'Job Title' },
  { type: 'css-selector', value: '.company-name', name: 'Company' },
  { type: 'whole-page-content', value: '', name: 'Full Posting' },
];

Uses @mozilla/readability for content extraction and turndown for HTML→Markdown conversion.
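
How the extracted context reaches the LLM is not spelled out here. One plausible arrangement, sketched under that assumption, appends the named context values to the profile's system prompt and sends the transcript as the user message. `ExtractedContext`, `ChatMessage`, and `buildMessages` are hypothetical names.

```typescript
// Assumed result of running one context query against the page.
interface ExtractedContext {
  name: string;
  value: string;
}

interface ChatMessage {
  role: 'system' | 'user';
  content: string;
}

function buildMessages(
  systemPrompt: string,
  contexts: ExtractedContext[],
  transcript: string,
): ChatMessage[] {
  // Skip queries that matched nothing, then label each value by its name.
  const contextBlock = contexts
    .filter((c) => c.value.trim() !== '')
    .map((c) => `## ${c.name}\n${c.value.trim()}`)
    .join('\n\n');

  const system = contextBlock
    ? `${systemPrompt}\n\n# Page context\n${contextBlock}`
    : systemPrompt;

  return [
    { role: 'system', content: system },
    { role: 'user', content: transcript },
  ];
}
```

Dropping empty values keeps the prompt short when a selector does not match on the current page.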

Tech Stack

Category        | Technology             | Purpose
Framework       | WXT                    | Chrome Extension (Manifest V3)
UI              | Preact + TypeScript    | Lightweight React alternative
Validation      | Valibot                | Schema validation (not Zod)
Storage         | @wxt-dev/storage       | Chrome extension storage
Messaging       | @webext-core/messaging | Cross-context communication
Content         | Turndown + Readability | HTML→Markdown conversion
Icons           | lucide-preact          | Icon library
Package Manager | Bun                    | Fast JavaScript runtime
Build           | Vite + vite-imagetools | Build tooling

Project Structure

src/
├── entrypoints/
│   ├── background.ts        # Service worker (badge, messaging relay)
│   ├── content.ts           # Content script (orchestrates all services)
│   ├── overlay.css          # Display overlay styles
│   ├── globals.css          # Global styles (CSS variables, dark mode)
│   ├── popup/               # Extension popup UI
│   │   ├── popup.tsx
│   │   ├── popup.css
│   │   └── pages/           # Main, Profiles, Settings, Onboarding
│   └── dashboard/           # Full-page management dashboard
│       ├── dashboard.tsx
│       ├── dashboard.css
│       └── pages/           # Profiles, Rules, History, Statistics, Settings
├── services/                # Core business logic
│   ├── recorder-service.ts  # Microphone recording
│   ├── transcription-service.ts  # OpenRouter Whisper calls
│   ├── llm-service.ts       # OpenRouter LLM calls
│   ├── inserter-service.ts  # Text injection orchestrator
│   ├── profile-service.ts   # Profile CRUD and matching
│   ├── keyboard-service.ts  # Hotkey handling
│   ├── bus-service.ts       # Event bus + state management
│   ├── ui-service.ts        # UI state management
│   ├── storage-service.ts   # Storage abstraction
│   ├── auth-service.ts      # Authentication flow
│   └── inserter/            # Inserter strategies
│       ├── insert-text.ts
│       ├── human-simulation.ts
│       ├── clipboard-paste.ts
│       └── direct-value.ts
├── components/              # Feature components
│   ├── Display.tsx          # Transcription/LLM output overlay
│   ├── Overlay.tsx          # Recording visualization
│   ├── Sidebar.tsx          # Dashboard navigation
│   ├── RuleList.tsx         # Profile rule editor
│   ├── ContextQueryList.tsx # Context query editor
│   └── ...                  # 20+ components
├── ui/                      # Reusable UI primitives
│   ├── Button.tsx
│   ├── Input.tsx
│   ├── Select.tsx
│   ├── TextArea.tsx
│   └── ...                  # 11 UI components
├── schemas/                 # Valibot schemas
│   ├── auth.ts
│   ├── error.ts
│   ├── llm.ts
│   └── transcription.ts
├── hooks/                   # Custom Preact hooks
│   ├── useBus.ts            # Event bus integration
│   ├── useCurrentProfile.ts # Profile matching
│   ├── useStorage.ts        # Storage access
│   └── useExitAnimation.ts  # Page transitions
├── contexts/                # Preact contexts
│   └── router.ts            # Client-side routing
├── shared/                  # Shared types
│   └── messages.ts          # Cross-context messaging protocol
├── utils/                   # Utility functions
│   ├── base64-encoder.ts    # Audio encoding
│   ├── logger.ts            # Dev-only logging
│   ├── profile-utils.ts     # Profile helpers
│   └── shallow-equal.ts     # State comparison
├── storage.ts               # Storage types and defaults
├── constants.ts             # App constants, icon maps
├── errors.ts                # Custom error classes
└── type.ts                  # Shared utility types

Key Features

Voice Input

  • Hold-to-record with configurable hotkey (default: backtick)
  • Real-time recording visualization
  • Automatic MIME type detection (OGG, WebM, MP4)
  • Chunked audio streaming for low latency
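
MIME type detection can be sketched as picking the first supported entry from a preference list. The candidate list and its ordering are assumptions, and `pickMimeType` is a hypothetical helper; in the browser the predicate would be `MediaRecorder.isTypeSupported`.

```typescript
// Candidate container/codec pairs in order of preference (ordering is an assumption).
const CANDIDATE_MIME_TYPES = [
  'audio/ogg;codecs=opus',
  'audio/webm;codecs=opus',
  'audio/webm',
  'audio/mp4',
] as const;

// Returns the first candidate the environment supports, or undefined if none is.
// The support check is injected so the function also runs outside a browser.
function pickMimeType(isSupported: (mime: string) => boolean): string | undefined {
  return CANDIDATE_MIME_TYPES.find((mime) => isSupported(mime));
}

// Browser usage:
//   const mimeType = pickMimeType((m) => MediaRecorder.isTypeSupported(m));
//   const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
```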

Transcription

  • Whisper integration via OpenRouter
  • Streaming upload with progress feedback
  • Automatic retry on failure
  • Language selection per profile
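
Automatic retry can be combined with cancellation along these lines. This is a generic sketch with exponential backoff, not the TranscriptionService code; `withRetry` and `RetryOptions` are hypothetical names.

```typescript
interface RetryOptions {
  attempts: number;      // total tries, expected to be at least 1
  baseDelayMs: number;   // wait before the second try; doubles after each failure
  signal?: AbortSignal;  // lets the caller cancel between tries
}

async function withRetry<T>(
  task: (signal?: AbortSignal) => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  let lastError: unknown = new Error('withRetry: attempts must be at least 1');
  for (let attempt = 0; attempt < opts.attempts; attempt++) {
    // Stop immediately if the caller cancelled (for example, the key was released).
    if (opts.signal?.aborted) throw new Error('Aborted');
    try {
      return await task(opts.signal);
    } catch (error) {
      lastError = error;
      const isLastAttempt = attempt === opts.attempts - 1;
      if (!isLastAttempt) {
        const waitMs = opts.baseDelayMs * 2 ** attempt;
        await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
      }
    }
  }
  throw lastError;
}
```

The task receives the same signal so it can forward it to `fetch`, which cancels an in-flight upload as well as any pending retries.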

LLM Rewriting

  • Profile-aware system prompts
  • Page context injection (CSS selectors, XPath, whole page)
  • Streaming SSE responses
  • Reasoning mode support
  • Provider selection
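
Streaming SSE responses arrive as text chunks that can split a line in half, so a parser has to buffer partial lines. The sketch below assumes the OpenAI-style chunk format (`data: {json}` lines carrying `choices[0].delta.content`, terminated by `data: [DONE]`); `createSseParser` is a hypothetical helper, not LlmService itself.

```typescript
interface SseParser {
  // Feed one decoded network chunk; returns the text deltas completed by it.
  push(chunk: string): string[];
}

function createSseParser(): SseParser {
  let buffer = '';
  return {
    push(chunk) {
      buffer += chunk;
      const lines = buffer.split('\n');
      buffer = lines.pop() ?? ''; // keep the trailing partial line for the next chunk

      const deltas: string[] = [];
      for (const raw of lines) {
        const line = raw.trim();
        if (!line.startsWith('data:')) continue; // skip blank lines and ": comment" keep-alives
        const payload = line.slice('data:'.length).trim();
        if (payload === '[DONE]') continue;       // end-of-stream marker
        try {
          const json = JSON.parse(payload) as {
            choices?: Array<{ delta?: { content?: string } }>;
          };
          const text = json.choices?.[0]?.delta?.content;
          if (typeof text === 'string' && text !== '') deltas.push(text);
        } catch {
          // Ignore malformed payloads rather than failing the whole stream.
        }
      }
      return deltas;
    },
  };
}
```

In the browser, chunks would come from `response.body` read through a `TextDecoder` in streaming mode, with each returned delta appended to the preview popup.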

Text Injection

  • 4 injection strategies
  • Smart element targeting (profile selector → active element → document.activeElement → body search)
  • React compatibility mode
  • Human-like typing simulation with configurable typos

Profile Management

  • Rule-based activation (hostname, URL patterns, DOM selectors)
  • Context extraction from page elements
  • Per-profile model preferences
  • Import/export profiles as JSON
  • Onboarding flow with API key setup

Dashboard

  • Full-page management interface
  • Profile editor with visual rule builder
  • History log (planned)
  • Statistics tracking (planned)
  • Provider management

Development

Prerequisites

  • Bun package manager
  • Chrome browser
  • OpenRouter API key

Setup

# Install dependencies
bun install

# Start development server
bun dev

Testing

  1. Load the extension from .output/chrome-mv3-dev
  2. Navigate to http://localhost:8000/?title=<feature-name>
  3. Use browser DevTools to inspect:
    • Console logs (filtered by source)
    • Storage state
    • Network requests

Code Conventions

  • TypeScript strict mode throughout
  • Preact (not React) — imports from preact and preact/hooks
  • Valibot for schema validation (not Zod)
  • CSS files per component (not CSS-in-JS)
  • Singleton services with static instance getter
  • Event-driven architecture via BusService
  • Path alias @/ maps to src/
  • Custom error classes in src/errors.ts

Key Patterns

Event Bus:

// Emit events
await bus.emit('transcription:done', { text });

// Listen to events
bus.on('transcription:done', async ({ text }) => {
  // Handle event
});

// Watch state changes
bus.watch((state) => {
  console.log('State:', state.phase);
});
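
For readers unfamiliar with the pattern, a typed bus with the same three operations can be sketched in a few dozen lines. This `EventBus` class is an illustrative assumption, not the BusService source; it awaits handlers sequentially and notifies watchers on every state patch.

```typescript
type Handler<T> = (payload: T) => void | Promise<void>;

class EventBus<Events extends Record<string, unknown>, State extends object> {
  private handlers = new Map<keyof Events, Array<Handler<any>>>();
  private watchers: Array<(state: State) => void> = [];
  private state: State;

  constructor(initialState: State) {
    this.state = initialState;
  }

  // Registers a handler; returns a function that removes it again.
  on<K extends keyof Events>(event: K, handler: Handler<Events[K]>): () => void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
    return () => {
      const index = list.indexOf(handler);
      if (index !== -1) list.splice(index, 1);
    };
  }

  // Runs handlers one after another, awaiting async ones.
  async emit<K extends keyof Events>(event: K, payload: Events[K]): Promise<void> {
    for (const handler of [...(this.handlers.get(event) ?? [])]) {
      await handler(payload);
    }
  }

  // Subscribes to state changes; returns an unsubscribe function.
  watch(watcher: (state: State) => void): () => void {
    this.watchers.push(watcher);
    return () => {
      this.watchers = this.watchers.filter((w) => w !== watcher);
    };
  }

  // Merges a partial update into the state and notifies all watchers.
  setState(patch: Partial<State>): void {
    this.state = { ...this.state, ...patch };
    for (const watcher of this.watchers) watcher(this.state);
  }
}
```

Typing the event map once means `bus.emit('transcription:done', { text })` is checked against the payload type at compile time.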

Service Singleton:

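class MyService {
  // Created lazily on first access; undefined until then
  private static _instance?: MyService;

  // Private constructor prevents `new MyService()` outside the class
  private constructor() {}

  static get instance(): MyService {
    return (MyService._instance ??= new MyService());
  }
}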

Profile Matching:

// Find matching profile for current page
const { profile, rule } = ProfileService.instance.findMatchingProfile();

// Extract context from page
const contexts = await ProfileService.instance.grabContextFromWebpage();

Design Decisions

Why Preact over React?

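  • Roughly 3 KB gzipped, versus about 40 KB for React plus ReactDOM
  • A good fit for Chrome extensions, where bundle size matters
  • Largely the same API as React, so migration is easy (preact/compat covers React-targeted libraries)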

Why Valibot over Zod?

  • Tree-shakable (only includes what you use)
  • Better TypeScript inference
  • Smaller bundle size

Why Event Bus over Redux/Zustand?

  • No external dependencies
  • Built-in async support
  • Natural fit for Chrome extension's message passing
  • Phase-based state machine pattern

Why Singleton Services?

  • Content scripts are injected once per page and live alongside the page's DOM (in an isolated JavaScript world)
  • Services need to maintain state across events
  • Simple dependency management
  • Easy to test and mock

Browser Support

  • Chrome 88+ (Manifest V3)
  • Firefox (experimental, via wxt -b firefox)

Privacy

  • No data stored server-side — all processing happens via your OpenRouter API key
  • Microphone permission — only used during recording, released immediately after
  • Local storage only — profiles, settings, and history stored in Chrome extension storage
  • No analytics — no tracking, no telemetry

License

Proprietary — see LICENSE

Author

oxcl (GitHub)

About

AI Dictation Browser Extension designed for Busy People
