Luna — always-on AI voice participant

## Goal

Luna joins the room as a real audio participant — appears in the participant grid, listens to the conversation, and speaks to everyone simultaneously. No `@luna` mention needed.

This is the only voice feature worth building. Partial approaches (client-side TTS, KV audio relay) are not worth implementing — they degrade UX compared to plain text replies.

---

## What's blocking this

### No server-side RTK participant API

RTK's Add Participant API (`POST .../meetings/{id}/participants`) only mints auth tokens for browser SDK clients. There is no server-side media injection path:
- No `participant_type: "server"` or bot flag
- No audio inject endpoint
- No server SDK (`@cloudflare/realtimekit-node` does not exist)

### `@cloudflare/voice` SFU cannot bridge into RTK rooms

`@cloudflare/voice` uses raw Cloudflare Calls SFU (`rtc.live.cloudflare.com/v1`). RTK is built on top of this same SFU infrastructure, but RTK never exposes the underlying SFU App ID — there is no way to inject a foreign SFU session into an RTK meeting room.

### CF private pipeline not public

PR [cloudflare/agents#785](https://github.com/cloudflare/agents/pull/785) implemented a `RealtimeKitTransport` that joins an RTK room server-side using a private `agents.realtime.cloudflare.com` pipeline. It works, but it was closed in May 2026 without merging (missing tests, hardcoded internals, no client example). The pipeline service is gated behind a private beta header and is not publicly documented.

---

## Known workaround: Headless Chrome bot

This is the Recall.ai / Daily Bots pattern: launch a headless Chrome process that joins the RTK room using the standard Web SDK, intercept room audio via the Web Audio API, exchange PCM with the BotSession DO over a WebSocket, and play TTS output back through an AudioContext.
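The PCM exchange in this pattern needs a conversion between the Float32 samples that the Web Audio API produces and a compact wire format. The following is a minimal sketch, assuming 16-bit little-endian PCM on the WebSocket; the issue does not specify the wire format, and the helper names `floatTo16BitPCM` and `pcm16ToFloat` are hypothetical.

```typescript
// Hypothetical helpers: 16-bit little-endian PCM on the wire is an assumption,
// not something this issue specifies.

/** Convert Web Audio Float32 samples in [-1, 1] to 16-bit little-endian PCM. */
function floatTo16BitPCM(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  samples.forEach((sample, i) => {
    // Clamp first so out-of-range samples cannot wrap around.
    const s = Math.max(-1, Math.min(1, sample));
    // The negative range has 32768 steps, the positive range 32767.
    const scaled = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
    view.setInt16(i * 2, scaled, true);
  });
  return buffer;
}

/** Convert 16-bit little-endian PCM (for example TTS output) back to Float32. */
function pcm16ToFloat(buffer: ArrayBuffer): Float32Array {
  const view = new DataView(buffer);
  const out = new Float32Array(buffer.byteLength >> 1);
  for (let i = 0; i < out.length; i++) {
    const v = view.getInt16(i * 2, true);
    out[i] = v < 0 ? v / 0x8000 : v / 0x7fff;
  }
  return out;
}
```

In the bot page, the first helper would run on captured room audio before `send()`, and the second on TTS frames received from the DO before they are queued for playback.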

**Why we're not doing this now:**

WebRTC requires UDP for the media plane (ICE/SRTP). Cloudflare Containers do not support inbound UDP — only HTTP/WebSocket. So the headless Chrome process must run on an external server (Fly.io, Cloud Run, etc.), communicating with the DO over the network. This adds:
- An external service to maintain
- Cross-network latency between the external bot host and the DO
- Operational complexity that isn't justified while a cleaner CF-native path is on the roadmap

---

## What would unblock a clean implementation

Any one of:

- **CF opens `agents.realtime.cloudflare.com`**: the private pipeline from PR #785 becomes public. The existence of that PR suggests this is the direction CF is heading.
- **RTK exposes underlying SFU App ID** — lets a DO use the SFU WebSocket adapter (`/adapters/websocket/new`) to push audio directly into a meeting's SFU session.
- **CF Containers get UDP support** — headless Chrome bot becomes fully CF-native, no external server needed.
- **RTK adds WHIP ingest** — any server with a real network stack can push audio without implementing RTK's proprietary signaling.
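To illustrate why the WHIP option would be light to integrate, the publish request is a single HTTP POST. This is only a sketch under stated assumptions: RTK has no WHIP endpoint today, so the function name `buildWhipOffer` and the bearer-token auth are hypothetical. Only the general shape (an SDP offer posted with `Content-Type: application/sdp`) comes from the WHIP specification.

```typescript
// Hypothetical: RTK does not offer WHIP ingest today. This only shows the
// generic WHIP request shape (HTTP POST with an SDP offer as the body).
interface WhipRequestInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

/** Build the fetch() init for a WHIP publish request. */
function buildWhipOffer(sdpOffer: string, bearerToken?: string): WhipRequestInit {
  const headers: Record<string, string> = { "Content-Type": "application/sdp" };
  if (bearerToken) headers["Authorization"] = `Bearer ${bearerToken}`;
  return { method: "POST", headers, body: sdpOffer };
}
```

A WHIP server answers with `201 Created`, the SDP answer in the body, and a `Location` header identifying the session resource, so no proprietary signaling is involved.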

---

## Architecture (when unblocked)

```
All participants' audio
  → STT (OpenAI gpt-4o-transcribe, multilingual zh+en)
  → LLM (glm-4.7-flash, reuse BotSession history + rate limiting)
  → TTS (Azure zh-CN-XiaoxiaoNeural, ~$0.0012/turn)
  → Luna's audio track broadcast to all participants
```

Luna uses the existing `BotSession` Durable Object for conversation history, rate limiting, and LLM calls — no new AI infrastructure needed.
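The flow above can be sketched as a single turn handler. This is a minimal sketch, not the actual `BotSession` API, which this issue does not show: the `Stages` interface and `handleTurn` are hypothetical names, and each stage stands in for the provider named in the diagram.

```typescript
// Hypothetical stage interfaces; the real BotSession API is not shown in this issue.
interface Stages {
  transcribe(audio: ArrayBuffer): Promise<string>; // STT
  reply(transcript: string): Promise<string>; // LLM via BotSession (history, rate limiting)
  synthesize(text: string): Promise<ArrayBuffer>; // TTS
  broadcast(audio: ArrayBuffer): Promise<void>; // publish on Luna's audio track
}

/** Run one conversational turn; resolves to the reply text, or null if nothing was said. */
async function handleTurn(stages: Stages, roomAudio: ArrayBuffer): Promise<string | null> {
  const transcript = (await stages.transcribe(roomAudio)).trim();
  if (transcript === "") return null; // silence or noise: skip the LLM and TTS calls
  const text = await stages.reply(transcript);
  const speech = await stages.synthesize(text);
  await stages.broadcast(speech);
  return text;
}
```

Keeping the stages behind an interface means the transport (headless Chrome, a public pipeline, or WHIP) can change without touching the turn logic.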

## Cost estimate (when built)

| Component | Cost |
|---|---|
| STT (streaming, text-triggered only) | $0 — no always-on mic monitoring |
| LLM glm-4.7-flash | ~$0.00002/turn |
| TTS Azure XiaoxiaoNeural | ~$0.0012/turn |
| **Total** | **~$0.0012/turn** |

Platform-funded at current scale. The $0 STT row assumes replies are triggered by text only; if always-on listening is added later, STT would be BYOK (bring your own key).
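The per-turn figure is just the sum of the component rows, which makes it easy to project spend. A small sketch using the prices from the table; the names `COST_PER_TURN_USD`, `costPerTurn`, and `projectedCost` are illustrative, and the 10,000-turn volume is an arbitrary example, not a usage figure from this issue.

```typescript
// Per-turn prices copied from the table (USD); STT is $0 in the text-triggered setup.
const COST_PER_TURN_USD = { stt: 0, llm: 0.00002, tts: 0.0012 };

/** Sum the per-turn component costs. */
function costPerTurn(c: { stt: number; llm: number; tts: number }): number {
  return c.stt + c.llm + c.tts;
}

/** Project total spend for a given number of turns. */
function projectedCost(turns: number): number {
  return costPerTurn(COST_PER_TURN_USD) * turns;
}

console.log(costPerTurn(COST_PER_TURN_USD).toFixed(5)); // "0.00122"
console.log(projectedCost(10_000).toFixed(2)); // "12.20"
```

TTS dominates the total, so any always-on STT cost would be the main variable to watch.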

---

## Status

Parking until CF ships a clean server-side path. Revisit when any of the unblocking conditions above are met.
