PRD: AI Radio — Host-driven Channel experience with chat timeline #39

@zeevenn

Problem Statement

Users who want to listen to music on their phone face a constant management burden: building playlists, searching for tracks, deciding what comes next. The experience is active and effortful when it should be passive and immersive. There is no product that combines a local music library with the feel of a live FM radio station — a DJ-hosted experience that adapts to your mood on request.

Solution

Flow becomes an AI radio app. The user selects a Channel (e.g. 深夜电台 "Late-Night Radio", 晨间咖啡 "Morning Coffee", 专注工作 "Focused Work"). The AI Host takes over: it queries the local Library for a CandidateSet matching the Channel Style, sequences Tracks, writes and synthesises spoken Interludes via TTS, and assembles a Programme. The user only ever sees Now Playing. When the user wants to intervene (request a song, shift the mood, or give a free instruction), they type or speak into the chat input at the bottom of the screen. The Host acknowledges the request, generates a new Programme, and playback continues.

The main screen is a unified chat timeline that merges playback history (Interlude scripts, track-played cards) with Intervention dialogue — the same scroll, the same visual language.

Architecture Decision: Dual Playback Modes

The Player supports two playback modes that coexist:

  1. Radio Mode (Programme-driven) — the new AI radio experience. Player consumes a Programme: Segment[] fed by the Programme Manager. No user queue manipulation. Used when a Channel is active.
  2. Classic Mode (Queue-driven) — the traditional playback experience. Player consumes a queue: Track[] with user-facing controls (add to queue, insert next, shuffle, etc.). Used when the user manually selects a track from the Library.

Why both: Offline scenarios and user agency. When offline, the AI Host is unavailable — users need to be able to manually browse and play specific tracks. Even online, some users will want to just play a specific song without entering radio mode.

Player layer: The Player Controller operates on a unified internal concept — an ordered list of audio items. The distinction between Programme and Queue is handled at the store level above the controller. The existing playerStore queue logic is preserved as-is; Radio mode adds a parallel radioStore (or extends playerStore with a mode discriminator).

Mode switching:

  • Entering Radio mode: loadChannel() → switches to Programme-driven playback
  • Exiting Radio mode: user manually plays a track from Library → switches to Queue-driven playback
  • Only one mode is active at a time
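
The mode-switching rules above can be sketched as transitions over a discriminated union. This is a minimal illustration, not the existing store code: `PlaybackState`, `enterRadio`, and `playFromLibrary` are hypothetical names, and starting a fresh one-track queue on exit from Radio mode is an assumption.

```typescript
// Hypothetical sketch: exactly one mode is active, enforced by the type.
type Track = { id: string; title: string };
type Segment =
  | { kind: 'track'; track: Track }
  | { kind: 'interlude'; script: string; audioUrl: string };

type PlaybackState =
  | { mode: 'radio'; channelId: string; programme: Segment[] }
  | { mode: 'classic'; queue: Track[] };

// Entering Radio mode replaces whatever was playing with a Programme.
function enterRadio(channelId: string, programme: Segment[]): PlaybackState {
  return { mode: 'radio', channelId, programme };
}

// Manually playing a Library track always lands in Classic mode.
// Assumption: the tapped track starts a fresh queue.
function playFromLibrary(_state: PlaybackState, track: Track): PlaybackState {
  return { mode: 'classic', queue: [track] };
}
```

Because the state is a union, a radio state cannot carry a queue and a classic state cannot carry a Programme, which mirrors the "only one mode is active" rule.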

User Stories

Onboarding & Library

  1. As a first-time user, I want the app to scan my local music files automatically on first launch, so that I don't have to configure anything before listening.
  2. As a first-time user, I want to see a clear empty-state prompt when no local tracks are found, so that I know how to add music before using the radio.
  3. As a user, I want subsequent scans to be triggerable from Settings, so that newly added files are picked up without reinstalling the app.

Channel Selection

  1. As a user, I want to enter the radio experience immediately on app launch with a default Channel, so that the path from opening the app to listening is zero-tap.
  2. As a user, I want each Channel to display a name and a short descriptor of its style, so that I can make an informed choice when switching.
  3. As a user, I want an optional Channel switcher accessible from the Radio Screen (e.g. a button → BottomSheet), so that I can change mood without leaving the main experience.

Programme & Playback

  1. As a user, I want music to start playing within a few seconds of entering the radio, so that the experience feels instant.
  2. As a user, I want the Host to automatically queue the next Programme before the current one ends, so that playback never stops between Programmes.
  3. As a user, I want playback to continue seamlessly between Tracks and Interludes without gaps, so that the experience feels like a real radio station.
  4. As a user, I want the Programme to be opaque — I do not see a tracklist of what is coming — so that the radio feels live and unpredictable.

Now Playing

  1. As a user, I want to see the current Track's title and artist clearly at the top of the screen while it is playing, so that I always know what I am listening to.
  2. As a user, I want an audio waveform or visualisation in the Now Playing area, so that the playback state is immediately visible at a glance.
  3. As a user, I want standard system controls (lock screen, notification bar, headphone buttons) to control playback, so that I can pause and skip without unlocking my phone.
  4. As a user, I want audio to duck automatically when another app requests focus (navigation, phone call), so that I don't have to manually manage volume.

Chat Timeline

  1. As a user, I want to see a scrollable timeline of what has played and what was said, so that I can recall the Host's words and my interventions.
  2. As a user, I want Interlude text to appear in the chat at the moment the Interlude audio begins, so that I can read along with the Host.
  3. As a user, I want a track-played record to appear in the timeline when each Track begins, so that the timeline is a complete record of the session.
  4. As a user, I want my Intervention messages and the Host's responses to appear in the same timeline, so that the full context of a listening session is in one place.

Interventions

  1. As a user, I want to type a free instruction into a text input at the bottom of the screen, so that I can tell the Host what I want without voice.
  2. As a user, I want to tap a microphone button and speak a free instruction, so that I can interact with the Host hands-free.
  3. As a user, I want to name a specific track and have the Host include it in the next Programme at a natural position, so that I can request songs without breaking the flow.
  4. As a user, I want to express a mood shift in natural language (e.g. "更轻松一点", "a bit more relaxed") and have the Host adjust the Channel Style for the new Programme, so that the radio adapts to how I feel.
  5. As a user, I want the Host to acknowledge my Intervention in the chat before generating the new Programme, so that I know my request was understood.
  6. As a user, I want the current Track to finish playing before the new Programme starts (unless I request otherwise), so that transitions feel natural.

Classic Playback (preserved)

  1. As a user, I want to browse my Library and tap a track to play it immediately in classic mode, so that I retain full manual control when I want it.
  2. As a user, I want queue management (add to queue, insert next, shuffle, reorder) to work as before in classic mode.
  3. As a user, I want playing a track from the Library to exit radio mode and enter classic mode, so that mode switching is implicit and natural.

Offline Fallback

  1. As a user with no network connection, I want the radio to fall back to shuffling local tracks that match the Channel's style rules, without an AI Host, so that I can still listen passively offline.
  2. As a user going offline, I want to hear a pre-generated TTS announcement telling me the Host is unavailable, so that I understand why the experience has changed.
  3. As a user offline, I want the option to switch to classic mode and manually pick tracks, so that I'm not stuck with random shuffle.

Implementation Decisions

Modules to build or modify

1. Channel Registry (new — Radio context)
A static, app-bundled list of Channels, each with a name, descriptor, and Channel Style (genre hints, energy level, tempo range, mood tags). No persistence required for v1. Interface: getChannels() → Channel[], getChannelById(id) → Channel.
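
A minimal sketch of such a registry, assuming illustrative field names (`genreHints`, `energy`, `tempoRange`, `moodTags`), ids, and style values that are not specified in this PRD. It also assumes `getChannelById` returns `undefined` for an unknown id rather than throwing.

```typescript
// Hypothetical shapes; field names and values are illustrative only.
interface ChannelStyle {
  genreHints: string[];
  energy: 'low' | 'medium' | 'high';
  tempoRange: [number, number]; // BPM, inclusive
  moodTags: string[];
}

interface Channel {
  id: string;
  name: string;
  descriptor: string;
  style: ChannelStyle;
}

// Static, app-bundled list; nothing is persisted.
const CHANNELS: readonly Channel[] = [
  { id: 'late-night', name: '深夜电台', descriptor: 'Slow, quiet tracks for late hours',
    style: { genreHints: ['ambient', 'jazz'], energy: 'low', tempoRange: [50, 90], moodTags: ['calm'] } },
  { id: 'morning-coffee', name: '晨间咖啡', descriptor: 'Light, warm tracks to start the day',
    style: { genreHints: ['acoustic', 'pop'], energy: 'medium', tempoRange: [80, 120], moodTags: ['bright'] } },
  { id: 'focus', name: '专注工作', descriptor: 'Steady tracks with few distractions',
    style: { genreHints: ['instrumental'], energy: 'medium', tempoRange: [70, 110], moodTags: ['focused'] } },
];

function getChannels(): Channel[] {
  return [...CHANNELS]; // copy so callers cannot mutate the registry
}

function getChannelById(id: string): Channel | undefined {
  return CHANNELS.find(channel => channel.id === id);
}
```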

2. Host Service (new — Radio context)
The core AI orchestrator. Given a Channel Style, it:

  • Issues a CandidateSet Query to the Library
  • Calls the Claude API to select and sequence Tracks and write Interlude Scripts
  • Calls a TTS service to synthesise each Interlude Script into an audio URL
  • Returns a completed Programme (ordered array of Segments)

Runs at Programme-generation time, not during playback. Interface: generateProgramme(channelStyle, intervention?) → Promise<Programme>.
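
The orchestration steps above might look like the sketch below. The LLM call and the TTS call are injected as plain functions (`plan` and `synthesise`, both hypothetical) so that nothing here claims to reproduce a real SDK signature. The extra `deps` parameter and the `PlanItem` shape are also assumptions, as is dropping any track id the planner returns that is not in the CandidateSet.

```typescript
type Track = { id: string; title: string; artist: string };
type Segment =
  | { kind: 'track'; track: Track }
  | { kind: 'interlude'; script: string; audioUrl: string };
type Programme = Segment[];
type ChannelStyle = { moodTags: string[] }; // simplified for the sketch

// Hypothetical planner output: track ids plus Interlude Scripts, in play order.
type PlanItem =
  | { kind: 'track'; trackId: string }
  | { kind: 'interlude'; script: string };

interface HostDeps {
  queryCandidateSet(style: ChannelStyle): Track[];
  plan(style: ChannelStyle, candidates: Track[], intervention?: string): Promise<PlanItem[]>;
  synthesise(script: string): Promise<string>; // resolves to an audio URL
}

async function generateProgramme(
  deps: HostDeps,
  channelStyle: ChannelStyle,
  intervention?: string,
): Promise<Programme> {
  const candidates = deps.queryCandidateSet(channelStyle);
  const byId = new Map(candidates.map(t => [t.id, t] as const));
  const plan = await deps.plan(channelStyle, candidates, intervention);

  const programme: Programme = [];
  for (const item of plan) {
    if (item.kind === 'track') {
      const track = byId.get(item.trackId);
      if (track) programme.push({ kind: 'track', track }); // skip unknown ids
    } else {
      const audioUrl = await deps.synthesise(item.script);
      programme.push({ kind: 'interlude', script: item.script, audioUrl });
    }
  }
  return programme;
}
```

Synthesising Interludes sequentially keeps the sketch simple; running the TTS calls concurrently would likely shorten generation time.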

3. Programme Manager (new — Radio context)
Manages the lifecycle of Programmes: holds the active Programme, monitors Segment-end events from the Player, triggers the Host to pre-generate the next Programme when the active one is nearly exhausted, and swaps in a new Programme on Intervention. Interface: loadChannel(channel), handleIntervention(intervention), onSegmentEnd(segmentId).
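
One way to read "nearly exhausted" is a remaining-Segments threshold. The sketch below assumes a hypothetical `prefetchAt` parameter (default 2), gives each Segment an `id` for illustration, and leaves out Intervention handling. `Host` and `Player` are reduced to the single method each that this sketch needs.

```typescript
type Segment = { id: string }; // simplified; real Segments carry a Track or Interlude
type Programme = Segment[];

interface Host {
  generateProgramme(style: string, intervention?: string): Promise<Programme>;
}
interface Player {
  play(programme: Programme): void;
}

class ProgrammeManager {
  private active: Programme = [];
  private pendingNext: Promise<Programme> | null = null;
  private style = '';
  private readonly host: Host;
  private readonly player: Player;
  private readonly prefetchAt: number;

  constructor(host: Host, player: Player, prefetchAt = 2) {
    this.host = host;
    this.player = player;
    this.prefetchAt = prefetchAt;
  }

  async loadChannel(style: string): Promise<void> {
    this.style = style;
    this.pendingNext = null;
    this.active = await this.host.generateProgramme(style);
    this.player.play(this.active);
  }

  async onSegmentEnd(segmentId: string): Promise<void> {
    const index = this.active.findIndex(s => s.id === segmentId);
    const remaining = this.active.length - (index + 1);
    // Start generating the next Programme once, when few Segments remain.
    if (remaining <= this.prefetchAt && this.pendingNext === null) {
      this.pendingNext = this.host.generateProgramme(this.style);
    }
    // When the active Programme is finished, swap in the prefetched one.
    if (remaining === 0 && this.pendingNext !== null) {
      const next = this.pendingNext;
      this.pendingNext = null;
      this.active = await next;
      this.player.play(this.active);
    }
  }
}
```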

4. Player (modify — Player context)
The Player supports two modes:

  • Radio mode: operates on a Programme: Segment[] where a Segment is either a Track or an Interlude. The Player does not expose queue-manipulation APIs in this mode. Lookahead handles both file URIs (TrackSegment) and remote TTS URLs (InterlSegment).
  • Classic mode: operates on a queue: Track[] with full user control (add, insert next, remove, shuffle, play mode). This is the existing behaviour, preserved as-is.

A playbackMode: 'radio' | 'classic' discriminator determines which mode is active. Only one mode plays at a time. Switching modes stops the current playback and starts the new mode's content.
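
In Radio mode, Lookahead needs one playable URI per Segment regardless of its kind. A minimal sketch follows, assuming the `Track` type exposes a local file URI in a field called `uri` (a hypothetical name) and using hypothetical helpers `segmentUri` and `lookaheadUri`.

```typescript
type Track = { id: string; uri: string }; // uri: assumed local file URI field
type Segment =
  | { kind: 'track'; track: Track }
  | { kind: 'interlude'; script: string; audioUrl: string };

// Resolve the playable source for either Segment kind.
function segmentUri(segment: Segment): string {
  switch (segment.kind) {
    case 'track':
      return segment.track.uri; // local file
    case 'interlude':
      return segment.audioUrl; // remote TTS audio
  }
}

// URI to pre-load while the Segment at currentIndex is playing, if any.
function lookaheadUri(programme: Segment[], currentIndex: number): string | undefined {
  const next = programme[currentIndex + 1];
  return next ? segmentUri(next) : undefined;
}
```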

5. CandidateSet Query (new — Library context)
An API on the Library that accepts Channel Style parameters and returns a filtered array of Tracks. The Library does not rank or sequence. Interface: queryCandidateSet(style: ChannelStyle) → Track[].
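
As an illustration of "filter, do not rank", the sketch below takes the track list as an explicit argument so it stays a pure function. The `genre` and `bpm` fields on `Track` and the `genreHints` and `tempoRange` fields on `ChannelStyle` are assumptions; this PRD does not say which existing columns carry that information. Letting tracks with an unknown tempo through the tempo filter is also an assumption.

```typescript
interface Track {
  id: string;
  title: string;
  genre?: string; // assumed metadata field
  bpm?: number;   // assumed metadata field
}

interface ChannelStyle {
  genreHints: string[];
  tempoRange?: [number, number]; // inclusive BPM bounds
}

// Returns matching Tracks in their original order; no ranking or sequencing.
function queryCandidateSet(tracks: readonly Track[], style: ChannelStyle): Track[] {
  const genres = new Set(style.genreHints.map(g => g.toLowerCase()));
  const range: [number, number] = style.tempoRange ?? [-Infinity, Infinity];
  return tracks.filter(track => {
    const genreOk =
      genres.size === 0 ||
      (track.genre !== undefined && genres.has(track.genre.toLowerCase()));
    // Assumption: a Track without tempo metadata is not excluded by tempo.
    const tempoOk = track.bpm === undefined || (track.bpm >= range[0] && track.bpm <= range[1]);
    return genreOk && tempoOk;
  });
}
```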

6. Chat Timeline Store (new — Radio context)
An in-memory (non-persisted) ordered list of chat messages for the current session. Message roles: user and host. Host messages carry a kind field (acknowledgement, interlude, announcement). The timeline also records track-played events for session history. Interface: appendMessage(message), messages: Message[].
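
A framework-free sketch of the store contract, written without Zustand so the example stays self-contained. `createChatTimelineStore` and `subscribe` are hypothetical names, and track-played records are left out because their shape is not defined in this PRD.

```typescript
interface UserMessage { role: 'user'; text: string; timestamp: number }
interface HostMessage {
  role: 'host';
  text: string;
  kind: 'acknowledgement' | 'interlude' | 'announcement';
  timestamp: number;
}
type ChatMessage = UserMessage | HostMessage;
type Listener = (messages: readonly ChatMessage[]) => void;

function createChatTimelineStore() {
  let messages: ChatMessage[] = [];
  const listeners = new Set<Listener>();
  return {
    // Session-only: the list lives in memory and starts empty.
    get messages(): readonly ChatMessage[] {
      return messages;
    },
    appendMessage(message: ChatMessage): void {
      messages = [...messages, message]; // new array so subscribers see a change
      listeners.forEach(listener => listener(messages));
    },
    subscribe(listener: Listener): () => void {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
  };
}
```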

7. Offline Host (new — Radio context)
A fallback implementation of the Host Service interface. On activation, it plays a pre-bundled TTS audio file ("当前无网络连接,为你随机播放本地音乐。", i.e. "No network connection right now; shuffling your local music for you."). Subsequent Programmes are assembled locally: a random shuffle of the CandidateSet, with no Interludes. It is activated automatically when the network is unavailable. Users can also exit to classic mode for manual control.
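
The local assembly step could be as small as a shuffle followed by a map. The sketch below uses a Fisher-Yates shuffle with an injectable random source so tests can be deterministic; `shuffle` and `generateOfflineProgramme` are hypothetical names, and the bundled announcement is not shown.

```typescript
type Track = { id: string; title: string };
type Segment =
  | { kind: 'track'; track: Track }
  | { kind: 'interlude'; script: string; audioUrl: string };
type Programme = Segment[];

// Fisher-Yates shuffle on a copy; the input array is left untouched.
function shuffle<T>(items: readonly T[], rng: () => number = Math.random): T[] {
  const out = [...items];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Offline Programmes contain Track Segments only, never Interludes.
function generateOfflineProgramme(
  candidates: readonly Track[],
  rng: () => number = Math.random,
): Programme {
  return shuffle(candidates, rng).map((track): Segment => ({ kind: 'track', track }));
}
```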

8. Radio Screen (new — mobile app)
The default home screen. The user enters an immersive radio experience immediately on app launch (default Channel auto-selected). Three vertical zones:

  • Top: Now Playing bar with audio waveform visualisation, Track title, artist
  • Middle: Scrollable chat timeline (Chat Timeline Store rendered as a message list)
  • Bottom: Input bar with text field and microphone button for Interventions

A Channel switcher button is available on-screen (opens a BottomSheet or navigates to a selection view) — this is optional and low priority for v1.

Key data shapes (from conversation decisions)

type SegmentKind = 'track' | 'interlude'

interface TrackSegment {
  kind: 'track'
  track: Track
}

interface InterlSegment {
  kind: 'interlude'
  script: string   // text pushed to chat at playback start
  audioUrl: string // pre-synthesised TTS URL
}

type Segment = TrackSegment | InterlSegment

type Programme = Segment[]

// Chat messages use role-based model
interface UserMessage {
  role: 'user'
  text: string
  timestamp: number
}

interface HostMessage {
  role: 'host'
  text: string
  kind: 'acknowledgement' | 'interlude' | 'announcement'
  timestamp: number
}

type ChatMessage = UserMessage | HostMessage
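
To illustrate how an Interlude's `script` reaches the chat at playback start, here is a small sketch of a mapping helper. `interludeToMessage` is a hypothetical name, and the two interfaces are repeated only to keep the example self-contained.

```typescript
interface InterlSegment {
  kind: 'interlude';
  script: string;
  audioUrl: string;
}

interface HostMessage {
  role: 'host';
  text: string;
  kind: 'acknowledgement' | 'interlude' | 'announcement';
  timestamp: number;
}

// Hypothetical helper, called when an Interlude Segment starts playing.
function interludeToMessage(segment: InterlSegment, now: number = Date.now()): HostMessage {
  return { role: 'host', text: segment.script, kind: 'interlude', timestamp: now };
}
```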

Schema changes

The existing playlists and playlist_tracks tables are not removed — they continue to serve classic mode. The tracks table is used as-is; no new columns required for v1.

AI / TTS

  • Host calls Claude API (claude-sonnet-4-6) at Programme-generation time
  • TTS provider TBD; must return an audio URL that the player can stream
  • Offline fallback uses a pre-bundled audio file, not live TTS

Intervention flow

  1. User submits text or voice input → parsed as one of: Track Request, Mood Change, Free Instruction
  2. user-turn message appended to Chat Timeline immediately
  3. Programme Manager calls Host.generateProgramme(channelStyle, intervention)
  4. When Host responds, host-turn message appended; new Programme delivered to Player
  5. Player finishes current Track, then switches to new Programme
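
The ordering of steps 2 to 5 can be sketched as a single async function. How input is classified in step 1 is not specified here, so the sketch starts from an already parsed `Intervention`. All names in `InterventionDeps` are hypothetical, and returning an `acknowledgement` text alongside the Programme is an assumption, since the `generateProgramme` interface defined earlier returns only a Programme.

```typescript
type InterventionKind = 'track-request' | 'mood-change' | 'free-instruction';
interface Intervention { kind: InterventionKind; text: string }
type Programme = string[]; // simplified to Segment ids for the sketch

interface HostReply { acknowledgement: string; programme: Programme }

interface InterventionDeps {
  appendMessage(message: { role: 'user' | 'host'; text: string }): void;
  generateProgramme(style: string, intervention: Intervention): Promise<HostReply>;
  queueAfterCurrentTrack(programme: Programme): void;
}

async function handleIntervention(
  deps: InterventionDeps,
  channelStyle: string,
  intervention: Intervention,
): Promise<void> {
  // Step 2: the user turn is shown immediately, before any network call.
  deps.appendMessage({ role: 'user', text: intervention.text });
  // Step 3: ask the Host for a new Programme.
  const reply = await deps.generateProgramme(channelStyle, intervention);
  // Step 4: show the Host turn and hand the Programme to the Player.
  deps.appendMessage({ role: 'host', text: reply.acknowledgement });
  // Step 5: the Player switches only after the current Track finishes.
  deps.queueAfterCurrentTrack(reply.programme);
}
```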

Testing Decisions

A good test exercises the module through its public interface only — not its internal implementation. Tests should be written against the contract (inputs → outputs / state transitions), not against how the contract is fulfilled.

Modules to test:

  • CandidateSet Query: pure function — given a set of tracks and a Channel Style, returns the correct filtered subset. No mocks needed; seed data inline.
  • Chat Timeline Store: state machine — given a sequence of append calls, the messages array contains the right items in the right order. Pure Zustand store, no RN dependencies.
  • Programme Manager: state transitions — given Segment-end events and an Intervention, verifies that the Host is called at the right time and that the Player receives the right Programme. Host and Player are injected interfaces, easily stubbed.
  • Player (Segment handling): given a Programme with mixed Track and Interlude Segments, verifies that Lookahead pre-loads the next Segment and that Segment-end is reported correctly. Use the existing playerController test pattern (if any) as prior art.

The Host Service itself (Claude API + TTS) is not unit-tested — its behaviour is non-deterministic. Integration-level smoke tests only.
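
For the modules that depend on injected interfaces, hand-written stubs are usually enough. The sketch below shows one possible pair: `createStubHost` and `createRecordingPlayer` are hypothetical helpers, and the `Host` and `Player` interfaces are reduced to the one method each that a Programme Manager test would need.

```typescript
type Programme = { id: string }[];

interface Host {
  generateProgramme(style: string): Promise<Programme>;
}
interface Player {
  play(programme: Programme): void;
}

// Returns scripted Programmes in order and records every style it was asked for.
function createStubHost(programmes: Programme[]) {
  if (programmes.length === 0) throw new Error('stub needs at least one Programme');
  const calls: string[] = [];
  let next = 0;
  const host: Host = {
    async generateProgramme(style: string): Promise<Programme> {
      calls.push(style);
      const programme = programmes[Math.min(next, programmes.length - 1)];
      next += 1;
      return programme; // the last Programme repeats once the script runs out
    },
  };
  return { host, calls };
}

// Records every Programme handed to the Player, in order.
function createRecordingPlayer() {
  const played: Programme[] = [];
  const player: Player = {
    play(programme: Programme): void {
      played.push(programme);
    },
  };
  return { player, played };
}
```

A test then asserts only on `calls` and `played`, which keeps it tied to the contract rather than to the internals of the module under test.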

Out of Scope

  • User-created Channels
  • Per-Channel custom TTS voice
  • User-visible tracklist / upcoming Programme display
  • Ability to disable Interludes
  • Remote Track streaming (source: 'remote')
  • Lyrics display on the Radio Screen (existing LyricsView is for the old player screen)
  • History persistence across app restarts (future — will be addressed when schema is designed)
  • Skip / previous controls on the Radio Screen (first version: Host-driven, user does not skip)

Further Notes

  • Dual mode: "Radio" and "Classic" coexist. The existing queue/playlist logic is preserved. Radio mode is additive, not a replacement.
  • "Queue" remains the term for classic mode. "Programme" is the term for radio mode. New radio code uses Programme; existing queue code is untouched.
  • "Radio" is used colloquially; the domain term for what the user tunes into is Channel.
  • The chat timeline is session-only (in-memory) for v1. Persisting history across sessions is a future concern pending schema design.
  • The waveform visualisation library is TBD; candidates are expo-av or a dedicated RN waveform package. This is a UI-only decision and does not affect the domain model.
  • The app launches directly into Radio Screen with a default Channel — no Channel Selection Screen as a prerequisite. Channel switching is an optional action available from within the Radio Screen.

Metadata

Labels: ready-for-agent (Ready for an AI agent to implement)