chore: tts integration #73

Merged
KartikLabhshetwar merged 17 commits into main from feat/annotation
Feb 26, 2026

Conversation

@KartikLabhshetwar (Collaborator) commented Feb 21, 2026

Summary by CodeRabbit

  • New Features

    • Full in-app text-to-speech: a Listen button on desktop and mobile with loading states, real-time word highlighting, playback controls (play/pause, speed, skip), and an L keyboard shortcut.
    • Voice selection and new audio UI (wave animation, scrub bar, transcript viewer with click-to-seek).
    • Free tier: 3 TTS uses/day; premium unlimited.
  • Documentation

    • Added TTS usage and architecture guide.


vercel bot commented Feb 21, 2026

The latest updates on your projects:

  • smry: Ready (Preview, Comment), updated Feb 26, 2026 4:46pm (UTC)



coderabbitai bot commented Feb 21, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Adds a full TTS feature: client hook and UI, Next.js proxy routes, backend ElevenLabs integration with per-chunk caching and concurrency limits, transcript viewer and scrubber UI, concurrency limiter, text/chunk utilities, env/config updates, docs, and icon/UX changes.

Changes

  • Next.js proxy routes (app/api/tts/route.ts, app/api/tts/voices/route.ts): new POST and GET proxy endpoints that forward requests/headers to the backend, apply timeouts, return JSON, and cache voices for 1 hour.
  • Server TTS API & wiring (server/routes/tts.ts, server/index.ts, server/env.ts): new backend TTS routes (synthesis, voices) with chunking, per-chunk caching, usage enforcement, and concurrency integration; limiter configured on startup; /health augmented; new optional env vars.
  • ElevenLabs TTS & chunking libs (lib/elevenlabs-tts.ts, lib/tts-chunk.ts, lib/tts-text.ts): ElevenLabs client plus chunk synthesis, alignment→word boundaries, text cleaning, deterministic chunk hashing, and DOM-based text extraction/word-position mapping.
  • Concurrency limiter (lib/tts-concurrency.ts): global and per-user TTS slot limiter with FIFO queue, timeouts, abort handling, metrics, and a configuration API.
  • Client TTS hook & highlight (lib/hooks/use-tts.ts, components/hooks/use-tts-highlight.ts): useTTS hook with IndexedDB cache, daily usage tracking, fetch flow, abort/cleanup, and a word-level DOM highlighting hook with MutationObserver and click-to-seek.
  • Transcript tooling & UI (components/hooks/use-transcript-viewer.ts, components/ui/transcript-viewer.tsx, components/ui/scrub-bar.tsx): new transcript composition hook and a context-driven transcript viewer with audio, play/pause, per-word rendering, and a pointer-based scrub bar.
  • Proxy-content & toolbar/mobile wiring (components/features/proxy-content.tsx, components/features/floating-toolbar.tsx, components/features/mobile-bottom-bar.tsx): integrated TTS into the main article renderer: tts state, L-key toggle, transcript container, new props (onTTSToggle, isTTSActive, isTTSLoading), and corresponding UI buttons.
  • Icons & UI primitives (components/ui/icons.tsx, components/ui/progress.tsx): added audio/TTS icons (Play, Pause, VolumeHigh, etc.); simplified Progress to a single forwardRef Radix-based component.
  • Article & styling (components/article/content.tsx, app/globals.css): added a stable data-article-content marker and a TTS wave animation CSS class.
  • Docs & deps (docs/TTS.md, package.json): comprehensive TTS design doc added; dependencies for Radix progress and the ElevenLabs client added.
  • Telemetry helper (server/routes/tts.ts, lib/tts-concurrency.ts): exported cache and concurrency stats (getTTSCacheStats, getTTSSlotStats) for telemetry.
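The global/per-user slot limiter summarized above can be sketched roughly as follows. This is a minimal illustration of the FIFO-queue pattern only; the class name, defaults, and method signatures are assumptions, not the actual lib/tts-concurrency.ts API (which also handles timeouts, aborts, and metrics).

```typescript
// Minimal sketch: a slot limiter with a global cap, a per-user cap, and a
// FIFO queue for requests that cannot run immediately.
type Waiter = { resolve: () => void; userId: string };

class SlotLimiter {
  private active = 0;
  private perUser = new Map<string, number>();
  private queue: Waiter[] = [];

  constructor(
    private maxGlobal = 15,
    private maxPerUser = 2,
  ) {}

  private canRun(userId: string): boolean {
    return (
      this.active < this.maxGlobal &&
      (this.perUser.get(userId) ?? 0) < this.maxPerUser
    );
  }

  acquire(userId: string): Promise<void> {
    if (this.canRun(userId)) {
      this.active++;
      this.perUser.set(userId, (this.perUser.get(userId) ?? 0) + 1);
      return Promise.resolve();
    }
    // No slot available: park the caller in FIFO order.
    return new Promise((resolve) => this.queue.push({ resolve, userId }));
  }

  release(userId: string): void {
    this.active--;
    const n = (this.perUser.get(userId) ?? 0) - 1;
    if (n <= 0) this.perUser.delete(userId);
    else this.perUser.set(userId, n);

    // Hand the freed slot to the earliest queued waiter that fits both caps.
    const idx = this.queue.findIndex((w) => this.canRun(w.userId));
    if (idx !== -1) {
      const [next] = this.queue.splice(idx, 1);
      this.active++;
      this.perUser.set(next.userId, (this.perUser.get(next.userId) ?? 0) + 1);
      next.resolve();
    }
  }
}
```

The real implementation additionally rejects queued waiters on timeout or abort; this sketch only shows the happy-path slot accounting.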

Sequence Diagrams

sequenceDiagram
    participant Client as Browser Client
    participant NextJS as Next.js Proxy
    participant Backend as Backend API
    participant ElevenLabs as ElevenLabs API
    participant Cache as Server Chunk Cache

    Client->>NextJS: POST /api/tts {text, voice, url}
    NextJS->>Backend: Forward request + headers
    Backend->>Backend: Split text into chunks
    loop per chunk
        Backend->>Cache: Check per-chunk cache
        alt cache hit
            Cache-->>Backend: cached audio + alignment
        else cache miss
            Backend->>Backend: acquireTTSSlot(user)
            Backend->>ElevenLabs: synthesize chunk
            ElevenLabs-->>Backend: audio + character timestamps
            Backend->>Cache: store chunk result
            Backend->>Backend: releaseTTSSlot(user)
        end
    end
    Backend->>Backend: merge alignments, concat audio
    Backend-->>NextJS: JSON {audioBase64, alignment, durationMs, usage headers}
    NextJS-->>Client: JSON response
sequenceDiagram
    participant User as User
    participant UI as Toolbar / Bottom Bar
    participant Hook as useTTS Hook
    participant IndexedDB as IndexedDB Cache
    participant Proxy as Next.js /api/tts
    participant Player as Audio Player
    participant Highlight as useTTSHighlight

    User->>UI: Click "Listen"
    UI->>Hook: load()
    Hook->>IndexedDB: lookup cache (7d TTL)
    alt cache hit
        IndexedDB-->>Hook: audio blob + alignment
    else cache miss
        Hook->>Hook: check local daily usage
        Hook->>Proxy: POST {text, voice}
        Proxy-->>Hook: audioBase64 + alignment
        Hook->>Hook: decode → blob, createObjectURL
        Hook->>IndexedDB: store cached entry (async)
    end
    Hook-->>UI: isReady, isTTSActive
    UI->>Player: play(blob URL)
    Player->>Hook: timeupdate
    Hook->>Highlight: update currentWordIndex
    Highlight->>UI: highlight current word in DOM
    Player->>UI: ended
    Hook-->>UI: isTTSActive=false
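The cache-lookup step in the diagram above mentions a 7-day TTL. A freshness check for that step could look like the sketch below; the entry shape (storedAt) is a hypothetical stand-in for the actual IndexedDB schema.

```typescript
// Sketch of a 7-day TTL check on a cached TTS entry.
const TTS_CACHE_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

interface CachedTTSEntry {
  storedAt: number; // epoch ms when audio + alignment were cached (illustrative field)
}

function isCacheEntryFresh(entry: CachedTTSEntry, now: number = Date.now()): boolean {
  return now - entry.storedAt < TTS_CACHE_TTL_MS;
}
```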

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • mrmps

Poem

🐰
I nibble words into neat little chunks,
Stitch sounds and pauses — no need for spunks.
I hop through prose, highlight every line,
Now articles sing — oh what a time! 🎧✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 52.13%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Title Check (❓ Inconclusive): the title 'chore: tts integration' is vague and generic; it indicates TTS is involved but not what the integration accomplishes or its primary impact. Resolution: use a more descriptive title such as 'feat: add TTS feature with voice controls, usage limits, and wave animation' (matching the commit message) to reflect the substantial feature addition.
✅ Passed checks (1 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


railway-app bot commented Feb 21, 2026

🚅 Deployed to the SMRY-pr-73 environment in smry

  • SMRY (Web): ✅ Success, updated Feb 26, 2026 at 4:46 pm (UTC)
  • smry-api: ✅ Success, updated Feb 26, 2026 at 4:46 pm (UTC)

railway-app bot temporarily deployed to smry / SMRY-pr-73 on February 21, 2026 at 11:01 (since destroyed)
@greptile-apps (Contributor) bot commented Feb 23, 2026

Greptile Summary

This PR adds comprehensive text-to-speech functionality using Inworld AI with multi-level caching, usage tracking, and concurrency management. The implementation includes client-side audio player with transcript viewer, voice selection, and real-time word highlighting.

Key Changes:

  • Inworld TTS integration with MP3 audio generation and character-level alignment
  • Multi-tier caching: client IndexedDB, server LRU (article + chunk level), Redis persistence
  • Usage limits: 3 TTS plays/day for free users, unlimited for premium
  • Concurrency control: max 15 global slots, max 2 per user, with queuing
  • Client UI: audio player with scrub bar, playback controls, transcript viewer, voice picker

Critical Issues Found:

  1. Usage tracking vulnerability — articleUrl parameter is user-controlled and used for deduplication, allowing users to bypass daily limits by reusing the same URL with different article text
  2. Redis race conditions — Multiple sadd/incr + expire operations are not atomic, can leave keys without TTL
  3. Deduplication cascade failures — When one client aborts a pending generation, all other clients waiting on the same generation fail
  4. Missing deduplication in memory fallback — When Redis is down, every request increments the counter without checking for duplicates
  5. Optimistic usage headers — Server sends X-TTS-Usage-Count: count+1 before the increment completes (fire-and-forget), creating potential client/server mismatch

Confidence Score: 2/5

  • This PR has critical usage tracking vulnerabilities and race conditions that could lead to quota bypass and data inconsistency
  • The TTS implementation is well-architected with comprehensive caching and concurrency control, but has several critical issues: (1) usage deduplication can be bypassed via user-controlled articleUrl, (2) Redis operations have race conditions that could leave keys without expiration, (3) cascading failures when requests are deduplicated, (4) memory fallback lacks deduplication logic, and (5) optimistic usage headers before actual increments complete. These issues affect quota enforcement and system reliability.
  • Pay close attention to server/routes/tts.ts for usage tracking and deduplication logic

Important Files Changed

  • server/routes/tts.ts: TTS route with usage tracking vulnerabilities, race conditions in Redis operations, and potential request deduplication issues
  • lib/tts-provider.ts: Inworld TTS provider with MP3 processing, alignment conversion, and Xing header generation
  • lib/hooks/use-tts.ts: client TTS hook with multi-level caching (memory, IndexedDB, server) and usage tracking
  • lib/tts-concurrency.ts: concurrency limiter with global and per-user slot management, queue with timeout, and abort signal support
  • lib/tts-chunk.ts: text cleaning and chunking utilities with SHA-256 cache key generation
  • lib/tts-redis-cache.ts: Redis cache for TTS chunks and articles with compression, batch operations, and silent error handling
Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Client TTS Request] --> B{L0: Memory Cache?}
    B -->|Hit| Z[Return Audio]
    B -->|Miss| C{L1: IndexedDB Cache?}
    C -->|Hit| Z
    C -->|Miss| D{Check Usage Limit}
    D -->|Exceeded| E[Return 429 Error]
    D -->|Allowed| F[POST /api/tts]
    F --> G{Server L2: Article Cache?}
    G -->|Hit| H[Increment Usage]
    G -->|Miss| I{L3: Redis Article?}
    I -->|Hit| H
    I -->|Miss| J[Split into Chunks]
    J --> K{L4: Chunk LRU Cache?}
    K -->|Partial Hit| L{L5: Redis Chunk?}
    K -->|Full Hit| M[Merge Chunks]
    L -->|Hits| M
    L -->|Misses| N{Acquire Slot}
    N -->|Timeout| O[Return 503]
    N -->|Success| P[Generate via Inworld API]
    P --> Q[Cache Chunks]
    Q --> M
    M --> R[Generate Xing Header]
    R --> S[Cache Article]
    S --> H
    H --> T{Dedup Check}
    T -->|Already Counted| Z
    T -->|New| U[Increment Counter]
    U --> Z

Last reviewed commit: cd0b9a9

(Earlier greptile-apps[bot] and devin-ai-integration[bot] comments were marked as resolved.)

coderabbitai bot left a comment


Actionable comments posted: 15

🧹 Nitpick comments (12)
server/env.ts (1)

54-56: TTS concurrency vars should use .default() instead of .optional() for consistency.

MAX_CONCURRENT_ARTICLE_FETCHES and ARTICLE_FETCH_SLOT_TIMEOUT_MS use .coerce.number().default(n), which guarantees a typed number. The new TTS equivalents use .optional(), yielding number | undefined — requiring all downstream consumers to handle undefined explicitly. The docs already document concrete defaults (20, 2, 15000).

♻️ Proposed fix
-    MAX_CONCURRENT_TTS: z.coerce.number().optional(),
-    MAX_TTS_PER_USER: z.coerce.number().optional(),
-    TTS_SLOT_TIMEOUT_MS: z.coerce.number().optional(),
+    MAX_CONCURRENT_TTS: z.coerce.number().default(20),
+    MAX_TTS_PER_USER: z.coerce.number().default(2),
+    TTS_SLOT_TIMEOUT_MS: z.coerce.number().default(15000),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/env.ts` around lines 54 - 56, The TTS env vars MAX_CONCURRENT_TTS,
MAX_TTS_PER_USER, and TTS_SLOT_TIMEOUT_MS are defined with
z.coerce.number().optional(), producing number|undefined; change each to
z.coerce.number().default(...) to match the other env vars and guarantee a typed
number: set MAX_CONCURRENT_TTS.default(20), MAX_TTS_PER_USER.default(2), and
TTS_SLOT_TIMEOUT_MS.default(15000) so downstream code no longer needs to handle
undefined.
docs/TTS.md (1)

238-241: Hardcoded trusted-client token is a fragile, ToS-violating dependency.

Using Microsoft Edge's internal trusted-client token is reverse-engineering a private API. At the documented scale (30K DAU, 100+ concurrent), this risks:

  • IP bans or rate limiting from Microsoft
  • Silent breakage if the token is rotated (no official notice)
  • ToS violation for commercial use

If the implementation actually uses Azure Speech Service (as indicated by AZURE_SPEECH_KEY/AZURE_SPEECH_REGION), this concern may already be resolved — but the docs need to be corrected to remove the misleading hardcoded-token description.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/TTS.md` around lines 238 - 241, Update the TTS docs to remove the claim
about a "hardcoded trusted-client token" and the WebSocket
`wss://speech.platform.bing.com/...`/Edge token workflow; instead document the
supported, supported-auth method(s) actually used by the codebase (e.g., Azure
Speech Service) and reference the environment variables AZURE_SPEECH_KEY and
AZURE_SPEECH_REGION (or other auth config symbols present) as the correct setup,
and add a short note warning not to rely on reverse-engineered tokens and that
those are not supported or recommended for production.
components/features/proxy-content.tsx (1)

160-168: [tts] as useCallback dependency likely invalidates every render.

If useTTS returns a new object on each render (standard for hooks returning { isPlaying, play, stop, ... }), handleTTSToggle will be recreated every render. This cascades to the keyboard effect (line 715) re-registering its listener on every render.

Consider destructuring the specific stable values you need, or extracting a derived isTTSActive flag:

Proposed approach
+ const isTTSActive = tts.isPlaying || tts.isPaused || tts.isLoading;
+
  const handleTTSToggle = React.useCallback(() => {
-   if (tts.isPlaying || tts.isPaused || tts.isLoading) {
+   if (isTTSActive) {
      tts.stop();
      setTTSOpen(false);
    } else {
      setTTSOpen(true);
      tts.play();
    }
- }, [tts]);
+ }, [isTTSActive, tts.stop, tts.play]);

This also lets you reuse isTTSActive at lines 904 and 1210 instead of repeating the expression.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/features/proxy-content.tsx` around lines 160 - 168,
handleTTSToggle is recreated every render because the entire tts object from
useTTS is in the dependency array; destructure the specific stable bits you need
(e.g., const { isPlaying, isPaused, isLoading, play, stop } = tts or directly
from useTTS) and replace [tts] with the concrete dependencies (play, stop,
isTTSActive), and introduce a derived const isTTSActive = isPlaying || isPaused
|| isLoading to use inside handleTTSToggle and elsewhere (lines referenced:
handleTTSToggle, keyboard effect, and usages around lines ~904 and ~1210) so the
callback and keyboard listener only re-register when actual TTS state/functions
change.
server/index.ts (1)

32-36: Clean integration of TTS routes and concurrency configuration.

Follows the established pattern from configureFetchLimiter. One minor inconsistency: TTS stats (line 93) are only included in the healthy /health response, not the unhealthy branch (lines 71–82). Consider adding tts: getTTSSlotStats() there too — TTS queue depth during a memory crisis would be useful for diagnostics.

Also applies to: 93-93, 105-105

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/index.ts` around lines 32 - 36, The unhealthy /health response omits
TTS stats; update the health-check handler that builds the unhealthy response to
include tts: getTTSSlotStats() just like the healthy branch does. Locate the
configureTTSLimiter setup and the getTTSSlotStats() accessor, then add
getTTSSlotStats() into the unhealthy response object so both success and failure
branches return TTS slot metrics for diagnostics.
app/api/tts/route.ts (1)

36-44: Client disconnect is not propagated to the upstream fetch.

The AbortSignal.timeout(120_000) handles staleness, but if the Next.js client disconnects early, the upstream Elysia request continues until the timeout expires. Consider combining signals:

Proposed enhancement
+  // Combine client abort + 120s timeout
+  const controller = new AbortController();
+  const timeoutId = setTimeout(() => controller.abort(new DOMException("TimeoutError", "TimeoutError")), 120_000);
+  req.signal.addEventListener("abort", () => controller.abort(), { once: true });
+
   let response: Response;
   try {
     response = await fetch(`${API_URL}/api/tts`, {
       method: "POST",
       headers,
       body,
-      signal: AbortSignal.timeout(120_000),
+      signal: controller.signal,
     });
+    clearTimeout(timeoutId);
   } catch (err) {

Alternatively, use AbortSignal.any([req.signal, AbortSignal.timeout(120_000)]) if your Node.js version supports it (18.17+).
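That alternative can be sketched in one small helper; the function name is illustrative, and it assumes a runtime with AbortSignal.any (Node.js 20.3+ or modern browsers).

```typescript
// Sketch: combine the client's abort signal with a hard timeout, so the
// upstream fetch is cancelled by whichever fires first.
function combinedSignal(clientSignal: AbortSignal, timeoutMs: number): AbortSignal {
  return AbortSignal.any([clientSignal, AbortSignal.timeout(timeoutMs)]);
}

// Usage (hypothetical): fetch(url, { signal: combinedSignal(req.signal, 120_000) })
```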

Based on learnings: "Use Next.js route handlers to proxy streaming requests to Elysia API to avoid SSE buffering" — this proxy correctly fulfills that pattern; connecting the abort signals would complete the lifecycle management.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/api/tts/route.ts` around lines 36 - 44, The fetch to `${API_URL}/api/tts`
uses only AbortSignal.timeout(120_000) so client disconnects aren't propagated;
update the fetch call in the route handler to combine the incoming request
signal (req.signal) with the timeout (e.g., use AbortSignal.any([req.signal,
AbortSignal.timeout(120_000)]) when available) or create an AbortController,
listen for req.signal.abort to call controller.abort(), and pass
controller.signal to fetch so upstream Elysia requests are cancelled immediately
on client disconnect.
server/routes/tts.ts (1)

155-173: Minor TOCTOU in Redis usage check — get then incr is not atomic.

Between redis.get and redis.incr, a concurrent request for the same user could also pass the check, allowing usage to exceed FREE_TTS_LIMIT by 1. With a limit of 3 this is negligible. If you want exactness, use a Lua script or redis.incr first, then check the returned value and decr if over limit.
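The incr-first variant suggested above might look like the sketch below. The Counter interface is a minimal stand-in for a real Redis client (INCR, DECR, and EXPIRE are standard Redis commands); the key layout and TTL handling are illustrative, not the PR's actual code.

```typescript
// Sketch: atomic increment-first usage check. INCR is atomic, so there is no
// get-then-incr window; an over-limit increment is rolled back with DECR.
interface Counter {
  incr(key: string): Promise<number>;
  decr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<void>;
}

const FREE_TTS_LIMIT = 3;

async function tryConsumeUsage(
  redis: Counter,
  key: string,
  ttlSeconds: number,
): Promise<boolean> {
  const count = await redis.incr(key);
  if (count === 1) {
    // First use this period: attach the TTL so the key cannot live forever.
    await redis.expire(key, ttlSeconds);
  }
  if (count > FREE_TTS_LIMIT) {
    await redis.decr(key); // roll back the over-limit increment
    return false;
  }
  return true;
}
```

A Lua script bundling INCR and EXPIRE would make the TTL attachment atomic as well; the DECR rollback is only approximate under pathological concurrency, which is acceptable at a limit of 3.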

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/routes/tts.ts` around lines 155 - 173, The Redis check in the tts
usage flow (monthKey/getMonthKey, redisKey, FREE_TTS_LIMIT) is vulnerable to a
TOCTOU race because it calls redis.get(...) then redis.incr(...); change the
logic to perform an atomic increment-first check: call redis.incr(redisKey) (or
use a small Lua script that INCR and set TTL atomically), inspect the returned
value and if it exceeds FREE_TTS_LIMIT immediately DECR the key (or return
disallowed) and only set the expire when the counter transitions from 1 to 2 (or
via the Lua script set TTL on first increment); this ensures correct limits
without a separate get/incr race.
components/features/tts-highlight.tsx (1)

94-144: currentWord is in the dependency array but unused in the effect body.

currentWord (line 144 dep array) is never read inside the effect — only currentWordIndex is used. If the word text changes but the index stays the same, this triggers a no-op effect run (skipped at line 103). This is harmless but misleading; consider removing it from the deps to better document intent.

Proposed change
-  }, [currentWord, currentWordIndex, isActive, ensureWordIndex]);
+  }, [currentWordIndex, isActive, ensureWordIndex]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/features/tts-highlight.tsx` around lines 94 - 144, The useEffect
currently lists currentWord in its dependency array but never reads it inside
the effect body; remove currentWord from the dependency array of the useEffect
that manages highlights so the effect only depends on currentWordIndex,
isActive, and ensureWordIndex (update the dependency array used by the useEffect
function that references lastHighlightIdx.current, ensureWordIndex(),
wordIndexRef.current, and CSS.highlights). If you intended the effect to re-run
when the word text changes, instead explicitly reference currentWord inside the
effect; otherwise drop it to reflect actual dependencies.
lib/azure-tts-ws.ts (1)

197-199: Skipping Sec-WebSocket-Accept verification weakens handshake integrity.

The comment explains the rationale, but skipping this check means the client cannot detect a non-WebSocket response body that happens to include a 101 status. For a server-to-Azure connection this is low risk but worth noting. If the connection ever routes through an untrusted proxy, this becomes exploitable.
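For reference, the verification the client would perform is the standard RFC 6455 accept-key computation: SHA-1 of the sent key concatenated with a fixed GUID, base64-encoded. Function names here are illustrative sketches, not the lib/azure-tts-ws.ts API.

```typescript
import { createHash } from "node:crypto";

// RFC 6455, section 4.2.2: fixed GUID appended to the client's key.
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

function expectedAccept(secWebSocketKey: string): string {
  return createHash("sha1").update(secWebSocketKey + WS_GUID).digest("base64");
}

function verifyHandshake(sentKey: string, acceptHeader: string | undefined): boolean {
  // Reject the upgrade if the server's Sec-WebSocket-Accept does not match.
  return acceptHeader === expectedAccept(sentKey);
}
```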

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/azure-tts-ws.ts` around lines 197 - 199, The WebSocket handshake code in
lib/azure-tts-ws.ts currently skips verifying Sec-WebSocket-Accept; re-enable
strict verification by reading the request's Sec-WebSocket-Key, computing
expectedAccept = base64(sha1(secWebSocketKey +
"258EAFA5-E914-47DA-95CA-C5AB0DC85B11")), and comparing it against the
Sec-WebSocket-Accept header returned — if they differ, treat the upgrade as
failed and close the connection; update the upgrade/handshake function (the code
that issues the HTTP upgrade and processes headers) to perform this check and
optionally add a clear opt-out flag (e.g., allowInsecureHandshake) that logs a
warning when set.
components/features/tts-player.tsx (2)

102-106: Hard-coded limit "3" will drift if FREE_TTS_LIMIT changes.

The denominator in {usageCount}/3 is a magic number that duplicates the constant in use-tts.ts. Consider accepting it as a prop (e.g. maxUsage) or importing the shared constant so the UI and logic stay in sync.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/features/tts-player.tsx` around lines 102 - 106, The UI uses a
hard-coded "3" in the TTS limit display which will fall out of sync with the
shared limit; update the TTSPlayer component
(components/features/tts-player.tsx) to use a single source of truth: either
accept a prop like maxUsage on the TTSPlayer component and pass the value from
where useTTS is used, or import the shared FREE_TTS_LIMIT constant from
use-tts.ts and replace the literal 3 with that constant (ensure typing/exports
allow import); update any callers to provide the prop if you choose the prop
approach so the denominator always matches the logic in useTTS/use-tts.ts.

163-196: Rate menu lacks keyboard accessibility.

The dropdown doesn't close on Escape and doesn't trap focus or support arrow-key navigation. This can frustrate keyboard-only users. At minimum, add an onKeyDown handler to close on Escape.

💡 Minimal Escape-to-close addition
-        <div className="relative" ref={rateMenuRef}>
+        <div className="relative" ref={rateMenuRef} onKeyDown={(e) => { if (e.key === "Escape") setShowRateMenu(false); }}>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/features/tts-player.tsx` around lines 163 - 196, The rate menu
lacks keyboard accessibility—add an onKeyDown handler on the menu container (the
element rendered when showRateMenu is true) that listens for Escape and calls
setShowRateMenu(false) to close the menu; additionally, when toggling the menu
via the rate button (the one using rateMenuRef and setShowRateMenu), move focus
into the menu (e.g., first RATE_OPTIONS button) on open and restore focus to the
trigger button on close, and ensure each RATE_OPTIONS button remains
keyboard-focusable and calls onRateChange(r) as now.
lib/hooks/use-tts.ts (2)

126-168: Consider throttling state updates in trackPlayback.

setProgress, setCurrentTime, and setDuration fire on every animation frame (~60 fps), each triggering a re-render of every consumer. Since humans can't perceive time updates faster than ~4–10 Hz in a progress bar, you could throttle these updates (e.g., every 250 ms) while keeping the word-boundary tracking at full frame rate via refs.
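One way to implement that throttle is a small wrapper around the UI state setters, leaving the per-frame word tracking untouched. Names and the injected clock are illustrative, not the actual hook internals.

```typescript
// Sketch: UI state updates fire at most once per interval, even if the
// wrapped function is invoked every animation frame (~60 fps).
const UI_UPDATE_INTERVAL_MS = 250;

function makeThrottledUpdater(
  setUiState: (timeSec: number) => void,
  intervalMs = UI_UPDATE_INTERVAL_MS,
  now: () => number = () => Date.now(), // injectable clock for testing
) {
  let lastUpdate = -Infinity;
  return (timeSec: number) => {
    const t = now();
    if (t - lastUpdate >= intervalMs) {
      lastUpdate = t;
      setUiState(timeSec); // would call setProgress/setCurrentTime/setDuration here
    }
  };
}
```

The word-boundary scan would keep running on every frame via refs; only the React state setters go through the throttle.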

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/hooks/use-tts.ts` around lines 126 - 168, trackPlayback currently calls
setProgress, setCurrentTime, and setDuration on every animation frame; add a
throttling ref (e.g., lastStateUpdateRef) and only call those three state
setters when enough time has passed (e.g., 250ms via performance.now() or
timeMs) while leaving the word-boundary tracking (lastWordIdxRef,
setCurrentWordIndex, setCurrentWord) running at full frame rate; create and use
lastStateUpdateRef in the hook, check elapsed time before invoking
setProgress/setCurrentTime/setDuration, and update lastStateUpdateRef when you
do so, keeping animFrameRef and the forward-scan/binary-search logic unchanged.

170-328: Audio only starts after the entire SSE stream completes — potentially long wait for large articles.

The play function buffers all audio chunks (Lines 244–272) before creating the blob and starting playback (Lines 289–309). For long articles, this means the user sees a loading spinner until the full audio is generated and transferred. This is a known trade-off with blob-based playback vs. MediaSource streaming, so just flagging for awareness — it may be fine for typical article lengths.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/api/tts/voices/route.ts`:
- Around line 9-18: The GET handler currently parses and caches every upstream
response; update the GET function to check response.ok after
fetch(`${API_URL}/api/tts/voices`) and only attach the long Cache-Control header
for successful responses. If response.ok is false, read the upstream body (JSON
or text), return a Response.json (or Response) with the upstream status (e.g.,
response.status) and do not set the long public max-age cache header (use
no-store or a short cache), and ensure error details are included in the
returned payload; reference the GET function, the fetch call using API_URL, and
the Response.json usage when locating where to add the response.ok check and
alternate error path.

In `@components/features/floating-toolbar.tsx`:
- Around line 337-346: The TTS ToolbarButton is missing the tooltipDisabled prop
which causes its tooltip to show over open panels; update the ToolbarButton (the
instance using props icon/isTTSLoading, label/isTTSActive, shortcut "L",
onClick/onTTSToggle) to include tooltipDisabled={anyPanelOpen} so it matches the
other toolbar buttons and suppresses the tooltip when anyPanelOpen is true.

In `@docs/TTS.md`:
- Around line 236-241: Docs incorrectly state "no API key" and Edge websocket
usage; update the Connection details to describe Azure Cognitive Services TTS
(used in server/env.ts and server/routes/tts.ts) including required environment
variables AZURE_SPEECH_KEY and AZURE_SPEECH_REGION, the Azure REST/endpoint
format (region-specific TTS endpoint) and subscription-key auth headers, and
remove or clearly mark the Edge wss://speech.platform.bing.com flow as a
separate optional fallback (if kept, document its different auth/token
mechanism). Ensure the docs mention that AZURE_SPEECH_KEY and
AZURE_SPEECH_REGION are required or document fallback selection logic if both
modes are supported.
- Line 7: Three fenced code blocks in the TTS documentation are missing language
specifiers, causing markdown-lint MD040 warnings; update each of the three
architecture/diagram fenced blocks (the ASCII/diagram blocks demarcated by the
triple backticks ``` ) by adding a language identifier such as text or plaintext
immediately after the opening backticks (i.e., change ``` to ```text) so the
three ASCII diagram/code blocks are properly tagged.

In `@lib/azure-tts-ws.ts`:
- Around line 389-397: The close() method sets this.closed before calling
sendClose(), so sendClose() returns early and the WebSocket close frame is never
sent; change the sequence so sendClose() is invoked while this.closed is still
false and only set this.closed = true after sendClose() completes (or on its
completion callback/Promise), ensuring sendClose() can transmit the close frame;
locate the close() and sendClose() methods in azure-tts-ws.ts and either move
the assignment of this.closed to after sendClose() or adapt sendClose() to use a
separate "closeSent" flag so the close frame is actually sent before marking the
socket closed.

In `@lib/hooks/use-tts.ts`:
- Line 98: The expression that computes canUse calls getLocalUsage() on every
render (causing localStorage reads/parses during high-frequency updates from
trackPlayback); fix by reading/parsing localStorage once and deriving a
persistent flag instead of calling getLocalUsage() inline: on hook
initialization compute an "hasUsedArticle" boolean (based on getLocalUsage() and
articleUrl) and store it in state or memoize canUse with useMemo depending only
on usageCount, isPremium, NODE_ENV, FREE_TTS_LIMIT and that one-time
hasUsedArticle; update the canUse expression to reference that cached flag and
remove repeated getLocalUsage() calls.
- Around line 244-272: The SSE parsing loop resets eventType on every chunk read
which drops events split across chunk boundaries; move the declaration of
eventType (currently inside the while (true) loop) to the outer scope above the
loop so its value persists between reads, and only reset eventType to "" after
successfully handling a corresponding data: line; keep the existing logic that
pushes to boundariesRef.current, audioChunksRef.current and throws on error, but
ensure eventType is not reinitialized on each reader.read() iteration.
- Around line 341-347: The resume callback calls audioRef.current.play() without
handling its returned Promise, causing unhandled rejections and leaving state
inconsistent; change resume (the function named resume) to handle the Promise
from audioRef.current.play() (use .then/.catch or async/await) so
setStatus("playing") and starting the animation frame (animFrameRef.current =
requestAnimationFrame(trackPlayback)) only happen after play() resolves, and in
the catch path revert or setStatus to "paused" (and optionally cancel any
animation frame) and log the error to avoid unhandled promise rejections.
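The SSE fix in the second item above (persisting `eventType` across `reader.read()` chunks) can be sketched without the hook; all names here are illustrative, not the hook's real identifiers:

```typescript
// Sketch: eventType lives outside the per-chunk loop, so an "event:" line in
// one chunk still applies to its "data:" line arriving in the next chunk.
type SSEEvent = { event: string; data: string };

function createSSEParser(onEvent: (e: SSEEvent) => void) {
  let buffer = "";
  let eventType = ""; // persists across feed() calls, i.e. across reader.read() chunks
  return function feed(chunk: string): void {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep any partial trailing line for the next chunk
    for (const line of lines) {
      if (line.startsWith("event:")) {
        eventType = line.slice(6).trim();
      } else if (line.startsWith("data:")) {
        onEvent({ event: eventType, data: line.slice(5).trim() });
        eventType = ""; // reset only after the matching data: line is handled
      }
    }
  };
}
```

Feeding `"event: boundary\n"` and `'data: {"t":1}\n'` as two separate chunks still yields one event with type `"boundary"`, which is exactly the case the per-iteration reset dropped.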

In `@lib/tts-concurrency.ts`:
- Around line 170-192: getTTSSlotStats() currently returns perUserBreakdown
(derived from perUserActive) which leaks user IDs to the unauthenticated /health
endpoint; remove or sanitize that field. Update getTTSSlotStats to either omit
perUserBreakdown entirely or replace it with a non-identifying aggregate (e.g.,
counts histogram or keyed by hashed/salted IDs) so no raw user IDs are returned,
and ensure any code referencing perUserBreakdown (from perUserActive) is adapted
to the new shape; keep the rest of the returned stats (activeSlots,
queuedRequests, totalAcquired, etc.) unchanged.
- Around line 140-164: Queued entries rejected or skipped in the dequeue loop
leak their timeout and abort listener because releaseTTSSlot calls
next.reject(...) or the loop continues without running the cleanup from
acquireTTSSlot; modify the queued entry shape created in acquireTTSSlot to
include a cleanup() function that clears the timeout and removes the abort
listener, store that on each queue item, and then in the dequeue loop inside
releaseTTSSlot call next.cleanup() before calling next.reject(new
TTSUserLimitError(...)) or before continuing on aborted entries so
timers/listeners are always removed; update references to queue, acquireTTSSlot,
releaseTTSSlot, and TTSUserLimitError accordingly.
- Around line 60-71: The per-user limit currently maps null userId to the
literal "anonymous" in acquireTTSSlot which causes all unauthenticated requests
to share a single slot; change the logic so that when userId is null you do not
apply the per-user limit (i.e., do not map to "anonymous" and skip the
perUserActive/maxPerUser check), or alternatively accept and use an ephemeral
identifier (e.g., per-IP passed into acquireTTSSlot) instead of the fixed
"anonymous" string; update acquireTTSSlot to branch on userId === null (skip
per-user counting) and keep the rest of the function (totalRejected,
TTSUserLimitError, perUserActive updates) unchanged for authenticated users.

In `@server/env.ts`:
- Around line 59-60: AZURE_SPEECH_KEY and AZURE_SPEECH_REGION are currently
required and will break startups without Azure credentials; change their zod
schemas to optional (e.g., z.string().min(1).optional()) and update the TTS
route registration logic to only initialize/register TTS functionality when both
AZURE_SPEECH_KEY and AZURE_SPEECH_REGION are present (check these env variables
before constructing the Azure client in the TTS initialization code). Also
replace .optional() on MAX_CONCURRENT_TTS, MAX_TTS_PER_USER, and
TTS_SLOT_TIMEOUT_MS with .default(20), .default(2), and .default(15000)
respectively so their types are consistent and safe, and ensure any code reading
these values uses the defaulted values.
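A plain-TypeScript sketch of the defaulting and gating behavior requested here — the real file uses zod's `z.coerce.number().default(...)`, so this only models the intended semantics, and the variable names mirror the env vars named above:

```typescript
// Sketch of the intended env semantics: numeric vars always resolve to a
// number, and TTS is only enabled when both Azure credentials are present.
function coerceNumber(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw === "") return fallback;
  const n = Number(raw);
  return Number.isNaN(n) ? fallback : n; // zod would reject instead; fallback keeps the sketch total
}

const env = {
  MAX_CONCURRENT_TTS: coerceNumber(process.env.MAX_CONCURRENT_TTS, 20),
  MAX_TTS_PER_USER: coerceNumber(process.env.MAX_TTS_PER_USER, 2),
  TTS_SLOT_TIMEOUT_MS: coerceNumber(process.env.TTS_SLOT_TIMEOUT_MS, 15000),
  AZURE_SPEECH_KEY: process.env.AZURE_SPEECH_KEY,       // optional
  AZURE_SPEECH_REGION: process.env.AZURE_SPEECH_REGION, // optional
};

// Only register/initialize the TTS routes when both credentials exist.
const ttsEnabled = Boolean(env.AZURE_SPEECH_KEY && env.AZURE_SPEECH_REGION);
```

Downstream code then reads `env.MAX_CONCURRENT_TTS` as a plain `number` and branches on `ttsEnabled` instead of crashing at startup.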

In `@server/routes/tts.ts`:
- Around line 526-568: The TTS slot can be released twice because
releaseTTSSlot(auth.userId) is called both in the ReadableStream start() finally
block and in cancel(); introduce a local boolean (e.g., slotReleased) scoped to
the stream creation and update both start() and cancel() to call releaseTTSSlot
only if slotReleased is false, then set it true immediately after calling
releaseTTSSlot; ensure the abortController.abort(), tracker.end(...) and logging
remain unchanged but are executed independently of the guarded release so the
slot never decrements twice.
- Around line 349-354: The SSML template assigns unsanitized voice, rate, and
pitch into ssmlMessage (see ssmlMessage construction and streamTTS usage)
allowing attribute/element injection; fix by validating and/or sanitizing those
parameters before interpolation: enforce an allowlist (preferred) or strict
regex for voice names, and numeric bounds/format for rate and pitch, or
XML-escape them when used in attributes; if values fail validation, reject the
request or substitute a safe default so only safe, expected strings are inserted
into the ssmlMessage.
- Around line 1-48: The startup currently calls
initAzureTTS(env.AZURE_SPEECH_KEY, env.AZURE_SPEECH_REGION) during module import
which fails if those env vars are absent; make AZURE_SPEECH_KEY and
AZURE_SPEECH_REGION optional in the env schema and change tts.ts to guard or
defer initAzureTTS (e.g., only call initAzureTTS when both env.AZURE_SPEECH_KEY
and env.AZURE_SPEECH_REGION are present, or initialize lazily inside the
/api/tts handler), and ensure any uses of AzureTTS helpers (initAzureTTS,
buildTTSUrl, getTTSHeaders, getTTSHost, getTTSOrigin, buildVoiceListUrl,
getVoiceListHeaders) check for initialized state and return a clear 404/503 or
feature-disabled response when TTS is not configured.
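The SSML sanitization item can be sketched as below; the regexes and the fallback voice name are assumptions for illustration, not the project's actual validation rules:

```typescript
// Sketch: allowlist voice/rate/pitch before interpolating into SSML, and
// XML-escape anything that must be placed into an attribute verbatim.
const VOICE_RE = /^[A-Za-z]{2,8}-[A-Za-z]{2,8}-[A-Za-z0-9]+Neural$/; // e.g. "en-US-AriaNeural"
const SIGNED_PERCENT_RE = /^[+-]?\d{1,3}%$/;                          // e.g. "+10%", "-5%"

function escapeXmlAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function safeSSMLParams(voice: string, rate: string, pitch: string) {
  return {
    voice: VOICE_RE.test(voice) ? voice : "en-US-AriaNeural", // assumed safe default
    rate: SIGNED_PERCENT_RE.test(rate) ? rate : "+0%",
    pitch: SIGNED_PERCENT_RE.test(pitch) ? pitch : "+0%",
  };
}
```

An injection attempt like `voice = 'x" onload="evil'` fails the allowlist and is replaced by the default, so only expected strings reach the SSML template.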

---

Nitpick comments:
In `@app/api/tts/route.ts`:
- Around line 36-44: The fetch to `${API_URL}/api/tts` uses only
AbortSignal.timeout(120_000) so client disconnects aren't propagated; update the
fetch call in the route handler to combine the incoming request signal
(req.signal) with the timeout (e.g., use AbortSignal.any([req.signal,
AbortSignal.timeout(120_000)]) when available) or create an AbortController,
listen for req.signal.abort to call controller.abort(), and pass
controller.signal to fetch so upstream Elysia requests are cancelled immediately
on client disconnect.

In `@components/features/proxy-content.tsx`:
- Around line 160-168: handleTTSToggle is recreated every render because the
entire tts object from useTTS is in the dependency array; destructure the
specific stable bits you need (e.g., const { isPlaying, isPaused, isLoading,
play, stop } = tts or directly from useTTS) and replace [tts] with the concrete
dependencies (play, stop, isTTSActive), and introduce a derived const
isTTSActive = isPlaying || isPaused || isLoading to use inside handleTTSToggle
and elsewhere (lines referenced: handleTTSToggle, keyboard effect, and usages
around lines ~904 and ~1210) so the callback and keyboard listener only
re-register when actual TTS state/functions change.

In `@components/features/tts-highlight.tsx`:
- Around line 94-144: The useEffect currently lists currentWord in its
dependency array but never reads it inside the effect body; remove currentWord
from the dependency array of the useEffect that manages highlights so the effect
only depends on currentWordIndex, isActive, and ensureWordIndex (update the
dependency array used by the useEffect function that references
lastHighlightIdx.current, ensureWordIndex(), wordIndexRef.current, and
CSS.highlights). If you intended the effect to re-run when the word text
changes, instead explicitly reference currentWord inside the effect; otherwise
drop it to reflect actual dependencies.

In `@components/features/tts-player.tsx`:
- Around line 102-106: The UI uses a hard-coded "3" in the TTS limit display
which will fall out of sync with the shared limit; update the TTSPlayer
component (components/features/tts-player.tsx) to use a single source of truth:
either accept a prop like maxUsage on the TTSPlayer component and pass the value
from where useTTS is used, or import the shared FREE_TTS_LIMIT constant from
use-tts.ts and replace the literal 3 with that constant (ensure typing/exports
allow import); update any callers to provide the prop if you choose the prop
approach so the denominator always matches the logic in useTTS/use-tts.ts.
- Around line 163-196: The rate menu lacks keyboard accessibility—add an
onKeyDown handler on the menu container (the element rendered when showRateMenu
is true) that listens for Escape and calls setShowRateMenu(false) to close the
menu; additionally, when toggling the menu via the rate button (the one using
rateMenuRef and setShowRateMenu), move focus into the menu (e.g., first
RATE_OPTIONS button) on open and restore focus to the trigger button on close,
and ensure each RATE_OPTIONS button remains keyboard-focusable and calls
onRateChange(r) as now.

In `@docs/TTS.md`:
- Around line 238-241: Update the TTS docs to remove the claim about a
"hardcoded trusted-client token" and the WebSocket
`wss://speech.platform.bing.com/...`/Edge token workflow; instead document the
auth method(s) the codebase actually supports (e.g., Azure
Speech Service) and reference the environment variables AZURE_SPEECH_KEY and
AZURE_SPEECH_REGION (or other auth config symbols present) as the correct setup,
and add a short note warning not to rely on reverse-engineered tokens, which
are neither supported nor recommended for production.

In `@lib/azure-tts-ws.ts`:
- Around line 197-199: The WebSocket handshake code in lib/azure-tts-ws.ts
currently skips verifying Sec-WebSocket-Accept; re-enable strict verification by
reading the request's Sec-WebSocket-Key, computing expectedAccept =
base64(sha1(secWebSocketKey + "258EAFA5-E914-47DA-95CA-C5AB0DC85B11")), and
comparing it against the Sec-WebSocket-Accept header returned — if they differ,
treat the upgrade as failed and close the connection; update the
upgrade/handshake function (the code that issues the HTTP upgrade and processes
headers) to perform this check and optionally add a clear opt-out flag (e.g.,
allowInsecureHandshake) that logs a warning when set.
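The accept-key computation is fixed by RFC 6455, so the check can be sketched and verified against the spec's own example:

```typescript
import { createHash } from "node:crypto";

// Verification asked for above: base64(sha1(key + GUID)) must equal the
// Sec-WebSocket-Accept header. The GUID is a constant defined by RFC 6455.
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

function expectedAccept(secWebSocketKey: string): string {
  return createHash("sha1").update(secWebSocketKey + WS_GUID).digest("base64");
}

function verifyHandshake(secWebSocketKey: string, acceptHeader: string | undefined): boolean {
  // Treat a missing or mismatched header as a failed upgrade.
  return acceptHeader === expectedAccept(secWebSocketKey);
}

// RFC 6455's own example pair:
// expectedAccept("dGhlIHNhbXBsZSBub25jZQ==") → "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```

On mismatch the upgrade should be treated as failed and the connection closed, with any `allowInsecureHandshake` escape hatch logging a clear warning.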

In `@lib/hooks/use-tts.ts`:
- Around line 126-168: trackPlayback currently calls setProgress,
setCurrentTime, and setDuration on every animation frame; add a throttling ref
(e.g., lastStateUpdateRef) and only call those three state setters when enough
time has passed (e.g., 250ms via performance.now() or timeMs) while leaving the
word-boundary tracking (lastWordIdxRef, setCurrentWordIndex, setCurrentWord)
running at full frame rate; create and use lastStateUpdateRef in the hook, check
elapsed time before invoking setProgress/setCurrentTime/setDuration, and update
lastStateUpdateRef when you do so, keeping animFrameRef and the
forward-scan/binary-search logic unchanged.
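The throttle can be sketched independently of React; `makeThrottledUpdater` and the injected clock are illustrative (in the hook, `last` would live in a `useRef` as `lastStateUpdateRef`):

```typescript
// Sketch: gate expensive state updates to one per interval while callers
// (e.g. a requestAnimationFrame loop) keep running at full frame rate.
function makeThrottledUpdater(
  apply: (timeMs: number) => void,
  intervalMs = 250,
  now: () => number = () => performance.now(), // injectable for testing
) {
  let last = -Infinity; // plays the role of lastStateUpdateRef.current
  return (timeMs: number): boolean => {
    const t = now();
    if (t - last < intervalMs) return false; // skip this frame's state update
    last = t;
    apply(timeMs); // would call setProgress/setCurrentTime/setDuration in the hook
    return true;
  };
}
```

Word-boundary tracking stays outside the throttle, so highlighting remains frame-accurate while React re-renders drop to roughly four per second.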

In `@server/env.ts`:
- Around line 54-56: The TTS env vars MAX_CONCURRENT_TTS, MAX_TTS_PER_USER, and
TTS_SLOT_TIMEOUT_MS are defined with z.coerce.number().optional(), producing
number|undefined; change each to z.coerce.number().default(...) to match the
other env vars and guarantee a typed number: set MAX_CONCURRENT_TTS.default(20),
MAX_TTS_PER_USER.default(2), and TTS_SLOT_TIMEOUT_MS.default(15000) so
downstream code no longer needs to handle undefined.

In `@server/index.ts`:
- Around line 32-36: The unhealthy /health response omits TTS stats; update the
health-check handler that builds the unhealthy response to include tts:
getTTSSlotStats() just like the healthy branch does. Locate the
configureTTSLimiter setup and the getTTSSlotStats() accessor, then add
getTTSSlotStats() into the unhealthy response object so both success and failure
branches return TTS slot metrics for diagnostics.

In `@server/routes/tts.ts`:
- Around line 155-173: The Redis check in the tts usage flow
(monthKey/getMonthKey, redisKey, FREE_TTS_LIMIT) is vulnerable to a TOCTOU race
because it calls redis.get(...) then redis.incr(...); change the logic to
perform an atomic increment-first check: call redis.incr(redisKey) (or use a
small Lua script that INCR and set TTL atomically), inspect the returned value
and if it exceeds FREE_TTS_LIMIT immediately DECR the key (or return disallowed)
and only set the expire when the counter transitions from 1 to 2 (or via the Lua
script set TTL on first increment); this ensures correct limits without a
separate get/incr race.
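The increment-first pattern can be modeled in memory for illustration; the Lua string mirrors what the review suggests for real Redis but is an untested sketch, and `tryConsume` is an illustrative name:

```typescript
// With real Redis this would be one atomic EVAL: INCR, set TTL on the first
// increment, and roll back with DECR when over the limit. Untested sketch:
const INCR_WITH_LIMIT_LUA = `
local n = redis.call('INCR', KEYS[1])
if n == 1 then redis.call('EXPIRE', KEYS[1], ARGV[2]) end
if n > tonumber(ARGV[1]) then redis.call('DECR', KEYS[1]) return 0 end
return n
`;

// Equivalent single-threaded in-memory behavior, for illustration/testing:
function tryConsume(counters: Map<string, number>, key: string, limit: number): boolean {
  const n = (counters.get(key) ?? 0) + 1; // "increment first"...
  if (n > limit) return false;            // ...then deny (and roll back) if over
  counters.set(key, n);
  return true;
}
```

Because the check happens on the incremented value, two concurrent requests can never both pass a `get` that showed one remaining use, which is exactly the race the separate get/incr sequence allows.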

# Conflicts:
#	components/features/mobile-bottom-bar.tsx
#	components/features/proxy-content.tsx
greptile-apps bot commented Feb 26, 2026

Additional Comments (1)

lib/tts-chunk.ts, line 89
cache version mismatch — client uses v3 (lib/hooks/use-tts.ts:208) but server uses v2

client and server compute different SHA-256 keys for identical content, causing cache misses and duplicate API calls

const CHUNK_CACHE_VERSION = "v3";

greptile-apps bot commented Feb 26, 2026

Additional Comments (2)

docs/TTS.md, line 11
documentation states "120s timeout" but implementation uses 300s (5 minutes) in app/api/tts/route.ts:43

update to "(300s timeout)" to match actual timeout value


docs/TTS.md, line 74
documentation states "120s timeout" but implementation uses 300s in app/api/tts/route.ts:43

update to "300s timeout" to match code

greptile-apps bot commented Feb 26, 2026

Additional Comments (6)

server/routes/tts.ts, line 387
Usage deduplication can be bypassed — articleUrl is user-controlled (optional in the request body at line 998), so users can send different text with the same articleUrl to avoid incrementing the usage counter. A malicious user could listen to unlimited articles by reusing the same URL.

Deduplication should use the article cache key (hash of the cleaned text) instead of the user-provided articleUrl.


server/routes/tts.ts, line 383
Race condition — sadd and expire are separate Redis operations. If the process crashes or Redis connection fails between them, the dedup set will never expire and persist forever.

Use a transaction or Lua script to make this atomic.


server/routes/tts.ts, line 393
Same race condition — incr and expire should be atomic.


server/routes/tts.ts, line 410
Memory fallback has no deduplication — when Redis is down, every TTS request increments the counter even if it's the same article+voice combo. Users could bypass the daily limit by making requests when Redis is unavailable.


server/routes/tts.ts, line 748
Usage count header shows optimistic value (usage.count + 1) before the actual increment completes (line 738 is fire-and-forget). If incrementTtsUsage fails silently, the header will be incorrect, causing client/server mismatch.


server/routes/tts.ts, line 946
Request deduplication can cause cascading failures — if the first request for an article+voice combo is aborted by client disconnect (line 911), all other waiting requests will fail with the same abort error (line 942), even though their clients didn't disconnect.

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 2 new potential issues.

View 28 additional findings in Devin Review.

Open in Devin Review

@KartikLabhshetwar KartikLabhshetwar merged commit d09c6af into main Feb 26, 2026
8 checks passed
