Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a full TTS feature: client hook and UI, Next.js proxy routes, backend ElevenLabs integration with per-chunk caching and concurrency limits, transcript viewer and scrubber UI, concurrency limiter, text/chunk utilities, env/config updates, docs, and icon/UX changes. Changes
Sequence DiagramssequenceDiagram
participant Client as Browser Client
participant NextJS as Next.js Proxy
participant Backend as Backend API
participant ElevenLabs as ElevenLabs API
participant Cache as Server Chunk Cache
Client->>NextJS: POST /api/tts {text, voice, url}
NextJS->>Backend: Forward request + headers
Backend->>Backend: Split text into chunks
loop per chunk
Backend->>Cache: Check per-chunk cache
alt cache hit
Cache-->>Backend: cached audio + alignment
else cache miss
Backend->>Backend: acquireTTSSlot(user)
Backend->>ElevenLabs: synthesize chunk
ElevenLabs-->>Backend: audio + character timestamps
Backend->>Cache: store chunk result
Backend->>Backend: releaseTTSSlot(user)
end
end
Backend->>Backend: merge alignments, concat audio
Backend-->>NextJS: JSON {audioBase64, alignment, durationMs, usage headers}
NextJS-->>Client: JSON response
sequenceDiagram
participant User as User
participant UI as Toolbar / Bottom Bar
participant Hook as useTTS Hook
participant IndexedDB as IndexedDB Cache
participant Proxy as Next.js /api/tts
participant Player as Audio Player
participant Highlight as useTTSHighlight
User->>UI: Click "Listen"
UI->>Hook: load()
Hook->>IndexedDB: lookup cache (7d TTL)
alt cache hit
IndexedDB-->>Hook: audio blob + alignment
else cache miss
Hook->>Hook: check local daily usage
Hook->>Proxy: POST {text, voice}
Proxy-->>Hook: audioBase64 + alignment
Hook->>Hook: decode → blob, createObjectURL
Hook->>IndexedDB: store cached entry (async)
end
Hook-->>UI: isReady, isTTSActive
UI->>Player: play(blob URL)
Player->>Hook: timeupdate
Hook->>Highlight: update currentWordIndex
Highlight->>UI: highlight current word in DOM
Player->>UI: ended
Hook-->>UI: isTTSActive=false
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
🚅 Deployed to the SMRY-pr-73 environment in smry
|
Greptile SummaryThis PR adds comprehensive text-to-speech functionality using Inworld AI with multi-level caching, usage tracking, and concurrency management. The implementation includes client-side audio player with transcript viewer, voice selection, and real-time word highlighting. Key Changes:
Critical Issues Found:
Confidence Score: 2/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Client TTS Request] --> B{L0: Memory Cache?}
B -->|Hit| Z[Return Audio]
B -->|Miss| C{L1: IndexedDB Cache?}
C -->|Hit| Z
C -->|Miss| D{Check Usage Limit}
D -->|Exceeded| E[Return 429 Error]
D -->|Allowed| F[POST /api/tts]
F --> G{Server L2: Article Cache?}
G -->|Hit| H[Increment Usage]
G -->|Miss| I{L3: Redis Article?}
I -->|Hit| H
I -->|Miss| J[Split into Chunks]
J --> K{L4: Chunk LRU Cache?}
K -->|Partial Hit| L{L5: Redis Chunk?}
K -->|Full Hit| M[Merge Chunks]
L -->|Hits| M
L -->|Misses| N{Acquire Slot}
N -->|Timeout| O[Return 503]
N -->|Success| P[Generate via Inworld API]
P --> Q[Cache Chunks]
Q --> M
M --> R[Generate Xing Header]
R --> S[Cache Article]
S --> H
H --> T{Dedup Check}
T -->|Already Counted| Z
T -->|New| U[Increment Counter]
U --> Z
Last reviewed commit: cd0b9a9 |
There was a problem hiding this comment.
Actionable comments posted: 15
🧹 Nitpick comments (12)
server/env.ts (1)
54-56: TTS concurrency vars should use.default()instead of.optional()for consistency.
MAX_CONCURRENT_ARTICLE_FETCHESandARTICLE_FETCH_SLOT_TIMEOUT_MSuse.coerce.number().default(n), which guarantees a typed number. The new TTS equivalents use.optional(), yieldingnumber | undefined— requiring all downstream consumers to handleundefinedexplicitly. The docs already document concrete defaults (20, 2, 15000).♻️ Proposed fix
- MAX_CONCURRENT_TTS: z.coerce.number().optional(), - MAX_TTS_PER_USER: z.coerce.number().optional(), - TTS_SLOT_TIMEOUT_MS: z.coerce.number().optional(), + MAX_CONCURRENT_TTS: z.coerce.number().default(20), + MAX_TTS_PER_USER: z.coerce.number().default(2), + TTS_SLOT_TIMEOUT_MS: z.coerce.number().default(15000),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/env.ts` around lines 54 - 56, The TTS env vars MAX_CONCURRENT_TTS, MAX_TTS_PER_USER, and TTS_SLOT_TIMEOUT_MS are defined with z.coerce.number().optional(), producing number|undefined; change each to z.coerce.number().default(...) to match the other env vars and guarantee a typed number: set MAX_CONCURRENT_TTS.default(20), MAX_TTS_PER_USER.default(2), and TTS_SLOT_TIMEOUT_MS.default(15000) so downstream code no longer needs to handle undefined.docs/TTS.md (1)
238-241: Hardcoded trusted-client token is a fragile, ToS-violating dependency.Using Microsoft Edge's internal trusted-client token is reverse-engineering a private API. At the documented scale (30K DAU, 100+ concurrent), this risks:
- IP bans or rate limiting from Microsoft
- Silent breakage if the token is rotated (no official notice)
- ToS violation for commercial use
If the implementation actually uses Azure Speech Service (as indicated by
AZURE_SPEECH_KEY/AZURE_SPEECH_REGION), this concern may already be resolved — but the docs need to be corrected to remove the misleading hardcoded-token description.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/TTS.md` around lines 238 - 241, Update the TTS docs to remove the claim about a "hardcoded trusted-client token" and the WebSocket `wss://speech.platform.bing.com/...`/Edge token workflow; instead document the supported, supported-auth method(s) actually used by the codebase (e.g., Azure Speech Service) and reference the environment variables AZURE_SPEECH_KEY and AZURE_SPEECH_REGION (or other auth config symbols present) as the correct setup, and add a short note warning not to rely on reverse-engineered tokens and that those are not supported or recommended for production.components/features/proxy-content.tsx (1)
160-168:[tts]as useCallback dependency likely invalidates every render.If
useTTSreturns a new object on each render (standard for hooks returning{ isPlaying, play, stop, ... }),handleTTSTogglewill be recreated every render. This cascades to the keyboard effect (line 715) re-registering its listener on every render.Consider destructuring the specific stable values you need, or extracting a derived
isTTSActiveflag:Proposed approach
+ const isTTSActive = tts.isPlaying || tts.isPaused || tts.isLoading; + const handleTTSToggle = React.useCallback(() => { - if (tts.isPlaying || tts.isPaused || tts.isLoading) { + if (isTTSActive) { tts.stop(); setTTSOpen(false); } else { setTTSOpen(true); tts.play(); } - }, [tts]); + }, [isTTSActive, tts.stop, tts.play]);This also lets you reuse
isTTSActiveat lines 904 and 1210 instead of repeating the expression.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/features/proxy-content.tsx` around lines 160 - 168, handleTTSToggle is recreated every render because the entire tts object from useTTS is in the dependency array; destructure the specific stable bits you need (e.g., const { isPlaying, isPaused, isLoading, play, stop } = tts or directly from useTTS) and replace [tts] with the concrete dependencies (play, stop, isTTSActive), and introduce a derived const isTTSActive = isPlaying || isPaused || isLoading to use inside handleTTSToggle and elsewhere (lines referenced: handleTTSToggle, keyboard effect, and usages around lines ~904 and ~1210) so the callback and keyboard listener only re-register when actual TTS state/functions change.server/index.ts (1)
32-36: Clean integration of TTS routes and concurrency configuration.Follows the established pattern from
configureFetchLimiter. One minor inconsistency: TTS stats (line 93) are only included in the healthy/healthresponse, not the unhealthy branch (lines 71–82). Consider addingtts: getTTSSlotStats()there too — TTS queue depth during a memory crisis would be useful for diagnostics.Also applies to: 93-93, 105-105
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/index.ts` around lines 32 - 36, The unhealthy /health response omits TTS stats; update the health-check handler that builds the unhealthy response to include tts: getTTSSlotStats() just like the healthy branch does. Locate the configureTTSLimiter setup and the getTTSSlotStats() accessor, then add getTTSSlotStats() into the unhealthy response object so both success and failure branches return TTS slot metrics for diagnostics.app/api/tts/route.ts (1)
36-44: Client disconnect is not propagated to the upstream fetch.The
AbortSignal.timeout(120_000)handles staleness, but if the Next.js client disconnects early, the upstream Elysia request continues until the timeout expires. Consider combining signals:Proposed enhancement
+ // Combine client abort + 120s timeout + const controller = new AbortController(); + const timeoutId = setTimeout(() => controller.abort(new DOMException("TimeoutError", "TimeoutError")), 120_000); + req.signal.addEventListener("abort", () => controller.abort(), { once: true }); + let response: Response; try { response = await fetch(`${API_URL}/api/tts`, { method: "POST", headers, body, - signal: AbortSignal.timeout(120_000), + signal: controller.signal, }); + clearTimeout(timeoutId); } catch (err) {Alternatively, use
AbortSignal.any([req.signal, AbortSignal.timeout(120_000)])if your Node.js version supports it (18.17+).Based on learnings: "Use Next.js route handlers to proxy streaming requests to Elysia API to avoid SSE buffering" — this proxy correctly fulfills that pattern; connecting the abort signals would complete the lifecycle management.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/api/tts/route.ts` around lines 36 - 44, The fetch to `${API_URL}/api/tts` uses only AbortSignal.timeout(120_000) so client disconnects aren't propagated; update the fetch call in the route handler to combine the incoming request signal (req.signal) with the timeout (e.g., use AbortSignal.any([req.signal, AbortSignal.timeout(120_000)]) when available) or create an AbortController, listen for req.signal.abort to call controller.abort(), and pass controller.signal to fetch so upstream Elysia requests are cancelled immediately on client disconnect.server/routes/tts.ts (1)
155-173: Minor TOCTOU in Redis usage check —getthenincris not atomic.Between
redis.getandredis.incr, a concurrent request for the same user could also pass the check, allowing usage to exceedFREE_TTS_LIMITby 1. With a limit of 3 this is negligible. If you want exactness, use a Lua script orredis.incrfirst, then check the returned value anddecrif over limit.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/routes/tts.ts` around lines 155 - 173, The Redis check in the tts usage flow (monthKey/getMonthKey, redisKey, FREE_TTS_LIMIT) is vulnerable to a TOCTOU race because it calls redis.get(...) then redis.incr(...); change the logic to perform an atomic increment-first check: call redis.incr(redisKey) (or use a small Lua script that INCR and set TTL atomically), inspect the returned value and if it exceeds FREE_TTS_LIMIT immediately DECR the key (or return disallowed) and only set the expire when the counter transitions from 1 to 2 (or via the Lua script set TTL on first increment); this ensures correct limits without a separate get/incr race.components/features/tts-highlight.tsx (1)
94-144:currentWordis in the dependency array but unused in the effect body.
currentWord(line 144 dep array) is never read inside the effect — onlycurrentWordIndexis used. If the word text changes but the index stays the same, this triggers a no-op effect run (skipped at line 103). This is harmless but misleading; consider removing it from the deps to better document intent.Proposed change
- }, [currentWord, currentWordIndex, isActive, ensureWordIndex]); + }, [currentWordIndex, isActive, ensureWordIndex]);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/features/tts-highlight.tsx` around lines 94 - 144, The useEffect currently lists currentWord in its dependency array but never reads it inside the effect body; remove currentWord from the dependency array of the useEffect that manages highlights so the effect only depends on currentWordIndex, isActive, and ensureWordIndex (update the dependency array used by the useEffect function that references lastHighlightIdx.current, ensureWordIndex(), wordIndexRef.current, and CSS.highlights). If you intended the effect to re-run when the word text changes, instead explicitly reference currentWord inside the effect; otherwise drop it to reflect actual dependencies.lib/azure-tts-ws.ts (1)
197-199: SkippingSec-WebSocket-Acceptverification weakens handshake integrity.The comment explains the rationale, but skipping this check means the client cannot detect a non-WebSocket response body that happens to include a
101status. For a server-to-Azure connection this is low risk but worth noting. If the connection ever routes through an untrusted proxy, this becomes exploitable.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/azure-tts-ws.ts` around lines 197 - 199, The WebSocket handshake code in lib/azure-tts-ws.ts currently skips verifying Sec-WebSocket-Accept; re-enable strict verification by reading the request's Sec-WebSocket-Key, computing expectedAccept = base64(sha1(secWebSocketKey + "258EAFA5-E914-47DA-95CA-C5AB0DC85B11")), and comparing it against the Sec-WebSocket-Accept header returned — if they differ, treat the upgrade as failed and close the connection; update the upgrade/handshake function (the code that issues the HTTP upgrade and processes headers) to perform this check and optionally add a clear opt-out flag (e.g., allowInsecureHandshake) that logs a warning when set.components/features/tts-player.tsx (2)
102-106: Hard-coded limit "3" will drift ifFREE_TTS_LIMITchanges.The denominator in
{usageCount}/3is a magic number that duplicates the constant inuse-tts.ts. Consider accepting it as a prop (e.g.maxUsage) or importing the shared constant so the UI and logic stay in sync.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/features/tts-player.tsx` around lines 102 - 106, The UI uses a hard-coded "3" in the TTS limit display which will fall out of sync with the shared limit; update the TTSPlayer component (components/features/tts-player.tsx) to use a single source of truth: either accept a prop like maxUsage on the TTSPlayer component and pass the value from where useTTS is used, or import the shared FREE_TTS_LIMIT constant from use-tts.ts and replace the literal 3 with that constant (ensure typing/exports allow import); update any callers to provide the prop if you choose the prop approach so the denominator always matches the logic in useTTS/use-tts.ts.
163-196: Rate menu lacks keyboard accessibility.The dropdown doesn't close on
Escapeand doesn't trap focus or support arrow-key navigation. This can frustrate keyboard-only users. At minimum, add anonKeyDownhandler to close onEscape.💡 Minimal Escape-to-close addition
- <div className="relative" ref={rateMenuRef}> + <div className="relative" ref={rateMenuRef} onKeyDown={(e) => { if (e.key === "Escape") setShowRateMenu(false); }}>🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/features/tts-player.tsx` around lines 163 - 196, The rate menu lacks keyboard accessibility—add an onKeyDown handler on the menu container (the element rendered when showRateMenu is true) that listens for Escape and calls setShowRateMenu(false) to close the menu; additionally, when toggling the menu via the rate button (the one using rateMenuRef and setShowRateMenu), move focus into the menu (e.g., first RATE_OPTIONS button) on open and restore focus to the trigger button on close, and ensure each RATE_OPTIONS button remains keyboard-focusable and calls onRateChange(r) as now.lib/hooks/use-tts.ts (2)
126-168: Consider throttling state updates intrackPlayback.
setProgress,setCurrentTime, andsetDurationfire on every animation frame (~60 fps), each triggering a re-render of every consumer. Since humans can't perceive time updates faster than ~4–10 Hz in a progress bar, you could throttle these updates (e.g., every 250 ms) while keeping the word-boundary tracking at full frame rate via refs.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/hooks/use-tts.ts` around lines 126 - 168, trackPlayback currently calls setProgress, setCurrentTime, and setDuration on every animation frame; add a throttling ref (e.g., lastStateUpdateRef) and only call those three state setters when enough time has passed (e.g., 250ms via performance.now() or timeMs) while leaving the word-boundary tracking (lastWordIdxRef, setCurrentWordIndex, setCurrentWord) running at full frame rate; create and use lastStateUpdateRef in the hook, check elapsed time before invoking setProgress/setCurrentTime/setDuration, and update lastStateUpdateRef when you do so, keeping animFrameRef and the forward-scan/binary-search logic unchanged.
170-328: Audio only starts after the entire SSE stream completes — potentially long wait for large articles.The
playfunction buffers all audio chunks (Lines 244–272) before creating the blob and starting playback (Lines 289–309). For long articles, this means the user sees a loading spinner until the full audio is generated and transferred. This is a known trade-off with blob-based playback vs.MediaSourcestreaming, so just flagging for awareness — it may be fine for typical article lengths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/api/tts/voices/route.ts`:
- Around line 9-18: The GET handler currently parses and caches every upstream
response; update the GET function to check response.ok after
fetch(`${API_URL}/api/tts/voices`) and only attach the long Cache-Control header
for successful responses. If response.ok is false, read the upstream body (JSON
or text), return a Response.json (or Response) with the upstream status (e.g.,
response.status) and do not set the long public max-age cache header (use
no-store or a short cache), and ensure error details are included in the
returned payload; reference the GET function, the fetch call using API_URL, and
the Response.json usage when locating where to add the response.ok check and
alternate error path.
In `@components/features/floating-toolbar.tsx`:
- Around line 337-346: The TTS ToolbarButton is missing the tooltipDisabled prop
which causes its tooltip to show over open panels; update the ToolbarButton (the
instance using props icon/isTTSLoading, label/isTTSActive, shortcut "L",
onClick/onTTSToggle) to include tooltipDisabled={anyPanelOpen} so it matches the
other toolbar buttons and suppresses the tooltip when anyPanelOpen is true.
In `@docs/TTS.md`:
- Around line 236-241: Docs incorrectly state "no API key" and Edge websocket
usage; update the Connection details to describe Azure Cognitive Services TTS
(used in server/env.ts and server/routes/tts.ts) including required environment
variables AZURE_SPEECH_KEY and AZURE_SPEECH_REGION, the Azure REST/endpoint
format (region-specific TTS endpoint) and subscription-key auth headers, and
remove or clearly mark the Edge wss://speech.platform.bing.com flow as a
separate optional fallback (if kept, document its different auth/token
mechanism). Ensure the docs mention that AZURE_SPEECH_KEY and
AZURE_SPEECH_REGION are required or document fallback selection logic if both
modes are supported.
- Line 7: Three fenced code blocks in the TTS documentation are missing language
specifiers, causing markdown-lint MD040 warnings; update each of the three
architecture/diagram fenced blocks (the ASCII/diagram blocks demarcated by the
triple backticks ``` ) by adding a language identifier such as text or plaintext
immediately after the opening backticks (i.e., change ``` to ```text) so the
three ASCII diagram/code blocks are properly tagged.
In `@lib/azure-tts-ws.ts`:
- Around line 389-397: The close() method sets this.closed before calling
sendClose(), so sendClose() returns early and the WebSocket close frame is never
sent; change the sequence so sendClose() is invoked while this.closed is still
false and only set this.closed = true after sendClose() completes (or on its
completion callback/Promise), ensuring sendClose() can transmit the close frame;
locate the close() and sendClose() methods in azure-tts-ws.ts and either move
the assignment of this.closed to after sendClose() or adapt sendClose() to use a
separate "closeSent" flag so the close frame is actually sent before marking the
socket closed.
In `@lib/hooks/use-tts.ts`:
- Line 98: The expression that computes canUse calls getLocalUsage() on every
render (causing localStorage reads/parses during high-frequency updates from
trackPlayback); fix by reading/parsing localStorage once and deriving a
persistent flag instead of calling getLocalUsage() inline: on hook
initialization compute an "hasUsedArticle" boolean (based on getLocalUsage() and
articleUrl) and store it in state or memoize canUse with useMemo depending only
on usageCount, isPremium, NODE_ENV, FREE_TTS_LIMIT and that one-time
hasUsedArticle; update the canUse expression to reference that cached flag and
remove repeated getLocalUsage() calls.
- Around line 244-272: The SSE parsing loop resets eventType on every chunk read
which drops events split across chunk boundaries; move the declaration of
eventType (currently inside the while (true) loop) to the outer scope above the
loop so its value persists between reads, and only reset eventType to "" after
successfully handling a corresponding data: line; keep the existing logic that
pushes to boundariesRef.current, audioChunksRef.current and throws on error, but
ensure eventType is not reinitialized on each reader.read() iteration.
- Around line 341-347: The resume callback calls audioRef.current.play() without
handling its returned Promise, causing unhandled rejections and leaving state
inconsistent; change resume (the function named resume) to handle the Promise
from audioRef.current.play() (use .then/.catch or async/await) so
setStatus("playing") and starting the animation frame (animFrameRef.current =
requestAnimationFrame(trackPlayback)) only happen after play() resolves, and in
the catch path revert or setStatus to "paused" (and optionally cancel any
animation frame) and log the error to avoid unhandled promise rejections.
In `@lib/tts-concurrency.ts`:
- Around line 170-192: getTTSSlotStats() currently returns perUserBreakdown
(derived from perUserActive) which leaks user IDs to the unauthenticated /health
endpoint; remove or sanitize that field. Update getTTSSlotStats to either omit
perUserBreakdown entirely or replace it with a non-identifying aggregate (e.g.,
counts histogram or keyed by hashed/salted IDs) so no raw user IDs are returned,
and ensure any code referencing perUserBreakdown (from perUserActive) is adapted
to the new shape; keep the rest of the returned stats (activeSlots,
queuedRequests, totalAcquired, etc.) unchanged.
- Around line 140-164: Queued entries rejected or skipped in the dequeue loop
leak their timeout and abort listener because releaseTTSSlot calls
next.reject(...) or the loop continues without running the cleanup from
acquireTTSSlot; modify the queued entry shape created in acquireTTSSlot to
include a cleanup() function that clears the timeout and removes the abort
listener, store that on each queue item, and then in the dequeue loop inside
releaseTTSSlot call next.cleanup() before calling next.reject(new
TTSUserLimitError(...)) or before continuing on aborted entries so
timers/listeners are always removed; update references to queue, acquireTTSSlot,
releaseTTSSlot, and TTSUserLimitError accordingly.
- Around line 60-71: The per-user limit currently maps null userId to the
literal "anonymous" in acquireTTSSlot which causes all unauthenticated requests
to share a single slot; change the logic so that when userId is null you do not
apply the per-user limit (i.e., do not map to "anonymous" and skip the
perUserActive/maxPerUser check), or alternatively accept and use an ephemeral
identifier (e.g., per-IP passed into acquireTTSSlot) instead of the fixed
"anonymous" string; update acquireTTSSlot to branch on userId === null (skip
per-user counting) and keep the rest of the function (totalRejected,
TTSUserLimitError, perUserActive updates) unchanged for authenticated users.
In `@server/env.ts`:
- Around line 59-60: AZURE_SPEECH_KEY and AZURE_SPEECH_REGION are currently
required and will break startups without Azure credentials; change their zod
schemas to optional (e.g., z.string().min(1).optional()) and update the TTS
route registration logic to only initialize/register TTS functionality when both
AZURE_SPEECH_KEY and AZURE_SPEECH_REGION are present (check these env variables
before constructing the Azure client in the TTS initialization code). Also
replace .optional() on MAX_CONCURRENT_TTS, MAX_TTS_PER_USER, and
TTS_SLOT_TIMEOUT_MS with .default(20), .default(2), and .default(15000)
respectively so their types are consistent and safe, and ensure any code reading
these values uses the defaulted values.
In `@server/routes/tts.ts`:
- Around line 526-568: The TTS slot can be released twice because
releaseTTSSlot(auth.userId) is called both in the ReadableStream start() finally
block and in cancel(); introduce a local boolean (e.g., slotReleased) scoped to
the stream creation and update both start() and cancel() to call releaseTTSSlot
only if slotReleased is false, then set it true immediately after calling
releaseTTSSlot; ensure the abortController.abort(), tracker.end(...) and logging
remain unchanged but are executed independently of the guarded release so the
slot never decrements twice.
- Around line 349-354: The SSML template assigns unsanitized voice, rate, and
pitch into ssmlMessage (see ssmlMessage construction and streamTTS usage)
allowing attribute/element injection; fix by validating and/or sanitizing those
parameters before interpolation: enforce an allowlist (preferred) or strict
regex for voice names, and numeric bounds/format for rate and pitch, or
XML-escape them when used in attributes; if values fail validation, reject the
request or substitute a safe default so only safe, expected strings are inserted
into the ssmlMessage.
- Around line 1-48: The startup currently calls
initAzureTTS(env.AZURE_SPEECH_KEY, env.AZURE_SPEECH_REGION) during module import
which fails if those env vars are absent; make AZURE_SPEECH_KEY and
AZURE_SPEECH_REGION optional in the env schema and change tts.ts to guard or
defer initAzureTTS (e.g., only call initAzureTTS when both env.AZURE_SPEECH_KEY
and env.AZURE_SPEECH_REGION are present, or initialize lazily inside the
/api/tts handler), and ensure any uses of AzureTTS helpers (initAzureTTS,
buildTTSUrl, getTTSHeaders, getTTSHost, getTTSOrigin, buildVoiceListUrl,
getVoiceListHeaders) check for initialized state and return a clear 404/503 or
feature-disabled response when TTS is not configured.
---
Nitpick comments:
In `@app/api/tts/route.ts`:
- Around line 36-44: The fetch to `${API_URL}/api/tts` uses only
AbortSignal.timeout(120_000) so client disconnects aren't propagated; update the
fetch call in the route handler to combine the incoming request signal
(req.signal) with the timeout (e.g., use AbortSignal.any([req.signal,
AbortSignal.timeout(120_000)]) when available) or create an AbortController,
listen for req.signal.abort to call controller.abort(), and pass
controller.signal to fetch so upstream Elysia requests are cancelled immediately
on client disconnect.
In `@components/features/proxy-content.tsx`:
- Around line 160-168: handleTTSToggle is recreated every render because the
entire tts object from useTTS is in the dependency array; destructure the
specific stable bits you need (e.g., const { isPlaying, isPaused, isLoading,
play, stop } = tts or directly from useTTS) and replace [tts] with the concrete
dependencies (play, stop, isTTSActive), and introduce a derived const
isTTSActive = isPlaying || isPaused || isLoading to use inside handleTTSToggle
and elsewhere (lines referenced: handleTTSToggle, keyboard effect, and usages
around lines ~904 and ~1210) so the callback and keyboard listener only
re-register when actual TTS state/functions change.
In `@components/features/tts-highlight.tsx`:
- Around line 94-144: The useEffect currently lists currentWord in its
dependency array but never reads it inside the effect body; remove currentWord
from the dependency array of the useEffect that manages highlights so the effect
only depends on currentWordIndex, isActive, and ensureWordIndex (update the
dependency array used by the useEffect function that references
lastHighlightIdx.current, ensureWordIndex(), wordIndexRef.current, and
CSS.highlights). If you intended the effect to re-run when the word text
changes, instead explicitly reference currentWord inside the effect; otherwise
drop it to reflect actual dependencies.
In `@components/features/tts-player.tsx`:
- Around line 102-106: The UI uses a hard-coded "3" in the TTS limit display
which will fall out of sync with the shared limit; update the TTSPlayer
component (components/features/tts-player.tsx) to use a single source of truth:
either accept a prop like maxUsage on the TTSPlayer component and pass the value
from where useTTS is used, or import the shared FREE_TTS_LIMIT constant from
use-tts.ts and replace the literal 3 with that constant (ensure typing/exports
allow import); update any callers to provide the prop if you choose the prop
approach so the denominator always matches the logic in useTTS/use-tts.ts.
- Around line 163-196: The rate menu lacks keyboard accessibility—add an
onKeyDown handler on the menu container (the element rendered when showRateMenu
is true) that listens for Escape and calls setShowRateMenu(false) to close the
menu; additionally, when toggling the menu via the rate button (the one using
rateMenuRef and setShowRateMenu), move focus into the menu (e.g., first
RATE_OPTIONS button) on open and restore focus to the trigger button on close,
and ensure each RATE_OPTIONS button remains keyboard-focusable and calls
onRateChange(r) as now.
In `@docs/TTS.md`:
- Around line 238-241: Update the TTS docs to remove the claim about a
"hardcoded trusted-client token" and the WebSocket
`wss://speech.platform.bing.com/...`/Edge token workflow; instead document the
supported, supported-auth method(s) actually used by the codebase (e.g., Azure
Speech Service) and reference the environment variables AZURE_SPEECH_KEY and
AZURE_SPEECH_REGION (or other auth config symbols present) as the correct setup,
and add a short note warning not to rely on reverse-engineered tokens and that
those are not supported or recommended for production.
In `@lib/azure-tts-ws.ts`:
- Around line 197-199: The WebSocket handshake code in lib/azure-tts-ws.ts
currently skips verifying Sec-WebSocket-Accept; re-enable strict verification by
reading the request's Sec-WebSocket-Key, computing expectedAccept =
base64(sha1(secWebSocketKey + "258EAFA5-E914-47DA-95CA-C5AB0DC85B11")), and
comparing it against the Sec-WebSocket-Accept header returned — if they differ,
treat the upgrade as failed and close the connection; update the
upgrade/handshake function (the code that issues the HTTP upgrade and processes
headers) to perform this check and optionally add a clear opt-out flag (e.g.,
allowInsecureHandshake) that logs a warning when set.
In `@lib/hooks/use-tts.ts`:
- Around line 126-168: trackPlayback currently calls setProgress,
setCurrentTime, and setDuration on every animation frame; add a throttling ref
(e.g., lastStateUpdateRef) and only call those three state setters when enough
time has passed (e.g., 250ms via performance.now() or timeMs) while leaving the
word-boundary tracking (lastWordIdxRef, setCurrentWordIndex, setCurrentWord)
running at full frame rate; create and use lastStateUpdateRef in the hook, check
elapsed time before invoking setProgress/setCurrentTime/setDuration, and update
lastStateUpdateRef when you do so, keeping animFrameRef and the
forward-scan/binary-search logic unchanged.
In `@server/env.ts`:
- Around line 54-56: The TTS env vars MAX_CONCURRENT_TTS, MAX_TTS_PER_USER, and
TTS_SLOT_TIMEOUT_MS are defined with z.coerce.number().optional(), producing
number|undefined; change each to z.coerce.number().default(...) to match the
other env vars and guarantee a typed number: set MAX_CONCURRENT_TTS.default(20),
MAX_TTS_PER_USER.default(2), and TTS_SLOT_TIMEOUT_MS.default(15000) so
downstream code no longer needs to handle undefined.
In `@server/index.ts`:
- Around line 32-36: The unhealthy /health response omits TTS stats; update the
health-check handler that builds the unhealthy response to include tts:
getTTSSlotStats() just like the healthy branch does. Locate the
configureTTSLimiter setup and the getTTSSlotStats() accessor, then add
getTTSSlotStats() into the unhealthy response object so both success and failure
branches return TTS slot metrics for diagnostics.
In `@server/routes/tts.ts`:
- Around line 155-173: The Redis check in the tts usage flow
(monthKey/getMonthKey, redisKey, FREE_TTS_LIMIT) is vulnerable to a TOCTOU race
because it calls redis.get(...) then redis.incr(...); change the logic to
perform an atomic increment-first check: call redis.incr(redisKey) (or use a
small Lua script that INCR and set TTL atomically), inspect the returned value
and if it exceeds FREE_TTS_LIMIT immediately DECR the key (or return disallowed)
and only set the expire when the counter transitions from 1 to 2 (or via the Lua
script set TTL on first increment); this ensures correct limits without a
separate get/incr race.
# Conflicts: # components/features/mobile-bottom-bar.tsx # components/features/proxy-content.tsx
… closures, update error message
Additional Comments (1)
client and server compute different SHA-256 keys for identical content, causing cache misses and duplicate API calls |
… add Xing VBR header
Additional Comments (2)
update to "(300s timeout)" to match actual timeout value
update to "300s timeout" to match code |
Additional Comments (6)
Deduplication should use the article cache key (hash of cleaned text) instead of user-provided
Use a transaction or Lua script to make this atomic.
|
Summary by CodeRabbit
New Features
Documentation