v0.2.1#2
Conversation
Bump package version and add handling for oversized Matrix messages and thread reply metadata. sendRoomMessage now accepts an explicit reply target and applies thread reply fallback metadata when posting into Matrix threads. If a Matrix send fails with M_TOO_LARGE/413 for rich-text content, the adapter will split splittable text content into UTF-8 byte-sized chunks (DEFAULT_OVERSIZED_MESSAGE_CHUNK_BYTES = 12000) and retry as plain-text message chunks, logging a warning; non-text oversized errors are rethrown and logged. Added helper methods for splitting by UTF-8 bytes, detecting splittable content, and improved error handling and client send logic. Updated and added tests to cover reply fallbacks, explicit reply targets, chunked retries, and M_TOO_LARGE behavior. Version bumped to 0.2.1.
Move @chat-adapter/state-memory and @chat-adapter/state-redis from dependencies to devDependencies in package.json because they are only needed for development/test environments.
Multiple refactors and bug fixes in MatrixAdapter: - Send logic: pass undefined for reply target on extra contents, apply thread/reply metadata before sending, use performSendRoomMessage and persist secrets where appropriate. - Message chunking: replace utf8ByteLength helper with Buffer.byteLength and simplify boundary search for splitting oversized messages. - Session keys: remove sessionBasePrefix indirection and use persistenceSessionPrefix directly for session storage keys and username temporary key. - Eviction and state fixes: only evict processedTimelineEventIDs and reactionByEventID when sizes exceed 10,000, avoiding unnecessary evictions; avoid reassigning userID during login device lookup; simplify joinRoomWithRetry error handling. - HTML parsing: stop stripping reply fallback markup in parseMatrixHTML and remove the unused stripReplyFallbackFromHTML helper. - Logging: ensure matrixSDKLogConfigured is set when configuring SDK logger. These changes improve correctness, reduce unexpected state mutations, and optimize memory/IO behavior during message sending.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds thread-reply metadata support (sets m.in_reply_to with optional matrixReplyToEventId), retries oversized text messages rejected with M_TOO_LARGE by splitting into plain-text chunks, introduces new config, cursor, message helper, and shared utils modules, updates tests, and bumps package to 0.2.1 with chat SDK ^4.25.0. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant MatrixAdapter
participant MatrixServer
Client->>MatrixAdapter: postMessage(content, optional matrixReplyToEventId)
MatrixAdapter->>MatrixServer: sendEvent(rich-text / formatted)
alt MatrixServer responds M_TOO_LARGE / 413
MatrixAdapter->>MatrixAdapter: splitOversizedTextContent → plain-text chunks
loop for each chunk
MatrixAdapter->>MatrixAdapter: applyThreadReplyMetadata(chunk, replyEventID)
MatrixAdapter->>MatrixServer: sendEvent(plain-text chunk)
MatrixServer-->>MatrixAdapter: success
end
MatrixAdapter-->>Client: return first successful response
else success
MatrixServer-->>MatrixAdapter: success
MatrixAdapter-->>Client: return response
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ❌ 3❌ Failed checks (2 warnings, 1 inconclusive)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/index.ts`:
- Line 437: The extra attachment events still get m.in_reply_to because
sendRoomMessage is being called with undefined for the replyEventID so
applyThreadReplyMetadata resolves replyEventID ?? rootEventID; change the calls
to pass null (explicitly disable reply fallback) instead of undefined when you
intend no reply, e.g. update the sendRoomMessage invocation(s) around
sendRoomMessage(roomID, rootEventID, undefined, content) to use null, and
similarly fix the other occurrences noted (around the block at lines
~2642-2652); alternatively, update applyThreadReplyMetadata to treat undefined
as "no explicit reply" only if you prefer that semantic—reference
sendRoomMessage and applyThreadReplyMetadata and ensure m.in_reply_to is omitted
for attachments when no explicit reply ID is provided.
- Around line 1516-1547: sendRoomMessage currently returns only the server
response, so callers (e.g. postMessage) can't know the actual payload that was
accepted after applyThreadReplyMetadata or after splitting on M_TOO_LARGE;
update sendRoomMessage (and its helpers) to return both the response and the
actual content/event sent. Specifically, change the return shape of
sendRoomMessage to include a "sentContent" (or "sentEvent") alongside the
response from performSendRoomMessage; when you call applyThreadReplyMetadata,
use that modified content as the sentContent; in the M_TOO_LARGE branch, capture
the actual chunk content associated with the first successful response from
performSendRoomMessage (use firstResponse and the corresponding chunkWithMeta)
and return that chunk as sentContent; ensure maybePersistSecretsBundle() is
still invoked and update callers (e.g. postMessage) to consume the new returned
sentContent instead of assuming the original firstContent.
- Around line 2727-2781: The splitter currently uses UTF-16 indexing
(slice/text[i]) which can split surrogate pairs (emoji/astral chars); update
splitTextByUtf8Bytes and findSplitBoundary to be surrogate-safe by detecting
UTF-16 surrogate pairs with charCodeAt: when computing a boundary, if boundary >
0 and text.charCodeAt(boundary) is a low surrogate (0xDC00–0xDFFF) then move
boundary one left so you don't split the pair (or conversely if
text.charCodeAt(boundary-1) is a high surrogate (0xD800–0xDBFF) adjust
similarly); apply the same check after trimming head/tail to ensure head !==
remaining and before pushing chunks; you can keep using Buffer.byteLength for
byte checks but must never cut between a high and low surrogate in
findSplitBoundary and splitTextByUtf8Bytes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e502bd78-811b-463c-96f8-5f32740110f0
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (4)
CHANGELOG.mdpackage.jsonsrc/index.test.tssrc/index.ts
| private splitTextByUtf8Bytes(text: string, maxBytes: number): string[] { | ||
| const normalizedMaxBytes = Math.max(1, Math.floor(maxBytes)); | ||
| const chunks: string[] = []; | ||
| let remaining = text; | ||
|
|
||
| while (remaining.length > 0) { | ||
| if (Buffer.byteLength(remaining, "utf8") <= normalizedMaxBytes) { | ||
| chunks.push(remaining); | ||
| remaining = ""; | ||
| break; | ||
| } | ||
|
|
||
| const boundary = this.findSplitBoundary(remaining, normalizedMaxBytes); | ||
| const head = remaining.slice(0, boundary).trimEnd(); | ||
| const tail = remaining.slice(boundary).trimStart(); | ||
|
|
||
| if (!head || head === remaining) { | ||
| break; | ||
| } | ||
|
|
||
| chunks.push(head); | ||
| remaining = tail; | ||
| } | ||
|
|
||
| if (remaining.length > 0 && chunks.at(-1) !== remaining) { | ||
| chunks.push(remaining); | ||
| } | ||
|
|
||
| return chunks.filter((chunk) => chunk.length > 0); | ||
| } | ||
|
|
||
| private findSplitBoundary(text: string, maxBytes: number): number { | ||
| let low = 1; | ||
| let high = text.length; | ||
| let best = 1; | ||
|
|
||
| while (low <= high) { | ||
| const mid = Math.floor((low + high) / 2); | ||
| const candidate = text.slice(0, mid); | ||
| if (Buffer.byteLength(candidate, "utf8") <= maxBytes) { | ||
| best = mid; | ||
| low = mid + 1; | ||
| } else { | ||
| high = mid - 1; | ||
| } | ||
| } | ||
|
|
||
| for (let i = best - 1; i > 0; i--) { | ||
| const ch = text[i]; | ||
| if (ch === "\n" || ch === " " || ch === "\t" || ch === "\r") { | ||
| return i + 1; | ||
| } | ||
| } | ||
|
|
||
| return best; |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Does JavaScript String.prototype.slice index UTF-16 code units, and can slicing between a surrogate pair corrupt emoji characters?
💡 Result:
Yes, JavaScript's String.prototype.slice indexes UTF-16 code units, and slicing between a surrogate pair can corrupt emoji characters by splitting them into lone surrogates, which often display as the replacement character � or broken visuals.
Citations:
- 1: https://stackoverflow.com/questions/62341685/javascript-unicode-aware-string-slice
- 2: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/slice
- 3: https://medium.com/@zviazran/javascript-unicode-pitfall-why-str-length-breaks-for-emojis-and-how-to-fix-it-115a25ab965e
- 4: https://tc39.es/ecma262/multipage/text-processing.html
- 5: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global%5FObjects/String
String slicing can corrupt emoji and other astral characters at UTF-8 byte boundaries.
slice() and string indexing (text[i]) operate on UTF-16 code units, not Unicode code points. If a 12 KB byte boundary lands between a surrogate pair (which represents emoji and other astral plane characters), the chunks will be corrupted with lone surrogates. Use a code-point/grapheme-aware splitter instead, or add safeguards to ensure splits never occur within surrogate pairs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/index.ts` around lines 2727 - 2781, The splitter currently uses UTF-16
indexing (slice/text[i]) which can split surrogate pairs (emoji/astral chars);
update splitTextByUtf8Bytes and findSplitBoundary to be surrogate-safe by
detecting UTF-16 surrogate pairs with charCodeAt: when computing a boundary, if
boundary > 0 and text.charCodeAt(boundary) is a low surrogate (0xDC00–0xDFFF)
then move boundary one left so you don't split the pair (or conversely if
text.charCodeAt(boundary-1) is a high surrogate (0xD800–0xDBFF) adjust
similarly); apply the same check after trimming head/tail to ensure head !==
remaining and before pushing chunks; you can keep using Buffer.byteLength for
byte checks but must never cut between a high and low surrogate in
findSplitBoundary and splitTextByUtf8Bytes.
Extract core functionality out of src/index.ts into focused modules to reduce file size and improve organization: add src/config.ts (config parsing/validation and env helpers), src/history/cursor.ts (thread/cursor encoding & decoding), src/messages/inbound.ts and src/messages/outbound.ts (message parsing/formatting, upload/attachment helpers), and src/shared/utils.ts (general helpers). Update imports throughout index.ts to use the new modules and remove duplicated helper implementations. Intended to be a pure refactor with no behavioral changes: centralizes logic for easier testing and maintenance and simplifies createMatrixAdapter/config handling.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (3)
src/index.ts (2)
369-377:⚠️ Potential issue | 🟠 MajorAttachments still inherit
m.in_reply_toin threaded sends.The extra attachments are sent with
replyEventID: undefined(line 376), butapplyThreadReplyMetadatausesreplyEventID ?? rootEventID, so attachments still getm.in_reply_topointing at the thread root whenrootEventIDexists. This was flagged in a previous review and remains unaddressed.To disable reply fallback for attachments, pass
nullinstead ofundefinedand updateapplyThreadReplyMetadatato treatnullas "explicitly no reply".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/index.ts` around lines 369 - 377, The extra attachments inherit m.in_reply_to because sendRoomMessage is called with replyEventID undefined; change the loop to call this.sendRoomMessage(roomID, rootEventID, null, content) so attachments explicitly opt out of reply fallback, and update applyThreadReplyMetadata to treat replyEventID === null as "explicitly no reply" (i.e. don't fallback to rootEventID when replyEventID is null, only fallback when replyEventID is undefined or missing); update any related type signatures for sendRoomMessage and applyThreadReplyMetadata to accept null for replyEventID.
1408-1462:⚠️ Potential issue | 🟠 Major
sendRoomMessagedoesn't return the actual sent content.As noted in a previous review, this method returns only the server response, not the potentially modified content (after
applyThreadReplyMetadataor after splitting onM_TOO_LARGE). Callers likepostMessagefall back to the originalfirstContentfor building the synthetic event, which may not reflect reply metadata or the actual chunked plain-text that was sent.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/index.ts` around lines 1408 - 1462, sendRoomMessage currently returns only the server response but not the actual content sent after applyThreadReplyMetadata and after splitting for M_TOO_LARGE, so update sendRoomMessage to return both the Matrix response and the final outbound content that was actually sent (e.g. an object { response, sentContent } or tuple). Inside sendRoomMessage, after creating threadContent via applyThreadReplyMetadata and also after producing chunkWithMeta when handling splitOversizedTextContent, capture the actual content used for the first successful send (from performSendRoomMessage) and include that as sentContent in the return value; update callers (e.g. postMessage) to use the returned sentContent instead of the original firstContent when building synthetic events. Ensure references to performSendRoomMessage, applyThreadReplyMetadata, splitOversizedTextContent and sendRoomMessage are updated accordingly.src/messages/outbound.ts (1)
507-562:⚠️ Potential issue | 🟡 MinorString slicing can corrupt emoji and other astral characters.
As noted in a previous review,
slice()operates on UTF-16 code units, not Unicode code points. If a split boundary lands between a surrogate pair (which represents emoji and other astral plane characters), the chunks will contain lone surrogates that display as replacement characters (�).The
findSplitBoundaryfunction should check for surrogate pairs and adjust the boundary accordingly.🛡️ Suggested fix for surrogate-safe splitting
function findSplitBoundary(text: string, maxBytes: number): number { let low = 1; let high = text.length; let best = 1; while (low <= high) { const mid = Math.floor((low + high) / 2); const candidate = text.slice(0, mid); if (Buffer.byteLength(candidate, "utf8") <= maxBytes) { best = mid; low = mid + 1; } else { high = mid - 1; } } + // Avoid splitting between surrogate pairs + if (best > 0 && best < text.length) { + const code = text.charCodeAt(best); + // If we're at a low surrogate (0xDC00-0xDFFF), move back to not split the pair + if (code >= 0xDC00 && code <= 0xDFFF) { + best = best - 1; + } + } + for (let i = best - 1; i > 0; i--) { const ch = text[i]; if (ch === "\n" || ch === " " || ch === "\t" || ch === "\r") { return i + 1; } } return best; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/messages/outbound.ts` around lines 507 - 562, splitTextByUtf8Bytes/findSplitBoundary currently may slice between UTF-16 surrogate pairs (corrupting emoji); update findSplitBoundary to detect when the chosen boundary falls between a high surrogate (0xD800–0xDBFF) and a low surrogate (0xDC00–0xDFFF) and adjust the boundary (e.g., move boundary back one code unit so the high surrogate stays with its low surrogate) before returning; alternatively iterate using code-point-aware methods (Array.from or String.prototype[@@iterator]) when computing slices in splitTextByUtf8Bytes to ensure you never call slice at a code-point boundary that splits a surrogate pair while preserving the existing byte-length checks.
🧹 Nitpick comments (2)
src/shared/utils.ts (2)
3-13: Minor modulo bias in random character selection.The expression
bytes[i] % alphabet.lengthintroduces slight modulo bias since256 % 36 = 4. Characters A-D appear ~2.8% more frequently than others. For device ID generation, this is acceptable, but for cryptographic purposes it would not be.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/shared/utils.ts` around lines 3 - 13, The generateDeviceID function uses bytes[i] % alphabet.length which causes a small modulo bias; change generation in generateDeviceID to use rejection sampling: compute a max usable byte value (floor(256 / alphabet.length) * alphabet.length - 1 or equivalently 256 - (256 % alphabet.length)), loop pulling bytes until value <= max and map value % alphabet.length to index; reference symbols: generateDeviceID, alphabet, bytes, randomBytes and the loop over i to implement the unbiased selection.
29-31: The nullish coalescing fallback is unreachable.
split(":")[0]always returns a string (possibly empty), so?? userIDnever executes. Consider removing it for clarity, or keep it as defensive coding.♻️ Optional simplification
export function matrixLocalpart(userID: string): string { - return userID.startsWith("@") ? userID.slice(1).split(":")[0] ?? userID : userID; + return userID.startsWith("@") ? userID.slice(1).split(":")[0] : userID; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/shared/utils.ts` around lines 29 - 31, The matrixLocalpart function uses a nullish coalescing fallback (?? userID) after split(":")[0], but split always yields a string so that fallback is unreachable; update matrixLocalpart to remove the unnecessary "?? userID" and return the computed localpart directly (i.e., return userID.startsWith("@") ? userID.slice(1).split(":")[0] : userID), or if you want defensive behavior instead handle the empty-string case explicitly; locate and edit the matrixLocalpart function to apply this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/history/cursor.ts`:
- Around line 28-38: The decodeThreadId function currently allows an empty
roomID when threadId is "matrix:" because parts[1] can be an empty string;
update decodeThreadId to validate that after decoding parts[1] the roomID is
non-empty (e.g., if !roomID throw an Error with a clear message) so that invalid
thread IDs are rejected; reference the existing symbols threadId, parts,
MATRIX_PREFIX, roomID and rootEventID when making the change.
In `@src/messages/outbound.ts`:
- Around line 86-112: The function applyThreadReplyMetadata currently uses
replyEventID ?? rootEventID which treats undefined the same as omission and
causes attachments to inherit replies; change the signature to accept
replyEventID: string | null | undefined and alter the logic so that if
replyEventID === null you do not add "m.in_reply_to" at all (explicitly disable
reply), otherwise set event_id to replyEventID ?? rootEventID (maintaining the
old fallback when replyEventID is undefined); update any call sites to pass null
when they intend to suppress reply fallback and adjust the type annotation for
replyEventID in applyThreadReplyMetadata accordingly.
---
Duplicate comments:
In `@src/index.ts`:
- Around line 369-377: The extra attachments inherit m.in_reply_to because
sendRoomMessage is called with replyEventID undefined; change the loop to call
this.sendRoomMessage(roomID, rootEventID, null, content) so attachments
explicitly opt out of reply fallback, and update applyThreadReplyMetadata to
treat replyEventID === null as "explicitly no reply" (i.e. don't fallback to
rootEventID when replyEventID is null, only fallback when replyEventID is
undefined or missing); update any related type signatures for sendRoomMessage
and applyThreadReplyMetadata to accept null for replyEventID.
- Around line 1408-1462: sendRoomMessage currently returns only the server
response but not the actual content sent after applyThreadReplyMetadata and
after splitting for M_TOO_LARGE, so update sendRoomMessage to return both the
Matrix response and the final outbound content that was actually sent (e.g. an
object { response, sentContent } or tuple). Inside sendRoomMessage, after
creating threadContent via applyThreadReplyMetadata and also after producing
chunkWithMeta when handling splitOversizedTextContent, capture the actual
content used for the first successful send (from performSendRoomMessage) and
include that as sentContent in the return value; update callers (e.g.
postMessage) to use the returned sentContent instead of the original
firstContent when building synthetic events. Ensure references to
performSendRoomMessage, applyThreadReplyMetadata, splitOversizedTextContent and
sendRoomMessage are updated accordingly.
In `@src/messages/outbound.ts`:
- Around line 507-562: splitTextByUtf8Bytes/findSplitBoundary currently may
slice between UTF-16 surrogate pairs (corrupting emoji); update
findSplitBoundary to detect when the chosen boundary falls between a high
surrogate (0xD800–0xDBFF) and a low surrogate (0xDC00–0xDFFF) and adjust the
boundary (e.g., move boundary back one code unit so the high surrogate stays
with its low surrogate) before returning; alternatively iterate using
code-point-aware methods (Array.from or String.prototype[@@iterator]) when
computing slices in splitTextByUtf8Bytes to ensure you never call slice at a
code-point boundary that splits a surrogate pair while preserving the existing
byte-length checks.
---
Nitpick comments:
In `@src/shared/utils.ts`:
- Around line 3-13: The generateDeviceID function uses bytes[i] %
alphabet.length which causes a small modulo bias; change generation in
generateDeviceID to use rejection sampling: compute a max usable byte value
(floor(256 / alphabet.length) * alphabet.length - 1 or equivalently 256 - (256 %
alphabet.length)), loop pulling bytes until value <= max and map value %
alphabet.length to index; reference symbols: generateDeviceID, alphabet, bytes,
randomBytes and the loop over i to implement the unbiased selection.
- Around line 29-31: The matrixLocalpart function uses a nullish coalescing
fallback (?? userID) after split(":")[0], but split always yields a string so
that fallback is unreachable; update matrixLocalpart to remove the unnecessary
"?? userID" and return the computed localpart directly (i.e., return
userID.startsWith("@") ? userID.slice(1).split(":")[0] : userID), or if you want
defensive behavior instead handle the empty-string case explicitly; locate and
edit the matrixLocalpart function to apply this change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 31f4de8e-51dd-471a-a5f2-d200421c441f
📒 Files selected for processing (6)
src/config.tssrc/history/cursor.tssrc/index.tssrc/messages/inbound.tssrc/messages/outbound.tssrc/shared/utils.ts
📜 Review details
🧰 Additional context used
🪛 ast-grep (0.42.1)
src/shared/utils.ts
[warning] 41-44: Manual HTML sanitization detected using string replacement methods. Manual sanitization is error-prone and can be bypassed. Use dedicated HTML sanitization libraries like 'sanitize-html' or 'DOMPurify' instead.
Context: value
.replaceAll("&", "&")
.replaceAll("<", "<")
.replaceAll(">", ">")
Note: [CWE-79] Improper Neutralization of Input During Web Page Generation [REFERENCES]
- https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html
- https://cwe.mitre.org/data/definitions/79.html
(manual-html-sanitization)
[warning] 41-43: Manual HTML sanitization detected using string replacement methods. Manual sanitization is error-prone and can be bypassed. Use dedicated HTML sanitization libraries like 'sanitize-html' or 'DOMPurify' instead.
Context: value
.replaceAll("&", "&")
.replaceAll("<", "<")
Note: [CWE-79] Improper Neutralization of Input During Web Page Generation [REFERENCES]
- https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html
- https://cwe.mitre.org/data/definitions/79.html
(manual-html-sanitization)
🔇 Additional comments (12)
src/shared/utils.ts (1)
41-48: Static analysis warning is a false positive here.The
escapeHTMLfunction performs correct output encoding by escaping the five critical HTML characters (&,<,>,",'). The static analysis warning about "manual HTML sanitization" applies to parsing/cleaning untrusted HTML input, not to escaping plain text for safe HTML output. This implementation follows the OWASP recommendation for output encoding and is appropriate for its intended use (converting plain text to safe HTML entities forformatted_body).src/messages/inbound.ts (3)
34-63: LGTM - Well-structured parsing with proper fallback handling.The
parseMatrixContentfunction correctly prioritizesformatted_bodyoverbody, strips reply fallbacks, and aggregates mentions from bothm.mentionsand inlinematrix.tolinks. The fallback to plain text with proper msgtype handling for emotes is correct.
122-196: Solid HTML-to-Markdown conversion.The recursive rendering handles common HTML elements well. One observation: text nodes (line 127) return
node.textdirectly without markdown escaping, which is correct since the source is already HTML-encoded content that will be interpreted as markdown.
278-290: Reply fallback stripping logic correctly handles edge cases.The body stripping properly requires a blank line separator after quoted lines and preserves the original body when the format doesn't match the expected reply fallback pattern.
src/config.ts (2)
23-50: LGTM - Comprehensive validation with clear error messages.The validation correctly enforces required fields, positive numeric constraints, and the mutual dependency between
encryptanddecryptfunctions.
72-101: LGTM - Clear auth resolution with helpful error message.The function properly prioritizes password auth when both username and password are present, and provides a clear error message explaining the two authentication options when neither is configured.
src/history/cursor.ts (1)
52-109: LGTM - Thorough cursor validation with descriptive errors.The
decodeCursorV1function performs comprehensive validation including prefix, JSON parsing, expected kind/room/direction matching, and thread context verification. The error messages are specific and helpful for debugging.src/index.ts (2)
1715-1717: LGTM - Eviction logic is now properly threshold-based.The eviction is correctly triggered only when collection size exceeds 10,000, rather than on every insert. The
evictOldestEntrieshelper handles the eviction down to the target size of 5,000.Also applies to: 1955-1957
2668-2689: LGTM - SDK logging and environment-based config creation.The
matrixSDKLogConfiguredflag is now correctly set only inside thesetLevelbranch, andcreateMatrixAdapter()properly delegates tocreateMatrixAdapterConfigFromEnv()for environment-based configuration.src/messages/outbound.ts (3)
144-161: LGTM - Clean message content construction.The
toRoomMessageContentfunction properly builds the Matrix message content with conditionalformat/formatted_bodyandm.mentionsfields.
353-378: LGTM - Comprehensive message rendering with proper type handling.The
renderTextMessagefunction handles all adapter message variants (string, card, raw, markdown, AST) with appropriate fallbacks.
457-463: No type issues present. The marked library (v14.0.0+) includes explicit TypeScript overloads that correctly typemarked.parsewith{ async: false }as returningstring, notstring | Promise<string>. Since the code explicitly passesasync: false, TypeScript will infer the correct return type, making the function signature accurate with no type conflicts.> Likely an incorrect or invalid review comment.
Omit reply fallback metadata for follow-up attachments and avoid cutting UTF-16 surrogate pairs when splitting oversized text. sendRoomMessage now returns both the send response and the actual sent content so resolved events use the correct content; applyThreadReplyMetadata treats replyEventID === null as “do not attach reply metadata” (fixes attachments in threads). Text-splitting logic was hardened: surrogate-aware boundary clamping, trim helpers, and adjusted findSplitBoundary ensure emojis and other surrogate pairs are preserved across chunks. Other refactors: moved matrixMentionDisplayText to shared utils, parallelized attachment upload prep with Promise.all, removed thin cursor wrapper methods in favor of direct helpers, and tightened secrets-bundle persist throttling. Added tests for reply-fallback omission and surrogate-safe splitting.
|
@coderabbitai check all unresolved issues plz |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 63670
🟡 1. Surrogate-pair corruption in text splitter —
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/messages/inbound.ts (1)
132-134: Text nodes are rendered without markdown escaping.In
renderHTMLNodeToMarkdown, text nodes are returned verbatim (node.text). If the HTML contains special markdown characters (e.g.,*,_,`), they'll be interpreted as markdown formatting after conversion. This may produce unintended formatting when the original HTML had literal asterisks or underscores in text.Consider whether this is intentional. If preserving literal characters matters, escape markdown-special characters in text nodes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/messages/inbound.ts` around lines 132 - 134, In renderHTMLNodeToMarkdown, text nodes are returned verbatim (check the NodeType.TEXT_NODE branch that returns node.text), which allows markdown-special characters to be interpreted; update that branch to escape markdown-special characters before returning (e.g., implement or call an escapeMarkdown helper that replaces/backslashes characters like *, _, `, [, ], <, >, and backslash itself) so literal text from node.text is preserved; ensure the helper is used only for TEXT_NODE handling inside renderHTMLNodeToMarkdown to avoid double-escaping elsewhere.src/messages/outbound.ts (1)
73-84: Consider supporting explicitnullreturn for "no reply" semantics.The function returns
undefinedwhen no reply ID is found, butapplyThreadReplyMetadatadistinguishes betweennull(explicitly no reply) andundefined(use root fallback). If messages should be able to explicitly opt out of reply metadata, consider supporting anullvalue inmatrixReplyToEventId.This is only relevant if there's a use case for posting to a thread without any
m.in_reply_to. If not needed, the current implementation is fine.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/messages/outbound.ts` around lines 73 - 84, extractReplyEventID currently only returns a string or undefined, but applyThreadReplyMetadata treats null as an explicit "no reply" sentinel; update extractReplyEventID to return string | null | undefined and treat a message.matrixReplyToEventId === null as an explicit null (i.e., return null) while still returning a non-empty string when present and undefined when the property is absent; change the runtime check in extractReplyEventID (function name: extractReplyEventID) to detect and return null for explicit null values and update any call sites or types that consume extractReplyEventID (e.g., applyThreadReplyMetadata) to accept the new null return case.src/shared/utils.ts (1)
8-10: Minor modulo bias in device ID generation.
bytes[i] % 36introduces slight bias since 256 is not evenly divisible by 36. Characters at indices 0-3 (A-D) have ~2.8% probability vs ~2.7% for others. For an 8-character device ID used purely as an identifier, this is unlikely to cause practical issues.For stricter uniformity, consider rejection sampling or using a power-of-2 alphabet size.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/shared/utils.ts` around lines 8 - 10, The device ID loop that maps random bytes into alphabet characters using bytes[i] % alphabet.length causes a small modulo bias; update the generation in the function that builds `out` from `bytes`/`alphabet` to use rejection sampling: for each character, draw a random byte and accept it only if byte < (Math.floor(256 / alphabet.length) * alphabet.length), otherwise redraw, then use that byte % alphabet.length to index `alphabet`; alternatively switch to an alphabet whose length is a power of two so you can safely mask bits instead of using modulus.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/messages/inbound.ts`:
- Around line 90-95: The current mention detection builds usernameMention even
when userName is empty, so "@" will match any @ in parsed.text or formatted and
produce false positives; change the logic in the username detection so you first
validate userName (e.g., non-empty after trim) and only construct
usernameMention and evaluate hasUserName when userName is present—otherwise set
hasUserName to false; update the variables usernameMention, userName, and
hasUserName usage in this block to short-circuit when userName is empty.
---
Nitpick comments:
In `@src/messages/inbound.ts`:
- Around line 132-134: In renderHTMLNodeToMarkdown, text nodes are returned
verbatim (check the NodeType.TEXT_NODE branch that returns node.text), which
allows markdown-special characters to be interpreted; update that branch to
escape markdown-special characters before returning (e.g., implement or call an
escapeMarkdown helper that replaces/backslashes characters like *, _, `, [, ],
<, >, and backslash itself) so literal text from node.text is preserved; ensure
the helper is used only for TEXT_NODE handling inside renderHTMLNodeToMarkdown
to avoid double-escaping elsewhere.
In `@src/messages/outbound.ts`:
- Around line 73-84: extractReplyEventID currently only returns a string or
undefined, but applyThreadReplyMetadata treats null as an explicit "no reply"
sentinel; update extractReplyEventID to return string | null | undefined and
treat a message.matrixReplyToEventId === null as an explicit null (i.e., return
null) while still returning a non-empty string when present and undefined when
the property is absent; change the runtime check in extractReplyEventID
(function name: extractReplyEventID) to detect and return null for explicit null
values and update any call sites or types that consume extractReplyEventID
(e.g., applyThreadReplyMetadata) to accept the new null return case.
In `@src/shared/utils.ts`:
- Around line 8-10: The device ID loop that maps random bytes into alphabet
characters using bytes[i] % alphabet.length causes a small modulo bias; update
the generation in the function that builds `out` from `bytes`/`alphabet` to use
rejection sampling: for each character, draw a random byte and accept it only if
byte < (Math.floor(256 / alphabet.length) * alphabet.length), otherwise redraw,
then use that byte % alphabet.length to index `alphabet`; alternatively switch
to an alphabet whose length is a power of two so you can safely mask bits
instead of using modulus.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d0996a19-c97e-476d-84d0-4b184cbb657a
📒 Files selected for processing (5)
src/index.test.tssrc/index.tssrc/messages/inbound.tssrc/messages/outbound.tssrc/shared/utils.ts
📜 Review details
🧰 Additional context used
🪛 ast-grep (0.42.1)
src/shared/utils.ts
[warning] 41-44: Manual HTML sanitization detected using string replacement methods. Manual sanitization is error-prone and can be bypassed. Use dedicated HTML sanitization libraries like 'sanitize-html' or 'DOMPurify' instead.
Context: value
.replaceAll("&", "&")
.replaceAll("<", "<")
.replaceAll(">", ">")
Note: [CWE-79] Improper Neutralization of Input During Web Page Generation [REFERENCES]
- https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html
- https://cwe.mitre.org/data/definitions/79.html
(manual-html-sanitization)
[warning] 41-43: Manual HTML sanitization detected using string replacement methods. Manual sanitization is error-prone and can be bypassed. Use dedicated HTML sanitization libraries like 'sanitize-html' or 'DOMPurify' instead.
Context: value
.replaceAll("&", "&")
.replaceAll("<", "<")
Note: [CWE-79] Improper Neutralization of Input During Web Page Generation [REFERENCES]
- https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html
- https://cwe.mitre.org/data/definitions/79.html
(manual-html-sanitization)
🔇 Additional comments (15)
src/messages/inbound.ts (1)
1-309: LGTM!The inbound message parsing module is well-structured with clear separation of concerns. The HTML-to-markdown conversion handles common Matrix elements appropriately, and the mention extraction logic properly combines
m.mentionsmetadata with fallback link parsing.src/shared/utils.ts (2)
41-48: Static analysis hint is a false positive for this context.The
escapeHTMLfunction performs output encoding (escaping user text before embedding in HTML), not input sanitization. The five characters escaped (& < > " ') are the standard set for HTML entity encoding. This is appropriate for safely constructing HTML fragments from trusted text, which is the use case inmergeTextAndLinks.
64-77: LGTM!The
evictOldestEntriesutility correctly leverages JavaScript's guaranteed insertion-order iteration forMapandSetto implement LRU-style eviction. The early return for sizes withinmaxSizeavoids unnecessary work.src/messages/outbound.ts (3)
86-116: LGTM!The
applyThreadReplyMetadatafunction correctly distinguishes betweennull(explicitly disable reply) andundefined(userootEventIDfallback). This addresses the attachment reply inheritance concern from previous reviews.
507-541: LGTM!The
splitTextByUtf8Bytesfunction correctly handles UTF-16 surrogate pairs to avoid corrupting emoji and astral plane characters. TheclampSurrogateBoundaryhelper properly detects and adjusts split points that would break surrogate pairs. The infinite loop protection at lines 528-530 ensures the algorithm terminates.
461-467: No action needed. Withasync: falseexplicitly specified,marked.parsecorrectly returnsstringin TypeScript v14.0.0+. The type is properly narrowed via overloads, and the function signature is accurate.src/index.ts (4)
1394-1454: LGTM!The
sendRoomMessagerefactor correctly handles:
- Thread reply metadata application via
applyThreadReplyMetadataM_TOO_LARGEerror detection and retry with split plain-text chunks- Returning both
responseandsentContentfor accurate synthetic event fallback- Proper error logging and re-throwing for non-recoverable errors
The chunking retry preserves thread metadata on each chunk, maintaining thread context.
363-393: LGTM!The
postMessagemethod correctly passesnullforreplyEventIDon follow-up attachment messages (line 381), ensuring they don't inherit the reply-to metadata. This properly addresses the attachment reply inheritance concern.
1708-1710: LGTM!The cache eviction logic correctly triggers
evictOldestEntrieswhen collections exceed 10,000 entries. The shared utility handles the actual eviction down to 5,000 entries, preventing unbounded memory growth while maintaining a reasonable cache size.Also applies to: 1948-1950
403-442: LGTM!The
editMessagemethod correctly passesnullforreplyEventID, explicitly indicating no reply metadata should be added. WhileapplyThreadReplyMetadatawould already skip modification due to the existingrel_type: Replace, the explicitnullmakes the intent clear.src/index.test.ts (5)
1087-1132: LGTM!The thread reply tests comprehensively cover both the default fallback behavior (replying to root event) and explicit reply targeting via
matrixReplyToEventId. The assertions correctly verify them.relates_to.m.in_reply_to.event_idstructure.
1185-1220: LGTM!The test correctly verifies that follow-up attachment messages in a thread do not inherit
m.relates_tometadata, confirming the fix for the attachment reply inheritance issue.
1301-1413: LGTM!The oversized message retry tests provide excellent coverage:
- Verifies rich-text content is split into plain-text chunks on
M_TOO_LARGE- Confirms stripped formatting fields (
format,formatted_body,m.mentions)- Validates warning logs are emitted without error logs
- Ensures non-splittable content (images) correctly rethrows the error
1493-1509: LGTM!The surrogate-safe splitting test correctly verifies that emoji characters (UTF-16 surrogate pairs) are not corrupted during chunking. The assertions check both that chunks rejoin to the original and that no chunk boundaries fall inside surrogate pairs.
314-350: LGTM!The
makeStateAdapterhelper correctly implements the additionalStateAdaptermethods following the established pattern of wrapping base calls withafterReadyfor proper initialization handling.
Add validation to decodeThreadId to throw when the decoded room ID is empty (e.g. "matrix:") to prevent returning invalid MatrixThreadID objects. Add a unit test to assert this behavior. Also update findSplitBoundary to use clampSurrogateBoundary on its return values to avoid splitting inside Unicode surrogate pairs when chopping long outbound messages.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/messages/outbound.ts (1)
523-537:⚠️ Potential issue | 🟠 MajorDon't drop boundary whitespace while splitting.
Lines 525-526 remove whitespace on both sides of every split, so the resent chunks no longer preserve the original text. It also breaks on whitespace-only prefixes because
headbecomes empty and Line 528 aborts the split.💡 Preserve the original content
const rawHead = remaining.slice(0, boundary); const rawTail = remaining.slice(boundary); - const head = trimEndPreservingSurrogates(rawHead); - const tail = trimStartPreservingSurrogates(rawTail); + const head = rawHead; + const tail = rawTail; - if (!head || head === remaining) { + if (!head || head === remaining) { break; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/messages/outbound.ts` around lines 523 - 537, The split logic is dropping whitespace at the split boundary by using trimEndPreservingSurrogates/rawHead and trimStartPreservingSurrogates/rawTail and aborting when head is empty; change it to preserve original text by using rawHead/rawTail for chunking (or only trim the final assembled message, not each split), push rawHead into chunks and set remaining = rawTail, and remove the early-break when head === remaining so whitespace-only prefixes are preserved; reference variables: rawHead, rawTail, head, tail, trimEndPreservingSurrogates, trimStartPreservingSurrogates, chunks, remaining.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/messages/outbound.ts`:
- Around line 132-145: The code currently bails out and returns [] whenever
body.byteLength <= DEFAULT_OVERSIZED_MESSAGE_CHUNK_BYTES, which prevents the
plain-text fallback from being attempted after a formatted-message M_TOO_LARGE;
instead remove the early return and the parts.length <= 1 guard so that you
always call splitTextByUtf8Bytes(body, DEFAULT_OVERSIZED_MESSAGE_CHUNK_BYTES)
and return parts.map(part => ({ body: part, msgtype: content.msgtype })), which
will yield a single-part array when the body fits (allowing a plain-text retry)
and multiple parts when it actually needs chunking; update the logic around the
body variable, splitTextByUtf8Bytes, DEFAULT_OVERSIZED_MESSAGE_CHUNK_BYTES, and
the map return accordingly.
---
Duplicate comments:
In `@src/messages/outbound.ts`:
- Around line 523-537: The split logic is dropping whitespace at the split
boundary by using trimEndPreservingSurrogates/rawHead and
trimStartPreservingSurrogates/rawTail and aborting when head is empty; change it
to preserve original text by using rawHead/rawTail for chunking (or only trim
the final assembled message, not each split), push rawHead into chunks and set
remaining = rawTail, and remove the early-break when head === remaining so
whitespace-only prefixes are preserved; reference variables: rawHead, rawTail,
head, tail, trimEndPreservingSurrogates, trimStartPreservingSurrogates, chunks,
remaining.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5dc6f7b4-a349-4cfa-8b1f-903bd0eb4323
📒 Files selected for processing (3)
src/history/cursor.tssrc/index.test.tssrc/messages/outbound.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- src/history/cursor.ts
- src/index.test.ts
📜 Review details
🔇 Additional comments (1)
src/messages/outbound.ts (1)
86-115: Nice fix for explicit reply suppression.Allowing
nullto bypassm.in_reply_tokeeps the oldundefinedfallback without reintroducing attachment reply inheritance.
Multiple changes to improve thread metadata, E2E test setup, and message splitting behavior: - listThreads: use local room data (requireRoom/processThreadRoots/getThread) to prefer live thread metadata, fall back to fetching the latest reply when bundled summaries lag, and compute replyCount using both room and bundled data. - Added fetchLatestThreadReplySummary helper to pull a latest reply when needed. - Sending retries: treat any split chunk result as valid fallback (retry even when there's a single plain-text chunk) and keep the existing warning logging. - outbound splitting: splitOversizedTextContent now returns the original part when it already fits and preserves boundary whitespace when splitting (removed trimming logic that dropped boundary whitespace). - inbound mention handling: isMentioned ignores empty usernames and normalizes username whitespace before matching. - E2E/tests & helpers: add fake-indexeddb to tests, enable indexedDB for e2ee in createParticipantFromSession via a per-session cryptoDatabasePrefix (with sanitization), add waitForCondition async support and switch tests to waitForMatchingMessage/waitForCondition where appropriate, plus several new/updated tests for thread behavior and splitting. - package.json: add fake-indexeddb test dependency. These changes improve accuracy of thread summaries, ensure E2E crypto state isolation across sessions, and avoid losing whitespace when splitting large text messages.
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
e2e/helpers.ts (1)
142-147: Use a reversible encoding forcryptoDatabasePrefix.
sanitizeForIndexedDBName()is lossy: distinct IDs like@alice.test:hsand@alice_test:hscollapse to the same prefix. That can make different participants share the same crypto store and undermine the isolation this change is trying to add.♻️ Suggested change
function createE2EIndexedDBPrefix(session: MatrixLoginResponse): string { - return `matrix-chat-adapter-e2e-${sanitizeForIndexedDBName(session.userID)}-${sanitizeForIndexedDBName(session.deviceID)}`; + return `matrix-chat-adapter-e2e-${encodeForIndexedDBName(session.userID)}-${encodeForIndexedDBName(session.deviceID)}`; } -function sanitizeForIndexedDBName(value: string): string { - return value.replace(/[^a-zA-Z0-9_-]+/gu, "_"); +function encodeForIndexedDBName(value: string): string { + return Buffer.from(value, "utf8").toString("base64url"); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@e2e/helpers.ts` around lines 142 - 147, The current sanitizeForIndexedDBName is lossy and can map distinct IDs to the same string; replace this with a reversible, ID-safe encoding and use it from createE2EIndexedDBPrefix: implement a encodeForIndexedDBName function that performs a URL-safe Base64 encoding (base64url, no padding) of the original value to guarantee uniqueness and reversibility, update createE2EIndexedDBPrefix to call encodeForIndexedDBName for session.userID and session.deviceID instead of sanitizeForIndexedDBName, and ensure the chosen encoding produces only safe characters (A-Z a-z 0-9 - _) for IndexedDB names.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@e2e/helpers.ts`:
- Around line 296-305: The waitForCondition function can hang if the provided
async predicate never resolves; fix it by enforcing a per-iteration timeout and
by rejecting when the global timeoutMs elapses: inside waitForCondition (the
function to change), compute remaining = timeoutMs - (Date.now() - startedAt) at
the top of each loop and if remaining <= 0 throw; when awaiting condition(), use
Promise.race to race condition() against a per-call timeout Promise that rejects
after remaining (or a capped value) so a stalled predicate can't block the loop;
keep the existing polling interval logic (setTimeout with intervalMs) when
condition() resolves false.
In `@src/messages/inbound.ts`:
- Around line 246-265: parseMatrixToUserID currently calls
decodeURIComponent(firstSegment) which can throw on malformed percent-encodings
and crash parsing; wrap the decodeURIComponent call in a try/catch inside
parseMatrixToUserID (or validate percent-encoding beforehand) and return null on
decode failures so a bad matrix.to hash doesn't abort parseMatrixContent;
reference parseMatrixToUserID, firstSegment, and the identifier variable to
locate where to add the safe decode and fallback.
- Around line 226-243: The anchor text in renderHTMLLinkToMarkdown is inserted
verbatim into the generated Markdown link, which breaks if it contains
characters like ] ( ) or backslashes; update renderHTMLLinkToMarkdown to import
escapeMarkdownLinkText from ../shared/utils and use it to escape the link text
before embedding in `[text](href)` (i.e., replace uses of text in the final link
with escapeMarkdownLinkText(text || matrixMentionDisplayText(mentionedUserID))),
and ensure you still add mentionedUserID to mentionedUserIDs when applicable.
---
Nitpick comments:
In `@e2e/helpers.ts`:
- Around line 142-147: The current sanitizeForIndexedDBName is lossy and can map
distinct IDs to the same string; replace this with a reversible, ID-safe
encoding and use it from createE2EIndexedDBPrefix: implement a
encodeForIndexedDBName function that performs a URL-safe Base64 encoding
(base64url, no padding) of the original value to guarantee uniqueness and
reversibility, update createE2EIndexedDBPrefix to call encodeForIndexedDBName
for session.userID and session.deviceID instead of sanitizeForIndexedDBName, and
ensure the chosen encoding produces only safe characters (A-Z a-z 0-9 - _) for
IndexedDB names.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 69ee5f62-0e18-46dc-aad4-19b9820797ba
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (7)
e2e/e2e.test.tse2e/helpers.tspackage.jsonsrc/index.test.tssrc/index.tssrc/messages/inbound.tssrc/messages/outbound.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- package.json
- src/index.ts
📜 Review details
🔇 Additional comments (5)
src/messages/outbound.ts (2)
132-147: Nice fix: plain-text retry now works even when the body already fits.Returning a single stripped chunk here keeps the
M_TOO_LARGEfallback reachable for rich-text events whereformatted_bodyor mentions are the real size offender.
521-580: Nice fix: chunk boundaries are now surrogate-safe.Clamping both the binary-search candidate and the final boundary avoids emitting lone UTF-16 surrogates while still preserving the original text across joins.
src/index.test.ts (2)
460-473: Good regression test for empty room IDs.This pins the
"matrix:"edge case that previously slipped throughdecodeThreadId().
1323-1455: The new split-retry and text-edge-case coverage looks solid.These assertions exercise both multi-chunk and single-chunk
M_TOO_LARGEretries, plus the surrogate-pair and empty-username cases that were easy to regress.Also applies to: 1577-1623
e2e/e2e.test.ts (1)
474-479: These E2E assertions are less flaky now.Matching on event ID avoids false positives from older messages with similar text, and polling
listThreads()is a much better fit for eventually hydrated thread summaries.Also applies to: 630-663
• messages/inbound.ts: Escape markdown link text when rendering HTML anchors (escapeMarkdownLinkText) to avoid accidental markdown injection, and make parseMatrixToUserID resilient to malformed percent-encoding by wrapping decodeURIComponent in a try/catch. • e2e/helpers.ts: Replace sanitizeForIndexedDBName with base64url encoding (encodeForIndexedDBName) for safer IndexedDB keys, and rework waitForCondition to respect timeouts using Promise.race, clearing timers and rejecting on timeout. • src/index.test.ts: Add tests ensuring HTML anchor text is escaped when re-emitting markdown links and that malformed matrix.to links are ignored instead of aborting formatted-body parsing. These changes improve robustness around link handling, encoding for storage, and test coverage for edge cases.
No description provided.