Skip to content

v0.2.1#2

Merged
batuhan merged 9 commits into
mainfrom
batuhan/repply
Apr 13, 2026
Merged

v0.2.1#2
batuhan merged 9 commits into
mainfrom
batuhan/repply

Conversation

@batuhan

@batuhan batuhan commented Apr 12, 2026

Copy link
Copy Markdown
Member

No description provided.

batuhan added 4 commits April 12, 2026 14:36
Bump package version and add handling for oversized Matrix messages and thread reply metadata. sendRoomMessage now accepts an explicit reply target and applies thread reply fallback metadata when posting into Matrix threads. If a Matrix send fails with M_TOO_LARGE/413 for rich-text content, the adapter will split splittable text content into UTF-8 byte-sized chunks (DEFAULT_OVERSIZED_MESSAGE_CHUNK_BYTES = 12000) and retry as plain-text message chunks, logging a warning; non-text oversized errors are rethrown and logged. Added helper methods for splitting by UTF-8 bytes, detecting splittable content, and improved error handling and client send logic. Updated and added tests to cover reply fallbacks, explicit reply targets, chunked retries, and M_TOO_LARGE behavior. Version bumped to 0.2.1.
Move @chat-adapter/state-memory and @chat-adapter/state-redis from dependencies to devDependencies in package.json because they are only needed for development/test environments.
Multiple refactors and bug fixes in MatrixAdapter:

- Send logic: pass undefined for reply target on extra contents, apply thread/reply metadata before sending, use performSendRoomMessage and persist secrets where appropriate.
- Message chunking: replace utf8ByteLength helper with Buffer.byteLength and simplify boundary search for splitting oversized messages.
- Session keys: remove sessionBasePrefix indirection and use persistenceSessionPrefix directly for session storage keys and username temporary key.
- Eviction and state fixes: only evict processedTimelineEventIDs and reactionByEventID when sizes exceed 10,000, avoiding unnecessary evictions; avoid reassigning userID during login device lookup; simplify joinRoomWithRetry error handling.
- HTML parsing: stop stripping reply fallback markup in parseMatrixHTML and remove the unused stripReplyFallbackFromHTML helper.
- Logging: ensure matrixSDKLogConfigured is set when configuring SDK logger.

These changes improve correctness, reduce unexpected state mutations, and optimize memory/IO behavior during message sending.
@coderabbitai

coderabbitai Bot commented Apr 12, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds thread-reply metadata support (sets m.in_reply_to with optional matrixReplyToEventId), retries oversized text messages rejected with M_TOO_LARGE by splitting into plain-text chunks, introduces new config, cursor, message helper, and shared utils modules, updates tests, and bumps package to 0.2.1 with chat SDK ^4.25.0.

Changes

Cohort / File(s) Summary
Packaging & Changelog
CHANGELOG.md, package.json
Bumped package to 0.2.1; upgraded chat SDK to ^4.25.0; moved @chat-adapter/state-memory and @chat-adapter/state-redis from dependencies to devDependencies; added fake-indexeddb dev dep.
Adapter Core
src/index.ts
Thread-reply wiring (extract matrixReplyToEventId, apply m.in_reply_to), refactored send flow to route through send wrapper with try/catch, retry-on-M_TOO_LARGE by splitting text into surrogate-safe plain-text chunks and reapplying reply metadata per chunk, replaced inline helpers with imports, adjusted cache eviction thresholds, simplified join-room retry bookkeeping, and fixed Matrix SDK log configuration.
Configuration
src/config.ts
New config module: env→config resolution, auth resolution, SDK log level parsing, persistence normalization/validation, defaults (command/prefix/persistence) and helper createMatrixAdapterConfigFromEnv.
History / Cursor
src/history/cursor.ts
New cursor/thread ID utilities: encode/decode thread IDs, channelIdFromThreadId, v1 cursor serialize/validate (mxv1: base64url JSON), and mapping to SDK direction enum.
Inbound Message Parsing
src/messages/inbound.ts
New inbound parser converting Matrix content to markdown/plain text, extracting mentions from m.mentions and matrix.to links, stripping reply-fallbacks, rendering HTML→markdown, and exporting isMentioned.
Outbound Message Preparation
src/messages/outbound.ts
New outbound helpers: render adapter messages to Matrix content (plain/HTML), mention rendering and m.mentions, attachment/upload normalization and reading, reply extraction/apply (matrixReplyToEventId support), isTooLargeMatrixError, surrogate-safe splitOversizedTextContent producing plain-text chunks, and mime/msgtype helpers.
Shared Utilities
src/shared/utils.ts
New utilities: cryptographically secure device ID generator, string normalization/reading, markdown/html escaping, list normalization, IndexedDB presence check, evictOldestEntries, matrixLocalpart/mention display helpers, and isRecord guard.
Tests & Unit
src/index.test.ts, src/**/*.test.ts
Expanded StateAdapter test wrapper to include list/queue operations; added tests for decodeThreadId validation, thread-reply metadata (including explicit matrixReplyToEventId), omission of reply metadata on follow-up attachments, M_TOO_LARGE split/retry behavior and logging, surrogate-safe chunking, mention detection, and thread-list hydration behaviors.
E2E & Helpers
e2e/e2e.test.ts, e2e/helpers.ts
Adjusted e2e assertions to stricter message matching and async polling; enabled IndexedDB-backed E2EE in test helpers, added fake-indexeddb/auto import, and widened waitForCondition to accept async predicates.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant MatrixAdapter
    participant MatrixServer

    Client->>MatrixAdapter: postMessage(content, optional matrixReplyToEventId)
    MatrixAdapter->>MatrixServer: sendEvent(rich-text / formatted)
    alt MatrixServer responds M_TOO_LARGE / 413
        MatrixAdapter->>MatrixAdapter: splitOversizedTextContent → plain-text chunks
        loop for each chunk
            MatrixAdapter->>MatrixAdapter: applyThreadReplyMetadata(chunk, replyEventID)
            MatrixAdapter->>MatrixServer: sendEvent(plain-text chunk)
            MatrixServer-->>MatrixAdapter: success
        end
        MatrixAdapter-->>Client: return first successful response
    else success
        MatrixServer-->>MatrixAdapter: success
        MatrixAdapter-->>Client: return response
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

Check name Status Explanation Resolution
Description check ⚠️ Warning No pull request description was provided by the author; the PR lacks any explanation of the changes, context, or intent. Add a meaningful description that explains the key changes, their purpose, and any relevant context for reviewers.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'v0.2.1' is vague and generic—it only indicates a version bump without describing the main change or purpose of the PR. Use a more descriptive title that summarizes the primary change, such as 'v0.2.1: Add thread reply metadata and message splitting for oversized messages' or similar.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch batuhan/repply

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/index.ts`:
- Line 437: The extra attachment events still get m.in_reply_to because
sendRoomMessage is being called with undefined for the replyEventID so
applyThreadReplyMetadata resolves replyEventID ?? rootEventID; change the calls
to pass null (explicitly disable reply fallback) instead of undefined when you
intend no reply, e.g. update the sendRoomMessage invocation(s) around
sendRoomMessage(roomID, rootEventID, undefined, content) to use null, and
similarly fix the other occurrences noted (around the block at lines
~2642-2652); alternatively, update applyThreadReplyMetadata to treat undefined
as "no explicit reply" only if you prefer that semantic—reference
sendRoomMessage and applyThreadReplyMetadata and ensure m.in_reply_to is omitted
for attachments when no explicit reply ID is provided.
- Around line 1516-1547: sendRoomMessage currently returns only the server
response, so callers (e.g. postMessage) can't know the actual payload that was
accepted after applyThreadReplyMetadata or after splitting on M_TOO_LARGE;
update sendRoomMessage (and its helpers) to return both the response and the
actual content/event sent. Specifically, change the return shape of
sendRoomMessage to include a "sentContent" (or "sentEvent") alongside the
response from performSendRoomMessage; when you call applyThreadReplyMetadata,
use that modified content as the sentContent; in the M_TOO_LARGE branch, capture
the actual chunk content associated with the first successful response from
performSendRoomMessage (use firstResponse and the corresponding chunkWithMeta)
and return that chunk as sentContent; ensure maybePersistSecretsBundle() is
still invoked and update callers (e.g. postMessage) to consume the new returned
sentContent instead of assuming the original firstContent.
- Around line 2727-2781: The splitter currently uses UTF-16 indexing
(slice/text[i]) which can split surrogate pairs (emoji/astral chars); update
splitTextByUtf8Bytes and findSplitBoundary to be surrogate-safe by detecting
UTF-16 surrogate pairs with charCodeAt: when computing a boundary, if boundary >
0 and text.charCodeAt(boundary) is a low surrogate (0xDC00–0xDFFF) then move
boundary one left so you don't split the pair (or conversely if
text.charCodeAt(boundary-1) is a high surrogate (0xD800–0xDBFF) adjust
similarly); apply the same check after trimming head/tail to ensure head !==
remaining and before pushing chunks; you can keep using Buffer.byteLength for
byte checks but must never cut between a high and low surrogate in
findSplitBoundary and splitTextByUtf8Bytes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e502bd78-811b-463c-96f8-5f32740110f0

📥 Commits

Reviewing files that changed from the base of the PR and between 122c87b and 6130839.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (4)
  • CHANGELOG.md
  • package.json
  • src/index.test.ts
  • src/index.ts

Comment thread src/index.ts Outdated
Comment thread src/index.ts Outdated
Comment thread src/index.ts Outdated
Comment on lines +2727 to +2781
private splitTextByUtf8Bytes(text: string, maxBytes: number): string[] {
const normalizedMaxBytes = Math.max(1, Math.floor(maxBytes));
const chunks: string[] = [];
let remaining = text;

while (remaining.length > 0) {
if (Buffer.byteLength(remaining, "utf8") <= normalizedMaxBytes) {
chunks.push(remaining);
remaining = "";
break;
}

const boundary = this.findSplitBoundary(remaining, normalizedMaxBytes);
const head = remaining.slice(0, boundary).trimEnd();
const tail = remaining.slice(boundary).trimStart();

if (!head || head === remaining) {
break;
}

chunks.push(head);
remaining = tail;
}

if (remaining.length > 0 && chunks.at(-1) !== remaining) {
chunks.push(remaining);
}

return chunks.filter((chunk) => chunk.length > 0);
}

private findSplitBoundary(text: string, maxBytes: number): number {
let low = 1;
let high = text.length;
let best = 1;

while (low <= high) {
const mid = Math.floor((low + high) / 2);
const candidate = text.slice(0, mid);
if (Buffer.byteLength(candidate, "utf8") <= maxBytes) {
best = mid;
low = mid + 1;
} else {
high = mid - 1;
}
}

for (let i = best - 1; i > 0; i--) {
const ch = text[i];
if (ch === "\n" || ch === " " || ch === "\t" || ch === "\r") {
return i + 1;
}
}

return best;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Does JavaScript String.prototype.slice index UTF-16 code units, and can slicing between a surrogate pair corrupt emoji characters?

💡 Result:

Yes, JavaScript's String.prototype.slice indexes UTF-16 code units, and slicing between a surrogate pair can corrupt emoji characters by splitting them into lone surrogates, which often display as the replacement character � or broken visuals.

Citations:


String slicing can corrupt emoji and other astral characters at UTF-8 byte boundaries.

slice() and string indexing (text[i]) operate on UTF-16 code units, not Unicode code points. If a 12 KB byte boundary lands between a surrogate pair (which represents emoji and other astral plane characters), the chunks will be corrupted with lone surrogates. Use a code-point/grapheme-aware splitter instead, or add safeguards to ensure splits never occur within surrogate pairs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/index.ts` around lines 2727 - 2781, The splitter currently uses UTF-16
indexing (slice/text[i]) which can split surrogate pairs (emoji/astral chars);
update splitTextByUtf8Bytes and findSplitBoundary to be surrogate-safe by
detecting UTF-16 surrogate pairs with charCodeAt: when computing a boundary, if
boundary > 0 and text.charCodeAt(boundary) is a low surrogate (0xDC00–0xDFFF)
then move boundary one left so you don't split the pair (or conversely if
text.charCodeAt(boundary-1) is a high surrogate (0xD800–0xDBFF) adjust
similarly); apply the same check after trimming head/tail to ensure head !==
remaining and before pushing chunks; you can keep using Buffer.byteLength for
byte checks but must never cut between a high and low surrogate in
findSplitBoundary and splitTextByUtf8Bytes.

Extract core functionality out of src/index.ts into focused modules to reduce file size and improve organization: add src/config.ts (config parsing/validation and env helpers), src/history/cursor.ts (thread/cursor encoding & decoding), src/messages/inbound.ts and src/messages/outbound.ts (message parsing/formatting, upload/attachment helpers), and src/shared/utils.ts (general helpers). Update imports throughout index.ts to use the new modules and remove duplicated helper implementations. Intended to be a pure refactor with no behavioral changes: centralizes logic for easier testing and maintenance and simplifies createMatrixAdapter/config handling.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (3)
src/index.ts (2)

369-377: ⚠️ Potential issue | 🟠 Major

Attachments still inherit m.in_reply_to in threaded sends.

The extra attachments are sent with replyEventID: undefined (line 376), but applyThreadReplyMetadata uses replyEventID ?? rootEventID, so attachments still get m.in_reply_to pointing at the thread root when rootEventID exists. This was flagged in a previous review and remains unaddressed.

To disable reply fallback for attachments, pass null instead of undefined and update applyThreadReplyMetadata to treat null as "explicitly no reply".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/index.ts` around lines 369 - 377, The extra attachments inherit
m.in_reply_to because sendRoomMessage is called with replyEventID undefined;
change the loop to call this.sendRoomMessage(roomID, rootEventID, null, content)
so attachments explicitly opt out of reply fallback, and update
applyThreadReplyMetadata to treat replyEventID === null as "explicitly no reply"
(i.e. don't fallback to rootEventID when replyEventID is null, only fallback
when replyEventID is undefined or missing); update any related type signatures
for sendRoomMessage and applyThreadReplyMetadata to accept null for
replyEventID.

1408-1462: ⚠️ Potential issue | 🟠 Major

sendRoomMessage doesn't return the actual sent content.

As noted in a previous review, this method returns only the server response, not the potentially modified content (after applyThreadReplyMetadata or after splitting on M_TOO_LARGE). Callers like postMessage fall back to the original firstContent for building the synthetic event, which may not reflect reply metadata or the actual chunked plain-text that was sent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/index.ts` around lines 1408 - 1462, sendRoomMessage currently returns
only the server response but not the actual content sent after
applyThreadReplyMetadata and after splitting for M_TOO_LARGE, so update
sendRoomMessage to return both the Matrix response and the final outbound
content that was actually sent (e.g. an object { response, sentContent } or
tuple). Inside sendRoomMessage, after creating threadContent via
applyThreadReplyMetadata and also after producing chunkWithMeta when handling
splitOversizedTextContent, capture the actual content used for the first
successful send (from performSendRoomMessage) and include that as sentContent in
the return value; update callers (e.g. postMessage) to use the returned
sentContent instead of the original firstContent when building synthetic events.
Ensure references to performSendRoomMessage, applyThreadReplyMetadata,
splitOversizedTextContent and sendRoomMessage are updated accordingly.
src/messages/outbound.ts (1)

507-562: ⚠️ Potential issue | 🟡 Minor

String slicing can corrupt emoji and other astral characters.

As noted in a previous review, slice() operates on UTF-16 code units, not Unicode code points. If a split boundary lands between a surrogate pair (which represents emoji and other astral plane characters), the chunks will contain lone surrogates that display as replacement characters (�).

The findSplitBoundary function should check for surrogate pairs and adjust the boundary accordingly.

🛡️ Suggested fix for surrogate-safe splitting
 function findSplitBoundary(text: string, maxBytes: number): number {
   let low = 1;
   let high = text.length;
   let best = 1;

   while (low <= high) {
     const mid = Math.floor((low + high) / 2);
     const candidate = text.slice(0, mid);
     if (Buffer.byteLength(candidate, "utf8") <= maxBytes) {
       best = mid;
       low = mid + 1;
     } else {
       high = mid - 1;
     }
   }

+  // Avoid splitting between surrogate pairs
+  if (best > 0 && best < text.length) {
+    const code = text.charCodeAt(best);
+    // If we're at a low surrogate (0xDC00-0xDFFF), move back to not split the pair
+    if (code >= 0xDC00 && code <= 0xDFFF) {
+      best = best - 1;
+    }
+  }
+
   for (let i = best - 1; i > 0; i--) {
     const ch = text[i];
     if (ch === "\n" || ch === " " || ch === "\t" || ch === "\r") {
       return i + 1;
     }
   }

   return best;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/messages/outbound.ts` around lines 507 - 562,
splitTextByUtf8Bytes/findSplitBoundary currently may slice between UTF-16
surrogate pairs (corrupting emoji); update findSplitBoundary to detect when the
chosen boundary falls between a high surrogate (0xD800–0xDBFF) and a low
surrogate (0xDC00–0xDFFF) and adjust the boundary (e.g., move boundary back one
code unit so the high surrogate stays with its low surrogate) before returning;
alternatively iterate using code-point-aware methods (Array.from or
String.prototype[@@iterator]) when computing slices in splitTextByUtf8Bytes to
ensure you never call slice at a code-point boundary that splits a surrogate
pair while preserving the existing byte-length checks.
🧹 Nitpick comments (2)
src/shared/utils.ts (2)

3-13: Minor modulo bias in random character selection.

The expression bytes[i] % alphabet.length introduces slight modulo bias since 256 % 36 = 4. Characters A-D appear ~2.8% more frequently than others. For device ID generation, this is acceptable, but for cryptographic purposes it would not be.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/shared/utils.ts` around lines 3 - 13, The generateDeviceID function uses
bytes[i] % alphabet.length which causes a small modulo bias; change generation
in generateDeviceID to use rejection sampling: compute a max usable byte value
(floor(256 / alphabet.length) * alphabet.length - 1 or equivalently 256 - (256 %
alphabet.length)), loop pulling bytes until value <= max and map value %
alphabet.length to index; reference symbols: generateDeviceID, alphabet, bytes,
randomBytes and the loop over i to implement the unbiased selection.

29-31: The nullish coalescing fallback is unreachable.

split(":")[0] always returns a string (possibly empty), so ?? userID never executes. Consider removing it for clarity, or keep it as defensive coding.

♻️ Optional simplification
 export function matrixLocalpart(userID: string): string {
-  return userID.startsWith("@") ? userID.slice(1).split(":")[0] ?? userID : userID;
+  return userID.startsWith("@") ? userID.slice(1).split(":")[0] : userID;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/shared/utils.ts` around lines 29 - 31, The matrixLocalpart function uses
a nullish coalescing fallback (?? userID) after split(":")[0], but split always
yields a string so that fallback is unreachable; update matrixLocalpart to
remove the unnecessary "?? userID" and return the computed localpart directly
(i.e., return userID.startsWith("@") ? userID.slice(1).split(":")[0] : userID),
or if you want defensive behavior instead handle the empty-string case
explicitly; locate and edit the matrixLocalpart function to apply this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/history/cursor.ts`:
- Around line 28-38: The decodeThreadId function currently allows an empty
roomID when threadId is "matrix:" because parts[1] can be an empty string;
update decodeThreadId to validate that after decoding parts[1] the roomID is
non-empty (e.g., if !roomID throw an Error with a clear message) so that invalid
thread IDs are rejected; reference the existing symbols threadId, parts,
MATRIX_PREFIX, roomID and rootEventID when making the change.

In `@src/messages/outbound.ts`:
- Around line 86-112: The function applyThreadReplyMetadata currently uses
replyEventID ?? rootEventID which treats undefined the same as omission and
causes attachments to inherit replies; change the signature to accept
replyEventID: string | null | undefined and alter the logic so that if
replyEventID === null you do not add "m.in_reply_to" at all (explicitly disable
reply), otherwise set event_id to replyEventID ?? rootEventID (maintaining the
old fallback when replyEventID is undefined); update any call sites to pass null
when they intend to suppress reply fallback and adjust the type annotation for
replyEventID in applyThreadReplyMetadata accordingly.

---

Duplicate comments:
In `@src/index.ts`:
- Around line 369-377: The extra attachments inherit m.in_reply_to because
sendRoomMessage is called with replyEventID undefined; change the loop to call
this.sendRoomMessage(roomID, rootEventID, null, content) so attachments
explicitly opt out of reply fallback, and update applyThreadReplyMetadata to
treat replyEventID === null as "explicitly no reply" (i.e. don't fallback to
rootEventID when replyEventID is null, only fallback when replyEventID is
undefined or missing); update any related type signatures for sendRoomMessage
and applyThreadReplyMetadata to accept null for replyEventID.
- Around line 1408-1462: sendRoomMessage currently returns only the server
response but not the actual content sent after applyThreadReplyMetadata and
after splitting for M_TOO_LARGE, so update sendRoomMessage to return both the
Matrix response and the final outbound content that was actually sent (e.g. an
object { response, sentContent } or tuple). Inside sendRoomMessage, after
creating threadContent via applyThreadReplyMetadata and also after producing
chunkWithMeta when handling splitOversizedTextContent, capture the actual
content used for the first successful send (from performSendRoomMessage) and
include that as sentContent in the return value; update callers (e.g.
postMessage) to use the returned sentContent instead of the original
firstContent when building synthetic events. Ensure references to
performSendRoomMessage, applyThreadReplyMetadata, splitOversizedTextContent and
sendRoomMessage are updated accordingly.

In `@src/messages/outbound.ts`:
- Around line 507-562: splitTextByUtf8Bytes/findSplitBoundary currently may
slice between UTF-16 surrogate pairs (corrupting emoji); update
findSplitBoundary to detect when the chosen boundary falls between a high
surrogate (0xD800–0xDBFF) and a low surrogate (0xDC00–0xDFFF) and adjust the
boundary (e.g., move boundary back one code unit so the high surrogate stays
with its low surrogate) before returning; alternatively iterate using
code-point-aware methods (Array.from or String.prototype[@@iterator]) when
computing slices in splitTextByUtf8Bytes to ensure you never call slice at a
code-point boundary that splits a surrogate pair while preserving the existing
byte-length checks.

---

Nitpick comments:
In `@src/shared/utils.ts`:
- Around line 3-13: The generateDeviceID function uses bytes[i] %
alphabet.length which causes a small modulo bias; change generation in
generateDeviceID to use rejection sampling: compute a max usable byte value
(floor(256 / alphabet.length) * alphabet.length - 1 or equivalently 256 - (256 %
alphabet.length)), loop pulling bytes until value <= max and map value %
alphabet.length to index; reference symbols: generateDeviceID, alphabet, bytes,
randomBytes and the loop over i to implement the unbiased selection.
- Around line 29-31: The matrixLocalpart function uses a nullish coalescing
fallback (?? userID) after split(":")[0], but split always yields a string so
that fallback is unreachable; update matrixLocalpart to remove the unnecessary
"?? userID" and return the computed localpart directly (i.e., return
userID.startsWith("@") ? userID.slice(1).split(":")[0] : userID), or if you want
defensive behavior instead handle the empty-string case explicitly; locate and
edit the matrixLocalpart function to apply this change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 31f4de8e-51dd-471a-a5f2-d200421c441f

📥 Commits

Reviewing files that changed from the base of the PR and between 6130839 and bd6a906.

📒 Files selected for processing (6)
  • src/config.ts
  • src/history/cursor.ts
  • src/index.ts
  • src/messages/inbound.ts
  • src/messages/outbound.ts
  • src/shared/utils.ts
📜 Review details
🧰 Additional context used
🪛 ast-grep (0.42.1)
src/shared/utils.ts

[warning] 41-44: Manual HTML sanitization detected using string replacement methods. Manual sanitization is error-prone and can be bypassed. Use dedicated HTML sanitization libraries like 'sanitize-html' or 'DOMPurify' instead.
Context: value
.replaceAll("&", "&")
.replaceAll("<", "<")
.replaceAll(">", ">")
Note: [CWE-79] Improper Neutralization of Input During Web Page Generation [REFERENCES]
- https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html
- https://cwe.mitre.org/data/definitions/79.html

(manual-html-sanitization)


[warning] 41-43: Manual HTML sanitization detected using string replacement methods. Manual sanitization is error-prone and can be bypassed. Use dedicated HTML sanitization libraries like 'sanitize-html' or 'DOMPurify' instead.
Context: value
.replaceAll("&", "&")
.replaceAll("<", "<")
Note: [CWE-79] Improper Neutralization of Input During Web Page Generation [REFERENCES]
- https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html
- https://cwe.mitre.org/data/definitions/79.html

(manual-html-sanitization)

🔇 Additional comments (12)
src/shared/utils.ts (1)

41-48: Static analysis warning is a false positive here.

The escapeHTML function performs correct output encoding by escaping the five critical HTML characters (&, <, >, ", '). The static analysis warning about "manual HTML sanitization" applies to parsing/cleaning untrusted HTML input, not to escaping plain text for safe HTML output. This implementation follows the OWASP recommendation for output encoding and is appropriate for its intended use (converting plain text to safe HTML entities for formatted_body).

src/messages/inbound.ts (3)

34-63: LGTM - Well-structured parsing with proper fallback handling.

The parseMatrixContent function correctly prioritizes formatted_body over body, strips reply fallbacks, and aggregates mentions from both m.mentions and inline matrix.to links. The fallback to plain text with proper msgtype handling for emotes is correct.


122-196: Solid HTML-to-Markdown conversion.

The recursive rendering handles common HTML elements well. One observation: text nodes (line 127) return node.text directly without markdown escaping, which is correct since the source is already HTML-encoded content that will be interpreted as markdown.


278-290: Reply fallback stripping logic correctly handles edge cases.

The body stripping properly requires a blank line separator after quoted lines and preserves the original body when the format doesn't match the expected reply fallback pattern.

src/config.ts (2)

23-50: LGTM - Comprehensive validation with clear error messages.

The validation correctly enforces required fields, positive numeric constraints, and the mutual dependency between encrypt and decrypt functions.


72-101: LGTM - Clear auth resolution with helpful error message.

The function properly prioritizes password auth when both username and password are present, and provides a clear error message explaining the two authentication options when neither is configured.

src/history/cursor.ts (1)

52-109: LGTM - Thorough cursor validation with descriptive errors.

The decodeCursorV1 function performs comprehensive validation including prefix, JSON parsing, expected kind/room/direction matching, and thread context verification. The error messages are specific and helpful for debugging.

src/index.ts (2)

1715-1717: LGTM - Eviction logic is now properly threshold-based.

The eviction is correctly triggered only when collection size exceeds 10,000, rather than on every insert. The evictOldestEntries helper handles the eviction down to the target size of 5,000.

Also applies to: 1955-1957


2668-2689: LGTM - SDK logging and environment-based config creation.

The matrixSDKLogConfigured flag is now correctly set only inside the setLevel branch, and createMatrixAdapter() properly delegates to createMatrixAdapterConfigFromEnv() for environment-based configuration.

src/messages/outbound.ts (3)

144-161: LGTM - Clean message content construction.

The toRoomMessageContent function properly builds the Matrix message content with conditional format/formatted_body and m.mentions fields.


353-378: LGTM - Comprehensive message rendering with proper type handling.

The renderTextMessage function handles all adapter message variants (string, card, raw, markdown, AST) with appropriate fallbacks.


457-463: No type issues present. The marked library (v14.0.0+) includes explicit TypeScript overloads that correctly type marked.parse with { async: false } as returning string, not string | Promise<string>. Since the code explicitly passes async: false, TypeScript will infer the correct return type, making the function signature accurate with no type conflicts.

			> Likely an incorrect or invalid review comment.

Comment thread src/history/cursor.ts
Comment thread src/messages/outbound.ts
Omit reply fallback metadata for follow-up attachments and avoid cutting UTF-16 surrogate pairs when splitting oversized text. sendRoomMessage now returns both the send response and the actual sent content so resolved events use the correct content; applyThreadReplyMetadata treats replyEventID === null as “do not attach reply metadata” (fixes attachments in threads). Text-splitting logic was hardened: surrogate-aware boundary clamping, trim helpers, and adjusted findSplitBoundary ensure emojis and other surrogate pairs are preserved across chunks. Other refactors: moved matrixMentionDisplayText to shared utils, parallelized attachment upload prep with Promise.all, removed thin cursor wrapper methods in favor of direct helpers, and tightened secrets-bundle persist throttling. Added tests for reply-fallback omission and surrogate-safe splitting.
@batuhan

batuhan commented Apr 13, 2026

Copy link
Copy Markdown
Member Author

@coderabbitai check all unresolved issues plz

@coderabbitai

coderabbitai Bot commented Apr 13, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get all review comments on PR `#2`
gh pr view 2 --repo beeper/chat-adapter-matrix --json reviews,comments,reviewRequests

# Get review threads (inline comments) to check resolved status
gh api repos/beeper/chat-adapter-matrix/pulls/2/comments --jq '.[] | {id: .id, body: .body, path: .path, line: .line, resolved: .resolved}'

Length of output: 63670


@batuhan Here's a summary of all unresolved issues across the two review rounds:


🟡 1. Surrogate-pair corruption in text splitter — src/messages/outbound.ts (lines 507–562)

findSplitBoundary and splitTextByUtf8Bytes use UTF-16 slice()/text[i] indexing. If a 12 KB byte boundary falls between a high surrogate (0xD800–0xDBFF) and a low surrogate (0xDC00–0xDFFF) (e.g., inside an emoji), the resulting chunk will contain a lone surrogate and display as \uFFFD.

Suggested fix — add a surrogate-pair guard in findSplitBoundary:

+ // Avoid splitting between surrogate pairs
+ if (best > 0 && best < text.length) {
+   const code = text.charCodeAt(best);
+   if (code >= 0xDC00 && code <= 0xDFFF) {
+     best = best - 1;
+   }
+ }

🟡 2. Empty roomID not rejected — src/history/cursor.ts (line 38)

decodeThreadId("matrix:") produces parts[1] = "", which passes the length check but yields an empty roomID.

Suggested fix:

  const roomID = decodeURIComponent(parts[1]);
+ if (!roomID) {
+   throw new Error(`Invalid Matrix thread ID: ${threadId}`);
+ }

🔵 3. (Nitpick) Modulo bias in generateDeviceIDsrc/shared/utils.ts (lines 3–13)

bytes[i] % alphabet.length introduces a slight bias since 256 % 36 = 4 (chars A–D appear ~2.8% more often). Acceptable for device IDs but worth noting.


🔵 4. (Nitpick) Unreachable nullish coalescing in matrixLocalpartsrc/shared/utils.ts (lines 29–31)

split(":")[0] always returns a string, so ?? userID is never executed.

- return userID.startsWith("@") ? userID.slice(1).split(":")[0] ?? userID : userID;
+ return userID.startsWith("@") ? userID.slice(1).split(":")[0] : userID;

Previously flagged issues that are now resolved (via commit bf66abb):

  • ✅ Attachments inheriting m.in_reply_to in threaded sends
  • sendRoomMessage not returning actual sent content
  • applyThreadReplyMetadata fallback causing attachment reply inheritance

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
src/messages/inbound.ts (1)

132-134: Text nodes are rendered without markdown escaping.

In renderHTMLNodeToMarkdown, text nodes are returned verbatim (node.text). If the HTML contains special markdown characters (e.g., *, _, `), they'll be interpreted as markdown formatting after conversion. This may produce unintended formatting when the original HTML had literal asterisks or underscores in text.

Consider whether this is intentional. If preserving literal characters matters, escape markdown-special characters in text nodes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/messages/inbound.ts` around lines 132 - 134, In renderHTMLNodeToMarkdown,
text nodes are returned verbatim (check the NodeType.TEXT_NODE branch that
returns node.text), which allows markdown-special characters to be interpreted;
update that branch to escape markdown-special characters before returning (e.g.,
implement or call an escapeMarkdown helper that replaces/backslashes characters
like *, _, `, [, ], <, >, and backslash itself) so literal text from node.text
is preserved; ensure the helper is used only for TEXT_NODE handling inside
renderHTMLNodeToMarkdown to avoid double-escaping elsewhere.
src/messages/outbound.ts (1)

73-84: Consider supporting explicit null return for "no reply" semantics.

The function returns undefined when no reply ID is found, but applyThreadReplyMetadata distinguishes between null (explicitly no reply) and undefined (use root fallback). If messages should be able to explicitly opt out of reply metadata, consider supporting a null value in matrixReplyToEventId.

This is only relevant if there's a use case for posting to a thread without any m.in_reply_to. If not needed, the current implementation is fine.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/messages/outbound.ts` around lines 73 - 84, extractReplyEventID currently
only returns a string or undefined, but applyThreadReplyMetadata treats null as
an explicit "no reply" sentinel; update extractReplyEventID to return string |
null | undefined and treat a message.matrixReplyToEventId === null as an
explicit null (i.e., return null) while still returning a non-empty string when
present and undefined when the property is absent; change the runtime check in
extractReplyEventID (function name: extractReplyEventID) to detect and return
null for explicit null values and update any call sites or types that consume
extractReplyEventID (e.g., applyThreadReplyMetadata) to accept the new null
return case.
src/shared/utils.ts (1)

8-10: Minor modulo bias in device ID generation.

bytes[i] % 36 introduces slight bias since 256 is not evenly divisible by 36. Characters at indices 0-3 (A-D) have ~2.8% probability vs ~2.7% for others. For an 8-character device ID used purely as an identifier, this is unlikely to cause practical issues.

For stricter uniformity, consider rejection sampling or using a power-of-2 alphabet size.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/shared/utils.ts` around lines 8 - 10, The device ID loop that maps random
bytes into alphabet characters using bytes[i] % alphabet.length causes a small
modulo bias; update the generation in the function that builds `out` from
`bytes`/`alphabet` to use rejection sampling: for each character, draw a random
byte and accept it only if byte < (Math.floor(256 / alphabet.length) *
alphabet.length), otherwise redraw, then use that byte % alphabet.length to
index `alphabet`; alternatively switch to an alphabet whose length is a power of
two so you can safely mask bits instead of using modulus.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/messages/inbound.ts`:
- Around line 90-95: The current mention detection builds usernameMention even
when userName is empty, so "@" will match any @ in parsed.text or formatted and
produce false positives; change the logic in the username detection so you first
validate userName (e.g., non-empty after trim) and only construct
usernameMention and evaluate hasUserName when userName is present—otherwise set
hasUserName to false; update the variables usernameMention, userName, and
hasUserName usage in this block to short-circuit when userName is empty.

---

Nitpick comments:
In `@src/messages/inbound.ts`:
- Around line 132-134: In renderHTMLNodeToMarkdown, text nodes are returned
verbatim (check the NodeType.TEXT_NODE branch that returns node.text), which
allows markdown-special characters to be interpreted; update that branch to
escape markdown-special characters before returning (e.g., implement or call an
escapeMarkdown helper that replaces/backslashes characters like *, _, `, [, ],
<, >, and backslash itself) so literal text from node.text is preserved; ensure
the helper is used only for TEXT_NODE handling inside renderHTMLNodeToMarkdown
to avoid double-escaping elsewhere.

In `@src/messages/outbound.ts`:
- Around line 73-84: extractReplyEventID currently only returns a string or
undefined, but applyThreadReplyMetadata treats null as an explicit "no reply"
sentinel; update extractReplyEventID to return string | null | undefined and
treat a message.matrixReplyToEventId === null as an explicit null (i.e., return
null) while still returning a non-empty string when present and undefined when
the property is absent; change the runtime check in extractReplyEventID
(function name: extractReplyEventID) to detect and return null for explicit null
values and update any call sites or types that consume extractReplyEventID
(e.g., applyThreadReplyMetadata) to accept the new null return case.

In `@src/shared/utils.ts`:
- Around line 8-10: The device ID loop that maps random bytes into alphabet
characters using bytes[i] % alphabet.length causes a small modulo bias; update
the generation in the function that builds `out` from `bytes`/`alphabet` to use
rejection sampling: for each character, draw a random byte and accept it only if
byte < (Math.floor(256 / alphabet.length) * alphabet.length), otherwise redraw,
then use that byte % alphabet.length to index `alphabet`; alternatively switch
to an alphabet whose length is a power of two so you can safely mask bits
instead of using modulus.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d0996a19-c97e-476d-84d0-4b184cbb657a

📥 Commits

Reviewing files that changed from the base of the PR and between bd6a906 and bf66abb.

📒 Files selected for processing (5)
  • src/index.test.ts
  • src/index.ts
  • src/messages/inbound.ts
  • src/messages/outbound.ts
  • src/shared/utils.ts
📜 Review details
🧰 Additional context used
🪛 ast-grep (0.42.1)
src/shared/utils.ts

[warning] 41-44: Manual HTML sanitization detected using string replacement methods. Manual sanitization is error-prone and can be bypassed. Use dedicated HTML sanitization libraries like 'sanitize-html' or 'DOMPurify' instead.
Context: value
.replaceAll("&", "&")
.replaceAll("<", "<")
.replaceAll(">", ">")
Note: [CWE-79] Improper Neutralization of Input During Web Page Generation [REFERENCES]
- https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html
- https://cwe.mitre.org/data/definitions/79.html

(manual-html-sanitization)


[warning] 41-43: Manual HTML sanitization detected using string replacement methods. Manual sanitization is error-prone and can be bypassed. Use dedicated HTML sanitization libraries like 'sanitize-html' or 'DOMPurify' instead.
Context: value
.replaceAll("&", "&")
.replaceAll("<", "<")
Note: [CWE-79] Improper Neutralization of Input During Web Page Generation [REFERENCES]
- https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html
- https://cwe.mitre.org/data/definitions/79.html

(manual-html-sanitization)

🔇 Additional comments (15)
src/messages/inbound.ts (1)

1-309: LGTM!

The inbound message parsing module is well-structured with clear separation of concerns. The HTML-to-markdown conversion handles common Matrix elements appropriately, and the mention extraction logic properly combines m.mentions metadata with fallback link parsing.

src/shared/utils.ts (2)

41-48: Static analysis hint is a false positive for this context.

The escapeHTML function performs output encoding (escaping user text before embedding in HTML), not input sanitization. The five characters escaped (& < > " ') are the standard set for HTML entity encoding. This is appropriate for safely constructing HTML fragments from trusted text, which is the use case in mergeTextAndLinks.


64-77: LGTM!

The evictOldestEntries utility correctly leverages JavaScript's guaranteed insertion-order iteration for Map and Set to implement LRU-style eviction. The early return for sizes within maxSize avoids unnecessary work.

src/messages/outbound.ts (3)

86-116: LGTM!

The applyThreadReplyMetadata function correctly distinguishes between null (explicitly disable reply) and undefined (use rootEventID fallback). This addresses the attachment reply inheritance concern from previous reviews.


507-541: LGTM!

The splitTextByUtf8Bytes function correctly handles UTF-16 surrogate pairs to avoid corrupting emoji and astral plane characters. The clampSurrogateBoundary helper properly detects and adjusts split points that would break surrogate pairs. The infinite loop protection at lines 528-530 ensures the algorithm terminates.


461-467: No action needed. With async: false explicitly specified, marked.parse correctly returns string in TypeScript v14.0.0+. The type is properly narrowed via overloads, and the function signature is accurate.

src/index.ts (4)

1394-1454: LGTM!

The sendRoomMessage refactor correctly handles:

  1. Thread reply metadata application via applyThreadReplyMetadata
  2. M_TOO_LARGE error detection and retry with split plain-text chunks
  3. Returning both response and sentContent for accurate synthetic event fallback
  4. Proper error logging and re-throwing for non-recoverable errors

The chunking retry preserves thread metadata on each chunk, maintaining thread context.


363-393: LGTM!

The postMessage method correctly passes null for replyEventID on follow-up attachment messages (line 381), ensuring they don't inherit the reply-to metadata. This properly addresses the attachment reply inheritance concern.


1708-1710: LGTM!

The cache eviction logic correctly triggers evictOldestEntries when collections exceed 10,000 entries. The shared utility handles the actual eviction down to 5,000 entries, preventing unbounded memory growth while maintaining a reasonable cache size.

Also applies to: 1948-1950


403-442: LGTM!

The editMessage method correctly passes null for replyEventID, explicitly indicating no reply metadata should be added. While applyThreadReplyMetadata would already skip modification due to the existing rel_type: Replace, the explicit null makes the intent clear.

src/index.test.ts (5)

1087-1132: LGTM!

The thread reply tests comprehensively cover both the default fallback behavior (replying to root event) and explicit reply targeting via matrixReplyToEventId. The assertions correctly verify the m.relates_to.m.in_reply_to.event_id structure.


1185-1220: LGTM!

The test correctly verifies that follow-up attachment messages in a thread do not inherit m.relates_to metadata, confirming the fix for the attachment reply inheritance issue.


1301-1413: LGTM!

The oversized message retry tests provide excellent coverage:

  1. Verifies rich-text content is split into plain-text chunks on M_TOO_LARGE
  2. Confirms stripped formatting fields (format, formatted_body, m.mentions)
  3. Validates warning logs are emitted without error logs
  4. Ensures non-splittable content (images) correctly rethrows the error

1493-1509: LGTM!

The surrogate-safe splitting test correctly verifies that emoji characters (UTF-16 surrogate pairs) are not corrupted during chunking. The assertions check both that chunks rejoin to the original and that no chunk boundaries fall inside surrogate pairs.


314-350: LGTM!

The makeStateAdapter helper correctly implements the additional StateAdapter methods following the established pattern of wrapping base calls with afterReady for proper initialization handling.

Comment thread src/messages/inbound.ts Outdated
Add validation to decodeThreadId to throw when the decoded room ID is empty (e.g. "matrix:") to prevent returning invalid MatrixThreadID objects. Add a unit test to assert this behavior. Also update findSplitBoundary to use clampSurrogateBoundary on its return values to avoid splitting inside Unicode surrogate pairs when chopping long outbound messages.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/messages/outbound.ts (1)

523-537: ⚠️ Potential issue | 🟠 Major

Don't drop boundary whitespace while splitting.

Lines 525-526 remove whitespace on both sides of every split, so the resent chunks no longer preserve the original text. It also breaks on whitespace-only prefixes because head becomes empty and Line 528 aborts the split.

💡 Preserve the original content
     const rawHead = remaining.slice(0, boundary);
     const rawTail = remaining.slice(boundary);
-    const head = trimEndPreservingSurrogates(rawHead);
-    const tail = trimStartPreservingSurrogates(rawTail);
+    const head = rawHead;
+    const tail = rawTail;

-    if (!head || head === remaining) {
+    if (!head || head === remaining) {
       break;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/messages/outbound.ts` around lines 523 - 537, The split logic is dropping
whitespace at the split boundary by using trimEndPreservingSurrogates/rawHead
and trimStartPreservingSurrogates/rawTail and aborting when head is empty;
change it to preserve original text by using rawHead/rawTail for chunking (or
only trim the final assembled message, not each split), push rawHead into chunks
and set remaining = rawTail, and remove the early-break when head === remaining
so whitespace-only prefixes are preserved; reference variables: rawHead,
rawTail, head, tail, trimEndPreservingSurrogates, trimStartPreservingSurrogates,
chunks, remaining.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/messages/outbound.ts`:
- Around line 132-145: The code currently bails out and returns [] whenever
body.byteLength <= DEFAULT_OVERSIZED_MESSAGE_CHUNK_BYTES, which prevents the
plain-text fallback from being attempted after a formatted-message M_TOO_LARGE;
instead remove the early return and the parts.length <= 1 guard so that you
always call splitTextByUtf8Bytes(body, DEFAULT_OVERSIZED_MESSAGE_CHUNK_BYTES)
and return parts.map(part => ({ body: part, msgtype: content.msgtype })), which
will yield a single-part array when the body fits (allowing a plain-text retry)
and multiple parts when it actually needs chunking; update the logic around the
body variable, splitTextByUtf8Bytes, DEFAULT_OVERSIZED_MESSAGE_CHUNK_BYTES, and
the map return accordingly.

---

Duplicate comments:
In `@src/messages/outbound.ts`:
- Around line 523-537: The split logic is dropping whitespace at the split
boundary by using trimEndPreservingSurrogates/rawHead and
trimStartPreservingSurrogates/rawTail and aborting when head is empty; change it
to preserve original text by using rawHead/rawTail for chunking (or only trim
the final assembled message, not each split), push rawHead into chunks and set
remaining = rawTail, and remove the early-break when head === remaining so
whitespace-only prefixes are preserved; reference variables: rawHead, rawTail,
head, tail, trimEndPreservingSurrogates, trimStartPreservingSurrogates, chunks,
remaining.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5dc6f7b4-a349-4cfa-8b1f-903bd0eb4323

📥 Commits

Reviewing files that changed from the base of the PR and between bf66abb and f867b64.

📒 Files selected for processing (3)
  • src/history/cursor.ts
  • src/index.test.ts
  • src/messages/outbound.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/history/cursor.ts
  • src/index.test.ts
📜 Review details
🔇 Additional comments (1)
src/messages/outbound.ts (1)

86-115: Nice fix for explicit reply suppression.

Allowing null to bypass m.in_reply_to keeps the old undefined fallback without reintroducing attachment reply inheritance.

Comment thread src/messages/outbound.ts
Multiple changes to improve thread metadata, E2E test setup, and message splitting behavior:

- listThreads: use local room data (requireRoom/processThreadRoots/getThread) to prefer live thread metadata, fall back to fetching the latest reply when bundled summaries lag, and compute replyCount using both room and bundled data.
- Added fetchLatestThreadReplySummary helper to pull a latest reply when needed.
- Sending retries: treat any split chunk result as valid fallback (retry even when there's a single plain-text chunk) and keep the existing warning logging.
- outbound splitting: splitOversizedTextContent now returns the original part when it already fits and preserves boundary whitespace when splitting (removed trimming logic that dropped boundary whitespace).
- inbound mention handling: isMentioned ignores empty usernames and normalizes username whitespace before matching.
- E2E/tests & helpers: add fake-indexeddb to tests, enable indexedDB for e2ee in createParticipantFromSession via a per-session cryptoDatabasePrefix (with sanitization), add waitForCondition async support and switch tests to waitForMatchingMessage/waitForCondition where appropriate, plus several new/updated tests for thread behavior and splitting.
- package.json: add fake-indexeddb test dependency.

These changes improve accuracy of thread summaries, ensure E2E crypto state isolation across sessions, and avoid losing whitespace when splitting large text messages.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
e2e/helpers.ts (1)

142-147: Use a reversible encoding for cryptoDatabasePrefix.

sanitizeForIndexedDBName() is lossy: distinct IDs like @alice.test:hs and @alice_test:hs collapse to the same prefix. That can make different participants share the same crypto store and undermine the isolation this change is trying to add.

♻️ Suggested change
 function createE2EIndexedDBPrefix(session: MatrixLoginResponse): string {
-  return `matrix-chat-adapter-e2e-${sanitizeForIndexedDBName(session.userID)}-${sanitizeForIndexedDBName(session.deviceID)}`;
+  return `matrix-chat-adapter-e2e-${encodeForIndexedDBName(session.userID)}-${encodeForIndexedDBName(session.deviceID)}`;
 }
 
-function sanitizeForIndexedDBName(value: string): string {
-  return value.replace(/[^a-zA-Z0-9_-]+/gu, "_");
+function encodeForIndexedDBName(value: string): string {
+  return Buffer.from(value, "utf8").toString("base64url");
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@e2e/helpers.ts` around lines 142 - 147, The current sanitizeForIndexedDBName
is lossy and can map distinct IDs to the same string; replace this with a
reversible, ID-safe encoding and use it from createE2EIndexedDBPrefix: implement
a encodeForIndexedDBName function that performs a URL-safe Base64 encoding
(base64url, no padding) of the original value to guarantee uniqueness and
reversibility, update createE2EIndexedDBPrefix to call encodeForIndexedDBName
for session.userID and session.deviceID instead of sanitizeForIndexedDBName, and
ensure the chosen encoding produces only safe characters (A-Z a-z 0-9 - _) for
IndexedDB names.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@e2e/helpers.ts`:
- Around line 296-305: The waitForCondition function can hang if the provided
async predicate never resolves; fix it by enforcing a per-iteration timeout and
by rejecting when the global timeoutMs elapses: inside waitForCondition (the
function to change), compute remaining = timeoutMs - (Date.now() - startedAt) at
the top of each loop and if remaining <= 0 throw; when awaiting condition(), use
Promise.race to race condition() against a per-call timeout Promise that rejects
after remaining (or a capped value) so a stalled predicate can't block the loop;
keep the existing polling interval logic (setTimeout with intervalMs) when
condition() resolves false.

In `@src/messages/inbound.ts`:
- Around line 246-265: parseMatrixToUserID currently calls
decodeURIComponent(firstSegment) which can throw on malformed percent-encodings
and crash parsing; wrap the decodeURIComponent call in a try/catch inside
parseMatrixToUserID (or validate percent-encoding beforehand) and return null on
decode failures so a bad matrix.to hash doesn't abort parseMatrixContent;
reference parseMatrixToUserID, firstSegment, and the identifier variable to
locate where to add the safe decode and fallback.
- Around line 226-243: The anchor text in renderHTMLLinkToMarkdown is inserted
verbatim into the generated Markdown link, which breaks if it contains
characters like ] ( ) or backslashes; update renderHTMLLinkToMarkdown to import
escapeMarkdownLinkText from ../shared/utils and use it to escape the link text
before embedding in `[text](href)` (i.e., replace uses of text in the final link
with escapeMarkdownLinkText(text || matrixMentionDisplayText(mentionedUserID))),
and ensure you still add mentionedUserID to mentionedUserIDs when applicable.

---

Nitpick comments:
In `@e2e/helpers.ts`:
- Around line 142-147: The current sanitizeForIndexedDBName is lossy and can map
distinct IDs to the same string; replace this with a reversible, ID-safe
encoding and use it from createE2EIndexedDBPrefix: implement a
encodeForIndexedDBName function that performs a URL-safe Base64 encoding
(base64url, no padding) of the original value to guarantee uniqueness and
reversibility, update createE2EIndexedDBPrefix to call encodeForIndexedDBName
for session.userID and session.deviceID instead of sanitizeForIndexedDBName, and
ensure the chosen encoding produces only safe characters (A-Z a-z 0-9 - _) for
IndexedDB names.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 69ee5f62-0e18-46dc-aad4-19b9820797ba

📥 Commits

Reviewing files that changed from the base of the PR and between f867b64 and a17ee5a.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (7)
  • e2e/e2e.test.ts
  • e2e/helpers.ts
  • package.json
  • src/index.test.ts
  • src/index.ts
  • src/messages/inbound.ts
  • src/messages/outbound.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • package.json
  • src/index.ts
📜 Review details
🔇 Additional comments (5)
src/messages/outbound.ts (2)

132-147: Nice fix: plain-text retry now works even when the body already fits.

Returning a single stripped chunk here keeps the M_TOO_LARGE fallback reachable for rich-text events where formatted_body or mentions are the real size offender.


521-580: Nice fix: chunk boundaries are now surrogate-safe.

Clamping both the binary-search candidate and the final boundary avoids emitting lone UTF-16 surrogates while still preserving the original text across joins.

src/index.test.ts (2)

460-473: Good regression test for empty room IDs.

This pins the "matrix:" edge case that previously slipped through decodeThreadId().


1323-1455: The new split-retry and text-edge-case coverage looks solid.

These assertions exercise both multi-chunk and single-chunk M_TOO_LARGE retries, plus the surrogate-pair and empty-username cases that were easy to regress.

Also applies to: 1577-1623

e2e/e2e.test.ts (1)

474-479: These E2E assertions are less flaky now.

Matching on event ID avoids false positives from older messages with similar text, and polling listThreads() is a much better fit for eventually hydrated thread summaries.

Also applies to: 630-663

Comment thread e2e/helpers.ts Outdated
Comment thread src/messages/inbound.ts Outdated
Comment thread src/messages/inbound.ts Outdated
• messages/inbound.ts: Escape markdown link text when rendering HTML anchors (escapeMarkdownLinkText) to avoid accidental markdown injection, and make parseMatrixToUserID resilient to malformed percent-encoding by wrapping decodeURIComponent in a try/catch.
• e2e/helpers.ts: Replace sanitizeForIndexedDBName with base64url encoding (encodeForIndexedDBName) for safer IndexedDB keys, and rework waitForCondition to respect timeouts using Promise.race, clearing timers and rejecting on timeout.
• src/index.test.ts: Add tests ensuring HTML anchor text is escaped when re-emitting markdown links and that malformed matrix.to links are ignored instead of aborting formatted-body parsing.
These changes improve robustness around link handling, encoding for storage, and test coverage for edge cases.
@batuhan batuhan merged commit 632fb14 into main Apr 13, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant