fix(message-parser): parse underscore decimals as italic (fixes #31920)#38907
fix(message-parser): parse underscore decimals as italic (fixes #31920)#38907smirk-dev wants to merge 1 commit intoRocketChat:developfrom
Conversation
…es (RocketChat#31920) The PEG parser was greedily matching patterns like '_1.5_' as schemeless domain names via AutolinkedURL before the Emphasis rule could try italic parsing. Since PEG ordered choice doesn't backtrack from a successful match, autoLink() returning plain() still prevented the italic parse. Add a negative lookahead '!"_"' to AutoLinkURL's schemeless domain alternative so strings starting with underscore are not consumed as URLs. This allows the Emphasis rule to correctly parse '_1.5_' as italic while preserving URL parsing for schemed URLs like 'http://te_st.com' and schemeless URLs like 'te_st.com'. Fixes RocketChat#31920
|
|
Looks like this PR is not ready to merge, because of the following issues:
Please fix the issues and try again If you have any trouble, please check the PR guidelines |
WalkthroughThis PR fixes a parsing bug where decimal numbers and dotted text enclosed in underscores (e.g., Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
packages/message-parser/tests/emphasis.test.ts (2)
129-129: Remove inline comment — violates the no-comments guideline.The
// Decimal numbers and dots inside italic (#31920)comment is unnecessary; the issue reference belongs in the commit message rather than test source.🧹 Proposed change
- // Decimal numbers and dots inside italic (`#31920`) ['_1.5_', [paragraph([italic([plain('1.5')])])]],As per coding guidelines,
**/*.{ts,tsx,js}: "Avoid code comments in the implementation."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/message-parser/tests/emphasis.test.ts` at line 129, Remove the inline comment "// Decimal numbers and dots inside italic (`#31920`)" from the test in emphasis.test.ts; locate the test block that covers decimal/italic behavior (the test near the comment) and delete that comment line so the test source contains no code comments, keeping any issue references in the commit message instead.
130-133: Add a regression test documenting the schemeless URL behavior change for patterns starting with underscore.The grammar fix for issue
#31920(adding tests for dots in italic content) also silently changes how schemeless URLs starting with_are handled. Patterns like_domain.compreviously auto-linked but now render as plain text or broken emphasis. None of the new tests cover this side-effect, creating a regression risk if future refactors inadvertently revert to URL-first matching without test coverage to catch it.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/message-parser/tests/emphasis.test.ts` around lines 130 - 133, Add a regression test that covers schemeless URLs that start with an underscore by extending the same test vectors array used in emphasis.test.ts (the list containing entries like '_1.5_' and functions/constructors paragraph, italic, plain) with cases such as '_domain.com_' and '_domain.com/path_' and assert they produce an autolink/link node instead of an italic/paragraph node; place these new entries alongside the existing patterns so the parser test verifies URL-first matching for leading-underscore schemeless URLs and prevents the regression introduced by the dots-in-italic grammar change.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
packages/message-parser/src/grammar.pegjspackages/message-parser/tests/emphasis.test.ts
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx,js}
📄 CodeRabbit inference engine (.cursor/rules/playwright.mdc)
**/*.{ts,tsx,js}: Write concise, technical TypeScript/JavaScript with accurate typing in Playwright tests
Avoid code comments in the implementation
Files:
packages/message-parser/tests/emphasis.test.ts
🧠 Learnings (2)
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Utilize Playwright fixtures (`test`, `page`, `expect`) for consistency in test files
Applied to files:
packages/message-parser/tests/emphasis.test.ts
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to **/*.spec.ts : Use descriptive test names that clearly communicate expected behavior in Playwright tests
Applied to files:
packages/message-parser/tests/emphasis.test.ts
🧬 Code graph analysis (1)
packages/message-parser/tests/emphasis.test.ts (1)
packages/message-parser/src/utils.ts (3)
paragraph(27-27)italic(62-62)plain(65-65)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: 📦 Build Packages
- GitHub Check: cubic · AI code reviewer
- GitHub Check: CodeQL-Build
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@packages/message-parser/tests/emphasis.test.ts`:
- Line 129: Remove the inline comment "// Decimal numbers and dots inside italic
(`#31920`)" from the test in emphasis.test.ts; locate the test block that covers
decimal/italic behavior (the test near the comment) and delete that comment line
so the test source contains no code comments, keeping any issue references in
the commit message instead.
- Around line 130-133: Add a regression test that covers schemeless URLs that
start with an underscore by extending the same test vectors array used in
emphasis.test.ts (the list containing entries like '_1.5_' and
functions/constructors paragraph, italic, plain) with cases such as
'_domain.com_' and '_domain.com/path_' and assert they produce an autolink/link
node instead of an italic/paragraph node; place these new entries alongside the
existing patterns so the parser test verifies URL-first matching for
leading-underscore schemeless URLs and prevents the regression introduced by the
dots-in-italic grammar change.
| AutoLinkURL | ||
| = $(URLScheme URLAuthority AutoLinkURLBody*) | ||
| / $(URLAuthorityHost AutoLinkURLBody*) | ||
| / !"_" @$(URLAuthorityHost AutoLinkURLBody*) | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find any test asserting autoLink / URL node for inputs starting with `_`
rg -n "autoLink\|AutoLink\|autolink\|_[a-zA-Z0-9]" packages/message-parser/tests/ -A2 | grep -A2 "'_"Repository: RocketChat/Rocket.Chat
Length of output: 48
🏁 Script executed:
#!/bin/bash
# List test files
echo "=== Test files in packages/message-parser/tests/ ==="
find packages/message-parser/tests/ -type f -name "*.ts" -o -name "*.js" | head -20
echo ""
echo "=== Searching for AutoLink/autoLink test cases ==="
rg -i "autolink" packages/message-parser/tests/ -l
echo ""
echo "=== Searching for schemeless URL tests ==="
rg "schemeless\|domain\\.com" packages/message-parser/tests/ -n -B2 -A2Repository: RocketChat/Rocket.Chat
Length of output: 1198
🏁 Script executed:
#!/bin/bash
# Check size and read url.test.ts
wc -l packages/message-parser/tests/url.test.ts
echo ""
echo "=== File contents ==="
cat -n packages/message-parser/tests/url.test.tsRepository: RocketChat/Rocket.Chat
Length of output: 11320
🏁 Script executed:
#!/bin/bash
# Check emphasis tests for underscore patterns
echo "=== Checking emphasis.test.ts for underscore patterns ==="
rg "_[0-9]|_1\.5" packages/message-parser/tests/emphasis.test.ts -n -B1 -A1
echo ""
echo "=== Checking grammar.pegjs around lines 432-435 ==="
sed -n '420,445p' packages/message-parser/src/grammar.pegjsRepository: RocketChat/Rocket.Chat
Length of output: 1049
The change correctly fixes the emphasis bug (_1.5_ now italicizes properly), but schemeless URLs starting with _ will no longer auto-link.
The !"_" negation on the schemeless alternative blocks all such URLs — not just edge cases with emphasis markers. No existing tests cover schemeless URLs starting with _ (like _domain.com or _dmarc.example.com), so none will fail, but this is a real side effect to accept as part of the fix.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #38907 +/- ##
===========================================
- Coverage 70.55% 70.55% -0.01%
===========================================
Files 3189 3189
Lines 112703 112702 -1
Branches 20429 20429
===========================================
- Hits 79519 79518 -1
Misses 31123 31123
Partials 2061 2061 🚀 New features to boost your workflow:
|
Summary
Fixes decimal and dot-containing underscore emphasis parsing (e.g.
_1.5_) being treated as plain text instead of italic.Root cause
AutoLinkURL(schemeless branch) was matching underscore-prefixed text first, preventingEmphasisfrom parsing it as italic in PEG ordered choice.What changed
AutoLinkURLinsrc/grammar.pegjsto avoid matching when input starts with_tests/emphasis.test.ts:_1.5__a.b__hello.world__1.5 text_Testing
packages/message-parsertest suite passesFixes #31920
Summary by CodeRabbit
Bug Fixes
Tests