test(message-parser): add property-based fuzz tests with fast-check#38906
test(message-parser): add property-based fuzz tests with fast-check#38906smirk-dev wants to merge 1 commit intoRocketChat:developfrom
Conversation
Add comprehensive fuzz/property-based tests using fast-check to verify parser robustness against arbitrary and adversarial inputs: - Never-throw invariant: ASCII, unicode, full-unicode, all options - Output structure validation: Root type conformance - Plain text identity: alphanumeric → PARAGRAPH + PLAIN_TEXT - Formatting delimiter stress: random markdown sequences - Emphasis parity: balanced bold/italic/strike produce correct AST nodes - Special constructs: URLs, emails, emoji shortcodes, mentions, code blocks - Multiline message handling - Documents empty-string crash as known parser limitation 19 property-based tests with ~6000 total generated inputs.
|
Looks like this PR is not ready to merge, because of the following issues:
Please fix the issues and try again If you have any trouble, please check the PR guidelines |
|
WalkthroughThis change adds property-based testing infrastructure to the message parser package. The fast-check library is introduced as a dependency, and a comprehensive fuzz test suite is created to validate parser resilience across diverse input types and formatting scenarios. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~15 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
packages/message-parser/tests/fuzz.test.ts (2)
39-52:validTopLevelTypesSet is recreated on everyisValidRoot()callConstructing the
Setinside the function body creates a new allocation for each of the hundreds of fast-check runner invocations. Hoist it to module-level.♻️ Proposed refactor
+const VALID_TOP_LEVEL_TYPES = new Set([ + 'PARAGRAPH', 'CODE', 'HEADING', 'QUOTE', 'SPOILER_BLOCK', + 'LIST_ITEM', 'TASKS', 'ORDERED_LIST', 'UNORDERED_LIST', + 'LINE_BREAK', 'KATEX', 'BIG_EMOJI', +]); + function isValidRoot(result: Root): boolean { if (!Array.isArray(result) || result.length === 0) { return false; } - - const validTopLevelTypes = new Set([ - 'PARAGRAPH', 'CODE', 'HEADING', 'QUOTE', 'SPOILER_BLOCK', - 'LIST_ITEM', 'TASKS', 'ORDERED_LIST', 'UNORDERED_LIST', - 'LINE_BREAK', 'KATEX', 'BIG_EMOJI', - ]); - - return result.every( - (node) => node !== null && typeof node === 'object' && typeof node.type === 'string' && validTopLevelTypes.has(node.type), - ); + return result.every( + (node) => node !== null && typeof node === 'object' && typeof node.type === 'string' && VALID_TOP_LEVEL_TYPES.has(node.type), + ); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/message-parser/tests/fuzz.test.ts` around lines 39 - 52, The Set validTopLevelTypes is being constructed inside isValidRoot on every call causing unnecessary allocations; move the const validTopLevelTypes declaration out of isValidRoot to module scope so the Set is created once, then have isValidRoot reference that top-level validTopLevelTypes (ensure the name remains unchanged and update any imports/uses of isValidRoot accordingly).
84-84: All highlighted calls use APIs deprecated since fast-check 3.22.0
asciiString,unicodeString,fullUnicodeString,string16bits,hexaString, andstringOfwere officially deprecated in 3.22.0, along with their single-character variants. These arbitraries will remain available in all 3.x versions but will likely be removed in version 4.x.The affected calls across the file are:
Deprecated call Replacement fc.unicodeString({...})(L84, L146)fc.string({...})(now polyglot by default)fc.fullUnicodeString({...})(L98)fc.string({ unit: 'grapheme', ... })fc.stringOf(arb, constraints)(L165, L238, L264, L288, L347, L364)fc.string({ unit: arb, ...constraints })♻️ Representative migration diff (lines 84, 98, 165)
-fc.unicodeString({ minLength: 1, maxLength: 300 }), +fc.string({ minLength: 1, maxLength: 300 }), -fc.fullUnicodeString({ minLength: 1, maxLength: 200 }), +fc.string({ unit: 'grapheme', minLength: 1, maxLength: 200 }), -fc.stringOf(fc.constantFrom(...'abcdefghijklmnopqrstuvwxyz...'), { minLength: 1, maxLength: 200 }), +fc.string({ unit: fc.constantFrom(...'abcdefghijklmnopqrstuvwxyz...'), minLength: 1, maxLength: 200 }),Apply the
stringOf→fc.string({ unit: ... })pattern uniformly to L238, L264, L288, L347, and L364.Also applies to: 98-98, 146-146, 165-165, 238-238, 264-264, 288-288, 347-347, 364-364
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/message-parser/tests/fuzz.test.ts` at line 84, Replace deprecated fast-check arbitraries used in the fuzz tests: change fc.unicodeString(...) calls to fc.string(...), change fc.fullUnicodeString(...) to fc.string({ unit: 'grapheme', ... }), and replace fc.stringOf(arb, constraints) usages with fc.string({ unit: arb, ...constraints }); update all occurrences of fc.unicodeString, fc.fullUnicodeString, and fc.stringOf in the test file so options are preserved and the new fc.string(...) form is used consistently.packages/message-parser/package.json (1)
55-55: Consider upgrading to fast-check v4.x (current: 4.5.3)The latest published version of fast-check is 4.5.3. The pinned range
^3.22.0stays within the 3.x line, which is two major versions behind. More importantly, fast-check 3.22.0 officially deprecatedasciiString,unicodeString,fullUnicodeString,stringOf, and their single-character variants in favor of the unifiedstringarbitrary with aunitparameter. Since the test file (see details below) uses these now-deprecated APIs, they're deprecated from the first install of this exact version.Consider pinning to
^4.0.0and updating the test file to the non-deprecated API surface (see test-file comment below), or at minimum accept that the test suite starts life using deprecated calls.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/message-parser/package.json` at line 55, Update the fast-check dependency and migrate deprecated API usage: change the package.json fast-check entry from "^3.22.0" to a 4.x range (e.g. "^4.0.0" or "^4.5.3") and then update tests that call deprecated arbitraries (asciiString, unicodeString, fullUnicodeString, stringOf and single-character variants) to use the v4 unified string arbitrary (string with an appropriate unit parameter) and any new option shapes; locate these in your test files by searching for the symbols asciiString, unicodeString, fullUnicodeString, stringOf and replace them with calls to fast-check's string(...) configured to match the previous behavior. Ensure imports still reference "fast-check" and run the test suite to confirm no further deprecated API usage remains.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
yarn.lockis excluded by!**/yarn.lock,!**/*.lock
📒 Files selected for processing (2)
packages/message-parser/package.jsonpackages/message-parser/tests/fuzz.test.ts
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx,js}
📄 CodeRabbit inference engine (.cursor/rules/playwright.mdc)
**/*.{ts,tsx,js}: Write concise, technical TypeScript/JavaScript with accurate typing in Playwright tests
Avoid code comments in the implementation
Files:
packages/message-parser/tests/fuzz.test.ts
🧠 Learnings (6)
📓 Common learnings
Learnt from: KevLehman
Repo: RocketChat/Rocket.Chat PR: 37091
File: ee/packages/abac/jest.config.ts:4-7
Timestamp: 2025-12-10T21:00:54.909Z
Learning: Rocket.Chat monorepo: Jest testMatch pattern '<rootDir>/src/**/*.spec.(ts|js|mjs)' is valid in this repo and used across multiple packages (e.g., packages/tools, ee/packages/omnichannel-services). Do not flag it as invalid in future reviews.
📚 Learning: 2025-12-10T21:00:54.909Z
Learnt from: KevLehman
Repo: RocketChat/Rocket.Chat PR: 37091
File: ee/packages/abac/jest.config.ts:4-7
Timestamp: 2025-12-10T21:00:54.909Z
Learning: Rocket.Chat monorepo: Jest testMatch pattern '<rootDir>/src/**/*.spec.(ts|js|mjs)' is valid in this repo and used across multiple packages (e.g., packages/tools, ee/packages/omnichannel-services). Do not flag it as invalid in future reviews.
Applied to files:
packages/message-parser/tests/fuzz.test.ts
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Ensure tests run reliably in parallel without shared state conflicts
Applied to files:
packages/message-parser/tests/fuzz.test.ts
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Utilize Playwright fixtures (`test`, `page`, `expect`) for consistency in test files
Applied to files:
packages/message-parser/tests/fuzz.test.ts
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Group related tests in the same file
Applied to files:
packages/message-parser/tests/fuzz.test.ts
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to **/*.{ts,tsx,js} : Write concise, technical TypeScript/JavaScript with accurate typing in Playwright tests
Applied to files:
packages/message-parser/package.json
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: 📦 Build Packages
- GitHub Check: cubic · AI code reviewer
- GitHub Check: CodeQL-Build
🔇 Additional comments (1)
packages/message-parser/tests/fuzz.test.ts (1)
66-78: No action needed — test descriptions are accurateThe review comment's core premise is incorrect. While fast-check 3.22.0 introduced the optional
unitparameter forfc.string()to support polyglot generation, the default behavior remains unchanged:fc.string()without aunitparameter defaults tounit: 'grapheme-ascii', which generates only printable ASCII (0x20–0x7E)—identical to pre-3.22.0 behavior.Tests at lines 66–78 and 124–139 use
fc.string({ minLength: ..., maxLength: ... })without specifyingunit, so they correctly test ASCII input as their names indicate. There is no overlap with theunicodeString-based tests, and no fix is required.Likely an incorrect or invalid review comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/message-parser/tests/fuzz.test.ts`:
- Around line 234-303: The fuzz tests for emphasis are flaky because the
character set includes a space, producing inputs with leading/trailing spaces
(e.g. `**a **`, `_ b_`) which CommonMark disqualifies; fix by removing the space
from the generator character set used in the three tests that build inputs
`**${content}**`, `_${content}_`, and `~~${content}~~`: replace
fc.stringOf(fc.constantFrom(...'abcdefghij '), ...) with
fc.stringOf(fc.constantFrom(...'abcdefghij'), ...) in the tests that reference
parse and check hasBold/hasItalic/hasStrike so generated content cannot start or
end with a space.
---
Nitpick comments:
In `@packages/message-parser/package.json`:
- Line 55: Update the fast-check dependency and migrate deprecated API usage:
change the package.json fast-check entry from "^3.22.0" to a 4.x range (e.g.
"^4.0.0" or "^4.5.3") and then update tests that call deprecated arbitraries
(asciiString, unicodeString, fullUnicodeString, stringOf and single-character
variants) to use the v4 unified string arbitrary (string with an appropriate
unit parameter) and any new option shapes; locate these in your test files by
searching for the symbols asciiString, unicodeString, fullUnicodeString,
stringOf and replace them with calls to fast-check's string(...) configured to
match the previous behavior. Ensure imports still reference "fast-check" and run
the test suite to confirm no further deprecated API usage remains.
In `@packages/message-parser/tests/fuzz.test.ts`:
- Around line 39-52: The Set validTopLevelTypes is being constructed inside
isValidRoot on every call causing unnecessary allocations; move the const
validTopLevelTypes declaration out of isValidRoot to module scope so the Set is
created once, then have isValidRoot reference that top-level validTopLevelTypes
(ensure the name remains unchanged and update any imports/uses of isValidRoot
accordingly).
- Line 84: Replace deprecated fast-check arbitraries used in the fuzz tests:
change fc.unicodeString(...) calls to fc.string(...), change
fc.fullUnicodeString(...) to fc.string({ unit: 'grapheme', ... }), and replace
fc.stringOf(arb, constraints) usages with fc.string({ unit: arb, ...constraints
}); update all occurrences of fc.unicodeString, fc.fullUnicodeString, and
fc.stringOf in the test file so options are preserved and the new fc.string(...)
form is used consistently.
| 'balanced bold delimiters produce BOLD nodes', | ||
| () => { | ||
| fc.assert( | ||
| fc.property( | ||
| fc.stringOf(fc.constantFrom(...'abcdefghij '), { minLength: 1, maxLength: 30 }), | ||
| (content) => { | ||
| const input = `**${content}**`; | ||
| const result = parse(input); | ||
| expect(result.length).toBeGreaterThanOrEqual(1); | ||
|
|
||
| // Verify there's at least one BOLD node somewhere in the AST | ||
| const hasBold = JSON.stringify(result).includes('"BOLD"'); | ||
| // For very short content with special meaning this might not always hold, | ||
| // but for non-empty alphanum content it should | ||
| if (content.trim().length > 0) { | ||
| expect(hasBold).toBe(true); | ||
| } | ||
| }, | ||
| ), | ||
| { numRuns: 200 }, | ||
| ); | ||
| }, | ||
| 15_000, | ||
| ); | ||
|
|
||
| it( | ||
| 'balanced italic delimiters produce ITALIC nodes', | ||
| () => { | ||
| fc.assert( | ||
| fc.property( | ||
| fc.stringOf(fc.constantFrom(...'abcdefghij '), { minLength: 1, maxLength: 30 }), | ||
| (content) => { | ||
| const input = `_${content}_`; | ||
| const result = parse(input); | ||
| expect(result.length).toBeGreaterThanOrEqual(1); | ||
|
|
||
| const hasItalic = JSON.stringify(result).includes('"ITALIC"'); | ||
| if (content.trim().length > 0) { | ||
| expect(hasItalic).toBe(true); | ||
| } | ||
| }, | ||
| ), | ||
| { numRuns: 200 }, | ||
| ); | ||
| }, | ||
| 15_000, | ||
| ); | ||
|
|
||
| it( | ||
| 'balanced strike delimiters produce STRIKE nodes', | ||
| () => { | ||
| fc.assert( | ||
| fc.property( | ||
| fc.stringOf(fc.constantFrom(...'abcdefghij '), { minLength: 1, maxLength: 30 }), | ||
| (content) => { | ||
| const input = `~~${content}~~`; | ||
| const result = parse(input); | ||
| expect(result.length).toBeGreaterThanOrEqual(1); | ||
|
|
||
| const hasStrike = JSON.stringify(result).includes('"STRIKE"'); | ||
| if (content.trim().length > 0) { | ||
| expect(hasStrike).toBe(true); | ||
| } | ||
| }, | ||
| ), | ||
| { numRuns: 200 }, | ||
| ); | ||
| }, | ||
| 15_000, | ||
| ); |
There was a problem hiding this comment.
Emphasis parity tests are flaky when generated content has leading/trailing spaces
The content arbitrary includes ' ' (space). Content like 'a ' or ' b' passes the content.trim().length > 0 guard, producing inputs like **a ** or _ b_. Most Markdown parsers (and CommonMark) do not recognize these as bold/italic/strike because a space adjacent to the opening or closing delimiter is disqualifying. The tests will then fail expect(hasBold/hasItalic/hasStrike).toBe(true) intermittently.
🐛 Proposed fix — remove space from content character set
-fc.stringOf(fc.constantFrom(...'abcdefghij '), { minLength: 1, maxLength: 30 }),
+fc.stringOf(fc.constantFrom(...'abcdefghij'), { minLength: 1, maxLength: 30 }),Apply this to lines 238, 264, and 288. This eliminates leading/trailing-space content while still testing non-empty alphanumeric inputs. If internal spaces are desired, an alternative is \${content.trim()}`paired with a pre-check thatcontent.trim().length > 0`.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| 'balanced bold delimiters produce BOLD nodes', | |
| () => { | |
| fc.assert( | |
| fc.property( | |
| fc.stringOf(fc.constantFrom(...'abcdefghij '), { minLength: 1, maxLength: 30 }), | |
| (content) => { | |
| const input = `**${content}**`; | |
| const result = parse(input); | |
| expect(result.length).toBeGreaterThanOrEqual(1); | |
| // Verify there's at least one BOLD node somewhere in the AST | |
| const hasBold = JSON.stringify(result).includes('"BOLD"'); | |
| // For very short content with special meaning this might not always hold, | |
| // but for non-empty alphanum content it should | |
| if (content.trim().length > 0) { | |
| expect(hasBold).toBe(true); | |
| } | |
| }, | |
| ), | |
| { numRuns: 200 }, | |
| ); | |
| }, | |
| 15_000, | |
| ); | |
| it( | |
| 'balanced italic delimiters produce ITALIC nodes', | |
| () => { | |
| fc.assert( | |
| fc.property( | |
| fc.stringOf(fc.constantFrom(...'abcdefghij '), { minLength: 1, maxLength: 30 }), | |
| (content) => { | |
| const input = `_${content}_`; | |
| const result = parse(input); | |
| expect(result.length).toBeGreaterThanOrEqual(1); | |
| const hasItalic = JSON.stringify(result).includes('"ITALIC"'); | |
| if (content.trim().length > 0) { | |
| expect(hasItalic).toBe(true); | |
| } | |
| }, | |
| ), | |
| { numRuns: 200 }, | |
| ); | |
| }, | |
| 15_000, | |
| ); | |
| it( | |
| 'balanced strike delimiters produce STRIKE nodes', | |
| () => { | |
| fc.assert( | |
| fc.property( | |
| fc.stringOf(fc.constantFrom(...'abcdefghij '), { minLength: 1, maxLength: 30 }), | |
| (content) => { | |
| const input = `~~${content}~~`; | |
| const result = parse(input); | |
| expect(result.length).toBeGreaterThanOrEqual(1); | |
| const hasStrike = JSON.stringify(result).includes('"STRIKE"'); | |
| if (content.trim().length > 0) { | |
| expect(hasStrike).toBe(true); | |
| } | |
| }, | |
| ), | |
| { numRuns: 200 }, | |
| ); | |
| }, | |
| 15_000, | |
| ); | |
| 'balanced bold delimiters produce BOLD nodes', | |
| () => { | |
| fc.assert( | |
| fc.property( | |
| fc.stringOf(fc.constantFrom(...'abcdefghij'), { minLength: 1, maxLength: 30 }), | |
| (content) => { | |
| const input = `**${content}**`; | |
| const result = parse(input); | |
| expect(result.length).toBeGreaterThanOrEqual(1); | |
| // Verify there's at least one BOLD node somewhere in the AST | |
| const hasBold = JSON.stringify(result).includes('"BOLD"'); | |
| // For very short content with special meaning this might not always hold, | |
| // but for non-empty alphanum content it should | |
| if (content.trim().length > 0) { | |
| expect(hasBold).toBe(true); | |
| } | |
| }, | |
| ), | |
| { numRuns: 200 }, | |
| ); | |
| }, | |
| 15_000, | |
| ); | |
| it( | |
| 'balanced italic delimiters produce ITALIC nodes', | |
| () => { | |
| fc.assert( | |
| fc.property( | |
| fc.stringOf(fc.constantFrom(...'abcdefghij'), { minLength: 1, maxLength: 30 }), | |
| (content) => { | |
| const input = `_${content}_`; | |
| const result = parse(input); | |
| expect(result.length).toBeGreaterThanOrEqual(1); | |
| const hasItalic = JSON.stringify(result).includes('"ITALIC"'); | |
| if (content.trim().length > 0) { | |
| expect(hasItalic).toBe(true); | |
| } | |
| }, | |
| ), | |
| { numRuns: 200 }, | |
| ); | |
| }, | |
| 15_000, | |
| ); | |
| it( | |
| 'balanced strike delimiters produce STRIKE nodes', | |
| () => { | |
| fc.assert( | |
| fc.property( | |
| fc.stringOf(fc.constantFrom(...'abcdefghij'), { minLength: 1, maxLength: 30 }), | |
| (content) => { | |
| const input = `~~${content}~~`; | |
| const result = parse(input); | |
| expect(result.length).toBeGreaterThanOrEqual(1); | |
| const hasStrike = JSON.stringify(result).includes('"STRIKE"'); | |
| if (content.trim().length > 0) { | |
| expect(hasStrike).toBe(true); | |
| } | |
| }, | |
| ), | |
| { numRuns: 200 }, | |
| ); | |
| }, | |
| 15_000, | |
| ); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/message-parser/tests/fuzz.test.ts` around lines 234 - 303, The fuzz
tests for emphasis are flaky because the character set includes a space,
producing inputs with leading/trailing spaces (e.g. `**a **`, `_ b_`) which
CommonMark disqualifies; fix by removing the space from the generator character
set used in the three tests that build inputs `**${content}**`, `_${content}_`,
and `~~${content}~~`: replace fc.stringOf(fc.constantFrom(...'abcdefghij '),
...) with fc.stringOf(fc.constantFrom(...'abcdefghij'), ...) in the tests that
reference parse and check hasBold/hasItalic/hasStrike so generated content
cannot start or end with a space.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #38906 +/- ##
========================================
Coverage 70.55% 70.55%
========================================
Files 3189 3189
Lines 112703 112703
Branches 20429 20429
========================================
Hits 79519 79519
Misses 31123 31123
Partials 2061 2061 🚀 New features to boost your workflow:
|
Summary
Adds property-based fuzz testing for
@rocket.chat/message-parserusingfast-check.What changed
tests/fuzz.test.tswith robustness properties and randomized inputsfast-checkdev dependency inpackages/message-parser/package.jsonWhy
This increases confidence in parser correctness and crash-resilience under diverse/adversarial inputs.
Testing
npx jest --testPathPatterns='fuzz' --no-coveragepackages/message-parsertest suite passesSummary by CodeRabbit
Tests
Chores