Fix lexer::scan() and utils::impl MyStrExt for &str #21
Open
hardglitch wants to merge 5 commits into d0rianb:master
d0rianb (Owner) reviewed on Oct 7, 2025 and left a comment:
Thanks for the work.
Just a few comments on linting below.
Owner
This could fix #18.

#[test]
fn should_lex_unicode() {
    let rtf = r#"{\u21834 \u21834 }"#;
    let tokens = Lexer::scan(rtf).unwrap();
    assert_eq!(
        tokens,
        vec![
            OpeningBracket,
            ControlSymbol((Unicode, Value(21834))),
            PlainText(" "),
            ControlSymbol((Unicode, Value(21834))),
            ClosingBracket,
        ]
    );
}
Fix UTF-8 safety in string slicing
Problem:
The original code indexed &str using byte offsets (bytes[i] as char and slices like &s[a..b]), which breaks UTF-8 when multi-byte characters are present (e.g., Cyrillic). This caused panics like:
byte index N is not a char boundary
Solution:
Effect:
The library no longer panics on RTF containing Cyrillic or other multi-byte characters, while preserving correct behavior for ASCII.
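
The failure mode described under "Problem" can be reproduced outside the library. The following is a minimal sketch, not the code from this PR: `safe_slice` and `split_at_char` are hypothetical helpers introduced only for illustration. They rely on the standard-library methods `str::get`, `str::char_indices`, and `str::split_at`.

```rust
/// Hypothetical helper (not part of the library): returns the sub-slice for a
/// byte range, or `None` when either end falls inside a multi-byte character.
/// `str::get` performs the char-boundary check instead of panicking.
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

/// Hypothetical helper (not part of the library): splits `s` after its first
/// `n` characters, counting characters rather than bytes.
fn split_at_char(s: &str, n: usize) -> (&str, &str) {
    // `char_indices` yields the byte offset at which each character starts,
    // so the resulting index is always a valid char boundary.
    let byte_idx = s
        .char_indices()
        .nth(n)
        .map(|(i, _)| i)
        .unwrap_or(s.len());
    s.split_at(byte_idx)
}
```

With `let text = "Привет";` (each Cyrillic letter takes 2 bytes in UTF-8), `&text[0..1]` panics because byte 1 is not a char boundary, whereas `safe_slice(text, 0, 1)` returns `None` and `split_at_char(text, 2)` returns `("Пр", "ивет")`.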