Improve raw string parsing clarity per JEP-12#62
Open
joshrotenberg wants to merge 2 commits into jmespath:master
Conversation
Refactors raw string parsing to use explicit character-by-character processing instead of string replacement. This makes the escape handling logic clearer and matches the JEP-12 specification more directly.

Behavior is unchanged:
- \' produces a literal single quote
- All other backslashes are preserved literally

Adds comprehensive tests and documentation for raw string literals.

See: https://github.com/jmespath/jmespath.jep/blob/main/proposals/0012-raw-string-literals.md
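As a rough illustration of the rule described above (only \' is an escape, every other backslash is kept), here is a minimal character-by-character sketch. The function name `unescape_raw_string` is hypothetical and this is not the PR's actual code; it assumes the input is the text between the enclosing single quotes.

```rust
/// Hypothetical helper (not the PR's actual code): unescapes the body of a
/// raw string literal, i.e. the text between the enclosing single quotes.
fn unescape_raw_string(body: &str) -> String {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\\' && chars.peek() == Some(&'\'') {
            // \' is the only escape: emit the quote and drop the backslash.
            chars.next();
            out.push('\'');
        } else {
            // Everything else, including any other backslash, is kept as-is.
            out.push(c);
        }
    }
    out
}
```

Under these assumptions the sketch should give the same result as `s.replace("\\'", "'")` for any input, which is the sense in which the refactor leaves behavior unchanged.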
cetra3 reviewed Jan 21, 2026
// Read until closing quote, then process escapes
self.consume_inside(pos, '\'', |s| {
    Ok(Literal(Rcvar::new(Variable::String(s.replace("\\'", "'")))))
    // Only \' is an escape sequence - replace with literal quote
Collaborator
How does this differ from existing behaviour?
Contributor
Author
No functional difference, just a little bit more explicit. The PR is mostly for docs/tests/clarity.
Collaborator
OK, what about the perf difference? I am going to assume this is slower than the standard replace.
Contributor
Author
You're right, it was a bit slower. Just checked in an optimization that fixes that (and seems to make it a touch faster than the original).
- Add fast path check for strings without backslashes (common case)
- Remove peekable() overhead in favor of direct pattern matching
- Benchmark shows 8% improvement over original replace() approach
- 35% faster than initial char-by-char implementation
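One possible reading of the optimization listed above, sketched under assumptions: the function name `unescape_raw_string_fast` and the `Cow` return type are illustrative choices, not taken from the PR, and "direct pattern matching" is interpreted here as looking at the text right after each backslash instead of going through a peekable iterator.

```rust
use std::borrow::Cow;

/// Hypothetical sketch of the fast-path idea; names and return type are
/// assumptions, not the PR's actual code.
fn unescape_raw_string_fast(body: &str) -> Cow<'_, str> {
    // Fast path: without a backslash there can be no escape, so borrow the input.
    if !body.contains('\\') {
        return Cow::Borrowed(body);
    }
    let mut out = String::with_capacity(body.len());
    let mut rest = body;
    // Jump from backslash to backslash and check what follows each one.
    while let Some(i) = rest.find('\\') {
        out.push_str(&rest[..i]);
        // '\\' is a single byte, so i + 1 is always a valid char boundary.
        let after = &rest[i + 1..];
        if let Some(stripped) = after.strip_prefix('\'') {
            // \' becomes a literal single quote.
            out.push('\'');
            rest = stripped;
        } else {
            // Any other backslash is preserved literally.
            out.push('\\');
            rest = after;
        }
    }
    out.push_str(rest);
    Cow::Owned(out)
}
```

In this sketch the common case (no backslash at all) allocates nothing; whether that matches the measured 8% and 35% figures depends on the PR's real implementation and benchmark, which are not reproduced here.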
This PR refactors the raw string literal parsing to use explicit character-by-character processing instead of the current string replacement approach. This makes the escape handling logic clearer and aligns the code more directly with the JEP-12 specification.
Changes
- consume_raw_string in lexer to process characters explicitly

Behavior
No behavioral changes - this is a refactor for clarity:
- \' produces a literal single quote (to avoid delimiter collision)

Testing