Add postlexer to support multiline binary operators and ternary expressions #270
Merged
+369
−18
@@ -0,0 +1,79 @@

```python
"""Postlexer that transforms the token stream between the Lark lexer and parser.

Each transformation is implemented as a private method that accepts and yields
tokens. The public ``process`` method chains them together, making it easy to
add new passes without touching existing logic.
"""

from typing import FrozenSet, Iterator, Optional, Tuple

from lark import Token

# Type alias for a token stream consumed and produced by each pass.
TokenStream = Iterator[Token]

# Operator token types that may legally follow a line-continuation newline.
# MINUS is excluded; it is also the unary negation operator, and merging a
# newline into MINUS would incorrectly consume statement-separating newlines
# before negative literals (e.g. "a = 1\nb = -2").
OPERATOR_TYPES: FrozenSet[str] = frozenset(
    {
        "DOUBLE_EQ",
        "NEQ",
        "LT",
        "GT",
        "LEQ",
        "GEQ",
        "ASTERISK",
        "SLASH",
        "PERCENT",
        "DOUBLE_AMP",
        "DOUBLE_PIPE",
        "PLUS",
        "QMARK",
    }
)


class PostLexer:
    """Transform the token stream before it reaches the LALR parser."""

    def process(self, stream: TokenStream) -> TokenStream:
        """Chain all postlexer passes over the token stream."""
        yield from self._merge_newlines_into_operators(stream)

    def _merge_newlines_into_operators(self, stream: TokenStream) -> TokenStream:
        """Merge NL_OR_COMMENT tokens into immediately following operator tokens.

        LALR parsers cannot distinguish a statement-ending newline from a
        line-continuation newline before a binary operator. This pass resolves
        the ambiguity by merging NL_OR_COMMENT into the operator token's value
        when the next token is a binary operator or QMARK. The transformer
        later extracts the newline prefix and creates a NewLineOrCommentRule
        node, preserving round-trip fidelity.
        """
        pending_nl: Optional[Token] = None
        for token in stream:
            if token.type == "NL_OR_COMMENT":
                if pending_nl is not None:
                    yield pending_nl
                pending_nl = token
            else:
                if pending_nl is not None:
                    if token.type in OPERATOR_TYPES:
                        token = token.update(value=str(pending_nl) + str(token))
                    else:
                        yield pending_nl
                    pending_nl = None
                yield token
        if pending_nl is not None:
            yield pending_nl

    @property
    def always_accept(self) -> Tuple[()]:
        """Terminal names the parser must accept even when not expected by LALR.

        Lark requires this property on postlexer objects. An empty tuple
        means no extra terminals are injected.
        """
        return ()
```
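
To make the buffering behaviour of the merge pass concrete, here is a minimal sketch that runs the same logic on a stand-in token type, so it needs no Lark installation. The `Tok` dataclass, the `summarize` helper, and the token type names `NAME`, `EQ`, `INT`, and `MINUS` are assumptions made for illustration; only `NL_OR_COMMENT` and the operator names come from the diff, and only a subset of the operator set is reproduced.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Iterator, List, Optional, Tuple

# Subset of the operator types from the diff, enough for this demo.
OPERATOR_TYPES = frozenset({"PLUS", "ASTERISK", "QMARK", "DOUBLE_EQ"})


@dataclass(frozen=True)
class Tok:
    """Hypothetical stand-in for lark.Token: just a type and a value."""

    type: str
    value: str


def merge_newlines_into_operators(stream: Iterable[Tok]) -> Iterator[Tok]:
    """Same buffering logic as the pass in the diff, on the stand-in type."""
    pending_nl: Optional[Tok] = None
    for token in stream:
        if token.type == "NL_OR_COMMENT":
            # A second newline in a row flushes the one already buffered.
            if pending_nl is not None:
                yield pending_nl
            pending_nl = token
            continue
        if pending_nl is not None:
            if token.type in OPERATOR_TYPES:
                # Fold the buffered newline into the operator's value.
                token = replace(token, value=pending_nl.value + token.value)
            else:
                # Not a continuation: emit the newline as its own token.
                yield pending_nl
            pending_nl = None
        yield token
    if pending_nl is not None:
        yield pending_nl


def summarize(tokens: Iterable[Tok]) -> List[Tuple[str, str]]:
    """Run the pass and return (type, value) pairs for easy comparison."""
    return [(t.type, t.value) for t in merge_newlines_into_operators(tokens)]


# "a = 1\n+ 2": the newline precedes PLUS, so it is folded into the operator.
continued = summarize([
    Tok("NAME", "a"), Tok("EQ", "="), Tok("INT", "1"),
    Tok("NL_OR_COMMENT", "\n"), Tok("PLUS", "+"), Tok("INT", "2"),
])

# "a = 1\nb = -2": the newline precedes NAME, so it stays a separate token,
# and MINUS never absorbs a newline because it is not in OPERATOR_TYPES.
separate = summarize([
    Tok("NAME", "a"), Tok("EQ", "="), Tok("INT", "1"),
    Tok("NL_OR_COMMENT", "\n"), Tok("NAME", "b"), Tok("EQ", "="),
    Tok("MINUS", "-"), Tok("INT", "2"),
])
```

In the first stream the `PLUS` token comes out with the value `"\n+"` and no standalone newline token, which is what lets the parser treat the second line as a continuation. In the second stream the newline survives as a statement separator.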
Is it possible to fix the numbering?
Hmm, weird. The numbers were automatically changed to `1.` by the markdown format pre-commit hook. Funnily enough, it looks fine in the reading view. Apparently it's the recommended practice:
typora/typora-issues#1188