Upstream merge PR #28

Merged
daniel-otero merged 40 commits into codee from feature/FinalUpgradeTreeSitterFortranAuto on Apr 15, 2026

@d-alonso

This PR is pre-approved; the updated patches and conflicts were already reviewed in PR #27.

Everything was squashed into the merge commit.

ZedThree and others added 30 commits January 7, 2026 13:10
Fix `variable_modification` and `public`/`private` in type definitions
This generates a separate node with a list of all associations
for an associate statement.
Add separate rule association_list
Change the default clause for select type statements to make it
consistent with the select case statements.

Generated nodes for the two cases are:

 case_statement
   "case"
   default `default`

 type_statement
   "class"
   default `default`
Make default clause in select statements consistent
The rule method_name is used in derived type definitions. The commit
avoids duplication of method_name as node and component of that node.

Example code:
module m

type :: x
contains
   procedure :: a
end type x

end module m

Previous tree for the "procedure :: a" part:
4:3  - 4:17           procedure_statement
4:3  - 4:12             procedure_kind
4:3  - 4:12               "procedure"
4:13 - 4:15             "::"
4:16 - 4:17             declarator: method_name
4:16 - 4:17               "method_name"

New tree:
4:3  - 4:17           procedure_statement
4:3  - 4:12             procedure_kind
4:3  - 4:12               "procedure"
4:13 - 4:15             "::"
4:16 - 4:17             declarator: method_name `a`
Neither variant fits into the general rule procedure_statement
used in derived type definitions. Both the procedure attributes (on the
left side of ::) and the binding structure (on the right side of ::) differ
from those of procedure statements.

A test case with line

    generic, pass :: binding_name => method_name, method_name2

has been fixed, as "pass" is not an allowed attribute for generic
statements.
Derived type procedure, generic and final statements
* Remove the tail recursion.
* Use skip_literal_continuation_sequence to handle continuation lines.
  It is almost identical to the previous code, except that we need to
  add a lookahead check for ampersand, as
  skip_literal_continuation_sequence also returns true if lookahead
  symbol is not an ampersand.
Return the integer value in case it is an integer. This is a
preparation to check for labels properly. Labels are integers,
and we cannot parse them separately. We need to parse (and consume)
once, and then decide what to do with the number depending on valid
external symbols.
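
The parse-once idea above can be sketched in C as follows. This is only an illustration, not the scanner's actual code: the function name `scan_integer`, its signature, and the use of a plain string instead of a lexer are assumptions made for the example. It relies on the fact that Fortran statement labels have at most five digits, so overflow is not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: consume the leading decimal digits of `text` exactly
 * once. Stores the number of characters consumed in *consumed and returns
 * the integer value, or -1 if `text` does not start with a digit.
 * The caller can then decide what the number means (label, literal, ...)
 * without having to re-read the digits. */
static long scan_integer(const char *text, size_t *consumed) {
    assert(text != NULL && consumed != NULL);
    long value = 0;
    size_t n = 0;
    while (text[n] >= '0' && text[n] <= '9') {
        value = value * 10 + (text[n] - '0');
        n++;
    }
    *consumed = n;
    return n > 0 ? value : -1;
}
```

A caller would compare the returned value against the label it is currently waiting for and only then choose which token to emit.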
Add three external tokens DO_LABEL, DO_LABEL_VIRTUAL and
DO_LABEL_CONTINUE to discover and track old style do statements, which
are not context-free and cannot be handled by the grammar alone. Example:

do 123, i = 1,23
   do 123, j = 1,23
      ! code
123 continue

The scanner maintains an internal stack to deal with nested do loops
with same or different do labels. If a continue (or end do) statement
closes several do loops with same label, virtual
DO_LABEL_VIRTUAL/DO_LABEL_CONTINUE and END_OF_STATEMENT tokens are
scheduled to properly close all pending do loops.

This yields proper (do_loop ...) nodes also for these old style loops,
and the scope of these loops is properly represented in the tree.

Labeled do (do_loop ...) trees can be recognised by the field do_label.
Moreover the end statement is end_do_label_loop_statement instead of
end_do_statement.

If a continue statement closes several do loops, the outermost statement
is represented by field "do_label: do_label_continue" whereas inner
loops have field "do_label: do_label_virtual".

Only the innermost do loop spans the actual label in the text. Other
do_label fields have zero width.

A testcase with five nested loops and two different do labels was added.
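
The stack bookkeeping described above can be sketched in C roughly as follows. This is a simplified model under stated assumptions, not the scanner's implementation: `LabelStack`, `push_do_label`, `close_do_label`, and the fixed capacity `MAX_DEPTH` are hypothetical names chosen for the example, and token scheduling is reduced to counting how many loops one labelled statement closes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 64 /* hypothetical capacity, chosen for the sketch */

/* Labels of the labelled do loops that are still open, innermost last. */
typedef struct {
    int labels[MAX_DEPTH];
    size_t size;
} LabelStack;

/* Called for "do <label> ...": remember the label of the new loop.
 * Returns false if the stack is full. */
static bool push_do_label(LabelStack *s, int label) {
    assert(s != NULL);
    if (s->size == MAX_DEPTH) return false;
    s->labels[s->size++] = label;
    return true;
}

/* Called when a statement carrying `label` is seen: pop every pending loop
 * on top of the stack that waits for this label. Returns how many loops
 * were closed. When more than one loop is closed, the extra ones have no
 * label text of their own and would need virtual, zero-width tokens. */
static size_t close_do_label(LabelStack *s, int label) {
    assert(s != NULL);
    size_t closed = 0;
    while (s->size > 0 && s->labels[s->size - 1] == label) {
        s->size--;
        closed++;
    }
    return closed;
}
```

For the two nested `do 123` loops in the example, a single `123 continue` makes `close_do_label` report two closed loops, which corresponds to one real label plus one virtual one.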
Also rename `do_label_continue` to `statement_label`
The standard allows block labels (do-construct-name) in conjunction with
labeled do statements, even though it seems a bit wild to actually use both in
the same statement.

The commit also fixes some comments and changes do_stmt_label and
do_stmt_nonlabel loops such that they correspond to old-style and
new-style do loops, respectively. Before, it was a bit of a mess.
This parses incomplete block statements which only have "end" but not the
structure type like "end do", "end block data" etc. Instead of ERROR nodes
for otherwise properly structured blocks, the parser is able to produce a
proper tree (e.g. for smart completion of end statements).
Do not apply labelRule without structure type. This is intended to guide the parser so that it does not try to parse a keyword like subroutine in
'end subroutine' as a 'do' for incomplete code:

subroutine sub()
   do i = 1,10

end subroutine
(1) blockStructureEnding1 is a bit more general than blockStructureEnding.

To be precise,
blockStructureEnding($, keyword) is equivalent to
blockStructureEnding1($, keyword, {labelRule: $._name, eos: true}).
Some end statements (like "end subroutine") captured the _end_of_statement token and thus included the newline symbol in their spans.
The commit makes this consistent and moves the token to the corresponding statement rules one level up.

TODO:
* Note that this might still be inconsistent, as the complete statement node now contains the final newline symbol.
* Remove eos option from blockStructureEnding functions.
Removes the now obsolete optional argument eos.
mscfd and others added 10 commits February 13, 2026 20:56
The standard requires blanks in "end enumeration type" for this new structure.

Extend blockStructureEnding2 by a new argument for whether blanks are required or not.
End statements like "end program name" allow statement labels, like in:

program test
contains
    subroutine foo
    10 end subroutine foo

    integer function bar
      bar = 20
    20 end function bar
30 end program test

The problem was found and the fix suggested by ZedThree <peter.hill@york.ac.uk>.
Refactor and extend `end` construct parsing
Uncommented highlight query and added injections, locals, and tags queries for the grammar.
Expose query for highlights in Rust
The commit adds rank(*) for assumed-size arrays to the select rank statement.
An existing select rank test in test/corpus/statements.txt has been extended
to test the rank(*) rule.
In the scanner. Otherwise this triggers the undefined behavior sanitizer
when the array is accessed with index -1.
Fix UB when accessing last element of stack in scanner
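
A guarded access of this kind might look like the following C sketch. It is an illustration only: `IntStack` and `stack_top` are hypothetical names, and the actual stack type in the scanner may differ. The point is simply that the "last element" index is computed only after the stack is known to be non-empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stack type for the sketch. */
typedef struct {
    int items[16];
    size_t size;
} IntStack;

/* Copies the top element into *out and returns true. On an empty stack it
 * returns false and leaves *out untouched, so the element before the start
 * of the array is never read. */
static bool stack_top(const IntStack *s, int *out) {
    assert(s != NULL && out != NULL);
    if (s->size == 0) return false;
    *out = s->items[s->size - 1];
    return true;
}
```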
@d-alonso d-alonso self-assigned this Apr 14, 2026
@d-alonso d-alonso changed the title from "Merge PR" to "Upstream merge PR" Apr 14, 2026
@d-alonso d-alonso requested a review from daniel-otero April 14, 2026 14:32

@daniel-otero daniel-otero left a comment

I'll do the merge!

@daniel-otero daniel-otero merged commit 381f1b4 into codee Apr 15, 2026
@d-alonso d-alonso deleted the feature/FinalUpgradeTreeSitterFortranAuto branch April 16, 2026 12:09

6 participants