Skip to content

Fix typedef/variable name conflict parsing error#219

Open
Copilot wants to merge 8 commits intodevelopfrom
copilot/fix-typedef-variable-name-conflict
Open

Fix typedef/variable name conflict parsing error#219
Copilot wants to merge 8 commits intodevelopfrom
copilot/fix-typedef-variable-name-conflict

Conversation

Copy link

Copilot AI commented Mar 12, 2026

  • Create test file test/small1/typedef_varname.c for the typedef/variable name conflict
  • Add test to test/testcil.pl
  • Fix init_declarator rule in src/frontc/cparser.mly to call add_identifier via declarator_init_start helper rule between declarator and EQ
  • Fix comment style in cparser.mly to use /* (* TEXT *) */ as per file conventions
  • Fix add_identifier timing: declarator_init_start registers each initialized declarator's name before its initializer is parsed; declarator_no_init registers NO_INIT declarator names immediately (guarded by is_typedef_decl flag); typedef names use add_type only in doDeclaration — ensuring correct C11 6.2.1.7 scoping without breaking typedef scope cleanup
  • Add test/small1/typedef_varname_noinit.c to test multiple initialized declarators: verifies that the second declarator's initializer can reference the first declared variable (which shadows a typedef), exercising declarator_init_start
  • Fix CI regression in test/scope2: declarator_no_init now skips add_identifier when is_typedef_decl=true, preventing pop_context from incorrectly removing the NAMED_TYPE lexicon binding for typedef names
  • Add test for NO_INIT middle declarator in typedef_varname.c: mytype a = 1, mytype, b = sizeof mytypesizeof mytype (without parens) is only valid when mytype is a variable, proving correct C11 6.2.1.7 scoping for NO_INIT declarators in mixed declaration lists
Original prompt

This section details on the original issue you should resolve

<issue_title>typedef and variable name conflict parsing error</issue_title>
<issue_description>### Test case from goblint/bench#38 (comment)

#include <stddef.h>
#include <stdint.h>

typedef struct raxNode {
    uint32_t iskey:1;
    uint32_t isnull:1;
    uint32_t iscompr:1;
    uint32_t size:29;
    unsigned char data[];
} raxNode;

typedef struct rax {
    raxNode *head;
    uint64_t numele;
    uint64_t numnodes;
} rax;

void *malloc(size_t size);

int main() {
  rax *rax = malloc(sizeof(*rax)); // PARSING ERROR HERE
  return 0;
}

This is completely fine by GCC/Clang and rax inside the sizeof should refer to the local variable rax that has been declared and is being initialized.

Cause

The parser errors because that inner rax is a NAMED_TYPE token from the lexer instead of IDENT. This is due to the lexer hack being buggy: ours only adds a local variable to the lexer hack table after the entire declaration, but actually the initializer in the declaration may already refer to the local variable. Thus at this point, the lexer hack table considers rax to refer to the typedef instead.

Specification

It's extremely difficult to find specification for this, but I believe it might be 6.2.1.7 in N1570:

Structure, union, and enumeration tags have scope that begins just after the appearance of the tag in a type specifier that declares the tag. Each enumeration constant has scope that begins just after the appearance of its defining enumerator in an enumerator list. Any other identifier has scope that begins just after the completion of its declarator.

The declaractor is the *rax part of the declaration in the above example, so rax needs to be added to the lexer hack table as identifier between the declarator and initializer, not after the whole thing.</issue_description>

Comments on the Issue (you are @copilot in this section)


🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.

… register identifier before parsing initializer

Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix typedef and variable name conflict parsing error Fix typedef/variable name conflict parsing error Mar 12, 2026
Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
@michael-schwarz michael-schwarz marked this pull request as ready for review March 13, 2026 09:00
@michael-schwarz michael-schwarz requested a review from sim642 March 13, 2026 09:00
in the lexer hack, so that in the initializer the name shadows any typedef
with the same name (C11 6.2.1.7: scope begins after completion of declarator). *) */
declarator_init_start:
declarator EQ { let (n, _, _, _) = $1 in !Lexerhack.add_identifier n; $1 }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be the right place for it to happen but just adding another Lexerhack.add_identifier call is a bit suspicious.

Somewhere else there's already a call to call add_identifier on the same thing, just slightly too late.
The lexer hack management uses H.add (not H.replace) and H.remove to actually use the multiple-binding feature of OCaml hashtables (which exists precisely for this kind of namespacing use case).
But if there's an extra add_identifier here and the same one is added later by existing code, then there will only be one remove. This means that the identifier will probably remain bound outside of its intended scope somewhere.

Lexerhack has a push_context-pop_context mechanism to manage all this in a well-nested manner which this might now break.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot address simmo's comment

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. The double-add was the issue — fixed in d8822cc by changing doDeclaration to skip calling add_identifier for initialized declarators (those with init != NO_INIT), since declarator_init_start already registered them before the initializer was parsed. This ensures exactly one add/remove pair per variable in the lexer context, keeping the push_context/pop_context mechanism balanced.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This asymmetry with NO_INIT also seems dubious because multiple identifiers can be declared together and mixing NO_INIT with initializers can be arbitrary, e.g. int a = 1, b, c = 3;. This seems like b would be added at the end, while a and c are added at their =. And I guess later initializers could refer to earlier ones, so it might matter?

The lexer hack is much more intricate than Copilot likes to believe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot address simmo's comment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 160543c. Instead of the NO_INIT check in doDeclaration, both init_declarator branches now use dedicated helper rules that call add_identifier immediately after each declarator is parsed:

  • declarator_no_init — handles the uninitialized case, registers the name right after the declarator
  • declarator_init_start — handles the initialized case, registers the name between the declarator and =

doDeclaration no longer calls add_identifier at all. This gives correct C11 6.2.1.7 timing for every declarator in any combination, including int a = 1, b, c = 3; — with exactly one add/remove per variable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot, write a test that tests this specific feature.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test/small1/typedef_varname_noinit.c (registered as testrun/typedef_varname_noinit in testcil.pl) in d977be5.

The key test technique: sizeof T without parentheses is only valid C when T is a variable (expression operand). When T is still a type name, sizeof T is a parse error (types require sizeof(T)). So:

typedef int T;
{ T T, U = sizeof T; }          // fails to parse if declarator_no_init is absent
typedef int mytype;
{ mytype a = 1, mytype, b = sizeof mytype; }  // NO_INIT in the middle, referenced later

Both test functions compile and run only if the NO_INIT declarator is registered as an identifier immediately after its declarator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot fix the CI failures

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the CI failures and updated the test in 363114d.

Root cause of CI failure: The previous declarator_no_init helper rule called add_identifier unconditionally for all declarations — including typedef declarations. For typedef ... Z inside a block, this added "Z" to the context list. When pop_context removed it on block exit, H.remove "Z" removed the most-recently-added binding (the NAMED_TYPE from add_type), leaving the stale IDENT binding. So Z globz; at file scope then failed to parse (test/scope2).

Fix: Reverted declarator_no_init and restored add_identifier for NO_INIT names inside doDeclaration (which already guards against typedef declarations via the isTypedef branch). declarator_init_start is retained for the initialized case — this is the core fix for the original issue.

Updated test (typedef_varname_noinit.c): Now tests multiple initialized declarators, specifically that the second initializer can reference the first declared variable after it has shadowed a typedef:

typedef int T;
{ T T = 1, U = T + 1; }  /* T in "T + 1" must be the variable, not the typedef */

Without the fix, T + 1 would see T as NAMED_TYPE and fail to parse (since NAMED_TYPE is not a valid primary expression).

…initialized declarators

Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
… its declarator via helper rules

Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
…t_start; update test

Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
int main() {
rax *rax = malloc(sizeof(*rax)); /* variable rax shadows typedef rax in initializer */
return rax == 0;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot add a further test where there is no init in the middle, i.e., akin to

typedef int mytype;

mytype a = 1, mytype, b = sizeof mytype;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 5960536. The test uses sizeof mytype without parentheses — this is only valid C when mytype is a variable (an expression operand); types require sizeof(type). So the test fails to parse if the fix is absent:

typedef int mytype;
/* ... */
{
    mytype a = 1, mytype, b = sizeof mytype;
    mytype = 5;
    if (a != 1) E(2);
    if (mytype != 5) E(3);
    if (b != sizeof(int)) E(4);
}

To make this work correctly without breaking test/scope2 (the previous regression), declarator_no_init is re-added but now guarded by an is_typedef_decl flag. The flag is set when TYPEDEF is seen and reset in doDeclaration, so typedef declarators don't get add_identifier called — only non-typedef NO_INIT declarators do.

… typedef guard

Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
@michael-schwarz
Copy link
Member

@sim642: This looks reasonable now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

typedef and variable name conflict parsing error

3 participants