Fix typedef/variable name conflict parsing error#219
Fix typedef/variable name conflict parsing error#219
Conversation
… register identifier before parsing initializer Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
| in the lexer hack, so that in the initializer the name shadows any typedef | ||
| with the same name (C11 6.2.1.7: scope begins after completion of declarator). *) */ | ||
| declarator_init_start: | ||
| declarator EQ { let (n, _, _, _) = $1 in !Lexerhack.add_identifier n; $1 } |
There was a problem hiding this comment.
This may be the right place for it to happen but just adding another Lexerhack.add_identifier call is a bit suspicious.
Somewhere else there's already a call to call add_identifier on the same thing, just slightly too late.
The lexer hack management uses H.add (not H.replace) and H.remove to actually use the multiple-binding feature of OCaml hashtables (which exists precisely for this kind of namespacing use case).
But if there's an extra add_identifier here and the same one is added later by existing code, then there will only be one remove. This means that the identifier will probably remain bound outside of its intended scope somewhere.
Lexerhack has a push_context-pop_context mechanism to manage all this in a well-nested manner which this might now break.
There was a problem hiding this comment.
Good catch. The double-add was the issue — fixed in d8822cc by changing doDeclaration to skip calling add_identifier for initialized declarators (those with init != NO_INIT), since declarator_init_start already registered them before the initializer was parsed. This ensures exactly one add/remove pair per variable in the lexer context, keeping the push_context/pop_context mechanism balanced.
There was a problem hiding this comment.
This asymmetry with NO_INIT also seems dubious because multiple identifiers can be declared together and mixing NO_INIT with initializers can be arbitrary, e.g. int a = 1, b, c = 3;. This seems like b would be added at the end, while a and c are added at their =. And I guess later initializers could refer to earlier ones, so it might matter?
The lexer hack is much more intricate than Copilot likes to believe.
There was a problem hiding this comment.
Fixed in 160543c. Instead of the NO_INIT check in doDeclaration, both init_declarator branches now use dedicated helper rules that call add_identifier immediately after each declarator is parsed:
declarator_no_init— handles the uninitialized case, registers the name right after the declaratordeclarator_init_start— handles the initialized case, registers the name between the declarator and=
doDeclaration no longer calls add_identifier at all. This gives correct C11 6.2.1.7 timing for every declarator in any combination, including int a = 1, b, c = 3; — with exactly one add/remove per variable.
There was a problem hiding this comment.
@copilot, write a test that tests this specific feature.
There was a problem hiding this comment.
Added test/small1/typedef_varname_noinit.c (registered as testrun/typedef_varname_noinit in testcil.pl) in d977be5.
The key test technique: sizeof T without parentheses is only valid C when T is a variable (expression operand). When T is still a type name, sizeof T is a parse error (types require sizeof(T)). So:
typedef int T;
{ T T, U = sizeof T; } // fails to parse if declarator_no_init is absenttypedef int mytype;
{ mytype a = 1, mytype, b = sizeof mytype; } // NO_INIT in the middle, referenced laterBoth test functions compile and run only if the NO_INIT declarator is registered as an identifier immediately after its declarator.
There was a problem hiding this comment.
Fixed the CI failures and updated the test in 363114d.
Root cause of CI failure: The previous declarator_no_init helper rule called add_identifier unconditionally for all declarations — including typedef declarations. For typedef ... Z inside a block, this added "Z" to the context list. When pop_context removed it on block exit, H.remove "Z" removed the most-recently-added binding (the NAMED_TYPE from add_type), leaving the stale IDENT binding. So Z globz; at file scope then failed to parse (test/scope2).
Fix: Reverted declarator_no_init and restored add_identifier for NO_INIT names inside doDeclaration (which already guards against typedef declarations via the isTypedef branch). declarator_init_start is retained for the initialized case — this is the core fix for the original issue.
Updated test (typedef_varname_noinit.c): Now tests multiple initialized declarators, specifically that the second initializer can reference the first declared variable after it has shadowed a typedef:
typedef int T;
{ T T = 1, U = T + 1; } /* T in "T + 1" must be the variable, not the typedef */Without the fix, T + 1 would see T as NAMED_TYPE and fail to parse (since NAMED_TYPE is not a valid primary expression).
…initialized declarators Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
… its declarator via helper rules Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
…t_start; update test Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
| int main() { | ||
| rax *rax = malloc(sizeof(*rax)); /* variable rax shadows typedef rax in initializer */ | ||
| return rax == 0; | ||
| } |
There was a problem hiding this comment.
@copilot add a further test where there is no init in the middle, i.e., akin to
typedef int mytype;
mytype a = 1, mytype, b = sizeof mytype;There was a problem hiding this comment.
Added in 5960536. The test uses sizeof mytype without parentheses — this is only valid C when mytype is a variable (an expression operand); types require sizeof(type). So the test fails to parse if the fix is absent:
typedef int mytype;
/* ... */
{
mytype a = 1, mytype, b = sizeof mytype;
mytype = 5;
if (a != 1) E(2);
if (mytype != 5) E(3);
if (b != sizeof(int)) E(4);
}To make this work correctly without breaking test/scope2 (the previous regression), declarator_no_init is re-added but now guarded by an is_typedef_decl flag. The flag is set when TYPEDEF is seen and reset in doDeclaration, so typedef declarators don't get add_identifier called — only non-typedef NO_INIT declarators do.
… typedef guard Co-authored-by: michael-schwarz <13812333+michael-schwarz@users.noreply.github.com>
|
@sim642: This looks reasonable now! |
test/small1/typedef_varname.cfor the typedef/variable name conflicttest/testcil.plinit_declaratorrule insrc/frontc/cparser.mlyto calladd_identifierviadeclarator_init_starthelper rule between declarator and EQcparser.mlyto use/* (* TEXT *) */as per file conventionsadd_identifiertiming:declarator_init_startregisters each initialized declarator's name before its initializer is parsed;declarator_no_initregisters NO_INIT declarator names immediately (guarded byis_typedef_declflag); typedef names useadd_typeonly indoDeclaration— ensuring correct C11 6.2.1.7 scoping without breaking typedef scope cleanuptest/small1/typedef_varname_noinit.cto test multiple initialized declarators: verifies that the second declarator's initializer can reference the first declared variable (which shadows a typedef), exercisingdeclarator_init_starttest/scope2:declarator_no_initnow skipsadd_identifierwhenis_typedef_decl=true, preventingpop_contextfrom incorrectly removing theNAMED_TYPElexicon binding for typedef namestypedef_varname.c:mytype a = 1, mytype, b = sizeof mytype—sizeof mytype(without parens) is only valid whenmytypeis a variable, proving correct C11 6.2.1.7 scoping for NO_INIT declarators in mixed declaration listsOriginal prompt
typedefand variable name conflict parsing error #114🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.