From 5c78d08583752a97147a7656ff329b75ce0e1ec6 Mon Sep 17 00:00:00 2001
From: Eric Prud'hommeaux
Date: Sat, 18 Mar 2023 19:40:04 +0100
Subject: [PATCH] RE group production emits a non-captured group

For deep lexer rules like [ShEx's PN_CHARS_BASE](https://github.com/shexjs/shex.js/blob/b5cb30708d7c69550a07f1329aaf97cdb8eed737/packages/shex-parser/lib/ShExJison.jison#L255), the [emitted rule](https://github.com/shexjs/shex.js/blob/b5cb30708d7c69550a07f1329aaf97cdb8eed737/packages/shex-parser/lib/ShExJison.js#L934) has an enormous number of capture groups. Parsing a large input like [FHIR.shex](https://hl7.org/fhir/R4B/fhir.schema.shex.zip) then fails with a stack error:

```
/home/eric/checkouts/shexSpec/shex.js/packages/shex-parser/shex-parser.js:251
      throw errors[0];
      ^

RangeError: Maximum call stack size exceeded
    at String.match (<anonymous>)
    at JisonLexer.next (/home/eric/checkouts/shexSpec/shex.js/node_modules/@ts-jison/lexer/lib/lexer.js:225:37)
    at JisonLexer.lex (/home/eric/checkouts/shexSpec/shex.js/node_modules/@ts-jison/lexer/lib/lexer.js:269:22)
    at JisonLexer.lex (/home/eric/checkouts/shexSpec/shex.js/node_modules/@ts-jison/lexer/lib/lexer.js:274:25)
    at lex (/home/eric/checkouts/shexSpec/shex.js/node_modules/@ts-jison/parser/lib/parser.js:51:28)
    at JisonParser.parse (/home/eric/checkouts/shexSpec/shex.js/node_modules/@ts-jison/parser/lib/parser.js:68:30)
    at ShExJisonParser.runParser [as parse] (/home/eric/checkouts/shexSpec/shex.js/packages/shex-parser/shex-parser.js:231:22)
    at Object.<anonymous> (/home/eric/checkouts/shexSpec/shex.js/parseFhir.js:10:38)
    at Module._compile (node:internal/modules/cjs/loader:1119:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1173:10) {
  parsed: null
}
```

Emitting non-capturing groups instead fixes the problem and makes parsing much faster.

(I generated this grammar using ts-jison, but the same happens with jison.)
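
As a rough illustration of why the change is safe for matching, here is a minimal sketch comparing a capturing and a non-capturing version of the same pattern. The two regexes and the input string are made-up miniatures, not the real PN_CHARS_BASE rule, and the snippet does not try to reproduce the RangeError itself; it only shows that the matched text is identical while the per-group bookkeeping disappears.

```javascript
// Hypothetical miniature of a generated lexer rule (not the real ShEx pattern).
// Every parenthesised group here is a capture group, as the old production emitted.
const captured = /^(([A-Z])|([a-z])|(_))+/;

// Same pattern with every group emitted as (?:...), as the new production does.
const nonCaptured = /^(?:(?:[A-Z])|(?:[a-z])|(?:_))+/;

const input = "Shape_Expr rest";
const a = input.match(captured);
const b = input.match(nonCaptured);

// The lexer only needs the overall match (index 0), which is the same either way.
console.log(a[0] === b[0]);

// The capturing version also carries one result slot per group;
// the non-capturing version carries only the overall match.
console.log(a.length, b.length);
```

The lexer only reads the overall match, so dropping the captures does not change which tokens are recognized.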
---
 lex.y | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lex.y b/lex.y
index 599c382..4865b8f 100644
--- a/lex.y
+++ b/lex.y
@@ -156,7 +156,7 @@ regex_concat
 
 regex_base
     : '(' regex_list ')'
-        { $$ = '(' + $2 + ')'; }
+        { $$ = '(?:' + $2 + ')'; }
     | SPECIAL_GROUP regex_list ')'
         { $$ = $1 + $2 + ')'; }
     | regex_base '+'