fix: regex error with Python 3.11+#6

Open
bejean wants to merge 2 commits into
langtech-bsc:mainfrom
bejean:fix_regex

@bejean bejean commented Mar 5, 2026

Running this command with Python 3.11+:

python WikiExtractor.py frwiki-latest-pages-articles.xml.bz2 \
    --json \
    --discard_sections \
    --discard_templates \
    --ignore_templates \
    -o "output" \
    -b 1M

produces this error:

re.error: global flags not at the start of the expression at position 4

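The inline flag (?i) (case-insensitive) sits in the middle of the pattern, at position 4, right after \[((. Since Python 3.11, the re module raises re.error for a global inline flag such as (?i) unless it appears at the very beginning of the regular expression. Python 3.6 through 3.10 only emitted a DeprecationWarning in this case, so the pattern still compiled there.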
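For reviewers who want to reproduce this without running the full extractor, here is a minimal sketch. The pattern is a shortened, hypothetical stand-in for the one in WikiExtractor.py (the names PROTOCOLS, BROKEN, FIXED_LEADING, FIXED_SCOPED and compiles are mine, not from the repository); only the placement of the flag is meant to match. It shows two possible rewrites, and is not necessarily the exact change made in this PR.

```python
import re
import warnings

# Hypothetical, shortened protocol list; the real pattern is longer.
PROTOCOLS = ["http://", "https://", "ftp://"]
ALTERNATION = "|".join(re.escape(p) for p in PROTOCOLS)

# Global inline flag in the middle of the pattern: "(?i)" starts at
# position 4, right after the four characters "\[((".
BROKEN = r"\[(((?i)" + ALTERNATION + r")[^\]\s]+)\s*([^\]]*?)\]"

# Option A: move the global flag to the very start of the expression.
FIXED_LEADING = r"(?i)\[((" + ALTERNATION + r")[^\]\s]+)\s*([^\]]*?)\]"

# Option B: scope the flag to the protocol alternation with (?i:...).
# The extra capturing parentheses keep the group numbering unchanged.
FIXED_SCOPED = r"\[(((?i:" + ALTERNATION + r"))[^\]\s]+)\s*([^\]]*?)\]"


def compiles(pattern):
    """Return True if the pattern compiles with no error and no deprecation warning."""
    with warnings.catch_warnings():
        # Python 3.6 to 3.10 only warn about a misplaced global flag;
        # turn that warning into an exception so every version behaves alike.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            re.compile(pattern)
        except (re.error, DeprecationWarning):
            return False
    return True


if __name__ == "__main__":
    for name in ("BROKEN", "FIXED_LEADING", "FIXED_SCOPED"):
        print(name, "compiles cleanly:", compiles(globals()[name]))
```

Option A keeps the whole expression case-insensitive, as the original global flag did. Option B limits case-insensitivity to the protocol prefix. Passing re.IGNORECASE to re.compile() instead of using an inline flag would be a third way to get the behaviour of option A.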

This error does not occur with the python:3.10-slim Docker image.
The fix was tested with the python:3.12-slim and python:3.14-slim Docker images.

bejean added 2 commits March 5, 2026 09:52
Since Python 3.11, re requires global flags like (?i) to be placed at the very beginning of the regular expression.