-
Notifications
You must be signed in to change notification settings - Fork 55
feat: further optimize code variables regex search #429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: further optimize code variables regex search #429
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 files reviewed, 2 comments
| def _extract_plain_substring(pattern): | ||
| # Matches inline flag groups like (?i), (?ai), (?ims), etc. that include the 'i' flag. | ||
| # Python regex flags: a=ASCII, i=IGNORECASE, L=LOCALE, m=MULTILINE, s=DOTALL, u=UNICODE, x=VERBOSE | ||
| inline_flags = re.match(r"^\(\?[aiLmsux]*i[aiLmsux]*\)", pattern) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Substrings may be missed
_extract_plain_substring() rejects any pattern whose remainder contains regex metacharacters, including [ and ]. That means simple ignore-case patterns like (?i)[\s_-]*api[_-]?key (or even (?i)api[_-]?key) will never take the substring fast path and will still compile as regex, so the PR’s stated optimization won’t apply to a chunk of “simple” redaction patterns. If the default redaction patterns include any character classes / optional separators, consider broadening the fast-path detection to cover those cases (or update the PR description/tests to reflect which patterns are actually optimized).
Prompt To Fix With AI
This is a comment left during a code review.
Path: posthog/exception_utils.py
Line: 935:938
Comment:
**Substrings may be missed**
`_extract_plain_substring()` rejects any pattern whose remainder contains regex metacharacters, including `[` and `]`. That means simple ignore-case patterns like `(?i)[\s_-]*api[_-]?key` (or even `(?i)api[_-]?key`) will never take the substring fast path and will still compile as regex, so the PR’s stated optimization won’t apply to a chunk of “simple” redaction patterns. If the default redaction patterns include any character classes / optional separators, consider broadening the fast-path detection to cover those cases (or update the PR description/tests to reflect which patterns are actually optimized).
How can I resolve this? If you propose a fix, please make it concise.
posthog-python Compliance ReportDate: 2026-02-09 22:22:34 UTC ✅ All Tests Passed!29/29 tests passed Capture Tests✅ 29/29 tests passed View Details
|
Our python SDK does code variables capture in error tracking. We have 15 default regex case insensitive matches for redacting some variables like
password,api_key, etc...As it turns out, regex is extremely slow compared to
substring in string.This PR optimizes case insensitive regexes into simple substring matching which is substantially faster.