While working on #63 (fixing corrupted regex patterns in keys_extractor), Copilot's review flagged a separate pre-existing issue: remove_url_from_keys() strips ., _, and / from the content before keys_extractor() runs, which means some valid detection patterns can't match.
Affected patterns:
- Google YouTube OAuth: needs .apps.googleusercontent.com (dots stripped)
- Google OAuth Access Token: needs ya29. prefix (dot stripped)
- Amazon MWS: needs amzn.mws. prefix (dots stripped)
- PayPal Braintree: needs access_token (underscore stripped)
- Slack Webhook: needs URL slashes (slashes stripped)
The special_chars list in remove_url_from_keys() currently includes ., /, and _, which are all characters that appear in real credential formats.
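To illustrate the failure mode, here is a minimal sketch. It assumes the stripping step deletes each listed character outright; `SPECIAL_CHARS`, `strip_special_chars`, and the regex are hypothetical stand-ins, not the project's actual list or patterns.

```python
import re

# Hypothetical stand-in for the special_chars list in remove_url_from_keys();
# only the three characters named in this issue are included.
SPECIAL_CHARS = [".", "/", "_"]

# Illustrative pattern for a Google OAuth access token (assumed, not the
# project's real regex): the literal "ya29." prefix followed by token chars.
GOOGLE_OAUTH_TOKEN = re.compile(r"ya29\.[0-9A-Za-z\-]+")


def strip_special_chars(text: str) -> str:
    """Delete every character in SPECIAL_CHARS from the text (assumed behavior)."""
    for ch in SPECIAL_CHARS:
        text = text.replace(ch, "")
    return text


sample = "token=ya29.EXAMPLEtoken123"  # fake value for demonstration only

match_before = GOOGLE_OAUTH_TOKEN.search(sample)
match_after = GOOGLE_OAUTH_TOKEN.search(strip_special_chars(sample))

print("matches raw content:     ", match_before is not None)
print("matches stripped content:", match_after is not None)
```

Once the dot is gone, the `ya29.` prefix no longer exists in the text, so a pattern anchored on it cannot match regardless of how the rest of the regex is written.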
Possible fix: split the preprocessing into two steps. First remove URLs and emails (as it already does) and run keys_extractor() on that result; only then strip the remaining special characters. The special-character stripping is needed for credential_extractor() but shouldn't run before keys_extractor().
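The proposed ordering could look roughly like the sketch below. Everything here is an assumption for illustration: the helper names `remove_urls_and_emails`, `strip_special_chars`, and `scan`, the URL and email regexes, and the stub bodies of `keys_extractor` and `credential_extractor` are placeholders, not the project's real implementations.

```python
import re

# Hypothetical, simplified patterns; the project's own URL/email handling may differ.
URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SPECIAL_CHARS = [".", "/", "_"]  # only the characters named in this issue

# Placeholder key pattern (assumed): Google OAuth access token prefix.
KEY_PATTERNS = [re.compile(r"ya29\.[0-9A-Za-z\-]+")]


def remove_urls_and_emails(text: str) -> str:
    """Step 1: drop URLs and emails, leaving all other characters intact."""
    text = URL_RE.sub(" ", text)
    return EMAIL_RE.sub(" ", text)


def strip_special_chars(text: str) -> str:
    """Step 2: delete special characters (needed only by credential_extractor)."""
    for ch in SPECIAL_CHARS:
        text = text.replace(ch, "")
    return text


def keys_extractor(text: str) -> list[str]:
    """Stub: return every substring matching one of the key patterns."""
    return [m.group(0) for p in KEY_PATTERNS for m in p.finditer(text)]


def credential_extractor(text: str) -> list[str]:
    """Stub: stands in for the real extractor; here it just tokenizes."""
    return text.split()


def scan(content: str) -> tuple[list[str], list[str]]:
    cleaned = remove_urls_and_emails(content)
    keys = keys_extractor(cleaned)          # sees '.', '/', '_' intact
    stripped = strip_special_chars(cleaned)  # stripping happens afterwards
    creds = credential_extractor(stripped)
    return keys, creds


keys, creds = scan("see https://example.com token ya29.EXAMPLEtoken123 user_name")
print(keys)
print(creds)
```

One caveat with this ordering: a pattern that matches an entire URL, such as the Slack webhook, would still lose its input in the first step, so it may need to run on the raw content instead.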