remove_url_from_keys() strips characters needed by secret detection patterns #64

@francose

While working on #63 (fixing corrupted regex patterns in `keys_extractor`), Copilot's review flagged a separate, pre-existing issue: `remove_url_from_keys()` strips `.`, `_`, and `/` from the content before `keys_extractor()` runs, so some valid detection patterns can never match.

Affected patterns:

  • Google YouTube OAuth: needs `.apps.googleusercontent.com` (dots stripped)
  • Google OAuth Access Token: needs the `ya29.` prefix (dot stripped)
  • Amazon MWS: needs the `amzn.mws.` prefix (dots stripped)
  • PayPal Braintree: needs `access_token` (underscore stripped)
  • Slack Webhook: needs URL slashes (slashes stripped)

The `special_chars` list in `remove_url_from_keys()` currently includes `.`, `/`, and `_`, all of which appear in real credential formats.
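
A minimal reproduction of the effect, as a sketch only. `SPECIAL_CHARS`, `strip_special_chars`, and the `ya29.` regex below are hypothetical stand-ins, not the project's actual code, and the sketch assumes the characters are deleted rather than replaced with something else:

```python
import re

# Hypothetical subset of the real special_chars list (assumption).
SPECIAL_CHARS = [".", "/", "_"]

# Illustrative pattern for a Google OAuth access token (assumption:
# the project's real regex may differ, but it depends on the "ya29." prefix).
GOOGLE_OAUTH_TOKEN = re.compile(r"ya29\.[0-9A-Za-z_-]+")


def strip_special_chars(text: str) -> str:
    """Delete every character listed in SPECIAL_CHARS from the text."""
    for ch in SPECIAL_CHARS:
        text = text.replace(ch, "")
    return text


sample = "token = ya29.FAKEtoken_123"  # fake value, for illustration only

before = GOOGLE_OAUTH_TOKEN.search(sample)
after = GOOGLE_OAUTH_TOKEN.search(strip_special_chars(sample))

print("matched before stripping:", before is not None)
print("matched after stripping:", after is not None)
```

Once the dot is gone, the literal `ya29.` prefix no longer exists in the text, so the pattern has nothing to anchor on.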

Possible fix: split the preprocessing into two steps. Remove URLs and emails first (as it already does), then run `keys_extractor()` on that result before stripping the remaining special characters. The special character stripping is needed for `credential_extractor()` but shouldn't run before `keys_extractor()`.
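
The ordering described above could look roughly like this. It is a sketch under assumptions, not a patch: every name here (`remove_urls_and_emails`, `keys_extractor_sketch`, `strip_special_chars`, `preprocess`, and the regexes) is hypothetical, and the real `keys_extractor()` and `credential_extractor()` have their own signatures and pattern sets.

```python
import re

# Hypothetical stand-ins for the project's real patterns (assumptions).
URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
SPECIAL_CHARS = "._/"
KEY_PATTERNS = {
    "google_oauth_access_token": re.compile(r"ya29\.[0-9A-Za-z_-]+"),
}


def remove_urls_and_emails(text: str) -> str:
    """Step 1: drop URLs and email addresses, leave other characters alone."""
    text = URL_RE.sub(" ", text)
    return EMAIL_RE.sub(" ", text)


def keys_extractor_sketch(text: str) -> dict:
    """Step 2: run key patterns while '.', '_' and '/' are still present."""
    found = {}
    for name, pattern in KEY_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[name] = matches
    return found


def strip_special_chars(text: str) -> str:
    """Step 3: strip the remaining special characters for later stages."""
    return text.translate(str.maketrans("", "", SPECIAL_CHARS))


def preprocess(text: str):
    """Return (keys found before stripping, text stripped for later use)."""
    cleaned = remove_urls_and_emails(text)
    keys = keys_extractor_sketch(cleaned)
    stripped = strip_special_chars(cleaned)
    return keys, stripped


sample = "see https://example.com/docs or mail a@b.com token ya29.AbC_123-x end"
keys, stripped = preprocess(sample)
print(keys)
print(stripped)
```

In this arrangement the stripped text would be what gets handed to `credential_extractor()`, while key detection sees the text with its punctuation intact.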
