Harden access-token log masking and migrate to lazy logging#53
Conversation
7e71a57 to
8165b4d
Compare
|
@keboola-pr-reviewer review |
2 similar comments
|
@keboola-pr-reviewer review |
|
@keboola-pr-reviewer review |
Rewrites AccessTokenFilter._mask to be data-shape-aware instead of
relying on Python repr coincidences:
- Add dict branch: masks 'access_token' key at the data level, so
logger.debug("params: %s", params) is safe regardless of whether
the caller uses f-strings or %s arg form, or whether any value in
the dict contains a single quote.
- Add list branch (parity with existing tuple branch).
- Add JSON-form regex ("access_token":"...") to close the residual
leak path in raw FB error response bodies.
- Drop the dict-repr regex (superseded by the dict branch) and the
broken Exception branch (already handled via exc_text as a string).
- Drop meaningless exc_info masking (it's a (type, val, tb) tuple).
- Pre-compile regexes and factor out _TOKEN_KEY / _REDACTED constants.
Replace raw response.text dumps in page_loader with structured FB
error field logging (code + message), falling back to body-length only.
Migrate every f-string log call in src/ to lazy %s arg form and
enable ruff G (flake8-logging-format) to prevent regressions.
Add tests/test_access_token_filter.py — first coverage for the filter.
Includes a comment on the LogRecord single-Mapping-args unwrap quirk
that affects how dict args are stored on records.
8165b4d to
e50168a
Compare
Review — one blocking concernOverall this is a solid hardening pass. The shift from coincidence-based regex on Python dict repr to a real But I think dropping the Reproductionimport sys, logging
sys.path.insert(0, 'src')
from client import AccessTokenFilter
import requests
from unittest.mock import MagicMock
r = MagicMock(); r.status_code = 400; r.reason = 'Bad Request'
r.url = 'https://graph.facebook.com/v23.0/me?access_token=EAAxyz123&fields=id'
exc = requests.HTTPError(f'{r.status_code} Client Error: {r.reason} for url: {r.url}', response=r)
filt = AccessTokenFilter()
rec = logging.LogRecord('t', logging.ERROR, '', 0, 'Failed: %s', (exc,), None)
filt.filter(rec)
print('LEAK?', 'EAAxyz123' in rec.getMessage())
print('msg:', rec.getMessage())Output: Why it leaks
So the old (admittedly ugly) Suggested fixAdd one branch to if isinstance(obj, BaseException):
return self._mask(str(obj))Yes, this changes the type passed to def test_masks_exception_in_args(filt):
exc = Exception('400 for url: ...?access_token=EAAxyz')
record = make_record('failed: %s', (exc,))
filt.filter(record)
assert 'EAAxyz' not in record.getMessage()Smaller notes (non-blocking)
Happy to push the fix + test to the branch if useful. |
Summary
dictbranch toAccessTokenFilter._maskso a dict passed as a%sarg (logger.debug("params: %s", params)) is masked at the data level — previously, dicts passed as args fell through to the catch-all unchanged. The dict-repr regex still scrubs the f-string case as a defence in depth.listbranch (parity with the existingtuplebranch).access_token=...) — unchanged.'access_token': '...') — pre-existing, kept.'("access_token": "...") — newly added, closes the residual leak path on raw FB error response bodies (JSON) and on dict reprs that flip quote style.Exceptionbranch from_mask(silently downgraded the exception type tostron failure oftype(obj)(masked_str)).exc_infomasking fromfilter()—exc_infois a(type, value, traceback)tuple, not user-visible text.exc_text(the formatted string) is still masked._TOKEN_KEY/_REDACTEDconstants.response.textdumps inpage_loader.pywith structured FB error field logging (code+message), falling back to body-length only on non-JSON bodies.logger.<level>(f"...")call insrc/to lazy%-style arg form.G(flake8-logging-format) inpyproject.tomlto prevent f-string logging from creeping back in.tests/test_access_token_filter.py— first test coverage for the filter. Tests every branch including a note on theLogRecordsingle-Mapping-args unwrap quirk that affects how a dict arg is stored on the record.component.py(extracted to a local variable).No runtime behaviour changes. Masking is strictly additive (more shapes caught, none revealed). Logging output is equivalent under default formatters.
Test plan (verified locally)
uv run ruff check src/ tests/— All checks passeduv run pytest tests/test_access_token_filter.py -v— 13 passeduv run pytest tests/— 45 passedgrep -rnE 'logger\.(debug|info|warning|error|exception|critical)\(f["\x27]' src/— no matchesgrep -rn "response\.text" src/— four usages, all benign (client.py:296is for a conditional check on error text;page_loader.py:201is in error pattern matching;page_loader.py:418and:563log onlylen(response.text)in the non-JSON fallback)