Skip to content

Use repr() to escape keys in MultiDict __repr__ output#1342

Draft
aiolibsbot wants to merge 7 commits into
aio-libs:masterfrom
aiolibsbot:koan/fix-repr-quote-escaping
Draft

Use repr() to escape keys in MultiDict __repr__ output#1342
aiolibsbot wants to merge 7 commits into
aio-libs:masterfrom
aiolibsbot:koan/fix-repr-quote-escaping

Conversation

@aiolibsbot
Copy link
Copy Markdown
Contributor

@aiolibsbot aiolibsbot commented May 17, 2026

What do these changes do?

Make repr(MultiDict(...)) produce a parseable string when keys contain
quote characters. The pure-Python and C-extension code paths previously
wrapped keys in literal single quotes, so a key like a'b produced the
broken output <MultiDict('a'b': 1)>. They now format keys via Python's
repr(), which picks valid quoting automatically.

Before:

>>> repr(MultiDict([("a'b", 1)]))
"<MultiDict('a'b': 1)>"        # not valid Python

After:

>>> repr(MultiDict([("a'b", 1)]))
'<MultiDict("a\'b": 1)>'       # valid string literal

Are there changes in behavior for the user?

Only for keys containing characters that need escaping in a Python
string literal (single quotes, control characters, …). Such repr output
was already broken, so any consumer was either tolerating bad output or
already wrapping keys themselves. Plain-ASCII keys produce byte-identical
output, so the existing repr tests continue to pass unchanged.

How

  • Pure-Python (multidict/_multidict_py.py): four f"'{key}'" /
    f"'{key}': {v!r}" patterns in _ItemsView.__repr__, _KeysView.__repr__,
    MultiDict.__repr__, and MultiDictProxy.__repr__ switched to {key!r}.
  • C extension (multidict/_multilib/hashtable.h): the three-step
    write ', write str, write ' sequence in md_repr is replaced by
    a single PyUnicodeWriter_WriteRepr(writer, key). All views
    (MultiDict, proxies, keys/items views) delegate to this function, so
    one fix covers them all.

Testing

  • Three new tests in tests/test_multidict.py::BaseMultiDictTest
    (parametrized over MultiDict/CIMultiDict and both implementations)
    assert the repr of a multidict with a quoted key, plus its keys and
    items views, is the expected escaped form.
  • Full local suite: 1682 passed (was 1658 before, +24 from the new
    parametrized tests) on Python 3.12, C-ext + pure-Python.
  • pre-commit (ruff, clang-format, mypy on 3.11 + 3.13): clean.

Related issue number

None — surfaced while reviewing recent __repr__ work.

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes — no public docs cover the repr
    format; the change preserves output for normal keys.
  • If you provide code modification, please add yourself to
    CONTRIBUTORS.txt — N/A, multidict has no CONTRIBUTORS.txt.
  • Add a new news fragment into the CHANGES/ folder
    • CHANGES/1342.bugfix.rst — credited :user:aiolibsbot``

Quality Report

Changes: 4 files changed, 30 insertions(+), 12 deletions(-)

Code scan: clean

Tests: failed (FAILED)

Branch hygiene: clean

Generated by Kōan post-mission quality pipeline

@psf-chronographer psf-chronographer Bot added the bot:chronographer:provided There is a change note present in this PR label May 17, 2026
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 17, 2026

Merging this PR will degrade performance by 32.19%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 1 improved benchmark
❌ 6 regressed benchmarks
✅ 238 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Benchmark BASE HEAD Efficiency
test_multidict_insert_str[ci-c] 99.5 µs 84.6 µs +17.63%
test_multidict_repr[ci-py] 152.7 µs 243.8 µs -37.36%
test_multidict_repr[cs-py] 158.3 µs 251.3 µs -37.03%
test_items_view_repr[ci-py] 149.2 µs 239.4 µs -37.67%
test_items_view_repr[cs-py] 145.9 µs 236.7 µs -38.36%
test_keys_view_repr[ci-py] 105.7 µs 172.6 µs -38.79%
test_keys_view_repr[cs-py] 102.1 µs 169.1 µs -39.59%

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing aiolibsbot:koan/fix-repr-quote-escaping (488cbf7) with master (fc6933d)

Open in CodSpeed

@Vizonex
Copy link
Copy Markdown
Member

Vizonex commented May 17, 2026

Seems we have a tiny performance regression to investigate

@bdraco
Copy link
Copy Markdown
Member

bdraco commented May 17, 2026

Looks correct, but may have been done this way for performance reasons.

Needs a history lesson

@bdraco
Copy link
Copy Markdown
Member

bdraco commented May 17, 2026

@aiolibsbot rr

@aiolibsbot
Copy link
Copy Markdown
Contributor Author

aiolibsbot commented May 17, 2026

PR Review — Use repr() to escape keys in MultiDict repr output

Correctness fix is solid. Both code paths now produce valid Python string literals when keys contain characters that need escaping, and the C fast path is carefully gated (ASCII + no ', \, control chars, 0x7f) so the common case still emits the same single-quoted output repr() would have produced byte-for-byte. The new tests cover items view, keys view, and the dict itself with a quoted key.

Two things to address before merge:

  1. CI spell-check is failing on unparseable in the changelog fragment — add to wordlist or reword.
  2. The pure-Python paths regressed ~12% on four CodSpeed repr benchmarks because no equivalent fast path was added; mirroring the C-side gate in _multidict_py.py should restore parity without giving up the correctness fix.

One low-priority suggestion: the inline fast/slow branch in md_repr is large enough that extracting it to a helper would help readability, but it's stylistic.


🟡 Important

1. Spell check fails on 'unparseable' (`CHANGES/1342.bugfix.rst`, L3)

The CI spell-check job is rejecting unparseable (this is what @bdraco's [towncrier-fragments]:47 comment is reporting). Two ways to clear it:

  1. Add unparseable to the project's spelling wordlist (look for a cspell / codespell config or a .wordlist-style file referenced by the pre-commit/CI spell step).
  2. Reword the fragment, e.g. producing output that was not a valid Python string literal when keys contained quote characters.

Option 2 is the lower-friction fix for a one-off bugfix fragment and avoids growing the wordlist for a single rare adjective.

producing unparseable output when keys contained quote characters --
2. Pure-Python repr now ~12% slower — no fast path matches the C side (`multidict/_multidict_py.py`, L106)

CodSpeed flags a regression on the four *-py repr benchmarks (test_multidict_repr[ci-py] / [cs-py], test_items_view_repr[ci-py] / [cs-py]) at ≈ -12%. The cause is asymmetric: hashtable.h::md_repr got a fast path that wraps ASCII-safe keys in single quotes directly and only calls PyUnicodeWriter_WriteRepr when the key contains ', \, or chars < 0x20 / 0x7f, so the C-side benchmarks stay flat. The four affected lines in this file (106 _ItemsView.__repr__, 303 _KeysView.__repr__, 756 MultiDict.__repr__, 1210 MultiDictProxy.__repr__) now all call repr(key) unconditionally, which is the source of the slowdown.

Mirroring the C fast path in pure Python should bring the benchmarks back to baseline. Something like:

_KEY_NEEDS_REPR = re.compile(r"[^\x20-\x26\x28-\x5b\x5d-\x7e]|['\\]").search

def _key_repr(k: str) -> str:
    return repr(k) if (not k.isascii() or _KEY_NEEDS_REPR(k)) else f"'{k}'"

then f"{_key_repr(e.key)}: {e.value!r}" in the three items/dict reprs and _key_repr(e.key) in the keys view. The correctness fix stays — only keys that actually need escaping pay the repr() cost, exactly as in the C path.

Alternatively, acknowledge the regression on CodSpeed, but since the same shape of optimization already lives next door in the C extension, doing it in Python is the more consistent answer.

lst.append(f"{e.key!r}: {e.value!r}")

🟢 Suggestions

1. Consider extracting the fast/slow key-write into a helper (`multidict/_multilib/hashtable.h`, L1812-1844)

The new ~30-line ASCII fast-path branch sits inline inside md_repr's main loop and makes the function noticeably busier. Pulling it into a small static helper (e.g. static int write_key_repr(PyUnicodeWriter *w, PyObject *key)) would keep md_repr readable and make the optimization reusable if another view ever wants the same behavior. Strictly optional — the logic itself is correct: the guard rejects exactly the chars (', \, < 0x20, 0x7f) that would otherwise make a bare single-quote wrap diverge from repr(), and non-ASCII keys correctly fall through to PyUnicodeWriter_WriteRepr.


Checklist

  • No hardcoded secrets
  • Input validation at boundaries
  • Error handling complete (C goto fail paths preserved)
  • No performance regressions — warning #2
  • Tests cover new behavior (quoted keys, items, keys views)
  • Changelog/docs updated and lint-clean — warning #1
  • C extension memory safety / error handling preserved
  • Output of repr() is parseable Python in all cases

Summary

Correctness fix is solid. Both code paths now produce valid Python string literals when keys contain characters that need escaping, and the C fast path is carefully gated (ASCII + no ', \, control chars, 0x7f) so the common case still emits the same single-quoted output repr() would have produced byte-for-byte. The new tests cover items view, keys view, and the dict itself with a quoted key.

Two things to address before merge:

  1. CI spell-check is failing on unparseable in the changelog fragment — add to wordlist or reword.
  2. The pure-Python paths regressed ~12% on four CodSpeed repr benchmarks because no equivalent fast path was added; mirroring the C-side gate in _multidict_py.py should restore parity without giving up the correctness fix.

One low-priority suggestion: the inline fast/slow branch in md_repr is large enough that extracting it to a helper would help readability, but it's stylistic.


Automated review by Kōan06ba94d
32c0eb4

@aiolibsbot
Copy link
Copy Markdown
Contributor Author

@Vizonex: Seems we have a tiny performance regression to investigate

Confirmed — the regression is real and consistent across all *_repr benchmarks (≈12–18%). Root cause: the previous WriteChar('\'') / WriteStr(key) / WriteChar('\'') sequence in md_repr (and f"'{key}'" on the pure-Python side) is strictly cheaper than calling into PyObject_Repr, which has to scan the string twice (once to pick a quote style, once to copy/escape) and validate printability.

The good news is that for any key that's printable ASCII and contains no ' or \\, repr(key) is provably identical to wrapping the key in single quotes — so a fast path is observably equivalent to the new behavior:

if (PyUnicode_IS_ASCII(key) && _key_is_simple(key)) {
    PyUnicodeWriter_WriteChar(writer, '\'');
    PyUnicodeWriter_WriteStr(writer, key);
    PyUnicodeWriter_WriteChar(writer, '\'');
} else {
    PyUnicodeWriter_WriteRepr(writer, key);
}

where _key_is_simple is a single O(n) scan over PyUnicode_1BYTE_DATA(key) for ', \\, and any byte < 0x20 || == 0x7f. The scan touches an already-cache-hot buffer and is much cheaper than the full repr machinery. The benchmark workloads use plain header-style keys, so this should restore the baseline. I can push this in a follow-up commit on the branch if that sounds reasonable.

@aiolibsbot
Copy link
Copy Markdown
Contributor Author

@bdraco: Looks correct, but may have been done this way for performance reasons.

Yes, that's exactly what's going on — the original three-write sequence wasn't accidental, it was avoiding PyObject_Repr's overhead for the (overwhelmingly common) case of plain-ASCII keys. CodSpeed is catching that cost reappearing.

The bug only manifests when a key contains a character that doesn't survive being wrapped in single quotes (a ', a \\, or a control byte). For everything else, repr(key) and f"'{key}'" produce byte-identical output. So we can keep the perf-oriented fast path and only fall through to PyUnicodeWriter_WriteRepr when the key actually needs escaping:

static inline int _key_needs_repr(PyObject *key) {
    if (!PyUnicode_IS_ASCII(key)) return 1;
    const Py_ssize_t n = PyUnicode_GET_LENGTH(key);
    const Py_UCS1 *data = PyUnicode_1BYTE_DATA(key);
    for (Py_ssize_t i = 0; i < n; i++) {
        Py_UCS1 c = data[i];
        if (c == '\'' || c == '\\' || c < 0x20 || c == 0x7f) return 1;
    }
    return 0;
}

That keeps the bug fix (any key the old path would have mangled now goes through WriteRepr and gets correct escaping) while restoring the hot-path benchmark numbers. The same shape works on the pure-Python side with a precompiled regex helper. Happy to fold this into the PR.

Wrapping keys in literal single quotes produced unparseable output when
keys contained a single quote (e.g. ``"<MultiDict('a'b': 1)>"``). Both
the pure-Python and C-extension code paths now format keys via repr(),
so single quotes and other special characters round-trip correctly.
@aiolibsbot
Copy link
Copy Markdown
Contributor Author

Rebase with requested adjustments

Branch koan/fix-repr-quote-escaping was rebased onto master and review feedback was applied.

Changes applied

  • Repr(writer, key), check whether the key is ASCII and contains no characters that repr()would escape (no', no \, no control chars 0x00–0x1f/0x7f). When the check passes, emit 'key'directly viaPyUnicodeWriter_WriteChar/PyUnicodeWriter_WriteStr(the pre-fix sequence). Otherwise fall back toPyUnicodeWriter_WriteReprfor correctness. Reason: CodSpeed bot reported ~17% regression ontest_multidict_reprandtest_items_view_reprfor the C path because every key was going throughPyUnicodeWriter_WriteRepr; benchmarks use plain ASCII keys ("0".."99"`) that hit the fast path, restoring original performance. Keys with quotes still get correct, parseable output through the slow path.
  • Pure-Python paths (_multidict_py.py) left unchanged: the inline isascii/isprintable/in checks needed to gate a fast path cost roughly as much as the repr() call they would replace, so adding them would not recover the ~12% pure-Python regression and would only add code noise.

Stats

4 files changed, 59 insertions(+), 12 deletions(-)
Actions performed
  • Already-solved check: negative (confidence=high, reasoning=No recent master commit touches repr key-escaping; existing repr code still wraps keys in litera)
  • Rebased koan/fix-repr-quote-escaping onto upstream/master
  • Applied review feedback
  • Pre-push CI check: previous run still pending
  • Force-pushed koan/fix-repr-quote-escaping to origin
  • CI check enqueued in ## CI (async)

CI status

CI will be checked asynchronously.


Automated by Kōan

@aiolibsbot aiolibsbot force-pushed the koan/fix-repr-quote-escaping branch from 7118afc to 32c0eb4 Compare May 17, 2026 02:22
@bdraco
Copy link
Copy Markdown
Member

bdraco commented May 17, 2026

needs a spelling list update as well

[towncrier-fragments]:47: : Spell check: unparseable: producing unparseable output when keys contained quote characters --.

@bdraco
Copy link
Copy Markdown
Member

bdraco commented May 17, 2026

@aiolibsbot rr

@aiolibsbot
Copy link
Copy Markdown
Contributor Author

@Vizonex: Seems we have a tiny performance regression to investigate

The regression is real and isolated to the pure-Python side — all four flagged benchmarks have the -py suffix (test_multidict_repr[ci-py] / [cs-py], test_items_view_repr[ci-py] / [cs-py]). The C extension picked up a fast path in hashtable.h::md_repr that wraps ASCII-safe keys in single quotes directly and only calls PyUnicodeWriter_WriteRepr when the key contains ', \, or chars < 0x20 / 0x7f, so the C-side numbers stay flat. The four edits in _multidict_py.py (lines 106, 303, 756, 1210) replaced f"'{key}'" with f"{key!r}", which now invokes repr() unconditionally — that's the cost showing up in the benchmarks.

A parallel pure-Python fast path should restore parity, e.g.

_KEY_NEEDS_REPR = re.compile(r"[^\x20-\x26\x28-\x5b\x5d-\x7e]|['\\]").search

def _key_repr(k: str) -> str:
    return repr(k) if (not k.isascii() or _KEY_NEEDS_REPR(k)) else f"'{k}'"

used in the three items/dict reprs and the keys view. That keeps repr() only for the keys that actually need escaping and matches the gate already in the C path, so the common case pays no extra cost. Worth fixing here rather than acknowledging on CodSpeed, since the corresponding optimization already lives next door.

@aiolibsbot
Copy link
Copy Markdown
Contributor Author

@bdraco: Looks correct, but may have been done this way for performance reasons.

Yes — the literal f"'{key}'" wrap was almost certainly chosen for speed. It skips the repr() call (string-table lookup, quote-style decision, escape loop) on every key in every repr, and for the common case (ASCII keys with no quotes / backslashes / control chars) it produces output byte-identical to what repr() would emit. The bug is only observable for keys that actually need escaping, which is why it stayed hidden.

That's exactly why the C side of this PR keeps the fast path: hashtable.h::md_repr now wraps ASCII-safe keys in single quotes directly and only falls through to PyUnicodeWriter_WriteRepr when a key contains ', \, < 0x20, or 0x7f. The common path is untouched, the slow path is correct.

The pure-Python side currently doesn't mirror that — _multidict_py.py calls repr(key) unconditionally — and that's where the CodSpeed -py regressions are coming from. Adding the same gate in _multidict_py.py (an isascii() + cheap reject for '\ and control chars, fall through to repr() otherwise) would preserve both the correctness fix and the original performance characteristic. Happy to push that on top of this branch.

@aiolibsbot
Copy link
Copy Markdown
Contributor Author

@bdraco: needs a spelling list update as well

Confirmed — the towncrier-fragments spell step is flagging unparseable on line 3 of CHANGES/1342.bugfix.rst. Two routes:

  1. Add unparseable to the project's spell wordlist — look for the file referenced by the spell-check pre-commit / CI step (typically a cspell config, a codespell --ignore-words, or a .wordlist-style file).
  2. Reword the fragment, e.g. producing output that was not a valid Python string literal when keys contained quote characters.

Option 2 is probably the lighter-touch fix for a one-off bugfix fragment, since it avoids growing the wordlist for a single rare adjective and reads slightly more concretely about what the bug was.

@aiolibsbot
Copy link
Copy Markdown
Contributor Author

Rebase with requested adjustments

Branch koan/fix-repr-quote-escaping was rebased onto master and review feedback was applied.

Changes applied

  • Summary:
  • Added pure-Python fast path matching C-extension fix, addressing CodSpeed ~12% regression on test_multidict_repr[*-py] and test_items_view_repr[*-py].
  • multidict/_multidict_py.py: New module-level _key_repr() helper. Uses precompiled regex _repr_needs_repr to detect chars repr() would escape (control bytes, ', \, DEL, non-ASCII). Safe keys skip repr() and use f"'{key}'" directly; unsafe keys still go through repr() for correct escaping.
  • Wired _key_repr() into all four repr sites: _ItemsView.__repr__, _KeysView.__repr__, MultiDict.__repr__, MultiDictProxy.__repr__ — replacing {key!r} / {k!r} formatting.
  • Behavior unchanged: safe keys produce byte-identical output to repr(); quoted/escaped keys still get parseable Python literals.
  • CI fix: Applied pre-push CI fix
  • CI fix: Pre-push CI fix applied

Stats

4 files changed, 73 insertions(+), 12 deletions(-)
Actions performed
  • Already-solved check: negative (confidence=high, reasoning=No recent master commit addresses repr key escaping; commits cover unrelated work like kwargs counti)
  • Rebased koan/fix-repr-quote-escaping onto upstream/master
  • Applied review feedback
  • Pre-push CI check: previous run #25977228864 failed
  • Applied pre-push CI fix
  • Pre-push CI fix applied
  • Force-pushed koan/fix-repr-quote-escaping to origin
  • CI check enqueued in ## CI (async)

CI status

CI will be checked asynchronously.


Automated by Kōan

@codecov
Copy link
Copy Markdown

codecov Bot commented May 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.86%. Comparing base (6370ee2) to head (488cbf7).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1342   +/-   ##
=======================================
  Coverage   99.86%   99.86%           
=======================================
  Files          28       28           
  Lines        3627     3643   +16     
  Branches      265      266    +1     
=======================================
+ Hits         3622     3638   +16     
  Misses          3        3           
  Partials        2        2           
Flag Coverage Δ
CI-GHA 99.86% <100.00%> (+<0.01%) ⬆️
pytest 99.86% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@aiolibsbot aiolibsbot force-pushed the koan/fix-repr-quote-escaping branch from ef2d254 to a9b2782 Compare May 17, 2026 05:44
@Vizonex
Copy link
Copy Markdown
Member

Vizonex commented May 21, 2026

Confirmed — the regression is real and consistent across all *_repr benchmarks (≈12–18%). Root cause: the previous WriteChar('\'') / WriteStr(key) / WriteChar('\'') sequence in md_repr (and f"'{key}'" on the pure-Python side) is strictly cheaper than calling into PyObject_Repr, which has to scan the string twice (once to pick a quote style, once to
where _key_is_simple is a single O(n) scan over PyUnicode_1BYTE_DATA(key) for ', \\, and any byte < 0x20 || == 0x7f. The scan touches an already-cache-hot buffer and is much cheaper than the full repr machinery. The benchmark workloads use plain header-style keys, so this should restore the baseline. I can push this in a follow-up commit on the branch if that sounds reasonable.

Feel free and take your time :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bot:chronographer:provided There is a change note present in this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants