Use repr() to escape keys in MultiDict __repr__ output#1342
Conversation
Merging this PR will degrade performance by 32.19%
|
| Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|
| ⚡ | test_multidict_insert_str[ci-c] |
99.5 µs | 84.6 µs | +17.63% |
| ❌ | test_multidict_repr[ci-py] |
152.7 µs | 243.8 µs | -37.36% |
| ❌ | test_multidict_repr[cs-py] |
158.3 µs | 251.3 µs | -37.03% |
| ❌ | test_items_view_repr[ci-py] |
149.2 µs | 239.4 µs | -37.67% |
| ❌ | test_items_view_repr[cs-py] |
145.9 µs | 236.7 µs | -38.36% |
| ❌ | test_keys_view_repr[ci-py] |
105.7 µs | 172.6 µs | -38.79% |
| ❌ | test_keys_view_repr[cs-py] |
102.1 µs | 169.1 µs | -39.59% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing aiolibsbot:koan/fix-repr-quote-escaping (488cbf7) with master (fc6933d)
|
Seems we have a tiny performance regression to investigate |
|
Looks correct, but may have been done this way for performance reasons. Needs a history lesson |
|
@aiolibsbot rr |
PR Review — Use repr() to escape keys in MultiDict repr outputCorrectness fix is solid. Both code paths now produce valid Python string literals when keys contain characters that need escaping, and the C fast path is carefully gated (ASCII + no Two things to address before merge:
One low-priority suggestion: the inline fast/slow branch in 🟡 Important1. Spell check fails on 'unparseable' (`CHANGES/1342.bugfix.rst`, L3)The CI spell-check job is rejecting
Option 2 is the lower-friction fix for a one-off bugfix fragment and avoids growing the wordlist for a single rare adjective. 2. Pure-Python repr now ~12% slower — no fast path matches the C side (`multidict/_multidict_py.py`, L106)CodSpeed flags a regression on the four Mirroring the C fast path in pure Python should bring the benchmarks back to baseline. Something like: _KEY_NEEDS_REPR = re.compile(r"[^\x20-\x26\x28-\x5b\x5d-\x7e]|['\\]").search
def _key_repr(k: str) -> str:
return repr(k) if (not k.isascii() or _KEY_NEEDS_REPR(k)) else f"'{k}'"then Alternatively, acknowledge the regression on CodSpeed, but since the same shape of optimization already lives next door in the C extension, doing it in Python is the more consistent answer. 🟢 Suggestions1. Consider extracting the fast/slow key-write into a helper (`multidict/_multilib/hashtable.h`, L1812-1844)The new ~30-line ASCII fast-path branch sits inline inside Checklist
SummaryCorrectness fix is solid. Both code paths now produce valid Python string literals when keys contain characters that need escaping, and the C fast path is carefully gated (ASCII + no Two things to address before merge:
One low-priority suggestion: the inline fast/slow branch in |
Confirmed — the regression is real and consistent across all The good news is that for any key that's printable ASCII and contains no if (PyUnicode_IS_ASCII(key) && _key_is_simple(key)) {
PyUnicodeWriter_WriteChar(writer, '\'');
PyUnicodeWriter_WriteStr(writer, key);
PyUnicodeWriter_WriteChar(writer, '\'');
} else {
PyUnicodeWriter_WriteRepr(writer, key);
}where |
Yes, that's exactly what's going on — the original three-write sequence wasn't accidental, it was avoiding The bug only manifests when a key contains a character that doesn't survive being wrapped in single quotes (a static inline int _key_needs_repr(PyObject *key) {
if (!PyUnicode_IS_ASCII(key)) return 1;
const Py_ssize_t n = PyUnicode_GET_LENGTH(key);
const Py_UCS1 *data = PyUnicode_1BYTE_DATA(key);
for (Py_ssize_t i = 0; i < n; i++) {
Py_UCS1 c = data[i];
if (c == '\'' || c == '\\' || c < 0x20 || c == 0x7f) return 1;
}
return 0;
}That keeps the bug fix (any key the old path would have mangled now goes through |
Wrapping keys in literal single quotes produced unparseable output when
keys contained a single quote (e.g. ``"<MultiDict('a'b': 1)>"``). Both
the pure-Python and C-extension code paths now format keys via repr(),
so single quotes and other special characters round-trip correctly.
Rebase with requested adjustmentsBranch Changes applied
StatsActions performed
CI statusCI will be checked asynchronously. Automated by Kōan |
7118afc to
32c0eb4
Compare
|
needs a spelling list update as well [towncrier-fragments]:47: : Spell check: unparseable: producing unparseable output when keys contained quote characters --. |
|
@aiolibsbot rr |
The regression is real and isolated to the pure-Python side — all four flagged benchmarks have the A parallel pure-Python fast path should restore parity, e.g. _KEY_NEEDS_REPR = re.compile(r"[^\x20-\x26\x28-\x5b\x5d-\x7e]|['\\]").search
def _key_repr(k: str) -> str:
return repr(k) if (not k.isascii() or _KEY_NEEDS_REPR(k)) else f"'{k}'"used in the three items/dict reprs and the keys view. That keeps |
Yes — the literal That's exactly why the C side of this PR keeps the fast path: The pure-Python side currently doesn't mirror that — |
Confirmed — the towncrier-fragments spell step is flagging
Option 2 is probably the lighter-touch fix for a one-off bugfix fragment, since it avoids growing the wordlist for a single rare adjective and reads slightly more concretely about what the bug was. |
Rebase with requested adjustmentsBranch Changes applied
StatsActions performed
CI statusCI will be checked asynchronously. Automated by Kōan |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1342 +/- ##
=======================================
Coverage 99.86% 99.86%
=======================================
Files 28 28
Lines 3627 3643 +16
Branches 265 266 +1
=======================================
+ Hits 3622 3638 +16
Misses 3 3
Partials 2 2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
ef2d254 to
a9b2782
Compare
Feel free and take your time :) |
What do these changes do?
Make
repr(MultiDict(...))produce a parseable string when keys containquote characters. The pure-Python and C-extension code paths previously
wrapped keys in literal single quotes, so a key like
a'bproduced thebroken output
<MultiDict('a'b': 1)>. They now format keys via Python'srepr(), which picks valid quoting automatically.Before:
After:
Are there changes in behavior for the user?
Only for keys containing characters that need escaping in a Python
string literal (single quotes, control characters, …). Such repr output
was already broken, so any consumer was either tolerating bad output or
already wrapping keys themselves. Plain-ASCII keys produce byte-identical
output, so the existing repr tests continue to pass unchanged.
How
multidict/_multidict_py.py): fourf"'{key}'"/f"'{key}': {v!r}"patterns in_ItemsView.__repr__,_KeysView.__repr__,MultiDict.__repr__, andMultiDictProxy.__repr__switched to{key!r}.multidict/_multilib/hashtable.h): the three-stepwrite
', write str, write'sequence inmd_repris replaced bya single
PyUnicodeWriter_WriteRepr(writer, key). All views(
MultiDict, proxies, keys/items views) delegate to this function, soone fix covers them all.
Testing
tests/test_multidict.py::BaseMultiDictTest(parametrized over
MultiDict/CIMultiDictand both implementations)assert the repr of a multidict with a quoted key, plus its keys and
items views, is the expected escaped form.
1682 passed(was 1658 before, +24 from the newparametrized tests) on Python 3.12, C-ext + pure-Python.
pre-commit(ruff, clang-format, mypy on 3.11 + 3.13): clean.Related issue number
None — surfaced while reviewing recent
__repr__work.Checklist
format; the change preserves output for normal keys.
CONTRIBUTORS.txt— N/A, multidict has noCONTRIBUTORS.txt.CHANGES/folderCHANGES/1342.bugfix.rst— credited:user:aiolibsbot``Quality Report
Changes: 4 files changed, 30 insertions(+), 12 deletions(-)
Code scan: clean
Tests: failed (FAILED)
Branch hygiene: clean
Generated by Kōan post-mission quality pipeline