gh-138270: Use PyUnicodeWriter in csv.writer #138271

maurycy wants to merge 11 commits into python:main

Conversation

cc @vstinner
Quoted from the diff under review:

```c
    c == dialect->escapechar ||
    c == dialect->quotechar) {
    if (dialect->escapechar == NOT_SET) {
        PyErr_SetString(self->error_obj, "need to escape, but no escapechar set");
```

```c
bool first_field_was_empty_like = false;
bool first_field_was_none = false;
bool first_field_was_quoted_in_loop = false;
```
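The error path quoted above is observable from pure Python today: with `quoting=csv.QUOTE_NONE` and no `escapechar` configured, `csv.writer` raises `csv.Error` when a field would need escaping. A minimal reproduction:

```python
import csv
import io

f = io.StringIO()
# QUOTE_NONE with no escapechar: a field containing the delimiter
# must be escaped, but there is nothing to escape it with.
writer = csv.writer(f, quoting=csv.QUOTE_NONE)
try:
    writer.writerow(['a,b'])
except csv.Error as exc:
    print(exc)  # need to escape, but no escapechar set
```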
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers, that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase:
And yes, it could be meaningful to use PyUnicodeWriter instead of manual buffer construction, but we need to check whether this really improves things before deciding whether we can do it.
Marking this as a draft. There's a performance regression.

The script:

```python
import csv
import io

import pyperf

runner = pyperf.Runner()

INT_ROW = list(range(10))
COMPLEX_STRING_ROW = ['a,b', 'c"d', 'e\nf'] * 3 + ['ghi']


def write_the_rows(rows):
    f = io.StringIO()
    writer = csv.writer(f)
    writer.writerows(rows)


for num_rows in (10, 1_000, 10_000):
    int_rows = [INT_ROW] * num_rows
    complex_rows = [COMPLEX_STRING_ROW] * num_rows
    runner.bench_func(
        f'writerows {num_rows} integer rows',
        write_the_rows,
        int_rows,
    )
    runner.bench_func(
        f'writerows {num_rows} complex string rows',
        write_the_rows,
        complex_rows,
    )
```

There are two obvious issues:

I need some time to address it, perhaps with jumping similar to gh-138214.
Quoted from the diff under review:

```c
/* grow record buffer if necessary */
if (!join_check_rec_size(self, self->rec_len + terminator_len))
    return 0;
```

```c
if (PyUnicodeWriter_WriteChar(writer, c) < 0) {
```
It would be interesting to try calling WriteSubstring() at once rather than writing characters one by one.
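The batching idea can be sketched in Python (the function and its names are hypothetical; the real code operates on C-level buffers): instead of emitting one character per call, scan ahead to the next character that needs escaping and copy the whole clean run in a single write, which is what a WriteSubstring()-based loop would do.

```python
def join_append_field(field, special, escapechar):
    """Copy runs of ordinary characters in one slice instead of
    one character at a time (illustrative sketch only)."""
    out = []
    i = 0
    n = len(field)
    while i < n:
        # Find the next character that needs escaping.
        j = i
        while j < n and field[j] not in special:
            j += 1
        out.append(field[i:j])  # whole clean run at once
        if j < n:
            out.append(escapechar + field[j])
            j += 1
        i = j
    return ''.join(out)


print(join_append_field('a,b"c', {',', '"'}, '\\'))  # a\,b\"c
```

In the common case where few characters need escaping, the loop degenerates into a handful of bulk copies rather than one call per character.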
This was mostly exploratory, and the results aren't exactly promising. There's definitely more to be done around the
The purpose of this PR is not performance but using the modern https://docs.python.org/dev/c-api/unicode.html#c.PyUnicodeWriter API, similarly to gh-125196.
There's a risk that the code is slower, as it turned out in gh-133968. I'd prefer optimizing it after getting an ack that this is the correct direction.
Similarly to #138214 (comment), I'm not sure what the best benchmarking strategy is, beyond a simple snippet. Perhaps we need https://github.com/nineteendo/jsonyx-performance-tests but for CSV.
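For a quick stdlib-only sanity check (a rough sketch, not a replacement for pyperf), `timeit` can be run by hand against each build:

```python
import csv
import io
import timeit

ROWS = [['a,b', 'c"d', 'e\nf'] * 3 + ['ghi']] * 1_000


def write_rows():
    writer = csv.writer(io.StringIO())
    writer.writerows(ROWS)


# Best-of-5 to reduce noise; run the same script on each interpreter
# and compare the printed numbers by eye.
best = min(timeit.repeat(write_rows, number=20, repeat=5))
print(f'{best * 1e3:.2f} ms per 20 iterations')
```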
I believe that csv.reader (ReaderObj) could also use PyUnicodeWriter. If my thinking is sound, there's interest, and this code is OK, I can handle it.

Linked issue: csv module should use PyUnicodeWriter #138270