Bug report
Bug description:
The state machine function (cpython/Modules/_csv.c, line 726 in bbcb75c):

    parse_process_char(ReaderObj *self, _csvstate *module_state, Py_UCS4 c)

is called for every character processed by csv.reader (cpython/Modules/_csv.c, lines 969 to 974 in bbcb75c):

    while (linelen--) {
        c = PyUnicode_READ(kind, data, pos);
        if (parse_process_char(self, module_state, c) < 0) {
            Py_DECREF(lineobj);
            goto err;
        }
Even putting aside sophisticated SIMD or branch-level optimizations, this per-character dispatch could be more efficient.
Most of the time is likely spent inside a field (the IN_FIELD and IN_QUOTED_FIELD states). It would be more efficient to scan ahead for the next interesting character (such as an escape or a quote) and copy the whole slice in between in one step.
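To illustrate the idea, here is a rough standalone sketch of "scan for the next interesting character, then bulk-copy the run before it" for the body of a quoted field. It is not the code in _csv.c: it works on plain bytes rather than Py_UCS4, ignores doubled quotes and the other dialect options, and the names plain_run_length and copy_quoted_field are hypothetical helpers made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of _csv.c): length of the leading run of
 * `src` that contains neither the quote nor the escape character. */
static size_t
plain_run_length(const char *src, size_t len, char quote, char escape)
{
    size_t i = 0;
    while (i < len && src[i] != quote && src[i] != escape) {
        i++;
    }
    return i;
}

/* Hypothetical sketch: copy the body of a quoted field (the text after the
 * opening quote) into `dst`, copying each plain run with a single memcpy
 * instead of dispatching on every character.
 * Returns the number of bytes written. `*consumed` receives the number of
 * input bytes used, including the closing quote if one was found. */
static size_t
copy_quoted_field(const char *src, size_t len, char quote, char escape,
                  char *dst, size_t dst_cap, size_t *consumed)
{
    size_t in = 0, out = 0;

    /* The output is never longer than the input. */
    assert(dst_cap >= len);

    while (in < len) {
        /* Fast path: copy everything up to the next interesting byte. */
        size_t run = plain_run_length(src + in, len - in, quote, escape);
        memcpy(dst + out, src + in, run);
        in += run;
        out += run;

        if (in == len) {
            break;              /* input exhausted inside the field */
        }
        if (src[in] == quote) {
            in++;               /* consume the closing quote */
            break;
        }
        /* Otherwise src[in] is the escape character. */
        if (in + 1 == len) {
            break;              /* dangling escape: leave it to the caller */
        }
        dst[out++] = src[in + 1];   /* escaped byte is taken literally */
        in += 2;
    }

    *consumed = in;
    return out;
}
```

With quote '"' and escape '\', calling copy_quoted_field on the bytes ab\"cd",x writes ab"cd to the output and consumes the input up to and including the closing quote. The state machine would only be re-entered at those boundaries, not once per character.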
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux
Linked PRs
csv.reader up to 2x faster #138214