[mypyc] Add librt.strings.isidentifier codepoint primitive#21522
[mypyc] Add librt.strings.isidentifier codepoint primitive#21522VaggelisD wants to merge 2 commits into
librt.strings.isidentifier codepoint primitive#21522Conversation
True if a codepoint can start a valid identifier (XID_Start, per PEP 3131). The ASCII fast path covers `[A-Za-z_]` inline; non-ASCII codepoints round-trip through PyUnicode_FromOrdinal + PyUnicode_IsIdentifier so the answer matches str.isidentifier on a 1-character string. The non-ASCII path is the first allocating helper in this series, so its body lives out-of-line in codepoint_extra_ops.c (it would otherwise be emitted as a separate copy in every translation unit that includes the header). On OOM it swallows the exception via PyErr_Clear() and returns False, which keeps the function ERR_NEVER. Documented at the call site so callers don't get a surprising silent failure. Stack: depends on the librt.strings.isspace primitive.
add1d44 to
283b2f8
Compare
This comment has been minimized.
This comment has been minimized.
| if (s == NULL) { | ||
| // OOM. Swallow and return false to keep the function ERR_NEVER; | ||
| // callers expect a defined answer, not a propagated exception. | ||
| PyErr_Clear(); | ||
| return false; | ||
| } |
There was a problem hiding this comment.
we usually call CPyError_OutOfMemory() in this case to abort, i think it would make sense here as well.
| ModDesc( | ||
| "librt.strings", | ||
| ["strings/librt_strings.c", "codepoint_extra_ops.c"], | ||
| ["codepoint_extra_ops.h"], | ||
| ["strings"], | ||
| ), |
There was a problem hiding this comment.
sorry for not catching this earlier but it looks like the files are dependent on each other the wrong way. the _extra_ops files should be internal to mypyc and shouldn't be needed when building librt modules.
so instead of including codepoint_extra_ops in librt_strings we need it the other way around, with the common functionality in librt_strings. like it's done with the other _extra_ops files.
There was a problem hiding this comment.
No worries, that's on me first!
There was a problem hiding this comment.
thanks! i see there's still #include "codepoint_extra_ops.h" in librt_strings.c and it would be good to clean that up by moving the functions from there to librt_strings.h.
that would make the codepoint_extra_ops files empty so maybe they aren't actually needed and we could just compile the functions statically straight from librt_strings.h? unless you're planning on adding functions later that would be better placed outside of librt_strings.h.
- codepoint_extra_ops.h: include CPy.h and move the isidentifier slow
path inline into LibRTStrings_IsIdentifier. Aborts via
CPyError_OutOfMemory on allocation failure (the helper is ERR_NEVER,
so returning a silently-wrong bool under memory pressure was the wrong
contract). Matches the pattern in the sibling _extra_ops.h files (all
bodies static inline, CPy.h included for runtime helpers).
- codepoint_extra_ops.c: reduce to a single-line shim that #includes the
header. Exists only so SourceDep("codepoint_extra_ops.c") pulls the
header into user mypyc extensions in include_runtime_files mode.
- build.py / lib-rt/setup.py: drop codepoint_extra_ops.c from the
librt.strings module sources. The _extra_ops.c files are mypyc-internal
(linked into user extensions via SourceDep in mypyc/ir/deps.py); the
librt.strings Python module shouldn't depend on them, matching how
bytes_extra_ops, str_extra_ops, etc. are organized. librt.strings now
picks up LibRTStrings_IsIdentifier via #include of the header.
|
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅ |
5th PR of #21418