Bug report
Bug description:
Since Python 3.13, \uXXXX escapes that create "surrogates" (values from \uD800 to \uDFFF, which cannot be encoded into UTF-8) are not allowed in docstrings when compiling source code. I believe this is due to a change in #106411, where docstrings are first converted to UTF-8 and then dedented:
$ ./python
Python 3.15.0a2+ (heads/main:3db7bf2d18, Dec 8 2025, 09:38:46) [GCC 15.2.1 20251111 (Red Hat 15.2.1-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> def is_ex_parrot(parrot: Parrot) -> bool:
...     """Checks if the parrot is \udead"""
...
UnicodeEncodeError: 'utf-8' codec can't encode character '\udead' in position 24: surrogates not allowed
>>>
$ ./python -OO
Python 3.15.0a2+ (heads/main:3db7bf2d18, Dec 8 2025, 09:38:46) [GCC 15.2.1 20251111 (Red Hat 15.2.1-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> def is_ex_parrot(parrot: Parrot) -> bool:
...     """Checks if the parrot is \udead"""
...
>>> # no error because -OO turns off docstrings
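A minimal sketch reproducing the same report through compile() instead of the REPL, using only the standard library. The Parrot annotation is dropped so the snippet is self-contained, and try_compile is a hypothetical helper name. On affected versions the default run should report UnicodeEncodeError, while optimize=2 (the compile() counterpart of -OO) should still return a code object; on unaffected versions both calls succeed, so the printed output depends on the interpreter.

```python
import sys

# The doubled backslash keeps "\udead" as six literal characters in the
# source text; only the compiler turns that escape into the lone surrogate
# U+DEAD inside the docstring constant.
SOURCE = (
    "def is_ex_parrot(parrot):\n"
    '    """Checks if the parrot is \\udead"""\n'
)


def try_compile(source: str, optimize: int = -1):
    """Hypothetical helper: return a code object, or the UnicodeEncodeError raised."""
    try:
        return compile(source, "<parrot>", "exec", optimize=optimize)
    except UnicodeEncodeError as exc:
        return exc


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    # Root cause: a lone surrogate has no UTF-8 encoding.
    try:
        "\udead".encode("utf-8")
    except UnicodeEncodeError as exc:
        print("encode:", exc.reason)  # surrogates not allowed
    # optimize=-1 uses the interpreter's own level (docstrings kept unless -OO).
    print("default:", type(try_compile(SOURCE)).__name__)
    # optimize=2 mirrors -OO: the docstring is discarded before it would be dedented.
    print("optimize=2:", type(try_compile(SOURCE, optimize=2)).__name__)
```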
I admit that this is extremely fringe, but it did break something seemingly unrelated to docstrings in IPython: ipython/ipython#15098
The compile function is documented to raise only SyntaxError when the syntax is invalid and ValueError when the source contains \x00. Perhaps this is expected behaviour and should just be documented? (UnicodeEncodeError is already a subclass of ValueError, so maybe IPython should simply catch ValueError when calling compile? Or maybe there is a better solution for what IPython is doing here.)
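If callers such as IPython do decide to catch ValueError, the pattern might look like the sketch below. compile_or_report is a hypothetical helper, not IPython's actual code. It relies on UnicodeEncodeError inheriting from UnicodeError, which inherits from ValueError, while SyntaxError is a separate branch of the hierarchy, so both have to be caught explicitly.

```python
# Hypothetical helper (not IPython's actual code): compile user input and
# sort failures into the two exception families discussed above.
def compile_or_report(source: str, filename: str = "<input>"):
    """Return (status, payload) where status is 'ok', 'syntax' or 'value'."""
    try:
        return "ok", compile(source, filename, "exec")
    except SyntaxError as exc:
        # Invalid or incomplete syntax; SyntaxError is not a ValueError.
        return "syntax", exc
    except ValueError as exc:
        # Also catches the UnicodeEncodeError raised for surrogate
        # docstrings on affected versions, since UnicodeError subclasses ValueError.
        return "value", exc


if __name__ == "__main__":
    # UnicodeEncodeError -> UnicodeError -> ValueError -> Exception -> ...
    print([cls.__name__ for cls in UnicodeEncodeError.__mro__])
    print(compile_or_report("x = 1")[0])      # ok
    print(compile_or_report("def f(:\n")[0])  # syntax
    parrot = 'def is_ex_parrot(parrot):\n    """Checks if the parrot is \\udead"""\n'
    # 'value' on affected versions, 'ok' where docstrings are not UTF-8 encoded.
    print(compile_or_report(parrot)[0])
```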
CPython versions tested on:
3.13, CPython main branch
Operating systems tested on:
Linux
Linked PRs