Bug report
#83518 changed the handling of non-ASCII characters in encodings.normalize_encoding(), but the result is still inconsistent with codecs.lookup() and is not even self-consistent. For example:
>>> import encodings
>>> encodings.normalize_encoding('a¤b')
'a_b'
>>> encodings.normalize_encoding('aæb')
'ab'
>>> encodings.normalize_encoding('a-¤')
'a'
>>> encodings.normalize_encoding('a-æ')
'a_'
>>> encodings.normalize_encoding('a-¤-b')
'a_b'
>>> encodings.normalize_encoding('a-æ-b')
'a__b'
The result can even end with an underscore ('a_') or contain repeated underscores in the middle ('a__b').
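To make anomalies like these easier to spot across many inputs, a small checker along the following lines might help. This is only a sketch: `underscore_anomalies` and `check_names` are hypothetical helper names, not part of the `encodings` module, and the notion of an "anomaly" here (leading, trailing, or repeated underscores) is an assumption drawn from the examples above.

```python
import encodings


def underscore_anomalies(normalized):
    """Return labels describing stray underscores in a normalized name."""
    problems = []
    if normalized.startswith("_"):
        problems.append("leading")
    if normalized.endswith("_"):
        problems.append("trailing")
    if "__" in normalized:
        problems.append("repeated")
    return problems


def check_names(names, normalize=encodings.normalize_encoding):
    """Map each input name to (normalized form, anomalies found).

    The normalizer is a parameter so that alternative implementations
    can be compared against the same inputs.
    """
    results = {}
    for name in names:
        normalized = normalize(name)
        results[name] = (normalized, underscore_anomalies(normalized))
    return results
```

Calling `check_names(['a-¤', 'a-æ', 'a-¤-b', 'a-æ-b'])` would then list, for each name, what the installed Python returns and whether it shows any of these underscore patterns. The exact results depend on the Python version in use.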
cc @malemburg, @vstinner, @shihai1991
Linked PRs