
Commit 819e14e

gpshead and claude committed
Add canonical= kwarg to base64/base32/base85/ascii85 decoders
Gate the rejection of non-zero padding bits behind a new canonical= keyword argument, independent of strict_mode, per discussion on gh-146311.

Per RFC 4648 section 3.5 ("Canonical Encoding"), decoders MAY reject encodings where pad bits are not zero. The new canonical=True flag enables this check for a2b_base64, a2b_base32, a2b_base85, and a2b_ascii85.

For base85/ascii85, the canonical check also rejects single-character final groups (never produced by a conforming encoder) and verifies that partial-group encodings match what the encoder would produce.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 1bf5c75 commit 819e14e
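
The commit message says that no conforming encoder produces a single-character final group. A quick way to check that with the existing stdlib encoders (no canonical= needed) is to record the final group size, len(encoded) % 5, for inputs of every length. This is only a sketch; final_group_lengths is a hypothetical helper, not part of base64 or binascii.

```python
import base64


def final_group_lengths(encode, max_len=64):
    """Hypothetical helper: the set of final-group sizes (len % 5) an encoder emits."""
    seen = set()
    for n in range(max_len + 1):
        # Non-zero bytes, so Ascii85 never folds a group into 'z'.
        data = bytes(i % 255 + 1 for i in range(n))
        seen.add(len(encode(data)) % 5)
    return seen


print(sorted(final_group_lengths(base64.b85encode)))  # [0, 2, 3, 4]
print(sorted(final_group_lengths(base64.a85encode)))  # [0, 2, 3, 4]
```

The pattern follows from the arithmetic: n input bytes become 5 * (n // 4) characters, plus n % 4 + 1 characters for a partial tail when n % 4 is non-zero. A lone leftover base-85 digit carries only about 6.4 bits (log2 85), less than one byte, which is why the canonical check can reject it outright.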


7 files changed: 426 additions, 157 deletions

Doc/library/base64.rst

Lines changed: 30 additions & 12 deletions
@@ -76,8 +76,8 @@ POST request.
       Added the *padded* and *wrapcol* parameters.


-.. function:: b64decode(s, altchars=None, validate=False, *, padded=True)
-              b64decode(s, altchars=None, validate=True, *, ignorechars, padded=True)
+.. function:: b64decode(s, altchars=None, validate=False, *, padded=True, canonical=False)
+              b64decode(s, altchars=None, validate=True, *, ignorechars, padded=True, canonical=False)

    Decode the Base64 encoded :term:`bytes-like object` or ASCII string
    *s* and return the decoded :class:`bytes`.
@@ -112,10 +112,13 @@ POST request.
    If *validate* is true, these non-alphabet characters in the input
    result in a :exc:`binascii.Error`.

+   If *canonical* is true, non-zero padding bits are rejected.
+   See :func:`binascii.a2b_base64` for details.
+
    For more information about the strict base64 check, see :func:`binascii.a2b_base64`

    .. versionchanged:: 3.15
-      Added the *ignorechars* and *padded* parameters.
+      Added the *ignorechars*, *padded*, and *canonical* parameters.

    .. deprecated:: 3.15
       Accepting the ``+`` and ``/`` characters with an alternative alphabet
@@ -179,7 +182,7 @@ POST request.
       Added the *padded* and *wrapcol* parameters.


-.. function:: b32decode(s, casefold=False, map01=None, *, padded=True, ignorechars=b'')
+.. function:: b32decode(s, casefold=False, map01=None, *, padded=True, ignorechars=b'', canonical=False)

    Decode the Base32 encoded :term:`bytes-like object` or ASCII string *s* and
    return the decoded :class:`bytes`.
@@ -203,12 +206,15 @@ POST request.
    *ignorechars* should be a :term:`bytes-like object` containing characters
    to ignore from the input.

+   If *canonical* is true, non-zero padding bits are rejected.
+   See :func:`binascii.a2b_base32` for details.
+
    A :exc:`binascii.Error` is raised if *s* is
    incorrectly padded or if there are non-alphabet characters present in the
    input.

    .. versionchanged:: next
-      Added the *ignorechars* and *padded* parameters.
+      Added the *ignorechars*, *padded*, and *canonical* parameters.


 .. function:: b32hexencode(s, *, padded=True, wrapcol=0)
@@ -222,7 +228,7 @@ POST request.
       Added the *padded* and *wrapcol* parameters.


-.. function:: b32hexdecode(s, casefold=False, *, padded=True, ignorechars=b'')
+.. function:: b32hexdecode(s, casefold=False, *, padded=True, ignorechars=b'', canonical=False)

    Similar to :func:`b32decode` but uses the Extended Hex Alphabet, as defined in
    :rfc:`4648`.
@@ -235,7 +241,7 @@ POST request.
    .. versionadded:: 3.10

    .. versionchanged:: next
-      Added the *ignorechars* and *padded* parameters.
+      Added the *ignorechars*, *padded*, and *canonical* parameters.


 .. function:: b16encode(s, *, wrapcol=0)
@@ -315,7 +321,7 @@ Refer to the documentation of the individual functions for more information.
    .. versionadded:: 3.4


-.. function:: a85decode(b, *, foldspaces=False, adobe=False, ignorechars=b' \t\n\r\v')
+.. function:: a85decode(b, *, foldspaces=False, adobe=False, ignorechars=b' \t\n\r\v', canonical=False)

    Decode the Ascii85 encoded :term:`bytes-like object` or ASCII string *b* and
    return the decoded :class:`bytes`.
@@ -332,8 +338,14 @@ Refer to the documentation of the individual functions for more information.
    This should only contain whitespace characters, and by
    default contains all whitespace characters in ASCII.

+   If *canonical* is true, non-canonical encodings are rejected.
+   See :func:`binascii.a2b_ascii85` for details.
+
    .. versionadded:: 3.4

+   .. versionchanged:: next
+      Added the *canonical* parameter.
+

 .. function:: b85encode(b, pad=False, *, wrapcol=0)

@@ -353,7 +365,7 @@ Refer to the documentation of the individual functions for more information.
       Added the *wrapcol* parameter.


-.. function:: b85decode(b, *, ignorechars=b'')
+.. function:: b85decode(b, *, ignorechars=b'', canonical=False)

    Decode the base85-encoded :term:`bytes-like object` or ASCII string *b* and
    return the decoded :class:`bytes`. Padding is implicitly removed, if
@@ -362,10 +374,13 @@ Refer to the documentation of the individual functions for more information.
    *ignorechars* should be a :term:`bytes-like object` containing characters
    to ignore from the input.

+   If *canonical* is true, non-canonical encodings are rejected.
+   See :func:`binascii.a2b_base85` for details.
+
    .. versionadded:: 3.4

    .. versionchanged:: next
-      Added the *ignorechars* parameter.
+      Added the *ignorechars* and *canonical* parameters.


 .. function:: z85encode(s, pad=False, *, wrapcol=0)
@@ -390,7 +405,7 @@ Refer to the documentation of the individual functions for more information.
       Added the *wrapcol* parameter.


-.. function:: z85decode(s, *, ignorechars=b'')
+.. function:: z85decode(s, *, ignorechars=b'', canonical=False)

    Decode the Z85-encoded :term:`bytes-like object` or ASCII string *s* and
    return the decoded :class:`bytes`. See `Z85 specification
@@ -399,10 +414,13 @@ Refer to the documentation of the individual functions for more information.
    *ignorechars* should be a :term:`bytes-like object` containing characters
    to ignore from the input.

+   If *canonical* is true, non-canonical encodings are rejected.
+   See :func:`binascii.a2b_base85` for details.
+
    .. versionadded:: 3.13

    .. versionchanged:: next
-      Added the *ignorechars* parameter.
+      Added the *ignorechars* and *canonical* parameters.


 .. _base64-legacy:

Doc/library/binascii.rst

Lines changed: 24 additions & 6 deletions
@@ -48,8 +48,8 @@ The :mod:`!binascii` module defines the following functions:
       Added the *backtick* parameter.


-.. function:: a2b_base64(string, /, *, padded=True, alphabet=BASE64_ALPHABET, strict_mode=False)
-              a2b_base64(string, /, *, ignorechars, padded=True, alphabet=BASE64_ALPHABET, strict_mode=True)
+.. function:: a2b_base64(string, /, *, padded=True, alphabet=BASE64_ALPHABET, strict_mode=False, canonical=False)
+              a2b_base64(string, /, *, ignorechars, padded=True, alphabet=BASE64_ALPHABET, strict_mode=True, canonical=False)

    Convert a block of base64 data back to binary and return the binary data. More
    than one line may be passed at a time.
@@ -80,11 +80,15 @@ The :mod:`!binascii` module defines the following functions:
    * Contains no excess data after padding (including excess padding, newlines, etc.).
    * Does not start with a padding.

+   If *canonical* is true, non-zero padding bits in the last group are rejected
+   with :exc:`binascii.Error`, enforcing canonical encoding as defined in
+   :rfc:`4648` section 3.5. This check is independent of *strict_mode*.
+
    .. versionchanged:: 3.11
       Added the *strict_mode* parameter.

    .. versionchanged:: 3.15
-      Added the *alphabet*, *ignorechars* and *padded* parameters.
+      Added the *alphabet*, *ignorechars*, *padded*, and *canonical* parameters.


 .. function:: b2a_base64(data, *, padded=True, alphabet=BASE64_ALPHABET, wrapcol=0, newline=True)
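
The *canonical* check described in this hunk only inspects the bits left over in the final quantum. A minimal pure-Python sketch of the same rule follows. It assumes the standard alphabet, input with whitespace already removed and padding only at the end; b64_pad_bits_are_zero is a hypothetical name, not part of binascii.

```python
import base64
import binascii

B64_ALPHABET = (b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                b"abcdefghijklmnopqrstuvwxyz0123456789+/")


def b64_pad_bits_are_zero(encoded: bytes) -> bool:
    """Hypothetical helper: True if the final quantum has all-zero pad bits."""
    data = encoded.rstrip(b"=")
    rem = len(data) % 4
    if rem == 0:
        return True  # last quantum is a full 24 bits: no pad bits at all
    if rem == 1:
        raise binascii.Error("one leftover character cannot encode a byte")
    # 2 chars = 12 bits -> 1 byte + 4 pad bits; 3 chars = 18 bits -> 2 bytes + 2.
    pad_bits = 4 if rem == 2 else 2
    last = B64_ALPHABET.index(data[-1])
    return last & ((1 << pad_bits) - 1) == 0


# Both decode to the same byte with default settings; only the first is
# what b64encode() emits.
print(base64.b64decode(b"3Q=="), b64_pad_bits_are_zero(b"3Q=="))  # b'\xdd' True
print(base64.b64decode(b"3d=="), b64_pad_bits_are_zero(b"3d=="))  # b'\xdd' False
```

None of the strict-mode rules listed above look at these leftover bits, which is consistent with keeping *canonical* as a separate flag.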
@@ -110,7 +114,7 @@ The :mod:`!binascii` module defines the following functions:
       Added the *alphabet*, *padded* and *wrapcol* parameters.


-.. function:: a2b_ascii85(string, /, *, foldspaces=False, adobe=False, ignorechars=b'')
+.. function:: a2b_ascii85(string, /, *, foldspaces=False, adobe=False, ignorechars=b'', canonical=False)

    Convert Ascii85 data back to binary and return the binary data.

@@ -132,6 +136,11 @@ The :mod:`!binascii` module defines the following functions:
    to ignore from the input.
    This should only contain whitespace characters.

+   If *canonical* is true, non-canonical encodings in the final group are
+   rejected with :exc:`binascii.Error`. This includes single-character
+   final groups (which no conforming encoder produces) and final groups whose
+   padding digits are not what the encoder would produce.
+
    Invalid Ascii85 data will raise :exc:`binascii.Error`.

    .. versionadded:: 3.15
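
For Ascii85 there are no pad bits to mask: the decoder completes a short final group of n digits with 'u' (digit 84) and keeps n - 1 bytes, so several digit strings can land on the same bytes. One way to express "what the encoder would produce" is a decode-then-re-encode comparison. The sketch below does that with the existing base64.a85decode and base64.a85encode. ascii85_tail_is_canonical is a hypothetical helper, and it assumes plain Ascii85 with no whitespace, no 'z'/'y' shortcuts and no Adobe '<~' '~>' framing, so that the partial group is simply the last len % 5 characters.

```python
import base64


def ascii85_tail_is_canonical(encoded: bytes) -> bool:
    """Hypothetical helper: does the final partial group match the encoder?

    Assumes plain Ascii85: no whitespace, no 'z'/'y' shortcuts and no
    Adobe '<~' '~>' framing, so the tail is the last len % 5 characters.
    """
    tail_len = len(encoded) % 5
    if tail_len == 0:
        return True   # only complete 5-character groups
    if tail_len == 1:
        return False  # a conforming encoder never emits a 1-character group
    tail = encoded[-tail_len:]
    try:
        decoded = base64.a85decode(tail)  # decoder fills the group with 'u'
    except ValueError:                    # e.g. the padded group overflows 32 bits
        return False
    # Canonical iff re-encoding the recovered bytes gives back the same digits.
    return base64.a85encode(decoded) == tail


print(base64.a85encode(b"a"))            # b'@/'
print(base64.a85decode(b"@0"))           # b'a' (accepted by default)
print(ascii85_tail_is_canonical(b"@/"))  # True
print(ascii85_tail_is_canonical(b"@0"))  # False
```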
@@ -160,7 +169,7 @@ The :mod:`!binascii` module defines the following functions:
    .. versionadded:: 3.15


-.. function:: a2b_base85(string, /, *, alphabet=BASE85_ALPHABET, ignorechars=b'')
+.. function:: a2b_base85(string, /, *, alphabet=BASE85_ALPHABET, ignorechars=b'', canonical=False)

    Convert Base85 data back to binary and return the binary data.
    More than one line may be passed at a time.
@@ -176,6 +185,11 @@ The :mod:`!binascii` module defines the following functions:
    *ignorechars* should be a :term:`bytes-like object` containing characters
    to ignore from the input.

+   If *canonical* is true, non-canonical encodings in the final group are
+   rejected with :exc:`binascii.Error`. This includes single-character
+   final groups (which no conforming encoder produces) and final groups whose
+   padding digits are not what the encoder would produce.
+
    Invalid Base85 data will raise :exc:`binascii.Error`.

    .. versionadded:: 3.15
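
To see why a final-group check is worth having, it helps to count how many two-character final groups the default (non-canonical) base85 decoder accepts for each one-byte tail. This brute-force sketch feeds every pair of base85 digits to the existing base64.b85decode. one_byte_tail_variants is a hypothetical helper, and B85_ALPHABET is written out here as the digit order used by base64.b85encode (the RFC 1924 character set).

```python
import base64
from collections import defaultdict

# Digit order used by base64.b85encode: value 0 is '0', value 84 is '~'.
B85_ALPHABET = (b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                b"abcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~")


def one_byte_tail_variants():
    """Hypothetical helper: map each decoded byte to every 2-char tail for it."""
    variants = defaultdict(list)
    for hi in B85_ALPHABET:
        for lo in B85_ALPHABET:
            tail = bytes([hi, lo])
            try:
                decoded = base64.b85decode(tail)
            except ValueError:
                continue  # the padded group would exceed 32 bits
            variants[decoded].append(tail)
    return variants


variants = one_byte_tail_variants()
for value in (b"\x00", b"a", b"\xff"):
    # The encoder's own tail is just one entry in each list.
    print(value, base64.b85encode(value), len(variants[value]))
```

Each step of the second digit is worth 85**3 inside the 32-bit group, while one byte value spans 2**24, so roughly 2**24 / 85**3, about 27, digit pairs collapse onto every byte. The canonical check is what singles out the encoder's pair among them.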
@@ -199,7 +213,7 @@ The :mod:`!binascii` module defines the following functions:
    .. versionadded:: 3.15


-.. function:: a2b_base32(string, /, *, padded=True, alphabet=BASE32_ALPHABET, ignorechars=b'')
+.. function:: a2b_base32(string, /, *, padded=True, alphabet=BASE32_ALPHABET, ignorechars=b'', canonical=False)

    Convert base32 data back to binary and return the binary data.

@@ -228,6 +242,10 @@ The :mod:`!binascii` module defines the following functions:
    presented before the end of the encoded data and the excess pad characters
    will be ignored.

+   If *canonical* is true, non-zero padding bits in the last group are rejected
+   with :exc:`binascii.Error`, enforcing canonical encoding as defined in
+   :rfc:`4648` section 3.5.
+
    Invalid base32 data will raise :exc:`binascii.Error`.

    .. versionadded:: next

Lib/base64.py

Lines changed: 29 additions & 13 deletions
@@ -68,7 +68,7 @@ def b64encode(s, altchars=None, *, padded=True, wrapcol=0):


 def b64decode(s, altchars=None, validate=_NOT_SPECIFIED,
-              *, padded=True, ignorechars=_NOT_SPECIFIED):
+              *, padded=True, ignorechars=_NOT_SPECIFIED, canonical=False):
     """Decode the Base64 encoded bytes-like object or ASCII string s.

     Optional altchars must be a bytes-like object or ASCII string of length 2
@@ -110,11 +110,13 @@ def b64decode(s, altchars=None, validate=_NOT_SPECIFIED,
         alphabet = binascii.BASE64_ALPHABET[:-2] + altchars
         return binascii.a2b_base64(s, strict_mode=validate,
                                    alphabet=alphabet,
-                                   padded=padded, ignorechars=ignorechars)
+                                   padded=padded, ignorechars=ignorechars,
+                                   canonical=canonical)
     if ignorechars is _NOT_SPECIFIED:
         ignorechars = b''
     result = binascii.a2b_base64(s, strict_mode=validate,
-                                 padded=padded, ignorechars=ignorechars)
+                                 padded=padded, ignorechars=ignorechars,
+                                 canonical=canonical)
     if badchar is not None:
         import warnings
         if validate:
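
On interpreters that predate this change, a caller could approximate canonical decoding with a round trip through the existing public API. This is a sketch, not what base64.b64decode does internally: decode_b64_canonical is a hypothetical name, and it assumes the input is bytes in the standard alphabet.

```python
import base64
import binascii


def decode_b64_canonical(s: bytes) -> bytes:
    """Hypothetical shim: decode s, rejecting anything b64encode would not emit."""
    # validate=True turns non-alphabet characters into an error instead of skipping them.
    data = base64.b64decode(s, validate=True)
    if base64.b64encode(data) != s:
        raise binascii.Error("non-canonical base64 encoding")
    return data


print(decode_b64_canonical(b"YW0="))  # b'am'
try:
    decode_b64_canonical(b"YW3=")     # same bytes, non-zero pad bits
except binascii.Error as exc:
    print("rejected:", exc)
```

Because it compares against b64encode output, the shim also rejects missing padding and embedded newlines. That makes it stricter than canonical=True, which per the binascii documentation in this commit concerns only the pad bits and is independent of strict_mode.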
@@ -230,7 +232,8 @@ def b32encode(s, *, padded=True, wrapcol=0):
     return binascii.b2a_base32(s, padded=padded, wrapcol=wrapcol)
 b32encode.__doc__ = _B32_ENCODE_DOCSTRING.format(encoding='base32')

-def b32decode(s, casefold=False, map01=None, *, padded=True, ignorechars=b''):
+def b32decode(s, casefold=False, map01=None, *, padded=True, ignorechars=b'',
+              canonical=False):
     s = _bytes_from_decode_data(s)
     # Handle section 2.4 zero and one mapping. The flag map01 will be either
     # False, or the character to map the digit 1 (one) to. It should be
@@ -241,7 +244,8 @@ def b32decode(s, casefold=False, map01=None, *, padded=True, ignorechars=b''):
         s = s.translate(bytes.maketrans(b'01', b'O' + map01))
     if casefold:
         s = s.upper()
-    return binascii.a2b_base32(s, padded=padded, ignorechars=ignorechars)
+    return binascii.a2b_base32(s, padded=padded, ignorechars=ignorechars,
+                               canonical=canonical)
 b32decode.__doc__ = _B32_DECODE_DOCSTRING.format(encoding='base32',
                                                  extra_args=_B32_DECODE_MAP01_DOCSTRING)

@@ -250,13 +254,15 @@ def b32hexencode(s, *, padded=True, wrapcol=0):
                                alphabet=binascii.BASE32HEX_ALPHABET)
 b32hexencode.__doc__ = _B32_ENCODE_DOCSTRING.format(encoding='base32hex')

-def b32hexdecode(s, casefold=False, *, padded=True, ignorechars=b''):
+def b32hexdecode(s, casefold=False, *, padded=True, ignorechars=b'',
+                 canonical=False):
     s = _bytes_from_decode_data(s)
     # base32hex does not have the 01 mapping
     if casefold:
         s = s.upper()
     return binascii.a2b_base32(s, alphabet=binascii.BASE32HEX_ALPHABET,
-                               padded=padded, ignorechars=ignorechars)
+                               padded=padded, ignorechars=ignorechars,
+                               canonical=canonical)
 b32hexdecode.__doc__ = _B32_DECODE_DOCSTRING.format(encoding='base32hex',
                                                     extra_args='')

@@ -324,7 +330,8 @@ def a85encode(b, *, foldspaces=False, wrapcol=0, pad=False, adobe=False):
     return binascii.b2a_ascii85(b, foldspaces=foldspaces,
                                 adobe=adobe, wrapcol=wrapcol, pad=pad)

-def a85decode(b, *, foldspaces=False, adobe=False, ignorechars=b' \t\n\r\v'):
+def a85decode(b, *, foldspaces=False, adobe=False, ignorechars=b' \t\n\r\v',
+              canonical=False):
     """Decode the Ascii85 encoded bytes-like object or ASCII string b.

     foldspaces is a flag that specifies whether the 'y' short sequence should be
@@ -338,10 +345,13 @@ def a85decode(b, *, foldspaces=False, adobe=False, ignorechars=b' \t\n\r\v'):
     input. This should only contain whitespace characters, and by default
     contains all whitespace characters in ASCII.

+    If canonical is true, non-canonical encodings are rejected.
+
     The result is returned as a bytes object.
     """
     return binascii.a2b_ascii85(b, foldspaces=foldspaces,
-                                adobe=adobe, ignorechars=ignorechars)
+                                adobe=adobe, ignorechars=ignorechars,
+                                canonical=canonical)

 def b85encode(b, pad=False, *, wrapcol=0):
     """Encode bytes-like object b in base85 format and return a bytes object.
@@ -354,12 +364,15 @@ def b85encode(b, pad=False, *, wrapcol=0):
     """
     return binascii.b2a_base85(b, wrapcol=wrapcol, pad=pad)

-def b85decode(b, *, ignorechars=b''):
+def b85decode(b, *, ignorechars=b'', canonical=False):
     """Decode the base85-encoded bytes-like object or ASCII string b

+    If canonical is true, non-canonical encodings are rejected.
+
     The result is returned as a bytes object.
     """
-    return binascii.a2b_base85(b, ignorechars=ignorechars)
+    return binascii.a2b_base85(b, ignorechars=ignorechars,
+                               canonical=canonical)

 def z85encode(s, pad=False, *, wrapcol=0):
     """Encode bytes-like object b in z85 format and return a bytes object.
@@ -373,12 +386,15 @@ def z85encode(s, pad=False, *, wrapcol=0):
     return binascii.b2a_base85(s, wrapcol=wrapcol, pad=pad,
                                alphabet=binascii.Z85_ALPHABET)

-def z85decode(s, *, ignorechars=b''):
+def z85decode(s, *, ignorechars=b'', canonical=False):
     """Decode the z85-encoded bytes-like object or ASCII string b

+    If canonical is true, non-canonical encodings are rejected.
+
     The result is returned as a bytes object.
     """
-    return binascii.a2b_base85(s, alphabet=binascii.Z85_ALPHABET, ignorechars=ignorechars)
+    return binascii.a2b_base85(s, alphabet=binascii.Z85_ALPHABET,
+                               ignorechars=ignorechars, canonical=canonical)

 # Legacy interface. This code could be cleaned up since I don't believe
 # binascii has any line length limitations. It just doesn't seem worth it

Lib/test/test_base64.py

Lines changed: 5 additions & 5 deletions
@@ -383,12 +383,12 @@ def _common_test_ignorechars(self, func):

     def test_b64decode_invalid_chars(self):
         # issue 1466065: Test some invalid characters.
-        tests = ((b'%3Q==', b'\xdd', b'%$'),
-                 (b'$3Q==', b'\xdd', b'%$'),
+        tests = ((b'%3d==', b'\xdd', b'%$'),
+                 (b'$3d==', b'\xdd', b'%$'),
                  (b'[==', b'', b'[='),
-                 (b'YW]0=', b'am', b']'),
-                 (b'3{Q==', b'\xdd', b'{}'),
-                 (b'3Q}==', b'\xdd', b'{}'),
+                 (b'YW]3=', b'am', b']'),
+                 (b'3{d==', b'\xdd', b'{}'),
+                 (b'3d}==', b'\xdd', b'{}'),
                  (b'@@', b'', b'@!'),
                  (b'!', b'', b'@!'),
                  (b"YWJj\n", b"abc", b'\n'),
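
In the changed test vectors, the last alphabet character before the padding was switched (Q becomes d, 0 becomes 3), so each new vector ends in non-zero pad bits while still decoding to the same bytes. Presumably this lets a test that runs without canonical= also confirm that the default decoder keeps accepting such input; that intent is an inference from the diff. The sketch below checks the arithmetic with the stdlib only. strip_non_alphabet is a hypothetical helper that simply deletes every byte outside the base64 alphabet and '='.

```python
import base64

B64_CHARS = (b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
             b"abcdefghijklmnopqrstuvwxyz0123456789+/=")
# Every byte value that is neither a base64 character nor '='.
NOISE = bytes(sorted(set(range(256)) - set(B64_CHARS)))


def strip_non_alphabet(s: bytes) -> bytes:
    """Hypothetical helper: drop the deliberately invalid characters."""
    return s.translate(None, NOISE)


old_new = [(b"%3Q==", b"%3d=="), (b"$3Q==", b"$3d=="), (b"YW]0=", b"YW]3="),
           (b"3{Q==", b"3{d=="), (b"3Q}==", b"3d}==")]
for old, new in old_new:
    old_clean, new_clean = strip_non_alphabet(old), strip_non_alphabet(new)
    decoded = base64.b64decode(new_clean)
    # Same bytes either way, but only the old vector is what b64encode emits.
    assert base64.b64decode(old_clean) == decoded
    print(new, decoded, base64.b64encode(decoded) == new_clean)  # last column: False
```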
