Commit c5fcdb4

gh-146311: Reject non-canonical padding bits in base32, 64, & 85 decoding (GH-146312)
Add `canonical=False` keyword argument to `a2b_base64`, `a2b_base32`, `a2b_base85`, and `a2b_ascii85` (and their `base64` module wrappers).

When `canonical=True`, non-canonical encodings are rejected per [RFC 4648 section 3.5](https://datatracker.ietf.org/doc/html/rfc4648.html#section-3.5). This is independent of `strict_mode`.

For base85/ascii85, the check also rejects single-character final groups (never produced by a conforming encoder) and verifies partial group padding matches what the encoder would produce.

Co-authored-by: Serhiy Storchaka via lots of great code review!
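To illustrate what "non-canonical" means here, the following is a minimal sketch that runs on Python versions without the new keyword. It is not the implementation in this commit. The helper `is_canonical` is a hypothetical name, and it assumes the input has no whitespace or ignored characters and uses standard padding, so that a canonical input round-trips byte for byte.

```python
import base64


def is_canonical(encoded, decode, encode):
    """Hypothetical helper: True if re-encoding the decoded bytes
    reproduces *encoded* exactly (assumes no whitespace, standard padding)."""
    return encode(decode(encoded)) == encoded


# Base64: "QQ==" and "QR==" both decode to b"A" under the lenient default,
# but only "QQ==" has all-zero padding bits in its last character.
assert base64.b64decode(b"QQ==") == base64.b64decode(b"QR==") == b"A"
canonical_b64 = is_canonical(b"QQ==", base64.b64decode, base64.b64encode)
noncanonical_b64 = is_canonical(b"QR==", base64.b64decode, base64.b64encode)

# Base32: "ME======" is what the encoder produces for b"a";
# "MF======" carries non-zero padding bits but decodes to the same byte.
canonical_b32 = is_canonical(b"ME======", base64.b32decode, base64.b32encode)
noncanonical_b32 = is_canonical(b"MF======", base64.b32decode, base64.b32encode)

print(canonical_b64, noncanonical_b64, canonical_b32, noncanonical_b32)
```

With the change in this commit, the same inputs would presumably be rejected directly by passing `canonical=True` to the decoder, without the re-encoding step.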
1 parent b2f126c commit c5fcdb4

12 files changed

Lines changed: 611 additions & 98 deletions

Doc/library/base64.rst

Lines changed: 36 additions & 12 deletions
Original file line number | Diff line number | Diff line change
@@ -73,8 +73,8 @@ POST request.
7373
Added the *padded* and *wrapcol* parameters.
7474

7575

76-
.. function:: b64decode(s, altchars=None, validate=False, *, padded=True)
77-
b64decode(s, altchars=None, validate=True, *, ignorechars, padded=True)
76+
.. function:: b64decode(s, altchars=None, validate=False, *, padded=True, canonical=False)
77+
b64decode(s, altchars=None, validate=True, *, ignorechars, padded=True, canonical=False)
7878
7979
Decode the Base64 encoded :term:`bytes-like object` or ASCII string
8080
*s* and return the decoded :class:`bytes`.
@@ -112,10 +112,13 @@ POST request.
112112
If *validate* is true, these non-alphabet characters in the input
113113
result in a :exc:`binascii.Error`.
114114

115+
If *canonical* is true, non-zero padding bits are rejected.
116+
See :func:`binascii.a2b_base64` for details.
117+
115118
For more information about the strict base64 check, see :func:`binascii.a2b_base64`
116119

117120
.. versionchanged:: 3.15
118-
Added the *ignorechars* and *padded* parameters.
121+
Added the *canonical*, *ignorechars*, and *padded* parameters.
119122

120123
.. deprecated:: 3.15
121124
Accepting the ``+`` and ``/`` characters with an alternative alphabet
@@ -179,7 +182,7 @@ POST request.
179182
Added the *padded* and *wrapcol* parameters.
180183

181184

182-
.. function:: b32decode(s, casefold=False, map01=None, *, padded=True, ignorechars=b'')
185+
.. function:: b32decode(s, casefold=False, map01=None, *, padded=True, ignorechars=b'', canonical=False)
183186

184187
Decode the Base32 encoded :term:`bytes-like object` or ASCII string *s* and
185188
return the decoded :class:`bytes`.
@@ -205,12 +208,15 @@ POST request.
205208
*ignorechars* should be a :term:`bytes-like object` containing characters
206209
to ignore from the input.
207210

211+
If *canonical* is true, non-zero padding bits are rejected.
212+
See :func:`binascii.a2b_base32` for details.
213+
208214
A :exc:`binascii.Error` is raised if *s* is
209215
incorrectly padded or if there are non-alphabet characters present in the
210216
input.
211217

212218
.. versionchanged:: 3.15
213-
Added the *ignorechars* and *padded* parameters.
219+
Added the *canonical*, *ignorechars*, and *padded* parameters.
214220

215221

216222
.. function:: b32hexencode(s, *, padded=True, wrapcol=0)
@@ -224,7 +230,7 @@ POST request.
224230
Added the *padded* and *wrapcol* parameters.
225231

226232

227-
.. function:: b32hexdecode(s, casefold=False, *, padded=True, ignorechars=b'')
233+
.. function:: b32hexdecode(s, casefold=False, *, padded=True, ignorechars=b'', canonical=False)
228234

229235
Similar to :func:`b32decode` but uses the Extended Hex Alphabet, as defined in
230236
:rfc:`4648`.
@@ -237,7 +243,7 @@ POST request.
237243
.. versionadded:: 3.10
238244

239245
.. versionchanged:: 3.15
240-
Added the *ignorechars* and *padded* parameters.
246+
Added the *canonical*, *ignorechars*, and *padded* parameters.
241247

242248

243249
.. function:: b16encode(s, *, wrapcol=0)
@@ -317,7 +323,7 @@ Refer to the documentation of the individual functions for more information.
317323
.. versionadded:: 3.4
318324

319325

320-
.. function:: a85decode(b, *, foldspaces=False, adobe=False, ignorechars=b' \t\n\r\v')
326+
.. function:: a85decode(b, *, foldspaces=False, adobe=False, ignorechars=b' \t\n\r\v', canonical=False)
321327

322328
Decode the Ascii85 encoded :term:`bytes-like object` or ASCII string *b* and
323329
return the decoded :class:`bytes`.
@@ -334,8 +340,16 @@ Refer to the documentation of the individual functions for more information.
334340
This should only contain whitespace characters, and by
335341
default contains all whitespace characters in ASCII.
336342

343+
If *canonical* is true, non-canonical encodings are rejected.
344+
See :func:`binascii.a2b_ascii85` for details.
345+
337346
.. versionadded:: 3.4
338347

348+
.. versionchanged:: next
349+
Added the *canonical* parameter.
350+
Single-character final groups are now always rejected as encoding
351+
violations.
352+
339353

340354
.. function:: b85encode(b, pad=False, *, wrapcol=0)
341355

@@ -355,7 +369,7 @@ Refer to the documentation of the individual functions for more information.
355369
Added the *wrapcol* parameter.
356370

357371

358-
.. function:: b85decode(b, *, ignorechars=b'')
372+
.. function:: b85decode(b, *, ignorechars=b'', canonical=False)
359373

360374
Decode the base85-encoded :term:`bytes-like object` or ASCII string *b* and
361375
return the decoded :class:`bytes`. Padding is implicitly removed, if
@@ -364,10 +378,15 @@ Refer to the documentation of the individual functions for more information.
364378
*ignorechars* should be a :term:`bytes-like object` containing characters
365379
to ignore from the input.
366380

381+
If *canonical* is true, non-canonical encodings are rejected.
382+
See :func:`binascii.a2b_base85` for details.
383+
367384
.. versionadded:: 3.4
368385

369386
.. versionchanged:: 3.15
370-
Added the *ignorechars* parameter.
387+
Added the *canonical* and *ignorechars* parameters.
388+
Single-character final groups are now always rejected as encoding
389+
violations.
371390

372391

373392
.. function:: z85encode(s, pad=False, *, wrapcol=0)
@@ -392,7 +411,7 @@ Refer to the documentation of the individual functions for more information.
392411
Added the *wrapcol* parameter.
393412

394413

395-
.. function:: z85decode(s, *, ignorechars=b'')
414+
.. function:: z85decode(s, *, ignorechars=b'', canonical=False)
396415

397416
Decode the Z85-encoded :term:`bytes-like object` or ASCII string *s* and
398417
return the decoded :class:`bytes`. See `Z85 specification
@@ -401,10 +420,15 @@ Refer to the documentation of the individual functions for more information.
401420
*ignorechars* should be a :term:`bytes-like object` containing characters
402421
to ignore from the input.
403422

423+
If *canonical* is true, non-canonical encodings are rejected.
424+
See :func:`binascii.a2b_base85` for details.
425+
404426
.. versionadded:: 3.13
405427

406428
.. versionchanged:: 3.15
407-
Added the *ignorechars* parameter.
429+
Added the *canonical* and *ignorechars* parameters.
430+
Single-character final groups are now always rejected as encoding
431+
violations.
408432

409433

410434
.. _base64-legacy:

Doc/library/binascii.rst

Lines changed: 29 additions & 8 deletions
Original file line number | Diff line number | Diff line change
@@ -48,8 +48,8 @@ The :mod:`!binascii` module defines the following functions:
4848
Added the *backtick* parameter.
4949

5050

51-
.. function:: a2b_base64(string, /, *, padded=True, alphabet=BASE64_ALPHABET, strict_mode=False)
52-
a2b_base64(string, /, *, ignorechars, padded=True, alphabet=BASE64_ALPHABET, strict_mode=True)
51+
.. function:: a2b_base64(string, /, *, padded=True, alphabet=BASE64_ALPHABET, strict_mode=False, canonical=False)
52+
a2b_base64(string, /, *, ignorechars, padded=True, alphabet=BASE64_ALPHABET, strict_mode=True, canonical=False)
5353
5454
Convert a block of base64 data back to binary and return the binary data. More
5555
than one line may be passed at a time.
@@ -83,11 +83,15 @@ The :mod:`!binascii` module defines the following functions:
8383
* Contains no excess data after padding (including excess padding, newlines, etc.).
8484
* Does not start with a padding.
8585

86+
If *canonical* is true, non-zero padding bits in the last group are rejected
87+
with :exc:`binascii.Error`, enforcing canonical encoding as defined in
88+
:rfc:`4648` section 3.5. This check is independent of *strict_mode*.
89+
8690
.. versionchanged:: 3.11
8791
Added the *strict_mode* parameter.
8892

8993
.. versionchanged:: 3.15
90-
Added the *alphabet*, *ignorechars* and *padded* parameters.
94+
Added the *alphabet*, *canonical*, *ignorechars*, and *padded* parameters.
9195

9296

9397
.. function:: b2a_base64(data, *, padded=True, alphabet=BASE64_ALPHABET, wrapcol=0, newline=True)
@@ -113,7 +117,7 @@ The :mod:`!binascii` module defines the following functions:
113117
Added the *alphabet*, *padded* and *wrapcol* parameters.
114118

115119

116-
.. function:: a2b_ascii85(string, /, *, foldspaces=False, adobe=False, ignorechars=b'')
120+
.. function:: a2b_ascii85(string, /, *, foldspaces=False, adobe=False, ignorechars=b'', canonical=False)
117121

118122
Convert Ascii85 data back to binary and return the binary data.
119123

@@ -122,7 +126,8 @@ The :mod:`!binascii` module defines the following functions:
122126
characters). Each group encodes 32 bits of binary data in the range from
123127
``0`` to ``2 ** 32 - 1``, inclusive. The special character ``z`` is
124128
accepted as a short form of the group ``!!!!!``, which encodes four
125-
consecutive null bytes.
129+
consecutive null bytes. A single-character final group is always rejected
130+
as an encoding violation.
126131

127132
*foldspaces* is a flag that specifies whether the 'y' short sequence
128133
should be accepted as shorthand for 4 consecutive spaces (ASCII 0x20).
@@ -135,6 +140,12 @@ The :mod:`!binascii` module defines the following functions:
135140
to ignore from the input.
136141
This should only contain whitespace characters.
137142

143+
If *canonical* is true, non-canonical encodings are rejected with
144+
:exc:`binascii.Error`. Here "canonical" means the encoding that
145+
:func:`b2a_ascii85` would produce: the ``z`` abbreviation must be used
146+
for all-zero groups (rather than ``!!!!!``), and partial final groups
147+
must use the same padding digits as the encoder.
148+
138149
Invalid Ascii85 data will raise :exc:`binascii.Error`.
139150

140151
.. versionadded:: 3.15
@@ -163,22 +174,28 @@ The :mod:`!binascii` module defines the following functions:
163174
.. versionadded:: 3.15
164175

165176

166-
.. function:: a2b_base85(string, /, *, alphabet=BASE85_ALPHABET, ignorechars=b'')
177+
.. function:: a2b_base85(string, /, *, alphabet=BASE85_ALPHABET, ignorechars=b'', canonical=False)
167178

168179
Convert Base85 data back to binary and return the binary data.
169180
More than one line may be passed at a time.
170181

171182
Valid Base85 data contains characters from the Base85 alphabet in groups
172183
of five (except for the final group, which may have from two to five
173184
characters). Each group encodes 32 bits of binary data in the range from
174-
``0`` to ``2 ** 32 - 1``, inclusive.
185+
``0`` to ``2 ** 32 - 1``, inclusive. A single-character final group is
186+
always rejected as an encoding violation.
175187

176188
Optional *alphabet* must be a :class:`bytes` object of length 85 which
177189
specifies an alternative alphabet.
178190

179191
*ignorechars* should be a :term:`bytes-like object` containing characters
180192
to ignore from the input.
181193

194+
If *canonical* is true, non-canonical encodings are rejected with
195+
:exc:`binascii.Error`. Here "canonical" means the encoding that
196+
:func:`b2a_base85` would produce: partial final groups must use the
197+
same padding digits as the encoder.
198+
182199
Invalid Base85 data will raise :exc:`binascii.Error`.
183200

184201
.. versionadded:: 3.15
@@ -202,7 +219,7 @@ The :mod:`!binascii` module defines the following functions:
202219
.. versionadded:: 3.15
203220

204221

205-
.. function:: a2b_base32(string, /, *, padded=True, alphabet=BASE32_ALPHABET, ignorechars=b'')
222+
.. function:: a2b_base32(string, /, *, padded=True, alphabet=BASE32_ALPHABET, ignorechars=b'', canonical=False)
206223

207224
Convert base32 data back to binary and return the binary data.
208225

@@ -231,6 +248,10 @@ The :mod:`!binascii` module defines the following functions:
231248
presented before the end of the encoded data and the excess pad characters
232249
will be ignored.
233250

251+
If *canonical* is true, non-zero padding bits in the last group are rejected
252+
with :exc:`binascii.Error`, enforcing canonical encoding as defined in
253+
:rfc:`4648` section 3.5.
254+
234255
Invalid base32 data will raise :exc:`binascii.Error`.
235256

236257
.. versionadded:: 3.15

Doc/whatsnew/3.15.rst

Lines changed: 13 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -729,6 +729,15 @@ base64
729729
:func:`~base64.z85decode`.
730730
(Contributed by Serhiy Storchaka in :gh:`144001` and :gh:`146431`.)
731731

732+
* Added the *canonical* parameter in
733+
:func:`~base64.b32decode`, :func:`~base64.b32hexdecode`,
734+
:func:`~base64.b64decode`, :func:`~base64.urlsafe_b64decode`,
735+
:func:`~base64.a85decode`, :func:`~base64.b85decode`, and
736+
:func:`~base64.z85decode`,
737+
to reject encodings with non-zero padding bits or other non-canonical
738+
forms.
739+
(Contributed by Gregory P. Smith in :gh:`146311`.)
740+
732741

733742
binascii
734743
--------
@@ -762,6 +771,10 @@ binascii
762771
:func:`~binascii.unhexlify`, and :func:`~binascii.a2b_base64`.
763772
(Contributed by Serhiy Storchaka in :gh:`144001` and :gh:`146431`.)
764773

774+
* Added the *canonical* parameter in :func:`~binascii.a2b_base64`,
775+
to reject encodings with non-zero padding bits.
776+
(Contributed by Gregory P. Smith in :gh:`146311`.)
777+
765778

766779
calendar
767780
--------

Include/internal/pycore_global_objects_fini_generated.h

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

Include/internal/pycore_global_strings.h

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -359,6 +359,7 @@ struct _Py_global_strings {
359359
STRUCT_FOR_ID(callable)
360360
STRUCT_FOR_ID(callback)
361361
STRUCT_FOR_ID(cancel)
362+
STRUCT_FOR_ID(canonical)
362363
STRUCT_FOR_ID(capath)
363364
STRUCT_FOR_ID(capitals)
364365
STRUCT_FOR_ID(category)

Include/internal/pycore_runtime_init_generated.h

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

Include/internal/pycore_unicodeobject_generated.h

Lines changed: 4 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
