Commit 9e82bb7

Revise doc for repack
1 parent 725b1a3 commit 9e82bb7

1 file changed: +24 -22 lines changed

Doc/library/zipfile.rst

Lines changed: 24 additions & 22 deletions
@@ -544,38 +544,40 @@ ZipFile Objects
 .. method:: ZipFile.repack(removed=None, *, \
                            strict_descriptor=False[, chunk_size])
 
-   Rewrites the archive to remove stale local file entries, shrinking its file
-   size. The archive must be opened with mode ``'a'``.
+   Rewrites the archive to remove unreferenced local file entries, shrinking
+   its file size. The archive must be opened with mode ``'a'``.
 
    If *removed* is provided, it must be a sequence of :class:`ZipInfo` objects
-   representing removed entries; only their corresponding local file entries
-   will be removed.
-
-   If *removed* is not provided, the archive is scanned to identify and remove
-   local file entries that are no longer referenced in the central directory.
-   The algorithm assumes that local file entries (and the central directory,
-   which is mostly treated as the "last entry") are stored consecutively:
-
-   #. Data before the first referenced entry is removed only when it appears to
-      be a sequence of consecutive entries with no extra following bytes; extra
-      preceding bytes are preserved.
-   #. Data between referenced entries is removed only when it appears to
-      be a sequence of consecutive entries with no extra preceding bytes; extra
-      following bytes are preserved.
-   #. Entries must not overlap. If any entry's data overlaps with another, a
-      :exc:`BadZipFile` error is raised and no changes are made.
+   representing the recently removed members, and only their corresponding
+   local file entries will be removed. Otherwise, the archive is scanned to
+   locate and remove local file entries that are no longer referenced in the
+   central directory.
 
    When scanning, setting ``strict_descriptor=True`` disables detection of any
-   entry using an unsigned data descriptor (deprecated in the ZIP specification
-   since version 6.3.0, released on 2006-09-29, and used only by some legacy
-   tools). This improves performance, but may cause some stale entries to be
-   preserved.
+   entry using an unsigned data descriptor (a format deprecated by the ZIP
+   specification since version 6.3.0, released on 2006-09-29, and used only by
+   some legacy tools), which is significantly slower to scan (around 100 to
+   1000 times). This does not affect performance on entries without such a
+   feature.
 
    *chunk_size* may be specified to control the buffer size when moving
    entry data (default is 1 MiB).
 
    Calling :meth:`repack` on a closed ZipFile will raise a :exc:`ValueError`.
 
+   .. note::
+      The scanning algorithm is heuristic-based and assumes that the ZIP file
+      is normally structured (for example, with local file entries stored
+      consecutively, without overlap or interleaved binary data). Prepended
+      binary data, such as a self-extractor stub, is recognized and preserved
+      unless it happens to contain bytes that coincidentally resemble a valid
+      local file entry in multiple respects, an extremely rare case. Embedded
+      ZIP payloads are also handled correctly, as long as they follow normal
+      structure. However, the algorithm does not guarantee correctness or
+      safety on untrusted or intentionally crafted input. It is generally
+      recommended to provide the *removed* argument for better reliability and
+      performance.
+
    .. versionadded:: next
 
 

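To make "unreferenced local file entries" concrete, here is a minimal sketch that lists byte ranges of an archive not covered by any entry listed in the central directory. It does not call ``ZipFile.repack`` (which this commit only documents) and is not the scanning algorithm described above. The helper names ``central_directory_offset``, ``local_entry_span`` and ``unreferenced_gaps`` are hypothetical. The sketch assumes an in-memory archive with absolute offsets, no ZIP64 records, and no data descriptors.

```python
import io
import struct
import zipfile


def central_directory_offset(data: bytes) -> int:
    """Read the central directory offset from the end-of-central-directory record."""
    eocd = data.rfind(b"PK\x05\x06")  # naive signature search; enough for a demo
    if eocd < 0:
        raise zipfile.BadZipFile("end of central directory record not found")
    # The 4-byte offset field sits 16 bytes into the record.
    return struct.unpack_from("<L", data, eocd + 16)[0]


def local_entry_span(data: bytes, info: zipfile.ZipInfo) -> tuple[int, int]:
    """Return (start, end) of the local file entry that *info* refers to."""
    start = info.header_offset
    # Local header: 30 fixed bytes; name and extra lengths are at bytes 26-29.
    name_len, extra_len = struct.unpack_from("<2H", data, start + 26)
    # Assumption: the entry has no trailing data descriptor.
    return start, start + 30 + name_len + extra_len + info.compress_size


def unreferenced_gaps(data: bytes) -> list[tuple[int, int]]:
    """List byte ranges before the central directory that no listed entry covers."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        spans = sorted(local_entry_span(data, info) for info in zf.infolist())
    gaps, cursor = [], 0
    for start, end in spans:
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    cd_start = central_directory_offset(data)
    if cd_start > cursor:
        gaps.append((cursor, cd_start))
    return gaps


# Build a one-member archive and keep only its local file entry bytes.
first = io.BytesIO()
with zipfile.ZipFile(first, "w") as zf:
    zf.writestr("old.txt", "stale payload")
raw = first.getvalue()
stale_entry = raw[:central_directory_offset(raw)]

# Appending to data that is not a ZIP file starts a new archive after it, so the
# old local entry stays in the file but is absent from the central directory.
buf = io.BytesIO(stale_entry)
with zipfile.ZipFile(buf, "a") as zf:
    zf.writestr("new.txt", "live payload")
archive = buf.getvalue()

with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    names = zf.namelist()
gaps = unreferenced_gaps(archive)
print(names, gaps)
```

A gap found this way is only a candidate for removal: it may equally be a self-extractor stub or other prepended data, which is the ambiguity the note's heuristics address and the reason passing *removed* explicitly is recommended.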