@@ -544,38 +544,40 @@ ZipFile Objects
544544.. method:: ZipFile.repack(removed=None, *, \
545545 strict_descriptor=False[, chunk_size])
546546
547- Rewrites the archive to remove stale local file entries, shrinking its file
548- size. The archive must be opened with mode ``'a'``.
547+ Rewrites the archive to remove unreferenced local file entries, shrinking
548+ its file size. The archive must be opened with mode ``'a'``.
549549
550550 If *removed* is provided, it must be a sequence of :class:`ZipInfo` objects
551- representing removed entries; only their corresponding local file entries
552- will be removed.
553-
554- If *removed* is not provided, the archive is scanned to identify and remove
555- local file entries that are no longer referenced in the central directory.
556- The algorithm assumes that local file entries (and the central directory,
557- which is mostly treated as the "last entry") are stored consecutively:
558-
559- #. Data before the first referenced entry is removed only when it appears to
560- be a sequence of consecutive entries with no extra following bytes; extra
561- preceding bytes are preserved.
562- #. Data between referenced entries is removed only when it appears to
563- be a sequence of consecutive entries with no extra preceding bytes; extra
564- following bytes are preserved.
565- #. Entries must not overlap. If any entry's data overlaps with another, a
566- :exc:`BadZipFile` error is raised and no changes are made.
551+ representing the recently removed members, and only their corresponding
552+ local file entries will be removed. Otherwise, the archive is scanned to
553+ locate and remove local file entries that are no longer referenced in the
554+ central directory.
567555
568556 When scanning, setting ``strict_descriptor=True`` disables detection of any
569- entry using an unsigned data descriptor (deprecated in the ZIP specification
570- since version 6.3.0, released on 2006-09-29, and used only by some legacy
571- tools). This improves performance, but may cause some stale entries to be
572- preserved.
557+ entry using an unsigned data descriptor (a format deprecated by the ZIP
558+ specification since version 6.3.0, released on 2006-09-29, and used only by
559+ some legacy tools), which is around 100 to 1000 times slower to scan.
560+ This does not affect performance on entries without such a
561+ feature.
573562
574563 *chunk_size* may be specified to control the buffer size when moving
575564 entry data (default is 1 MiB).
576565
577566 Calling :meth:`repack` on a closed ZipFile will raise a :exc:`ValueError`.
578567
568+ .. note::
569+ The scanning algorithm is heuristic-based and assumes that the ZIP file
570+ is normally structured—for example, with local file entries stored
571+ consecutively, without overlap or interleaved binary data. Prepended
572+ binary data, such as a self-extractor stub, is recognized and preserved
573+ unless it happens to contain bytes that coincidentally resemble a valid
574+ local file entry in multiple respects—an extremely rare case. Embedded
575+ ZIP payloads are also handled correctly, as long as they follow normal
576+ structure. However, the algorithm does not guarantee correctness or
577+ safety on untrusted or intentionally crafted input. It is generally
578+ recommended to provide the *removed* argument for better reliability and
579+ performance.
580+
579581 .. versionadded:: next
580582
581583
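For readers of this change, the idea of an "unreferenced local file entry" can be illustrated without calling ``repack`` itself, which is the method under review and may not exist in an installed Python. The following is a minimal sketch, not the algorithm used by the patch: it only reports byte ranges that no central directory record covers and does not rewrite anything. The function names ``unreferenced_gaps`` and ``build_archive_with_stale_entry`` are hypothetical, the demo archive is produced by editing ``ZipFile.filelist`` and ``ZipFile.NameToInfo`` (a CPython implementation detail), and entries that use data descriptors are deliberately rejected rather than handled.

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIZE = 30  # fixed-size part of a local file header


def unreferenced_gaps(data: bytes):
    """Return (start, end) byte ranges not covered by referenced entries.

    Hypothetical helper: only ranges before the first referenced entry and
    between referenced entries are reported.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = sorted(zf.infolist(), key=lambda i: i.header_offset)

    gaps = []
    pos = 0
    for info in infos:
        if info.flag_bits & 0x08:
            # Simplifying assumption: no data descriptor support.
            raise NotImplementedError("data descriptors are not handled")
        start = info.header_offset
        header = data[start:start + LOCAL_HEADER_SIZE]
        if header[:4] != b"PK\x03\x04":
            raise zipfile.BadZipFile("bad local file header signature")
        # File name and extra field lengths sit at bytes 26..29.
        name_len, extra_len = struct.unpack("<HH", header[26:30])
        end = (start + LOCAL_HEADER_SIZE + name_len + extra_len
               + info.compress_size)
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)
    return gaps


def build_archive_with_stale_entry() -> bytes:
    """Build a demo archive whose b.txt entry is left unreferenced."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", "alpha")
        zf.writestr("b.txt", "bravo")
        zf.writestr("c.txt", "charlie")
        # Implementation detail (assumption): dropping the ZipInfo before
        # close() omits it from the central directory, while its local
        # file entry stays in the file.
        stale = zf.getinfo("b.txt")
        zf.filelist.remove(stale)
        del zf.NameToInfo["b.txt"]
    return buf.getvalue()


if __name__ == "__main__":
    archive = build_archive_with_stale_entry()
    for start, end in unreferenced_gaps(archive):
        print(f"unreferenced bytes: {start}..{end}")
```

A reported range is only a candidate: as the note in the diff points out, bytes before the first entry may be a self-extractor stub rather than a stale entry, which is why passing *removed* explicitly is the more reliable path.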