fix(s3): guard _put_many_files and _read_many_files against empty input#3194
Conversation
Greptile SummaryThis PR fixes a crash in
Confidence Score: 5/5Safe to merge — the fix is minimal, targeted, and backward-compatible with all existing callers. Both guards are correct: No files require special attention. Important Files Changed
Reviews (2): Last reviewed commit: "fix(s3): guard _put_many_files and _read..." | Re-trigger Greptile |
Both methods called s3op with an empty input file when given an empty
iterator. s3op exits 0 in this case but writes an informational message
to stderr ("Uploading 0 files.."). The `if stderr:` check treated this
as an error and the error handler then crashed with IndexError trying to
access url_info[0] or prefixes_and_ranges[0] on an empty list.
Trigger: the content-addressed store skips uploading blobs that already
exist in S3 (deduplication). When all blobs from a step are already
present, put_many is called with an empty iterator, hitting this bug on
every re-run of the same flow with the same inputs.
Fix: return early from both methods when the input list is empty.
There is nothing to do, so no subprocess needs to be spawned.
Add nine unit tests covering:
- Correct empty return value for both methods
- s3op not called on empty input
- Regression: no IndexError when s3op would have written info to stderr
- All s3op modes (get/list/info/put) safe with empty input
- Non-empty input still takes the normal code path
cc62a1c to
29370d7
Compare
Both methods called s3op with an empty input file when given an empty iterator. s3op exits 0 in this case but writes an informational message to stderr ("Uploading 0 files.."). The
if stderr:check treated this as an error and the error handler then crashed with IndexError trying to access url_info[0] or prefixes_and_ranges[0] on an empty list.Trigger: the content-addressed store skips uploading blobs that already exist in S3 (deduplication). When all blobs from a step are already present, put_many is called with an empty iterator, hitting this bug on every re-run of the same flow with the same inputs.
Fix: return early from both methods when the input list is empty. There is nothing to do, so no subprocess needs to be spawned.
Add nine unit tests covering: