fix(s3): return ListObjectsV2 keys in lexicographic order for general purpose buckets#63
Conversation
…eral purpose buckets Per the S3 API spec, general purpose buckets return keys sorted in lexicographic (UTF-8 byte) order. Directory buckets (--x-s3 suffix) do not guarantee ordering and are left unchanged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR updates the S3 ListObjectsV2-style listing behavior to return object keys in lexicographic order for general purpose buckets (to align with the S3 API spec), while explicitly skipping ordering guarantees for “directory buckets” identified by a --x-s3 suffix.
Changes:
- Sort
listObjects(...)results for non-directory buckets. - Add
S3Service.isDirectoryBucket(String)helper for--x-s3suffix detection. - Add unit tests asserting ordering behavior (including after prefix filtering and
maxKeystruncation).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/main/java/io/github/hectorvent/floci/services/s3/S3Service.java |
Adds sorting for listObjects results and introduces isDirectoryBucket to skip ordering for directory buckets. |
src/test/java/io/github/hectorvent/floci/services/s3/S3ServiceTest.java |
Adds tests for sorted listing behavior, maxKeys behavior after sorting, and --x-s3 bucket detection. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ting Java String.compareTo uses UTF-16 code unit order, which differs from the S3 API spec's required UTF-8 unsigned byte lexicographic order for characters above U+D800 (e.g. supplementary plane characters). Switch to comparing keys by their UTF-8 byte representation using Arrays.compareUnsigned. Add a non-ASCII test case (U+E000 vs U+10000) that exposes the ordering difference between UTF-16 and UTF-8. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/main/java/io/github/hectorvent/floci/services/s3/S3Service.java
Outdated
Show resolved
Hide resolved
Use imported StandardCharsets and Arrays instead of fully-qualified java.nio.charset.StandardCharsets and java.util.Arrays for consistency with the rest of the codebase. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/main/java/io/github/hectorvent/floci/services/s3/S3Service.java
Outdated
Show resolved
Hide resolved
…ory bucket test - Precompute UTF-8 byte arrays per key before sorting to avoid redundant allocations during O(n log n) comparisons - Extract "--x-s3" magic string into DIRECTORY_BUCKET_SUFFIX constant - Add test verifying that directory buckets skip lexicographic sorting Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/test/java/io/github/hectorvent/floci/services/s3/S3ServiceTest.java
Outdated
Show resolved
Hide resolved
Replace the weak contains-based assertion with a deterministic test using a LinkedHashMap-backed OrderedStorage that preserves insertion order. This ensures the test actually verifies that sorting is skipped for directory buckets, rather than passing regardless of sort behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Hello @fukayatsu, Thanks for the PR! Could you please update the commit history to remove the Co-authored-by: Claude <...> line from the commit message? We prefer to keep the attribution focused on human contributors for this project. Also take a look to the conflicts Once that's cleaned up, I'm happy to take another look! |
Summary
ListObjectsV2results in lexicographic (UTF-8 unsigned byte) order for general purpose buckets, matching the S3 API specificationArrays.compareUnsignedon UTF-8 byte arrays instead of JavaString.compareTo(UTF-16 code unit order), which produces incorrect ordering for supplementary plane characters (U+10000+)--x-s3suffix), which do not guarantee key orderingisDirectoryBucket()helper to detect directory buckets by naming conventionTest plan
listObjectsReturnsKeysInLexicographicOrder— objects inserted out of order are returned sortedlistObjectsWithPrefixReturnsKeysInLexicographicOrder— sorting works correctly with prefix filteringlistObjectsMaxKeysRespectsLexicographicOrder— maxKeys truncation applies after sortinglistObjectsReturnsNonAsciiKeysInUtf8LexicographicOrder— verifies correct UTF-8 byte ordering using U+E000 vs U+10000 (where UTF-16 and UTF-8 order differ)isDirectoryBucket— unit test for--x-s3suffix detectionS3ServiceTestpass🤖 Generated with Claude Code