Skip to content

fix(s3): return ListObjectsV2 keys in lexicographic order for general purpose buckets#63

Open
fukayatsu wants to merge 5 commits intohectorvent:mainfrom
fukayatsu:fix/s3-list-objects-lexicographic-order
Open

fix(s3): return ListObjectsV2 keys in lexicographic order for general purpose buckets#63
fukayatsu wants to merge 5 commits intohectorvent:mainfrom
fukayatsu:fix/s3-list-objects-lexicographic-order

Conversation

@fukayatsu
Copy link
Copy Markdown

@fukayatsu fukayatsu commented Mar 25, 2026

Summary

  • Sort ListObjectsV2 results in lexicographic (UTF-8 unsigned byte) order for general purpose buckets, matching the S3 API specification
  • Use Arrays.compareUnsigned on UTF-8 byte arrays instead of Java String.compareTo (UTF-16 code unit order), which produces incorrect ordering for supplementary plane characters (U+10000+)
  • Skip sorting for directory buckets (--x-s3 suffix), which do not guarantee key ordering
  • Add isDirectoryBucket() helper to detect directory buckets by naming convention

Test plan

  • listObjectsReturnsKeysInLexicographicOrder — objects inserted out of order are returned sorted
  • listObjectsWithPrefixReturnsKeysInLexicographicOrder — sorting works correctly with prefix filtering
  • listObjectsMaxKeysRespectsLexicographicOrder — maxKeys truncation applies after sorting
  • listObjectsReturnsNonAsciiKeysInUtf8LexicographicOrder — verifies correct UTF-8 byte ordering using U+E000 vs U+10000 (where UTF-16 and UTF-8 order differ)
  • isDirectoryBucket — unit test for --x-s3 suffix detection
  • All 30 tests in S3ServiceTest pass

🤖 Generated with Claude Code

…eral purpose buckets

Per the S3 API spec, general purpose buckets return keys sorted in
lexicographic (UTF-8 byte) order. Directory buckets (--x-s3 suffix)
do not guarantee ordering and are left unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 25, 2026 02:28
@fukayatsu fukayatsu changed the title fix: return ListObjectsV2 keys in lexicographic order for general purpose buckets fix(s3): return ListObjectsV2 keys in lexicographic order for general purpose buckets Mar 25, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the S3 ListObjectsV2-style listing behavior to return object keys in lexicographic order for general purpose buckets (to align with the S3 API spec), while explicitly skipping ordering guarantees for “directory buckets” identified by a --x-s3 suffix.

Changes:

  • Sort listObjects(...) results for non-directory buckets.
  • Add S3Service.isDirectoryBucket(String) helper for --x-s3 suffix detection.
  • Add unit tests asserting ordering behavior (including after prefix filtering and maxKeys truncation).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/main/java/io/github/hectorvent/floci/services/s3/S3Service.java Adds sorting for listObjects results and introduces isDirectoryBucket to skip ordering for directory buckets.
src/test/java/io/github/hectorvent/floci/services/s3/S3ServiceTest.java Adds tests for sorted listing behavior, maxKeys behavior after sorting, and --x-s3 bucket detection.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…ting

Java String.compareTo uses UTF-16 code unit order, which differs from
the S3 API spec's required UTF-8 unsigned byte lexicographic order for
characters above U+D800 (e.g. supplementary plane characters). Switch
to comparing keys by their UTF-8 byte representation using
Arrays.compareUnsigned.

Add a non-ASCII test case (U+E000 vs U+10000) that exposes the
ordering difference between UTF-16 and UTF-8.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Use imported StandardCharsets and Arrays instead of fully-qualified
java.nio.charset.StandardCharsets and java.util.Arrays for consistency
with the rest of the codebase.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…ory bucket test

- Precompute UTF-8 byte arrays per key before sorting to avoid
  redundant allocations during O(n log n) comparisons
- Extract "--x-s3" magic string into DIRECTORY_BUCKET_SUFFIX constant
- Add test verifying that directory buckets skip lexicographic sorting

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Replace the weak contains-based assertion with a deterministic test
using a LinkedHashMap-backed OrderedStorage that preserves insertion
order. This ensures the test actually verifies that sorting is skipped
for directory buckets, rather than passing regardless of sort behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@hectorvent hectorvent added the s3 label Mar 25, 2026
@hectorvent
Copy link
Copy Markdown
Owner

Hello @fukayatsu,

Thanks for the PR!

Could you please update the commit history to remove the Co-authored-by: Claude <...> line from the commit message? We prefer to keep the attribution focused on human contributors for this project.

Also take a look to the conflicts

Once that's cleaned up, I'm happy to take another look!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants