AWS: Add chunked encoding configuration for S3 requests #15242
singhpk234 merged 13 commits into apache:main
Conversation
c8e62ba to 3e6556b
Actually, I've just noticed that. Until that's in, RestCatalog access via S3FileIO isn't going to work. It'd be OK for direct S3 IO though.
85bdad8 to 1e545f4
Thanks for your advice, I've added testMultipartUploadWithChunkedEncoding, please take a look, thanks!
steveloughran left a comment:
Test looks good. I always test against real S3 stores, so it's nice to see a unit test pushing up a few MB without problems.
public static final String CHUNKED_ENCODING_ENABLED = "s3.chunked-encoding-enabled";
public static final boolean CHUNKED_ENCODING_ENABLED_DEFAULT = true;
this must be false by default because org.apache.iceberg.aws.s3.signer.S3V4RestSignerClient will not sign chunked encoding operations.
Thanks! done.
steveloughran left a comment:
Thanks. My concerns are all addressed.
Afraid you now need to await a review by an Iceberg committer.
1a71787 to a553eb7
a553eb7 to 13aa984
@amogh-jahagirdar could you help review this PR?
@singhpk234 @huaxingao Can you help us take a look? Just added a configuration.
open-api/rest-catalog-open-api.py (outdated)
- `s3.session-token`: if present, this value should be used as the session token
- `s3.remote-signing-enabled`: if `true`, remote signing should be performed as described in the `s3-signer-open-api.yaml` specification
- `s3.cross-region-access-enabled`: if `true`, S3 Cross-Region bucket access is enabled
- `s3.chunked-encoding-enabled`: if `true`, chunked encoding is enabled for S3 requests
Do we want REST catalogs to return this? For now, can this be a client-side catalog property?
Yes, when the server returns storage credentials, the related S3 properties are expected to be applied in S3FileIO directly; the client doesn't need to be aware of or manage S3-specific configurations on its own.
Purely client side; all that comes back from the catalog is signed access to a single file and a header to say cache/no-cache. Now, that's not enough to deal with SSE-C encryption, where you'd need to say (encryption: SSE-C, secret="...."), but that's OK because SSE-C is an utter management nightmare and everyone uses SSE-KMS, which doesn't need any information from the catalog.
Makes sense, I will remove it, thanks for the advice!
Agreed, keep this as a client-side catalog property. I've removed the change.
assertThat(testIo.newInputFile(testObjectUri).getLength())
    .isEqualTo(parts * (long) S3FileIOProperties.MULTIPART_SIZE_MIN);
I wonder if we can also assert that the contents of the files are correct?
steveloughran left a comment:
Commented. I'm making that new test stricter on the basis that you've done most of the work and it's good to round off.
FWIW we once hit a prefetch bug in the abfs client where, under specific race conditions, you could get a block from another stream in the same process. This test wouldn't replicate that condition either (2+ GB files, multiple readers, etc.), but it'd be a bit stricter.
verifyInts(testIo, intObjectUri, () -> expectedInt);

// Verify write and read with bytes
byte expectedByte = 42;
Nice test, but this doesn't detect parts committed out of order. What about incrementing expectedByte with each part and validating on the same ranges?
Good point! I've updated the test.
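The suggested check can be sketched with plain byte arrays (a minimal illustration, not the PR's actual test; `parts` and `partSize` are toy values here, while real S3 multipart parts are at least 5 MB): each part is filled with a distinct incrementing byte value, so validating each part's range in the reassembled object catches parts committed out of order, which a single constant fill value would never detect.

```java
import java.io.ByteArrayOutputStream;

public class PartOrderCheck {
  public static void main(String[] args) {
    int parts = 3;
    int partSize = 4; // tiny for illustration only

    // "Upload": write each part with its own fill byte.
    ByteArrayOutputStream upload = new ByteArrayOutputStream();
    byte expectedByte = 42;
    for (int p = 0; p < parts; p++) {
      for (int i = 0; i < partSize; i++) {
        upload.write(expectedByte);
      }
      expectedByte++; // distinct value per part
    }
    byte[] object = upload.toByteArray();

    // Validate part by part on the same ranges that were written.
    expectedByte = 42;
    for (int p = 0; p < parts; p++) {
      for (int i = 0; i < partSize; i++) {
        byte actual = object[p * partSize + i];
        if (actual != expectedByte) {
          throw new AssertionError(
              "part " + p + " offset " + i + ": expected " + expectedByte + " got " + actual);
        }
      }
      expectedByte++;
    }
    System.out.println("all " + parts + " parts in order");
  }
}
```

Swapping any two part ranges in `object` would trip the `AssertionError`, which is exactly the failure mode the reviewer wanted covered.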
f0b3852 to 4f81dd0
steveloughran left a comment:
Thanks for the changes. Lost a weekend to a good horror story with a prefetch race condition that shipped; validation of test payloads wasn't enough, that payload had to be a 2+ GB CSV file, but still, every safety check helps.
singhpk234 left a comment:
LGTM, thanks @plusplusjiajia
Can you please double-confirm the integration tests pass? Asking because they are not run as part of CI.
4f81dd0 to d8bcb4f
d8bcb4f to 496e8f5
Thanks for the reminder, I've verified them manually.
Hi @steveloughran, can you take another look? Do you have any other concerns?
I have no concerns but I'm not a committer. I'll give a final review and mark if I'm happy with it, but you'll need approval from an Iceberg committer.
public static final String CHUNKED_ENCODING_ENABLED = "s3.chunked-encoding-enabled";
public static final boolean CHUNKED_ENCODING_ENABLED_DEFAULT = false;
Sorry for the last-moment comment, but it seems like the SDK default is true? If yes, this might be different from what we have been doing so far? I wonder if we should just pass it to the builder when the above property is not empty, and not have defaults?
@singhpk234 the CatalogSigner doesn't sign chunked encoding; S3V4RestSignerClient throws at L255:

if (signerParams.enableChunkedEncoding()) {
  throw new UnsupportedOperationException("Chunked encoding not supported");
}
Setting it to false ensures that if a client is using it to sign for uploads (polaris are working on/have done that) then the requests are being signed.
It's likely a failure condition which hasn't yet been encountered
I agree for those scenarios we can prompt users to disable it themselves. The way I am approaching it: let's say Iceberg v1.10 shipped with chunked encoding true; will 1.11 be shipping it as false? That could be a regression for non-signing use cases. Hence the recommendation to keep the SDK default but at the same time give ways to override it, wdyt?
Thanks for the feedback @singhpk234, that's a fair point. Changing the default to false would indeed be a regression for users who don't use the REST catalog signer.
@steveloughran I agree the CatalogSigner incompatibility is a real concern, but I think we can address it without changing the SDK default for everyone. Users who rely on CatalogSigner can explicitly set s3.chunked-encoding-enabled=false, and we can document this clearly. This way we avoid a silent performance regression for the majority of users while still providing a clear path for signing use cases.
I'll update the PR to keep the default as true.
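For deployments that use the REST catalog signer, the opt-out would then look something like this in the catalog's S3 properties (the property name is the one this PR introduces; how the property reaches S3FileIO depends on the catalog configuration in use):

```
s3.chunked-encoding-enabled=false
```

With the default left at true, everyone else keeps the SDK's out-of-the-box chunked-encoding behavior.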
public void testMultipartUploadWithChunkedEncoding(boolean chunkedEncodingEnabled)
    throws IOException {
  // Create a new S3FileIO with specified chunked encoding setting
  S3FileIO testIo = new S3FileIO(() -> s3);
Can we wrap with try-with-resources, so the testIo is closed?
Thanks for the catch! Updated to use try-with-resources to ensure testIo is properly closed.
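The requested pattern is standard Java try-with-resources: any resource declared in the try header is closed automatically when the block exits, even on exceptions. A minimal self-contained sketch, using a hypothetical FakeFileIO stand-in since S3FileIO needs an S3 client (in the real test, `try (S3FileIO testIo = new S3FileIO(() -> s3)) { ... }` would do the same):

```java
public class TryWithResourcesDemo {
  // Stand-in for S3FileIO, purely for illustration.
  static class FakeFileIO implements AutoCloseable {
    boolean closed = false;

    @Override
    public void close() {
      closed = true;
    }
  }

  public static void main(String[] args) {
    FakeFileIO observed = new FakeFileIO();
    // The resource is closed when the try block exits, with no explicit close() call.
    try (FakeFileIO io = observed) {
      // ... exercise the multipart upload here ...
    }
    System.out.println("closed=" + observed.closed);
  }
}
```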
Thanks for the change @plusplusjiajia! Thanks for the review @JingsongLi @steveloughran!
Thank you all for taking the time to review this PR!
https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/S3Configuration.html#chunkedEncodingEnabled()