[ADH-8124] Add S3 Gateway ozone objects cache #10

tigrulya-exe wants to merge 1 commit into 2.2.0-develop
Conversation
Force-pushed from f306dde to f844ae5
iamlapa left a comment:
Also, in `BucketEndpoint::handleGetRequest(S3RequestContext context, String bucketName)`, the line `ozoneKeyIterator = bucket.listKeys(prefix, prevKey, shallow);` still ignores the S3 `max-keys` value when choosing how much to fetch from OM; the iterator falls back to `ozone.client.list.cache`, which is 1000 by default, so `max-keys=1` can still over-fetch badly. We should add an S3-specific list path that passes a backend fetch size like `min(maxKeys + 1, configuredCap)`.
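The suggested fetch-size rule is simple enough to pin down. A minimal sketch (`ListFetchSize` is an illustrative helper, not an existing Ozone class):

```java
// Hypothetical helper: derive the OM fetch size from the S3 max-keys
// parameter instead of falling back to ozone.client.list.cache (1000).
final class ListFetchSize {
    // Fetch one extra entry so isTruncated can be set correctly,
    // but never exceed the configured backend cap.
    static int forMaxKeys(int maxKeys, int configuredCap) {
        return Math.min(maxKeys + 1, configuredCap);
    }
}
```

With this rule, `max-keys=1` asks OM for 2 entries instead of 1000, while large requests are still bounded by the configured cap.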
```java
  }
  S3ObjectCacheKey that = (S3ObjectCacheKey) o;
  // deliberately use only the immutable part of the S3Auth
  return Objects.equals(s3Auth.getAccessID(), that.s3Auth.getAccessID());
```
Before we enable this, we need a real auth gate in front of metadata cache hits. Right now the cache key only uses the access ID, so a warmed entry can be reused by a later request with the same access ID but a bad `stringToSign`/signature, and OM never gets a chance to validate SigV4. Either validate SigV4 locally in S3G with a secure short-lived S3 secret cache, or do an auth-only OM check before serving cached metadata; cached metadata cannot be returned just because the access ID matches.
Tbh, I think the second suggested option conflicts with the main goal of this cache: reducing the number of calls to OM. Regarding the first one: the cache itself is a trade-off between latency and consistency, because it won't be strictly consistent in any case (changes can be made via the Ozone CLI or other clients; even if we add a subscription mechanism, it will receive events only eventually, not immediately). In some cases this cache can even return non-existent buckets/keys, so in my opinion a small chance of returning an entry for an invalid `stringToSign`/signature during the short key TTL isn't worth adding extra layers of complexity, such as a local S3 secret cache or an additional OM call.
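The trade-off being discussed can be sketched with a minimal TTL-bounded cache (plain Java, purely illustrative; the PR itself uses Caffeine): entries answer reads until they expire or are explicitly invalidated, so any external change, including an auth decision, is only guaranteed to be visible after the TTL.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative TTL cache: the only staleness bound is the per-entry TTL.
// Entries may outlive external changes (CLI writes, other clients)
// until they expire or are invalidated by a local write path.
final class TtlCache<K, V> {
    private record Entry<T>(T value, long expiresAtMillis) {}

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlMillis;

    TtlCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    void put(K key, V value) {
        map.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
    }

    V getIfPresent(K key) {
        Entry<V> e = map.get(key);
        if (e == null || System.currentTimeMillis() >= e.expiresAtMillis()) {
            map.remove(key);  // lazily drop expired entries
            return null;
        }
        return e.value();
    }

    void invalidate(K key) {
        map.remove(key);
    }
}
```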
```java
  }

  @Override
  public void deleteKey(String volumeName, String bucketName, String keyName, boolean recursive) throws IOException {
```
This only covers part of the write surface. Before the cache can be enabled, key metadata needs to be evicted for `createKey*`, `rewriteKey*`, `completeMultipartUpload`, `putObjectTagging`, `deleteKey(s)`, and `renameKey(s)` (for both source and destination). For stream writes, I'd also evict before opening the write and again after a successful close/commit, so we never keep serving an old ETag or metadata while a replacement object is being written.
Makes sense, but currently these methods aren't used in S3G, so I decided not to wrap them. However, future releases may use them and the internal logic can change, so I added implementations for these methods.
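The write-path eviction being asked for is essentially a thin decorator around the OM client. A minimal sketch (hypothetical interfaces standing in for the real OM client wrapper and caches; checked exceptions dropped for brevity):

```java
// Hypothetical stand-in for the per-user key metadata cache.
interface KeyMetadataCache {
    void evict(String volume, String bucket, String key);
}

// Hypothetical slice of the OM client's key operations.
interface OmKeyOps {
    void deleteKey(String volume, String bucket, String key, boolean recursive);
    void renameKey(String volume, String bucket, String fromKey, String toKey);
}

// Decorator that always invalidates cached key metadata after a mutation,
// so stale ETags/metadata are never served past the write.
final class EvictingKeyOps implements OmKeyOps {
    private final OmKeyOps delegate;
    private final KeyMetadataCache cache;

    EvictingKeyOps(OmKeyOps delegate, KeyMetadataCache cache) {
        this.delegate = delegate;
        this.cache = cache;
    }

    @Override
    public void deleteKey(String volume, String bucket, String key, boolean recursive) {
        delegate.deleteKey(volume, bucket, key, recursive);
        cache.evict(volume, bucket, key);  // drop stale metadata after delete
    }

    @Override
    public void renameKey(String volume, String bucket, String fromKey, String toKey) {
        delegate.renameKey(volume, bucket, fromKey, toKey);
        cache.evict(volume, bucket, fromKey);  // source entry is gone
        cache.evict(volume, bucket, toKey);    // destination may be cached as missing
    }
}
```

The same pattern extends to the other mutating calls listed in the review; the point is that eviction lives in one wrapper rather than being scattered across endpoints.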
```java
  }

  @Override
  public void deleteBucket(String volumeName, String bucketName) throws IOException {
```
Bucket details are cached, but this only evicts on `deleteBucket`. We also need invalidation after `createBucket`, `setBucketOwner`, `setBucketVersioning`, `setBucketStorageType`, `setBucketQuota`, `setReplicationConfig`, `setEncryptionKey`, and `setAcl`; otherwise owner checks, ACL decisions, default replication, encryption, and quota-related behavior can be stale until TTL.
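The `createBucket` case is the subtle one: since the cache can also hold "not found" results (the earlier reply notes it may return non-existent buckets/keys), a freshly created bucket could keep answering 404 until TTL unless creation invalidates too. A toy illustration (names are not the real Ozone types):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy bucket-details cache keyed by "volume/bucket". In the real cache
// the value would be bucket metadata (owner, ACLs, replication, ...).
final class BucketDetailsCache {
    private final Set<String> cached = ConcurrentHashMap.newKeySet();

    void put(String volumeBucket) {
        cached.add(volumeBucket);
    }

    boolean contains(String volumeBucket) {
        return cached.contains(volumeBucket);
    }

    // Must be called from every bucket mutation: createBucket,
    // setBucketOwner, setBucketVersioning, setBucketStorageType,
    // setBucketQuota, setReplicationConfig, setEncryptionKey, setAcl.
    void invalidate(String volumeBucket) {
        cached.remove(volumeBucket);
    }
}
```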
```java
 * limitations under the License.
 */

package org.apache.hadoop.ozone.client;
```
This puts S3 Gateway implementation classes into the `org.apache.hadoop.ozone.client` package so they can call package-private `OzoneClientFactory` hooks from the client module. That turns an internal client-module helper into an implicit cross-module API. I'd rather expose a small, intentional extension point from the client module and keep the S3-specific builder/cache under `org.apache.hadoop.ozone.s3.*`; otherwise future client refactors or upstream merges can break S3G in a pretty non-obvious way.
Exposed the method in `OzoneClientFactory` and made it a bit more readable.
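One shape such an intentional extension point could take (hypothetical signatures, not the actual `OzoneClientFactory` API): a public factory method that accepts a decorator, so the caching wrapper can live under `org.apache.hadoop.ozone.s3.*` without reaching into the client package.

```java
import java.util.function.UnaryOperator;

// Sketch of a public, intentional hook in the client module. All names
// here are illustrative stand-ins, not the real Ozone types.
final class ClientFactorySketch {
    interface ClientProtocol {
        String describe();
    }

    // Stand-in for constructing the real RPC client.
    static ClientProtocol createProxy() {
        return () -> "rpc";
    }

    // Callers may wrap the protocol (e.g. with a caching decorator)
    // without living in the client package.
    static ClientProtocol createClient(UnaryOperator<ClientProtocol> decorator) {
        return decorator.apply(createProxy());
    }
}
```

S3G would then pass its caching decorator in, and future client-module refactors only have to preserve this one public method.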
Force-pushed from f844ae5 to e96d57d
Yeah, it was already discussed with Alexander, and we decided to skip this requirement for now and check whether further work is needed in a separate task. Unfortunately, we can't simply pass the value of …
Added a caching OM client for the S3 gateway component. Currently, it caches metadata for the S3 volume, buckets, and keys (both regular and extended versions) in local per-user Caffeine caches. Cache entries are evicted on the corresponding write operations (delete, rename) or by TTL.
Related options:

| Option | Default |
| --- | --- |
| `ozone.s3g.object.cache.enabled` | `false` |
| `ozone.s3g.object.cache.metrics.enabled` | `true` |
| `ozone.s3g.object.cache.volume.size` | `10` |
| `ozone.s3g.object.cache.volume.ttl.ms` | `3000` |
| `ozone.s3g.object.cache.bucket.size` | `100` |
| `ozone.s3g.object.cache.bucket.ttl.ms` | `3000` |
| `ozone.s3g.object.cache.key.size` | `1000` |
| `ozone.s3g.object.cache.key.ttl.ms` | `3000` |

Also a Hadoop Metrics2 source is added to collect and publish the following statistics for all S3 object caches (`s3VolumeCache`, `s3BucketCache`, `s3HeadKeyCache`, `s3KeyInfoCache`): `hitCount`, `hitRate`, `missCount`, `loadSuccessCount`, `loadFailureCount`, `loadFailureRate`, `evictionCount`, `averageLoadPenalty`.

Example of metrics if Prometheus MetricsSink is used:
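For reference, the cache options above could be enabled in `ozone-site.xml` like this (a sketch assuming the usual Hadoop-style configuration convention; option names are taken from the description, values shown are the listed defaults except for the enable flag):

```xml
<!-- Sketch: enabling the S3G object cache. Option names come from the
     PR description; this assumes standard Hadoop property syntax. -->
<property>
  <name>ozone.s3g.object.cache.enabled</name>
  <value>true</value>
</property>
<property>
  <name>ozone.s3g.object.cache.key.size</name>
  <value>1000</value>
</property>
<property>
  <name>ozone.s3g.object.cache.key.ttl.ms</name>
  <value>3000</value>
</property>
```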