Replies: 1 comment
One idea could be to use the Lance format as the tier on S3; see https://github.com/lance-format/lance and https://lance.org/. Possible advantages:
Thinking about these together, one may also look at Apache Fluss, the streaming lakehouse; their blog has a post about its tiering: https://fluss.apache.org/blog/
Hello,
I closed issue #1419, and this discussion is its logical continuation. Implementation is deferred until after VSR (Viewstamped Replication) lands, since the two are tightly coupled and the design only becomes concrete with VSR in place. Until then, this thread is where we refine the design.
Motivation
Local disk alone does not scale to terabyte workloads, and is overkill when most reads target the tail. Tiered storage keeps hot data on fast media and offloads cold data to cheap object storage (S3, GCS, Azure Blob, MinIO).
Proposed tiers: RAM (cache, active tail), NVMe (hot segments), object storage (bulk cold data). Each tier can be enabled independently.
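To make the tier split concrete, here is a minimal sketch of how a per-partition index could map a message offset to the tier that holds it. Everything in it (`Tier`, `TierRange`, `TierIndex`, `locate`, `example_index`) is a hypothetical name invented for illustration, not existing server code, and the sample ranges simply mirror the offset split mentioned under Open Design Areas. A disabled tier is modelled as having no ranges in the index.

```rust
use std::ops::Range;

/// The three proposed tiers (names are illustrative assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Ram,
    Nvme,
    ObjectStorage,
}

/// Hypothetical index entry: a half-open offset range and the tier holding it.
#[derive(Debug, Clone)]
struct TierRange {
    offsets: Range<u64>,
    tier: Tier,
}

/// Hypothetical per-partition index, sorted by start offset, ranges non-overlapping.
struct TierIndex {
    ranges: Vec<TierRange>,
}

impl TierIndex {
    fn new(mut ranges: Vec<TierRange>) -> Self {
        ranges.sort_by_key(|r| r.offsets.start);
        Self { ranges }
    }

    /// Returns the tier that holds `offset`, or `None` if no tier stores it.
    fn locate(&self, offset: u64) -> Option<Tier> {
        // Count the ranges whose start is <= offset; the last of them is the only candidate.
        let idx = self.ranges.partition_point(|r| r.offsets.start <= offset);
        let candidate = self.ranges.get(idx.checked_sub(1)?)?;
        candidate.offsets.contains(&offset).then_some(candidate.tier)
    }
}

/// Sample layout: oldest offsets on object storage, newer ones on NVMe, the tail in RAM.
fn example_index() -> TierIndex {
    TierIndex::new(vec![
        TierRange { offsets: 20_000..20_991, tier: Tier::Ram },
        TierRange { offsets: 0..10_000, tier: Tier::ObjectStorage },
        TierRange { offsets: 10_000..20_000, tier: Tier::Nvme },
    ])
}
```

With this layout, `example_index().locate(10_000)` resolves to the NVMe tier, and an offset past the tail resolves to `None`. A binary search over sorted ranges is only one possible design; a real implementation would also need to handle ranges moving between tiers while reads are in flight.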
Open Design Areas
… 0..9999 on S3, 10000..19999 on disk, 20000..20990 in RAM). RAM + disk already works.
… reqwest, which is tokio-bound. Needs a plan (separate executor, IPC sidecar, or upstream HTTP client abstraction).
Open questions
Note: separate from this, an S3 sink connector PR is already open at #3103 - that targets the connector framework (sink data into S3 from a stream), not server-side tiered persistence. Different layer, but worth tracking as prior art for S3 plumbing.
Thoughts?
Sources
executors-tokio feature, services-compfs / services-monoiofs services