refactor(object_store): lazy fetch returns PayloadView #103
Merged
facontidavide
requested changes
May 28, 2026
…uses std::any
The lazy resolver registered with ObjectStore::pushLazy now returns a PayloadView (Span + BufferAnchor) instead of a std::vector<uint8_t>. This lets producers hand off bytes they already hold behind a shared_ptr (e.g. a streaming buffer being reused across stores) without copying: the returned anchor extends the buffer's lifetime through the read.
For producers whose only payload is a plain std::vector<uint8_t>, the new sdk::makePayloadView() helper in pj_base wraps the vector into a shared_ptr (which becomes both owner and type-erased anchor) and returns a PayloadView pointing at its contents. This is also the contract resolveEntry() relies on: it recovers the shared_ptr<const vector> from the anchor via static_pointer_cast.
ObjectEntry::payload now stores a std::any instead of a std::variant<shared_ptr, std::function>. The dispatch in ObjectStore::resolveEntry uses std::any_cast for each branch.
Rationale: the variant alternatives are not part of any public discriminator (callers go through pushOwned/pushLazy), so the variant tag was carrying no semantic value. std::any keeps the storage slot flexible without constraining future producer shapes.
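For readers unfamiliar with the helper, a minimal sketch of what a makePayloadView-style wrapper could look like. The Span, BufferAnchor and PayloadView definitions below are hypothetical stand-ins written for this example, not the actual pj_base declarations; only the shape (bytes plus a type-erased shared_ptr anchor) comes from the commit message.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the SDK types named in the commit message.
using BufferAnchor = std::shared_ptr<const void>;

struct Span {
  const std::uint8_t* ptr = nullptr;
  std::size_t len = 0;
  const std::uint8_t* data() const { return ptr; }
  std::size_t size() const { return len; }
};

struct PayloadView {
  Span bytes;           // what the consumer reads
  BufferAnchor anchor;  // what keeps those bytes alive
};

// Sketch of the helper: moves the vector into a shared_ptr, which then acts
// as both the owner and the type-erased anchor of the returned view.
PayloadView makePayloadView(std::vector<std::uint8_t> bytes) {
  auto owner =
      std::make_shared<const std::vector<std::uint8_t>>(std::move(bytes));
  Span view{owner->data(), owner->size()};
  return PayloadView{view, std::move(owner)};
}
```

The vector is moved into the shared allocation, so the Span stays valid for as long as any copy of the anchor is alive.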
…ed variant
Builds on the prior commit's PayloadView-returning pushLazy signature, but
fixes two anchor-discarding bugs and restores a typed payload variant.
Bugs in the prior implementation:
1. resolveEntry did static_pointer_cast<const vector<uint8_t>>(pv.anchor),
forcing every producer's anchor to be exactly a shared_ptr<vector>. Any
other anchor type (chunk cache, mmap, parquet column slice) was UB. This
negated the entire point of BufferAnchor = shared_ptr<const void>.
2. resolveEntry discarded PayloadView::bytes (the Span) and returned the
whole anchor's vector. Producers can no longer publish a sub-range of
a larger backing buffer — which is the canonical zero-copy use case
(one decompressed MCAP chunk anchored once, many message Spans into it).
Both bugs traced to a single root cause: ResolvedObjectEntry::data was still
shared_ptr<const vector<uint8_t>>, so resolution had to materialize a vector
somehow. Fixed by making ResolvedObjectEntry carry {BufferAnchor anchor,
Span<const uint8_t> view} — type-erased anchor + producer-published span.
No static_pointer_cast anywhere.
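A minimal sketch of the fixed resolution path, assuming simplified stand-ins for the SDK types. DecodedChunk, subview, ResolvedEntry and resolve are hypothetical names introduced for this example. It illustrates both points above: the anchor is not a vector, and the published Span is a sub-range of the anchored buffer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-ins for the SDK types named above.
using BufferAnchor = std::shared_ptr<const void>;

struct Span {
  const std::uint8_t* ptr = nullptr;
  std::size_t len = 0;
  const std::uint8_t* data() const { return ptr; }
  std::size_t size() const { return len; }
};

struct PayloadView {
  Span bytes;
  BufferAnchor anchor;
};

// Hypothetical producer-side buffer: deliberately not a std::vector.
struct DecodedChunk {
  std::array<std::uint8_t, 100> storage{};
};

// Publishes [offset, offset + length) of a chunk; the chunk is the anchor.
PayloadView subview(const std::shared_ptr<const DecodedChunk>& chunk,
                    std::size_t offset, std::size_t length) {
  assert(chunk && offset <= chunk->storage.size() &&
         length <= chunk->storage.size() - offset);
  return PayloadView{Span{chunk->storage.data() + offset, length}, chunk};
}

// Simplified resolved entry: type-erased anchor plus producer-published span.
struct ResolvedEntry {
  BufferAnchor anchor;
  Span view;
};

// Resolution forwards both fields untouched: no cast, no re-derived extent.
ResolvedEntry resolve(const std::function<PayloadView()>& fetch) {
  PayloadView pv = fetch();
  return ResolvedEntry{std::move(pv.anchor), pv.bytes};
}
```

Because the store never inspects the anchor's concrete type, one decompressed chunk can back any number of message Spans with a single allocation.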
Variant + named aliases:
ObjectEntry::payload returns to std::variant; the prior std::any traded
compile-time exhaustiveness for nothing (still two alternatives, still
typed access via the cast). Two named aliases capture the two payload
shapes at the type level:
using SharedBuffer = std::shared_ptr<const std::vector<uint8_t>>;
using LazyCallback = std::function<sdk::PayloadView()>;
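A sketch of how dispatch over the two aliases might look, under the assumption that the payload is a plain std::variant<SharedBuffer, LazyCallback>. The Payload alias, the resolvePayload function and the simplified PayloadView are hypothetical and only illustrate typed access via std::get_if.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the SDK types.
using BufferAnchor = std::shared_ptr<const void>;

struct Span {
  const std::uint8_t* ptr = nullptr;
  std::size_t len = 0;
  const std::uint8_t* data() const { return ptr; }
  std::size_t size() const { return len; }
};

struct PayloadView {
  Span bytes;
  BufferAnchor anchor;
};

// The two named payload shapes from the commit message.
using SharedBuffer = std::shared_ptr<const std::vector<std::uint8_t>>;
using LazyCallback = std::function<PayloadView()>;
using Payload = std::variant<SharedBuffer, LazyCallback>;

// Hypothetical resolver: each branch yields a view without copying bytes.
PayloadView resolvePayload(const Payload& payload) {
  if (const auto* owned = std::get_if<SharedBuffer>(&payload)) {
    if (!*owned) return {};  // empty anchor means "no bytes"
    return PayloadView{Span{(*owned)->data(), (*owned)->size()}, *owned};
  }
  if (const auto* lazy = std::get_if<LazyCallback>(&payload)) {
    return *lazy ? (*lazy)() : PayloadView{};
  }
  return {};
}
```

std::visit with an overload set would be an alternative that fails to compile when a new alternative is added without a handler; the get_if form is shown here because it is the one the PR names.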
Trampolines (plugin_data_host.cpp):
ObjectBytesBox now holds {BufferAnchor, Span}; toolboxObjectGetBytes
returns view.data()/view.size() — no shared_ptr<vector> in the read path.
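A rough sketch of the trampoline shape described above. The function names boxAcquire, boxGetBytes and boxRelease and their signatures are hypothetical (the real toolboxObjectGetBytes signature is not shown in this PR text); the point is only that the handle owns a PayloadView and the read path reports the Span, never a vector.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical stand-ins for the SDK types.
using BufferAnchor = std::shared_ptr<const void>;

struct Span {
  const std::uint8_t* ptr = nullptr;
  std::size_t len = 0;
  const std::uint8_t* data() const { return ptr; }
  std::size_t size() const { return len; }
};

struct PayloadView {
  Span bytes;
  BufferAnchor anchor;
};

// Heap-allocated handle given to the plugin; holding it keeps the anchor,
// and therefore the bytes, alive.
struct ObjectBytesBox {
  PayloadView payload;
};

// Host side (hypothetical helper): wrap a resolved payload into a handle.
ObjectBytesBox* boxAcquire(PayloadView payload) {
  return new ObjectBytesBox{std::move(payload)};
}

extern "C" {
// Reports the published span only; returns false when there are no bytes.
bool boxGetBytes(const ObjectBytesBox* box, const std::uint8_t** out_data,
                 std::size_t* out_size) {
  if (box == nullptr || out_data == nullptr || out_size == nullptr ||
      !box->payload.anchor) {
    return false;
  }
  *out_data = box->payload.bytes.data();
  *out_size = box->payload.bytes.size();
  return true;
}

// Releasing the handle drops the anchor reference held by the box.
void boxRelease(ObjectBytesBox* box) { delete box; }
}
```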
Misc consistency:
- pushOwned/pushLazy now return Status (alias for Expected<void, string>);
the one other Expected<void, string> use at service_registry_builder.hpp
also switched.
- PayloadView(shared_ptr<vector>) constructor takes shared_ptr<const vector>
to match SharedBuffer and makePayloadView.
Tests:
- All resolved->data accessors migrated to resolved->view / resolved->anchor
across object_store_test, plugin_data_host_object_test,
plugin_parser_object_write_test.
- Two regression tests added:
PushLazyPreservesAnchorType — anchor is shared_ptr<TestBuffer> (not
vector). Bug-1 regression: ASAN would catch the prior static cast.
PushLazyHonorsSpanSubview — anchor is a 100-byte vector with Span
[20,30). Bug-2 regression: verifies view.data()==chunk+20.
./build.sh --debug && ./test.sh → 62/62 passing under ASAN.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…x on PayloadView
ResolvedObjectEntry and ObjectBytesBox both held an inline {anchor, view}
pair — the same shape as sdk::PayloadView. Collapse both onto a single
PayloadView field; one named point of truth for "bytes + their lifetime
anchor" across producer, store, and consumer-handle layers.
Also tightens the doc comments touched in the prior commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Source-incompatible changes in pj_datastore/object_store.hpp:
- ObjectEntry::payload variant tag types
- pushLazy fetch signature (returns PayloadView)
- ResolvedObjectEntry field layout (carries a PayloadView)
Per the pre-1.0 convention, this warrants a MINOR bump.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
494dac7 to
24ecc1e
Compare
OBJECT_STORE_DESIGN.md still described the pre-refactor API: a vector-returning lazy callback and a ResolvedObjectEntry carrying a shared_ptr<const vector<uint8_t>> data field. Bring it in line with the code already on this branch:
- ObjectEntry::payload is std::variant<SharedBuffer, LazyCallback>; document the two named aliases.
- pushLazy's callable returns sdk::PayloadView (Span + type-erased BufferAnchor), enabling zero-copy sub-range views; the store never copies on resolve.
- ResolvedObjectEntry carries an sdk::PayloadView; bytes/anchor replace the old data field, and an empty anchor means "no bytes".
Docs-only; no behavior change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Change the lazy ObjectStore payload contract so producers publish bytes through an sdk::PayloadView (Span<const uint8_t> + BufferAnchor), and the store preserves the anchor end-to-end. No bytes copied across the lazy path; no assumption on the anchor's concrete type.
pushLazy signature
ObjectStore::pushLazy now takes std::function<sdk::PayloadView()> (alias LazyCallback) instead of std::function<std::vector<uint8_t>()>. Producers that already hold bytes behind a shared_ptr (a chunk cache, an mmap region, a parquet column slice, any shared_ptr<T>) capture it in the closure and return a view backed by it. Zero copy.
Type-erased anchor preserved
ResolvedObjectEntry, ObjectBytesBox, and the lazy variant alternative all carry an sdk::PayloadView. resolveEntry propagates the anchor verbatim: no static_pointer_cast, no anchor-type assumption, no Span-vs-anchor mismatch. Producers anchor on any shared_ptr<T>; consumers read through the Span unchanged.
ObjectEntry::payload typed variant
Two named aliases for the two payload shapes. Compile-time exhaustive dispatch via std::get_if.
Convenience helper
sdk::makePayloadView(std::vector<uint8_t>) in pj_base/buffer_anchor.hpp: wraps a fresh vector as both anchor and Span. For producers with no upstream anchor (e.g. raw bytes from a C-ABI fetch); when an upstream allocation already exists, construct PayloadView directly to skip the helper's copy.
Minor consistency cleanups
- pushOwned/pushLazy return Status (alias for Expected<void, std::string>); the one other site in service_registry_builder.hpp was switched too.
- PayloadView(std::shared_ptr<std::vector<uint8_t>>) constructor takes shared_ptr<const vector> to match SharedBuffer and makePayloadView. Implicit shared_ptr<T> → shared_ptr<const T> conversion makes this non-breaking.
API impact
Source-incompatible for direct consumers of:
- ObjectEntry::payload: switch from std::get_if<shared_ptr<...>> / std::get_if<function<vector<...>()>> to std::get_if<SharedBuffer> / std::get_if<LazyCallback>.
- pushLazy's fetch signature: closures now return sdk::PayloadView. Use sdk::makePayloadView(vec) for the trivial vector case.
- ResolvedObjectEntry: was {Timestamp, shared_ptr<const vector<uint8_t>> data}, now {Timestamp, sdk::PayloadView payload}. Bytes via entry->payload.bytes.data() / .size(); anchor via entry->payload.anchor.
pushOwned is unchanged.
Tests
./build.sh --debug && ./test.sh → 62/62 passing under ASAN.
Two regression tests added in pj_datastore/tests/object_store_test.cpp:
- PushLazyPreservesAnchorType: pushes a shared_ptr<TestBuffer> anchor (not a vector). Resolution must preserve type erasure; a static_pointer_cast<const vector> would be UB here.
- PushLazyHonorsSpanSubview: anchor holds 100 bytes; Span covers [20, 30). Resolution must propagate the Span verbatim, not the anchor's full extent.
Version bump
Source-incompatible SDK change in a public header (pj_datastore/object_store.hpp), so per the pre-1.0 versioning convention this warrants a MINOR bump 0.4.2 → 0.5.0, bumped in this PR.
Downstream
ResolvedObjectEntry shape and the LazyCallback-returning pushLazy.