refactor(object_store): lazy fetch returns PayloadView #103

Merged
facontidavide merged 5 commits into main from refactor/object-store-lazy-payload-view on May 29, 2026
@pabloinigoblasco (Collaborator) commented May 28, 2026

Summary

Change the lazy ObjectStore payload contract so producers publish bytes through a sdk::PayloadView (Span<const uint8_t> + BufferAnchor), and the store preserves the anchor end-to-end. No bytes copied across the lazy path; no assumption on the anchor's concrete type.

pushLazy signature

ObjectStore::pushLazy now takes std::function<sdk::PayloadView()> (alias LazyCallback) instead of std::function<std::vector<uint8_t>()>. Producers that already hold bytes behind a shared_ptr — a chunk cache, an mmap region, a parquet column slice, any shared_ptr<T> — capture it in the closure and return a view backed by it. Zero copy.
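
A minimal sketch of what such a producer closure could look like. The types in namespace `sketch` (`ByteSpan`, `BufferAnchor`, `PayloadView`, `Chunk`) and the helper `makeLazyFetch` are stand-ins written for this example; the real definitions live in the SDK headers and may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

namespace sketch {

// Stand-in for Span<const uint8_t>: a pointer plus a length, no ownership.
struct ByteSpan {
  const std::uint8_t* ptr = nullptr;
  std::size_t len = 0;
  const std::uint8_t* data() const { return ptr; }
  std::size_t size() const { return len; }
};

// Type-erased lifetime anchor, as described in the PR.
using BufferAnchor = std::shared_ptr<const void>;

// Stand-in for sdk::PayloadView: the bytes plus whatever keeps them alive.
struct PayloadView {
  ByteSpan bytes;
  BufferAnchor anchor;
};

using LazyCallback = std::function<PayloadView()>;

// Hypothetical upstream allocation, e.g. one entry of a chunk cache.
struct Chunk {
  std::vector<std::uint8_t> storage;
};

// Builds a lazy fetch that publishes [offset, offset + length) of the chunk.
// The closure captures the shared_ptr, so the chunk outlives the caller's copy.
inline LazyCallback makeLazyFetch(std::shared_ptr<const Chunk> chunk,
                                  std::size_t offset, std::size_t length) {
  return [chunk = std::move(chunk), offset, length]() -> PayloadView {
    // shared_ptr<const Chunk> converts implicitly to shared_ptr<const void>.
    return PayloadView{ByteSpan{chunk->storage.data() + offset, length}, chunk};
  };
}

}  // namespace sketch
```

The closure performs no bounds checking; a real producer would validate `offset` and `length` against the chunk before publishing.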

Type-erased anchor preserved

ResolvedObjectEntry, ObjectBytesBox, and the lazy variant alternative all carry a sdk::PayloadView. resolveEntry propagates the anchor verbatim — no static_pointer_cast, no anchor-type assumption, no Span-vs-anchor mismatch. Producers anchor on any shared_ptr<T>; consumers read through the Span unchanged.

ObjectEntry::payload typed variant

using SharedBuffer = std::shared_ptr<const std::vector<uint8_t>>;
using LazyCallback = std::function<sdk::PayloadView()>;
std::variant<SharedBuffer, LazyCallback> payload;

Two named aliases for the two payload shapes. Compile-time exhaustive dispatch via std::get_if.
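
The dispatch could be sketched as follows. The `sketch` types are stand-ins for the SDK ones, and `resolve` is a hypothetical function written for this example, not the store's actual `resolveEntry`.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <variant>
#include <vector>

namespace sketch {

struct ByteSpan {
  const std::uint8_t* ptr = nullptr;
  std::size_t len = 0;
  const std::uint8_t* data() const { return ptr; }
  std::size_t size() const { return len; }
};
using BufferAnchor = std::shared_ptr<const void>;
struct PayloadView {
  ByteSpan bytes;
  BufferAnchor anchor;
};

using SharedBuffer = std::shared_ptr<const std::vector<std::uint8_t>>;
using LazyCallback = std::function<PayloadView()>;
using Payload = std::variant<SharedBuffer, LazyCallback>;

// Hypothetical resolver: turns either payload shape into a PayloadView.
inline PayloadView resolve(const Payload& payload) {
  if (const auto* owned = std::get_if<SharedBuffer>(&payload)) {
    if (!*owned) return {};
    // Owned bytes: the vector is both the storage and the anchor.
    return PayloadView{ByteSpan{(*owned)->data(), (*owned)->size()}, *owned};
  }
  if (const auto* lazy = std::get_if<LazyCallback>(&payload)) {
    // Lazy bytes: pass the producer's view through unchanged.
    return *lazy ? (*lazy)() : PayloadView{};
  }
  return {};  // only reachable if the variant is valueless
}

}  // namespace sketch
```

`std::get_if` compiles only for types that are alternatives of the variant, so a stale alternative name fails at build time. `std::visit` would additionally force every alternative to be handled.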

Convenience helper

sdk::makePayloadView(std::vector<uint8_t>) in pj_base/buffer_anchor.hpp: wraps a fresh vector as both anchor and Span. For producers with no upstream anchor (e.g. raw bytes from a C-ABI fetch); when an upstream allocation already exists, construct PayloadView directly to skip the helper's copy.
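
One plausible shape for such a helper, as a sketch only: the function name `makePayloadViewSketch` and the `sketch` types are assumptions for this example, and the real `sdk::makePayloadView` may be implemented differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

namespace sketch {

struct ByteSpan {
  const std::uint8_t* ptr = nullptr;
  std::size_t len = 0;
  const std::uint8_t* data() const { return ptr; }
  std::size_t size() const { return len; }
};
using BufferAnchor = std::shared_ptr<const void>;
struct PayloadView {
  ByteSpan bytes;
  BufferAnchor anchor;
};

// Moves the vector into a shared_ptr, then points the span at its contents.
// The same shared_ptr serves as owner and as type-erased anchor.
inline PayloadView makePayloadViewSketch(std::vector<std::uint8_t> bytes) {
  auto owned =
      std::make_shared<const std::vector<std::uint8_t>>(std::move(bytes));
  return PayloadView{ByteSpan{owned->data(), owned->size()}, owned};
}

}  // namespace sketch
```

Passing an lvalue copies the vector into the by-value parameter; passing with `std::move` transfers the storage instead.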

Minor consistency cleanups

  • pushOwned / pushLazy return Status (alias for Expected<void, std::string>); the one other site in service_registry_builder.hpp was switched too.
  • The PayloadView(std::shared_ptr<std::vector<uint8_t>>) constructor now takes shared_ptr<const vector> to match SharedBuffer and makePayloadView. The implicit shared_ptr<T> → shared_ptr<const T> conversion makes this non-breaking.
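
The non-breaking claim for the constructor can be illustrated with a stand-in type. The `sketch::PayloadView` below is written for this example and only mirrors the constructor shape described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

namespace sketch {

struct ByteSpan {
  const std::uint8_t* ptr = nullptr;
  std::size_t len = 0;
  const std::uint8_t* data() const { return ptr; }
  std::size_t size() const { return len; }
};
using BufferAnchor = std::shared_ptr<const void>;
using SharedBuffer = std::shared_ptr<const std::vector<std::uint8_t>>;

struct PayloadView {
  ByteSpan bytes;
  BufferAnchor anchor;

  // Takes the const-qualified form. Members initialize in declaration order,
  // so `bytes` reads from `owner` before `anchor` takes ownership of it.
  explicit PayloadView(SharedBuffer owner)
      : bytes{owner ? owner->data() : nullptr, owner ? owner->size() : 0},
        anchor(std::move(owner)) {}
};

}  // namespace sketch
```

A caller that still holds a `std::shared_ptr<std::vector<uint8_t>>` can pass it as before; the conversion to the const form happens at the call site.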

API impact

Source-incompatible for direct consumers of:

  • ObjectEntry::payload — switch from std::get_if<shared_ptr<...>> / std::get_if<function<vector<...>()>> to std::get_if<SharedBuffer> / std::get_if<LazyCallback>.
  • pushLazy's fetch signature — closures now return sdk::PayloadView. Use sdk::makePayloadView(vec) for the trivial vector case.
  • ResolvedObjectEntry — was {Timestamp, shared_ptr<const vector<uint8_t>> data}, now {Timestamp, sdk::PayloadView payload}. Bytes via entry->payload.bytes.data() / .size(); anchor via entry->payload.anchor.

pushOwned is unchanged.
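
On the consumer side, the migration could look like the sketch below. The field name `timestamp`, the `Timestamp` alias, and the function `sumBytes` are assumptions for this example; only `payload.bytes` and `payload.anchor` come from the description above.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

namespace sketch {

struct ByteSpan {
  const std::uint8_t* ptr = nullptr;
  std::size_t len = 0;
  const std::uint8_t* data() const { return ptr; }
  std::size_t size() const { return len; }
};
using BufferAnchor = std::shared_ptr<const void>;
struct PayloadView {
  ByteSpan bytes;
  BufferAnchor anchor;
};

using Timestamp = std::int64_t;  // assumed representation

struct ResolvedObjectEntry {
  Timestamp timestamp = 0;  // assumed field name
  PayloadView payload;
};

// Before: entry.data->data() / entry.data->size().
// After: read through the span; the anchor only controls lifetime.
inline std::uint32_t sumBytes(const ResolvedObjectEntry& entry) {
  if (!entry.payload.anchor) return 0;  // empty anchor treated as "no bytes"
  std::uint32_t sum = 0;
  const std::uint8_t* p = entry.payload.bytes.data();
  for (std::size_t i = 0; i < entry.payload.bytes.size(); ++i) sum += p[i];
  return sum;
}

}  // namespace sketch
```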

Tests

./build.sh --debug && ./test.sh → 62/62 passing under ASAN.

Two regression tests added in pj_datastore/tests/object_store_test.cpp:

  • PushLazyPreservesAnchorType — pushes a shared_ptr<TestBuffer> anchor (not a vector). Resolution must preserve type erasure; a static_pointer_cast<const vector> would be UB here.
  • PushLazyHonorsSpanSubview — anchor holds 100 bytes; Span covers [20, 30). Resolution must propagate the Span verbatim, not the anchor's full extent.
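
The two properties under test can be shown together in one small sketch. This is not the code from object_store_test.cpp: the `TestBuffer` layout and `publishSubview` are made up for this example, and the sketch combines a non-vector anchor with a [20, 30) sub-range in one buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

namespace sketch {

struct ByteSpan {
  const std::uint8_t* ptr = nullptr;
  std::size_t len = 0;
  const std::uint8_t* data() const { return ptr; }
  std::size_t size() const { return len; }
};
using BufferAnchor = std::shared_ptr<const void>;
struct PayloadView {
  ByteSpan bytes;
  BufferAnchor anchor;
};

// An anchor type that is deliberately not a std::vector. The destructor
// sets a flag so a test can observe when the buffer is released.
struct TestBuffer {
  explicit TestBuffer(bool* destroyed) : destroyed_flag(destroyed) {
    for (std::size_t i = 0; i < sizeof(bytes); ++i) {
      bytes[i] = static_cast<std::uint8_t>(i);
    }
  }
  ~TestBuffer() { *destroyed_flag = true; }
  std::uint8_t bytes[100];
  bool* destroyed_flag;
};

// Publishes bytes [20, 30) of the buffer, anchored on the whole buffer.
inline PayloadView publishSubview(std::shared_ptr<const TestBuffer> buffer) {
  return PayloadView{ByteSpan{buffer->bytes + 20, 10}, buffer};
}

}  // namespace sketch
```

Because the control block created by `std::make_shared` remembers the concrete type, releasing the `shared_ptr<const void>` still runs `~TestBuffer`.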

Version bump

Source-incompatible SDK change in a public header (pj_datastore/object_store.hpp), so per the pre-1.0 versioning convention this warrants a MINOR bump (0.4.2 → 0.5.0), applied in this PR.

Downstream

  • pj4 consumers updated in parallel for the new ResolvedObjectEntry shape and the LazyCallback-returning pushLazy.

@facontidavide changed the title from "refactor(object_store): lazy fetch returns PayloadView; payload slot uses std::any" to "refactor(object_store): lazy fetch returns PayloadView" on May 29, 2026
pabloinigoblasco and others added 4 commits May 29, 2026 09:27
…uses std::any

The lazy resolver registered with ObjectStore::pushLazy now returns a
PayloadView (Span + BufferAnchor) instead of a std::vector<uint8_t>. This
lets producers hand off bytes they already hold behind a shared_ptr (e.g.
a streaming buffer being reused across stores) without copying — the
returned anchor extends the buffer's lifetime through the read.

For producers whose only payload is a plain std::vector<uint8_t>, the new
sdk::makePayloadView() helper in pj_base wraps the vector into a
shared_ptr (which becomes both owner and type-erased anchor) and returns
a PayloadView pointing at its contents. This is also the contract
resolveEntry() relies on: it recovers the shared_ptr<const vector> from
the anchor via static_pointer_cast.

ObjectEntry::payload now stores a std::any instead of a
std::variant<shared_ptr, std::function>. The dispatch in
ObjectStore::resolveEntry uses std::any_cast for each branch. Rationale:
the variant alternatives are not part of any public discriminator
(callers go through pushOwned/pushLazy), so the variant tag was carrying
no semantic value — std::any keeps the storage slot flexible without
constraining future producer shapes.
…ed variant

Builds on the prior commit's PayloadView-returning pushLazy signature, but
fixes two anchor-discarding bugs and restores a typed payload variant.

Bugs in the prior implementation:

1. resolveEntry did static_pointer_cast<const vector<uint8_t>>(pv.anchor),
   forcing every producer's anchor to be exactly a shared_ptr<vector>. Any
   other anchor type (chunk cache, mmap, parquet column slice) was UB. This
   negated the entire point of BufferAnchor = shared_ptr<const void>.

2. resolveEntry discarded PayloadView::bytes (the Span) and returned the
   whole anchor's vector. Producers could not publish a sub-range of
   a larger backing buffer, which is the canonical zero-copy use case
   (one decompressed MCAP chunk anchored once, many message Spans into it).

Both bugs traced to a single root cause: ResolvedObjectEntry::data was still
shared_ptr<const vector<uint8_t>>, so resolution had to materialize a vector
somehow. Fixed by making ResolvedObjectEntry carry {BufferAnchor anchor,
Span<const uint8_t> view} — type-erased anchor + producer-published span.
No static_pointer_cast anywhere.
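
The "one chunk anchored once, many message Spans into it" case from bug 2 can be sketched as below. The `sketch` types and `sliceChunk` are stand-ins written for this example, and the (offset, length) pairs are a made-up way to describe message boundaries.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

namespace sketch {

struct ByteSpan {
  const std::uint8_t* ptr = nullptr;
  std::size_t len = 0;
  const std::uint8_t* data() const { return ptr; }
  std::size_t size() const { return len; }
};
using BufferAnchor = std::shared_ptr<const void>;
struct PayloadView {
  ByteSpan bytes;
  BufferAnchor anchor;
};

using SharedChunk = std::shared_ptr<const std::vector<std::uint8_t>>;

// Produces one view per (offset, length) range. Every view shares the same
// anchor, so the chunk is freed once the last view is gone. No bytes move.
inline std::vector<PayloadView> sliceChunk(
    const SharedChunk& chunk,
    const std::vector<std::pair<std::size_t, std::size_t>>& ranges) {
  std::vector<PayloadView> views;
  views.reserve(ranges.size());
  for (const auto& [offset, length] : ranges) {
    // Skip ranges that would run past the end of the chunk.
    if (offset > chunk->size() || length > chunk->size() - offset) continue;
    views.push_back(PayloadView{ByteSpan{chunk->data() + offset, length}, chunk});
  }
  return views;
}

}  // namespace sketch
```

A resolver that rebuilt its result from the anchor alone would hand every consumer the full chunk, which is the behavior the second bug describes.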

Variant + named aliases:

ObjectEntry::payload returns to std::variant; the prior std::any traded
compile-time exhaustiveness for nothing (still two alternatives, still
typed access via the cast). Two named aliases capture the two payload
shapes at the type level:

  using SharedBuffer = std::shared_ptr<const std::vector<uint8_t>>;
  using LazyCallback = std::function<sdk::PayloadView()>;

Trampolines (plugin_data_host.cpp):

ObjectBytesBox now holds {BufferAnchor, Span}; toolboxObjectGetBytes
returns view.data()/view.size() — no shared_ptr<vector> in the read path.
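
A rough sketch of the read path through such a box. The name `sketch_object_get_bytes` and its signature are invented for this example, and `ObjectBytesBoxSketch` is a stand-in; the real toolboxObjectGetBytes in plugin_data_host.cpp may look different.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

namespace sketch {

struct ByteSpan {
  const std::uint8_t* ptr = nullptr;
  std::size_t len = 0;
  const std::uint8_t* data() const { return ptr; }
  std::size_t size() const { return len; }
};
using BufferAnchor = std::shared_ptr<const void>;
struct PayloadView {
  ByteSpan bytes;
  BufferAnchor anchor;
};

}  // namespace sketch

// Stand-in for the handle a plugin holds: the view plus its lifetime anchor.
struct ObjectBytesBoxSketch {
  sketch::PayloadView payload;
};

// Hypothetical C-ABI getter: hands out pointer and size straight from the
// span. The bytes stay valid for as long as the box (and its anchor) lives.
extern "C" bool sketch_object_get_bytes(const ObjectBytesBoxSketch* box,
                                        const std::uint8_t** out_data,
                                        std::size_t* out_size) {
  if (box == nullptr || out_data == nullptr || out_size == nullptr) return false;
  *out_data = box->payload.bytes.data();
  *out_size = box->payload.bytes.size();
  return true;
}
```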

Misc consistency:

- pushOwned/pushLazy now return Status (alias for Expected<void, string>);
  the one other Expected<void, string> use at service_registry_builder.hpp
  also switched.
- PayloadView(shared_ptr<vector>) constructor takes shared_ptr<const vector>
  to match SharedBuffer and makePayloadView.

Tests:

- All resolved->data accessors migrated to resolved->view / resolved->anchor
  across object_store_test, plugin_data_host_object_test,
  plugin_parser_object_write_test.
- Two regression tests added:
    PushLazyPreservesAnchorType — anchor is shared_ptr<TestBuffer> (not
      vector). Bug-1 regression: ASAN would catch the prior static cast.
    PushLazyHonorsSpanSubview — anchor is a 100-byte vector with Span
      [20,30). Bug-2 regression: verifies view.data()==chunk+20.

./build.sh --debug && ./test.sh → 62/62 passing under ASAN.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…x on PayloadView

ResolvedObjectEntry and ObjectBytesBox both held an inline {anchor, view}
pair — the same shape as sdk::PayloadView. Collapse both onto a single
PayloadView field; one named point of truth for "bytes + their lifetime
anchor" across producer, store, and consumer-handle layers.

Also tightens the doc comments touched in the prior commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Source-incompatible changes in pj_datastore/object_store.hpp:
- ObjectEntry::payload variant tag types
- pushLazy fetch signature (returns PayloadView)
- ResolvedObjectEntry field layout (carries a PayloadView)

Per the pre-1.0 convention, this warrants a MINOR bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@facontidavide force-pushed the refactor/object-store-lazy-payload-view branch from 494dac7 to 24ecc1e on May 29, 2026 07:28
OBJECT_STORE_DESIGN.md still described the pre-refactor API: a
vector-returning lazy callback and a ResolvedObjectEntry carrying a
shared_ptr<const vector<uint8_t>> data field. Bring it in line with the
code already on this branch:

- ObjectEntry::payload is std::variant<SharedBuffer, LazyCallback>;
  document the two named aliases.
- pushLazy's callable returns sdk::PayloadView (Span + type-erased
  BufferAnchor), enabling zero-copy sub-range views; the store never
  copies on resolve.
- ResolvedObjectEntry carries an sdk::PayloadView; bytes/anchor replace
  the old data field, and an empty anchor means "no bytes".

Docs-only; no behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@facontidavide merged commit 2768118 into main on May 29, 2026
4 checks passed
@facontidavide deleted the refactor/object-store-lazy-payload-view branch on May 29, 2026 08:25