feat: blob v2 write support by geruh · Pull Request #560 · lance-format/lance-spark

geruh · 2026-05-26T04:52:40Z

Stacked PR that depends on #548, which adds Blob v2 descriptor reads.

This PR allows users to create a table with file_format_version >= 2.2 and <column>.lance.encoding = blob, write BINARY values through SQL or DataFrames, and have Lance store them using Blob v2. Reads still expose the descriptor struct from #548.

The hard part here is that the read schema and the write schema are logically different, and Spark doesn't like this (as far as I can tell). Reads expose Blob v2 as a descriptor struct, but writes still need to accept BINARY.

I tried an analyzer rule first, but it runs after Spark resolves the table schema, so the write has already failed by the time the rule gets a chance to do anything.

So this uses ACCEPT_ANY_SCHEMA, but only for tables with Blob v2 columns. It is basically a scoped bypass for this one schema mismatch, not a free pass for all writes. Lance still validates the incoming schema in newWriteBuilder: Blob v2 columns must be binary, column order/names must match, Spark SQL VALUES columns are handled, and nested structs still follow the normal checks.
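To make the scoped bypass concrete, here is a rough sketch of the two pieces described above: advertising ACCEPT_ANY_SCHEMA only when the table has a Blob v2 column, and then checking the incoming schema by hand. This is not the connector's code. `BlobV2WriteCheckSketch`, `Column`, and the local `Capability` enum (a stand-in for Spark's `TableCapability`) are hypothetical names, types are compared as plain strings, and the Spark SQL VALUES handling and nested struct checks are left out.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; not the connector's actual validator. */
final class BlobV2WriteCheckSketch {

  /** Local stand-in for Spark's TableCapability so the sketch compiles without Spark. */
  enum Capability { BATCH_READ, BATCH_WRITE, ACCEPT_ANY_SCHEMA }

  /** Simplified column: name, type name, and whether the table marks it as Blob v2. */
  static final class Column {
    final String name;
    final String type;
    final boolean blobV2;

    Column(String name, String type, boolean blobV2) {
      this.name = name;
      this.type = type;
      this.blobV2 = blobV2;
    }
  }

  /** Advertises ACCEPT_ANY_SCHEMA only when at least one column is Blob v2. */
  static Set<Capability> capabilities(List<Column> table) {
    Set<Capability> caps = EnumSet.of(Capability.BATCH_READ, Capability.BATCH_WRITE);
    for (Column c : table) {
      if (c.blobV2) {
        caps.add(Capability.ACCEPT_ANY_SCHEMA);
        break;
      }
    }
    return caps;
  }

  /** Returns the problems found; an empty list means the write schema is accepted. */
  static List<String> validate(List<Column> table, List<Column> incoming) {
    List<String> errors = new ArrayList<>();
    if (table.size() != incoming.size()) {
      errors.add("expected " + table.size() + " columns, got " + incoming.size());
      return errors;
    }
    for (int i = 0; i < table.size(); i++) {
      Column expected = table.get(i);
      Column actual = incoming.get(i);
      if (!expected.name.equals(actual.name)) {
        errors.add("column " + i + ": expected '" + expected.name + "', got '" + actual.name + "'");
        continue;
      }
      // A Blob v2 column reads back as a descriptor struct, but a write must supply BINARY.
      String required = expected.blobV2 ? "binary" : expected.type;
      if (!required.equals(actual.type)) {
        errors.add("column '" + expected.name + "': expected " + required + ", got " + actual.type);
      }
    }
    return errors;
  }
}
```

Because Spark skips its own schema check once ACCEPT_ANY_SCHEMA is set, everything it would normally reject has to be caught by a check like `validate`, which is why the capability is limited to tables that actually need it.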

At encode time, Spark still gives us BINARY, and the connector maps that into the Lance Blob v2 write struct. This PR only supports inline binary writes. URI-based blob writes are not included here and can be added separately.
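As a rough illustration of the encode-time mapping, the sketch below wraps a Spark BINARY value in a two-field write value. The `data` and `uri` field names and the `InlineBlobEncoderSketch` class are assumptions made for this example, not the connector's actual types. Only the inline path is filled in, which matches the scope of this PR.

```java
/** Illustrative sketch only; class and field names are assumptions, not connector API. */
final class InlineBlobEncoderSketch {

  /** Hypothetical write-side value: inline bytes, or a URI for out-of-line blobs. */
  static final class BlobWriteValue {
    final byte[] data; // set for inline writes
    final String uri;  // left null here; URI-based writes are out of scope

    BlobWriteValue(byte[] data, String uri) {
      this.data = data;
      this.uri = uri;
    }
  }

  /** Maps a BINARY cell to an inline write value; a null cell stays null. */
  static BlobWriteValue fromBinary(byte[] value) {
    if (value == null) {
      return null;
    }
    // Copy defensively so later mutation of the caller's buffer cannot change the value.
    return new BlobWriteValue(value.clone(), null);
  }
}
```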

Testing

./mvnw test -pl lance-spark-base_2.12 -Dtest=BlobV2StructWriterTest,LanceWriteSchemaValidatorTest,SchemaConverterTest
Java blob v2 create/insert tests in BaseBlobCreateTableTest
make docker-build-test-base && make docker-build-test && make docker-test TEST_BACKENDS=local

github-actions · 2026-05-26T04:52:57Z

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

geruh added 2 commits May 25, 2026 18:37

feat: blob v2 descriptor read support

1f819d6

feat: blob v2 write support

7144fe6

geruh wants to merge 2 commits into lance-format:main from geruh:v2write-descriptor
