feat: blob v2 write support #560

Open
geruh wants to merge 2 commits into
lance-format:mainfrom
geruh:v2write-descriptor

@geruh
Collaborator

@geruh geruh commented May 26, 2026

Stacked PR that depends on #548, which adds Blob v2 descriptor reads.

This PR allows users to create a table with file_format_version >= 2.2 and <column>.lance.encoding = blob, write BINARY values through SQL or DataFrames, and have Lance store them using Blob v2. Reads still expose the descriptor struct from #548.

The hard part here is that the read schema and write schema are logically different, and Spark doesn't like this (as far as I can tell). Reads expose Blob v2 as a descriptor struct, but writes still need to accept BINARY.

I tried an analyzer rule first, but it runs after Spark resolves the table schema, so the write has already failed by the time the rule gets a chance to do anything.

So this uses ACCEPT_ANY_SCHEMA, but only for tables with Blob v2 columns. It is basically a scoped bypass for this one schema mismatch, not a free pass for all writes. Lance still validates the incoming schema in newWriteBuilder: Blob v2 columns must be binary, column order/names must match, Spark SQL VALUES columns are handled, and nested structs still follow the normal checks.
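To make the scoping concrete, here is a rough, dependency-free sketch of the idea, not the code in this PR. `BlobV2WriteSketch`, `Column`, `Capability`, `capabilities`, and `validate` are hypothetical names made up for illustration. `Capability` is a local stand-in for Spark's `TableCapability`, and types are plain strings rather than Spark `DataType`s. The Spark SQL VALUES handling and the nested struct checks are left out.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only: none of these names come from the connector.
class BlobV2WriteSketch {

  // Local stand-in for Spark's TableCapability, limited to what the sketch needs.
  public enum Capability { BATCH_READ, BATCH_WRITE, ACCEPT_ANY_SCHEMA }

  // Simplified column model: a name, a type name, and whether the table
  // stores this column as Blob v2.
  public static final class Column {
    final String name;
    final String type;
    final boolean blobV2;

    public Column(String name, String type, boolean blobV2) {
      this.name = name;
      this.type = type;
      this.blobV2 = blobV2;
    }
  }

  // ACCEPT_ANY_SCHEMA is advertised only when the table has a Blob v2 column,
  // so every other table keeps Spark's normal write schema checks.
  public static Set<Capability> capabilities(List<Column> tableSchema) {
    Set<Capability> caps = EnumSet.of(Capability.BATCH_READ, Capability.BATCH_WRITE);
    for (Column c : tableSchema) {
      if (c.blobV2) {
        caps.add(Capability.ACCEPT_ANY_SCHEMA);
        break;
      }
    }
    return caps;
  }

  // Connector-side check that replaces the validation Spark skipped.
  // Returns a list of problems; an empty list means the write is accepted.
  public static List<String> validate(List<Column> table, List<Column> incoming) {
    List<String> errors = new ArrayList<>();
    if (table.size() != incoming.size()) {
      errors.add("expected " + table.size() + " columns, got " + incoming.size());
      return errors;
    }
    for (int i = 0; i < table.size(); i++) {
      Column expected = table.get(i);
      Column actual = incoming.get(i);
      if (!expected.name.equals(actual.name)) {
        // Order and names must line up position by position.
        errors.add("column " + i + ": expected '" + expected.name
            + "', got '" + actual.name + "'");
      } else if (expected.blobV2) {
        // The read side is a descriptor struct, but writes must supply binary.
        if (!"binary".equals(actual.type)) {
          errors.add("blob v2 column '" + expected.name
              + "' must be binary, got " + actual.type);
        }
      } else if (!expected.type.equals(actual.type)) {
        errors.add("column '" + expected.name + "': expected "
            + expected.type + ", got " + actual.type);
      }
    }
    return errors;
  }
}
```

In this sketch a table with columns `(id int, data <blob v2>)` accepts an incoming `(id int, data binary)` and rejects `(id int, data string)`, while a table with no Blob v2 column never advertises the bypass at all.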

At encode time, Spark still gives us BINARY, and the connector maps that into the Lance Blob v2 write struct. This PR only supports inline binary writes. URI-based blob writes are not included here and can be added separately.

Testing

  • ./mvnw test -pl lance-spark-base_2.12 -Dtest=BlobV2StructWriterTest,LanceWriteSchemaValidatorTest,SchemaConverterTest
  • Java blob v2 create/insert tests in BaseBlobCreateTableTest
  • make docker-build-test-base && make docker-build-test && make docker-test TEST_BACKENDS=local

@github-actions
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.
