bugfix #4 - validate_records drops col if first row cell null with list #5

Open
sbatchelder wants to merge 1 commit into main from fix/null-first-row

@sbatchelder

This PR adds two regression tests and fixes #4.

Problem

When validate_records received list[dict] input, it built the Arrow table with pa.Table.from_pylist(processed), without passing the target schema. PyArrow then inferred the columns from the records themselves. If the first record omitted a nullable list column, the column was inferred as all-null and every later record's list value was silently dropped. The result was order-dependent: the same records in a different order produced different output.

Fix

Build the table against the declared schema instead of letting PyArrow infer it:

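# Collect every key that appears in any input record.
present_names = {key for record in records for key in record}
# Keep only the declared fields that are present, in schema order.
build_schema = pa.schema([f for f in schema if f.name in present_names])
# Build with explicit types instead of letting PyArrow infer them.
table = pa.Table.from_pylist(processed, schema=build_schema)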

Passing schema=build_schema is the key change: column types now come from the declared schema, not from whichever record happens to come first.

build_schema is restricted to the schema fields actually present across the input rather than the full schema. This preserves the existing downstream behavior: missing required columns still raise ValueError, and missing nullable columns are still null-filled.

Tests

Adds two regression tests covering both row orderings (first record missing the list column, and first record containing it) to guard against the order-dependent behavior.

@sbatchelder sbatchelder changed the title from "bugfix #4 - validte_records drops col if first row cell null with list" to "bugfix #4 - validate_records drops col if first row cell null with list" Jun 30, 2026
@sbatchelder sbatchelder requested a review from joefutrelle June 30, 2026 19:18
@sbatchelder sbatchelder self-assigned this Jun 30, 2026
@sbatchelder sbatchelder requested a review from johnwaalsh June 30, 2026 19:18

@johnwaalsh johnwaalsh left a comment

Member

Looks good to me!

Development

Successfully merging this pull request may close these issues.

validate_records drops list values when the first row omits a nullable list column

2 participants