Bad Vector Fill Behavior: Docs Say Only Bad Elements Are Replaced, But Code Replaces the Entire Vector #134

@oqoqo-bot

Documentation Gap

The documentation claims that [1.0, NaN, 3.0] becomes [1.0, 0.0, 3.0] (element-wise replacement), but the code replaces the entire vector with [0.0, 0.0, 0.0] (whole-vector replacement).

Description

The docs incorrectly describe on_bad_vectors='fill' as element-wise NaN replacement; the code actually replaces the entire vector.
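The difference between the two behaviors can be sketched in plain Python. This is an illustration only, not the code in table.py: the helper names `is_bad`, `fill_elementwise`, and `fill_whole_vector` are hypothetical, and the sketch assumes a vector counts as bad only when it contains a NaN.

```python
import math


def is_bad(vector):
    """Hypothetical per-vector flag: True if any element is NaN."""
    return any(math.isnan(x) for x in vector)


def fill_elementwise(vector, fill_value=0.0):
    """Behavior the docs describe: replace only the NaN elements."""
    return [fill_value if math.isnan(x) else x for x in vector]


def fill_whole_vector(vector, fill_value=0.0):
    """Behavior this issue reports: replace every element of a bad vector."""
    if is_bad(vector):
        return [fill_value] * len(vector)
    return list(vector)


vec = [1.0, float("nan"), 3.0]
documented = fill_elementwise(vec)
observed = fill_whole_vector(vec)

print(documented)  # [1.0, 0.0, 3.0]
print(observed)    # [0.0, 0.0, 0.0]
```

Under these assumptions, the two functions agree on clean vectors and differ only on partially bad ones, which is exactly the case the docs get wrong.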

  • The docs claim that [1.0, NaN, 3.0] becomes [1.0, 0.0, 3.0] with fill_value=0.0, but the code at table.py:3177-3181 replaces the entire vector with [0.0, 0.0, 0.0], because the is_bad flag is per-vector, not per-element
  • Users silently lose all valid elements of any partially bad vector
  • Zero-filled vectors cause downstream issues: undefined cosine similarity (division by zero) and L2 results clustering near the origin
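The downstream effect of an all-zero fill vector can be shown with a small, library-free sketch. The functions `cosine_similarity` and `l2_distance` below are hypothetical helpers written for this illustration, not LanceDB's distance implementation; returning `None` for the undefined case is a choice made here for clarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity, or None when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return None  # dot / (norm_a * norm_b) would divide by zero
    return dot / (norm_a * norm_b)


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 2.0, 3.0]
zero_fill = [0.0, 0.0, 0.0]  # what a partially bad vector is replaced with

cosine_result = cosine_similarity(query, zero_fill)
l2_result = l2_distance(query, zero_fill)
query_norm = math.sqrt(sum(x * x for x in query))
```

Here `cosine_result` is `None` because the zero vector has no direction, and `l2_result` equals `query_norm`: every zero-filled row sits at the origin, so all such rows tie at the same distance from any given query.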

How to Validate

Affected Files

  • python/python/lancedb/table.py
  • python/python/lancedb/db.py
  • docs/tables/consistency.mdx

Created by Oqoqo
