fix: decode binary fields as UTF-8 text when possible #16

Merged
gordonmurray merged 1 commit into lance-format:main from gordonmurray:fix/binary-utf8-decode
Feb 24, 2026
@gordonmurray (Collaborator)

Summary

  • Binary/large_binary PyArrow columns containing valid UTF-8 are now displayed as readable text instead of base64
  • Falls back to base64 encoding for actual binary data (non-UTF-8)
  • Handles newer PyArrow versions (21.x) where .as_py() returns str directly instead of bytes
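The behaviour described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the function name `render_binary` is hypothetical, and it assumes the input is whatever `.as_py()` returned for a binary or large_binary value (either `bytes` or, per the note above, already a `str`).

```python
import base64


def render_binary(value):
    """Return a display string for a binary cell value (hypothetical helper)."""
    if value is None:
        # Null cells are passed through unchanged.
        return None
    if isinstance(value, str):
        # Assumption: some PyArrow versions hand back str directly.
        return value
    raw = bytes(value)
    try:
        # Valid UTF-8 is shown as readable text.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Anything else is treated as real binary data and base64-encoded.
        return base64.b64encode(raw).decode("ascii")


print(render_binary(b"New task"))
print(render_binary(b"\xff\xfe"))
```

One consequence of this approach is that short binary payloads that happen to be valid UTF-8 are displayed as text, which seems to be the trade-off the PR accepts.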

Test plan

  • Verified string fields now show readable text (e.g. "New task") instead of base64
  • Verified UUIDs, timestamps, and booleans render correctly
  • Confirmed fallback to base64 is preserved for non-UTF-8 binary data

Binary and large_binary PyArrow columns that contain valid UTF-8 are
now displayed as readable text instead of base64. Falls back to base64
for actual binary data. Also handles newer PyArrow versions where
.as_py() returns str directly.
@gordonmurray merged commit 0546cf9 into lance-format:main Feb 24, 2026
10 checks passed
@gordonmurray deleted the fix/binary-utf8-decode branch February 24, 2026 21:13