# What changes are included in this PR?

- Introduced support for Avro custom logical types under the `avro_custom_types` feature. Added mappings for:
  - Int8, Int16, UInt8, UInt16, UInt32, UInt64.
  - Float16.
  - Interval (YearMonth, DayTime).
  - Custom logical types for Time32, Time64, Timestamps, and Date64.
- Updated schema handling to generate appropriate Avro JSON based on the feature flag.
- Added specialized encoders/decoders to handle custom types, ensuring compatibility with Avro's logical types.
- Adjusted the `Codec` enum and related encoding paths for precise storage (e.g., UInt64 stored as fixed(8), Float16 as fixed(2)).

# Are these changes tested?

Yes, new unit tests verify:

- Schema and type mappings.
- Avro serialization and deserialization for custom logical types.
- Default value handling and boundary cases for custom types.

# Are there any user-facing changes?

Yes:

- New feature flag (`avro_custom_types`) enabling advanced logical types.
- Enhanced custom type support for integration with extended Avro schemas.
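To illustrate why UInt64 is stored as fixed(8) rather than as an Avro `long`, here is a minimal sketch. The function names are hypothetical and the little-endian byte order is an assumption of this sketch, not something the description above confirms. The point it shows is that an 8-byte fixed value keeps the full unsigned range, whereas a signed 64-bit `long` cannot hold values above `i64::MAX`.

```rust
/// Encode a `u64` as the 8-byte body of an Avro `fixed(8)` value.
/// Assumption: little-endian byte order (chosen for this sketch only).
fn encode_u64_fixed8(v: u64) -> [u8; 8] {
    v.to_le_bytes()
}

/// Decode the 8-byte body back into a `u64` (same byte-order assumption).
fn decode_u64_fixed8(bytes: [u8; 8]) -> u64 {
    u64::from_le_bytes(bytes)
}

/// What a mapping to Avro `long` (signed 64-bit) would face:
/// values above `i64::MAX` have no lossless representation.
fn try_as_avro_long(v: u64) -> Option<i64> {
    i64::try_from(v).ok()
}
```

With these helpers, `u64::MAX` survives a round trip through `encode_u64_fixed8` and `decode_u64_fixed8`, while `try_as_avro_long(u64::MAX)` returns `None`.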
49cc7b8 to 395d3f4
…custom_types` feature flag. Updates schema handling, encoders, and readers to leverage Arrow-native fixed(16) representation for custom logical type, preserving full range and signed values. Adds unit tests for round-trip serialization/deserialization.
395d3f4 to c6b988d
let months = u32::from_le_bytes([b[0], b[1], b[2], b[3]]);
let days = u32::from_le_bytes([b[4], b[5], b[6], b[7]]);
let millis = u32::from_le_bytes([b[8], b[9], b[10], b[11]]);
Made this update to align with the newer code.
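For context, the Avro `duration` logical type annotates a fixed of size 12 holding three little-endian unsigned 32-bit integers: months, days, and milliseconds. The sketch below shows that layout in both directions. The `AvroDuration` struct and the function names are hypothetical and are not taken from this PR.

```rust
/// Months, days, and milliseconds carried by an Avro `duration` value.
/// (Hypothetical helper type for this sketch.)
#[derive(Debug, PartialEq, Eq)]
struct AvroDuration {
    months: u32,
    days: u32,
    millis: u32,
}

/// Split the 12-byte fixed body into three little-endian `u32` fields.
fn decode_duration(b: &[u8; 12]) -> AvroDuration {
    AvroDuration {
        months: u32::from_le_bytes([b[0], b[1], b[2], b[3]]),
        days: u32::from_le_bytes([b[4], b[5], b[6], b[7]]),
        millis: u32::from_le_bytes([b[8], b[9], b[10], b[11]]),
    }
}

/// Pack the three fields back into the 12-byte fixed body.
fn encode_duration(d: &AvroDuration) -> [u8; 12] {
    let mut out = [0u8; 12];
    out[0..4].copy_from_slice(&d.months.to_le_bytes());
    out[4..8].copy_from_slice(&d.days.to_le_bytes());
    out[8..12].copy_from_slice(&d.millis.to_le_bytes());
    out
}
```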
DataType::Null => Value::String("null".into()),
DataType::Boolean => Value::String("boolean".into()),
DataType::Int8 | DataType::Int16 | DataType::UInt8 | DataType::UInt16 | DataType::Int32 => {
#[cfg(not(feature = "avro_custom_types"))]
This was added because these are not native Avro types, so when `#[cfg(feature = "avro_custom_types")]` is active we now annotate the metadata with a custom `logicalType`. This makes round-tripping easier and keeps the schema compatible with Arrow `DataType`s.
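A rough sketch of the idea, under stated assumptions: the function name is hypothetical, the `custom_types` parameter stands in for `cfg!(feature = "avro_custom_types")` so both branches can be exercised, and the `arrow.<name>` string is only illustrative of the `arrow.*` pattern rather than the exact names this PR emits.

```rust
/// Build the Avro schema JSON for a small Arrow integer type that Avro
/// stores as `int`.
///
/// Assumptions for this sketch:
/// - `custom_types` stands in for `cfg!(feature = "avro_custom_types")`.
/// - The `arrow.<name>` logical type string is illustrative only.
fn small_int_schema(arrow_name: &str, custom_types: bool) -> String {
    if custom_types {
        // Annotate the primitive so a reader can restore the exact Arrow type.
        format!(r#"{{"type":"int","logicalType":"arrow.{}"}}"#, arrow_name)
    } else {
        // Plain Avro primitive: the original width/signedness is not recorded.
        r#""int""#.to_string()
    }
}
```

Without the annotation, a reader only sees `"int"` and has no way to tell an Int8 column from an Int32 column.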
assert_eq!(expected_str, actual_str);
Ok(())
}
Existing e2e tests are preserved to ensure backwards compatibility is maintained.
@alamb @nathaniel-d-ef @mzabaluev @EmilyMatt @getChan I came across some challenges with non-implemented Arrow Most of this PR involves ensuring all Arrow `DataType`s (except for sparse Unions) are implemented and--when the ~ Half of this PR is tests, but I know it's large. Any help with reviews would be huge!
# Which issue does this PR close?

- `avro_custom_types` round-trip + non-custom fallbacks #9290

# Rationale for this change

NOTE TO REVIEWERS: Over 1500 lines of this diff are tests.

`arrow-avro` currently cannot encode/decode a number of Arrow `DataType`s, and some types have schema/encoding mismatches that can lead to incorrect data (even when encoding succeeds).

The goal is:

- No `ArrowError::NotYetImplemented` (or similar) when writing/reading an Arrow `RecordBatch` containing supported Arrow types, excluding Sparse Unions (will be handled separately).
- With `feature = "avro_custom_types"`: Arrow to Avro to Arrow should round-trip the Arrow `DataType` (including width/signedness/time units and relevant metadata) using Arrow-specific custom logical types following the established `arrow.*` pattern.
- Without `avro_custom_types`: Arrow types should be encoded to the closest standard Avro primitive / logical type, with any necessary lossy conversions documented and consistently applied.

# What changes are included in this PR?

Implementation of all existing missing `arrow-avro` types except for Sparse Unions

# Are these changes tested?

Yes

# Are there any user-facing changes?

Yes, additional type support is being added which is user-facing.
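The round-trip goal for narrow integers can be sketched as follows. This is an illustration under assumptions, not the PR's code: the function names are hypothetical, and it assumes Int8 values are widened to Avro `int` (32-bit) on write and narrowed back on read once the reader knows the original type.

```rust
/// Write path: widen an Arrow Int8 value to an Avro `int` (i32).
/// Widening is always lossless.
fn write_int8(v: i8) -> i32 {
    i32::from(v)
}

/// Read path: narrow an Avro `int` back to Int8, rejecting values that
/// do not fit instead of silently truncating them.
fn read_int8(avro_int: i32) -> Result<i8, String> {
    i8::try_from(avro_int)
        .map_err(|_| format!("value {} does not fit in Int8", avro_int))
}
```

The checked conversion matters when a file written by another producer carries values outside the Int8 range: the reader reports an error instead of returning incorrect data.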