Skip to content

feat(arrow-avro): Configurable Arrow timezone ID for Avro timestamps#9280

Open
mzabaluev wants to merge 2 commits intoapache:mainfrom
mzabaluev:configurable-utc-tz-id
Open

feat(arrow-avro): Configurable Arrow timezone ID for Avro timestamps#9280
mzabaluev wants to merge 2 commits intoapache:mainfrom
mzabaluev:configurable-utc-tz-id

Conversation

@mzabaluev
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

Enable an alternative representation of UTC timestamp data types with the "UTC" timezone ID, which is useful for interoperability with applications preferring that form.

What changes are included in this PR?

In the ReaderBuilder API, add a new method with_tz that allows users to specify the timezone ID for Avro logical types that represent UTC timestamps. The choices are between "+00:00" and "UTC" and can be selected by the new Tz enumeration.

Are these changes tested?

Added unit tests to verify the representation with different Tz parameter values.

Are there any user-facing changes?

A new with_tz method is added to arrow_avro::reader::Builder.

In the ReaderBuilder API, add a new method `with_tz` that allows users
to specify the timezone ID for Avro logical types that represent
UTC timestamps. The choices are between "+00:00" and "UTC"
and can be selected by the new Tz enumeration.
@github-actions github-actions bot added arrow Changes to the arrow crate arrow-avro arrow-avro crate labels Jan 27, 2026
@EmilyMatt
Copy link
Contributor

This makes more sense in my opinion than using "+00:00" exclusively

Copy link
Contributor

@jecsand838 jecsand838 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mzabaluev LMGTM. Really like this improvement to arrow-avro!

For me the big things are:

  1. An e2e test in reader/mod.rs using the new with_tz.
  2. Better documentation on the public with_tz method.

Comment on lines +1170 to +1175
/// Sets the timezone representation for Avro timestamp fields.
pub fn with_tz(mut self, tz: Tz) -> Self {
self.use_tz = tz;
self
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding an e2e test using this?

Also may help to explain in the doc comment what the default value is.

Comment on lines +990 to +998
Codec::TimestampMillis(Some(Tz::OffsetZero)) => Self::TimestampMillisUtc,
Codec::TimestampMillis(Some(Tz::Utc)) => Self::TimestampMillisUtc,
Codec::TimestampMillis(None) => Self::TimestampMillisLocal,
Codec::TimestampMicros(Some(Tz::OffsetZero)) => Self::TimestampMicrosUtc,
Codec::TimestampMicros(Some(Tz::Utc)) => Self::TimestampMicrosUtc,
Codec::TimestampMicros(None) => Self::TimestampMicrosLocal,
Codec::TimestampNanos(Some(Tz::OffsetZero)) => Self::TimestampNanosUtc,
Codec::TimestampNanos(Some(Tz::Utc)) => Self::TimestampNanosUtc,
Codec::TimestampNanos(None) => Self::TimestampNanosLocal,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This maybe a bit cleaner.

Suggested change
Codec::TimestampMillis(Some(Tz::OffsetZero)) => Self::TimestampMillisUtc,
Codec::TimestampMillis(Some(Tz::Utc)) => Self::TimestampMillisUtc,
Codec::TimestampMillis(None) => Self::TimestampMillisLocal,
Codec::TimestampMicros(Some(Tz::OffsetZero)) => Self::TimestampMicrosUtc,
Codec::TimestampMicros(Some(Tz::Utc)) => Self::TimestampMicrosUtc,
Codec::TimestampMicros(None) => Self::TimestampMicrosLocal,
Codec::TimestampNanos(Some(Tz::OffsetZero)) => Self::TimestampNanosUtc,
Codec::TimestampNanos(Some(Tz::Utc)) => Self::TimestampNanosUtc,
Codec::TimestampNanos(None) => Self::TimestampNanosLocal,
Codec::TimestampMillis(Some(_)) => Self::TimestampMillisUtc,
Codec::TimestampMillis(None) => Self::TimestampMillisLocal,
Codec::TimestampMicros(Some(_)) => Self::TimestampMicrosUtc,
Codec::TimestampMicros(None) => Self::TimestampMicrosLocal,
Codec::TimestampNanos(Some(_)) => Self::TimestampNanosUtc,
Codec::TimestampNanos(None) => Self::TimestampNanosLocal,

Comment on lines +636 to +643
impl Display for Tz {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
Tz::OffsetZero => write!(f, "+00:00"),
Tz::Utc => write!(f, "UTC"),
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about doing something like this?

Suggested change
impl Display for Tz {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
Tz::OffsetZero => write!(f, "+00:00"),
Tz::Utc => write!(f, "UTC"),
}
}
}
impl Tz {
pub fn as_str(&self) -> &'static str {
match self {
Self::OffsetZero => "+00:00",
Self::Utc => "UTC",
}
}
}
impl Display for Tz {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.write_str(self.as_str())
}
}

as_str() can be used to convert directly to Arc<str> without intermediate allocation. Then we can use it both in fmt and again below in Codec::data_type.

Meanwhile we can use f.write_str to avoid the overhead of the write! macro when parsing a format string for simple string literals.

Comment on lines +749 to +760
Self::TimestampMillis(tz) => DataType::Timestamp(
TimeUnit::Millisecond,
tz.as_ref().map(|tz| tz.to_string().into()),
),
Self::TimestampMicros(tz) => DataType::Timestamp(
TimeUnit::Microsecond,
tz.as_ref().map(|tz| tz.to_string().into()),
),
Self::TimestampNanos(tz) => DataType::Timestamp(
TimeUnit::Nanosecond,
tz.as_ref().map(|tz| tz.to_string().into()),
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then down here you could do:

Suggested change
Self::TimestampMillis(tz) => DataType::Timestamp(
TimeUnit::Millisecond,
tz.as_ref().map(|tz| tz.to_string().into()),
),
Self::TimestampMicros(tz) => DataType::Timestamp(
TimeUnit::Microsecond,
tz.as_ref().map(|tz| tz.to_string().into()),
),
Self::TimestampNanos(tz) => DataType::Timestamp(
TimeUnit::Nanosecond,
tz.as_ref().map(|tz| tz.to_string().into()),
),
Self::TimestampMillis(tz) => DataType::Timestamp(
TimeUnit::Millisecond,
tz.as_ref().map(|tz| tz.as_str().into()),
),
Self::TimestampMicros(tz) => DataType::Timestamp(
TimeUnit::Microsecond,
tz.as_ref().map(|tz| tz.as_str().into()),
),
Self::TimestampNanos(tz) => DataType::Timestamp(
TimeUnit::Nanosecond,
tz.as_ref().map(|tz| tz.as_str().into()),
),

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate arrow-avro arrow-avro crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Configurable Arrow representation of UTC timestamps for Avro reader

3 participants