Support round trip reading / writing Arrow Duration type to parquet#7482
Support round trip reading / writing Arrow Duration type to parquet#7482alamb merged 5 commits intoapache:mainfrom
Duration type to parquet#7482Conversation
|
CI Failure should be fixed by @crepererum 's PR: |
alamb
left a comment
There was a problem hiding this comment.
Thank you @Liyixin95 -- this looks really nice. I poked around and actually found that there were some pre-existing tests ❤️
I also tested this PR manually using the duration.parquet file from apache/parquet-testing#80 and the following code:
Details
fn main() {
let file = File::open("/Users/andrewlamb/Downloads/duration.parquet").unwrap();
let reader = ParquetRecordBatchReaderBuilder::try_new(file)
.unwrap()
.build()
.unwrap();
for batch in reader {
let batch = batch.unwrap();
println!("Read batch schema: {:?}", batch.schema());
println!("Read batch num rows: {:?}", batch.num_rows());
println!("Read batch pretty:\n{}", pretty_format_batches(&[batch]).unwrap());
}
}Before this PR:
Field { name: "a", data_type: Int64:
warning: `parquet_test` (bin "parquet_test") generated 1 warning
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.04s
Running `target/debug/parquet_test`
Read batch schema: Schema { fields: [Field { name: "a", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }
Read batch num rows: 100
Read batch pretty:
+-------------+
| a |
+-------------+
| 86400000000 |
...
After this PR (the schema shows duration):
Field { name: "a", data_type: Duration(Microsecond),
Read batch schema: Schema { fields: [Field { name: "a", data_type: Duration(Microsecond), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }
Read batch num rows: 100
Read batch pretty:
+----------+
| a |
+----------+
| PT86400S |
|
|
||
| #[test] | ||
| #[should_panic(expected = "Converting Duration to parquet not supported")] | ||
| fn duration_second_single_column() { |
There was a problem hiding this comment.
I found there were already existing tests for the feature ❤️ (thanks @carols10cents )
Duration type to parquet
Duration type to parquetDuration type to parquet
Oh, Thanks. I didn't notice there were some pre-existing tests. I just create the pr and go to sleep. |
I didn't realize there were any tests either until I checked out the branch locally and they failed for me 😆 |
continue from previous pr. @alamb