Add implementation status for cuDF #99
| | External column data (1) | | | | | (W) | | ||
| | Row group "Sorting column" metadata (2) | | | | | (W) | | ||
| | Row group pruning using statistics | | | | | ✅ | | ||
| | Row group pruning using bloom filter | | | | | ✅ | |
Please correct me if I am wrong, but I believe Bloom filters are used to prune row groups rather than pages.
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
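To illustrate the point about Bloom filters pruning row groups rather than pages, here is a minimal sketch, assuming one filter per row group for a single column. `ToyBloomFilter`, `build_row_group`, and `prune_row_groups` are hypothetical names made up for this example; the toy filter is not the split-block filter defined by the Parquet format, and none of this is taken from cuDF or any other implementation.

```python
import hashlib


class ToyBloomFilter:
    """Toy Bloom filter (assumption: NOT the Parquet split-block layout)."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored in a Python int

    def _positions(self, value):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_row_group(values):
    """Pair a row group's values with a filter built over them."""
    bloom = ToyBloomFilter()
    for value in values:
        bloom.add(value)
    return values, bloom


def prune_row_groups(row_groups, needle):
    """Return indices of row groups that may contain `needle`."""
    return [
        index
        for index, (_, bloom) in enumerate(row_groups)
        if bloom.might_contain(needle)
    ]


row_groups = [build_row_group(["a", "b"]), build_row_group(["c", "d"])]
candidates = prune_row_groups(row_groups, "c")
```

A row group whose filter reports the value as absent can be skipped entirely for an equality predicate. A filter can report false positives, so the surviving row groups still have to be read and checked.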
| The value in each box means: | ||
| * ✅: supported | ||
| * ❌: not supported | ||
| * (R/W): partial reader/writer only support |
Added an extra entry to the legend to allow partial reader- or writer-only support. Happy to remove it and leave the corresponding boxes blank if needed.
| * `Java`: [parquet-java](https://github.com/apache/parquet-java) | ||
| * `Go`: [parquet-go](https://github.com/apache/arrow-go/tree/main/parquet) | ||
| * `Rust`: [parquet-rs](https://github.com/apache/arrow-rs/blob/main/parquet/README.md) | ||
| * `CUDA C++`: [cudf](https://github.com/rapidsai/cudf) |
Should this be cuDF? Or is CUDA C++ a more official name for it?
cuDF is the name of the DataFrame library that provides the implementation, and CUDA C++ is the language it is implemented in. Isn't the convention here:
* `language`: [impl name](link)
I would prefer cuDF here. I think the original intention was to include implementations governed by the Parquet community or the Apache Software Foundation. It would be better to use the library name to encourage other Parquet implementations to appear here. WDYT? @alamb
I also recall that the idea here was to list library names (so this would be better as cuDF), not languages.
It just so happens that we only had one example library for each language, so there was (before this PR) a 1-1 correspondence.
Does that make sense, @mhaseeb123?
Sounds good. I will update this.
etseidl left a comment
Thanks for getting the party started @mhaseeb123!
AMAZING! Thank you @mhaseeb123. I wonder if you have any program, script, or definition of what "support" means (mostly so I can crib that and file a ticket in the arrow-rs repository to get this column filled out).
Certainly, the Does that make sense?
| ### Physical types | ||
| | Data type | C++ | Java | Go | Rust | |
Simply removed one space in the Java column so all cols have a consistent width for aesthetic purposes.
Yes, for sure. I guess I was hoping for some sort of script or example data that I could use when filling this out for arrow-rs. Not required, I was just asking.
We have relevant gtests and pytests in cudf for most, if not all, of the features, but collecting them along with input/output files wouldn't be feasible. Sorry!
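No shared script came out of this thread, so the following is only a rough sketch of how a per-feature check could be mapped onto the legend symbols. It assumes `(R)` and `(W)` mean reader-only and writer-only support. `probe` and `classify_support` are hypothetical helpers, and the lambdas stand in for real read and write checks against an implementation.

```python
def probe(check):
    """Run a zero-argument check and report whether it succeeded."""
    try:
        check()
        return True
    except Exception:
        return False


def classify_support(can_read, can_write):
    """Map reader/writer results to a legend symbol (assumed meaning)."""
    if can_read and can_write:
        return "✅"
    if can_read:
        return "(R)"
    if can_write:
        return "(W)"
    return "❌"


def unsupported():
    # Placeholder for a check that fails in a given implementation.
    raise NotImplementedError("feature not implemented")


# Hypothetical feature checks: the write path works, the read path does not.
status = classify_support(
    can_read=probe(unsupported),
    can_write=probe(lambda: None),
)
```

In practice each check would write or read a small file that exercises one feature, which is roughly what per-feature unit tests already do.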
🚀
This PR adds the implementation status for cuDF to the Parquet site.