-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Error handling in various places #9365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
||
| impl<T: DataType> Decoder<T> for DictDecoder<T> { | ||
| fn set_data(&mut self, data: Bytes, num_values: usize) -> Result<()> { | ||
| if data.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this a repetition of the check directly below?
| let dict_idx = self.current_value.unwrap() as usize; | ||
| let dict_value = dict[dict_idx].clone(); | ||
|
|
||
| let dict_value = dict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For large rle runs or small buffers it might be more efficient to check this inside reload, but the difference in practice is probably minimal. Checking it here is also consistent with the bitpacked runs.
| .iter_mut() | ||
| .zip(index_buf[..num_values].iter()) | ||
| .for_each(|(b, i)| b.clone_from(&dict[*i as usize])); | ||
| for i in 0..num_values { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the zipped iterator as before avoids the bounds checks on index_buf and buffer, that might make a measurable difference.
|
|
||
| /// Returns the offset and length in bytes of the column chunk within the file | ||
| pub fn byte_range(&self) -> (u64, u64) { | ||
| pub fn byte_range(&self) -> Result<(u64, u64)> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most other accessors for this struct return signed values and validation seems to be left to the caller. I think it would be more consistent if byte_range also returned a (i64, i64). Even an Ok result here still needs to be validated to be actually in bounds for the given file.
| )); | ||
| } | ||
| } else if !is_root_node { | ||
| return Err(general_err!("Repetition level must be defined for non-root types")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how this error relates to the precision of decimal types
Which issue does this PR close?
Rationale for this change
We've been doing some internal work with Parquet and found a bunch of places where error messages could be improved or panics could be avoided.
I can split this up into multiple PRs if need be. I know it's an assortment of things.
What changes are included in this PR?
Are these changes tested?
Tests should continue to pass.
Are there any user-facing changes?