Error handling in various places #9365

rambleraptor · 2026-02-05T19:45:01Z

Which issue does this PR close?

Part of [DISCUSS] Remove panics #7806

Rationale for this change

We've been doing some internal work with Parquet and found a bunch of places where error messages could be improved or panics could be avoided.

I can split this up into multiple PRs if need be. I know it's an assortment of things.

What changes are included in this PR?

Are these changes tested?

Tests should continue to pass.

Are there any user-facing changes?


jhorstmann · 2026-02-05T22:52:41Z

parquet/src/encodings/decoding.rs


 impl<T: DataType> Decoder<T> for DictDecoder<T> {
    fn set_data(&mut self, data: Bytes, num_values: usize) -> Result<()> {
+        if data.is_empty() {


Isn't this a repetition of the check directly below?

jhorstmann · 2026-02-05T22:59:33Z

parquet/src/encodings/rle.rs

                let dict_idx = self.current_value.unwrap() as usize;
-                let dict_value = dict[dict_idx].clone();
+
+                let dict_value = dict


For large RLE runs or small buffers it might be more efficient to check this inside reload, but the difference in practice is probably minimal. Checking it here is also consistent with the bit-packed runs.
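As a rough illustration of the trade-off discussed here, the sketch below validates the dictionary index once per RLE run and then fills the output without further lookups. It is not the code from this PR: `fill_rle_run` and `DecodeError` are hypothetical names, and the real decoder keeps its state in a struct instead of taking plain arguments.

```rust
/// Hypothetical error type standing in for the crate's own error type.
#[derive(Debug, PartialEq)]
pub struct DecodeError(pub String);

/// Writes up to `run_len` copies of `dict[dict_idx]` into `out`.
/// The index is checked once for the whole run, so a corrupt index
/// yields an error instead of a panic from slice indexing.
pub fn fill_rle_run<T: Clone>(
    dict: &[T],
    dict_idx: usize,
    out: &mut [T],
    run_len: usize,
) -> Result<usize, DecodeError> {
    // `get` returns None for an out-of-range index; map that to an error.
    let value = dict.get(dict_idx).ok_or_else(|| {
        DecodeError(format!(
            "dictionary index {} out of range for dictionary of length {}",
            dict_idx,
            dict.len()
        ))
    })?;

    // Never write past the end of the output buffer.
    let n = run_len.min(out.len());
    for slot in out[..n].iter_mut() {
        slot.clone_from(value);
    }
    Ok(n)
}
```

With a three-entry dictionary, `fill_rle_run(&dict, 1, &mut out, 3)` fills three slots with the second entry, while an index of 3 returns an error.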

jhorstmann · 2026-02-05T23:01:07Z

parquet/src/encodings/rle.rs

-                        .iter_mut()
-                        .zip(index_buf[..num_values].iter())
-                        .for_each(|(b, i)| b.clone_from(&dict[*i as usize]));
+                    for i in 0..num_values {


Using the zipped iterator as before avoids the bounds checks on index_buf and buffer, which might make a measurable difference.
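One way to keep the zipped iterator while still rejecting bad indices is sketched below. This is an assumption about how the two goals could be combined, not the change made in the PR; `gather_from_dict` is a hypothetical free function, and the `i32` index type and `String` error are placeholders.

```rust
/// Copies `dict[index_buf[k]]` into `buffer[k]` for the first `num_values`
/// entries. Returns an error instead of panicking on a bad index.
pub fn gather_from_dict<T: Clone>(
    dict: &[T],
    index_buf: &[i32],
    buffer: &mut [T],
    num_values: usize,
) -> Result<(), String> {
    // Check the slice lengths once, up front, so the slicing below cannot panic.
    if num_values > index_buf.len() || num_values > buffer.len() {
        return Err(format!(
            "num_values {} exceeds buffer lengths ({} indices, {} slots)",
            num_values,
            index_buf.len(),
            buffer.len()
        ));
    }

    // Zipping the two slices needs no per-element bounds checks on
    // `index_buf` or `buffer`; only the dictionary lookup is checked.
    for (slot, &idx) in buffer[..num_values]
        .iter_mut()
        .zip(index_buf[..num_values].iter())
    {
        let value = if idx < 0 { None } else { dict.get(idx as usize) };
        let value = value.ok_or_else(|| {
            format!(
                "dictionary index {} out of range for dictionary of length {}",
                idx,
                dict.len()
            )
        })?;
        slot.clone_from(value);
    }
    Ok(())
}
```

The only remaining check inside the loop is the dictionary lookup itself, which is needed in any case to turn a corrupt index into an error. Whether this performs measurably differently from the indexed loop would need benchmarking.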

jhorstmann · 2026-02-05T23:14:50Z

parquet/src/file/metadata/mod.rs


    /// Returns the offset and length in bytes of the column chunk within the file
-    pub fn byte_range(&self) -> (u64, u64) {
+    pub fn byte_range(&self) -> Result<(u64, u64)> {


Most other accessors for this struct return signed values and validation seems to be left to the caller. I think it would be more consistent if byte_range also returned an (i64, i64). Even an Ok result here still needs to be validated by the caller to be actually in bounds for the given file.
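To make the caller-side validation concrete, here is a minimal sketch of what a reader might do with a signed (offset, length) pair before issuing a read. `validate_byte_range` is a hypothetical helper, not part of the parquet crate, and the `String` error is a placeholder for the crate's own error type.

```rust
use std::ops::Range;

/// Converts a signed (start, length) pair, as it might come from file
/// metadata, into an unsigned byte range that lies within the file.
pub fn validate_byte_range(
    start: i64,
    length: i64,
    file_len: u64,
) -> Result<Range<u64>, String> {
    // Negative values can only come from corrupt or malicious metadata.
    if start < 0 || length < 0 {
        return Err(format!(
            "invalid column chunk range: start {} length {}",
            start, length
        ));
    }
    let (start, length) = (start as u64, length as u64);

    // The sum of two non-negative i64 values always fits in u64, but
    // checked_add keeps the intent explicit.
    let end = start
        .checked_add(length)
        .ok_or_else(|| "column chunk range overflows u64".to_string())?;

    // A non-negative range can still point past the end of the file.
    if end > file_len {
        return Err(format!(
            "column chunk range {}..{} exceeds file length {}",
            start, end, file_len
        ));
    }
    Ok(start..end)
}
```

This shows why a `Result` from the accessor alone is not sufficient: the non-negativity check can live in the accessor, but the comparison against the file length needs information only the caller has.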

jhorstmann · 2026-02-05T23:19:55Z

parquet/src/schema/types.rs

                    ));
                }
+            } else if !is_root_node {
+                return Err(general_err!("Repetition level must be defined for non-root types"));


I don't understand how this error relates to the precision of decimal types.

Improve robustness with comprehensive bounds checking and graceful error handling

072d26d

github-actions bot added the parquet Changes to the parquet crate label Feb 5, 2026

jhorstmann reviewed Feb 5, 2026
