Remove primitive map key assertion on record reader by njaremko · Pull Request #7769 · apache/arrow-rs

njaremko · 2025-06-24T15:54:07Z

Note:

This PR has a test that requires a file that needs to be upstreamed

Which issue does this PR close?

Closes Unable to parse parquet files with maps with complex keys #7768.

Rationale for this change

The Parquet logical type specification for map types does not require map keys to be primitive. Spark used to have a similar assertion, which has since been removed:

[SPARK-32639][SQL]Support GroupType parquet mapkey field spark#29451

What changes are included in this PR?

Getting rid of the assertion, and adding a test
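To illustrate the shape of the change, here is a minimal sketch using a toy schema model. Everything in it (`SchemaType`, `Reader`, `build_reader`, `build_map_reader`) is hypothetical and written only for this illustration; it is not the code in `parquet/src/record/reader.rs`. It assumes the key reader can be built recursively, the same way as the value reader, so a group-typed key needs no special handling once the assertion is gone.

```rust
// Toy model of a schema node. Hypothetical; not the parquet crate's types.
#[derive(Debug)]
enum SchemaType {
    // A leaf column, identified here only by its physical type name.
    Primitive(&'static str),
    // A nested group of named fields.
    Group(Vec<(&'static str, SchemaType)>),
}

// Toy model of a row reader tree built from the schema.
#[derive(Debug, PartialEq)]
enum Reader {
    Primitive(&'static str),
    Group(Vec<Reader>),
    Map { key: Box<Reader>, value: Box<Reader> },
}

// Recursively build a reader for any schema node.
fn build_reader(ty: &SchemaType) -> Reader {
    match ty {
        SchemaType::Primitive(name) => Reader::Primitive(*name),
        SchemaType::Group(fields) => {
            Reader::Group(fields.iter().map(|(_, t)| build_reader(t)).collect())
        }
    }
}

// Build a map reader from its key and value schema nodes.
fn build_map_reader(key: &SchemaType, value: &SchemaType) -> Reader {
    // An assertion of this form would reject compound keys:
    //     assert!(matches!(key, SchemaType::Primitive(_)));
    // Without it, the key goes through the same recursive path as the value.
    Reader::Map {
        key: Box::new(build_reader(key)),
        value: Box::new(build_reader(value)),
    }
}
```

With this model, calling `build_map_reader` with a `SchemaType::Group` key returns a `Reader::Map` whose key is a `Reader::Group`, rather than panicking.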

Are there any user-facing changes?

No

If there are any breaking changes to public APIs, please call them out.

etseidl

Thanks for correcting this oversight @njaremko. FWIW the arrow API handles the test file as expected. Just a few nits with the test.

etseidl · 2025-06-24T16:29:18Z

parquet/src/record/reader.rs

    }

+    #[test]
+    fn test_file_reader_rows_nullable1() {


A more descriptive name here would be nice...test_compound_map_key perhaps?

lol, whoops, yeah, definitely need to fix this

etseidl · 2025-06-24T16:29:47Z

parquet/src/record/reader.rs


+    #[test]
+    fn test_file_reader_rows_nullable1() {
+        let rows = test_file_reader_rows("databricks.parquet", None).unwrap();


Likewise, something more descriptive than "databricks".

It was named this because it's a file generated by Databricks with every supported column type. I'll look into regenerating it.

etseidl · 2025-06-24T16:31:05Z

parquet/src/record/reader.rs

+                ]
+            ),
+            (
+                "map_nested".to_string(),


This is the meat of the PR; would it be possible to only have this column in the test file? This is a lot to slog through and obscures the point of the test.

njaremko · 2025-06-27T19:10:22Z

I'll update this PR with the requested changes once this gets merged: apache/parquet-testing#87


alamb · 2025-07-13T11:08:09Z

This PR has a test that requires a file that needs to be upstreamed

Thanks again @njaremko

Given that we don't really control the timeline of the upstream parquet-testing repo, if you want to unblock this PR you could copy the test file into the arrow-rs repo, with a TODO comment saying to switch to the upstream copy in parquet-testing once apache/parquet-testing#87 is merged.

alamb · 2025-09-08T16:58:25Z

Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please mark it as ready for review when it is ready for another look.

njaremko changed the title ~~Nathan 06 24 remove primitive map key assertion on record reader~~ Remove primitive map key assertion on record reader Jun 24, 2025

njaremko force-pushed the nathan_06-24-remove_primitive_map_key_assertion_on_record_reader branch from 2d8b0b9 to ea6d5f6 Compare June 24, 2025 15:56

github-actions bot added the parquet Changes to the parquet crate label Jun 24, 2025

Remove primitive map key assertion on record reader

edd9b70

njaremko force-pushed the nathan_06-24-remove_primitive_map_key_assertion_on_record_reader branch from ea6d5f6 to edd9b70 Compare June 24, 2025 15:57

njaremko mentioned this pull request Jun 24, 2025

Add databricks direct unload file containing complex map key apache/parquet-testing#87

Open

etseidl requested changes Jun 24, 2025


etseidl added the bug label Jun 24, 2025

Merge branch 'apache:main' into nathan_06-24-remove_primitive_map_key…

54858bf

…_assertion_on_record_reader

alamb marked this pull request as draft September 8, 2025 16:58

njaremko wants to merge 2 commits into apache:main from njaremko:nathan_06-24-remove_primitive_map_key_assertion_on_record_reader

3 participants