Fix MAT v5 reader: miCOMPRESSED elements are not 8-byte padded#13
Open
jyyen wants to merge 1 commit into
Open
Conversation
read_element padded every element to an 8-byte boundary, but scipy and MATLAB do not pad miCOMPRESSED elements. The padded advance over-shot into the next element's zlib stream and mis-read its tag, failing on real StaMPS/scipy .mat files with "small data element at byte N has <garbage> bytes". For MI_COMPRESSED, advance to data_end and skip any trailing zero padding (top-level element tags are never 0x00, so this also tolerates writers that do pad). All other element types keep the 8-byte padding. Verified: reads real StaMPS ps2.mat (75 MB v5, bperp -> 279x1) and parms.mat, plus a 4-variable scipy compressed file. Adds regression test reads_multiple_unpadded_compressed_elements; the existing reads_compressed_mat_v5_elements modeled a single padded blob and so never exercised the multi-element walk that real per-variable compression produces. Fixes ESA-PhiLab#12 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019PjQ6amGpaSTEbAv368SUc
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #12.
Problem
MatData::readfails on real StaMPS/scipy.matfiles with errors likesmall data element at byte 2240 has 55552 bytes. The parser already handles zlib (miCOMPRESSED) and small elements, so this is a parsing/offset bug, not a missing feature.Root cause
read_elementpads every element to an 8-byte boundary, but scipy and MATLAB do not pad miCOMPRESSED elements. They write each variable as its own miCOMPRESSED element, so the padded advance over-shoots into the next element's zlib stream and mis-reads its tag.Instrumented walk on a 4-variable scipy file:
Variable 1 is
8 (tag) + 43 (data) = 51bytes; the code jumps to 56, but the next variable's zlib tag is actually at byte 51 (bytes 51..56 are the next tag + start of its stream, not padding).Fix
For
MI_COMPRESSED, advance todata_endand skip any trailing zero padding (top-level element tags are never0x00, so this also tolerates writers that do pad). All other element types keep the 8-byte padding.Verification
ps2.mat(75 MB, MATLAB v5 compressed): reads,bperp-> (279, 1).parms.mat(v5 struct): reads.savemat(..., do_compression=True): reads all four.cargo test -p pystamps-mat: 12/12 green, including a new regression testreads_multiple_unpadded_compressed_elements. The existingreads_compressed_mat_v5_elementsmodeled a single padded blob, so it never exercised the multi-element walk that real per-variable compression produces.Note:
native_stage6.rs::mat_v5_variable_shapereturnsOk(None)on compressed top-level elements, so the compressed read path goes throughMatData::read(fixed here); itsmat_element_headercarries the same padding assumption and would need the same fix if it ever walks compressed elements.🤖 Generated with Claude Code
https://claude.ai/code/session_019PjQ6amGpaSTEbAv368SUc