
Fix MAT v5 reader: miCOMPRESSED elements are not 8-byte padded #13

Open
jyyen wants to merge 1 commit into ESA-PhiLab:main from jyyen:fix/micompressed-mat-v5-padding

jyyen commented Jun 25, 2026

Fixes #12.

Problem

MatData::read fails on real StaMPS/scipy .mat files with errors like "small data element at byte 2240 has 55552 bytes". The parser already handles zlib (miCOMPRESSED) and small data elements, so this is an offset bug in the element walk, not a missing feature.
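
Some background on where that error text can come from. A MAT v5 element normally starts with an 8-byte tag: a 4-byte data type followed by a 4-byte byte count. Elements of 4 bytes or fewer may instead use the small-data-element format. That format packs the type into the low 16 bits of the first word and the byte count into the high 16 bits. A reader that sees a nonzero high half therefore treats the word as a small-element tag, and a count above 4 is then reported as an error like the one quoted above.

The sketch below is hypothetical: decode_tag, read_u32_le and Tag are illustrative names, not the crate's API. It assumes a little-endian file.

```rust
/// A decoded MAT v5 element tag (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
struct Tag {
    data_type: u32,
    /// Payload size in bytes.
    size: u32,
    /// 8 for a normal tag, 4 for a small-data-element tag.
    tag_len: usize,
}

/// Read a little-endian u32 at `at`, or fail if the buffer is too short.
fn read_u32_le(buf: &[u8], at: usize) -> Result<u32, String> {
    match buf.get(at..at + 4) {
        Some(b) => Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]])),
        None => Err(format!("truncated tag at byte {}", at)),
    }
}

/// Decode the element tag at `off` in a little-endian MAT v5 buffer.
fn decode_tag(buf: &[u8], off: usize) -> Result<Tag, String> {
    let first = read_u32_le(buf, off)?;
    let small_len = first >> 16;
    if small_len != 0 {
        // Small data element: type in the low 16 bits, byte count in the
        // high 16 bits, payload in the following 4 bytes.
        if small_len > 4 {
            return Err(format!(
                "small data element at byte {} has {} bytes",
                off, small_len
            ));
        }
        return Ok(Tag { data_type: first & 0xFFFF, size: small_len, tag_len: 4 });
    }
    // Normal tag: the second word holds the payload size.
    Ok(Tag { data_type: first, size: read_u32_le(buf, off + 4)?, tag_len: 8 })
}
```

If the walk lands a few bytes inside the next element, the high half of the first word is whatever bytes happen to sit there. Any value above 4 then surfaces as an oversized "small data element".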

Root cause

read_element assumes every element is padded to an 8-byte boundary, but scipy and MATLAB do not pad miCOMPRESSED elements. They write each variable as its own miCOMPRESSED element, so rounding the advance up to the next multiple of 8 lands up to 7 bytes inside the next element, and the walk mis-reads the bytes there as a tag.

Instrumented walk on a 4-variable scipy file:

off=0/248   type=15 (miCOMPRESSED) size=43   <- variable 1 (decompresses + reads fine)
off=56/248  type=0  size=30720  <- ERROR

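Variable 1 is 8 (tag) + 43 (data) = 51 bytes. The code rounds up and jumps to 56, but the next variable's miCOMPRESSED tag actually starts at byte 51. Bytes 51..56 are the first five bytes of that tag, not padding, so the word read at 56 is the top three bytes of its size field followed by the first byte of its zlib stream. That is consistent with the logged type=0 size=30720: 30720 is 0x7800, and 0x78 is the usual first byte of a zlib stream.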
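
The offset arithmetic from that walk can be written out as a minimal sketch. The helper names are hypothetical and not taken from the crate. The old advance rounds tag plus payload up to the next multiple of 8, while the unpadded layout simply adds them.

```rust
/// Length of a normal (non-small) element tag in bytes.
const TAG_LEN: usize = 8;

/// Round `n` up to the next multiple of 8.
fn align8(n: usize) -> usize {
    (n + 7) & !7
}

/// Next tag offset if the element at `off` is assumed to be padded to an
/// 8-byte boundary (the old behaviour, applied to every element type).
fn next_offset_padded(off: usize, size: usize) -> usize {
    align8(off + TAG_LEN + size)
}

/// Next tag offset when the payload is not padded, which is how scipy and
/// MATLAB lay out miCOMPRESSED elements.
fn next_offset_unpadded(off: usize, size: usize) -> usize {
    off + TAG_LEN + size
}
```

For variable 1 (off = 0, size = 43), next_offset_padded gives 56 and next_offset_unpadded gives 51, matching the instrumented walk.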

Fix

For MI_COMPRESSED, advance to data_end and skip any trailing zero padding (top-level element tags are never 0x00, so this also tolerates writers that do pad). All other element types keep the 8-byte padding.
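
The fixed advance can be sketched under a few assumptions. The names next_element_offset and MI_COMPRESSED are illustrative. The file is assumed little-endian, so the first byte of a real top-level tag is the low byte of a nonzero type code. The sketch also caps the zero skip at the 8-byte boundary. The PR text does not say whether its skip is capped, so treat that bound as a choice made here.

```rust
/// MAT v5 data-type code for a zlib-compressed element.
const MI_COMPRESSED: u32 = 15;

/// Offset of the element that follows one of type `data_type` whose payload
/// ends at `data_end` (hypothetical helper; little-endian files only).
fn next_element_offset(buf: &[u8], data_type: u32, data_end: usize) -> usize {
    let padded_end = (data_end + 7) & !7;
    if data_type != MI_COMPRESSED {
        // Every other element type keeps the 8-byte padding.
        return padded_end;
    }
    // miCOMPRESSED: start at data_end and skip zero bytes, never past the
    // 8-byte boundary, so writers that do pad are tolerated as well.
    let mut next = data_end;
    while next < padded_end && buf.get(next) == Some(&0) {
        next += 1;
    }
    next
}
```

With an unpadded writer, the byte at data_end is the next tag's type (0x0F or 0x0E), so the loop stops immediately. With a padding writer, it walks over the zeros to the aligned offset.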

Verification

  • Real StaMPS ps2.mat (75 MB, MATLAB v5 compressed): reads, bperp -> (279, 1).
  • StaMPS parms.mat (v5 struct): reads.
  • 4-variable scipy savemat(..., do_compression=True): reads all four.
  • cargo test -p pystamps-mat: 12/12 green, including a new regression test reads_multiple_unpadded_compressed_elements. The existing reads_compressed_mat_v5_elements modeled a single padded blob, so it never exercised the multi-element walk that real per-variable compression produces.
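
A regression test along the lines of reads_multiple_unpadded_compressed_elements could build several back-to-back unpadded miCOMPRESSED elements and check that the walk visits each tag. The sketch below is not the PR's test, and it makes several simplifying assumptions:

- push_unpadded_compressed and walk_top_level are hypothetical helpers.
- The payloads are dummy bytes rather than real zlib streams, since the walk never inflates them.
- The buffer omits the 128-byte file header.
- The file is assumed little-endian.

```rust
/// MAT v5 data-type code for a zlib-compressed element.
const MI_COMPRESSED: u32 = 15;

/// Append a miCOMPRESSED element with `payload_len` dummy bytes and no
/// trailing padding (hypothetical test helper).
fn push_unpadded_compressed(buf: &mut Vec<u8>, payload_len: usize) {
    buf.extend_from_slice(&MI_COMPRESSED.to_le_bytes());
    buf.extend_from_slice(&(payload_len as u32).to_le_bytes());
    // The walk never inflates payloads, so any non-zero filler will do.
    let new_len = buf.len() + payload_len;
    buf.resize(new_len, 0xAB);
}

fn read_u32_le(buf: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([buf[at], buf[at + 1], buf[at + 2], buf[at + 3]])
}

/// Walk the top-level elements of a little-endian buffer and return each tag offset.
fn walk_top_level(buf: &[u8]) -> Result<Vec<usize>, String> {
    let mut offsets = Vec::new();
    let mut off: usize = 0;
    while off < buf.len() {
        if off + 8 > buf.len() {
            return Err(format!("truncated tag at byte {}", off));
        }
        let data_type = read_u32_le(buf, off);
        let size = read_u32_le(buf, off + 4) as usize;
        // Top-level tags are never type 0 and never use the small-element
        // format, so either means the walk has lost alignment.
        if data_type == 0 || data_type >> 16 != 0 {
            return Err(format!("bad top-level tag at byte {}", off));
        }
        let data_end = off + 8 + size;
        if data_end > buf.len() {
            return Err(format!("element at byte {} overruns the buffer", off));
        }
        offsets.push(off);
        let padded_end = (data_end + 7) & !7;
        off = if data_type == MI_COMPRESSED {
            // Unpadded: start at data_end, skipping zero padding if present.
            let mut next = data_end;
            while next < padded_end && buf.get(next) == Some(&0) {
                next += 1;
            }
            next
        } else {
            padded_end
        };
    }
    Ok(offsets)
}
```

With payloads of 43, 30, 17 and 9 bytes, this walk visits tags at 0, 51, 89 and 114. The old 8-byte advance would instead look for the second tag at byte 56, which is exactly the failure in the instrumented walk.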

Note: native_stage6.rs::mat_v5_variable_shape returns Ok(None) for compressed top-level elements, so compressed files are read through MatData::read, which this PR fixes. The mat_element_header helper used by mat_v5_variable_shape carries the same padding assumption and would need the same fix if it ever walks compressed elements.

🤖 Generated with Claude Code

https://claude.ai/code/session_019PjQ6amGpaSTEbAv368SUc

read_element padded every element to an 8-byte boundary, but scipy and
MATLAB do not pad miCOMPRESSED elements. The padded advance over-shot into
the next element's zlib stream and mis-read its tag, failing on real
StaMPS/scipy .mat files with "small data element at byte N has <garbage> bytes".

For MI_COMPRESSED, advance to data_end and skip any trailing zero padding
(top-level element tags are never 0x00, so this also tolerates writers that
do pad). All other element types keep the 8-byte padding.

Verified: reads real StaMPS ps2.mat (75 MB v5, bperp -> 279x1) and parms.mat,
plus a 4-variable scipy compressed file. Adds regression test
reads_multiple_unpadded_compressed_elements; the existing
reads_compressed_mat_v5_elements modeled a single padded blob and so never
exercised the multi-element walk that real per-variable compression produces.

Fixes ESA-PhiLab#12

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_019PjQ6amGpaSTEbAv368SUc

Merging this pull request may close: native v5 .mat parser fails on standard MATLAB/scipy-saved files (small-element offset mis-walk)