[203_26]: fix LaTeX export images in inaccessible formats #2958
divyansharma001 wants to merge 2 commits into MoganLab:main from
divyansharma001 commented on Mar 7, 2026
- Add magic header detection for images with unknown suffixes
- Add jpeg to fast-path suffix list in tmtex-as-eps
- Copy files with correct extension when format is LaTeX-compatible
- Add fallback to print-snippet when convert-to-file fails
- Register postscript-file converters for jpeg, gif, tif via ImageMagick
Pull request overview
This PR improves TeXmacs’ LaTeX export pipeline for embedded images, especially when the image suffix is missing/incorrect or when intermediate conversions are needed to reach a LaTeX-compatible format.
Changes:
- Add magic-header based format detection for unknown-suffix images (PDF/PNG/JPEG) and copy them with a LaTeX-friendly extension when possible.
- Improve conversion robustness by handling convert-to-file failures (fallback rendering) and expanding the “fast path” suffix list to include jpeg.
- Register additional ImageMagick-based converters (JPEG/GIF/TIF → postscript-file) to complete conversion chains toward PDF.
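The decision order described in the changes above can be sketched outside TeXmacs as a small Python function. This is only an illustration under stated assumptions: `plan_export`, `export_image`, the action names, and the `convert` callback are hypothetical and do not appear in the PR, and the conversion target is assumed to be PDF because the overview says the chains lead "toward PDF".

```python
# Suffixes the PR treats as already LaTeX-compatible (the "fast path" list).
LATEX_READY = ("eps", "pdf", "png", "jpg", "jpeg")


def plan_export(suffix, detected=None):
    """Pick an export action for one image (hypothetical helper).

    suffix:   file suffix without the dot, possibly empty or unknown
    detected: format found by header inspection ("pdf", "png", "jpeg") or None
    """
    if suffix in LATEX_READY:
        return ("use-as-is", suffix)      # fast path: nothing to convert
    if detected in ("pdf", "png", "jpeg"):
        return ("copy-as", detected)      # copy with a LaTeX-friendly extension
    return ("convert", "pdf")             # assumed target: conversion chain toward PDF


def export_image(suffix, detected, convert):
    """Run the plan; fall back to rendering when the conversion reports failure."""
    action, fmt = plan_export(suffix, detected)
    if action == "convert" and not convert():
        return ("render-fallback", fmt)   # stands in for the print-snippet fallback
    return (action, fmt)
```

For example, `export_image("gif", None, lambda: False)` models a GIF whose conversion fails, which ends in the rendering fallback rather than in a broken export.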
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| devel/203_26.md | Adds a dev note describing the problem, rationale, and manual test steps for LaTeX export image fixes. |
| TeXmacs/progs/convert/latex/tmtex.scm | Implements magic-header detection, suffix fast-path tweaks, copy-with-correct-extension behavior, and conversion failure fallback. |
| TeXmacs/plugins/latex/progs/convert/latex/tmtex.scm | Mirrors the same LaTeX export image handling changes for the LaTeX plugin. |
| TeXmacs/plugins/image/progs/image/jpeg.scm | Adds ImageMagick-based jpeg-file → postscript-file converter. |
| TeXmacs/plugins/image/progs/image/gif.scm | Adds ImageMagick-based gif-file → postscript-file converter. |
| TeXmacs/plugins/image/progs/image/tif.scm | Adds ImageMagick-based tif-file → postscript-file converter. |
    (define (tmtex-guess-format u suffix)
      "Detect image format via magic header when suffix is unknown."
      (if (and (url-exists? u) (== (format-from-suffix suffix) "generic"))
          (with data (string-load u)
            (cond ((and (> (string-length data) 8)
                        (== (char->integer (string-ref data 0)) #xff)
                        (== (char->integer (string-ref data 1)) #xd8))
                   "jpeg")
                  ((and (> (string-length data) 8)
                        (string-starts? data "\x89PNG"))
                   "png")
                  ((and (> (string-length data) 5)
                        (string-starts? data "%PDF-"))
                   "pdf")
                  (else #f)))
          #f))
tmtex-guess-format uses string-load which loads the entire file into memory just to inspect a few header bytes. This can be very expensive for large images during LaTeX export. Prefer reading only a small prefix (e.g., open the file and read the first ~8–16 bytes) and run the magic-header checks on that prefix.
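The prefix-only approach suggested in this comment can be illustrated with a short standalone sketch. It is written in Python rather than TeXmacs Scheme so it can run on its own; `guess_image_format` and `prefix_len` are hypothetical names, and the checks use only the signatures already present in the PR's code (`FF D8` for JPEG, `\x89PNG` for PNG, `%PDF-` for PDF).

```python
def guess_image_format(path, prefix_len=16):
    """Return "jpeg", "png", "pdf", or None by inspecting only the first bytes."""
    # Binary mode avoids any text decoding of the header bytes.
    with open(path, "rb") as f:
        header = f.read(prefix_len)  # reads at most prefix_len bytes, even for huge files

    if header.startswith(b"\xff\xd8"):   # JPEG start-of-image marker
        return "jpeg"
    if header.startswith(b"\x89PNG"):    # first four bytes of the PNG signature
        return "png"
    if header.startswith(b"%PDF-"):      # PDF header
        return "pdf"
    return None                          # unknown: let the caller convert or fall back
```

Because only `prefix_len` bytes are read, the cost of detection no longer grows with the image size. An empty or very short file simply matches none of the prefixes and yields `None`.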
    (if (and (url-exists? u)
             (in? suffix (list "eps" "pdf" "png" "jpg" "jpeg")))
        ;; Fast path: image already in LaTeX-compatible format
The fast-path suffix check uses suffix as returned by url-suffix, which is case-sensitive, so files like IMAGE.JPG will miss the fast path and be unnecessarily converted/fallback-rendered. Consider using a normalized suffix (e.g., locase-all) for the in? comparison (and similarly for any other suffix comparisons in this function).
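A minimal Python sketch of the normalization this comment proposes, assuming the suffix is lowercased once before the membership test. `normalized_suffix` and `is_fast_path` are hypothetical names; in the PR itself the equivalent would be applied to the value returned by url-suffix.

```python
import os

# Suffixes the PR's fast path accepts, all lowercase.
LATEX_READY = {"eps", "pdf", "png", "jpg", "jpeg"}


def normalized_suffix(name):
    """Return the file suffix without the dot, lowercased ("" if there is none)."""
    return os.path.splitext(name)[1].lstrip(".").lower()


def is_fast_path(name):
    """True when the normalized suffix is already LaTeX-compatible."""
    return normalized_suffix(name) in LATEX_READY
```

With this, `IMAGE.JPG` and `image.jpg` take the same branch, while a file with no suffix still falls through to detection or conversion.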
    ;; Conversion failed: fall back to rendering via TeXmacs
    (begin
      (print-snippet name-url
                     `(image ,name "0.618par" "" "" "") #t)
The convert-to-file failure fallback renders an (image ...) snippet with a hard-coded width of 0.618par, which will change the exported image size (and may not match the original document’s image sizing). Consider preserving the original image dimensions/magnification, or at least avoid forcing an arbitrary width in the fallback so the PDF’s natural size is used.
Suggested change:

    - `(image ,name "0.618par" "" "" "") #t)
    + `(image ,name "" "" "" "") #t)
      (let* ((u (url-relative current-save-target (unix->url name)))
             (suffix (url-suffix u))
    -        (fm (string-append (format-from-suffix suffix) "-file")))
    -   (if (and (url-exists? u) (in? suffix (list "eps" "pdf" "png" "jpg")))
    +        (detected (tmtex-guess-format u suffix))
    +        (fm (if detected
    +                (string-append detected "-file")
    +                (string-append (format-from-suffix suffix) "-file"))))
tmtex-guess-format reads the image using u computed relative to current-save-target, but u is only corrected for paths starting with ".." later (inside the conversion branch). This breaks magic-header detection (and the new copy-with-correct-extension path) for images referenced via "../...". Consider normalizing u (apply the existing ".." handling) before calling tmtex-guess-format, and reuse that corrected u in both the copy and convert paths.
Suggested change for tmtex-guess-format (read a short header instead of loading the whole file with string-load):

    - (with data (string-load u)
    -   (cond ((and (> (string-length data) 8)
    -               (== (char->integer (string-ref data 0)) #xff)
    -               (== (char->integer (string-ref data 1)) #xd8))
    -          "jpeg")
    -         ((and (> (string-length data) 8)
    -               (string-starts? data "\x89PNG"))
    -          "png")
    -         ((and (> (string-length data) 5)
    -               (string-starts? data "%PDF-"))
    + (let* ((path (url->string u))
    +        (header
    +         (with-input-from-file path
    +           (lambda ()
    +             ;; Read at most 16 characters from the file for magic-header checks
    +             (let loop ((i 0) (chars '()))
    +               (if (or (>= i 16) (eof-object? (peek-char)))
    +                   (list->string (reverse chars))
    +                   (loop (+ i 1) (cons (read-char) chars))))))))
    +   (cond ((and (> (string-length header) 8)
    +               (== (char->integer (string-ref header 0)) #xff)
    +               (== (char->integer (string-ref header 1)) #xd8))
    +          "jpeg")
    +         ((and (> (string-length header) 8)
    +               (string-starts? header "\x89PNG"))
    +          "png")
    +         ((and (> (string-length header) 5)
    +               (string-starts? header "%PDF-"))
- object_l1.cpp: decode base64 in tmscm_to_content for RAW_DATA nodes so Scheme->C++ path always stores binary in RAW_DATA tree
- embedded-edit.scm: encode binary image data as base64 before storing in (raw-data ...) to match the decode in object_l1.cpp

Invariant: Scheme layer always has base64, C++ layer always has binary
Thank you for the detailed analysis! You were right, the root cause was double base64 encoding. I've pushed two additional fixes:

Root cause: object_l1.cpp::tmscm_to_content was not decoding base64 when receiving (raw-data ...) from Scheme, so the C++ RAW_DATA tree ended up storing base64 text instead of binary. This base64 string then got written to the temp file via get_from_ramdisc, causing Qt/MuPDF to fail reading it.

Fixes:
- moebius/Scheme/L1/object_l1.cpp: Added base64 decode in tmscm_to_content for RAW_DATA nodes, so the Scheme→C++ boundary always produces binary in RAW_DATA

Regarding your second point, I've also kept jpg and png in the fast path from the previous commit. Happy to prioritize those formats further if needed.
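The invariant behind this fix (base64 text on the Scheme side, raw binary on the C++ side, with exactly one decode at the boundary) can be illustrated with a small Python sketch. The function names are hypothetical stand-ins for the two layers; they are not the actual TeXmacs functions.

```python
import base64


def scheme_side_store(binary):
    """Stand-in for the Scheme layer: image data is held as base64 text."""
    return base64.b64encode(binary).decode("ascii")


def cpp_side_receive(b64_text):
    """Stand-in for the Scheme->C++ boundary: decode exactly once to raw binary."""
    return base64.b64decode(b64_text)


png_signature = b"\x89PNG\r\n\x1a\n"
stored = scheme_side_store(png_signature)

# Correct path: one encode, one decode, so the original bytes come back.
round_trip = cpp_side_receive(stored)

# Buggy path described above: the decode is skipped, so base64 text is what
# would be written to the temp file and an image reader sees no PNG signature.
undecoded = stored.encode("ascii")
looks_like_png = undecoded.startswith(b"\x89PNG")
```

Here `round_trip` equals `png_signature`, while `looks_like_png` is false, which matches the symptom of an image library failing to read the temp file.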
Please resolve the conflicts.
I recommend splitting this PR into two separate PRs.

One PR would address the fix for the double Base64 encoding issue during LaTeX conversion. I think it only involves changes to two files:

The other PR would cover the magic header detection and the other features currently included in this PR.

This separation would make the review and testing process clearer and more manageable.