[203_26]: fix LaTeX export images in inaccessible formats by divyansharma001 · Pull Request #2958 · MoganLab/mogan

divyansharma001 · 2026-03-07T18:16:43Z

Add magic header detection for images with unknown suffixes
Add jpeg to fast-path suffix list in tmtex-as-eps
Copy files with correct extension when format is LaTeX-compatible
Add fallback to print-snippet when convert-to-file fails
Register postscript-file converters for jpeg, gif, tif via ImageMagick

Copilot

Pull request overview

This PR improves TeXmacs’ LaTeX export pipeline for embedded images, especially when the image suffix is missing/incorrect or when intermediate conversions are needed to reach a LaTeX-compatible format.

Changes:

Add magic-header based format detection for unknown-suffix images (PDF/PNG/JPEG) and copy them with a LaTeX-friendly extension when possible.
Improve conversion robustness by handling convert-to-file failures (fallback rendering) and expanding the “fast path” suffix list to include jpeg.
Register additional ImageMagick-based converters (JPEG/GIF/TIF → postscript-file) to complete conversion chains toward PDF.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

File	Description
devel/203_26.md	Adds a dev note describing the problem, rationale, and manual test steps for LaTeX export image fixes.
TeXmacs/progs/convert/latex/tmtex.scm	Implements magic-header detection, suffix fast-path tweaks, copy-with-correct-extension behavior, and conversion failure fallback.
TeXmacs/plugins/latex/progs/convert/latex/tmtex.scm	Mirrors the same LaTeX export image handling changes for the LaTeX plugin.
TeXmacs/plugins/image/progs/image/jpeg.scm	Adds ImageMagick-based `jpeg-file` → `postscript-file` converter.
TeXmacs/plugins/image/progs/image/gif.scm	Adds ImageMagick-based `gif-file` → `postscript-file` converter.
TeXmacs/plugins/image/progs/image/tif.scm	Adds ImageMagick-based `tif-file` → `postscript-file` converter.

Copilot · 2026-03-07T18:24:31Z

TeXmacs/plugins/latex/progs/convert/latex/tmtex.scm

+(define (tmtex-guess-format u suffix)
+  "Detect image format via magic header when suffix is unknown."
+  (if (and (url-exists? u) (== (format-from-suffix suffix) "generic"))
+      (with data (string-load u)
+        (cond ((and (> (string-length data) 8)
+                    (== (char->integer (string-ref data 0)) #xff)
+                    (== (char->integer (string-ref data 1)) #xd8))
+               "jpeg")
+              ((and (> (string-length data) 8)
+                    (string-starts? data "\x89PNG"))
+               "png")
+              ((and (> (string-length data) 5)
+                    (string-starts? data "%PDF-"))
+               "pdf")
+              (else #f)))
+      #f))


tmtex-guess-format uses string-load which loads the entire file into memory just to inspect a few header bytes. This can be very expensive for large images during LaTeX export. Prefer reading only a small prefix (e.g., open the file and read the first ~8–16 bytes) and run the magic-header checks on that prefix.
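The prefix-only approach suggested here can be sketched outside Scheme. The following is a minimal C++ illustration, not code from this PR: `read_prefix` and `guess_format_from_header` are hypothetical names, and the checks use the shortest length each signature needs rather than the `> 8` guard in the diff.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Read at most n bytes from the start of a file, in binary mode.
// Returns an empty string if the file cannot be opened.
std::string read_prefix(const std::string& path, std::size_t n = 16) {
    std::ifstream in(path, std::ios::binary);
    std::string buf(n, '\0');
    in.read(&buf[0], static_cast<std::streamsize>(n));
    buf.resize(static_cast<std::size_t>(in.gcount()));
    return buf;
}

// Classify an image from its leading bytes.
// Returns "jpeg", "png", "pdf", or "" when the header is not recognized.
std::string guess_format_from_header(const std::string& header) {
    // JPEG files start with the two-byte SOI marker FF D8.
    if (header.size() >= 2 &&
        static_cast<unsigned char>(header[0]) == 0xFF &&
        static_cast<unsigned char>(header[1]) == 0xD8)
        return "jpeg";
    // PNG files start with 89 'P' 'N' 'G'; only these four bytes are checked.
    if (header.compare(0, 4, "\x89PNG") == 0) return "png";
    // PDF files start with the ASCII text "%PDF-".
    if (header.compare(0, 5, "%PDF-") == 0) return "pdf";
    return "";
}
```

A caller would combine the two as `guess_format_from_header(read_prefix(path))`, so the cost stays constant regardless of image size.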

Copilot · 2026-03-07T18:24:32Z

TeXmacs/plugins/latex/progs/convert/latex/tmtex.scm

+    (if (and (url-exists? u)
+             (in? suffix (list "eps" "pdf" "png" "jpg" "jpeg")))
+        ;; Fast path: image already in LaTeX-compatible format


The fast-path suffix check uses suffix as returned by url-suffix, which is case-sensitive, so files like IMAGE.JPG will miss the fast path and be unnecessarily converted/fallback-rendered. Consider using a normalized suffix (e.g., locase-all) for the in? comparison (and similarly for any other suffix comparisons in this function).
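To illustrate the normalization this comment asks for, here is a small C++ sketch under stated assumptions: `lowercase_ascii` and `is_latex_ready_suffix` are hypothetical helpers, and the suffix list is copied from the diff above. The Scheme code would do the equivalent with `locase-all`.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <string>

// Lowercase a string byte by byte (sufficient for ASCII file suffixes).
std::string lowercase_ascii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// True when the suffix, ignoring case, is one of the fast-path formats.
bool is_latex_ready_suffix(const std::string& suffix) {
    static const std::array<const char*, 5> ready = {
        "eps", "pdf", "png", "jpg", "jpeg"};
    const std::string s = lowercase_ascii(suffix);
    return std::any_of(ready.begin(), ready.end(),
                       [&s](const char* r) { return s == r; });
}
```

With this check, `IMAGE.JPG` and `image.jpg` take the same branch.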

Copilot · 2026-03-07T18:24:32Z

TeXmacs/plugins/latex/progs/convert/latex/tmtex.scm

+                    ;; Conversion failed: fall back to rendering via TeXmacs
+                    (begin
+                      (print-snippet name-url
+                                     `(image ,name "0.618par" "" "" "") #t)


The convert-to-file failure fallback renders an (image ...) snippet with a hard-coded width of 0.618par, which will change the exported image size (and may not match the original document’s image sizing). Consider preserving the original image dimensions/magnification, or at least avoid forcing an arbitrary width in the fallback so the PDF’s natural size is used.

Suggested change

-                                     `(image ,name "0.618par" "" "" "") #t)
+                                     `(image ,name "" "" "" "") #t)

Copilot · 2026-03-07T18:24:32Z

TeXmacs/progs/convert/latex/tmtex.scm

  (let* ((u (url-relative current-save-target (unix->url name)))
         (suffix (url-suffix u))
-         (fm (string-append (format-from-suffix suffix) "-file")))
-    (if (and (url-exists? u) (in? suffix (list "eps" "pdf" "png" "jpg")))
+         (detected (tmtex-guess-format u suffix))
+         (fm (if detected
+                 (string-append detected "-file")
+                 (string-append (format-from-suffix suffix) "-file"))))


tmtex-guess-format reads the image using u computed relative to current-save-target, but u is only corrected for paths starting with ".." later (inside the conversion branch). This breaks magic-header detection (and the new copy-with-correct-extension path) for images referenced via "../...". Consider normalizing u (apply the existing ".." handling) before calling tmtex-guess-format, and reuse that corrected u in both the copy and convert paths.
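The ordering issue can be shown with a generic path example. This is only a sketch of the "resolve once, then reuse" idea using C++17 `std::filesystem`; it does not reproduce the existing ".." handling in `tmtex.scm`, and `resolve_image_path` is a hypothetical name.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Resolve an image reference against the directory of the export target and
// collapse "." and ".." components lexically (no filesystem access).
fs::path resolve_image_path(const fs::path& save_target,
                            const std::string& name) {
    const fs::path ref(name);
    if (ref.is_absolute()) return ref.lexically_normal();
    return (save_target.parent_path() / ref).lexically_normal();
}
```

The resolved path would then be the single value passed to format detection, to the copy step, and to the conversion step, so all three see the same file.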

Copilot · 2026-03-07T18:24:32Z

TeXmacs/progs/convert/latex/tmtex.scm

+      (with data (string-load u)
+        (cond ((and (> (string-length data) 8)
+                    (== (char->integer (string-ref data 0)) #xff)
+                    (== (char->integer (string-ref data 1)) #xd8))
+               "jpeg")
+              ((and (> (string-length data) 8)
+                    (string-starts? data "\x89PNG"))
+               "png")
+              ((and (> (string-length data) 5)
+                    (string-starts? data "%PDF-"))


tmtex-guess-format uses string-load which loads the entire file into memory just to inspect a few header bytes. This can be very expensive for large images during LaTeX export. Prefer reading only a small prefix (e.g., open the file and read the first ~8–16 bytes) and run the magic-header checks on that prefix.

Suggested change

-      (with data (string-load u)
-        (cond ((and (> (string-length data) 8)
-                    (== (char->integer (string-ref data 0)) #xff)
-                    (== (char->integer (string-ref data 1)) #xd8))
-               "jpeg")
-              ((and (> (string-length data) 8)
-                    (string-starts? data "\x89PNG"))
-               "png")
-              ((and (> (string-length data) 5)
-                    (string-starts? data "%PDF-"))
+      (let* ((path (url->string u))
+             (header
+              (with-input-from-file path
+                (lambda ()
+                  ;; Read at most 16 characters from the file for magic-header checks
+                  (let loop ((i 0) (chars '()))
+                    (if (or (>= i 16) (eof-object? (peek-char)))
+                        (list->string (reverse chars))
+                        (loop (+ i 1) (cons (read-char) chars))))))))
+        (cond ((and (> (string-length header) 8)
+                    (== (char->integer (string-ref header 0)) #xff)
+                    (== (char->integer (string-ref header 1)) #xd8))
+               "jpeg")
+              ((and (> (string-length header) 8)
+                    (string-starts? header "\x89PNG"))
+               "png")
+              ((and (> (string-length header) 5)
+                    (string-starts? header "%PDF-"))

Copilot · 2026-03-07T18:24:32Z

TeXmacs/progs/convert/latex/tmtex.scm

+    (if (and (url-exists? u)
+             (in? suffix (list "eps" "pdf" "png" "jpg" "jpeg")))
+        ;; Fast path: image already in LaTeX-compatible format


The fast-path suffix check uses suffix as returned by url-suffix, which is case-sensitive, so files like IMAGE.JPG will miss the fast path and be unnecessarily converted/fallback-rendered. Consider using a normalized suffix (e.g., locase-all) for the in? comparison (and similarly for any other suffix comparisons in this function).

Copilot · 2026-03-07T18:24:33Z

TeXmacs/progs/convert/latex/tmtex.scm

+                    ;; Conversion failed: fall back to rendering via TeXmacs
+                    (begin
+                      (print-snippet name-url
+                                     `(image ,name "0.618par" "" "" "") #t)


The convert-to-file failure fallback renders an (image ...) snippet with a hard-coded width of 0.618par, which will change the exported image size (and may not match the original document’s image sizing). Consider preserving the original image dimensions/magnification, or at least avoid forcing an arbitrary width in the fallback so the PDF’s natural size is used.

Suggested change

-                                     `(image ,name "0.618par" "" "" "") #t)
+                                     `(image ,name "" "" "" "") #t)

Copilot · 2026-03-07T18:24:33Z

TeXmacs/plugins/latex/progs/convert/latex/tmtex.scm

  (let* ((u (url-relative current-save-target (unix->url name)))
         (suffix (url-suffix u))
-         (fm (string-append (format-from-suffix suffix) "-file")))
-    (if (and (url-exists? u) (in? suffix (list "eps" "pdf" "png" "jpg")))
+         (detected (tmtex-guess-format u suffix))
+         (fm (if detected
+                 (string-append detected "-file")
+                 (string-append (format-from-suffix suffix) "-file"))))


tmtex-guess-format reads the image using u computed relative to current-save-target, but u is only corrected for paths starting with ".." later (inside the conversion branch). This breaks magic-header detection (and the new copy-with-correct-extension path) for images referenced via "../...". Consider normalizing u (apply the existing ".." handling) before calling tmtex-guess-format, and reuse that corrected u in both the copy and convert paths.

AXeonV · 2026-03-08T06:20:03Z

I inserted a JPEG and exported the document as LaTeX. It then threw the error above.

First, I think the key point of the LaTeX export bug is that when an embedded image is inserted and the document is converted to LaTeX, the raw image data is base64-encoded so that it can be passed between C++ and Scheme.

I dug into the bug and found that the image data had been base64-encoded twice somewhere in memory. During conversion, it calls

url
get_from_ramdisc (url u) {
  if (!is_ramdisc (u)) return url_none ();
  url res= get_cache (u);
  if (!is_none (res)) return (res);
  url tmp= url_temp (suffix (u));
  save_string (tmp, u[1][2]->t->label);
  return set_cache (u, tmp);
}

which saves the encoded image data to the system temp path. I located that file and found that its contents had been base64-encoded twice, so converting this image to PDF crashed.

It seems that your fix does not repair the bad data in the temp path during LaTeX conversion.

Second, I think jpg and png are more common than jpeg, tif, and gif. You could fix embedded jpg and png images in LaTeX conversion first.

Lastly, thank you for this PR. This bug is very critical.
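One way to confirm the double encoding described above is to decode the temp file repeatedly and count how many passes it takes before a JPEG signature appears. The sketch below is an illustration only: `b64_encode`, `b64_decode`, and `jpeg_base64_layers` are hypothetical helpers written for this example, not functions from Mogan or TeXmacs.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

static const char kB64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 encoding with '=' padding.
std::string b64_encode(const std::string& in) {
    std::string out;
    std::uint32_t acc = 0;
    int bits = 0;
    for (unsigned char c : in) {
        acc = (acc << 8) | c;
        bits += 8;
        while (bits >= 6) {
            bits -= 6;
            out.push_back(kB64[(acc >> bits) & 0x3F]);
        }
    }
    if (bits > 0) out.push_back(kB64[(acc << (6 - bits)) & 0x3F]);
    while (out.size() % 4 != 0) out.push_back('=');
    return out;
}

// Lenient base64 decoding: characters outside the alphabet
// (padding, whitespace) are skipped.
std::string b64_decode(const std::string& in) {
    std::string out;
    std::uint32_t acc = 0;
    int bits = 0;
    for (char c : in) {
        const char* p = (c == '\0') ? nullptr : std::strchr(kB64, c);
        if (p == nullptr) continue;
        acc = (acc << 6) | static_cast<std::uint32_t>(p - kB64);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<char>((acc >> bits) & 0xFF));
        }
    }
    return out;
}

// Count the base64 layers wrapping a JPEG payload (0 means raw binary).
// Returns -1 if no JPEG signature appears within max_layers decodes.
int jpeg_base64_layers(std::string data, int max_layers = 3) {
    for (int layer = 0; layer <= max_layers; ++layer) {
        if (data.size() >= 2 &&
            static_cast<unsigned char>(data[0]) == 0xFF &&
            static_cast<unsigned char>(data[1]) == 0xD8)
            return layer;
        data = b64_decode(data);
    }
    return -1;
}
```

A temp file that an image reader can open directly should report 0 layers. A correctly base64-encoded JPEG starts with the text `/9j/` and reports 1, and the corrupted file described in this comment would report 2.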

- object_l1.cpp: decode base64 in tmscm_to_content for RAW_DATA nodes so Scheme->C++ path always stores binary in RAW_DATA tree
- embedded-edit.scm: encode binary image data as base64 before storing in (raw-data ...) to match the decode in object_l1.cpp

Invariant: Scheme layer always has base64, C++ layer always has binary

divyansharma001 · 2026-03-15T09:49:20Z

Thank you for the detailed analysis! You were right, the root cause was double base64 encoding.

I've pushed two additional fixes:

Root cause: object_l1.cpp::tmscm_to_content was not decoding base64 when receiving (raw-data ...) from Scheme, so the C++ RAW_DATA tree ended up storing base64 text instead of binary. This base64 string then got written to the temp file via get_from_ramdisc, causing Qt/MuPDF to fail reading it.

Fixes:

moebius/Scheme/L1/object_l1.cpp: Added base64 decode in tmscm_to_content for RAW_DATA nodes, so the Scheme→C++ boundary always produces binary in RAW_DATA
TeXmacs/progs/generic/embedded-edit.scm: Encode binary image data as base64 before storing into (raw-data ...) when embedding an image, to match the decode above
This establishes a clear invariant: Scheme layer always holds base64 in raw-data, C++ layer always holds binary in RAW_DATA.

Regarding your second point, I've also kept jpg and png in the fast path from the previous commit.

Happy to prioritize those formats further if needed.

JackYansongLi · 2026-03-18T08:28:39Z

Please resolve the conflicts.

AXeonV · 2026-03-19T09:13:08Z

I recommend splitting this PR into two separate PRs.

One PR would address the fix for the double Base64 encoding issue during LaTeX conversion. I think it only involves changes to two files: object_l1.cpp and embedded-edit.scm.

The other PR would cover the magic header detection and the other features currently included in this PR.

This separation would make the review and testing process clearer and more manageable.

Copilot AI review requested due to automatic review settings March 7, 2026 18:16

Copilot started reviewing on behalf of divyansharma001 March 7, 2026 18:17 View session

Copilot AI reviewed Mar 7, 2026

JackYansongLi assigned AXeonV Mar 8, 2026

4 participants