From 0ac346e516311398ebc6de372d8b7dbf142a9f97 Mon Sep 17 00:00:00 2001
From: eebette
Date: Tue, 28 Apr 2026 16:20:03 +0900
Subject: [PATCH 1/8] Captioned m.image upload via markdown image links
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Markdown image references in the body that point at http(s) URLs
(`![alt](https://...)`) now cause the bot to fetch the URL, upload to
the homeserver's media repo, and emit a single `m.image` event whose
`url` is the resulting `mxc://` URI. The remaining body (with all
markdown image links stripped) becomes the caption (`body` and
`formatted_body`). The `filename` field carries the URL's basename so
that MSC4193-aware clients render the message as image-with-caption
rather than treating `body` as the file name.

Bodies without image links continue to send as `m.text` events
unchanged. If the fetch or upload fails, the original body is sent as
`m.text` and a warning is logged, so a flaky upstream CDN never drops
the message.

The previous inline `<img>` approach in m.text formatted_body rendered
correctly on Element Web but was a known no-op on Element X Android
(missing feature, see element-hq/element-x-android#1874).
MSC2530/MSC4193-style captioned m.image is now the standard way both
clients render image-with-text in a single event.

New module `matrix_webhook/media.py` exposes:
- `upload_from_url(url)`: async fetch + media-repo upload, returns the
  mxc URI plus mimetype, size, and filename.
- `captioned_image_or_text(body)`: returns an m.image event content
  dict if `body` has any markdown image references, else None.

`handler.py` calls `captioned_image_or_text` between body parse and
send; if it returns content, that is what is sent. Otherwise the
existing m.text composition is used. The `formatted_body` escape hatch
is unchanged.

Tests cover four cases: image-with-caption, image-only-no-caption,
no-image-falls-back-to-text, and failed-upload-falls-back-to-text.
Tests use a stdlib threaded HTTPServer fixture to avoid event-loop deadlocks with the bot's async client. Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 6 ++ README.md | 22 +++++ matrix_webhook/handler.py | 31 +++++-- matrix_webhook/media.py | 105 +++++++++++++++++++++ tests/test_image.py | 187 ++++++++++++++++++++++++++++++++++++++ 5 files changed, 342 insertions(+), 9 deletions(-) create mode 100644 matrix_webhook/media.py create mode 100644 tests/test_image.py diff --git a/CHANGELOG.md b/CHANGELOG.md index 9ce298d..62d44a7 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -13,6 +13,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 in [#169](https://github.com/nim65s/matrix-webhook/pull/243) by [@nim65s](https://github.com/nim65s) - setup mergify +- captioned image support: markdown image links pointing to http(s) URLs in + the body (`![alt](https://...)`) cause the bot to fetch the URL, upload + to the homeserver media repo, and emit a single `m.image` event with the + remaining body as caption and a `filename` field per MSC4193. Bodies + without image links continue to send as `m.text`. Failed uploads fall + back to `m.text` with the original body unchanged, with a warning logged. ## [v3.9.1] - 2024-03-09 diff --git a/README.md b/README.md index 65afe39..7f6205d 100644 --- a/README.md +++ b/README.md @@ -105,6 +105,28 @@ curl -d '{"body":"new contrib from toto: [44](http://radio.localhost/map/#44)", (or localhost:4785 without docker) +### Captioned images + +When the body contains a markdown image link with an http(s) URL +(`![alt](https://...)`), the bot fetches the URL, uploads the bytes to +the homeserver media repo, and sends a single `m.image` event whose +`url` is the resulting `mxc://` URI. 
The rest of the body — with all +markdown image links stripped — becomes the caption (`body` and +`formatted_body`), and the original URL's basename is preserved in the +`filename` field so [MSC4193](https://github.com/matrix-org/matrix-spec-proposals/pull/4193)-aware +clients render the image with a caption rather than treating `body` as +the file name. + +``` +curl -d '{"body":"**Title**\n\n![poster](https://example.com/poster.png)\n\nDescription text.", "key":"secret"}' \ + 'http://matrixwebhook.localhost/!DPrUlnwOhBEfYwsDLh:matrix.org' +``` + +Bodies with no image links continue to send as `m.text` events +unchanged. If the fetch or upload fails, the original body — including +the markdown image markup — is sent as `m.text` and a warning is +logged, so a flaky upstream CDN never drops a message. + ### For Github Add a JSON webhook with `?formatter=github`, and put the `API_KEY` as secret diff --git a/matrix_webhook/handler.py b/matrix_webhook/handler.py index ed40449..dad4f31 100644 --- a/matrix_webhook/handler.py +++ b/matrix_webhook/handler.py @@ -7,7 +7,7 @@ from markdown import markdown -from . import conf, formatters, utils +from . import conf, formatters, media, utils LOGGER = logging.getLogger("matrix_webhook.handler") @@ -80,20 +80,33 @@ async def matrix_webhook(request): if data["key"] != conf.API_KEY: return utils.create_json_response(HTTPStatus.UNAUTHORIZED, "Invalid API key") + # If a `formatted_body` is supplied directly, the caller is taking full + # control of HTML rendering: pass through as `m.text` unchanged. + # Otherwise, look for markdown image links in `body`. The presence of + # any such link upgrades the event to `m.image` with the (stripped) + # body as caption, so MSC4193-aware clients render image-with-caption + # inline. The fallback for no images / failed upload is `m.text`. 
if "formatted_body" in data: - formatted_body = data["formatted_body"] + content = { + "msgtype": "m.text", + "body": data["body"], + "format": "org.matrix.custom.html", + "formatted_body": data["formatted_body"], + } else: - formatted_body = markdown(str(data["body"]), extensions=["extra"]) + body = str(data["body"]) + content = await media.captioned_image_or_text(body) + if content is None: + content = { + "msgtype": "m.text", + "body": body, + "format": "org.matrix.custom.html", + "formatted_body": markdown(body, extensions=["extra"]), + } # try to join room first -> non none response means error resp = await utils.join_room(data["room_id"]) if resp is not None: return resp - content = { - "msgtype": "m.text", - "body": data["body"], - "format": "org.matrix.custom.html", - "formatted_body": formatted_body, - } return await utils.send_room_message(data["room_id"], content) diff --git a/matrix_webhook/media.py b/matrix_webhook/media.py new file mode 100644 index 0000000..85c68ad --- /dev/null +++ b/matrix_webhook/media.py @@ -0,0 +1,105 @@ +"""Matrix Webhook media upload helpers.""" + +import logging +import re +from http import HTTPStatus +from pathlib import PurePosixPath +from urllib.parse import urlparse + +import aiohttp +from markdown import markdown +from nio.responses import UploadError + +from . import utils + +LOGGER = logging.getLogger("matrix_webhook.media") + +# Markdown image syntax with an http(s) URL: ``![alt](url)``. +_MD_IMG_RE = re.compile(r"!\[([^\]]*)\]\((https?://[^)\s]+)\)") + + +async def upload_from_url(url): + """Fetch ``url`` and upload it to the homeserver media repo. + + Returns ``(mxc_uri, mimetype, size, filename)``. Raises ``ValueError`` + if either the fetch or the upload fails. 
+ """ + msg = f"Fetching image from {url=}" + LOGGER.debug(msg) + + async with aiohttp.ClientSession() as session, session.get(url) as resp: + if resp.status != HTTPStatus.OK: + msg = f"Failed to fetch {url}: HTTP {resp.status}" + raise ValueError(msg) + content_type = resp.headers.get("Content-Type", "application/octet-stream") + image_bytes = await resp.read() + + filename = PurePosixPath(urlparse(url).path).name or "image" + + msg = f"Uploading {len(image_bytes)} bytes as {filename=} ({content_type=})" + LOGGER.debug(msg) + + upload_resp, _ = await utils.CLIENT.upload( + lambda got_429, got_timeouts: image_bytes, + content_type=content_type, + filename=filename, + filesize=len(image_bytes), + ) + + if isinstance(upload_resp, UploadError): + msg = f"Failed to upload {url}: {upload_resp.message}" + raise ValueError(msg) + + return upload_resp.content_uri, content_type, len(image_bytes), filename + + +async def captioned_image_or_text(body): + """Build an ``m.image`` event content from markdown image links in ``body``. + + If ``body`` contains at least one ``![alt](http(s)://...)`` reference, + fetches and uploads the FIRST one to the homeserver media repo and + returns an ``m.image`` event content dict with the URL set to the + resulting ``mxc://`` URI. The caption is the rest of ``body`` with all + markdown image links stripped, rendered as both plain text (``body``) + and HTML (``formatted_body``). The ``filename`` field is set to the + original URL's basename so MSC4193-aware clients render the image with + a caption rather than treating ``body`` as the filename. + + Returns ``None`` if ``body`` has no image references, or if the upload + fails. The caller is responsible for falling back to an ``m.text`` + event in either case. 
+ """ + matches = list(_MD_IMG_RE.finditer(body)) + if not matches: + return None + + first = matches[0] + try: + mxc, mimetype, size, filename = await upload_from_url(first.group(2)) + except ValueError as e: + msg = f"Image upload skipped, falling back to text: {e}" + LOGGER.warning(msg) + return None + + # Strip ALL image refs from the body — the first becomes the m.image + # url, additional ones (rare) would require multiple events which we do + # not emit; leaving them in the caption as raw markdown would render as + # plain-text URLs in Element X anyway. Collapse the blank-line gap that + # the strip leaves around the image link. + caption = _MD_IMG_RE.sub("", body) + caption = re.sub(r"\n{3,}", "\n\n", caption).strip() + + content = { + "msgtype": "m.image", + "url": mxc, + "filename": filename, + "info": {"mimetype": mimetype, "size": size}, + } + if caption: + content["body"] = caption + content["format"] = "org.matrix.custom.html" + content["formatted_body"] = markdown(caption, extensions=["extra"]) + else: + content["body"] = filename + + return content diff --git a/tests/test_image.py b/tests/test_image.py new file mode 100644 index 0000000..b6e672e --- /dev/null +++ b/tests/test_image.py @@ -0,0 +1,187 @@ +"""Test module for the captioned ``m.image`` upload feature.""" + +import struct +import threading +import unittest +import zlib +from http.server import BaseHTTPRequestHandler, HTTPServer + +import httpx +import nio + +from .start import BOT_URL, FULL_ID, KEY, MATRIX_ID, MATRIX_PW, MATRIX_URL + + +def _tiny_png(): + """Generate a minimal valid 1x1 RGBA PNG, no external fixture needed.""" + + def chunk(name, data): + return ( + struct.pack(">I", len(data)) + + name + + data + + struct.pack(">I", zlib.crc32(name + data)) + ) + + ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 6, 0, 0, 0) + idat = zlib.compress(b"\x00\x00\x00\x00\x00") # filter + transparent pixel + return ( + b"\x89PNG\r\n\x1a\n" + + chunk(b"IHDR", ihdr) + + chunk(b"IDAT", idat) + + 
chunk(b"IEND", b"") + ) + + +PNG_BYTES = _tiny_png() +FIXTURE_PORT = 4786 +FIXTURE_URL = f"http://localhost:{FIXTURE_PORT}/poster.png" + + +class _FixtureHandler(BaseHTTPRequestHandler): + """Serve PNG_BYTES at /poster.png; 404 elsewhere.""" + + def do_GET(self): + """Handle the GET request.""" + if self.path == "/poster.png": + self.send_response(200) + self.send_header("Content-Type", "image/png") + self.send_header("Content-Length", str(len(PNG_BYTES))) + self.end_headers() + self.wfile.write(PNG_BYTES) + else: + self.send_response(404) + self.end_headers() + + def log_message(self, format, *args): # noqa: A002 + """Silence the default access log.""" + + +class CaptionedImageTest(unittest.IsolatedAsyncioTestCase): + """Verify markdown image links produce captioned ``m.image`` events.""" + + @classmethod + def setUpClass(cls): + """Start a threaded HTTP fixture server for the whole test class.""" + cls.server = HTTPServer(("localhost", FIXTURE_PORT), _FixtureHandler) + cls.thread = threading.Thread(target=cls.server.serve_forever, daemon=True) + cls.thread.start() + + @classmethod + def tearDownClass(cls): + """Shut down the fixture server.""" + cls.server.shutdown() + cls.server.server_close() + cls.thread.join(timeout=2) + + async def test_image_with_caption(self): + """Body with a markdown image and surrounding text -> m.image with caption.""" + body = f"**Title**\n\n![poster]({FIXTURE_URL})\n\nDescription text." 
+ client = nio.AsyncClient(MATRIX_URL, MATRIX_ID) + + await client.login(MATRIX_PW) + room = await client.room_create() + + self.assertEqual( + httpx.post( + f"{BOT_URL}/{room.room_id}", + json={"body": body, "key": KEY}, + ).json(), + {"status": 200, "ret": "OK"}, + ) + + sync = await client.sync() + messages = await client.room_messages(room.room_id, sync.next_batch) + await client.close() + + msg = messages.chunk[0] + self.assertEqual(msg.sender, FULL_ID) + self.assertIsInstance(msg, nio.RoomMessageImage) + self.assertTrue(msg.url.startswith("mxc://")) + # Caption: image markdown stripped, text preserved + self.assertEqual(msg.body, "**Title**\n\nDescription text.") + # filename field disambiguates body-as-caption (MSC4193) + src = msg.source["content"] + self.assertEqual(src["filename"], "poster.png") + self.assertEqual(src["info"]["mimetype"], "image/png") + self.assertEqual(src["info"]["size"], len(PNG_BYTES)) + self.assertIn("Title", src["formatted_body"]) + + async def test_image_only_no_caption(self): + """Body that is just a markdown image -> m.image with body=filename.""" + body = f"![poster]({FIXTURE_URL})" + client = nio.AsyncClient(MATRIX_URL, MATRIX_ID) + + await client.login(MATRIX_PW) + room = await client.room_create() + + self.assertEqual( + httpx.post( + f"{BOT_URL}/{room.room_id}", + json={"body": body, "key": KEY}, + ).json(), + {"status": 200, "ret": "OK"}, + ) + + sync = await client.sync() + messages = await client.room_messages(room.room_id, sync.next_batch) + await client.close() + + msg = messages.chunk[0] + self.assertIsInstance(msg, nio.RoomMessageImage) + # No caption -> body falls back to filename + self.assertEqual(msg.body, "poster.png") + # No formatted_body when there is no caption + self.assertNotIn("formatted_body", msg.source["content"]) + + async def test_no_image_falls_back_to_text(self): + """Body without any markdown image -> existing m.text path.""" + body = "Plain text, no image." 
+ client = nio.AsyncClient(MATRIX_URL, MATRIX_ID) + + await client.login(MATRIX_PW) + room = await client.room_create() + + self.assertEqual( + httpx.post( + f"{BOT_URL}/{room.room_id}", + json={"body": body, "key": KEY}, + ).json(), + {"status": 200, "ret": "OK"}, + ) + + sync = await client.sync() + messages = await client.room_messages(room.room_id, sync.next_batch) + await client.close() + + msg = messages.chunk[0] + self.assertIsInstance(msg, nio.RoomMessageText) + self.assertEqual(msg.body, body) + + async def test_failed_image_falls_back_to_text(self): + """If the upload fails, the original body is sent unchanged as m.text.""" + bad_url = f"http://localhost:{FIXTURE_PORT}/missing.png" + body = f"caption text\n\n![poster]({bad_url})" + client = nio.AsyncClient(MATRIX_URL, MATRIX_ID) + + await client.login(MATRIX_PW) + room = await client.room_create() + + self.assertEqual( + httpx.post( + f"{BOT_URL}/{room.room_id}", + json={"body": body, "key": KEY}, + ).json(), + {"status": 200, "ret": "OK"}, + ) + + sync = await client.sync() + messages = await client.room_messages(room.room_id, sync.next_batch) + await client.close() + + msg = messages.chunk[0] + self.assertIsInstance(msg, nio.RoomMessageText) + self.assertIn(bad_url, msg.body) + # No m.image event sent + for event in messages.chunk: + self.assertNotIsInstance(event, nio.RoomMessageImage) From 9155d522eb242ed10029135f16df70f25ad894f9 Mon Sep 17 00:00:00 2001 From: eebette Date: Tue, 28 Apr 2026 16:45:58 +0900 Subject: [PATCH 2/8] Strip orphan markdown image refs from m.text fallback MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When `body` contains a markdown image reference whose URL is empty or non-http (e.g. `![poster]()` produced by a templating engine like Jellyseerr's `{{image}}` resolving to empty for events with no associated media), the fallback m.text path would otherwise emit a broken `<img>` tag.
The new `strip_orphan_image_links` helper drops those references before the markdown-to-HTML render. http(s) refs are preserved so the upload-failure fallback path still surfaces the attempted URL. The helper only normalizes whitespace if it actually stripped something; bodies without orphan refs (including formatter outputs with intentional trailing newlines) pass through unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) --- matrix_webhook/handler.py | 4 ++++ matrix_webhook/media.py | 26 ++++++++++++++++++++++++++ tests/test_image.py | 26 ++++++++++++++++++++++++++ 3 files changed, 56 insertions(+) diff --git a/matrix_webhook/handler.py b/matrix_webhook/handler.py index dad4f31..24eba9b 100644 --- a/matrix_webhook/handler.py +++ b/matrix_webhook/handler.py @@ -97,6 +97,10 @@ async def matrix_webhook(request): body = str(data["body"]) content = await media.captioned_image_or_text(body) if content is None: + # Drop empty / non-http markdown image refs so they don't render + # as broken tags. http(s) refs are preserved (so an + # upload-failure URL stays visible). + body = media.strip_orphan_image_links(body) content = { "msgtype": "m.text", "body": body, diff --git a/matrix_webhook/media.py b/matrix_webhook/media.py index 85c68ad..30133dc 100644 --- a/matrix_webhook/media.py +++ b/matrix_webhook/media.py @@ -16,6 +16,32 @@ # Markdown image syntax with an http(s) URL: ``![alt](url)``. _MD_IMG_RE = re.compile(r"!\[([^\]]*)\]\((https?://[^)\s]+)\)") +# Permissive markdown image regex (any URL contents, including empty). +_MD_ANY_IMG_RE = re.compile(r"!\[([^\]]*)\]\(([^)]*)\)") + + +def strip_orphan_image_links(body): + """Drop markdown image refs whose URL is empty or not http(s). + + Templating engines on the sender side (e.g. Jellyseerr's + ``{{image}}``) can produce ``![alt]()`` for events with no + associated media, which the markdown renderer would emit as a + broken ``<img>`` tag.
http(s) refs are preserved so the + upload-failure fallback path still shows the user the URL. + """ + + def _decide(match): + url = (match.group(2) or "").strip() + if url.startswith(("http://", "https://")): + return match.group(0) + return "" + + out = _MD_ANY_IMG_RE.sub(_decide, body) + if out == body: + # Nothing stripped; return as-is to preserve whitespace that + # other paths (e.g. formatter outputs) may rely on. + return body + return re.sub(r"\n{3,}", "\n\n", out).strip() async def upload_from_url(url): diff --git a/tests/test_image.py b/tests/test_image.py index b6e672e..5a9c1c9 100644 --- a/tests/test_image.py +++ b/tests/test_image.py @@ -185,3 +185,29 @@ async def test_failed_image_falls_back_to_text(self): # No m.image event sent for event in messages.chunk: self.assertNotIsInstance(event, nio.RoomMessageImage) + + async def test_orphan_empty_image_link_stripped(self): + """Empty `![alt]()` is stripped from m.text fallback so it doesn't render as a broken img.""" + body = "**Title**\n\n![poster]()\n\nDescription text." 
+ client = nio.AsyncClient(MATRIX_URL, MATRIX_ID) + + await client.login(MATRIX_PW) + room = await client.room_create() + + self.assertEqual( + httpx.post( + f"{BOT_URL}/{room.room_id}", + json={"body": body, "key": KEY}, + ).json(), + {"status": 200, "ret": "OK"}, + ) + + sync = await client.sync() + messages = await client.room_messages(room.room_id, sync.next_batch) + await client.close() + + msg = messages.chunk[0] + self.assertIsInstance(msg, nio.RoomMessageText) + self.assertEqual(msg.body, "**Title**\n\nDescription text.") + self.assertNotIn("![poster]", msg.body) + self.assertNotIn(" Date: Wed, 29 Apr 2026 12:01:31 +0900 Subject: [PATCH 3/8] media: catch aiohttp.ClientError on fetch, fall back to m.text MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Previously `upload_from_url` only raised `ValueError` on HTTP-status failure; transport-level errors (DNS resolution failure, connection refused, TLS errors) propagated as `aiohttp.ClientError` subclasses straight to the request handler, crashing it with HTTP 500 and dropping the entire message — even the text-only fallback `captioned_image_or_text` was meant to provide. Wrap the fetch in a `try`/`except aiohttp.ClientError` that re-raises as `ValueError`. The existing `captioned_image_or_text` handler already catches `ValueError` and logs a warning + falls back to `m.text`, so any network-level failure now degrades the same way an HTTP 4xx/5xx already does. Adds an integration test against an unreachable hostname (`http://this-host-does-not-exist.invalid/...`) that confirms the request returns 200 with the text body intact. 
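For readers skimming this series, the error-translation idea described above can be sketched roughly as follows. This is a simplified, stdlib-only illustration and not the code in this patch: `TransportError` is a hypothetical stand-in for `aiohttp.ClientError`, and `fetch` and `build_content` are made-up helpers that skip the real upload and the markdown parsing.

```python
import asyncio


class TransportError(Exception):
    """Hypothetical stand-in for aiohttp.ClientError (DNS, refused, TLS)."""


async def fetch(url):
    """Hypothetical fetcher: fails at the transport level for `.invalid` hosts."""
    if ".invalid" in url:
        raise TransportError(f"cannot resolve {url}")
    return b"image-bytes"


async def upload_from_url(url):
    """Translate transport errors into ValueError, mirroring the patch's intent."""
    try:
        return await fetch(url)
    except TransportError as e:
        msg = f"Failed to fetch {url}: {e!r}"
        raise ValueError(msg) from e


async def build_content(body, url):
    """Return m.image content on success, m.text content on any fetch failure."""
    try:
        data = await upload_from_url(url)
    except ValueError:
        # Single fallback branch: HTTP-status and transport failures look alike.
        return {"msgtype": "m.text", "body": body}
    return {"msgtype": "m.image", "body": body, "info": {"size": len(data)}}


ok = asyncio.run(build_content("caption", "http://example.com/poster.png"))
bad = asyncio.run(build_content("caption", "http://nope.invalid/poster.png"))
print(ok["msgtype"], bad["msgtype"])
```

The point of funnelling everything through `ValueError` is that the caller keeps one `except` branch, so a new failure mode in the fetch layer cannot bypass the text fallback.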
Co-Authored-By: Claude Opus 4.7 (1M context) --- matrix_webhook/media.py | 22 ++++++++++++++++------ tests/test_image.py | 27 +++++++++++++++++++++++++++ 2 files changed, 43 insertions(+), 6 deletions(-) diff --git a/matrix_webhook/media.py b/matrix_webhook/media.py index 30133dc..0169801 100644 --- a/matrix_webhook/media.py +++ b/matrix_webhook/media.py @@ -53,12 +53,22 @@ async def upload_from_url(url): msg = f"Fetching image from {url=}" LOGGER.debug(msg) - async with aiohttp.ClientSession() as session, session.get(url) as resp: - if resp.status != HTTPStatus.OK: - msg = f"Failed to fetch {url}: HTTP {resp.status}" - raise ValueError(msg) - content_type = resp.headers.get("Content-Type", "application/octet-stream") - image_bytes = await resp.read() + try: + async with aiohttp.ClientSession() as session, session.get(url) as resp: + if resp.status != HTTPStatus.OK: + msg = f"Failed to fetch {url}: HTTP {resp.status}" + raise ValueError(msg) + content_type = resp.headers.get( + "Content-Type", + "application/octet-stream", + ) + image_bytes = await resp.read() + except aiohttp.ClientError as e: + # DNS resolution, connection refused, TLS failures, etc. Convert to + # ValueError so the caller's existing ``except ValueError`` branch + # treats them like any other fetch failure (fall back to m.text). 
+ msg = f"Failed to fetch {url}: {e!r}" + raise ValueError(msg) from e filename = PurePosixPath(urlparse(url).path).name or "image" diff --git a/tests/test_image.py b/tests/test_image.py index 5a9c1c9..eba6450 100644 --- a/tests/test_image.py +++ b/tests/test_image.py @@ -186,6 +186,33 @@ async def test_failed_image_falls_back_to_text(self): for event in messages.chunk: self.assertNotIsInstance(event, nio.RoomMessageImage) + async def test_unreachable_image_host_falls_back_to_text(self): + """An image URL whose host doesn't resolve must NOT crash the request.""" + # Port closed on a non-existent hostname — produces a DNS or + # connection error (aiohttp.ClientError, not ValueError). + bad_url = "http://this-host-does-not-exist.invalid/poster.png" + body = f"caption text\n\n![poster]({bad_url})" + client = nio.AsyncClient(MATRIX_URL, MATRIX_ID) + + await client.login(MATRIX_PW) + room = await client.room_create() + + self.assertEqual( + httpx.post( + f"{BOT_URL}/{room.room_id}", + json={"body": body, "key": KEY}, + ).json(), + {"status": 200, "ret": "OK"}, + ) + + sync = await client.sync() + messages = await client.room_messages(room.room_id, sync.next_batch) + await client.close() + + msg = messages.chunk[0] + self.assertIsInstance(msg, nio.RoomMessageText) + self.assertIn(bad_url, msg.body) + async def test_orphan_empty_image_link_stripped(self): """Empty `![alt]()` is stripped from m.text fallback so it doesn't render as a broken img.""" body = "**Title**\n\n![poster]()\n\nDescription text." 
From 625cb11b979f25423661738fd9f1f5f8076decc0 Mon Sep 17 00:00:00 2001 From: eebette Date: Sat, 16 May 2026 18:07:03 +0900 Subject: [PATCH 4/8] Refactor/simplification --- README.md | 22 +++++------ matrix_webhook/handler.py | 33 +++++----------- matrix_webhook/media.py | 83 ++++++--------------------------------- tests/test_image.py | 78 +++++++++++------------------------- 4 files changed, 56 insertions(+), 160 deletions(-) diff --git a/README.md b/README.md index 7f6205d..4614f1a 100644 --- a/README.md +++ b/README.md @@ -107,25 +107,23 @@ curl -d '{"body":"new contrib from toto: [44](http://radio.localhost/map/#44)", ### Captioned images -When the body contains a markdown image link with an http(s) URL -(`![alt](https://...)`), the bot fetches the URL, uploads the bytes to -the homeserver media repo, and sends a single `m.image` event whose -`url` is the resulting `mxc://` URI. The rest of the body — with all -markdown image links stripped — becomes the caption (`body` and -`formatted_body`), and the original URL's basename is preserved in the -`filename` field so [MSC4193](https://github.com/matrix-org/matrix-spec-proposals/pull/4193)-aware +When the payload includes a non-empty `image_url` field with an http(s) +URL, the bot fetches the URL, uploads the bytes to the homeserver media +repo, and sends a single `m.image` event whose `url` is the resulting +`mxc://` URI. `body` becomes the caption (`body` and `formatted_body`), +and the URL's basename is preserved in the `filename` field so +[MSC4193](https://github.com/matrix-org/matrix-spec-proposals/pull/4193)-aware clients render the image with a caption rather than treating `body` as the file name. 
``` -curl -d '{"body":"**Title**\n\n![poster](https://example.com/poster.png)\n\nDescription text.", "key":"secret"}' \ +curl -d '{"body":"**Title**\n\nDescription text.", "image_url":"https://example.com/poster.png", "key":"secret"}' \ 'http://matrixwebhook.localhost/!DPrUlnwOhBEfYwsDLh:matrix.org' ``` -Bodies with no image links continue to send as `m.text` events -unchanged. If the fetch or upload fails, the original body — including -the markdown image markup — is sent as `m.text` and a warning is -logged, so a flaky upstream CDN never drops a message. +Requests without `image_url` continue to send as `m.text` events. +If the fetch or upload fails, `body` is sent as `m.text` and a warning +is logged. ### For Github diff --git a/matrix_webhook/handler.py b/matrix_webhook/handler.py index 24eba9b..377a4bd 100644 --- a/matrix_webhook/handler.py +++ b/matrix_webhook/handler.py @@ -80,33 +80,20 @@ async def matrix_webhook(request): if data["key"] != conf.API_KEY: return utils.create_json_response(HTTPStatus.UNAUTHORIZED, "Invalid API key") - # If a `formatted_body` is supplied directly, the caller is taking full - # control of HTML rendering: pass through as `m.text` unchanged. - # Otherwise, look for markdown image links in `body`. The presence of - # any such link upgrades the event to `m.image` with the (stripped) - # body as caption, so MSC4193-aware clients render image-with-caption - # inline. The fallback for no images / failed upload is `m.text`. 
- if "formatted_body" in data: + body = str(data["body"]) + formatted_body = data.get("formatted_body") + image_url = data.get("image_url") + + content = None + if image_url: + content = await media.captioned_image(image_url, body, formatted_body) + if content is None: content = { "msgtype": "m.text", - "body": data["body"], + "body": body, "format": "org.matrix.custom.html", - "formatted_body": data["formatted_body"], + "formatted_body": formatted_body or markdown(body, extensions=["extra"]), } - else: - body = str(data["body"]) - content = await media.captioned_image_or_text(body) - if content is None: - # Drop empty / non-http markdown image refs so they don't render - # as broken tags. http(s) refs are preserved (so an - # upload-failure URL stays visible). - body = media.strip_orphan_image_links(body) - content = { - "msgtype": "m.text", - "body": body, - "format": "org.matrix.custom.html", - "formatted_body": markdown(body, extensions=["extra"]), - } # try to join room first -> non none response means error resp = await utils.join_room(data["room_id"]) diff --git a/matrix_webhook/media.py b/matrix_webhook/media.py index 0169801..4c2a4b6 100644 --- a/matrix_webhook/media.py +++ b/matrix_webhook/media.py @@ -1,7 +1,6 @@ """Matrix Webhook media upload helpers.""" import logging -import re from http import HTTPStatus from pathlib import PurePosixPath from urllib.parse import urlparse @@ -14,35 +13,6 @@ LOGGER = logging.getLogger("matrix_webhook.media") -# Markdown image syntax with an http(s) URL: ``![alt](url)``. -_MD_IMG_RE = re.compile(r"!\[([^\]]*)\]\((https?://[^)\s]+)\)") -# Permissive markdown image regex (any URL contents, including empty). -_MD_ANY_IMG_RE = re.compile(r"!\[([^\]]*)\]\(([^)]*)\)") - - -def strip_orphan_image_links(body): - """Drop markdown image refs whose URL is empty or not http(s). - - Templating engines on the sender side (e.g. 
Jellyseerr's - ``{{image}}``) can produce ``![alt]()`` for events with no - associated media, which the markdown renderer would emit as a - broken ``<img>`` tag. http(s) refs are preserved so the - upload-failure fallback path still shows the user the URL. - """ - - def _decide(match): - url = (match.group(2) or "").strip() - if url.startswith(("http://", "https://")): - return match.group(0) - return "" - - out = _MD_ANY_IMG_RE.sub(_decide, body) - if out == body: - # Nothing stripped; return as-is to preserve whitespace that - # other paths (e.g. formatter outputs) may rely on. - return body - return re.sub(r"\n{3,}", "\n\n", out).strip() - async def upload_from_url(url): """Fetch ``url`` and upload it to the homeserver media repo. @@ -64,9 +34,6 @@ async def upload_from_url(url): ) image_bytes = await resp.read() except aiohttp.ClientError as e: - # DNS resolution, connection refused, TLS failures, etc. Convert to - # ValueError so the caller's existing ``except ValueError`` branch - # treats them like any other fetch failure (fall back to m.text). msg = f"Failed to fetch {url}: {e!r}" raise ValueError(msg) from e @@ -89,53 +56,29 @@ async def upload_from_url(url): return upload_resp.content_uri, content_type, len(image_bytes), filename -async def captioned_image_or_text(body): - """Build an ``m.image`` event content from markdown image links in ``body``. +async def captioned_image(image_url, body, formatted_body=None): + """Build an ``m.image`` event content from an explicit URL + caption. - If ``body`` contains at least one ``![alt](http(s)://...)`` reference, - fetches and uploads the FIRST one to the homeserver media repo and - returns an ``m.image`` event content dict with the URL set to the - resulting ``mxc://`` URI. The caption is the rest of ``body`` with all - markdown image links stripped, rendered as both plain text (``body``) - and HTML (``formatted_body``).
The ``filename`` field is set to the - original URL's basename so MSC4193-aware clients render the image with - a caption rather than treating ``body`` as the filename. + Fetches and uploads ``image_url`` to the homeserver media repo and + returns an ``m.image`` content dict with the resulting ``mxc://`` URI. + ``body`` is used as the caption (plain text) and rendered as HTML for + ``formatted_body``, unless an explicit ``formatted_body`` is supplied. - Returns ``None`` if ``body`` has no image references, or if the upload - fails. The caller is responsible for falling back to an ``m.text`` - event in either case. + Returns ``None`` if the upload fails; the caller falls back to ``m.text``. """ - matches = list(_MD_IMG_RE.finditer(body)) - if not matches: - return None - - first = matches[0] try: - mxc, mimetype, size, filename = await upload_from_url(first.group(2)) + mxc, mimetype, size, filename = await upload_from_url(image_url) except ValueError as e: msg = f"Image upload skipped, falling back to text: {e}" LOGGER.warning(msg) return None - # Strip ALL image refs from the body — the first becomes the m.image - # url, additional ones (rare) would require multiple events which we do - # not emit; leaving them in the caption as raw markdown would render as - # plain-text URLs in Element X anyway. Collapse the blank-line gap that - # the strip leaves around the image link. 
- caption = _MD_IMG_RE.sub("", body) - caption = re.sub(r"\n{3,}", "\n\n", caption).strip() - - content = { + return { "msgtype": "m.image", "url": mxc, "filename": filename, "info": {"mimetype": mimetype, "size": size}, - } - if caption: - content["body"] = caption - content["format"] = "org.matrix.custom.html" - content["formatted_body"] = markdown(caption, extensions=["extra"]) - else: - content["body"] = filename - - return content + "body": body, + "format": "org.matrix.custom.html", + "formatted_body": formatted_body or markdown(body, extensions=["extra"]), + } \ No newline at end of file diff --git a/tests/test_image.py b/tests/test_image.py index eba6450..adf1b15 100644 --- a/tests/test_image.py +++ b/tests/test_image.py @@ -58,7 +58,7 @@ def log_message(self, format, *args): # noqa: A002 class CaptionedImageTest(unittest.IsolatedAsyncioTestCase): - """Verify markdown image links produce captioned ``m.image`` events.""" + """Verify explicit ``image_url`` produces captioned ``m.image`` events.""" @classmethod def setUpClass(cls): @@ -75,8 +75,8 @@ def tearDownClass(cls): cls.thread.join(timeout=2) async def test_image_with_caption(self): - """Body with a markdown image and surrounding text -> m.image with caption.""" - body = f"**Title**\n\n![poster]({FIXTURE_URL})\n\nDescription text." + """Body + image_url -> m.image with body as caption.""" + body = "**Title**\n\nDescription text." 
client = nio.AsyncClient(MATRIX_URL, MATRIX_ID) await client.login(MATRIX_PW) @@ -85,7 +85,7 @@ async def test_image_with_caption(self): self.assertEqual( httpx.post( f"{BOT_URL}/{room.room_id}", - json={"body": body, "key": KEY}, + json={"body": body, "image_url": FIXTURE_URL, "key": KEY}, ).json(), {"status": 200, "ret": "OK"}, ) @@ -98,18 +98,16 @@ async def test_image_with_caption(self): self.assertEqual(msg.sender, FULL_ID) self.assertIsInstance(msg, nio.RoomMessageImage) self.assertTrue(msg.url.startswith("mxc://")) - # Caption: image markdown stripped, text preserved - self.assertEqual(msg.body, "**Title**\n\nDescription text.") - # filename field disambiguates body-as-caption (MSC4193) + self.assertEqual(msg.body, body) src = msg.source["content"] self.assertEqual(src["filename"], "poster.png") self.assertEqual(src["info"]["mimetype"], "image/png") self.assertEqual(src["info"]["size"], len(PNG_BYTES)) self.assertIn("Title", src["formatted_body"]) - async def test_image_only_no_caption(self): - """Body that is just a markdown image -> m.image with body=filename.""" - body = f"![poster]({FIXTURE_URL})" + async def test_no_image_url_sends_text(self): + """No image_url -> m.text.""" + body = "Plain text, no image." 
client = nio.AsyncClient(MATRIX_URL, MATRIX_ID) await client.login(MATRIX_PW) @@ -128,14 +126,11 @@ async def test_image_only_no_caption(self): await client.close() msg = messages.chunk[0] - self.assertIsInstance(msg, nio.RoomMessageImage) - # No caption -> body falls back to filename - self.assertEqual(msg.body, "poster.png") - # No formatted_body when there is no caption - self.assertNotIn("formatted_body", msg.source["content"]) + self.assertIsInstance(msg, nio.RoomMessageText) + self.assertEqual(msg.body, body) - async def test_no_image_falls_back_to_text(self): - """Body without any markdown image -> existing m.text path.""" + async def test_empty_image_url_sends_text(self): + """Empty image_url string -> m.text.""" body = "Plain text, no image." client = nio.AsyncClient(MATRIX_URL, MATRIX_ID) @@ -145,7 +140,7 @@ async def test_no_image_falls_back_to_text(self): self.assertEqual( httpx.post( f"{BOT_URL}/{room.room_id}", - json={"body": body, "key": KEY}, + json={"body": body, "image_url": "", "key": KEY}, ).json(), {"status": 200, "ret": "OK"}, ) @@ -159,9 +154,9 @@ async def test_no_image_falls_back_to_text(self): self.assertEqual(msg.body, body) async def test_failed_image_falls_back_to_text(self): - """If the upload fails, the original body is sent unchanged as m.text.""" + """If the upload fails, body is sent as m.text without the URL.""" bad_url = f"http://localhost:{FIXTURE_PORT}/missing.png" - body = f"caption text\n\n![poster]({bad_url})" + body = "caption text" client = nio.AsyncClient(MATRIX_URL, MATRIX_ID) await client.login(MATRIX_PW) @@ -170,7 +165,7 @@ async def test_failed_image_falls_back_to_text(self): self.assertEqual( httpx.post( f"{BOT_URL}/{room.room_id}", - json={"body": body, "key": KEY}, + json={"body": body, "image_url": bad_url, "key": KEY}, ).json(), {"status": 200, "ret": "OK"}, ) @@ -181,17 +176,15 @@ async def test_failed_image_falls_back_to_text(self): msg = messages.chunk[0] self.assertIsInstance(msg, nio.RoomMessageText) - 
self.assertIn(bad_url, msg.body) - # No m.image event sent + self.assertEqual(msg.body, body) + self.assertNotIn(bad_url, msg.body) for event in messages.chunk: self.assertNotIsInstance(event, nio.RoomMessageImage) async def test_unreachable_image_host_falls_back_to_text(self): - """An image URL whose host doesn't resolve must NOT crash the request.""" - # Port closed on a non-existent hostname — produces a DNS or - # connection error (aiohttp.ClientError, not ValueError). + """An image_url whose host doesn't resolve must NOT crash the request.""" bad_url = "http://this-host-does-not-exist.invalid/poster.png" - body = f"caption text\n\n![poster]({bad_url})" + body = "caption text" client = nio.AsyncClient(MATRIX_URL, MATRIX_ID) await client.login(MATRIX_PW) @@ -200,7 +193,7 @@ async def test_unreachable_image_host_falls_back_to_text(self): self.assertEqual( httpx.post( f"{BOT_URL}/{room.room_id}", - json={"body": body, "key": KEY}, + json={"body": body, "image_url": bad_url, "key": KEY}, ).json(), {"status": 200, "ret": "OK"}, ) @@ -211,30 +204,5 @@ async def test_unreachable_image_host_falls_back_to_text(self): msg = messages.chunk[0] self.assertIsInstance(msg, nio.RoomMessageText) - self.assertIn(bad_url, msg.body) - - async def test_orphan_empty_image_link_stripped(self): - """Empty `![alt]()` is stripped from m.text fallback so it doesn't render as a broken img.""" - body = "**Title**\n\n![poster]()\n\nDescription text." 
- client = nio.AsyncClient(MATRIX_URL, MATRIX_ID) - - await client.login(MATRIX_PW) - room = await client.room_create() - - self.assertEqual( - httpx.post( - f"{BOT_URL}/{room.room_id}", - json={"body": body, "key": KEY}, - ).json(), - {"status": 200, "ret": "OK"}, - ) - - sync = await client.sync() - messages = await client.room_messages(room.room_id, sync.next_batch) - await client.close() - - msg = messages.chunk[0] - self.assertIsInstance(msg, nio.RoomMessageText) - self.assertEqual(msg.body, "**Title**\n\nDescription text.") - self.assertNotIn("![poster]", msg.body) - self.assertNotIn(" Date: Sat, 16 May 2026 09:10:32 +0000 Subject: [PATCH 5/8] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- matrix_webhook/media.py | 2 +- tests/test_image.py | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/matrix_webhook/media.py b/matrix_webhook/media.py index 4c2a4b6..f325fa6 100644 --- a/matrix_webhook/media.py +++ b/matrix_webhook/media.py @@ -81,4 +81,4 @@ async def captioned_image(image_url, body, formatted_body=None): "body": body, "format": "org.matrix.custom.html", "formatted_body": formatted_body or markdown(body, extensions=["extra"]), - } \ No newline at end of file + } diff --git a/tests/test_image.py b/tests/test_image.py index adf1b15..3166292 100644 --- a/tests/test_image.py +++ b/tests/test_image.py @@ -205,4 +205,4 @@ async def test_unreachable_image_host_falls_back_to_text(self): msg = messages.chunk[0] self.assertIsInstance(msg, nio.RoomMessageText) self.assertEqual(msg.body, body) - self.assertNotIn(bad_url, msg.body) \ No newline at end of file + self.assertNotIn(bad_url, msg.body) From 06fd4f3f9b6c2e922efcce4994f3311103289644 Mon Sep 17 00:00:00 2001 From: Eric Bette Date: Sun, 17 May 2026 00:22:42 +0900 Subject: [PATCH 6/8] Update CHANGELOG to simplify captioned image support Removed detailed description of captioned image support from CHANGELOG. 
--- CHANGELOG.md | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 62d44a7..0a47a3c 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -13,12 +13,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 in [#169](https://github.com/nim65s/matrix-webhook/pull/243) by [@nim65s](https://github.com/nim65s) - setup mergify -- captioned image support: markdown image links pointing to http(s) URLs in - the body (`![alt](https://...)`) cause the bot to fetch the URL, upload - to the homeserver media repo, and emit a single `m.image` event with the - remaining body as caption and a `filename` field per MSC4193. Bodies - without image links continue to send as `m.text`. Failed uploads fall - back to `m.text` with the original body unchanged, with a warning logged. +- captioned image support ## [v3.9.1] - 2024-03-09 From 176c13e5f1935222cd5ca7d177ba54b09f323230 Mon Sep 17 00:00:00 2001 From: Eric Bette Date: Sun, 17 May 2026 00:25:32 +0900 Subject: [PATCH 7/8] Revise captioned images section in README Updated the section on captioned images to clarify how to send images with captions in messages. --- README.md | 17 ++--------------- 1 file changed, 2 insertions(+), 15 deletions(-) diff --git a/README.md b/README.md index 4614f1a..4f25e9e 100644 --- a/README.md +++ b/README.md @@ -107,23 +107,10 @@ curl -d '{"body":"new contrib from toto: [44](http://radio.localhost/map/#44)", ### Captioned images -When the payload includes a non-empty `image_url` field with an http(s) -URL, the bot fetches the URL, uploads the bytes to the homeserver media -repo, and sends a single `m.image` event whose `url` is the resulting -`mxc://` URI. 
`body` becomes the caption (`body` and `formatted_body`), -and the URL's basename is preserved in the `filename` field so -[MSC4193](https://github.com/matrix-org/matrix-spec-proposals/pull/4193)-aware -clients render the image with a caption rather than treating `body` as -the file name. - -``` -curl -d '{"body":"**Title**\n\nDescription text.", "image_url":"https://example.com/poster.png", "key":"secret"}' \ - 'http://matrixwebhook.localhost/!DPrUlnwOhBEfYwsDLh:matrix.org' -``` +Supports sending images as messages by including an `image_url` field in the payload along with `body`. When `image_url` is detected in the payload, the message will be sent as an image type with the `body` field included as the image caption. Requests without `image_url` continue to send as `m.text` events. -If the fetch or upload fails, `body` is sent as `m.text` and a warning -is logged. +If the fetch or upload fails, `body` is sent as `m.text` and a warning is logged. ### For Github From f9a1627d2127f6c297ddabeb0c3543fd2dcd882f Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Sat, 16 May 2026 15:26:04 +0000 Subject: [PATCH 8/8] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 4f25e9e..53f7544 100644 --- a/README.md +++ b/README.md @@ -107,7 +107,7 @@ curl -d '{"body":"new contrib from toto: [44](http://radio.localhost/map/#44)", ### Captioned images -Supports sending images as messages by including an `image_url` field in the payload along with `body`. When `image_url` is detected in the payload, the message will be sent as an image type with the `body` field included as the image caption. +Supports sending images as messages by including an `image_url` field in the payload along with `body`. 
When `image_url` is detected in the payload, the message will be sent as an image type with the `body` field included as the image caption. Requests without `image_url` continue to send as `m.text` events. If the fetch or upload fails, `body` is sent as `m.text` and a warning is logged.
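A minimal sketch of how a caller might build the payload described in the README section above. The helpers `build_payload` and `expected_msgtype` are hypothetical and not part of matrix-webhook; the key `secret` and the example URL come from the earlier README example. The sketch assumes, as the tests in this series suggest, that an empty `image_url` is treated the same as an absent one.

```python
import json


def build_payload(body, key, image_url=None):
    """Return the JSON-serialisable webhook payload (hypothetical helper)."""
    payload = {"body": body, "key": key}
    if image_url:
        # Omit empty values: per the tests, an empty image_url is sent as m.text.
        payload["image_url"] = image_url
    return payload


def expected_msgtype(payload):
    """Best-case msgtype; a failed fetch or upload still falls back to m.text."""
    return "m.image" if payload.get("image_url") else "m.text"


payload = build_payload(
    "**Title**\n\nDescription text.",
    key="secret",
    image_url="https://example.com/poster.png",
)
print(json.dumps(payload))
print(expected_msgtype(payload))
```

The serialised payload can then be posted to the bot URL for the target room, as in the existing `curl` examples. Note that `expected_msgtype` only reflects the request side: if the fetch or upload fails, the bot sends `body` as `m.text` regardless.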