Skip to content

2.1#44

Open
fyaz05 wants to merge 1 commit intomainfrom
beta
Open

2.1#44
fyaz05 wants to merge 1 commit intomainfrom
beta

Conversation

@fyaz05
Copy link
Owner

@fyaz05 fyaz05 commented Mar 19, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added canonical media file support with shorter, deduplicated links accessible via /f/{hash}/{name} and /watch/f/{hash}/{name} endpoints.
    • Enhanced media streaming with fallback support for improved reliability.
  • Bug Fixes

    • Improved media file handling and delivery across multiple sources.
  • Chores

    • Version bumped to 2.1.0.
    • Database index initialization optimizations added.

@coderabbitai
Copy link

coderabbitai bot commented Mar 19, 2026

📝 Walkthrough

Walkthrough

The PR implements canonical media file handling with deduplication and caching to optimize file serving. It introduces a new canonical file utility module, extends the database layer with file record management, adds REST endpoints for canonical media delivery, and integrates canonical-aware link generation throughout the streaming pipeline.

Changes

Cohort / File(s) Summary
Version & Initialization
Thunder/__init__.py, Thunder/__main__.py
Version bumped to 2.1.0; __main__.py adds database index initialization via await db.ensure_indexes() in startup sequence before command setup.
Canonical File Utilities
Thunder/utils/canonical_files.py
New module implementing file deduplication with LRU-style caching, TTL pruning, and per-file ingestion locks. Provides canonical file retrieval/creation, validity checking via Telegram message verification, record merging, and background touch-update scheduling.
Link Generation & Template Rendering
Thunder/utils/bot_utils.py, Thunder/utils/render_template.py
Added quote_media_name() and gen_canonical_links() helpers; refactored gen_links() to use centralized _build_links() helper with URL quoting tightening; added render_media_page() for rendering canonical and stream-based media previews.
Stream Processing & Bot Plugin
Thunder/bot/plugins/stream.py
Updated send_channel_links() signature to support optional target_msg and reply_to_message_id with alternate HTTP delivery path; integrated canonical file handling into channel receive and user processing flows via get_or_create_canonical_file() with fallback to original link generation.
Server Routes & Streaming
Thunder/server/stream_routes.py, Thunder/utils/custom_dl.py
Added canonical media endpoints (/f/{secure_hash}/{name} and /watch/f/{secure_hash}/{name}); refactored media delivery via shared _serve_media_response() helper; updated ByteStreamer.stream_file() to support flexible media references and fallback streaming with optional hooks.
Database Layer
Thunder/utils/database.py
Extended with new files_col collection; added indexes on file_unique_id, public_hash, canonical_message_id, created_at, last_seen_at; added async methods for file record CRUD, touch-updates, and file\_id caching.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant StreamBot as StreamBot<br/>(Plugin)
    participant CanonicalMod as Canonical Files<br/>(Dedup Logic)
    participant DB as Database
    participant TelegramAPI as Telegram API
    participant HTTPServer as HTTP Server
    participant Streamer as ByteStreamer

    User->>StreamBot: Send media or channel message
    StreamBot->>CanonicalMod: get_or_create_canonical_file(source_msg)
    CanonicalMod->>CanonicalMod: Acquire ingest lock
    CanonicalMod->>DB: Check for existing record by unique_id
    alt Record exists & valid
        CanonicalMod->>CanonicalMod: Touch existing record
        CanonicalMod->>DB: Update touch metadata
    else Record missing or invalid
        CanonicalMod->>StreamBot: copy_media(source_msg)
        StreamBot->>TelegramAPI: Forward/copy to BIN_CHANNEL
        CanonicalMod->>DB: Create new file record
    end
    CanonicalMod->>CanonicalMod: Release ingest lock
    CanonicalMod-->>StreamBot: Return record & stored message
    StreamBot->>StreamBot: Generate canonical links with public_hash
    StreamBot->>HTTPServer: Send links to user/channel
    
    User->>HTTPServer: GET /f/{public_hash}/{name}
    HTTPServer->>DB: get_file_by_hash(public_hash)
    HTTPServer->>TelegramAPI: get_messages(canonical_message_id)
    alt Message valid
        HTTPServer->>Streamer: stream_file(message)
        Streamer->>TelegramAPI: Download chunks
        Streamer-->>HTTPServer: Byte stream
        HTTPServer-->>User: Media bytes
    else Fallback needed
        HTTPServer->>Streamer: stream_file(media_ref, fallback=backup_id)
        Streamer->>TelegramAPI: Try primary, then fallback
        Streamer-->>HTTPServer: Bytes from fallback
        HTTPServer-->>User: Media bytes
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • PRs #12 & #31: Both modify core streaming and link-generation functions (stream.py, bot_utils.py, stream_routes.py, database.py) with overlapping signature changes to send_channel_links and gen_links.
  • PR #11: Both refactor streaming plumbing in stream_routes.py media delivery and ByteStreamer.stream_file() method signatures.
  • PR #16: Both modify bot startup in Thunder/__main__.py around command initialization and now-related database index setup.

Poem

🐰✨ Hoppy files now hop just once,
Cached in databases, no more bunts!
Public hashes bloom like clover fair,
Dedup magic floats through the air!
Link generation's now quite divine,
Thunder 2.1.0 — how it does shine! 🌟

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title '2.1' is vague and non-descriptive, using only a version number without conveying what actually changed in the changeset. Replace the title with a clear, descriptive summary of the main changes, such as 'Add canonical file deduplication system with media hashing' or 'Implement canonical file handling and new server routes'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch beta
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

You can disable the changed files summary in the walkthrough.

Disable the reviews.changed_files_summary setting to disable the changed files summary in the walkthrough.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (4)
Thunder/utils/database.py (2)

254-266: Inconsistent error handling between similar getter methods.

get_file_by_unique_id returns None on exception (line 259), while get_file_by_hash re-raises the exception (line 266). This inconsistency can lead to unexpected behavior for callers expecting uniform behavior. Consider aligning the error handling strategy across all file lookup methods.

♻️ Proposed fix to make error handling consistent

Either return None consistently (if callers should handle missing records gracefully):

     async def get_file_by_hash(self, public_hash: str) -> Optional[Dict[str, Any]]:
         try:
             return await self.files_col.find_one({"public_hash": public_hash})
         except Exception as e:
             logger.error(f"Error getting file by hash {public_hash}: {e}", exc_info=True)
-            raise
+            return None

Or raise consistently (if failures should propagate):

     async def get_file_by_unique_id(self, file_unique_id: str) -> Optional[Dict[str, Any]]:
         try:
             return await self.files_col.find_one({"file_unique_id": file_unique_id})
         except Exception as e:
             logger.error(f"Error getting file by unique_id {file_unique_id}: {e}", exc_info=True)
-            return None
+            raise
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Thunder/utils/database.py` around lines 254 - 266, Both getters
(get_file_by_unique_id and get_file_by_hash) currently handle exceptions
differently; make them consistent by modifying get_file_by_hash to mirror
get_file_by_unique_id: catch exceptions, log the error with exc_info via
logger.error including the public_hash value, and return None instead of
re-raising; also scan other file lookup methods using files_col to ensure they
follow the same pattern if the project convention is to return None on failure.

302-327: Silent error swallowing may hide database issues.

Both touch_file_record and update_file_id log errors but do not propagate them. While this might be intentional for non-critical "fire-and-forget" operations, it can hide persistent database problems. Consider whether these operations should at least have an option to raise or return a success indicator.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Thunder/utils/database.py` around lines 302 - 327, The current
touch_file_record and update_file_id methods swallow exceptions after logging,
which can hide persistent DB failures; modify these functions (touch_file_record
and update_file_id) to return a success indicator (bool) or accept an optional
parameter like raise_on_error: bool = False that, when true, re-raises the
caught exception after logging; ensure you update the calls that rely on
files_col.update_one to handle the returned bool or allow exceptions to
propagate, and keep logger.error(..., exc_info=True) but do not silently
suppress errors when raise_on_error is requested.
Thunder/bot/plugins/stream.py (1)

389-403: Consider clarifying fallback behavior when stored_msg is None.

When canonical_record exists but stored_msg is None, send_channel_links is called with target_msg=None. This works correctly because send_channel_links uses StreamBot.send_message with reply_to_message_id in that case. However, a brief comment would help future readers understand this intentional design.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Thunder/bot/plugins/stream.py` around lines 389 - 403, Add a brief inline
comment explaining the intentional fallback when canonical_record exists but
stored_msg is None: that send_channel_links is deliberately called with
target_msg=None so StreamBot.send_message will fall back to using
reply_to_message_id for threading. Update the call site around the
canonical_record / stored_msg logic (referencing variables canonical_record and
stored_msg and the function send_channel_links) to include this explanatory
comment so future readers understand why target_msg is passed as None instead of
skipping or altering the call.
Thunder/utils/render_template.py (1)

42-42: Function parameter id shadows Python builtin.

Consider renaming to message_id for clarity and to avoid shadowing the builtin id() function.

♻️ Proposed fix
-async def render_page(id: int, secure_hash: str, requested_action: str | None = None) -> str:
+async def render_page(message_id: int, secure_hash: str, requested_action: str | None = None) -> str:
     try:
         try:
-            message = await StreamBot.get_messages(chat_id=int(Var.BIN_CHANNEL), message_ids=id)
+            message = await StreamBot.get_messages(chat_id=int(Var.BIN_CHANNEL), message_ids=message_id)

(and update the error message at line 63 accordingly)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Thunder/utils/render_template.py` at line 42, Rename the parameter id in the
async function render_page to message_id to avoid shadowing the builtin id();
update its type hint and every reference inside render_page (including uses in
f-strings, comparisons, and the error/ValueError message currently referencing
id) to message_id, and adjust any callers to pass message_id (or update call
sites to use the new name) so behavior remains identical; also update the error
message text that mentioned id to use message_id for clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Thunder/utils/canonical_files.py`:
- Around line 293-297: Don't treat any exception from
_is_canonical_record_valid(existing, file_unique_id) as a reuse; instead, only
call schedule_touch_file_record and return the existing record when validation
explicitly succeeds. Change the except branch to either re-raise the exception
(to surface permanent fetch/config errors) or return a non-reuse result (e.g.,
existing, error, False) so stale canonical mappings are not kept and
reuse_count/seen_count aren't inflated; ensure the only place that
touches/reuses the record is the success path where is_valid is True and retain
references to _is_canonical_record_valid, schedule_touch_file_record, existing,
and file_unique_id to locate the code.
- Around line 303-332: The code currently calls copy_media() before reserving
canonical ownership, which can leave untracked uploads if another worker wins
the unique-key race; change the flow in the function that uses copy_media,
create_file_record, replace_file_record and _merge_replacement_record so that
the unique canonical row is reserved first (e.g., insert a tentative record or
acquire a DB lock/transaction on file_unique_id), then call copy_media() to
upload and update that reserved record with stored_message and final metadata;
on DuplicateKeyError or if get_file_by_unique_id shows another owner, ensure you
schedule_touch_file_record(existing, reused=True) and cleanup any uploaded
message in Var.BIN_CHANNEL (so the caller can remove the stray stored_message),
and only call _remember(record) after the reservation+upload+DB update succeeds.
- Around line 258-279: The file_ingest_lock context manager increments
_upload_lock_counts before awaiting lock.acquire(), which lets a CancelledError
leak counts and lock entries; fix by adding a cancellation-safe path: after
incrementing but before yielding, wrap await lock.acquire() in a try/except that
catches asyncio.CancelledError (and other exceptions if desired), and inside
that except acquire the _upload_locks_guard and decrement/remove the
_upload_lock_counts and _upload_locks for the file_unique_id as done in the
finally cleanup, then re-raise the cancellation; keep the existing finally block
unchanged to handle normal releases. Ensure you reference the same symbols
(_upload_locks, _upload_lock_counts, _upload_locks_guard, file_ingest_lock,
lock.acquire) so the cleanup logic mirrors the existing removal behavior.

In `@Thunder/utils/custom_dl.py`:
- Around line 52-54: The current comparison can wrongly treat a string/file_id
or Message as unequal to an int fallback, so change the logic around refs,
media_ref and fallback_message_id to compare IDs explicitly: if
fallback_message_id is not None, derive media_id = media_ref if
isinstance(media_ref, int) else (media_ref.message_id if isinstance(media_ref,
Message) else None) and only append fallback_message_id to refs when media_id is
None (no comparable id) or fallback_message_id != media_id; use the symbols
refs, media_ref, fallback_message_id and the Message.type check to implement
this explicit type-aware comparison.

In `@Thunder/utils/render_template.py`:
- Around line 16-22: The Jinja2 Environment is created without autoescaping
which risks XSS; update the Environment construction for template_env (the
Environment call using FileSystemLoader) to enable autoescape (either
autoescape=True or use jinja2.select_autoescape for HTML templates) so templates
rendered by render_media_page (which currently manually escapes file_name) are
automatically escaped by default; ensure the change targets the template_env
setup so existing render calls continue to work and adjust any templates that
intentionally output raw HTML to use |safe.

---

Nitpick comments:
In `@Thunder/bot/plugins/stream.py`:
- Around line 389-403: Add a brief inline comment explaining the intentional
fallback when canonical_record exists but stored_msg is None: that
send_channel_links is deliberately called with target_msg=None so
StreamBot.send_message will fall back to using reply_to_message_id for
threading. Update the call site around the canonical_record / stored_msg logic
(referencing variables canonical_record and stored_msg and the function
send_channel_links) to include this explanatory comment so future readers
understand why target_msg is passed as None instead of skipping or altering the
call.

In `@Thunder/utils/database.py`:
- Around line 254-266: Both getters (get_file_by_unique_id and get_file_by_hash)
currently handle exceptions differently; make them consistent by modifying
get_file_by_hash to mirror get_file_by_unique_id: catch exceptions, log the
error with exc_info via logger.error including the public_hash value, and return
None instead of re-raising; also scan other file lookup methods using files_col
to ensure they follow the same pattern if the project convention is to return
None on failure.
- Around line 302-327: The current touch_file_record and update_file_id methods
swallow exceptions after logging, which can hide persistent DB failures; modify
these functions (touch_file_record and update_file_id) to return a success
indicator (bool) or accept an optional parameter like raise_on_error: bool =
False that, when true, re-raises the caught exception after logging; ensure you
update the calls that rely on files_col.update_one to handle the returned bool
or allow exceptions to propagate, and keep logger.error(..., exc_info=True) but
do not silently suppress errors when raise_on_error is requested.

In `@Thunder/utils/render_template.py`:
- Line 42: Rename the parameter id in the async function render_page to
message_id to avoid shadowing the builtin id(); update its type hint and every
reference inside render_page (including uses in f-strings, comparisons, and the
error/ValueError message currently referencing id) to message_id, and adjust any
callers to pass message_id (or update call sites to use the new name) so
behavior remains identical; also update the error message text that mentioned id
to use message_id for clarity.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c4d662ab-86e0-4b4c-b713-4cb3b500961c

📥 Commits

Reviewing files that changed from the base of the PR and between 0dce97e and 2fe1440.

📒 Files selected for processing (9)
  • Thunder/__init__.py
  • Thunder/__main__.py
  • Thunder/bot/plugins/stream.py
  • Thunder/server/stream_routes.py
  • Thunder/utils/bot_utils.py
  • Thunder/utils/canonical_files.py
  • Thunder/utils/custom_dl.py
  • Thunder/utils/database.py
  • Thunder/utils/render_template.py

Comment on lines +258 to +279
@asynccontextmanager
async def file_ingest_lock(file_unique_id: str):
async with _upload_locks_guard:
lock = _upload_locks.get(file_unique_id)
if lock is None:
lock = asyncio.Lock()
_upload_locks[file_unique_id] = lock
_upload_lock_counts[file_unique_id] = 0
_upload_lock_counts[file_unique_id] += 1

await lock.acquire()
try:
yield
finally:
lock.release()
async with _upload_locks_guard:
remaining = _upload_lock_counts.get(file_unique_id, 1) - 1
if remaining <= 0:
_upload_lock_counts.pop(file_unique_id, None)
_upload_locks.pop(file_unique_id, None)
else:
_upload_lock_counts[file_unique_id] = remaining
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Cancellation can leak per-file lock entries.

_upload_lock_counts is incremented before await lock.acquire(), but the cleanup finally only starts after that await. If a waiter is cancelled while blocked on the lock, the count is never decremented and that file_unique_id stays pinned in _upload_locks / _upload_lock_counts.

🔧 Minimal fix
 `@asynccontextmanager`
 async def file_ingest_lock(file_unique_id: str):
     async with _upload_locks_guard:
         lock = _upload_locks.get(file_unique_id)
         if lock is None:
             lock = asyncio.Lock()
             _upload_locks[file_unique_id] = lock
             _upload_lock_counts[file_unique_id] = 0
         _upload_lock_counts[file_unique_id] += 1

-    await lock.acquire()
+    acquired = False
     try:
+        await lock.acquire()
+        acquired = True
         yield
     finally:
-        lock.release()
+        if acquired:
+            lock.release()
         async with _upload_locks_guard:
             remaining = _upload_lock_counts.get(file_unique_id, 1) - 1
             if remaining <= 0:
                 _upload_lock_counts.pop(file_unique_id, None)
                 _upload_locks.pop(file_unique_id, None)
             else:
                 _upload_lock_counts[file_unique_id] = remaining
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@asynccontextmanager
async def file_ingest_lock(file_unique_id: str):
async with _upload_locks_guard:
lock = _upload_locks.get(file_unique_id)
if lock is None:
lock = asyncio.Lock()
_upload_locks[file_unique_id] = lock
_upload_lock_counts[file_unique_id] = 0
_upload_lock_counts[file_unique_id] += 1
await lock.acquire()
try:
yield
finally:
lock.release()
async with _upload_locks_guard:
remaining = _upload_lock_counts.get(file_unique_id, 1) - 1
if remaining <= 0:
_upload_lock_counts.pop(file_unique_id, None)
_upload_locks.pop(file_unique_id, None)
else:
_upload_lock_counts[file_unique_id] = remaining
`@asynccontextmanager`
async def file_ingest_lock(file_unique_id: str):
async with _upload_locks_guard:
lock = _upload_locks.get(file_unique_id)
if lock is None:
lock = asyncio.Lock()
_upload_locks[file_unique_id] = lock
_upload_lock_counts[file_unique_id] = 0
_upload_lock_counts[file_unique_id] += 1
acquired = False
try:
await lock.acquire()
acquired = True
yield
finally:
if acquired:
lock.release()
async with _upload_locks_guard:
remaining = _upload_lock_counts.get(file_unique_id, 1) - 1
if remaining <= 0:
_upload_lock_counts.pop(file_unique_id, None)
_upload_locks.pop(file_unique_id, None)
else:
_upload_lock_counts[file_unique_id] = remaining
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Thunder/utils/canonical_files.py` around lines 258 - 279, The
file_ingest_lock context manager increments _upload_lock_counts before awaiting
lock.acquire(), which lets a CancelledError leak counts and lock entries; fix by
adding a cancellation-safe path: after incrementing but before yielding, wrap
await lock.acquire() in a try/except that catches asyncio.CancelledError (and
other exceptions if desired), and inside that except acquire the
_upload_locks_guard and decrement/remove the _upload_lock_counts and
_upload_locks for the file_unique_id as done in the finally cleanup, then
re-raise the cancellation; keep the existing finally block unchanged to handle
normal releases. Ensure you reference the same symbols (_upload_locks,
_upload_lock_counts, _upload_locks_guard, file_ingest_lock, lock.acquire) so the
cleanup logic mirrors the existing removal behavior.

Comment on lines +293 to +297
try:
is_valid = await _is_canonical_record_valid(existing, file_unique_id)
except Exception:
schedule_touch_file_record(existing, reused=True)
return existing, None, True
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Failed validation should not be counted as reuse.

This branch turns any exception from _is_canonical_record_valid() into a successful reuse by touching the record and returning it. A permanent fetch/config error will keep a stale canonical mapping alive and inflate seen_count / reuse_count instead of forcing replacement or surfacing the failure.

🧰 Tools
🪛 Ruff (0.15.6)

[warning] 295-295: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Thunder/utils/canonical_files.py` around lines 293 - 297, Don't treat any
exception from _is_canonical_record_valid(existing, file_unique_id) as a reuse;
instead, only call schedule_touch_file_record and return the existing record
when validation explicitly succeeds. Change the except branch to either re-raise
the exception (to surface permanent fetch/config errors) or return a non-reuse
result (e.g., existing, error, False) so stale canonical mappings are not kept
and reuse_count/seen_count aren't inflated; ensure the only place that
touches/reuses the record is the success path where is_valid is True and retain
references to _is_canonical_record_valid, schedule_touch_file_record, existing,
and file_unique_id to locate the code.

Comment on lines +303 to +332
stored_message = await copy_media(source_message)
if not stored_message:
return None, None, False

record = build_file_record(
stored_message,
source_chat_id=source_message.chat.id if source_message.chat else None,
source_message_id=source_message.id
)
if not record:
return None, stored_message, False

try:
if existing:
record = _merge_replacement_record(existing, record)
await db.replace_file_record(record)
else:
await db.create_file_record(record)
_remember(record)
return record, stored_message, False
except DuplicateKeyError:
existing = await get_file_by_unique_id(file_unique_id)
if existing:
schedule_touch_file_record(existing, reused=True)
return existing, None, True
raise
except FloodWait:
raise
except Exception as e:
logger.error(f"Error creating canonical file for {file_unique_id}: {e}", exc_info=True)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Reserve canonical ownership before copy_media() writes to Telegram.

copy_media() is the irreversible side effect here, but it runs before create_file_record() / replace_file_record() decide who owns the canonical row. If another worker wins the unique-key race, or the DB write fails after the upload, this function leaves an untracked duplicate message in Var.BIN_CHANNEL; the DuplicateKeyError branch even drops stored_message, so the caller cannot clean it up. That breaks the dedupe invariant under multi-worker load.

🧰 Tools
🪛 Ruff (0.15.6)

[warning] 331-331: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Thunder/utils/canonical_files.py` around lines 303 - 332, The code currently
calls copy_media() before reserving canonical ownership, which can leave
untracked uploads if another worker wins the unique-key race; change the flow in
the function that uses copy_media, create_file_record, replace_file_record and
_merge_replacement_record so that the unique canonical row is reserved first
(e.g., insert a tentative record or acquire a DB lock/transaction on
file_unique_id), then call copy_media() to upload and update that reserved
record with stored_message and final metadata; on DuplicateKeyError or if
get_file_by_unique_id shows another owner, ensure you
schedule_touch_file_record(existing, reused=True) and cleanup any uploaded
message in Var.BIN_CHANNEL (so the caller can remove the stray stored_message),
and only call _remember(record) after the reservation+upload+DB update succeeds.

Comment on lines +52 to +54
refs: list[int | str | Message] = [media_ref]
if fallback_message_id is not None and fallback_message_id != media_ref:
refs.append(fallback_message_id)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Type mismatch in fallback comparison may cause unexpected behavior.

The comparison fallback_message_id != media_ref compares int | None with int | str | Message. When media_ref is a string (file_id) or a Message object, this comparison will always be True, potentially adding a redundant fallback. Consider explicit type handling.

♻️ Proposed fix
         refs: list[int | str | Message] = [media_ref]
-        if fallback_message_id is not None and fallback_message_id != media_ref:
+        if fallback_message_id is not None:
+            primary_id = media_ref if isinstance(media_ref, int) else (
+                media_ref.id if isinstance(media_ref, Message) else None
+            )
+            if fallback_message_id != primary_id:
+                refs.append(fallback_message_id)
-            refs.append(fallback_message_id)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
refs: list[int | str | Message] = [media_ref]
if fallback_message_id is not None and fallback_message_id != media_ref:
refs.append(fallback_message_id)
refs: list[int | str | Message] = [media_ref]
if fallback_message_id is not None:
primary_id = media_ref if isinstance(media_ref, int) else (
media_ref.id if isinstance(media_ref, Message) else None
)
if fallback_message_id != primary_id:
refs.append(fallback_message_id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Thunder/utils/custom_dl.py` around lines 52 - 54, The current comparison can
wrongly treat a string/file_id or Message as unequal to an int fallback, so
change the logic around refs, media_ref and fallback_message_id to compare IDs
explicitly: if fallback_message_id is not None, derive media_id = media_ref if
isinstance(media_ref, int) else (media_ref.message_id if isinstance(media_ref,
Message) else None) and only append fallback_message_id to refs when media_id is
None (no comparable id) or fallback_message_id != media_id; use the symbols
refs, media_ref, fallback_message_id and the Message.type check to implement
this explicit type-aware comparison.

Comment on lines +16 to 22
template_env = Environment(
loader=FileSystemLoader('Thunder/template'),
enable_async=True,
cache_size=200,
auto_reload=False,
optimized=True
optimized=True
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Enable Jinja2 autoescape to mitigate XSS vulnerabilities.

The Jinja2 environment is created without autoescape=True, which is flagged as a potential XSS vulnerability (S701). While html_module.escape() is used for file_name in render_media_page, this is manual and error-prone. Consider enabling autoescape for defense-in-depth.

🛡️ Proposed fix
+from jinja2 import select_autoescape
+
 template_env = Environment(
     loader=FileSystemLoader('Thunder/template'),
     enable_async=True,
     cache_size=200,
     auto_reload=False,
-    optimized=True
+    optimized=True,
+    autoescape=select_autoescape(['html', 'htm', 'xml'])
 )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
template_env = Environment(
loader=FileSystemLoader('Thunder/template'),
enable_async=True,
cache_size=200,
auto_reload=False,
optimized=True
optimized=True
)
from jinja2 import select_autoescape
template_env = Environment(
loader=FileSystemLoader('Thunder/template'),
enable_async=True,
cache_size=200,
auto_reload=False,
optimized=True,
autoescape=select_autoescape(['html', 'htm', 'xml'])
)
🧰 Tools
🪛 Ruff (0.15.6)

[error] 16-16: By default, jinja2 sets autoescape to False. Consider using autoescape=True or the select_autoescape function to mitigate XSS vulnerabilities.

(S701)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Thunder/utils/render_template.py` around lines 16 - 22, The Jinja2
Environment is created without autoescaping which risks XSS; update the
Environment construction for template_env (the Environment call using
FileSystemLoader) to enable autoescape (either autoescape=True or use
jinja2.select_autoescape for HTML templates) so templates rendered by
render_media_page (which currently manually escapes file_name) are automatically
escaped by default; ensure the change targets the template_env setup so existing
render calls continue to work and adjust any templates that intentionally output
raw HTML to use |safe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant