3 changes: 3 additions & 0 deletions .jules/bolt.md
@@ -0,0 +1,3 @@
+## 2026-05-27 - ElementTree.iterparse Memory Leaks and Overhead
+**Learning:** Memory still grows under `ElementTree.iterparse` if you listen only for `end` events and call `.clear()` on each child element: the `root` element keeps a reference to every (now empty) child until the document finishes parsing. Separately, extracting the local tag name with a custom helper (such as one built on `.rsplit()`) runs once per element, so its cost grows linearly with the size of the XML.
+**Action:** When parsing large XML files with `iterparse`, request both `start` and `end` events, grab the `root` element from the first `start` event, and call both `elem.clear()` and `root.clear()` on `end` events. Use simple string checks such as `tag.endswith("}segment") or tag == "segment"` to avoid per-element string-splitting overhead.
24 changes: 11 additions & 13 deletions verify_nzb.py
@@ -80,23 +80,21 @@ class MissingArticleError(NntpError):
     """The NNTP server does not have the requested article."""


-def _local_name(tag: str) -> str:
-    if "}" in tag:
-        return tag.rsplit("}", 1)[-1]
-    return tag
-
-
 def parse_nzb_message_ids(path: str | Path) -> Iterator[str]:
     """Yield message IDs from <segment> elements in an NZB file."""

     with open(path, "rb") as handle:
-        for event, elem in ET.iterparse(handle, events=("end",)):
-            if _local_name(elem.tag) != "segment":
-                continue
-            text = (elem.text or "").strip()
-            if text:
-                yield text
-            elem.clear()
+        context = ET.iterparse(handle, events=("start", "end"))
+        _, root = next(context)
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Handle empty or malformed XML files.

If the XML file is empty or contains no events, next(context) will raise StopIteration, causing an uncaught exception. Consider adding error handling or documenting the assumption that input XML is well-formed.

🛡️ Proposed fix to handle empty files gracefully
-    context = ET.iterparse(handle, events=("start", "end"))
-    _, root = next(context)
+    context = ET.iterparse(handle, events=("start", "end"))
+    try:
+        _, root = next(context)
+    except StopIteration:
+        return
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@verify_nzb.py` at line 88, The code calls "_, root = next(context)" which
will raise StopIteration for empty/malformed XML; wrap the next(context) call in
a try/except StopIteration block (or check with a safe iterator pattern) and
handle the case by logging an error or raising a clear exception (e.g.,
ValueError("Empty or malformed XML")) and exiting gracefully; also validate that
root is not None and contains expected elements before proceeding so downstream
code using root won't fail unexpectedly.
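
One caveat on the suggestion above, offered as a sketch and not as the PR's actual fix: in CPython, as far as I can tell, a zero-byte input surfaces from the first `next(context)` as `ET.ParseError` ("no element found") and not as `StopIteration`, so a guard probably needs to cover both. The helper name `iter_segment_texts`, the choice to re-raise as `ValueError`, and the sample document in the usage note are assumptions made for illustration; none of them appear in `verify_nzb.py`.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator


def iter_segment_texts(handle: BinaryIO) -> Iterator[str]:
    """Yield stripped <segment> text from an NZB-like XML stream.

    Hypothetical helper: it mirrors the loop in the diff but guards the
    first next() call against empty or malformed input.
    """
    context = ET.iterparse(handle, events=("start", "end"))
    try:
        # The first event should be the "start" of the root element.
        _, root = next(context)
    except (StopIteration, ET.ParseError) as exc:
        # Assumption: report unusable input as ValueError, as the review
        # prompt suggests, instead of returning silently.
        raise ValueError("Empty or malformed XML") from exc

    for event, elem in context:
        if event != "end":
            continue
        tag = elem.tag
        if tag.endswith("}segment") or tag == "segment":
            text = (elem.text or "").strip()
            if text:
                yield text
            # Drop the element's contents, then detach finished children
            # from the root so they can be garbage-collected.
            elem.clear()
            root.clear()
```

A quick check under these assumptions: `list(iter_segment_texts(io.BytesIO(b"")))` should raise `ValueError`, while a small namespaced document with two `<segment>` elements should yield their stripped text in document order. Because the function is a generator, the guard only fires once the caller starts iterating.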

+        for event, elem in context:
+            if event == "end":
+                tag = elem.tag
+                if tag.endswith("}segment") or tag == "segment":
+                    text = (elem.text or "").strip()
+                    if text:
+                        yield text
+                    elem.clear()
+                    root.clear()


 def normalize_message_id(message_id: str) -> str: