28 changes: 16 additions & 12 deletions verify_nzb.py
@@ -106,6 +106,9 @@ def normalize_message_id(message_id: str) -> str:
     return f"<{text.strip('<>')}>"


+YENC_TRANSLATE_TABLE = bytes((i - 42) % 256 for i in range(256))
+
+
 def _parse_yenc_attrs(line: bytes) -> dict[str, str]:
     attrs: dict[str, str] = {}
     for token in line.decode("latin-1", errors="replace").split()[1:]:
@@ -116,19 +119,20 @@ def _parse_yenc_attrs(line: bytes) -> dict[str, str]:


 def _decode_yenc_lines(lines: Iterable[bytes]) -> bytes:
+    data = b"".join(lines)
[P2] Reject yEnc escapes that cross line boundaries

When a deep-check body has an encoded data line ending in `=`, joining all data lines first makes the decoder consume the first byte of the next physical line as the escape continuation. For example, the data lines `b"="`, `b"x"` are decoded as one escaped byte instead of raising `dangling yEnc escape`; if the article's size and CRC were computed over that synthesized byte, the corrupt body is reported as ok. The previous per-line loop rejected this malformed yEnc, so the optimized path needs to preserve the line-boundary escape check while still using the faster operations.
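
To make the regression concrete, here is a minimal, self-contained sketch that puts the two loops from this diff side by side. The function names `decode_per_line`, `decode_joined`, and `raises_dangling` are hypothetical and exist only for this illustration; the loop bodies are copied from the removed and added lines of `_decode_yenc_lines`.

```python
from typing import Callable, Iterable, List

YENC_TRANSLATE_TABLE = bytes((i - 42) % 256 for i in range(256))


def decode_per_line(lines: Iterable[bytes]) -> bytes:
    """Removed loop: an escape must be completed within its own line."""
    decoded = bytearray()
    for line in lines:
        index = 0
        while index < len(line):
            byte = line[index]
            if byte == 61:  # "=" introduces an escaped byte
                index += 1
                if index >= len(line):
                    raise ValueError("dangling yEnc escape")
                byte = (line[index] - 64) % 256
            decoded.append((byte - 42) % 256)
            index += 1
    return bytes(decoded)


def decode_joined(lines: Iterable[bytes]) -> bytes:
    """Added loop: lines are joined first, so line boundaries are lost."""
    data = b"".join(lines)
    decoded = bytearray()
    start = 0
    while True:
        idx = data.find(b"=", start)
        if idx == -1:
            decoded.extend(data[start:])
            break
        decoded.extend(data[start:idx])
        if idx + 1 >= len(data):
            raise ValueError("dangling yEnc escape")
        decoded.append((data[idx + 1] - 64) % 256)
        start = idx + 2
    return bytes(decoded.translate(YENC_TRANSLATE_TABLE))


def raises_dangling(decoder: Callable[[List[bytes]], bytes],
                    lines: List[bytes]) -> bool:
    """Return True if the decoder rejects the input with ValueError."""
    try:
        decoder(lines)
    except ValueError:
        return True
    return False
```

With the input `[b"=", b"x"]`, `decode_per_line` raises `ValueError`, while `decode_joined` returns `b"\x0e"`, treating `x` as the continuation of the escape from the previous line. On well-formed input the two agree.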

[P2] Avoid copying the full yEnc payload before decoding

For large deep-check articles, `b"".join(lines)` creates an additional contiguous copy of the entire encoded body before allocating the decoded bytearray. Since `_read_multiline()` and `validate_yenc_body()` already keep the article lines in memory and multiple sampled bodies can be validated concurrently, this optimization can substantially increase peak memory and cause large NZBs/deep-check runs to fail under memory pressure. Processing each line with the C-backed operations (`find`, `translate`) would preserve the speedup without adding a full-body copy.
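
One possible shape for that, as a sketch rather than a drop-in patch: keep the `find`/`translate` approach but apply it per line, so no joined copy is built and an escape can never reach into the next line. The name `decode_yenc_lines_per_line` is hypothetical, and the claim that this keeps most of the speedup is an assumption that has not been benchmarked here.

```python
from typing import Iterable

YENC_TRANSLATE_TABLE = bytes((i - 42) % 256 for i in range(256))


def decode_yenc_lines_per_line(lines: Iterable[bytes]) -> bytes:
    """Decode yEnc data lines one at a time, without joining them first."""
    decoded = bytearray()
    for line in lines:
        start = 0
        while True:
            idx = line.find(b"=", start)  # C-backed search for the escape byte
            if idx == -1:
                # No more escapes on this line: translate the remainder.
                decoded += line[start:].translate(YENC_TRANSLATE_TABLE)
                break
            decoded += line[start:idx].translate(YENC_TRANSLATE_TABLE)
            if idx + 1 >= len(line):
                # The escape has no continuation byte on the same line.
                raise ValueError("dangling yEnc escape")
            # Escaped byte: undo the +64 escape offset and the +42 yEnc offset.
            decoded.append((line[idx + 1] - 64 - 42) % 256)
            start = idx + 2
    return bytes(decoded)
```

Translating each segment as it is appended also avoids the extra full-size buffer that a single `decoded.translate(...)` call at the end would allocate. The per-segment slices are short-lived and bounded by the line length.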


     decoded = bytearray()
-    for line in lines:
-        index = 0
-        while index < len(line):
-            byte = line[index]
-            if byte == 61:
-                index += 1
-                if index >= len(line):
-                    raise ValueError("dangling yEnc escape")
-                byte = (line[index] - 64) % 256
-            decoded.append((byte - 42) % 256)
-            index += 1
-    return bytes(decoded)
+    start = 0
+    while True:
+        idx = data.find(b"=", start)
+        if idx == -1:
+            decoded.extend(data[start:])
+            break
+        decoded.extend(data[start:idx])
+        if idx + 1 >= len(data):
+            raise ValueError("dangling yEnc escape")
+        decoded.append((data[idx + 1] - 64) % 256)
+        start = idx + 2
+    return bytes(decoded.translate(YENC_TRANSLATE_TABLE))


def validate_yenc_body(lines: Iterable[bytes | str]) -> YencValidationResult: