3 changes: 3 additions & 0 deletions .jules/bolt.md
@@ -0,0 +1,3 @@
## 2024-05-29 - yEnc decoding optimization with bytes.find() and bytes.translate()
**Learning:** Decoding yEnc with `bytes.split(b'=')` breaks on lines that contain consecutive escapes (e.g., `==`): the split produces empty chunks and discards the position information needed to tell an escape marker from an escaped byte. To speed up byte-string processing in Python while still handling escapes correctly, use `bytes.translate()` for bulk decoding and `bytes.find()` to locate each escape and apply it manually.
**Action:** Prioritize `bytes.translate` and `bytes.find` over `.split` when translating strings with multi-character escape sequences, ensuring correctness on edge cases like consecutive escapes.
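The note above can be illustrated with a minimal sketch. The names `NORMAL`, `ESCAPED`, and `decode_line` are hypothetical and chosen for this example only; they are not the helpers in `verify_nzb.py`. The sketch assumes the usual yEnc offsets (42 for plain bytes, 42 + 64 for the byte after `=`) and shows that `split` leaves an ambiguous empty chunk for `==`, while a `find`-driven loop consumes the marker and the escaped byte as one pair.

```python
# Hypothetical names for illustration; not the project's actual helpers.
NORMAL = bytes((i - 42) % 256 for i in range(256))    # plain byte: subtract 42
ESCAPED = bytes((i - 106) % 256 for i in range(256))  # byte after '=': subtract 42 + 64


def decode_line(line: bytes) -> bytes:
    """Decode one yEnc line, treating '=' plus the next byte as a single escape pair."""
    out = bytearray()
    pos = 0
    while pos < len(line):
        eq = line.find(b"=", pos)
        if eq == -1:
            # No more escapes: bulk-translate the rest of the line.
            out += line[pos:].translate(NORMAL)
            break
        # Bulk-translate everything before the escape marker.
        out += line[pos:eq].translate(NORMAL)
        if eq + 1 >= len(line):
            raise ValueError("dangling yEnc escape")
        # Indexing bytes yields an int, which indexes the escape table directly.
        out.append(ESCAPED[line[eq + 1]])
        pos = eq + 2  # skip the marker and the escaped byte together
    return bytes(out)


line = b"k==k"
chunks = line.split(b"=")    # the empty middle chunk hides which '=' was the marker
decoded = decode_line(line)  # three bytes: 'A', the escaped '=', 'A'
```

Because the loop always advances past both bytes of an escape pair, the second `=` in `==` is never mistaken for a new marker.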
38 changes: 28 additions & 10 deletions verify_nzb.py
@@ -115,19 +115,37 @@ def _parse_yenc_attrs(line: bytes) -> dict[str, str]:
     return attrs


+# Translation tables for yEnc decoding to avoid slow byte-by-byte iteration in Python.
+# Normal bytes are shifted by 42. Escaped bytes are shifted by 42+64=106.
+_YENC_TRANS = bytes((i - 42) % 256 for i in range(256))
+_YENC_ESCAPE_TRANS = bytes((i - 106) % 256 for i in range(256))
+
 def _decode_yenc_lines(lines: Iterable[bytes]) -> bytes:
+    """
+    Decodes yEnc-encoded lines into bytes.
+    Optimized to use C-backed bytes.translate() and bytes.find() instead of manual iteration.
+    """
     decoded = bytearray()
     for line in lines:
-        index = 0
-        while index < len(line):
-            byte = line[index]
-            if byte == 61:
-                index += 1
-                if index >= len(line):
-                    raise ValueError("dangling yEnc escape")
-                byte = (line[index] - 64) % 256
-            decoded.append((byte - 42) % 256)
-            index += 1
+        # Fast path: no escapes in the line
+        if b"=" not in line:
+            decoded.extend(line.translate(_YENC_TRANS))
+            continue
+
+        # Slow path: apply escapes manually using string find
+        idx = 0
+        length = len(line)
+        while idx < length:
+            next_eq = line.find(b"=", idx)
+            if next_eq == -1:
+                decoded.extend(line[idx:].translate(_YENC_TRANS))
+                break
+            decoded.extend(line[idx:next_eq].translate(_YENC_TRANS))
+            idx = next_eq + 1
+            if idx >= length:
+                raise ValueError("dangling yEnc escape")
+            decoded.append(_YENC_ESCAPE_TRANS[line[idx]])
+            idx += 1
Comment on lines +135 to +148
⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify yEnc decoding correctness and performance

# Run unit tests to confirm correctness
python3 -m unittest discover tests

# Note: Manual timing benchmarks and --deep-check integration tests
# should also be run as suggested in the PR description to confirm
# the ~10-15x performance improvement and correctness on real NZB data.

Repository: Appz4Fun/cyclops

Length of output: 9453


Do not approve yet: unit tests regress after the yEnc slow-path escape handling change.
The logic in `verify_nzb.py` lines 135-148 is internally plausible (it handles consecutive `==` and raises `ValueError("dangling yEnc escape")` for a trailing `=`), but `python3 -m unittest discover tests` fails with 8 failing tests (multiple `summary.present` assertions across async verify scenarios). Investigate how this change affects end-to-end verification (especially decoding and segment content), add targeted tests for consecutive escapes and a trailing `=`, and rerun the full suite (plus any deep-check/integration coverage).
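The targeted tests requested above could look roughly like the following sketch. It is self-contained, so `decode_lines` here is a stand-in that mirrors the byte-by-byte loop removed in this diff; in the repository the tests would presumably call `_decode_yenc_lines` from `verify_nzb.py` instead. The class name and test names are hypothetical.

```python
import unittest
from typing import Iterable


def decode_lines(lines: Iterable[bytes]) -> bytes:
    """Stand-in decoder (byte-by-byte, like the removed loop); swap in the real one."""
    decoded = bytearray()
    for line in lines:
        index = 0
        while index < len(line):
            byte = line[index]
            if byte == 61:  # '=' marks an escape
                index += 1
                if index >= len(line):
                    raise ValueError("dangling yEnc escape")
                byte = (line[index] - 64) % 256
            decoded.append((byte - 42) % 256)
            index += 1
    return bytes(decoded)


class YencEscapeTests(unittest.TestCase):
    def test_plain_line(self):
        # 'k' is 107, and 107 - 42 = 65, which is 'A'.
        self.assertEqual(decode_lines([b"kkk"]), b"AAA")

    def test_consecutive_equals(self):
        # "==" is one escape pair: (61 - 106) % 256 = 211.
        self.assertEqual(decode_lines([b"k==k"]), bytes([65, 211, 65]))

    def test_back_to_back_escapes(self):
        # "=@" decodes to (64 - 106) % 256 = 214, "=J" to (74 - 106) % 256 = 224.
        self.assertEqual(decode_lines([b"=@=J"]), bytes([214, 224]))

    def test_trailing_equals_raises(self):
        with self.assertRaises(ValueError):
            decode_lines([b"kk="])

    def test_escape_does_not_span_lines(self):
        # Each line is decoded on its own, so a trailing '=' is an error
        # even when another line follows.
        with self.assertRaises(ValueError):
            decode_lines([b"k=", b"k"])
```

Saved under `tests/`, a file like this would be picked up by `python3 -m unittest discover tests`. Running the same cases against both the old loop and the new `find`-based path is one way to check whether the decoder itself is responsible for the reported failures.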

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@verify_nzb.py` around lines 135 - 148, The yEnc slow-path in the loop for
handling escapes is causing end-to-end decode regressions; add targeted unit
tests for consecutive "==" and a trailing "=" case, then modify the slow-path in
the decode loop (the while loop handling next_eq and using _YENC_ESCAPE_TRANS)
so it exactly mirrors the fast-path escape semantics by delegating the escape
decode to the same helper or by explicitly looking up the integer byte value
from the bytes object and appending the mapped byte from _YENC_ESCAPE_TRANS (and
still raising ValueError("dangling yEnc escape") when no next byte); run the
full test suite and integration/deep-checks to confirm fixes.

     return bytes(decoded)

