Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions .jules/bolt.md
Original file line number Diff line number Diff line change
Expand Up @@ -39,3 +39,7 @@
## 2026-01-27 - Redundant Validation for Cached Data
**Learning:** Re-validating resource properties (like DNS/IP) when using *cached content* is pure overhead. If the content is served from memory (proven safe at fetch time), checking the *current* state of the source is disconnected from the data being used.
**Action:** When using a multi-stage pipeline (Warmup -> Process), ensure validation state persists alongside the data cache. Avoid clearing validation caches between stages if the data cache is not also cleared.

## 2024-05-22 - Ordered Deduplication Optimization
**Learning:** `dict.fromkeys(list)` is significantly faster (~2x) than a Python loop with `seen = set()` for deduplicating large lists while preserving order. It also naturally deduplicates invalid items if validation happens after, which prevents log spam.
**Action:** Use `dict.fromkeys()` for ordered deduplication of large inputs instead of manual loop with `seen` set.
13 changes: 7 additions & 6 deletions main.py
Original file line number Diff line number Diff line change
Expand Up @@ -1074,23 +1074,24 @@

original_count = len(hostnames)

# Optimization 1: Deduplicate input list while preserving order
# Optimization 2: Check directly against existing_rules to avoid O(N) copy.
seen = set()
# Optimization 1: Deduplicate input list while preserving order using dict.fromkeys()
# This is significantly faster than using a 'seen' set in the loop for large lists.
# It also naturally deduplicates invalid rules, preventing log spam.
unique_hostnames = dict.fromkeys(hostnames)

filtered_hostnames = []
skipped_unsafe = 0

for h in hostnames:
for h in unique_hostnames:

Check warning

Code scanning / Pylintpython3 (reported by Codacy)

Variable name "h" doesn't conform to snake_case naming style Warning

Variable name "h" doesn't conform to snake_case naming style

Check warning

Code scanning / Pylint (reported by Codacy)

Variable name "h" doesn't conform to snake_case naming style Warning

Variable name "h" doesn't conform to snake_case naming style
if not is_valid_rule(h):
log.warning(
f"Skipping unsafe rule in {sanitize_for_log(folder_name)}: {sanitize_for_log(h)}"
)
skipped_unsafe += 1
continue

if h not in existing_rules and h not in seen:
if h not in existing_rules:
filtered_hostnames.append(h)
Comment on lines +1085 to 1094

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For improved readability, you can combine the two if statements in this loop into a single if/elif structure. This removes the need for continue and slightly flattens the logic, making the conditions for filtering hostnames clearer.

Suggested change
for h in unique_hostnames:
if not is_valid_rule(h):
log.warning(
f"Skipping unsafe rule in {sanitize_for_log(folder_name)}: {sanitize_for_log(h)}"
)
skipped_unsafe += 1
continue
if h not in existing_rules and h not in seen:
if h not in existing_rules:
filtered_hostnames.append(h)
for h in unique_hostnames:
if not is_valid_rule(h):
log.warning(
f"Skipping unsafe rule in {sanitize_for_log(folder_name)}: {sanitize_for_log(h)}"
)
skipped_unsafe += 1
elif h not in existing_rules:
filtered_hostnames.append(h)

seen.add(h)

if skipped_unsafe > 0:
log.warning(
Expand Down
Loading