[Snyk] Security upgrade urllib3 from 1.24.3 to 1.26.19#45
Open
matholiveira91 wants to merge 1 commit into master
The following vulnerabilities are fixed by pinning transitive dependencies:
- https://snyk.io/vuln/SNYK-PYTHON-URLLIB3-7267250
matholiveira91 pushed a commit that referenced this pull request on Mar 4, 2026
…rovements

Rewrites the three main Python modules to address performance bottlenecks identified in the scraping, text processing and headless browser pipelines. GoMutation is preserved unchanged pending evaluation.

modules/scraper.py
- Replace sequential requests with asyncio + aiohttp parallel pipeline
- Add configurable concurrency semaphore (default: 10 simultaneous requests)
- Switch HTML parser from html.parser to lxml (up to 10x faster parsing)
- Add SHA-256 disk cache per URL to skip redundant fetches on re-runs
- Add automatic retry with exponential backoff (3 attempts per URL)
- Use set() for immediate deduplication during word collection
- Expose synchronous scrape() entry point for backward compatibility

modules/aggressive.py
- Replace per-URL browser instantiation with a single reusable Playwright instance
- Add async tab pool with configurable concurrency (default: 4 simultaneous tabs)
- Add JS-detection heuristic to delegate non-JS pages to the faster aiohttp path
- Retain geckodriver support as --use-gecko fallback for legacy environments
- geckodriver path also improved: single driver instance reused across all URLs

modules/wordlist.py
- Replace manual frequency dict with collections.Counter (C-level implementation)
- Switch file reading to line-by-line streaming to avoid full file loading in RAM
- Add Unicode normalization (NFKD) for correct handling of accented characters
- Deduplicate early with set(); Counter.most_common() replaces manual sort
- Fix bug #17: static text list not saved from interactive mode; add explicit flush + fsync to guarantee writes before process exit
- GoMutation invoked via stdin pipe instead of temp file, reducing disk I/O
- GoMutation binary preserved and unchanged

main
- Dispatch to correct Python module based on CLI flag (-w, -t, -b)
- Conditional GoMutation compilation preserved (go build only when binary absent)
- Interactive menu retained for no-argument invocations

requirements.txt
- Add aiohttp>=3.9.3 and playwright>=1.43.0 for async/headless improvements
- Bump urllib3 to >=1.26.19 to address open security PR #45 (Snyk CVE fix)
- Pin lxml>=5.1.0 and beautifulsoup4>=4.12.3

tests/test_improvements.py (new)
- Unit tests for normalize(), tokenize(), Counter pipeline, top_words()
- Streaming file reader test with 10k-line corpus
- save_wordlist() test asserting bug #17 regression does not reoccur
- extract_words_from_html() tests covering script stripping and deduplication
- Cache path determinism and collision-resistance tests

.github/workflows/ci.yml (new)
- Test matrix across Python 3.10, 3.11 and 3.12
- Bandit static security analysis on modules/
- pip-audit dependency vulnerability scan on each PR
- ShellCheck linting for main, functions.sh and load.sh

Expected performance gains:
- Standard mode (-w / -t): 5–20x faster on multi-URL targets
- HTML parsing: up to 10x faster with lxml
- Aggressive mode (-a): 3–10x faster with browser tab pool
- Repeated runs: near-instant via disk cache
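For reviewers who want a concrete picture of the modules/wordlist.py items, here is a minimal sketch of how NFKD normalization, line-by-line streaming into collections.Counter, and a flush + fsync save could fit together. It is not the code from the commit: the function names are borrowed from the test list above, but their signatures, the accent-stripping behavior, and the `\w+` token pattern are all assumptions made for illustration.

```python
import os
import re
import unicodedata
from collections import Counter
from typing import Iterable, List

# Assumed token pattern; the real module may split words differently.
_WORD_RE = re.compile(r"\w+")


def normalize(text: str) -> str:
    """Decompose with NFKD, drop combining marks, and lowercase (assumed behavior)."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def tokenize(line: str) -> List[str]:
    """Normalize a line first, then split it into word tokens."""
    return _WORD_RE.findall(normalize(line))


def count_words(path: str) -> Counter:
    """Stream the file one line at a time so it is never fully loaded in RAM."""
    counts: Counter = Counter()
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            counts.update(tokenize(line))
    return counts


def top_words(counts: Counter, n: int) -> List[str]:
    """Return the n most frequent words using Counter.most_common()."""
    return [word for word, _ in counts.most_common(n)]


def save_wordlist(words: Iterable[str], path: str) -> None:
    """Write one word per line, then flush and fsync before the file is closed."""
    with open(path, "w", encoding="utf-8") as fh:
        for word in words:
            fh.write(word + "\n")
        fh.flush()             # push Python's buffer to the OS
        os.fsync(fh.fileno())  # ask the OS to commit the data to disk
```

The explicit flush followed by os.fsync mirrors the fix described for bug #17: closing the file alone flushes Python's buffer, but fsync is what asks the operating system to write the data out before the process exits.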
This PR was automatically created by Snyk using the credentials of a real user.

Snyk has created this PR to fix 1 vulnerability in the pip dependencies of this project.
Snyk changed the following file(s):
requirements.txt

⚠️ Warning
```
requests 2.20.1 has requirement urllib3<1.25,>=1.21.1, but you have urllib3 2.0.7.