Skip to content

deps(python): Update twscrape requirement from >=0.12 to >=0.17.0#12

Open
dependabot[bot] wants to merge 24 commits intomainfrom
dependabot/pip/twscrape-gte-0.17.0
Open

deps(python): Update twscrape requirement from >=0.12 to >=0.17.0#12
dependabot[bot] wants to merge 24 commits intomainfrom
dependabot/pip/twscrape-gte-0.17.0

Conversation

@dependabot
Copy link
Copy Markdown
Contributor

@dependabot dependabot Bot commented on behalf of github Apr 20, 2026

Updates the requirements on twscrape to permit the latest version.

Release notes

Sourced from twscrape's releases.

v0.17.0

What's Changed

  • added x-request-client-id support (#245, #248)
  • update gql endpoints

Full Changelog: vladkens/twscrape@v0.16.0...v0.17.0

Commits

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

dependabot Bot and others added 24 commits April 17, 2026 12:00
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v5...v6)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Updates the requirements on [playwright](https://github.com/microsoft/playwright-python) to permit the latest version.
- [Release notes](https://github.com/microsoft/playwright-python/releases)
- [Commits](microsoft/playwright-python@v1.40.0...v1.58.0)

---
updated-dependencies:
- dependency-name: playwright
  dependency-version: 1.58.0
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Updates the requirements on [imagehash](https://github.com/JohannesBuchner/imagehash) to permit the latest version.
- [Release notes](https://github.com/JohannesBuchner/imagehash/releases)
- [Commits](JohannesBuchner/imagehash@v4.3.0...v4.3.2)

---
updated-dependencies:
- dependency-name: imagehash
  dependency-version: 4.3.2
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Supersedes Dependabot PRs #2 and #3 which couldn't be merged via gh CLI
(OAuth token lacks 'workflow' scope). Applied directly via SSH push.
Same version bumps as the PRs, same scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#6)

Updates the requirements on [faster-whisper](https://github.com/SYSTRAN/faster-whisper) to permit the latest version.
- [Release notes](https://github.com/SYSTRAN/faster-whisper/releases)
- [Commits](SYSTRAN/faster-whisper@v1.0.0...v1.2.1)

---
updated-dependencies:
- dependency-name: faster-whisper
  dependency-version: 1.2.1
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Updates the requirements on [pytest](https://github.com/pytest-dev/pytest) to permit the latest version.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](pytest-dev/pytest@7.0.0...9.0.3)

---
updated-dependencies:
- dependency-name: pytest
  dependency-version: 9.0.3
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Supersedes #8 (had merge conflict after other PRs merged).
Same bump as Dependabot proposed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ew + security audit

## CRITICAL (runtime crashes / dead code)

1. **VimeoScraper._parse_vtt signature mismatch** (scraperx/vimeo_scraper.py:309)
   YouTubeScraper._parse_vtt is an INSTANCE METHOD taking a file PATH.
   VimeoScraper imported it as a module function and passed VTT STRING content
   → would crash with TypeError on any Vimeo video with text_tracks.
   FIX: extracted parse_vtt_content(str) -> str into new _transcript_common.py

2. **Whisper backend string mismatch** (scraperx/vimeo_scraper.py:337)
   _detect_whisper_backend() returns "faster-whisper" and "whisper-cli".
   VimeoScraper compared against "faster" and "whisper_cli"
   → faster-whisper GPU path was UNREACHABLE dead code.
   FIX: use canonical strings from _transcript_common.detect_whisper_backend

3. **SSRF via redirect chain** (scraperx/avatar_matcher.py:92)
   urlopen follows HTTP redirects. Allowlist checked only on initial URL.
   Attacker-controlled pbs.twimg.com URL could redirect to 169.254.169.254
   (AWS IMDS) or internal IPs, bypassing SSRF protection.
   FIX: custom _StrictRedirectHandler re-validates allowlist on every hop.

## HIGH

4. **Pillow decompression bomb** (avatar_matcher.py)
   2MB byte cap doesn't bound decoded pixel count. Pillow default MAX is
   ~178MP (still enough to OOM). A 2MB PNG can decompress to multi-GB bitmap.
   FIX: Image.MAX_IMAGE_PIXELS = 20_000_000 (ample for any avatar).

5. **SSRF via _download_vtt track URL** (scraperx/vimeo_scraper.py:137)
   track["url"] comes from untrusted player_config JSON. No host validation
   before fetch — attacker-influenced config could redirect HTTP fetch.
   FIX: _VTT_HOST_ALLOWLIST + scheme check before _http_get.

6. **SSRF via _fetch_html in discover_videos** (scraperx/video_discovery.py:114)
   page_url passed to urlopen with no validation.
   file:///etc/passwd, http://127.0.0.1/admin, http://169.254.169.254/ all
   were reachable.
   FIX: _is_safe_page_url() — scheme allowlist (http/https only), block
   RFC-1918, loopback, link-local, multicast, reserved via ipaddress module.
   Verified 8/8 SSRF cases blocked.

7. **SQLite connection leak + thread-unsafe** (avatar_matcher.py)
   __init__ opened connection before schema init; exception leaked the fd.
   Single connection without check_same_thread=False → ProgrammingError on
   concurrent use.
   FIX: try/except in __init__ closes on failure. check_same_thread=False +
   threading.Lock + PRAGMA journal_mode=WAL. Plus __enter__/__exit__ for
   context-manager usage.

## MEDIUM correctness

8. **authenticity.is_authentic penalized root_deleted** (docstring contradiction)
   Docstring said "has_branches + root_deleted are advisory flags, don't fail."
   Code made is_authentic=False on root_deleted anyway.
   FIX: advisory flags truly advisory. is_authentic = AND of the 4 formal
   properties only. Callers read root_deleted/has_branches separately.

## Infrastructure

- New module: scraperx/_transcript_common.py (133 lines)
  Pure functions: parse_vtt_content(str), parse_vtt_file(path),
  detect_whisper_backend(), detect_gpu_for_whisper(),
  transcribe_faster_whisper(path, *, model, device, compute_type, language),
  transcribe_whisper_cli(path, *, model, language),
  transcribe_audio(path) → auto-picks backend.
- Resolves long-standing DRY debt flagged in original VimeoScraper file header.
- AvatarMatcher now usable as context manager: `with AvatarMatcher() as m:`
- Secure-by-default DB dir perms (mode=0o700).

## Testing

- 212 existing tests still passing (no regressions)
- 10/10 live E2E sims passed (VTT parsing, SSRF allowlist, Hamming, etc.)
- 8/8 SSRF attack vectors blocked (localhost, AWS IMDS, file://, ftp://, etc.)
- Live Vimeo fetch still works (Sintel via player_config fallback)

Found via: parallel security-reviewer + code-reviewer agent audit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…task plan

Research + rubric judge + voting deliverable for next session implementation.

Source universe rated on 4 axes (signal density, accessibility, coverage, freshness):
- Tier A (direct API, score >= 16): Reddit 19, HN 18, StackOverflow 18, GH Discussions 18,
  GH Trending 17, arXiv 17, PapersWithCode 17, DEV.to 17, X/Twitter 16, Substack 16,
  Semantic Scholar 16, YouTube 16
- Tier B (local_web_search generic): Lobsters 15, Google Scholar 15, Product Hunt 14,
  Medium 13, LinkedIn 13, Bluesky 12, Discord 12
- Tier C (skip): Mastodon 10

Architecture: single scraperx.github_analyzer module (NOT 10 separate scrapers).
Leverages existing local-ai-mcp.web_research (SearXNG + Jina + qwen3.5:27b) for
Tier B sources, avoiding 7 platform-specific scrapers.

Landscape audit confirmed genuine gap — github/github-mcp-server (29k⭐) is generic
tool wrapper, gitingest/repomix/code2prompt dump repos for LLMs. None answer
'is this repo worth using, better alternative exists, what community says'.

16-task sequenced plan (~15h):
- Phase 1: Scaffolding (T1-T2) — 2h
- Phase 2: Core GitHub adapter (T3-T4) — 3h
- Phase 3: External mentions Tier A (T5-T9) — 3h
- Phase 4: Semantic layer Tier B (T10) — 1.5h
- Phase 5: Trending + Discovery (T11) — 1.5h
- Phase 6: Synthesis + CLI (T12-T13) — 2h
- Phase 7: Testing + MCP (T14-T15) — 2h
- Phase 8: Docs + 1.4.0 release (T16) — 1h

5 open questions for user resolution before T1 starts.

No code written this session — per user request this is research + plan only.
Implementation deferred to next session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
46 CVEs found across 23 packages in extras. Dependabot will catch most weekly;
manual review recommended within 1 week. Action items tracked in
global-graph/journals/global-graph-todo.md #2.

Key items requiring bumps:
- cryptography 41.0.7 → 42.0.4+ (7 CVEs, incl RSA key disclosure)
- urllib3 2.0.7 → 2.6.2+ (5 CVEs, incl decompression bomb DoS)

Core scraperx unaffected (stdlib-only). Vulns are in optional extras'
transitive deps (playwright, pillow) — opt-in only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 1
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 2
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 3
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 4
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 5
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 6
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 7
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 8
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 9
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 10
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 12
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…py,stackoverflow.py,synthesis.py

Auto-committed by stop-auto-commit hook to prevent data loss.
Files changed: 6 | Repo: scraperx

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Task-Id: 22
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates the requirements on [twscrape](https://github.com/vladkens/twscrape) to permit the latest version.
- [Release notes](https://github.com/vladkens/twscrape/releases)
- [Commits](vladkens/twscrape@v0.12...v0.17.0)

---
updated-dependencies:
- dependency-name: twscrape
  dependency-version: 0.17.0
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot @github
Copy link
Copy Markdown
Contributor Author

dependabot Bot commented on behalf of github Apr 20, 2026

Labels

The following labels could not be found: dependencies, python. Please create them before Dependabot can add them to a pull request.

Please fix the above issues or remove invalid values from dependabot.yml.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant