
deps(python): Update beautifulsoup4 requirement from >=4.12 to >=4.14.3 #13

Open

dependabot[bot] wants to merge 46 commits into main from dependabot/pip/beautifulsoup4-gte-4.14.3

Conversation


dependabot[bot] commented on behalf of GitHub on Apr 20, 2026

Updates the requirements on beautifulsoup4 to permit the latest version.

dependabot[bot] and others added 23 commits April 17, 2026 12:00
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v5...v6)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Updates the requirements on [playwright](https://github.com/microsoft/playwright-python) to permit the latest version.
- [Release notes](https://github.com/microsoft/playwright-python/releases)
- [Commits](microsoft/playwright-python@v1.40.0...v1.58.0)

---
updated-dependencies:
- dependency-name: playwright
  dependency-version: 1.58.0
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Updates the requirements on [imagehash](https://github.com/JohannesBuchner/imagehash) to permit the latest version.
- [Release notes](https://github.com/JohannesBuchner/imagehash/releases)
- [Commits](JohannesBuchner/imagehash@v4.3.0...v4.3.2)

---
updated-dependencies:
- dependency-name: imagehash
  dependency-version: 4.3.2
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Supersedes Dependabot PRs #2 and #3 which couldn't be merged via gh CLI
(OAuth token lacks 'workflow' scope). Applied directly via SSH push.
Same version bumps as the PRs, same scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#6)

Updates the requirements on [faster-whisper](https://github.com/SYSTRAN/faster-whisper) to permit the latest version.
- [Release notes](https://github.com/SYSTRAN/faster-whisper/releases)
- [Commits](SYSTRAN/faster-whisper@v1.0.0...v1.2.1)

---
updated-dependencies:
- dependency-name: faster-whisper
  dependency-version: 1.2.1
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Updates the requirements on [pytest](https://github.com/pytest-dev/pytest) to permit the latest version.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](pytest-dev/pytest@7.0.0...9.0.3)

---
updated-dependencies:
- dependency-name: pytest
  dependency-version: 9.0.3
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Supersedes #8 (had merge conflict after other PRs merged).
Same bump as Dependabot proposed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ew + security audit

## CRITICAL (runtime crashes / dead code)

1. **VimeoScraper._parse_vtt signature mismatch** (scraperx/vimeo_scraper.py:309)
   YouTubeScraper._parse_vtt is an INSTANCE METHOD taking a file PATH.
   VimeoScraper imported it as a module function and passed VTT STRING content
   → would crash with TypeError on any Vimeo video with text_tracks.
   FIX: extracted parse_vtt_content(str) -> str into new _transcript_common.py

2. **Whisper backend string mismatch** (scraperx/vimeo_scraper.py:337)
   _detect_whisper_backend() returns "faster-whisper" and "whisper-cli".
   VimeoScraper compared against "faster" and "whisper_cli"
   → faster-whisper GPU path was UNREACHABLE dead code.
   FIX: use canonical strings from _transcript_common.detect_whisper_backend

3. **SSRF via redirect chain** (scraperx/avatar_matcher.py:92)
   urlopen follows HTTP redirects. Allowlist checked only on initial URL.
   Attacker-controlled pbs.twimg.com URL could redirect to 169.254.169.254
   (AWS IMDS) or internal IPs, bypassing SSRF protection.
   FIX: custom _StrictRedirectHandler re-validates allowlist on every hop.
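
A minimal sketch of the per-hop re-validation idea from fix 3, assuming a simple
host allowlist (names are illustrative, not the actual scraperx implementation):

    import urllib.error
    import urllib.request
    from urllib.parse import urlparse

    ALLOWED_HOSTS = {"pbs.twimg.com"}  # hypothetical allowlist

    class StrictRedirectHandler(urllib.request.HTTPRedirectHandler):
        """Re-check the allowlist on every redirect hop, not just the initial URL."""

        def redirect_request(self, req, fp, code, msg, headers, newurl):
            host = urlparse(newurl).hostname or ""
            if host not in ALLOWED_HOSTS:
                raise urllib.error.HTTPError(
                    newurl, code, "redirect target not allowlisted", headers, fp)
            return super().redirect_request(req, fp, code, msg, headers, newurl)

    opener = urllib.request.build_opener(StrictRedirectHandler())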

## HIGH

4. **Pillow decompression bomb** (avatar_matcher.py)
   2MB byte cap doesn't bound decoded pixel count. Pillow default MAX is
   ~178MP (still enough to OOM). A 2MB PNG can decompress to multi-GB bitmap.
   FIX: Image.MAX_IMAGE_PIXELS = 20_000_000 (ample for any avatar).

5. **SSRF via _download_vtt track URL** (scraperx/vimeo_scraper.py:137)
   track["url"] comes from untrusted player_config JSON. No host validation
   before fetch — attacker-influenced config could redirect HTTP fetch.
   FIX: _VTT_HOST_ALLOWLIST + scheme check before _http_get.

6. **SSRF via _fetch_html in discover_videos** (scraperx/video_discovery.py:114)
   page_url passed to urlopen with no validation.
   file:///etc/passwd, http://127.0.0.1/admin, http://169.254.169.254/ all
   were reachable.
   FIX: _is_safe_page_url() — scheme allowlist (http/https only), block
   RFC-1918, loopback, link-local, multicast, reserved via ipaddress module.
   Verified 8/8 SSRF cases blocked.
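
   A minimal sketch of the kind of check _is_safe_page_url performs, per the
   description above (IPv4-only resolution for brevity; the real function may differ):

    import ipaddress
    import socket
    from urllib.parse import urlparse

    def is_safe_page_url(url: str) -> bool:
        """Allow only http/https URLs that do not resolve into private or special address space."""
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            return False
        try:
            # gethostbyname is IPv4-only; kept simple for illustration
            addr = ipaddress.ip_address(socket.gethostbyname(parsed.hostname))
        except (OSError, ValueError):
            return False
        return not (addr.is_private or addr.is_loopback or addr.is_link_local
                    or addr.is_multicast or addr.is_reserved)

   Under a check of this shape, file:///etc/passwd fails the scheme test and
   http://169.254.169.254/ fails the link-local test.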

7. **SQLite connection leak + thread-unsafe** (avatar_matcher.py)
   __init__ opened connection before schema init; exception leaked the fd.
   Single connection without check_same_thread=False → ProgrammingError on
   concurrent use.
   FIX: try/except in __init__ closes on failure. check_same_thread=False +
   threading.Lock + PRAGMA journal_mode=WAL. Plus __enter__/__exit__ for
   context-manager usage.

## MEDIUM correctness

8. **authenticity.is_authentic penalized root_deleted** (docstring contradiction)
   Docstring said "has_branches + root_deleted are advisory flags, don't fail."
   Code made is_authentic=False on root_deleted anyway.
   FIX: advisory flags truly advisory. is_authentic = AND of the 4 formal
   properties only. Callers read root_deleted/has_branches separately.

## Infrastructure

- New module: scraperx/_transcript_common.py (133 lines)
  Pure functions: parse_vtt_content(str), parse_vtt_file(path),
  detect_whisper_backend(), detect_gpu_for_whisper(),
  transcribe_faster_whisper(path, *, model, device, compute_type, language),
  transcribe_whisper_cli(path, *, model, language),
  transcribe_audio(path) → auto-picks backend (usage sketch below).
- Resolves long-standing DRY debt flagged in original VimeoScraper file header.
- AvatarMatcher now usable as context manager: `with AvatarMatcher() as m:`
- Secure-by-default DB dir perms (mode=0o700).
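
Hypothetical usage of the new helpers (module path and signatures as listed above;
treat this as a sketch, not documented API):

    from scraperx._transcript_common import parse_vtt_content, transcribe_audio

    with open("captions.vtt", encoding="utf-8") as fh:
        transcript = parse_vtt_content(fh.read())   # VTT string in, plain text out
    if not transcript:
        transcript = transcribe_audio("audio.m4a")  # auto-picks faster-whisper or whisper-cli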

## Testing

- 212 existing tests still passing (no regressions)
- 10/10 live E2E sims passed (VTT parsing, SSRF allowlist, Hamming, etc.)
- 8/8 SSRF attack vectors blocked (localhost, AWS IMDS, file://, ftp://, etc.)
- Live Vimeo fetch still works (Sintel via player_config fallback)

Found via: parallel security-reviewer + code-reviewer agent audit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…task plan

Research + rubric judge + voting deliverable for next session implementation.

Source universe rated on 4 axes (signal density, accessibility, coverage, freshness):
- Tier A (direct API, score >= 16): Reddit 19, HN 18, StackOverflow 18, GH Discussions 18,
  GH Trending 17, arXiv 17, PapersWithCode 17, DEV.to 17, X/Twitter 16, Substack 16,
  Semantic Scholar 16, YouTube 16
- Tier B (local_web_search generic): Lobsters 15, Google Scholar 15, Product Hunt 14,
  Medium 13, LinkedIn 13, Bluesky 12, Discord 12
- Tier C (skip): Mastodon 10

Architecture: single scraperx.github_analyzer module (NOT 10 separate scrapers).
Leverages existing local-ai-mcp.web_research (SearXNG + Jina + qwen3.5:27b) for
Tier B sources, avoiding 7 platform-specific scrapers.

Landscape audit confirmed genuine gap — github/github-mcp-server (29k⭐) is generic
tool wrapper, gitingest/repomix/code2prompt dump repos for LLMs. None answer
'is this repo worth using, better alternative exists, what community says'.

16-task sequenced plan (~15h):
- Phase 1: Scaffolding (T1-T2) — 2h
- Phase 2: Core GitHub adapter (T3-T4) — 3h
- Phase 3: External mentions Tier A (T5-T9) — 3h
- Phase 4: Semantic layer Tier B (T10) — 1.5h
- Phase 5: Trending + Discovery (T11) — 1.5h
- Phase 6: Synthesis + CLI (T12-T13) — 2h
- Phase 7: Testing + MCP (T14-T15) — 2h
- Phase 8: Docs + 1.4.0 release (T16) — 1h

5 open questions for user resolution before T1 starts.

No code written this session — per user request this is research + plan only.
Implementation deferred to next session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
46 CVEs found across 23 packages in extras. Dependabot will catch most weekly;
manual review recommended within 1 week. Action items tracked in
global-graph/journals/global-graph-todo.md #2.

Key items requiring bumps:
- cryptography 41.0.7 → 42.0.4+ (7 CVEs, incl RSA key disclosure)
- urllib3 2.0.7 → 2.6.2+ (5 CVEs, incl decompression bomb DoS)

Core scraperx unaffected (stdlib-only). Vulns are in optional extras'
transitive deps (playwright, pillow) — opt-in only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 1
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 2
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 3
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 4
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 5
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 6
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 7
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 8
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 9
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 10
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 12
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…py,stackoverflow.py,synthesis.py

Auto-committed by stop-auto-commit hook to prevent data loss.
Files changed: 6 | Repo: scraperx

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Task-Id: 22
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

dependabot[bot] commented on behalf of GitHub on Apr 20, 2026

Labels

The following labels could not be found: dependencies, python. Please create them before Dependabot can add them to a pull request.

Please fix the above issues or remove invalid values from dependabot.yml.

prezis and others added 6 commits April 22, 2026 08:41
…er (v1.4.2)

New scraperx/github_analyzer/telemetry.py module:
- log_verdict(report, feedback=None) → appends JSONL event to ~/.scraperx/verdicts.jsonl
- prompt_and_log_verdict(report) → interactive wrapper: scores first, then prompts
  "Agree? [y/n/<reason>]" on stderr; non-TTY stdin silently skipped
- _normalise_feedback(): y/yes/agree/tak → "agree", n/no/disagree/nie → "disagree", free-text preserved
- Telemetry never raises (returns False on failure); auto-creates ~/.scraperx/
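
A minimal sketch of the append-only JSONL behaviour described above (field names
are illustrative, not the exact event schema):

    import json
    import time
    from pathlib import Path

    VERDICTS_PATH = Path.home() / ".scraperx" / "verdicts.jsonl"

    def log_verdict_event(report: dict, feedback: str | None = None) -> bool:
        """Append one verdict event as a JSON line; never raise, return False on failure."""
        try:
            VERDICTS_PATH.parent.mkdir(parents=True, exist_ok=True)
            event = {"ts": time.time(), "report": report, "feedback": feedback}
            with VERDICTS_PATH.open("a", encoding="utf-8") as fh:
                fh.write(json.dumps(event, ensure_ascii=False) + "\n")
            return True
        except (OSError, TypeError):
            return False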

CLI: scraperx github OWNER/REPO --log-verdict
Fires after report output so it never delays rendering; --json safe (prompt on stderr).

44 new tests in test_github_telemetry.py (522 total, 0 ruff warnings).

This builds the disagreement corpus v1.5.0 layered scoring needs (target: 30 labeled verdicts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 23
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 4f61b6d2-1e45-4f14-aa5d-b9cb6552ff90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collapses 5 early commits (v1.0.0, v1.1.0, thread-walk, search, errors)
authored under noreply@users.noreply.github.com back to the real email.
GitHub honours .mailmap for display — no force-push / history rewrite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New doc: docs/api-endpoint-discovery.md — 8-step ladder for discovering
a REST API's endpoint inventory when the vendor won't give you the
contract. Includes new-scraper-client checklist so contributors don't
ship code with invented path names.

Canonical version at ~/ai/global-graph/patterns/api-endpoint-discovery-without-docs.md.
Anchored to the Inter Cars HCMW-33526 incident 2026-04-24.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 22
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: fdb1f2e6-6888-4a58-ac2d-68083f26a85d

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 23
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: fdb1f2e6-6888-4a58-ac2d-68083f26a85d

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
prezis and others added 17 commits April 25, 2026 03:31
Task-Id: 26
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: fdb1f2e6-6888-4a58-ac2d-68083f26a85d

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- scraperx/bmw_corpus/README.md: how-to-add-a-new-source recipe with
  legal posture audit + skeleton + operational guarantees. The
  specialized place for scraping methodology going forward.
- output/bmw-trails/{kba,nhtsa,reddit}/*.jsonl: initial production
  data — 202+50+900 BMW records.
- .processed markers: ingester byte-offset state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Documents architectural roads we explored but haven't built:
translation enrichment, Common Crawl historical mining, YouTube
transcripts, XenForo+Discourse engines, closing-post detection,
cross-source dedup, manufacturer service docs.

Each path tracks itself in user's living backlog at
~/ai/global-graph/projects/bmw-corpus-backlog.md (progress bars,
buried-with-reason discipline).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 2
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 21e09fc9-77ee-4718-88af-f50afc79f9ce

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 6
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 21e09fc9-77ee-4718-88af-f50afc79f9ce

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…callsites

Closes the unbounded-WAL disaster vector that existed in 1.4.2 and earlier:
~/.scraperx/social.db had journal_size_limit=-1 (uncapped) — same root
cause that produced an 87 GB WAL on a sister project. With long-running
scraperx daemons (BMW corpus ingester, Reddit/KBA scrapers, forum
adapters) hammering the cache 24/7, the WAL could grow without bound.

Adds scraperx/_sqlite_pragmas.py — shared apply_pragmas(conn) helper
applying the 7-PRAGMA production stack (journal_mode=WAL,
journal_size_limit=64MB, synchronous=NORMAL, busy_timeout=5000,
foreign_keys=ON, mmap_size=256MB, temp_store=MEMORY). Idempotent.
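
A minimal sketch of that helper, using the values from the list above (the shipped
code may differ in detail):

    import sqlite3

    def apply_pragmas(conn: sqlite3.Connection) -> None:
        """Apply the production PRAGMA stack; idempotent, safe to call on every open."""
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("PRAGMA journal_size_limit=67108864")   # 64 MB cap on the WAL
        conn.execute("PRAGMA synchronous=NORMAL")
        conn.execute("PRAGMA busy_timeout=5000")
        conn.execute("PRAGMA foreign_keys=ON")
        conn.execute("PRAGMA mmap_size=268435456")           # 256 MB
        conn.execute("PRAGMA temp_store=MEMORY")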

Wired into all 3 storage callsites:
- SocialDB.__init__ (was: only journal_mode=WAL)
- AvatarMatcher.__init__ (was: only journal_mode=WAL)
- VerifiedAvatarRegistry.__init__ (was: no pragmas at all — implicitly
  relied on another consumer to open the shared DB first)

7 new tests in tests/test_sqlite_pragmas.py — all pass.
Pre-existing 20-test github analyzer cache suite still green (zero
regression on the SocialDB consumer path).

Research grounding: loke.dev (Feb 2026), oneuptime (Feb 2026), powersync,
phiresky tune.md, sqlite.org/pragma.html.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…jsonl.processed,2026-04.jsonl,2026-04.jsonl.processed

Auto-committed by stop-auto-commit hook to prevent data loss.
Files changed: 4 | Repo: scraperx

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bumps __version__ + pyproject.toml to 1.4.3. Moves CHANGELOG entries
from [Unreleased] under [1.4.3] — 2026-04-25. SemVer patch since this
is a backward-compatible bug fix (closes the unbounded-WAL vector for
long-running daemons). See 453ad8e for the actual implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 5
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 21e09fc9-77ee-4718-88af-f50afc79f9ce

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nl.processed

Task-Id: 7
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 21e09fc9-77ee-4718-88af-f50afc79f9ce

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2 fingerprints — both are public on-chain Solana token mint addresses
in tests/test_token_extractor.py (DezXAZ8z..., orcaEKTd...). Not secrets.

Pre-commit hook ~/.git-hooks/gitleaks-protect.sh installed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 5
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 99cd8d19-8f7d-4a7a-8416-5a9f744e49d5

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cascade)

Phase 2.2 P2 of wojak-wojtek s23 plan. Universal URL fetcher with cascade
fallback for "fetch URL → maybe-Cloudflare-walled" use case across the
research stack. Per-URL cache in social.db (web_fetch_cache table, 24h TTL).

Cascade:
- jina:       r.jina.ai → clean markdown extraction, free, no API key
- urllib:     stdlib HTTP → fast, JSON/RSS/static
- playwright: optional dep → bypass bot-walls
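
Illustrative shape of the first two tiers only (omits the cache, SSRF guard and
playwright tier; not the actual smart_fetch code):

    import urllib.request

    def fetch_with_cascade(url: str, timeout: float = 20.0) -> str:
        """Try the Jina reader first for clean text, fall back to a plain stdlib fetch."""
        try:
            with urllib.request.urlopen("https://r.jina.ai/" + url, timeout=timeout) as resp:
                return resp.read().decode("utf-8", errors="replace")
        except OSError:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                charset = resp.headers.get_content_charset() or "utf-8"
                return resp.read().decode(charset, errors="replace")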

Hardening (post code-review):
- Singleton sqlite Connection per db_path (was: open/close per call)
- SSRF guard blocks private/loopback/link-local IPs + non-http schemes
- Encoding cascade: header_charset → utf-8 → latin-1
- Cache-key stable on sha256(url); callers don't normalize

Public API: smart_fetch(url, prefer="jina", strict=False, ...) → FetchResult

Tests: 19 cases (cascade order, SSRF, TTL, cache singleton, encoding).
521 existing tests still pass.

Version bump 1.4.3 → 1.5.0 (minor — additive feature).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Encodes wojak-wojtek s22 round 2 lesson: topics > keywords. Keyword search
returns thousands of off-topic hits; topic-tagged repos cluster the actual
ecosystem (e.g. topic:macroeconomics+topic:python returns the 4-5 repos
serious people use).

Public API:
    from scraperx import discover_repos, RepoCandidate
    candidates = discover_repos(
        topics=["macroeconomics", "python"],
        min_stars=100,
        recency_months=12,
        exclude_owners=["lb-tokenomiapro"],
        limit=50,
    )

CLI:
    scraperx gh-discover --topic X --topic Y --min-stars 100 [--json]
    scraperx gh-discover --topic onchain --analyze-top 3   # chain to analyzer
    scraperx gh-discover --topic python --query             # print query, exit

Hardening (post code-review):
- Topic char validation: enforce GitHub naming rules (a-z 0-9 -, 1-50 chars).
  Spaces/slashes/commas/uppercase rejected with clear error before API hit.
- Pagination: walks pages internally when limit > per_page (capped at 100).
  Respects GitHub's 1000-result ceiling on /search/repositories.
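
A minimal sketch of the topic check, using the rule quoted above (the shipped
validation may be stricter):

    import re

    _TOPIC_RE = re.compile(r"^[a-z0-9-]{1,50}$")  # a-z, 0-9, '-', 1-50 chars

    def validate_topic(topic: str) -> str:
        """Reject malformed topics before spending a GitHub search API request."""
        if not _TOPIC_RE.fullmatch(topic):
            raise ValueError(
                f"invalid GitHub topic {topic!r}: lowercase letters, digits and '-' only, 1-50 chars")
        return topic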

Reuses existing GithubAPI client (auth via GITHUB_TOKEN, rate-limit tracking).
Adds GithubAPI.search_repositories() public method for the search endpoint.

Tests: 23 cases (query builder, candidate coercion, sort, dedup, exclusion,
pagination, char validation). 544 existing scraperx tests still pass.

Version bump 1.5.0 → 1.6.0 (minor — additive feature).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cache (Phase 2.2 P1)

Replaces the manual prefix-probing dance from wojak-wojtek s22 batches 1+3+5
where VIX9D was blocked under TVC: but resolved under CBOE:; PCC failed
everywhere and burned 8 retry probes per cron tick before being blacklisted.

Public API:
    from scraperx import resolve_symbol, SymbolResolution
    res = resolve_symbol("VIX9D")               # auto-detect → vol → CBOE
    res = resolve_symbol("ZN", asset_class="futures")
    res = resolve_symbol("X", candidates=["INDEX", "TVC"])

CLI:
    scraperx tv-resolve VIX9D                   # CBOE:VIX9D
    scraperx tv-resolve ZN --asset-class futures
    scraperx tv-resolve VIX9D --json
    scraperx tv-resolve VIX9D --candidates CBOE,TVC --strict

Cache: per-(ticker,exchange) in social.db tv_symbol_cache table.
- status=resolved      → 7d TTL  (exchange membership stable)
- status=empty_no_data → 6h TTL  (TV recognises but no bars yet)
- status=not_found     → 24h TTL (TV doesn't recognise; rare flip)

Asset-class auto-detect from ticker patterns:
- VIX*/SKEW/VVIX/GVZ → vol
- EURUSD/USDJPY      → fx
- BTCUSDT/ETHUSDC    → crypto
- ZN/CL/GC/ZQ/ES     → futures
- SPX/NDX/TOTAL2     → index
- else → broad fallback sweep across most-common exchanges

Exchange priority lists per class (probe order; first hit wins):
    futures: CME, CBOT, COMEX, NYMEX, ICE
    vol:     CBOE, TVC, INDEX
    fx:      FX_IDC, OANDA, FOREXCOM
    equity:  NASDAQ, NYSE, AMEX
    index:   TVC, INDEX, CBOE, CRYPTOCAP
    crypto:  BINANCE, COINBASE, KRAKEN, BYBIT, CRYPTOCAP
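
Illustrative mapping only, mirroring the patterns and probe orders above; the real
resolver's detection is likely more involved:

    EXCHANGE_PRIORITY = {
        "futures": ["CME", "CBOT", "COMEX", "NYMEX", "ICE"],
        "vol":     ["CBOE", "TVC", "INDEX"],
        "fx":      ["FX_IDC", "OANDA", "FOREXCOM"],
        "equity":  ["NASDAQ", "NYSE", "AMEX"],
        "index":   ["TVC", "INDEX", "CBOE", "CRYPTOCAP"],
        "crypto":  ["BINANCE", "COINBASE", "KRAKEN", "BYBIT", "CRYPTOCAP"],
    }

    def detect_asset_class(ticker: str) -> str:
        """Map a bare ticker to an asset class so the right probe order is tried first."""
        t = ticker.upper()
        if t.startswith("VIX") or t in {"SKEW", "VVIX", "GVZ"}:
            return "vol"
        if t in {"EURUSD", "USDJPY"}:
            return "fx"
        if t.endswith(("USDT", "USDC")):
            return "crypto"
        if t in {"ZN", "CL", "GC", "ZQ", "ES"}:
            return "futures"
        if t in {"SPX", "NDX", "TOTAL2"}:
            return "index"
        return "equity"  # the shipped code does a broader fallback sweep instead

Probe order is then EXCHANGE_PRIORITY[detect_asset_class(ticker)]; first hit wins.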

Optional dep: pip install scraperx[tv-resolve] (pulls tvDatafeed). Module is
import-safe without it — calls degrade to status=not_found with clear error.

Tests: 36 cases (auto-detect for 18 ticker patterns, cascade behavior, cache
hit/miss/TTL expiry, negative cache, CLI happy path, optional-dep absence).
540 existing scraperx tests still pass.

Version bump 1.6.0 → 1.7.0 (minor — additive feature).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task-Id: 7
Auto-committed by per-task-commit hook after TaskUpdate(completed).
Session: 99cd8d19-8f7d-4a7a-8416-5a9f744e49d5

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates the requirements on [beautifulsoup4](https://www.crummy.com/software/BeautifulSoup/bs4/) to permit the latest version.

---
updated-dependencies:
- dependency-name: beautifulsoup4
  dependency-version: 4.14.3
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] force-pushed the dependabot/pip/beautifulsoup4-gte-4.14.3 branch from 723be30 to 6bcf4ab on April 26, 2026 07:55