feat(importers): add twitterapi.io and generic CSV import support#12

Open
Frostbite1536 wants to merge 11 commits into MaskyS:main from Frostbite1536:main

Conversation

@Frostbite1536

Summary

twitterapi.io is a third-party API that is much cheaper than the official Twitter API. I have been using it for another analysis tool, and it works well.

  • twitterapi_io.py — Live API fetcher + offline JSON loader for twitterapi.io. Fetches any public account's tweets with cursor-based pagination, rate-limit backoff, and retries. Also loads saved JSON responses from disk or in-memory dicts/lists. Maps camelCase API schema to tweetscope's flat _flatten_tweet() row format, including extra engagement fields (quotes, views, bookmarks). Pure stdlib — no external dependencies.
  • csv_import.py — Generic CSV/TSV importer with 60+ column name aliases for broad compatibility with Twitter data export tools (Chrome extensions, analytics platforms, etc.). Auto-detects delimiters, parses URL fields in multiple formats, and handles common column naming conventions.
  • Both produce rows schema-compatible with _flatten_tweet(), plugging directly into the ingestion pipeline
  • 29 new tests + 4 existing pass with no regressions
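To make the schema compatibility concrete, here is a minimal sketch of mapping a camelCase twitterapi.io tweet to a flat row. The exact field names of `_flatten_tweet()` live in tweetscope; the keys below are illustrative assumptions, not the canonical schema.

```python
import html

def _to_flat_row(tweet: dict) -> dict:
    """Map a camelCase twitterapi.io tweet to a flat row (sketch).

    Assumption: these key names mirror, but are not, the real
    _flatten_tweet() schema.
    """
    return {
        "id": str(tweet.get("id", "")),
        "text": html.unescape(tweet.get("text", "")),
        "created_at": tweet.get("createdAt", ""),
        "retweet_count": tweet.get("retweetCount", 0),
        "like_count": tweet.get("likeCount", 0),
        "reply_count": tweet.get("replyCount", 0),
        # Extra engagement fields mentioned in the PR summary:
        "quote_count": tweet.get("quoteCount", 0),
        "view_count": tweet.get("viewCount", 0),
        "bookmark_count": tweet.get("bookmarkCount", 0),
    }

row = _to_flat_row({"id": 123, "text": "hi &amp; bye", "likeCount": 4})
```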

Architecture note

The importers follow a drop-in pattern: they import ImportResult from twitter.py when running inside tweetscope and fall back to a local dataclass for standalone use. Anyone can add format-specific importers by following the same pattern.
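The drop-in pattern described above can be sketched as a try/except import. The fallback dataclass fields here are assumptions; the real ImportResult in twitter.py may differ.

```python
from dataclasses import dataclass, field

try:
    # Inside tweetscope, reuse the canonical result type.
    from twitter import ImportResult  # type: ignore
except ImportError:
    # Standalone fallback: a minimal local stand-in (fields are assumptions).
    @dataclass
    class ImportResult:
        rows: list = field(default_factory=list)
        profile: dict = field(default_factory=dict)
        errors: list = field(default_factory=list)

result = ImportResult(rows=[{"id": "1"}])
```

Because the fallback only defines the shape the importer itself produces, downstream tweetscope code never sees it; it exists purely so the module runs on its own.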

claude and others added 5 commits March 16, 2026 00:18
Add two new importers that plug directly into the existing ingestion
pipeline by producing rows matching the _flatten_tweet() schema:

- twitterapi_io.py: normalises camelCase JSON from the twitterapi.io
  REST API (accepts raw API response or bare tweet list)
- csv_import.py: auto-detects column names from common Twitter CSV
  export formats (X_Account_Analyzer, Chrome extensions, etc.) with
  TSV support and flexible column alias mapping

Both importers are pure stdlib with no external dependencies, use
ImportResult from twitter.py when inside tweetscope, and include a
standalone fallback for independent use.

Re-exports added to importers/__init__.py. 24 new tests covering
schema compatibility, HTML decoding, URL extraction, reply/retweet
detection, and column alias resolution.
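The delimiter auto-detection mentioned above can be done with the stdlib `csv.Sniffer`. This is a minimal sketch of the idea, not the actual csv_import.py code, and it omits the alias resolution the real importer performs.

```python
import csv
import io

def load_csv_rows(text: str) -> list:
    """Auto-detect the delimiter (comma vs tab) and parse rows (sketch)."""
    try:
        dialect = csv.Sniffer().sniff(text, delimiters=",\t")
    except csv.Error:
        dialect = csv.excel  # fall back to comma-separated
    return list(csv.DictReader(io.StringIO(text), dialect=dialect))

rows = load_csv_rows("id\ttext\n1\thello\n")
```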

https://claude.ai/code/session_019HSb1hE1xWXAkh6S9ZGub8
…and X_Account_Analyzer CSV

Rework importers to combine the best of both implementations:

twitterapi_io.py (merged):
- Add fetch_twitterapi_io() for live API fetching with pagination,
  rate-limit backoff, and configurable max_pages
- load_twitterapi_io_json() now accepts file paths, dicts, or lists
- Add extra engagement fields: quotes, views, bookmarks
- Richer profile with followers, following, statuses_count, is_verified
- Date parsing handles both ISO and Twitter native formats
- Pure stdlib HTTP (urllib) — no external dependencies
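The pagination-with-backoff loop can be sketched like this. `fetch_page` stands in for the real urllib call in fetch_twitterapi_io() and its `(tweets, next_cursor, rate_limited)` return shape is an assumption for illustration.

```python
import time

def paginate(fetch_page, max_pages=5):
    """Cursor-based pagination with exponential backoff (sketch).

    fetch_page(cursor) is a caller-supplied callable (an assumption)
    returning (tweets, next_cursor, rate_limited).
    """
    tweets, cursor, delay = [], None, 1.0
    for _ in range(max_pages):
        page, next_cursor, limited = fetch_page(cursor)
        if limited:
            time.sleep(delay)  # back off, then retry the same cursor
            delay *= 2
            continue
        tweets.extend(page)
        if not next_cursor:    # no next cursor: last page reached
            break
        cursor = next_cursor
    return tweets

def _fake_page(cursor):
    # Stand-in for a real HTTP call to twitterapi.io.
    if cursor is None:
        return (["tweet-1", "tweet-2"], "cursor-1", False)
    return (["tweet-3"], None, False)

tweets = paginate(_fake_page)
```

Note that `max_pages` also bounds retries, so a persistently rate-limited account cannot loop forever.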

xanalyzer_csv.py (new, from reference):
- Purpose-built for X_Account_Analyzer detailed.csv format
- Extracts tweet IDs from URLs (/status/123 → id: "123")
- Maps post_type ("reply"/"retweet"/"original") to is_reply/is_retweet
- Preserves sentiment_score, sentiment_label, engagement
- Auto-discovers summary.csv for profile enrichment (follower counts)
- Username filtering for multi-handle CSVs
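The `/status/123 → id: "123"` extraction above amounts to one regex. A minimal sketch:

```python
import re

_STATUS_RE = re.compile(r"/status/(\d+)")

def tweet_id_from_url(url: str) -> "str | None":
    """Pull the numeric tweet ID out of a /status/ URL (sketch).

    Returns None when the URL has no /status/<digits> segment.
    """
    m = _STATUS_RE.search(url)
    return m.group(1) if m else None
```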

csv_import.py (kept):
- Generic CSV/TSV importer for other Twitter export formats
- 60+ column name aliases for broad compatibility
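The alias mapping works by normalising each header and looking it up in a many-to-one table. The entries below are a hypothetical subset for illustration, not the importer's actual 60+ aliases.

```python
# Hypothetical subset of the alias table: several source column names
# collapse to one canonical name so different export tools line up.
ALIASES = {
    "tweet_text": "text", "content": "text", "full_text": "text",
    "likes": "like_count", "favorite_count": "like_count",
    "retweets": "retweet_count",
}

def resolve_columns(header: list) -> dict:
    """Map each CSV header to its canonical column name (sketch)."""
    return {col: ALIASES.get(col.strip().lower(), col.strip().lower())
            for col in header}

mapping = resolve_columns(["Content", "Likes", "retweets", "id"])
```

Unknown headers pass through lowercased, so extra columns are kept rather than dropped.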

__init__.py exports all five public functions:
  fetch_twitterapi_io, load_twitterapi_io_json,
  load_xanalyzer_csv, load_csv, load_csv_string

46 tests pass (17 twitterapi_io + 12 csv_import + 13 xanalyzer_csv + 4 existing).

https://claude.ai/code/session_019HSb1hE1xWXAkh6S9ZGub8
Remove X_Account_Analyzer-specific importer since the tool is not
publicly available. The generic csv_import.py and twitterapi_io.py
remain as generally useful importers for the community.

https://claude.ai/code/session_019HSb1hE1xWXAkh6S9ZGub8
- Remove unnecessary _flatten_twitterapi_tweet alias (no backwards
  compat needed on new code) and its test
- Remove unnecessary Content-Type header on GET requests in _api_request
- Fix inconsistent indices validation: URL entities now check
  isinstance(list) same as media entities
- Update csv_import.py docstring to remove reference to private tool
- Add missing test coverage: extendedEntities media extraction,
  TypeError on invalid input, fallback username/display_name

35 tests pass (19 twitterapi_io + 12 csv_import + 4 existing).

https://claude.ai/code/session_019HSb1hE1xWXAkh6S9ZGub8
…importer-XM8vb

Claude/integrate tweetscope importer xm8vb
@vercel

vercel bot commented Mar 16, 2026

@Frostbite1536 is attempting to deploy a commit to the maskys' projects Team on Vercel.

A member of the Team first needs to authorize it.

claude and others added 6 commits March 16, 2026 00:43
Add back xanalyzer_csv.py and its tests for private use. This was
excluded from the upstream PR but belongs in this fork.

https://claude.ai/code/session_019HSb1hE1xWXAkh6S9ZGub8
…importer-XM8vb

feat(importers): restore X_Account_Analyzer CSV importer
CRITICAL fixes:
- Python SSRF: restrict resolve-url to t.co domain only (was open proxy)
- Python path traversal: add _safe_dataset_path() with realpath validation
  to all dataset routes (16+ endpoints)
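The realpath-based validation can be sketched as follows. This is an illustration of the `_safe_dataset_path()` idea, not the committed code; the function name and error type are assumptions.

```python
import os

def safe_dataset_path(base_dir: str, name: str) -> str:
    """Resolve a dataset path and reject traversal outside base_dir (sketch).

    realpath-normalise both sides (resolving symlinks and ".."), then
    require the candidate to stay under the base directory.
    """
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, name))
    if candidate != base and not candidate.startswith(base + os.sep):
        raise ValueError(f"path escapes dataset dir: {name!r}")
    return candidate
```

Comparing with `base + os.sep` (rather than a bare prefix) avoids accepting sibling directories like `/data-evil` when the base is `/data`.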

HIGH fixes:
- SQL LIKE injection: escape %, _, \ in contains filter with ESCAPE clause
- Unbounded URL cache: add eviction at 10k entries (Python) and 5k (JS)
- Error message leakage: sanitize internal errors in search routes
- Batch DoS: limit resolve-urls to 50 URLs per request (both TS and Python)
- HTTP method misuse: change write endpoints from GET to POST
- Regex injection: disable regex in Python str.contains (use literal match)
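The LIKE-escaping fix can be sketched like this: escape the escape character first, then the wildcards, and pair the result with an `ESCAPE` clause in the query. A minimal sketch, assuming backslash is chosen as the escape character:

```python
def escape_like(term: str) -> str:
    """Escape LIKE wildcards so user input matches literally (sketch).

    Backslash is the ESCAPE character, so escape it first, then % and _.
    """
    return (term.replace("\\", "\\\\")
                .replace("%", r"\%")
                .replace("_", r"\_"))

# Used with a parameterised query, e.g. (illustrative, not the actual route):
# cur.execute("SELECT * FROM tweets WHERE text LIKE ? ESCAPE '\\'",
#             (f"%{escape_like(user_term)}%",))
```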

MEDIUM fixes:
- Graph query limits: add upper bounds (10k chain, 50k descendants)
- Frontend memory leaks: add cache eviction to urlResolver, destroy()
  method to EmbedScheduler for event listener cleanup
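A bounded cache with oldest-first eviction, as in the 10k-entry Python limit above, can be sketched with an `OrderedDict`. The class name and API are illustrative assumptions.

```python
from collections import OrderedDict

class BoundedCache(OrderedDict):
    """Insertion-ordered cache with a hard size cap (sketch).

    max_size=10_000 would mirror the Python-side limit mentioned above.
    """
    def __init__(self, max_size=10_000):
        super().__init__()
        self.max_size = max_size

    def __setitem__(self, key, value):
        if key not in self and len(self) >= self.max_size:
            self.popitem(last=False)  # evict the oldest entry
        super().__setitem__(key, value)

cache = BoundedCache(max_size=2)
cache["a"] = 1
cache["b"] = 2
cache["c"] = 3  # evicts "a"
```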

LLM agent patterns addressed:
- Legacy code left unpatched during TS rewrite
- No adversarial input consideration (happy path only)
- Unbounded operations throughout

https://claude.ai/code/session_01KBwYSnfgmhwSNuu9XBDcgA
- Cap max_edges graph parameter at 50k to prevent DoS via massive responses
- Cap page parameter at 10k in query routes to prevent excessive offsets
- Add 5s timeout to t.co URL resolution fetch to prevent hanging connections
- Add 30s timeout to VoyageAI embedding API calls

https://claude.ai/code/session_01KBwYSnfgmhwSNuu9XBDcgA
…-rk1xH

Claude/audit codebase issues rk1x h
