feat(importers): add twitterapi.io and generic CSV import support #12
Open
Frostbite1536 wants to merge 11 commits into MaskyS:main from
Conversation
Add two new importers that plug directly into the existing ingestion pipeline by producing rows matching the _flatten_tweet() schema:

- twitterapi_io.py: normalises camelCase JSON from the twitterapi.io REST API (accepts a raw API response or a bare tweet list)
- csv_import.py: auto-detects column names from common Twitter CSV export formats (X_Account_Analyzer, Chrome extensions, etc.) with TSV support and flexible column alias mapping

Both importers are pure stdlib with no external dependencies, use ImportResult from twitter.py when inside tweetscope, and include a standalone fallback for independent use. Re-exports added to importers/__init__.py.

24 new tests covering schema compatibility, HTML decoding, URL extraction, reply/retweet detection, and column alias resolution.

https://claude.ai/code/session_019HSb1hE1xWXAkh6S9ZGub8
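The alias-driven column detection described above can be sketched roughly like this. The alias table here is a tiny illustrative subset, not the real 60+ entry mapping, and the function names are hypothetical:

```python
import csv
import io

# Hypothetical subset of the alias table: canonical field -> accepted headers.
COLUMN_ALIASES = {
    "id": ["id", "tweet_id", "status_id"],
    "text": ["text", "tweet_text", "content", "full_text"],
    "created_at": ["created_at", "date", "timestamp"],
}

def resolve_columns(header):
    """Map each canonical field to the first matching CSV header, if any."""
    lowered = {h.strip().lower(): h for h in header}
    mapping = {}
    for field, aliases in COLUMN_ALIASES.items():
        for alias in aliases:
            if alias in lowered:
                mapping[field] = lowered[alias]
                break
    return mapping

def load_rows(csv_text):
    """Yield dicts keyed by canonical field names from a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    mapping = resolve_columns(reader.fieldnames or [])
    for raw in reader:
        yield {field: raw[col] for field, col in mapping.items()}

rows = list(load_rows("Tweet_ID,Content\n123,hello world\n"))
# rows == [{"id": "123", "text": "hello world"}]
```

Headers are lowercased before lookup, so `Tweet_ID` and `tweet_id` resolve to the same canonical field.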
…and X_Account_Analyzer CSV
Rework importers to combine the best of both implementations:
twitterapi_io.py (merged):
- Add fetch_twitterapi_io() for live API fetching with pagination,
rate-limit backoff, and configurable max_pages
- load_twitterapi_io_json() now accepts file paths, dicts, or lists
- Add extra engagement fields: quotes, views, bookmarks
- Richer profile with followers, following, statuses_count, is_verified
- Date parsing handles both ISO and Twitter native formats
- Pure stdlib HTTP (urllib) — no external dependencies
xanalyzer_csv.py (new, from reference):
- Purpose-built for X_Account_Analyzer detailed.csv format
- Extracts tweet IDs from URLs (/status/123 → id: "123")
- Maps post_type ("reply"/"retweet"/"original") to is_reply/is_retweet
- Preserves sentiment_score, sentiment_label, engagement
- Auto-discovers summary.csv for profile enrichment (follower counts)
- Username filtering for multi-handle CSVs
csv_import.py (kept):
- Generic CSV/TSV importer for other Twitter export formats
- 60+ column name aliases for broad compatibility
__init__.py exports all five public functions:
fetch_twitterapi_io, load_twitterapi_io_json,
load_xanalyzer_csv, load_csv, load_csv_string
46 tests pass (17 twitterapi_io + 12 csv_import + 13 xanalyzer_csv + 4 existing).
https://claude.ai/code/session_019HSb1hE1xWXAkh6S9ZGub8
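The pagination and backoff behaviour listed above can be sketched as follows. The page-dict keys (tweets, next_cursor, has_next_page), the function names, and the retry policy are assumptions for illustration, not the importer's actual API:

```python
import json
import time
import urllib.error
import urllib.request

def _get_json(url, max_retries=3):
    """GET a JSON document, backing off exponentially on HTTP 429."""
    for attempt in range(max_retries):
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as e:
            if e.code != 429 or attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...

def paginate(get_page, max_pages=5):
    """Collect tweets across cursor-linked pages.

    get_page(cursor) -> dict with assumed keys "tweets",
    "next_cursor", and "has_next_page".
    """
    tweets, cursor = [], ""
    for _ in range(max_pages):
        page = get_page(cursor)
        tweets.extend(page.get("tweets", []))
        cursor = page.get("next_cursor", "")
        if not cursor or not page.get("has_next_page"):
            break
    return tweets
```

Separating the cursor loop from the HTTP call keeps the pagination logic testable with a fake `get_page`, while `_get_json` stays pure stdlib (urllib) as the commit describes.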
Remove the X_Account_Analyzer-specific importer since the tool is not publicly available. The generic csv_import.py and twitterapi_io.py remain as generally useful importers for the community.

https://claude.ai/code/session_019HSb1hE1xWXAkh6S9ZGub8
- Remove unnecessary _flatten_twitterapi_tweet alias (no backwards compat needed on new code) and its test
- Remove unnecessary Content-Type header on GET requests in _api_request
- Fix inconsistent indices validation: URL entities now check isinstance(list) same as media entities
- Update csv_import.py docstring to remove reference to private tool
- Add missing test coverage: extendedEntities media extraction, TypeError on invalid input, fallback username/display_name

35 tests pass (19 twitterapi_io + 12 csv_import + 4 existing).

https://claude.ai/code/session_019HSb1hE1xWXAkh6S9ZGub8
@Frostbite1536 is attempting to deploy a commit to the maskys' projects Team on Vercel. A member of the Team first needs to authorize it.
Add back xanalyzer_csv.py and its tests for private use. This was excluded from the upstream PR but belongs in this fork.

https://claude.ai/code/session_019HSb1hE1xWXAkh6S9ZGub8
feat(importers): restore X_Account_Analyzer CSV importer
CRITICAL fixes:
- Python SSRF: restrict resolve-url to the t.co domain only (was an open proxy)
- Python path traversal: add _safe_dataset_path() with realpath validation to all dataset routes (16+ endpoints)

HIGH fixes:
- SQL LIKE injection: escape %, _, \ in the contains filter with an ESCAPE clause
- Unbounded URL cache: add eviction at 10k entries (Python) and 5k (JS)
- Error message leakage: sanitize internal errors in search routes
- Batch DoS: limit resolve-urls to 50 URLs per request (both TS and Python)
- HTTP method misuse: change write endpoints from GET to POST
- Regex injection: disable regex in Python str.contains (use literal match)

MEDIUM fixes:
- Graph query limits: add upper bounds (10k chain, 50k descendants)
- Frontend memory leaks: add cache eviction to urlResolver, add a destroy() method to EmbedScheduler for event listener cleanup

LLM agent patterns addressed:
- Legacy code left unpatched during TS rewrite
- No adversarial input consideration (happy path only)
- Unbounded operations throughout

https://claude.ai/code/session_01KBwYSnfgmhwSNuu9XBDcgA
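The LIKE-escaping fix can be illustrated with a minimal sqlite3 sketch; the table, data, and helper name here are hypothetical, and the real code's SQL layer may differ:

```python
import sqlite3

def escape_like(term, esc="\\"):
    """Escape LIKE wildcards (%, _) and the escape char itself
    so user input matches literally instead of as a pattern."""
    return (term.replace(esc, esc * 2)
                .replace("%", esc + "%")
                .replace("_", esc + "_"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (text TEXT)")
conn.executemany("INSERT INTO tweets VALUES (?)",
                 [("100% organic",), ("100x organic",)])

# Parameterized query with an explicit ESCAPE clause.
pattern = "%" + escape_like("100%") + "%"
rows = conn.execute("SELECT text FROM tweets WHERE text LIKE ? ESCAPE '\\'",
                    (pattern,)).fetchall()
# Only the literal "100%" row matches; without escaping, the user's %
# would act as a wildcard and sweep in "100x organic" too.
```

Note the escape character itself must be doubled first, otherwise an attacker could neutralize the escaping with a trailing backslash.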
- Cap the max_edges graph parameter at 50k to prevent DoS via massive responses
- Cap the page parameter at 10k in query routes to prevent excessive offsets
- Add a 5s timeout to the t.co URL resolution fetch to prevent hanging connections
- Add a 30s timeout to VoyageAI embedding API calls

https://claude.ai/code/session_01KBwYSnfgmhwSNuu9XBDcgA
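The realpath validation named in the audit above (_safe_dataset_path) might look roughly like this minimal sketch; the root directory, error type, and exact checks are assumptions:

```python
import os

DATASET_ROOT = os.path.realpath("datasets")  # assumed dataset root

def safe_dataset_path(name):
    """Resolve a user-supplied dataset name, rejecting path traversal.

    realpath collapses ../ segments and symlinks, so the check holds
    even for inputs like "../../etc/passwd" or symlinked entries.
    """
    candidate = os.path.realpath(os.path.join(DATASET_ROOT, name))
    inside = (candidate == DATASET_ROOT
              or candidate.startswith(DATASET_ROOT + os.sep))
    if not inside:
        raise ValueError(f"invalid dataset name: {name!r}")
    return candidate
```

Comparing against `DATASET_ROOT + os.sep` (not a bare prefix) avoids the classic bug where a sibling directory like `datasets-evil` passes a naive `startswith` check.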
Summary
twitterapi.io is a third-party API that is much cheaper than the official API. I have been using it for another analysis tool and it works well.
- twitterapi_io.py — Live API fetcher + offline JSON loader for twitterapi.io. Fetches any public account's tweets with cursor-based pagination, rate-limit backoff, and retries. Also loads saved JSON responses from disk or in-memory dicts/lists. Maps the camelCase API schema to tweetscope's flat _flatten_tweet() row format, including extra engagement fields (quotes, views, bookmarks). Pure stdlib — no external dependencies.
- csv_import.py — Generic CSV/TSV importer with 60+ column name aliases for broad compatibility with Twitter data export tools (Chrome extensions, analytics platforms, etc.). Auto-detects delimiters, parses URL fields in multiple formats, and handles common column naming conventions.

Both importers produce rows matching _flatten_tweet(), plugging directly into the ingestion pipeline.

Architecture note
The importers follow a drop-in pattern: import ImportResult from twitter.py when inside tweetscope, and fall back to a local dataclass for standalone use. Anyone can add format-specific importers by following the same pattern.
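That drop-in pattern can be sketched as below; the fields on the local stand-in are assumptions for illustration, and the real ImportResult in twitter.py may carry different ones:

```python
from dataclasses import dataclass, field

try:
    # Inside tweetscope, reuse the shared result type from twitter.py.
    from twitter import ImportResult  # type: ignore
except ImportError:
    # Standalone fallback: a local stand-in with the fields this
    # sketch needs (hypothetical; not the real class's definition).
    @dataclass
    class ImportResult:
        rows: list = field(default_factory=list)
        errors: list = field(default_factory=list)

# Either way, importer code downstream uses the same interface.
result = ImportResult()
result.rows.append({"id": "123", "text": "hello"})
```

The try/except at import time is what makes each importer usable both inside the package and as a copied-out standalone script.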