Turn a Twitter/X archive into a searchable map of themes, threads, and quotes.
Tweetscope imports an archive, enriches tweets with surrounding context, projects the corpus into an interactive space, names clusters with hierarchical labels, and serves the result through a React UI backed by a Hono API and LanceDB.
- Browse a topic map instead of scrolling a timeline.
- Search semantically and by keyword against the active scope.
- Open thread and quote side views with reply/quote graph overlays.
- Expand into a topic directory or multi-column carousel.
- Scrub time with timeline playback and filter to thread-heavy regions.
- Import likes into a sibling `-likes` dataset that is grouped with the main collection on the dashboard.
The frontend is a routed React/Vite app with three live screens: dashboard, new collection, and the main explore surface. It talks only to the TypeScript Hono API. The API reads catalog metadata and serving tables from LanceDB, proxies a small set of raw files, and spawns Python subprocesses for imports. The Python side materializes artifacts under LATENT_SCOPE_DATA, exports serving tables to LanceDB, and keeps the catalog in sync.
Diagram source: documentation/diagrams/system-architecture.mmd
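As a rough illustration of the Python side's contract, here is a sketch of resolving artifact paths under `LATENT_SCOPE_DATA`. The per-dataset directory layout shown is hypothetical; see DEVELOPMENT.md for the actual storage layout.

```python
import os
from pathlib import Path

def data_root() -> Path:
    """Resolve the artifact root from LATENT_SCOPE_DATA (with ~ expansion)."""
    raw = os.environ.get("LATENT_SCOPE_DATA", "~/latent-scope-data")
    return Path(raw).expanduser()

def dataset_dir(dataset_id: str) -> Path:
    """Per-dataset artifact directory (hypothetical layout, for illustration)."""
    return data_root() / dataset_id

print(dataset_dir("my-tweets"))
```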
The current default Twitter pipeline is no longer the old ingest -> embed -> UMAP -> cluster -> label -> explore path. Today it is:
- Import and normalize archive data with `twitter_import.py`
- Create contextual embeddings with `embed.py`
- Build a 2D display UMAP and a separate clustering UMAP
- Build a hierarchy with `build_hierarchy.py` (PLSCAN)
- Name the hierarchy with `toponymy_labels.py`
- Materialize a serving scope, validate the contract, export to LanceDB, and register it in the catalog
- Build reply/quote graph artifacts for thread and quote views
Diagram source: documentation/diagrams/pipeline-flow.mmd
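The stages above can be sketched as a chain of module invocations. Only `latentscope.scripts.twitter_import` is confirmed by this README; the other module paths are inferred from the script names and may differ, so treat this as a dry-run outline, not the real driver:

```python
import sys

DATASET = "my-tweets"

# Ordered pipeline stages. Module paths after the first are inferred
# from the script names in the README and may not match the real layout.
STAGES = [
    ["latentscope.scripts.twitter_import", DATASET, "--source", "zip"],
    ["latentscope.scripts.embed", DATASET],             # contextual embeddings
    ["latentscope.scripts.build_hierarchy", DATASET],   # hierarchy (PLSCAN)
    ["latentscope.scripts.toponymy_labels", DATASET],   # name the hierarchy
]

def stage_commands(python=sys.executable):
    """Build the subprocess argv for each stage without executing anything."""
    return [[python, "-m", *stage] for stage in STAGES]

for cmd in stage_commands():
    print(" ".join(cmd))
```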
- Python 3.11+ and `uv`
- Node.js 22+ and npm
- `VOYAGE_API_KEY` and `OPENAI_API_KEY`
- A writable `LATENT_SCOPE_DATA` directory
Note: the repo root does not currently include a checked-in Python packaging manifest, so the commands below assume the Python dependencies are already available in your active environment.
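A quick stdlib check for the prerequisites above can save a failed pipeline run later. This is a convenience sketch, not part of the repo:

```python
import os
import shutil

REQUIRED_ENV = ["VOYAGE_API_KEY", "OPENAI_API_KEY", "LATENT_SCOPE_DATA"]
REQUIRED_TOOLS = ["uv", "node", "npm"]

def missing_prereqs(env=os.environ):
    """Return the env vars and CLI tools that are not available."""
    missing = [name for name in REQUIRED_ENV if not env.get(name)]
    missing += [tool for tool in REQUIRED_TOOLS if shutil.which(tool) is None]
    return missing

if __name__ == "__main__":
    problems = missing_prereqs()
    print("ok" if not problems else f"missing: {', '.join(problems)}")
```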
```sh
git clone --recurse-submodules https://github.com/maskys/tweetscope.git
cd tweetscope
cd api && npm install && cd ..
cd web && npm install && cd ..
cp .env.example .env
```

Set at least:

```sh
LATENT_SCOPE_DATA=~/latent-scope-data
LATENT_SCOPE_APP_MODE=studio
VOYAGE_API_KEY=your-key
OPENAI_API_KEY=your-key
PORT=3000
```

The API dev server reads the repo-root `.env` via `api/package.json`.
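The `.env` format is plain `KEY=value` lines. The API loads the file itself via `api/package.json`; the minimal parser below only illustrates the format (no quoting or `export` handling):

```python
def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = """\
LATENT_SCOPE_DATA=~/latent-scope-data
LATENT_SCOPE_APP_MODE=studio
PORT=3000
"""
print(parse_dotenv(sample))
```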
```sh
# Terminal 1
cd api && npm run dev

# Terminal 2
cd web && npm run dev
```

Open http://localhost:5174.
How to request your X data export
- Go to x.com/settings/download_your_data
- Re-enter your password and request your archive
- X will email you when it's ready (usually 24–48 hours)
- Download the `.zip`; this is what you upload to Tweetscope
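Native X archives store tweets as JavaScript rather than plain JSON: the tweets file begins with a `window.YTD.<name>.partN = ` prefix before the JSON array. A sketch of stripping that prefix to reach the payload; the in-browser normalizer does its own handling, so this is only an illustration of the format:

```python
import json

def parse_ytd_js(text: str):
    """Strip the `window.YTD.<name>.partN = ` prefix and parse the JSON array."""
    _, _, payload = text.partition("=")
    return json.loads(payload)

sample = 'window.YTD.tweets.part0 = [{"tweet": {"id_str": "1", "full_text": "hello"}}]'
tweets = parse_ytd_js(sample)
print(tweets[0]["tweet"]["full_text"])  # prints "hello"
```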
Preferred UI flow:
- Open `/new`
- Upload a native X archive zip or start a Community Archive import
- For native archives, the browser extracts and normalizes the zip locally before upload
- The API runs `latentscope.scripts.twitter_import` and redirects into the new scope when the job completes
Community Archive is an alternative if you don't have your own export. It pulls publicly donated tweet archives by username. Enter any username that has donated their archive and Tweetscope will fetch and process it. Community archives may not include likes.
Direct CLI flow:
```sh
uv run python3 -m latentscope.scripts.twitter_import my-tweets \
  --source zip \
  --zip_path archives/twitter-archive.zip \
  --run_pipeline
```

For large archives, run year-by-year ingest first and then a final `--run_pipeline` pass. See the development guide for the current storage layout and pipeline details.
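The year-by-year approach can be scripted. The `--year` flag below is hypothetical (check the `twitter_import` CLI for the real chunking option); the other flags come from this README, and the sketch only builds the commands without running them:

```python
import sys

def yearly_import_commands(dataset, zip_path, years, python=sys.executable):
    """Build one ingest command per year, then a final --run_pipeline pass.

    The --year flag is hypothetical; --source/--zip_path/--run_pipeline
    are the flags shown in the README.
    """
    base = [python, "-m", "latentscope.scripts.twitter_import", dataset,
            "--source", "zip", "--zip_path", zip_path]
    cmds = [base + ["--year", str(y)] for y in years]
    cmds.append(base + ["--run_pipeline"])
    return cmds

for cmd in yearly_import_commands("my-tweets", "archives/twitter-archive.zip",
                                  range(2020, 2023)):
    print(" ".join(cmd))
```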
See DEVELOPMENT.md for:
- Frontend route and provider architecture
- Hono route groups and LanceDB serving model
- Current Python import and scope-export pipeline
- Dataset storage layout under `LATENT_SCOPE_DATA`
- Local development commands and verification commands
Deployment notes live under documentation/, including Vercel deployment and Cloudflare R2 / CDN setup.