From e4c1bfc5ddd565a8688dda176f92d43cbf0f6e77 Mon Sep 17 00:00:00 2001 From: djstrong Date: Mon, 6 Apr 2026 23:38:58 +0200 Subject: [PATCH 01/10] Plan for ENSdb CLI --- .cursor/plans/ensdb_cli_tool_422abf99.plan.md | 303 ++++++++++++++++++ 1 file changed, 303 insertions(+) create mode 100644 .cursor/plans/ensdb_cli_tool_422abf99.plan.md diff --git a/.cursor/plans/ensdb_cli_tool_422abf99.plan.md b/.cursor/plans/ensdb_cli_tool_422abf99.plan.md new file mode 100644 index 000000000..622b52c2a --- /dev/null +++ b/.cursor/plans/ensdb_cli_tool_422abf99.plan.md @@ -0,0 +1,303 @@ +--- +name: ENSDb CLI Tool +overview: Create a new `apps/ensdb-cli` application that provides database inspection, schema management, and snapshot import/export/push/pull operations for ENSDb, targeting 50-100GB PostgreSQL databases with S3-compatible storage and safe restore flows for fresh or isolated databases. +todos: + - id: scaffold + content: "Scaffold apps/ensdb-cli: package.json, tsconfig, vitest config, yargs entry point, workspace integration" + status: pending + - id: inspect + content: "Implement inspect command: list schemas with classification, per-schema details (tables, sizes, row counts)" + status: pending + - id: schema-drop + content: Implement schema drop command with safety confirmation + status: pending + - id: pgdump-wrapper + content: Implement pg_dump/pg_restore wrapper with parallel jobs and progress reporting + status: pending + - id: snapshot-create + content: "Implement snapshot create: dump all indexer schemas + ponder_sync + full ensnode.metadata, then generate manifest + checksums" + status: pending + - id: snapshot-restore + content: "Implement snapshot restore: unpack selected archives, validate manifest, pg_restore with parallel jobs, and restore filtered metadata rows into a fresh or isolated database" + status: pending + - id: s3-client + content: Implement S3 client layer with multipart upload/download support + status: pending + - id: snapshot-push + content: 
"Implement snapshot push: upload snapshot artifacts and manifest to S3-compatible storage" + status: pending + - id: snapshot-pull + content: "Implement snapshot pull: download from S3, verify checksums" + status: pending + - id: snapshot-list-info + content: Implement snapshot list and snapshot info commands for browsing remote snapshots + status: pending + - id: dockerfile + content: Create Dockerfile with postgresql-client for pg_dump/pg_restore + status: pending + - id: docs + content: Add documentation and README + status: pending +isProject: false +--- + +# ENSDb CLI Tool + +## Context + +ENSNode production databases are 50-100GB PostgreSQL instances. Each chain deployment gets its own indexer schema following the naming convention `{deployment}Schema{version}`. Three schema types coexist in one database: + +**Production database (7 schemas):** + +- **Indexer schemas (5):** + - `alphaSchema1.9.0` -- alpha deployment (all chains) + - `alphaSepoliaSchema1.9.0` -- alpha sepolia + - `mainnetSchema1.9.0` -- mainnet + - `sepoliaSchema1.9.0` -- sepolia + - `v2SepoliaSchema1.9.0` -- v2 sepolia +- **Shared schemas (2):** + - `ensnode` -- metadata table (rows scoped by `ens_indexer_schema_name`) + - `ponder_sync` -- shared RPC cache and sync state (needed by every indexer) + +Schema names are set via `ENSINDEXER_SCHEMA_NAME` env var in the blue-green deploy workflow (`[.github/workflows/deploy_ensnode_blue_green.yml](.github/workflows/deploy_ensnode_blue_green.yml)`). Old schemas are orphaned on redeploy and must be dropped manually to reclaim space. + +The goal is to enable fast ENSNode bootstrap (hours instead of 2-3 days) by snapshotting and restoring database state. + +## Architecture Decisions + +### Snapshot Composition + +A snapshot always contains: + +1. **All current indexer schemas** from the source database +2. The **full `ponder_sync`** schema +3. The **full `ensnode.metadata`** table contents + +Selective workflows happen later: + +1. 
`snapshot pull --schemas ...` downloads only the selected indexer schema archives plus shared artifacts
2. `snapshot restore --schemas ...` restores only those selected indexer schemas and filters `ensnode.metadata` rows to match

Because `ponder_sync` is shared state, `snapshot restore` is intended for **fresh or isolated target databases only**. Restoring a partial snapshot into an already-used shared database is out of scope for the first version.

### Snapshot Format

Use **`pg_dump --format=directory`** with **`--jobs=N`** for parallel dump/restore. This is the only format that supports parallelism, which is critical for 50-100GB databases. Each directory-format dump is then archived as a **`.dump.tar.zst`** artifact for storage and transfer, and unpacked to a temporary directory before restore.

- Dump: `pg_dump --format=directory --jobs=4 --schema=<schema> --file <dir>/<schema>.dumpdir`
- Archive: `tar --zstd -cf <dir>/<schema>.dump.tar.zst -C <dir> <schema>.dumpdir`
- Restore: unpack `<schema>.dump.tar.zst` to a temp directory, then run `pg_restore --format=directory --jobs=4 --schema=<schema> --dbname=<url> <tmp>/<schema>.dumpdir`

The implementation should explicitly budget temporary disk usage for both the compressed archive and the unpacked directory during restore.

### S3 Storage Layout

Discovery via `ListObjects` on `{prefix}/` -- each snapshot is a prefix containing a `manifest.json` and per-schema dump files:

```
{prefix}/
  {snapshot-id}/
    manifest.json                 # snapshot metadata (all schemas, sizes, versions)
    {schema-name}.dump.tar.zst    # archived pg_dump directory output (one per indexer schema)
    ponder_sync.dump.tar.zst      # archived pg_dump of ponder_sync
    ensnode_metadata.json         # all ensnode.metadata rows
    checksums.sha256              # integrity verification
```

- `snapshot list` uses `ListObjectsV2` with delimiter `/` to enumerate snapshot prefixes, then fetches each `manifest.json` for metadata display.
- `snapshot pull` downloads only the selected schema dump(s) + `ponder_sync.dump.tar.zst` + `ensnode_metadata.json` (CLI filters metadata rows locally to match selected schemas during restore).

### Technology

- **CLI framework**: yargs (consistent with ENSRainbow's `apps/ensrainbow/src/cli.ts`)
- **S3**: `@aws-sdk/client-s3` + `@aws-sdk/lib-storage` (multipart uploads for large files)
- **Database**: `pg` for connection validation, shells out to `pg_dump`/`pg_restore` for actual operations
- **Runtime**: tsx (consistent with other apps)
- **Validation**: zod
- **Existing code to leverage**: `@ensnode/ensdb-sdk` for schema definitions and metadata access

## CLI Commands

### Inspect

```
ensdb-cli inspect --database-url <url>
  List all schemas with type classification and size info.

ensdb-cli inspect --database-url <url> --schema <name>
  Show detailed info for a specific schema (tables, row counts, sizes).
```

### Schema Management

```
ensdb-cli schema drop --database-url <url> --schema <name> [--force]
  Drop a schema. Requires --force or interactive confirmation.
```

### Snapshot Operations

```
ensdb-cli snapshot create --database-url <url> --output <dir>
  Export ALL indexer schemas + ponder_sync + all ensnode.metadata to a local snapshot.
  Runs pg_dump with parallel jobs for each schema.

ensdb-cli snapshot restore --database-url <url> --input <dir> --schemas <list> [--drop-existing]
  Restore selected schema(s) from a local snapshot into a fresh or isolated database.
  Restores the selected indexer schema dump(s) + ponder_sync + filtered ensnode.metadata rows.
  Fails if target schema already exists unless --drop-existing is passed.
  Unpacks `.dump.tar.zst` archives to temp storage, then runs pg_restore with parallel jobs.

ensdb-cli snapshot push --input <dir> --bucket <name> [--endpoint <url>] [--prefix <path>]
  Upload a local snapshot to S3-compatible storage. Uses multipart upload.

ensdb-cli snapshot pull --snapshot-id <id> --output <dir> --bucket <name> [--endpoint <url>] [--schemas <list>]
  Download from S3.
If --schemas specified, downloads only those schema dumps + shared artifacts.
  If --schemas omitted, downloads the full snapshot.

ensdb-cli snapshot list --bucket <name> [--endpoint <url>] [--prefix <path>]
  List available snapshots from S3 with metadata summary (uses ListObjects + manifest.json).

ensdb-cli snapshot info --snapshot-id <id> --bucket <name> [--endpoint <url>]
  Show detailed metadata for a specific remote snapshot (fetches and displays manifest.json).
```

### Common Options

- `--database-url` / `ENSDB_URL` -- PostgreSQL connection string
- `--jobs` / `-j` -- parallelism for pg_dump/pg_restore (default: 4)
- `--bucket` / `ENSDB_SNAPSHOT_BUCKET` -- S3 bucket name
- `--endpoint` / `ENSDB_SNAPSHOT_ENDPOINT` -- S3-compatible endpoint (for R2, MinIO)
- `--verbose` / `-v` -- detailed output

## Manifest Schema

Each snapshot has a `manifest.json`. The CLI auto-populates `indexerConfig` by reading `ensindexer_public_config` from `ensnode.metadata` -- no manual input needed for namespace, plugins, or chain IDs.
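That auto-population step can be sketched as a pure function over exported metadata rows. The row shape below is an assumption (a key/value layout scoped by `ens_indexer_schema_name`); the actual types come from `@ensnode/ensdb-sdk` and may differ:

```typescript
// Hypothetical shape of one exported ensnode.metadata row; the real schema
// lives in @ensnode/ensdb-sdk.
interface MetadataRow {
  ens_indexer_schema_name: string;
  key: string;
  value: unknown; // JSON value as stored in the metadata table
}

// Build the manifest's `indexerConfig` entry for one indexer schema from its
// scoped metadata rows.
function extractIndexerConfig(rows: MetadataRow[], schemaName: string) {
  const scoped = new Map(
    rows
      .filter((row) => row.ens_indexer_schema_name === schemaName)
      .map((row) => [row.key, row.value]),
  );

  const publicConfig = scoped.get("ensindexer_public_config") as
    | Record<string, unknown>
    | undefined;
  if (publicConfig === undefined) {
    throw new Error(`no ensindexer_public_config row for ${schemaName}`);
  }

  return {
    ensdbVersion: scoped.get("ensdb_version"),
    namespace: publicConfig.namespace,
    plugins: publicConfig.plugins,
    indexedChainIds: publicConfig.indexedChainIds,
    isSubgraphCompatible: publicConfig.isSubgraphCompatible,
  };
}
```

Rows for other schemas in the same export are simply ignored, which is what lets one `ensnode_metadata.json` serve every per-schema manifest entry.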
+ +```json +{ + "version": 1, + "snapshotId": "mainnetSchema1.9.0-2026-04-06-abc123", + "createdAt": "2026-04-06T12:00:00Z", + "postgresVersion": "16.2", + "schemas": [ + { + "name": "mainnetSchema1.9.0", + "type": "ensindexer", + "sizeBytes": 45000000000, + "tableCount": 12, + "dumpFile": "mainnetSchema1.9.0.dump.tar.zst", + "indexerConfig": { + "ensdbVersion": "1.9.0", + "namespace": "mainnet", + "plugins": ["subgraph"], + "indexedChainIds": [1], + "isSubgraphCompatible": true, + "labelSet": { "labelSetId": "subgraph", "labelSetVersion": 0 }, + "versionInfo": { + "ensDb": "1.9.0", + "ponder": "0.16.3", + "ensIndexer": "1.9.0" + } + } + } + ], + "ponderSync": { + "sizeBytes": 8000000000, + "dumpFile": "ponder_sync.dump.tar.zst" + }, + "metadata": { + "file": "ensnode_metadata.json", + "indexerSchemas": ["mainnetSchema1.9.0"] + }, + "totalSizeBytes": 53000000000, + "checksumFile": "checksums.sha256" +} +``` + +The `indexerConfig` is extracted from the three `ensnode.metadata` keys: + +- `ensdb_version` -- ENSDb version string +- `ensindexer_public_config` -- namespace, plugins, chains, version info, label set, subgraph compatibility +- `ensindexer_indexing_status` -- per-chain sync status (block numbers, timestamps, chain-following state) + +This means `snapshot list` can show rich summaries like: + +``` +ID Namespace Plugins Chains Created +mainnetSchema1.9.0-2026-04-06-abc123 mainnet subgraph 1 2026-04-06 +alphaSchema1.9.0-2026-04-06-def456 mainnet subgraph+basenames+6 6 2026-04-06 +``` + +## Project Structure + +``` +apps/ensdb-cli/ + package.json + tsconfig.json + vitest.config.ts + src/ + cli.ts # yargs entry point + commands/ + inspect.ts # inspect command + schema-drop.ts # schema drop command + snapshot-create.ts # snapshot create + snapshot-restore.ts # snapshot restore + snapshot-push.ts # push to S3 + snapshot-pull.ts # pull from S3 + snapshot-list.ts # list remote snapshots + snapshot-info.ts # remote snapshot info + lib/ + database.ts # pg connection, 
schema queries + pgdump.ts # pg_dump/pg_restore wrapper + s3.ts # S3 client, multipart upload/download + manifest.ts # manifest read/write, validation + snapshot.ts # snapshot directory management + types.ts # shared types +``` + +## Implementation Phases + +### Phase 1: Project Setup + Inspect + Schema Drop + +- Scaffold `apps/ensdb-cli` with yargs, tsx, vitest +- Implement `inspect` command (re-implement PR #891 cleanly, using `@ensnode/ensdb-sdk` where possible) +- Implement `schema drop` command +- Add to pnpm workspace + +### Phase 2: Local Snapshot Create + Restore + +- Implement `pg_dump` wrapper with parallel jobs and progress reporting +- Implement `snapshot create` (dump all indexer schemas + ponder_sync + full metadata extraction) +- Implement archive packaging and unpacking for directory-format dumps +- Implement `snapshot restore` (fresh or isolated database only, pg_restore with parallel jobs) +- Manifest generation and validation +- Checksum generation and verification + +### Phase 3: S3 Push + Pull + List + +- S3 client with multipart upload support +- `snapshot push` with manifest and artifact upload only +- `snapshot pull` with integrity verification +- `snapshot list` and `snapshot info` for browsing remote snapshots + +### Phase 4: Polish + Production Readiness + +- Dockerfile (include `postgresql-client` for pg_dump/pg_restore) +- Progress bars for large operations +- Dry-run mode for destructive operations +- Comprehensive error messages and recovery guidance +- Documentation + +## Resolved Decisions + +1. **Discovery**: No shared `index.json`. Use S3 `ListObjects` to discover snapshots by reading `manifest.json` from each snapshot prefix. Most robust -- no concurrent writer races, no stale index. +2. **Snapshot granularity**: `snapshot create` always dumps ALL indexer schemas + `ponder_sync` + all `ensnode.metadata`. `snapshot pull` and `snapshot restore` let the user select which indexer schema(s) they want from that full snapshot. +3. 
**Restore safety**: `snapshot restore` targets a fresh or isolated database only because `ponder_sync` is shared state. Partial restore into an already-used shared database is not supported in v1. +4. **Restore behavior**: Fail by default if a target schema already exists. Pass `--drop-existing` to drop and replace. Prevents accidental data loss while keeping the convenient path available. + +## Open Questions for Stakeholders + +1. **Snapshot ID format**: Should snapshot IDs be auto-generated (e.g. `ensdb-2026-04-06-abc123`) or user-specified? Auto-generated is safer for avoiding collisions. +2. **Retention policy**: Should `snapshot list` show all snapshots ever, or should there be a TTL/cleanup mechanism (e.g. `snapshot delete`)? + From 532b8cfa2123a1612679e165e471b402a9124e0a Mon Sep 17 00:00:00 2001 From: djstrong Date: Mon, 6 Apr 2026 23:48:47 +0200 Subject: [PATCH 02/10] Add streaming upload mode consideration to ENSdb CLI plan --- .cursor/plans/ensdb_cli_tool_422abf99.plan.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.cursor/plans/ensdb_cli_tool_422abf99.plan.md b/.cursor/plans/ensdb_cli_tool_422abf99.plan.md index 622b52c2a..2f39e43e9 100644 --- a/.cursor/plans/ensdb_cli_tool_422abf99.plan.md +++ b/.cursor/plans/ensdb_cli_tool_422abf99.plan.md @@ -300,4 +300,4 @@ apps/ensdb-cli/ 1. **Snapshot ID format**: Should snapshot IDs be auto-generated (e.g. `ensdb-2026-04-06-abc123`) or user-specified? Auto-generated is safer for avoiding collisions. 2. **Retention policy**: Should `snapshot list` show all snapshots ever, or should there be a TTL/cleanup mechanism (e.g. `snapshot delete`)? - +3. **Streaming upload mode**: Should v1 support a direct "snapshot and push" flow that uploads artifacts to S3 as they are produced, or should v1 stay local-first (`snapshot create` then `snapshot push`)? 
True end-to-end streaming likely conflicts with the chosen `pg_dump --format=directory` approach, so supporting it may require either a different dump format or a hybrid design where each completed schema archive is uploaded immediately after local creation. From d72ad26bbefe34f2488f3ec25149a6a6378ebb88 Mon Sep 17 00:00:00 2001 From: djstrong Date: Tue, 7 Apr 2026 22:37:46 +0200 Subject: [PATCH 03/10] Enhance ENSdb CLI plan with detailed snapshot creation and restoration processes, including schema discovery, metadata handling, and new snapshot delete functionality. --- .cursor/plans/ensdb_cli_tool_422abf99.plan.md | 140 +++++++++++++++--- 1 file changed, 120 insertions(+), 20 deletions(-) diff --git a/.cursor/plans/ensdb_cli_tool_422abf99.plan.md b/.cursor/plans/ensdb_cli_tool_422abf99.plan.md index 2f39e43e9..9be89cfb3 100644 --- a/.cursor/plans/ensdb_cli_tool_422abf99.plan.md +++ b/.cursor/plans/ensdb_cli_tool_422abf99.plan.md @@ -15,7 +15,7 @@ todos: content: Implement pg_dump/pg_restore wrapper with parallel jobs and progress reporting status: pending - id: snapshot-create - content: "Implement snapshot create: dump all indexer schemas + ponder_sync + full ensnode.metadata, then generate manifest + checksums" + content: "Implement snapshot create: dump discovered indexer schemas + ponder_sync + full ensnode schema + ensnode_metadata.json, then generate manifest + checksums" status: pending - id: snapshot-restore content: "Implement snapshot restore: unpack selected archives, validate manifest, pg_restore with parallel jobs, and restore filtered metadata rows into a fresh or isolated database" @@ -32,6 +32,9 @@ todos: - id: snapshot-list-info content: Implement snapshot list and snapshot info commands for browsing remote snapshots status: pending + - id: snapshot-delete + content: "Implement snapshot delete: list objects under snapshot prefix, delete with --force or confirmation" + status: pending - id: dockerfile content: Create Dockerfile with 
postgresql-client for pg_dump/pg_restore status: pending @@ -63,23 +66,33 @@ Schema names are set via `ENSINDEXER_SCHEMA_NAME` env var in the blue-green depl The goal is to enable fast ENSNode bootstrap (hours instead of 2-3 days) by snapshotting and restoring database state. +### Related Issues + +- [#833](https://github.com/namehash/ensnode/issues/833) -- Simplify downloading of `ponder_sync` for internal developers. The CLI's `snapshot pull` and `snapshot restore` commands directly address this by supporting selective download of just `ponder_sync` from a remote snapshot. +- [#1127](https://github.com/namehash/ensnode/issues/1127) -- Matrix ENSApi smoke tests across subgraph-compat, alpha-style, and v2 configs. The CLI's snapshot infrastructure enables setting up isolated test databases with specific indexer configurations for CI smoke testing. See "CI Test Matrix Support" section below. +- [#279](https://github.com/namehash/ensnode/issues/279) -- Count Unknown Names & Unknown Labels. A future roadmap extension: the CLI's database access and inspect infrastructure can be extended with an `analyze` command to compute analytical metrics over indexed data. See "Future Roadmap" section below. + ## Architecture Decisions ### Snapshot Composition -A snapshot always contains: +A full snapshot contains **separate pg_dump archives** for: -1. **All current indexer schemas** from the source database -2. The **full `ponder_sync`** schema -3. The **full `ensnode.metadata`** table contents +1. **Every indexer deployment schema** currently present in the database (e.g. `mainnetSchema1.9.0`, `alphaSchema1.9.0`). The CLI discovers these by enumerating non-system schemas and **excluding** `ponder_sync`, `ensnode`, and PostgreSQL system schemas. If unrelated application schemas exist in the same database, add `**--exclude-schemas`** (or an allowlist flag) so they are not dumped by mistake. +2. The `**ponder_sync`** schema (full) +3. 
The **`ensnode`** schema (full), not only the `metadata` table
+
+Additionally, the snapshot includes **`ensnode_metadata.json`**: a JSON export of all rows from `ensnode.metadata`. This supports manifest enrichment, `snapshot list` summaries, and **selective restore** (replay only the metadata rows for chosen indexer schema names without requiring a second full `ensnode` download when operators use the slim pull path).
+
-Selective workflows happen later:
+Selective workflows:
 
-1. `snapshot pull --schemas ...` downloads only the selected indexer schema archives plus shared artifacts
-2. `snapshot restore --schemas ...` restores only those selected indexer schemas and filters `ensnode.metadata` rows to match
+1. `snapshot pull --schemas ...` downloads the selected indexer archives + `ponder_sync` + `ensnode_metadata.json` (+ optionally full `ensnode.dump.tar.zst` when a full `ensnode` restore is required -- see implementation note below)
+2. `snapshot restore --schemas ...` restores the selected indexer schema dumps + `ponder_sync`, then applies **filtered** `ensnode.metadata` rows (from JSON or from a partial upsert strategy) so other indexers' metadata rows are not overwritten incorrectly
 
 Because `ponder_sync` is shared state, `snapshot restore` is intended for **fresh or isolated target databases only**. Restoring a partial snapshot into an already-used shared database is out of scope for the first version.
 
+**Implementation note:** For a **full** restore of everything, restore `ponder_sync`, `ensnode` (from `ensnode.dump.tar.zst`), and each indexer schema. For **selective** restore, the CLI may restore indexer schema(s) + `ponder_sync` and upsert only the relevant metadata rows from `ensnode_metadata.json` instead of replacing the entire `ensnode` schema, to avoid clobbering metadata for indexers not being restored. Exact mechanics should be validated against how Drizzle/Ponder expect `ensnode` to look after restore.
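The filtered-replay idea can be sketched as a pure helper plus a hypothetical upsert. The row shape and column names below are assumptions to be checked against `@ensnode/ensdb-sdk`, not the actual ENSDb schema:

```typescript
// Hypothetical shape of one exported ensnode.metadata row.
interface MetadataRow {
  ens_indexer_schema_name: string;
  key: string;
  value: unknown;
}

// Keep only the rows belonging to the indexer schemas being restored, so
// metadata for indexers that are NOT part of this restore is never touched.
function filterMetadataRows(
  allRows: MetadataRow[],
  selectedSchemas: string[],
): MetadataRow[] {
  const selected = new Set(selectedSchemas);
  return allRows.filter((row) => selected.has(row.ens_indexer_schema_name));
}

// Each surviving row would then be upserted (e.g. via `pg`) with something like:
//   INSERT INTO ensnode.metadata (ens_indexer_schema_name, key, value)
//   VALUES ($1, $2, $3)
//   ON CONFLICT (ens_indexer_schema_name, key) DO UPDATE SET value = EXCLUDED.value;
// (column names and conflict target are assumptions, not the verified layout)
```

An upsert per filtered row, rather than a truncate-and-reload of `ensnode.metadata`, is what keeps a selective restore safe to repeat for additional schemas later.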
+ ### Snapshot Format Use `**pg_dump --format=directory`** with `**--jobs=N`** for parallel dump/restore. This is the only format that supports parallelism, which is critical for 50-100GB databases. Each directory-format dump is then archived as a `**.dump.tar.zst`** artifact for storage and transfer, and unpacked to a temporary directory before restore. @@ -90,6 +103,8 @@ Use `**pg_dump --format=directory`** with `**--jobs=N`** for parallel dump/resto The implementation should explicitly budget temporary disk usage for both the compressed archive and the unpacked directory during restore. +**Tooling prerequisites:** Archiving uses `tar` with zstd compression (`tar --zstd` or pipe to `zstd`). The Docker image and operator docs must include `tar`, `zstd`, and PostgreSQL client tools (`pg_dump`, `pg_restore`) compatible with the server major version. + ### S3 Storage Layout Discovery via `ListObjects` on `{prefix}/` -- each snapshot is a prefix containing a `manifest.json` and per-schema dump files: @@ -100,12 +115,13 @@ Discovery via `ListObjects` on `{prefix}/` -- each snapshot is a prefix containi manifest.json # snapshot metadata (all schemas, sizes, versions) {schema-name}.dump.tar.zst # archived pg_dump directory output (one per indexer schema) ponder_sync.dump.tar.zst # archived pg_dump of ponder_sync - ensnode_metadata.json # all ensnode.metadata rows + ensnode.dump.tar.zst # archived pg_dump of full ensnode schema + ensnode_metadata.json # all ensnode.metadata rows (JSON; for manifest + selective metadata replay) checksums.sha256 # integrity verification ``` - `snapshot list` uses `ListObjectsV2` with delimiter `/` to enumerate snapshot prefixes, then fetches each `manifest.json` for metadata display. -- `snapshot pull` downloads only the selected schema dump(s) + `ponder_sync.dump.tar.zst` + `ensnode_metadata.json` (CLI filters metadata rows locally to match selected schemas during restore). 
+- `snapshot pull --schemas ...` downloads the selected indexer dump(s) + `ponder_sync.dump.tar.zst` + `ensnode_metadata.json`. Downloading `**ensnode.dump.tar.zst**` is optional for selective restore if metadata replay from JSON is sufficient; include it when doing a full `ensnode` schema restore. ### Technology @@ -138,9 +154,9 @@ ensdb-cli schema drop --database-url --schema [--force] ### Snapshot Operations ``` -ensdb-cli snapshot create --database-url --output - Export ALL indexer schemas + ponder_sync + all ensnode.metadata to a local snapshot. - Runs pg_dump with parallel jobs for each schema. +ensdb-cli snapshot create --database-url --output [--exclude-schemas ] + Export all discovered indexer schemas + ponder_sync + full ensnode schema + ensnode.metadata JSON. + Runs pg_dump with parallel jobs for each schema. Use --exclude-schemas to skip unrelated app schemas. ensdb-cli snapshot restore --database-url --input --schemas [--drop-existing] Restore selected schema(s) from a local snapshot into a fresh or isolated database. @@ -148,18 +164,32 @@ ensdb-cli snapshot restore --database-url --input --schemas --input --ponder-sync-only [--drop-existing] + Restore only ponder_sync (no indexer schemas, no ensnode.metadata). + Enables the developer workflow described in #833: quickly bootstrap a local ponder_sync + so a new indexer can skip RPC re-fetching. + ensdb-cli snapshot push --input --bucket [--endpoint ] [--prefix ] Upload a local snapshot to S3-compatible storage. Uses multipart upload. -ensdb-cli snapshot pull --snapshot-id --output --bucket [--endpoint ] [--schemas ] - Download from S3. If --schemas specified, downloads only those schema dumps + shared artifacts. - If --schemas omitted, downloads the full snapshot. +ensdb-cli snapshot pull --snapshot-id --output --bucket [--endpoint ] [--schemas ] [--with-ensnode-schema] + Download from S3. 
If --schemas specified, downloads those indexer dumps + ponder_sync + ensnode_metadata.json; + pass --with-ensnode-schema to also fetch ensnode.dump.tar.zst for a full ensnode pg_restore. + If --schemas omitted, downloads the full snapshot (all artifacts). + +ensdb-cli snapshot pull --snapshot-id --output --bucket [--endpoint ] --ponder-sync-only + Download only ponder_sync.dump.tar.zst from a remote snapshot (#833). + Skips all indexer schema dumps and ensnode_metadata.json. ensdb-cli snapshot list --bucket [--endpoint ] [--prefix ] List available snapshots from S3 with metadata summary (uses ListObjects + manifest.json). ensdb-cli snapshot info --snapshot-id --bucket [--endpoint ] Show detailed metadata for a specific remote snapshot (fetches and displays manifest.json). + +ensdb-cli snapshot delete --snapshot-id --bucket [--endpoint ] [--force] + Delete a snapshot and all its artifacts from S3. Requires --force or interactive confirmation. + Removes all objects under the snapshot prefix ({prefix}/{snapshot-id}/). ``` ### Common Options @@ -206,6 +236,10 @@ Each snapshot has a `manifest.json`. 
The CLI auto-populates `indexerConfig` by r "sizeBytes": 8000000000, "dumpFile": "ponder_sync.dump.tar.zst" }, + "ensnodeSchema": { + "sizeBytes": 1200000, + "dumpFile": "ensnode.dump.tar.zst" + }, "metadata": { "file": "ensnode_metadata.json", "indexerSchemas": ["mainnetSchema1.9.0"] @@ -247,6 +281,7 @@ apps/ensdb-cli/ snapshot-pull.ts # pull from S3 snapshot-list.ts # list remote snapshots snapshot-info.ts # remote snapshot info + snapshot-delete.ts # delete remote snapshot prefix lib/ database.ts # pg connection, schema queries pgdump.ts # pg_dump/pg_restore wrapper @@ -274,12 +309,13 @@ apps/ensdb-cli/ - Manifest generation and validation - Checksum generation and verification -### Phase 3: S3 Push + Pull + List +### Phase 3: S3 Push + Pull + List + Delete - S3 client with multipart upload support - `snapshot push` with manifest and artifact upload only - `snapshot pull` with integrity verification - `snapshot list` and `snapshot info` for browsing remote snapshots +- `snapshot delete` (list objects under prefix, batch delete, `--force` / confirmation) ### Phase 4: Polish + Production Readiness @@ -289,15 +325,79 @@ apps/ensdb-cli/ - Comprehensive error messages and recovery guidance - Documentation +## CI Test Matrix Support (#1127) + +The snapshot infrastructure directly enables the matrix smoke tests described in [#1127](https://github.com/namehash/ensnode/issues/1127). 
The production database contains indexer schemas with three distinct configurations that map to the test matrix: + +- **Subgraph-compat**: `mainnetSchema1.9.0` (plugins: `["subgraph"]`, `isSubgraphCompatible: true`) +- **Alpha-style**: `alphaSchema1.9.0` (plugins: `["subgraph","basenames","lineanames","threedns",...]`, `isSubgraphCompatible: false`) +- **V2**: `v2SepoliaSchema1.9.0` (plugins: `["ensv2","protocol-acceleration"]`, `isSubgraphCompatible: false`) + +The manifest's `indexerConfig` on each schema entry includes `plugins`, `namespace`, `isSubgraphCompatible`, and `indexedChainIds`, which provides enough information for CI to select the correct schema for each test variant. + +**CI workflow pattern:** + +``` +# 1. Pull only the schema needed for this matrix entry +ensdb-cli snapshot pull \ + --snapshot-id \ + --schemas mainnetSchema1.9.0 \ + --bucket $ENSDB_SNAPSHOT_BUCKET \ + --output /tmp/snapshot + +# 2. Restore into an isolated test database +ensdb-cli snapshot restore \ + --database-url $TEST_DB_URL \ + --input /tmp/snapshot \ + --schemas mainnetSchema1.9.0 + +# 3. Run smoke tests against the restored database +ENSDB_URL=$TEST_DB_URL ENSINDEXER_SCHEMA_NAME=mainnetSchema1.9.0 pnpm test:smoke +``` + +Each matrix entry pulls and restores a different schema, then runs ENSApi smoke tests against it. The selective pull avoids downloading the full 50-100GB snapshot for each matrix entry -- only the relevant schema dump plus `ponder_sync` are transferred. + +The `snapshot list` and `snapshot info` commands can also be used in CI to discover the latest available snapshot ID before pulling. + +## Future Roadmap + +### Analytical Queries (#279) + +[#279](https://github.com/namehash/ensnode/issues/279) requires counting Unknown Names and Unknown Labels by iterating through domain data. This will be a **separate `analyze` command**, not part of `inspect`. 
+ +**Why separate from `inspect`:** + +- `inspect` stays fast (milliseconds) -- it reads only `information_schema`, `pg_stat_user_tables`, and `ensnode.metadata`. +- `analyze` performs heavy table scans over potentially millions of domain rows (seconds to minutes at 50-100GB scale). Mixing slow analytical queries into `inspect` would make it unpredictably slow. +- The output shape is different: `inspect` shows schema structure and metadata; `analyze` produces statistical reports with their own formatting and flags (e.g. `--top-n`, `--output-format`). +- `analyze` becomes a natural home for future heavy queries (domain distribution by chain, registration trends, label healing coverage). + +`inspect --schema ` may include a lightweight `Domain count` line from `pg_stat_user_tables.n_live_tup` (free, approximate) as a hint, but the deep scan stays in `analyze`. + +**Future command:** + +``` +ensdb-cli analyze unknown-labels --database-url --schema [--top-n 100] [--output-format table|csv|json] + Count unknown names, unknown labels (distinct and non-distinct), + and return the top-N most frequent unknown labels with occurrence counts. + Uses @ensnode/ensdb-sdk typed access to domain tables. + Supports progress reporting for long-running scans. +``` + +The snapshot create/restore workflow enables **offline analysis**: snapshot production, restore into an isolated database, run analysis without impacting production. The `ensindexer_public_config` metadata (available in manifests) identifies which schemas are subgraph-compatible, which is relevant to #279 since the metrics are anchored to the ENS Subgraph definition of Unknown Labels. + +This is explicitly **out of scope for v1** but the plan ensures the CLI's database access layer (`lib/database.ts`, `@ensnode/ensdb-sdk` integration) is designed to support it. + ## Resolved Decisions 1. **Discovery**: No shared `index.json`. Use S3 `ListObjects` to discover snapshots by reading `manifest.json` from each snapshot prefix. 
Most robust -- no concurrent writer races, no stale index. -2. **Snapshot granularity**: `snapshot create` always dumps ALL indexer schemas + `ponder_sync` + all `ensnode.metadata`. `snapshot pull` and `snapshot restore` let the user select which indexer schema(s) they want from that full snapshot. +2. **Snapshot granularity**: `snapshot create` dumps all discovered indexer deployment schemas + `ponder_sync` + full `ensnode` schema + `ensnode_metadata.json`, with optional `--exclude-schemas` for unrelated app schemas. `snapshot pull` and `snapshot restore` let the user select which indexer schema(s) they want from that full snapshot; full `ensnode` schema archive is optional on pull when metadata JSON replay is enough. 3. **Restore safety**: `snapshot restore` targets a fresh or isolated database only because `ponder_sync` is shared state. Partial restore into an already-used shared database is not supported in v1. 4. **Restore behavior**: Fail by default if a target schema already exists. Pass `--drop-existing` to drop and replace. Prevents accidental data loss while keeping the convenient path available. +5. **Retention policy**: `snapshot delete` command added for manual cleanup. `snapshot list` shows all snapshots; operators manage retention manually. ## Open Questions for Stakeholders 1. **Snapshot ID format**: Should snapshot IDs be auto-generated (e.g. `ensdb-2026-04-06-abc123`) or user-specified? Auto-generated is safer for avoiding collisions. -2. **Retention policy**: Should `snapshot list` show all snapshots ever, or should there be a TTL/cleanup mechanism (e.g. `snapshot delete`)? -3. **Streaming upload mode**: Should v1 support a direct "snapshot and push" flow that uploads artifacts to S3 as they are produced, or should v1 stay local-first (`snapshot create` then `snapshot push`)? 
True end-to-end streaming likely conflicts with the chosen `pg_dump --format=directory` approach, so supporting it may require either a different dump format or a hybrid design where each completed schema archive is uploaded immediately after local creation. +2. **Streaming upload mode**: Should v1 support a direct "snapshot and push" flow that uploads artifacts to S3 as they are produced, or should v1 stay local-first (`snapshot create` then `snapshot push`)? True end-to-end streaming likely conflicts with the chosen `pg_dump --format=directory` approach, so supporting it may require either a different dump format or a hybrid design where each completed schema archive is uploaded immediately after local creation. + From 04bedf18c9ed1cd0f34341d449b954e1884957ae Mon Sep 17 00:00:00 2001 From: djstrong Date: Tue, 7 Apr 2026 22:41:02 +0200 Subject: [PATCH 04/10] Expand ENSdb CLI plan with in-depth analysis of streaming upload modes, detailing implications of `pg_dump` formats, checksum handling, and manifest finalization for stakeholder decision-making. --- .cursor/plans/ensdb_cli_tool_422abf99.plan.md | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/.cursor/plans/ensdb_cli_tool_422abf99.plan.md b/.cursor/plans/ensdb_cli_tool_422abf99.plan.md index 9be89cfb3..b47ea514a 100644 --- a/.cursor/plans/ensdb_cli_tool_422abf99.plan.md +++ b/.cursor/plans/ensdb_cli_tool_422abf99.plan.md @@ -399,5 +399,18 @@ This is explicitly **out of scope for v1** but the plan ensures the CLI's databa ## Open Questions for Stakeholders 1. **Snapshot ID format**: Should snapshot IDs be auto-generated (e.g. `ensdb-2026-04-06-abc123`) or user-specified? Auto-generated is safer for avoiding collisions. -2. **Streaming upload mode**: Should v1 support a direct "snapshot and push" flow that uploads artifacts to S3 as they are produced, or should v1 stay local-first (`snapshot create` then `snapshot push`)? 
True end-to-end streaming likely conflicts with the chosen `pg_dump --format=directory` approach, so supporting it may require either a different dump format or a hybrid design where each completed schema archive is uploaded immediately after local creation. +2. **Streaming upload mode**: Should v1 support a direct "snapshot and push" flow that uploads artifacts to S3 as they are produced, or should v1 stay local-first (`snapshot create` then `snapshot push`)? + + **Why `pg_dump --format=directory` complicates streaming:** directory format writes many files under a tree; you normally archive that tree to a single blob (e.g. `.dump.tar.zst`) before upload. That implies at least one local staging step per schema unless you stream `tar` output directly to S3 multipart (possible but more moving parts). + + **Other `pg_dump` formats do not remove all constraints:** + - **`--format=custom`**: single-file output and can be streamed (e.g. `pg_dump -Fc ...` piped into multipart upload). Loses parallel `pg_restore` compared to directory format unless you accept those trade-offs at 50-100GB scale. + - **`--format=plain`**: single SQL stream; stream-friendly, but restore is typically slower and less suited to huge DBs than the directory workflow already chosen for this plan. + + **Checksums and manifest:** The plan includes `checksums.sha256` and a `manifest.json` with per-artifact sizes. For **end-to-end streaming** without a local file: + - Per-artifact SHA-256 can still be computed by hashing bytes **as they pass through** the upload pipeline (running a digest alongside the stream), then writing the digest into `checksums.sha256` and the manifest **after** that artifact finishes. + - Alternatively, **S3 object checksums** (multipart part ETags, or `ChecksumSHA256` on `PutObject` where supported) can supplement or replace client-side files, but the manifest must state what is verified (client hash vs object checksum). 
+ - The **full snapshot manifest** cannot be finalized until all artifacts are complete, so uploads can be incremental, but **manifest upload is always last** (or use a two-phase manifest: provisional then final). + + **Conclusion for stakeholders:** Decide between (a) v1 local-first only, (b) **hybrid** -- each schema completes locally, then upload (simplest integrity story), or (c) true pipe-to-S3 with streaming hash and deferred manifest. From b08bfc117f3733dbed5d1027a7455e082494fe14 Mon Sep 17 00:00:00 2001 From: djstrong Date: Tue, 7 Apr 2026 22:51:27 +0200 Subject: [PATCH 05/10] Refine ENSdb CLI plan with updated snapshot restore process, introducing preflight checks to ensure safe restoration, and clarifying metadata handling and schema validation steps. --- .cursor/plans/ensdb_cli_tool_422abf99.plan.md | 81 +++++++++++++------ 1 file changed, 56 insertions(+), 25 deletions(-) diff --git a/.cursor/plans/ensdb_cli_tool_422abf99.plan.md b/.cursor/plans/ensdb_cli_tool_422abf99.plan.md index b47ea514a..45138490f 100644 --- a/.cursor/plans/ensdb_cli_tool_422abf99.plan.md +++ b/.cursor/plans/ensdb_cli_tool_422abf99.plan.md @@ -18,7 +18,7 @@ todos: content: "Implement snapshot create: dump discovered indexer schemas + ponder_sync + full ensnode schema + ensnode_metadata.json, then generate manifest + checksums" status: pending - id: snapshot-restore - content: "Implement snapshot restore: unpack selected archives, validate manifest, pg_restore with parallel jobs, and restore filtered metadata rows into a fresh or isolated database" + content: "Implement snapshot restore: preflight-restore checks, unpack archives, validate manifest, pg_restore + filtered metadata upsert" status: pending - id: s3-client content: Implement S3 client layer with multipart upload/download support @@ -78,24 +78,46 @@ The goal is to enable fast ENSNode bootstrap (hours instead of 2-3 days) by snap A full snapshot contains **separate pg_dump archives** for: -1. 
**Every indexer deployment schema** currently present in the database (e.g. `mainnetSchema1.9.0`, `alphaSchema1.9.0`). The CLI discovers these by enumerating non-system schemas and **excluding** `ponder_sync`, `ensnode`, and PostgreSQL system schemas. If unrelated application schemas exist in the same database, add `**--exclude-schemas`** (or an allowlist flag) so they are not dumped by mistake. -2. The `**ponder_sync`** schema (full) -3. The `**ensnode**` schema (full), not only the `metadata` table +1. **Every indexer deployment schema** currently present in the database (e.g. `mainnetSchema1.9.0`, `alphaSchema1.9.0`). The CLI discovers these by enumerating non-system schemas and **excluding** `ponder_sync`, `ensnode`, and PostgreSQL system schemas. If unrelated application schemas exist in the same database, add `--exclude-schemas` (or an allowlist flag) so they are not dumped by mistake. +2. The `ponder_sync` schema (full) +3. The `ensnode` schema (full), not only the `metadata` table -Additionally, the snapshot includes `**ensnode_metadata.json**`: a JSON export of all rows from `ensnode.metadata`. This supports manifest enrichment, `snapshot list` summaries, and **selective restore** (replay only the metadata rows for chosen indexer schema names without requiring a second full `ensnode` download when operators use the slim pull path). +Additionally, the snapshot includes `ensnode_metadata.json`: a JSON export of all rows from `ensnode.metadata`. This supports manifest enrichment, `snapshot list` summaries, and **selective restore** (replay only the metadata rows for chosen indexer schema names without requiring a second full `ensnode` download when operators use the slim pull path). Selective workflows: 1. `snapshot pull --schemas ...` downloads the selected indexer archives + `ponder_sync` + `ensnode_metadata.json` (+ optionally full `ensnode.dump.tar.zst` when a full `ensnode` restore is required -- see implementation note below) 2. 
`snapshot restore --schemas ...` restores the selected indexer schema dumps + `ponder_sync`, then applies **filtered** `ensnode.metadata` rows (from JSON or from a partial upsert strategy) so other indexers' metadata rows are not overwritten incorrectly -Because `ponder_sync` is shared state, `snapshot restore` is intended for **fresh or isolated target databases only**. Restoring a partial snapshot into an already-used shared database is out of scope for the first version. +Because `ponder_sync` is shared state, `snapshot restore` is intended for **fresh or isolated target databases only**. The CLI enforces that with **preflight checks** (below) instead of relying on operator discipline alone. **Implementation note:** For a **full** restore of everything, restore `ponder_sync`, `ensnode` (from `ensnode.dump.tar.zst`), and each indexer schema. For **selective** restore, the CLI may restore indexer schema(s) + `ponder_sync` and upsert only the relevant metadata rows from `ensnode_metadata.json` instead of replacing the entire `ensnode` schema, to avoid clobbering metadata for indexers not being restored. Exact mechanics should be validated against how Drizzle/Ponder expect `ensnode` to look after restore. +### Preflight checks (`snapshot restore`) + +Preflight runs in the **restore command handler** immediately after validating CLI args and loading the manifest, and **before** any `pg_restore` and **before** evaluating `--drop-existing` for target indexer schemas. + +**Checks (fail closed by default):** + +1. **ponder_sync non-empty:** If schema `ponder_sync` exists, detect any user data (e.g. `SELECT EXISTS (SELECT 1 FROM ponder_sync.<table> LIMIT 1)` or sum `n_live_tup` from `pg_stat_user_tables` for `ponder_sync`). If non-empty, fail with identifier `ENSDB_CLI_ERR_PREFLIGHT_PONDER_SYNC_NONEMPTY` (human-readable message explains shared sync state would be overwritten). +2. **ensnode.metadata conflicts:** Query `ensnode.metadata` if the schema exists.
For **selective** restore (`--schemas`), fail if any row exists whose `ens_indexer_schema_name` is **not** in the target schema set (`ENSDB_CLI_ERR_PREFLIGHT_ENSNODE_METADATA_CONFLICT`). For **full** restore (all indexer schemas + full `ensnode` from dump), fail if **any** metadata rows exist (`ENSDB_CLI_ERR_PREFLIGHT_ENSNODE_METADATA_NONEMPTY`) — distinct message from selective conflict so operators know which case fired. +3. **Unexpected non-system schemas / objects:** Enumerate schemas outside PostgreSQL system namespaces (`pg_*`, `information_schema`, etc.). For the intended restore set, fail if a **non-target indexer schema** already exists (tables present or schema present) (`ENSDB_CLI_ERR_PREFLIGHT_UNEXPECTED_SCHEMA`). Optionally extend with a stricter mode: fail if `public` (or other default) contains unexpected user tables — keep the rule deterministic in code and documented. + +**Override:** Pass `--force-or-confirm` to skip these preflight failures and proceed (operator asserts the target is disposable or they accept clobbering). Implementation may require interactive confirmation when TTY is available; non-interactive mode requires the flag. This is separate from `--drop-existing`, which only governs dropping **named target indexer schemas** (and optionally `ponder_sync` / `ensnode` if explicitly documented) **after** preflight passes or is overridden. + +**Order of operations:** + +1. Preflight (unless skipped via `--force-or-confirm`) +2. If restoring indexer schemas and targets exist: apply `--drop-existing` drops for those schemas only (as today) +3. `pg_restore` / metadata upsert + +**Surfacing errors:** Print distinct messages per failure class; include the `ENSDB_CLI_ERR_PREFLIGHT_*` identifier in the message (and optionally `process.exit` with dedicated codes, e.g. `2` / `3` / `4`, if the team wants scriptable CI — document in README). 
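+The checks above reduce to a pure classification step: gather counts and schema names first, then decide which `ENSDB_CLI_ERR_PREFLIGHT_*` identifiers fire. A minimal TypeScript sketch; `PreflightState` and `runPreflight` are illustrative names, not part of the plan's API, and the real CLI would populate the state via `lib/database.ts` queries:

```typescript
// Illustrative state gathered before restore (pg_stat_user_tables,
// ensnode.metadata, information_schema). Names here are hypothetical.
interface PreflightState {
  ponderSyncLiveRows: number;       // sum of n_live_tup across ponder_sync tables (0 if schema absent)
  metadataSchemaNames: string[];    // ens_indexer_schema_name values found in ensnode.metadata
  existingIndexerSchemas: string[]; // non-system schemas present, excluding ponder_sync and ensnode
}

type PreflightError =
  | "ENSDB_CLI_ERR_PREFLIGHT_PONDER_SYNC_NONEMPTY"
  | "ENSDB_CLI_ERR_PREFLIGHT_ENSNODE_METADATA_CONFLICT"
  | "ENSDB_CLI_ERR_PREFLIGHT_ENSNODE_METADATA_NONEMPTY"
  | "ENSDB_CLI_ERR_PREFLIGHT_UNEXPECTED_SCHEMA";

// targetSchemas: indexer schemas being restored (for a full restore, the caller
// passes the snapshot's complete schema list). Fail closed: any returned error
// aborts before pg_restore unless --force-or-confirm was passed.
function runPreflight(
  state: PreflightState,
  targetSchemas: string[],
  fullRestore: boolean,
): PreflightError[] {
  const errors: PreflightError[] = [];

  // Check 1: shared sync state must be empty.
  if (state.ponderSyncLiveRows > 0) {
    errors.push("ENSDB_CLI_ERR_PREFLIGHT_PONDER_SYNC_NONEMPTY");
  }

  // Check 2: full restore tolerates no metadata rows at all; selective restore
  // tolerates only rows for the schemas being restored (distinct identifiers per case).
  if (fullRestore && state.metadataSchemaNames.length > 0) {
    errors.push("ENSDB_CLI_ERR_PREFLIGHT_ENSNODE_METADATA_NONEMPTY");
  } else if (!fullRestore && state.metadataSchemaNames.some((n) => !targetSchemas.includes(n))) {
    errors.push("ENSDB_CLI_ERR_PREFLIGHT_ENSNODE_METADATA_CONFLICT");
  }

  // Check 3: a non-target indexer schema already present is unexpected.
  if (state.existingIndexerSchemas.some((s) => !targetSchemas.includes(s))) {
    errors.push("ENSDB_CLI_ERR_PREFLIGHT_UNEXPECTED_SCHEMA");
  }

  return errors; // [] means preflight passed; --drop-existing is evaluated afterwards
}
```

An empty result lets the handler proceed to the `--drop-existing` step; a non-empty result prints one distinct message per identifier, matching the error-surfacing rules above.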
+ +**Selective restore:** Preflight must ensure metadata upsert will not clobber other indexers: the `ensnode.metadata` row check above is mandatory before replaying filtered `ensnode_metadata.json`. + ### Snapshot Format -Use `**pg_dump --format=directory`** with `**--jobs=N`** for parallel dump/restore. This is the only format that supports parallelism, which is critical for 50-100GB databases. Each directory-format dump is then archived as a `**.dump.tar.zst`** artifact for storage and transfer, and unpacked to a temporary directory before restore. +Use `pg_dump` with `--format=directory` and `--jobs=N` for parallel dump/restore. This is the only format that supports parallelism, which is critical for 50-100GB databases. Each directory-format dump is then archived as a `.dump.tar.zst` file for storage and transfer, and unpacked to a temporary directory before restore. - Dump: `pg_dump --format=directory --jobs=4 --schema=<schema> --file <dir>/<schema>.dumpdir` - Archive: `tar --zstd -cf <dir>/<schema>.dump.tar.zst -C <dir> <schema>.dumpdir` @@ -121,7 +143 @@ Discovery via `ListObjects` on `{prefix}/` -- each snapshot is a prefix containi ``` - `snapshot list` uses `ListObjectsV2` with delimiter `/` to enumerate snapshot prefixes, then fetches each `manifest.json` for metadata display. -- `snapshot pull --schemas ...` downloads the selected indexer dump(s) + `ponder_sync.dump.tar.zst` + `ensnode_metadata.json`. Downloading `**ensnode.dump.tar.zst**` is optional for selective restore if metadata replay from JSON is sufficient; include it when doing a full `ensnode` schema restore. +- `snapshot pull --schemas ...` downloads the selected indexer dump(s) + `ponder_sync.dump.tar.zst` + `ensnode_metadata.json`. Downloading `ensnode.dump.tar.zst` is optional for selective restore if metadata replay from JSON is sufficient; include it when doing a full `ensnode` schema restore.
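+For the `{prefix}/{snapshot-id}/...` layout, a tiny key-resolution helper keeps `list`/`push`/`pull`/`info`/`delete` consistent whether or not a prefix is configured. A sketch with illustrative names (`snapshotKey` and `snapshotPrefix` are not part of the plan):

```typescript
// Resolve an object key as {prefix}/{snapshot-id}/{artifact}, collapsing the
// prefix cleanly when --prefix / ENSDB_SNAPSHOT_PREFIX is empty or has stray slashes.
function snapshotKey(prefix: string, snapshotId: string, artifact: string): string {
  return [prefix, snapshotId, artifact]
    .map((part) => part.replace(/^\/+|\/+$/g, "")) // trim leading/trailing slashes
    .filter((part) => part.length > 0)
    .join("/");
}

// The per-snapshot prefix (for ListObjectsV2 enumeration or bulk delete) is the
// same resolution minus the artifact, plus a trailing slash.
function snapshotPrefix(prefix: string, snapshotId: string): string {
  return snapshotKey(prefix, snapshotId, "") + "/";
}
```

With an empty prefix, `snapshotKey("", id, "manifest.json")` yields `{id}/manifest.json` with no leading slash, so behavior matches when the flag is omitted.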
### Technology @@ -158,38 +180,41 @@ ensdb-cli snapshot create --database-url --output [--exclude-schema Export all discovered indexer schemas + ponder_sync + full ensnode schema + ensnode.metadata JSON. Runs pg_dump with parallel jobs for each schema. Use --exclude-schemas to skip unrelated app schemas. -ensdb-cli snapshot restore --database-url --input --schemas [--drop-existing] +ensdb-cli snapshot restore --database-url --input --schemas [--drop-existing] [--force-or-confirm] Restore selected schema(s) from a local snapshot into a fresh or isolated database. Restores the selected indexer schema dump(s) + ponder_sync + filtered ensnode.metadata rows. - Fails if target schema already exists unless --drop-existing is passed. + Runs preflight (see above) before pg_restore; fails if shared state or metadata conflicts unless --force-or-confirm. + Fails if target indexer schema already exists unless --drop-existing is passed (after preflight). Unpacks `.dump.tar.zst` archives to temp storage, then runs pg_restore with parallel jobs. -ensdb-cli snapshot restore --database-url --input --ponder-sync-only [--drop-existing] +ensdb-cli snapshot restore --database-url --input --ponder-sync-only [--drop-existing] [--force-or-confirm] Restore only ponder_sync (no indexer schemas, no ensnode.metadata). + Preflight still applies to non-empty ponder_sync unless --force-or-confirm. Enables the developer workflow described in #833: quickly bootstrap a local ponder_sync so a new indexer can skip RPC re-fetching. ensdb-cli snapshot push --input --bucket [--endpoint ] [--prefix ] Upload a local snapshot to S3-compatible storage. Uses multipart upload. -ensdb-cli snapshot pull --snapshot-id --output --bucket [--endpoint ] [--schemas ] [--with-ensnode-schema] +ensdb-cli snapshot pull --snapshot-id --output --bucket [--endpoint ] [--prefix ] [--schemas ] [--with-ensnode-schema] Download from S3. 
If --schemas specified, downloads those indexer dumps + ponder_sync + ensnode_metadata.json; pass --with-ensnode-schema to also fetch ensnode.dump.tar.zst for a full ensnode pg_restore. If --schemas omitted, downloads the full snapshot (all artifacts). + --prefix scopes all keys to `{prefix}/{snapshot-id}/...` (same as list/push/info/delete). -ensdb-cli snapshot pull --snapshot-id --output --bucket [--endpoint ] --ponder-sync-only +ensdb-cli snapshot pull --snapshot-id --output --bucket [--endpoint ] [--prefix ] --ponder-sync-only Download only ponder_sync.dump.tar.zst from a remote snapshot (#833). Skips all indexer schema dumps and ensnode_metadata.json. ensdb-cli snapshot list --bucket [--endpoint ] [--prefix ] List available snapshots from S3 with metadata summary (uses ListObjects + manifest.json). -ensdb-cli snapshot info --snapshot-id --bucket [--endpoint ] - Show detailed metadata for a specific remote snapshot (fetches and displays manifest.json). +ensdb-cli snapshot info --snapshot-id --bucket [--endpoint ] [--prefix ] + Show detailed metadata for a specific remote snapshot (fetches and displays manifest.json under `{prefix}/{snapshot-id}/`). -ensdb-cli snapshot delete --snapshot-id --bucket [--endpoint ] [--force] +ensdb-cli snapshot delete --snapshot-id --bucket [--endpoint ] [--prefix ] [--force] Delete a snapshot and all its artifacts from S3. Requires --force or interactive confirmation. - Removes all objects under the snapshot prefix ({prefix}/{snapshot-id}/). + Removes all objects under `{prefix}/{snapshot-id}/`. ``` ### Common Options @@ -198,6 +223,8 @@ ensdb-cli snapshot delete --snapshot-id --bucket [--endpoint - `--jobs` / `-j` -- parallelism for pg_dump/pg_restore (default: 4) - `--bucket` / `ENSDB_SNAPSHOT_BUCKET` -- S3 bucket name - `--endpoint` / `ENSDB_SNAPSHOT_ENDPOINT` -- S3-compatible endpoint (for R2, MinIO) +- `--prefix` / `ENSDB_SNAPSHOT_PREFIX` -- key prefix inside the bucket (default empty). 
All snapshot S3 commands (`push`, `pull`, `list`, `info`, `delete`) must resolve object keys as `{prefix}/{snapshot-id}/...` so behavior matches when omitted (empty prefix) or when a shared prefix is used. +- `--force-or-confirm` -- snapshot restore only: skip preflight failures (non-empty `ponder_sync`, `ensnode.metadata` conflicts, unexpected schemas). Use only when the operator accepts overwriting shared state. - `--verbose` / `-v` -- detailed output ## Manifest Schema @@ -284,6 +311,7 @@ apps/ensdb-cli/ snapshot-delete.ts # delete remote snapshot prefix lib/ database.ts # pg connection, schema queries + preflight-restore.ts # fresh/isolated DB checks before pg_restore pgdump.ts # pg_dump/pg_restore wrapper s3.ts # S3 client, multipart upload/download manifest.ts # manifest read/write, validation @@ -305,13 +333,14 @@ apps/ensdb-cli/ - Implement `pg_dump` wrapper with parallel jobs and progress reporting - Implement `snapshot create` (dump all indexer schemas + ponder_sync + full metadata extraction) - Implement archive packaging and unpacking for directory-format dumps -- Implement `snapshot restore` (fresh or isolated database only, pg_restore with parallel jobs) +- Implement `snapshot restore` (preflight in `preflight-restore.ts`, then pg_restore with parallel jobs) - Manifest generation and validation - Checksum generation and verification ### Phase 3: S3 Push + Pull + List + Delete - S3 client with multipart upload support +- Shared helper: resolve `{prefix}/{snapshot-id}/` from `--prefix` / `ENSDB_SNAPSHOT_PREFIX` for every snapshot S3 command (`push`, `pull`, `list`, `info`, `delete`) - `snapshot push` with manifest and artifact upload only - `snapshot pull` with integrity verification - `snapshot list` and `snapshot info` for browsing remote snapshots @@ -363,7 +392,7 @@ The `snapshot list` and `snapshot info` commands can also be used in CI to disco ### Analytical Queries (#279) -[#279](https://github.com/namehash/ensnode/issues/279) requires counting 
Unknown Names and Unknown Labels by iterating through domain data. This will be a **separate `analyze` command**, not part of `inspect`. +[#279](https://github.com/namehash/ensnode/issues/279) requires counting Unknown Names and Unknown Labels by iterating through domain data. This will be a separate **`analyze`** command, not part of **`inspect`**. **Why separate from `inspect`:** @@ -392,8 +421,8 @@ This is explicitly **out of scope for v1** but the plan ensures the CLI's databa 1. **Discovery**: No shared `index.json`. Use S3 `ListObjects` to discover snapshots by reading `manifest.json` from each snapshot prefix. Most robust -- no concurrent writer races, no stale index. 2. **Snapshot granularity**: `snapshot create` dumps all discovered indexer deployment schemas + `ponder_sync` + full `ensnode` schema + `ensnode_metadata.json`, with optional `--exclude-schemas` for unrelated app schemas. `snapshot pull` and `snapshot restore` let the user select which indexer schema(s) they want from that full snapshot; full `ensnode` schema archive is optional on pull when metadata JSON replay is enough. -3. **Restore safety**: `snapshot restore` targets a fresh or isolated database only because `ponder_sync` is shared state. Partial restore into an already-used shared database is not supported in v1. -4. **Restore behavior**: Fail by default if a target schema already exists. Pass `--drop-existing` to drop and replace. Prevents accidental data loss while keeping the convenient path available. +3. **Restore safety**: `snapshot restore` assumes a fresh or isolated database; **preflight** enforces this (non-empty `ponder_sync`, conflicting `ensnode.metadata` rows, unexpected schemas) unless `--force-or-confirm` is passed. +4. **Restore behavior**: Preflight runs first, then fail if a target indexer schema already exists unless `--drop-existing` is passed. `--drop-existing` does not bypass preflight; only `--force-or-confirm` does. 
Prevents accidental data loss while keeping an explicit escape hatch. 5. **Retention policy**: `snapshot delete` command added for manual cleanup. `snapshot list` shows all snapshots; operators manage retention manually. ## Open Questions for Stakeholders @@ -401,16 +430,18 @@ This is explicitly **out of scope for v1** but the plan ensures the CLI's databa 1. **Snapshot ID format**: Should snapshot IDs be auto-generated (e.g. `ensdb-2026-04-06-abc123`) or user-specified? Auto-generated is safer for avoiding collisions. 2. **Streaming upload mode**: Should v1 support a direct "snapshot and push" flow that uploads artifacts to S3 as they are produced, or should v1 stay local-first (`snapshot create` then `snapshot push`)? - **Why `pg_dump --format=directory` complicates streaming:** directory format writes many files under a tree; you normally archive that tree to a single blob (e.g. `.dump.tar.zst`) before upload. That implies at least one local staging step per schema unless you stream `tar` output directly to S3 multipart (possible but more moving parts). + **Why directory format complicates streaming:** `pg_dump --format=directory` writes many files under a tree; you normally archive that tree to a single blob (e.g. `.dump.tar.zst`) before upload. That implies at least one local staging step per schema unless you stream `tar` output directly to S3 multipart (possible but more moving parts). **Other `pg_dump` formats do not remove all constraints:** - - **`--format=custom`**: single-file output and can be streamed (e.g. `pg_dump -Fc ...` piped into multipart upload). Loses parallel `pg_restore` compared to directory format unless you accept those trade-offs at 50-100GB scale. - - **`--format=plain`**: single SQL stream; stream-friendly, but restore is typically slower and less suited to huge DBs than the directory workflow already chosen for this plan. + + - **Custom** (`pg_dump --format=custom` / `-Fc`): single-file output and can be streamed (e.g. 
piped into multipart upload). Loses parallel `pg_restore` compared to directory format unless you accept those trade-offs at 50-100GB scale. + - **Plain** (`pg_dump --format=plain`): single SQL stream; stream-friendly, but restore is typically slower and less suited to huge DBs than the directory workflow already chosen for this plan. **Checksums and manifest:** The plan includes `checksums.sha256` and a `manifest.json` with per-artifact sizes. For **end-to-end streaming** without a local file: + - Per-artifact SHA-256 can still be computed by hashing bytes **as they pass through** the upload pipeline (running a digest alongside the stream), then writing the digest into `checksums.sha256` and the manifest **after** that artifact finishes. - Alternatively, **S3 object checksums** (multipart part ETags, or `ChecksumSHA256` on `PutObject` where supported) can supplement or replace client-side files, but the manifest must state what is verified (client hash vs object checksum). - The **full snapshot manifest** cannot be finalized until all artifacts are complete, so uploads can be incremental, but **manifest upload is always last** (or use a two-phase manifest: provisional then final). - **Conclusion for stakeholders:** Decide between (a) v1 local-first only, (b) **hybrid** -- each schema completes locally, then upload (simplest integrity story), or (c) true pipe-to-S3 with streaming hash and deferred manifest. + **Conclusion for stakeholders:** Decide between (a) v1 local-first only, (b) **hybrid** — each schema completes locally, then upload (simplest integrity story), or (c) true pipe-to-S3 with streaming hash and deferred manifest. From 303562b1d37107ec6b84065cc9942e6899c724ab Mon Sep 17 00:00:00 2001 From: djstrong Date: Wed, 8 Apr 2026 16:19:11 +0200 Subject: [PATCH 06/10] Update ENSdb CLI plan to standardize command options by replacing `--database-url` with `--ensdb-url` across various commands, enhancing clarity and consistency in usage. 
--- .cursor/plans/ensdb_cli_tool_422abf99.plan.md | 47 +++++++++---------- 1 file changed, 22 insertions(+), 25 deletions(-) diff --git a/.cursor/plans/ensdb_cli_tool_422abf99.plan.md b/.cursor/plans/ensdb_cli_tool_422abf99.plan.md index 45138490f..361719ba1 100644 --- a/.cursor/plans/ensdb_cli_tool_422abf99.plan.md +++ b/.cursor/plans/ensdb_cli_tool_422abf99.plan.md @@ -159,35 +159,35 @@ Discovery via `ListObjects` on `{prefix}/` -- each snapshot is a prefix containi ### Inspect ``` -ensdb-cli inspect --database-url +ensdb-cli inspect [--ensdb-url ] List all schemas with type classification and size info. -ensdb-cli inspect --database-url --schema +ensdb-cli inspect [--ensdb-url ] --schema Show detailed info for a specific schema (tables, row counts, sizes). ``` ### Schema Management ``` -ensdb-cli schema drop --database-url --schema [--force] +ensdb-cli schema drop [--ensdb-url ] --schema [--force] Drop a schema. Requires --force or interactive confirmation. ``` ### Snapshot Operations ``` -ensdb-cli snapshot create --database-url --output [--exclude-schemas ] +ensdb-cli snapshot create [--ensdb-url ] --output [--exclude-schemas ] Export all discovered indexer schemas + ponder_sync + full ensnode schema + ensnode.metadata JSON. Runs pg_dump with parallel jobs for each schema. Use --exclude-schemas to skip unrelated app schemas. -ensdb-cli snapshot restore --database-url --input --schemas [--drop-existing] [--force-or-confirm] +ensdb-cli snapshot restore [--ensdb-url ] --input --schemas [--drop-existing] [--force-or-confirm] Restore selected schema(s) from a local snapshot into a fresh or isolated database. Restores the selected indexer schema dump(s) + ponder_sync + filtered ensnode.metadata rows. Runs preflight (see above) before pg_restore; fails if shared state or metadata conflicts unless --force-or-confirm. Fails if target indexer schema already exists unless --drop-existing is passed (after preflight). 
Unpacks `.dump.tar.zst` archives to temp storage, then runs pg_restore with parallel jobs. -ensdb-cli snapshot restore --database-url --input --ponder-sync-only [--drop-existing] [--force-or-confirm] +ensdb-cli snapshot restore [--ensdb-url ] --input --ponder-sync-only [--drop-existing] [--force-or-confirm] Restore only ponder_sync (no indexer schemas, no ensnode.metadata). Preflight still applies to non-empty ponder_sync unless --force-or-confirm. Enables the developer workflow described in #833: quickly bootstrap a local ponder_sync @@ -219,7 +219,7 @@ ensdb-cli snapshot delete --snapshot-id --bucket [--endpoint ### Common Options -- `--database-url` / `ENSDB_URL` -- PostgreSQL connection string +- `--ensdb-url` / `ENSDB_URL` -- PostgreSQL connection string for the source/target ENSDb. Optional; defaults to `process.env.ENSDB_URL`. - `--jobs` / `-j` -- parallelism for pg_dump/pg_restore (default: 4) - `--bucket` / `ENSDB_SNAPSHOT_BUCKET` -- S3 bucket name - `--endpoint` / `ENSDB_SNAPSHOT_ENDPOINT` -- S3-compatible endpoint (for R2, MinIO) @@ -376,7 +376,7 @@ ensdb-cli snapshot pull \ # 2. Restore into an isolated test database ensdb-cli snapshot restore \ - --database-url $TEST_DB_URL \ + --ensdb-url $TEST_DB_URL \ --input /tmp/snapshot \ --schemas mainnetSchema1.9.0 @@ -406,7 +406,7 @@ The `snapshot list` and `snapshot info` commands can also be used in CI to disco **Future command:** ``` -ensdb-cli analyze unknown-labels --database-url --schema [--top-n 100] [--output-format table|csv|json] +ensdb-cli analyze unknown-labels [--ensdb-url ] --schema [--top-n 100] [--output-format table|csv|json] Count unknown names, unknown labels (distinct and non-distinct), and return the top-N most frequent unknown labels with occurrence counts. Uses @ensnode/ensdb-sdk typed access to domain tables. @@ -424,24 +424,21 @@ This is explicitly **out of scope for v1** but the plan ensures the CLI's databa 3. 
**Restore safety**: `snapshot restore` assumes a fresh or isolated database; **preflight** enforces this (non-empty `ponder_sync`, conflicting `ensnode.metadata` rows, unexpected schemas) unless `--force-or-confirm` is passed. 4. **Restore behavior**: Preflight runs first, then fail if a target indexer schema already exists unless `--drop-existing` is passed. `--drop-existing` does not bypass preflight; only `--force-or-confirm` does. Prevents accidental data loss while keeping an explicit escape hatch. 5. **Retention policy**: `snapshot delete` command added for manual cleanup. `snapshot list` shows all snapshots; operators manage retention manually. +6. **Snapshot IDs**: Snapshot IDs are auto-generated and immutable in v1 (no operator override). +7. **Streaming uploads**: v1 stays local-first (`snapshot create` then `snapshot push`). No streaming/pipe-to-S3 mode in v1. -## Open Questions for Stakeholders - -1. **Snapshot ID format**: Should snapshot IDs be auto-generated (e.g. `ensdb-2026-04-06-abc123`) or user-specified? Auto-generated is safer for avoiding collisions. -2. **Streaming upload mode**: Should v1 support a direct "snapshot and push" flow that uploads artifacts to S3 as they are produced, or should v1 stay local-first (`snapshot create` then `snapshot push`)? - - **Why directory format complicates streaming:** `pg_dump --format=directory` writes many files under a tree; you normally archive that tree to a single blob (e.g. `.dump.tar.zst`) before upload. That implies at least one local staging step per schema unless you stream `tar` output directly to S3 multipart (possible but more moving parts). + **Rationale (kept for future roadmap):** - **Other `pg_dump` formats do not remove all constraints:** + - **Why directory format complicates streaming:** `pg_dump --format=directory` writes many files under a tree; you normally archive that tree to a single blob (e.g. `.dump.tar.zst`) before upload. 
That implies at least one local staging step per schema unless you stream `tar` output directly to S3 multipart (possible but more moving parts). + - **Other `pg_dump` formats do not remove all constraints:** + - **Custom** (`pg_dump --format=custom` / `-Fc`): single-file output and can be streamed (e.g. piped into multipart upload). Loses parallel `pg_restore` compared to directory format unless you accept those trade-offs at 50-100GB scale. + - **Plain** (`pg_dump --format=plain`): single SQL stream; stream-friendly, but restore is typically slower and less suited to huge DBs than the directory workflow already chosen for this plan. + - **Checksums and manifest:** The plan includes `checksums.sha256` and a `manifest.json` with per-artifact sizes. For end-to-end streaming without a local file: + - Per-artifact SHA-256 can still be computed by hashing bytes as they pass through the upload pipeline (digest alongside the stream), then writing the digest into `checksums.sha256` and the manifest after that artifact finishes. + - Alternatively, S3 object checksums (multipart part ETags, or `ChecksumSHA256` on `PutObject` where supported) can supplement or replace client-side files, but the manifest must state what is verified (client hash vs object checksum). + - The full snapshot manifest cannot be finalized until all artifacts are complete, so uploads can be incremental, but manifest upload is always last (or use a two-phase manifest: provisional then final). - - **Custom** (`pg_dump --format=custom` / `-Fc`): single-file output and can be streamed (e.g. piped into multipart upload). Loses parallel `pg_restore` compared to directory format unless you accept those trade-offs at 50-100GB scale. - - **Plain** (`pg_dump --format=plain`): single SQL stream; stream-friendly, but restore is typically slower and less suited to huge DBs than the directory workflow already chosen for this plan. 
-   - **Checksums and manifest:** The plan includes `checksums.sha256` and a `manifest.json` with per-artifact sizes. For **end-to-end streaming** without a local file:
-
-     - Per-artifact SHA-256 can still be computed by hashing bytes **as they pass through** the upload pipeline (running a digest alongside the stream), then writing the digest into `checksums.sha256` and the manifest **after** that artifact finishes.
-     - Alternatively, **S3 object checksums** (multipart part ETags, or `ChecksumSHA256` on `PutObject` where supported) can supplement or replace client-side files, but the manifest must state what is verified (client hash vs object checksum).
-     - The **full snapshot manifest** cannot be finalized until all artifacts are complete, so uploads can be incremental, but **manifest upload is always last** (or use a two-phase manifest: provisional then final).
+## Open Questions for Stakeholders
-   - **Conclusion for stakeholders:** Decide between (a) v1 local-first only, (b) **hybrid** — each schema completes locally, then upload (simplest integrity story), or (c) true pipe-to-S3 with streaming hash and deferred manifest.
+1. **Snapshot ID format**: Confirm the exact auto-generated format (e.g. `ensdb-YYYY-MM-DDTHHMMSSZ-` vs `{primarySchemaName}-...`). v1 does not allow overriding the generated ID.

From 5eb5cafbe37742c3e0aa06a83752028e3eccea3e Mon Sep 17 00:00:00 2001
From: djstrong
Date: Mon, 13 Apr 2026 01:22:44 +0200
Subject: [PATCH 07/10] Refine ENSdb CLI plan with enhanced snapshot creation
 and restoration details, including improved metadata handling, new snapshot
 verification process, and clearer command descriptions for S3-compatible
 storage interactions.
---
 .cursor/plans/ensdb_cli_tool_422abf99.plan.md | 273 +++++++++++++-----
 1 file changed, 198 insertions(+), 75 deletions(-)

diff --git a/.cursor/plans/ensdb_cli_tool_422abf99.plan.md b/.cursor/plans/ensdb_cli_tool_422abf99.plan.md
index 361719ba1..94bbf42c6 100644
--- a/.cursor/plans/ensdb_cli_tool_422abf99.plan.md
+++ b/.cursor/plans/ensdb_cli_tool_422abf99.plan.md
@@ -15,19 +15,22 @@ todos:
     content: Implement pg_dump/pg_restore wrapper with parallel jobs and progress reporting
     status: pending
   - id: snapshot-create
-    content: "Implement snapshot create: dump discovered indexer schemas + ponder_sync + full ensnode schema + ensnode_metadata.json, then generate manifest + checksums"
+    content: "Implement snapshot create: dump indexer schemas + ponder_sync + ensnode + ensnode_metadata.json; manifest includes postgresVersion, ensnode.drizzleMigrations fingerprint, checksums"
     status: pending
   - id: snapshot-restore
-    content: "Implement snapshot restore: preflight-restore checks, unpack archives, validate manifest, pg_restore + filtered metadata upsert"
+    content: "Implement snapshot restore: preflight (fresh DB, conflicts, ENSNODE_BOOTSTRAP_REQUIRED, version/Drizzle/build compatibility vs manifest), --skip-preflight escape hatch; full + selective + --bootstrap-ensnode paths"
     status: pending
   - id: s3-client
-    content: Implement S3 client layer with multipart upload/download support
+    content: Implement S3-compatible client layer with multipart upload/download support
     status: pending
   - id: snapshot-push
     content: "Implement snapshot push: upload snapshot artifacts and manifest to S3-compatible storage"
     status: pending
   - id: snapshot-pull
-    content: "Implement snapshot pull: download from S3, verify checksums"
+    content: "Implement snapshot pull: download from S3-compatible storage, verify checksums"
+    status: pending
+  - id: snapshot-verify
+    content: "Implement snapshot verify: validate a local snapshot manifest and checksums without restoring"
     status: pending
   - id: snapshot-list-info
     content: Implement snapshot list and snapshot info commands for browsing remote snapshots
@@ -59,10 +62,10 @@ ENSNode production databases are 50-100GB PostgreSQL instances. Each chain deplo
 - `sepoliaSchema1.9.0` -- sepolia
 - `v2SepoliaSchema1.9.0` -- v2 sepolia
 - **Shared schemas (2):**
-  - `ensnode` -- metadata table (rows scoped by `ens_indexer_schema_name`)
+  - `ensnode` -- application schema: `metadata` table (rows scoped by `ens_indexer_schema_name`) + `__drizzle_migrations` (Drizzle migration journal). Full schema must be preserved on full restore.
   - `ponder_sync` -- shared RPC cache and sync state (needed by every indexer)

-Schema names are set via `ENSINDEXER_SCHEMA_NAME` env var in the blue-green deploy workflow (`[.github/workflows/deploy_ensnode_blue_green.yml](.github/workflows/deploy_ensnode_blue_green.yml)`). Old schemas are orphaned on redeploy and must be dropped manually to reclaim space.
+Schema names are set via `ENSINDEXER_SCHEMA_NAME` env var in the blue-green deploy workflow (`[.github/workflows/deploy_ensnode_blue_green.yml](.github/workflows/deploy_ensnode_blue_green.yml)`). Old schemas are orphaned on redeploy and must be dropped manually to reclaim space. Also, all orphaned records in ENSNode Metadata tables must be deleted manually.

 The goal is to enable fast ENSNode bootstrap (hours instead of 2-3 days) by snapshotting and restoring database state.
@@ -74,46 +77,109 @@ The goal is to enable fast ENSNode bootstrap (hours instead of 2-3 days) by snap

 ## Architecture Decisions

+### Snapshot UX principles
+
+The snapshot system should optimize for a fast, low-friction developer experience while staying operationally safe for 50–100GB databases.
+
+- **Restore modes**: keep the user-facing model simple.
+  - **Full** (omit `--schemas`): restore every indexer dump in the snapshot + `ponder_sync` + **`ensnode` via `pg_restore` from `ensnode.dump.tar.zst`** (includes `metadata` and `__drizzle_migrations`).
Do **not** rebuild `ensnode` from JSON alone in this mode; `ensnode_metadata.json` is redundant for data integrity but still useful for listing and verification.
+  - `ponder_sync` only (`--ponder-sync-only`)
+  - **Selective**: chosen indexer schema(s) (`--schemas ...`) plus the matching rows from `ensnode.metadata` (replayed from `ensnode_metadata.json`); optional `ponder_sync`. Does **not** `pg_restore` the full `ensnode` dump (would clobber other indexers’ metadata).
+
+    `ponder_sync` is **included by default** for selective restore. Pass **`--without-ponder-sync`** to opt out.
+
+    - **Read-only consumers** (ENSApi serving, analytics, any app that will not run ENSIndexer): pass `--without-ponder-sync` — it is not needed.
+    - **Indexer operators** (restore then run ENSIndexer to keep the DB updated): keep `ponder_sync` (the default) — it preserves sync/RPC cache state and speeds up reaching a healthy following state.
+- **Manifest-driven tooling**: `manifest.json` is the source of truth for:
+  - which artifacts exist under `{prefix}/{snapshot-id}/`
+  - per-artifact sizes and checksums
+  - metadata required to derive UI “capabilities” (via `deriveCapabilities(...)`, not stored in the manifest)
+- **Resumable + retry downloads (roadmap)**: Large snapshot downloads should tolerate flaky networks.
+  - download to a `.part` file
+  - resume via HTTP range requests
+  - retry with backoff on failure
+
+S3-compatible storage supports range reads, so `snapshot pull` can add an optional resumable mode in a later iteration (v1 can stay simple).
+
 ### Snapshot Composition

 A full snapshot contains **separate pg_dump archives** for:

-1. **Every indexer deployment schema** currently present in the database (e.g. `mainnetSchema1.9.0`, `alphaSchema1.9.0`). The CLI discovers these by enumerating non-system schemas and **excluding** `ponder_sync`, `ensnode`, and PostgreSQL system schemas.
If unrelated application schemas exist in the same database, add `--exclude-schemas` (or an allowlist flag) so they are not dumped by mistake.
+1. **Every indexer deployment schema** currently present in the database (e.g. `mainnetSchema1.9.0`, `alphaSchema1.9.0`). The CLI discovers these by enumerating non-system schemas and **excluding** `ponder_sync`, `ensnode`, and PostgreSQL system schemas. If unrelated application schemas exist in the same database, add `--ignore-schemas` so they are not dumped by mistake. When `--ignore-schemas` is used, the manifest should record which schemas were **ignored** (so consumers can tell whether the source DB had additional schemas that were intentionally excluded).
 2. The `ponder_sync` schema (full)
-3. The `ensnode` schema (full), not only the `metadata` table
+3. The **`ensnode` schema (full `pg_dump`)** → `ensnode.dump.tar.zst`, same directory-format + tar+zstd pipeline as other schemas. This captures **`metadata`, `__drizzle_migrations`,** and any other objects under `ensnode` so a **full** restore is faithful to the source DB.
+4. **`ensnode_metadata.json`** -- export of all `ensnode.metadata` rows (JSON). Used for manifest enrichment, `snapshot list` summaries, and **selective** restore (replay only the rows for chosen indexer schema names). It is **not** a substitute for `ensnode.dump.tar.zst` on full restore.
+
+**Optional manifest enrichment (best-effort):**
-Additionally, the snapshot includes `ensnode_metadata.json`: a JSON export of all rows from `ensnode.metadata`. This supports manifest enrichment, `snapshot list` summaries, and **selective restore** (replay only the metadata rows for chosen indexer schema names without requiring a second full `ensnode` download when operators use the slim pull path).
+- `ponderSync.chainIdsPresent` -- list of chain IDs observed in the `ponder_sync` RPC cache at snapshot time (if derivable cheaply and deterministically from the current `ponder_sync` schema). This should be **best-effort**: if the CLI can’t derive it reliably (schema differs, missing columns, etc.), omit the field rather than failing snapshot creation.

-Selective workflows:
+**Full snapshot workflows:**

-1. `snapshot pull --schemas ...` downloads the selected indexer archives + `ponder_sync` + `ensnode_metadata.json` (+ optionally full `ensnode.dump.tar.zst` when a full `ensnode` restore is required -- see implementation note below)
-2. `snapshot restore --schemas ...` restores the selected indexer schema dumps + `ponder_sync`, then applies **filtered** `ensnode.metadata` rows (from JSON or from a partial upsert strategy) so other indexers' metadata rows are not overwritten incorrectly
+1. `snapshot pull` with no `--schemas` downloads all artifacts, including `ensnode.dump.tar.zst` and `ensnode_metadata.json`.
+2. `snapshot restore` with no `--schemas` runs preflight for a full clone, then `pg_restore`s all indexer dumps, `ponder_sync`, and **`ensnode`** from `ensnode.dump.tar.zst`. JSON is not the source of truth for `ensnode` in this path.
+
+**Selective workflows:**
+
+1. `snapshot pull --schemas ...` downloads the selected indexer archives + `ensnode_metadata.json` + `ensnode.dump.tar.zst` and (by default) `ponder_sync`. The `ensnode` dump is always included because it is very small (tens of KB). For a **clean** target database, selective restore can bootstrap Drizzle state from the snapshot (see below).
+2. `snapshot restore --schemas ...` restores the selected indexer schema dumps and reconciles `ensnode.metadata`. On a **clean** DB this **must** also establish **`ensnode.__drizzle_migrations`** (JSON alone cannot do that). Two supported approaches (document which is default for CI):
+   - **Migrate-first:** Run ENSDb/ENSIndexer migrations against the empty database **before** `snapshot restore --schemas`, creating `ensnode` + `__drizzle_migrations`; then the CLI restores indexer dumps and **upserts** filtered rows from `ensnode_metadata.json`.
+   - **Bootstrap-from-snapshot:** Pass **`--bootstrap-ensnode`** (requires `ensnode.dump.tar.zst` under `--input`). The CLI `pg_restore`s the full `ensnode` schema (migrations + `metadata` as captured), **deletes** `ensnode.metadata` rows whose `ens_indexer_schema_name` is **not** in `--schemas`, then restores the selected indexer dumps. Optionally still apply JSON upsert for the target schemas to match checksums exactly.

 Because `ponder_sync` is shared state, `snapshot restore` is intended for **fresh or isolated target databases only**. The CLI enforces that with **preflight checks** (below) instead of relying on operator discipline alone.

-**Implementation note:** For a **full** restore of everything, restore `ponder_sync`, `ensnode` (from `ensnode.dump.tar.zst`), and each indexer schema. For **selective** restore, the CLI may restore indexer schema(s) + `ponder_sync` and upsert only the relevant metadata rows from `ensnode_metadata.json` instead of replacing the entire `ensnode` schema, to avoid clobbering metadata for indexers not being restored. Exact mechanics should be validated against how Drizzle/Ponder expect `ensnode` to look after restore.
+**Implementation notes:**
+
+- **Full restore:** `pg_restore` the `ensnode` dump so `__drizzle_migrations` matches the snapshot source. If JSON and dump ever disagree, **the dump wins** (JSON is auxiliary).
+- **Selective restore (shared `ensnode` already present):** Do not `pg_restore` `ensnode.dump.tar.zst` (would clobber `__drizzle_migrations` / other indexers' metadata). Replay only the relevant `ensnode.metadata` rows from `ensnode_metadata.json` after preflight passes. **Upsert semantics:** `INSERT ... ON CONFLICT (ens_indexer_schema_name, key) DO UPDATE SET value = EXCLUDED.value` (primary key from `ensdb-sdk`).
+- **Selective restore (clean database):** You **must** get `__drizzle_migrations` from somewhere: either **migrate-first** (app creates `ensnode`) or **`--bootstrap-ensnode`** from `ensnode.dump.tar.zst` + metadata prune. Failing that, preflight should fail with a dedicated error (e.g. `ENSDB_CLI_ERR_PREFLIGHT_ENSNODE_BOOTSTRAP_REQUIRED`) if `ensnode` / `__drizzle_migrations` is missing when `--bootstrap-ensnode` was not used and migrations were not run. Exact rules should be validated against how ENSIndexer/ENSApi expect `ensnode` after a partial restore.

 ### Preflight checks (`snapshot restore`)

-Runs in the **restore command handler** immediately after validating CLI args and loading the manifest, and **before** any `pg_restore` and **before** evaluating `--drop-existing` for target indexer schemas.
+Runs in the **restore command handler** immediately after validating CLI args, loading the manifest, and computing the **effective restore plan** (target schemas, whether `ponder_sync` will be restored, whether `ensnode` will be replaced, and whether `--drop-existing` applies). Preflight still runs **before** any destructive action and **before** any `pg_restore`.

 **Checks (fail closed by default):**

-1. **ponder_sync non-empty:** If schema `ponder_sync` exists, detect any user data (e.g. `SELECT EXISTS (SELECT 1 FROM ponder_sync.<table> LIMIT 1)` or sum `n_live_tup` from `pg_stat_user_tables` for `ponder_sync`). If non-empty, fail with identifier `ENSDB_CLI_ERR_PREFLIGHT_PONDER_SYNC_NONEMPTY` (human-readable message explains shared sync state would be overwritten).
-2. **ensnode.metadata conflicts:** Query `ensnode.metadata` if the schema exists. For **selective** restore (`--schemas`), fail if any row exists whose `ens_indexer_schema_name` is **not** in the target schema set (`ENSDB_CLI_ERR_PREFLIGHT_ENSNODE_METADATA_CONFLICT`).
For **full** restore (all indexer schemas + full `ensnode` from dump), fail if **any** metadata rows exist (`ENSDB_CLI_ERR_PREFLIGHT_ENSNODE_METADATA_NONEMPTY`) — distinct message from selective conflict so operators know which case fired.
+1. **ponder_sync non-empty:** If schema `ponder_sync` exists, check it **deterministically**: enumerate all base tables in `ponder_sync`; if any table contains at least one row, fail with identifier `ENSDB_CLI_ERR_PREFLIGHT_PONDER_SYNC_NONEMPTY`. If the schema exists but has no base tables, treat it as empty. Do **not** rely on `pg_stat_user_tables` estimates for this guardrail.
+2. **ensnode / metadata conflicts:**
+   - **Full** restore (no `--schemas`): treat **`ensnode` as a unit**. If schema `ensnode` exists and has any user objects, fail with `ENSDB_CLI_ERR_PREFLIGHT_ENSNODE_METADATA_NONEMPTY` — **unless** `--drop-existing` is also set (preflight recognizes it will be dropped in the next step) **or** `--skip-preflight`.
+   - **Selective** restore (`--schemas`): if `ensnode.metadata` exists, fail if any row’s `ens_indexer_schema_name` is **not** in the target schema set (`ENSDB_CLI_ERR_PREFLIGHT_ENSNODE_METADATA_CONFLICT`). If **`ensnode` is absent** (or `__drizzle_migrations` missing / empty when required) and **`--bootstrap-ensnode` is not set**, fail with `ENSDB_CLI_ERR_PREFLIGHT_ENSNODE_BOOTSTRAP_REQUIRED` (message: run migrations first or pass `--bootstrap-ensnode`). If **`--bootstrap-ensnode` is set** but **`ensnode` already exists** with objects, fail unless **`--drop-existing`** is set (will drop `ensnode` before bootstrap restore).
 3. **Unexpected non-system schemas / objects:** Enumerate schemas outside PostgreSQL system namespaces (`pg_*`, `information_schema`, etc.). For the intended restore set, fail if a **non-target indexer schema** already exists (tables present or schema present) (`ENSDB_CLI_ERR_PREFLIGHT_UNEXPECTED_SCHEMA`).
Optionally extend with a stricter mode: fail if `public` (or other default) contains unexpected user tables — keep the rule deterministic in code and documented.
+4. **Version / compatibility (target already has `ensnode` or you use migrate-first):** When restoring **into** a database that will keep an existing `ensnode` schema (selective restore **without** `--bootstrap-ensnode`, or any path where Drizzle state is not fully replaced by the snapshot’s `ensnode` dump), compare **snapshot manifest** (and optionally `ensnode_metadata.json`) to **live target** state. **Fail closed** if incompatible unless **`--skip-preflight`** is passed.
+
+### Version and build compatibility (`snapshot restore`)
+
+Restoring beside **existing** migration or metadata state is unsafe if versions diverge: the indexer schema dump may assume tables/columns from migration set **A** while the target’s `__drizzle_migrations` reflects set **B**, or `ensnode.metadata` may carry **versionInfo** / build identifiers that no longer match the app you intend to run.
+
+**At `snapshot create`**, record enough in `manifest.json` to compare on restore (exact shape validated during implementation):
+
+- **`postgresVersion`** (already planned) — target server should be **same major** (and ideally same minor) as the source; mismatch → `ENSDB_CLI_ERR_PREFLIGHT_PG_VERSION_MISMATCH`.
+- **`ensnode.drizzleMigrations`** — fingerprint of source `ensnode.__drizzle_migrations` at snapshot time, e.g. ordered list of migration **tags** (or hashes) from Drizzle’s journal. Enables comparing to the **target** `__drizzle_migrations` **before** `pg_restore` when `ensnode` already exists.
+- **`indexerConfig.versionInfo`** / **`ensdb_version` metadata** — ENSDb, Ponder, ENSIndexer semver (already in manifest enrichment). If the target `ensnode.metadata` (for the same `ens_indexer_schema_name`) or a CLI flag **`--expected-ensdb-version`** disagrees with the snapshot for the schemas being restored → `ENSDB_CLI_ERR_PREFLIGHT_ENSNODE_METADATA_VERSION_MISMATCH` (or split per product if needed).
+- **Build / git / image id** — If `ensindexer_public_config` or other metadata embeds a **build id** or git SHA used operationally, treat it like semver: snapshot vs target mismatch is an **error** by default in v1 (operators can use `--skip-preflight` to bypass all checks). This keeps v1 simple and strict; a future `--allow-build-mismatch` flag can relax it if needed.
+
+**Rules of thumb:**

-**Override:** Pass `--force-or-confirm` to skip these preflight failures and proceed (operator asserts the target is disposable or they accept clobbering). Implementation may require interactive confirmation when TTY is available; non-interactive mode requires the flag. This is separate from `--drop-existing`, which only governs dropping **named target indexer schemas** (and optionally `ponder_sync` / `ensnode` if explicitly documented) **after** preflight passes or is overridden.
+- **Full restore into an empty database:** After wipe + restore, `ensnode` comes entirely from the snapshot dump, so **Drizzle row mismatch on target** does not apply pre-restore. Still check **PostgreSQL major** compatibility with the dump (`pg_restore` / server).
+- **Selective + `--bootstrap-ensnode`:** Replaces `ensnode` from the snapshot (after optional drop); fingerprint in manifest should **match** the restored dump (self-consistent).
+- **Selective + migrate-first or shared `ensnode`:** Target **`__drizzle_migrations` must match** the snapshot’s `ensnode.drizzleMigrations` fingerprint (or target must be a strict superset with identical applied tags for shared migrations — pick one deterministic rule in code; **default strict equality** is simplest).
Otherwise the restored indexer tables and live migration history can disagree.
+
+**`--drop-existing` scope:**
+
+- **Full** restore: drops **all schemas that will be restored** — every indexer schema in the manifest + `ponder_sync` + `ensnode`. Preflight recognizes this flag and suppresses "non-empty" checks for schemas that will be dropped.
+- **Selective** restore: drops only the **named `--schemas`** targets. If `--bootstrap-ensnode` is also set, additionally drops `ensnode`. Drops `ponder_sync` only when `ponder_sync` is being restored (i.e. default behavior, not `--without-ponder-sync`).
+
+**`--skip-preflight`:** skips **all** preflight checks (freshness, conflicts, version / Drizzle / build compatibility). Use only when the operator explicitly accepts overwriting shared state, clobbering metadata, and version skew. Log a clear **stderr warning** when this flag is used (no interactive confirmation — the flag name is the confirmation). `--drop-existing` does **not** bypass preflight; it only suppresses non-empty checks for schemas it will drop. `--skip-preflight` bypasses everything.

 **Order of operations:**

-1. Preflight (unless skipped via `--force-or-confirm`)
-2. If restoring indexer schemas and targets exist: apply `--drop-existing` drops for those schemas only (as today)
-3. `pg_restore` / metadata upsert
+1. Preflight (unless skipped via **`--skip-preflight`**)
+2. If `--drop-existing` is set and targets exist: **full** drops all indexer schemas + `ponder_sync` + `ensnode`; **selective** drops named `--schemas` + `ponder_sync` (when being restored) + `ensnode` (when `--bootstrap-ensnode` is set)
+3. `pg_restore` for dumps (full restore includes `ensnode` from `ensnode.dump.tar.zst`). **Selective:** restore indexer dumps (+ optional `ponder_sync`); then either **`--bootstrap-ensnode`** (`pg_restore` `ensnode` + prune metadata) **or** JSON upsert only onto an `ensnode` that already has migrations (migrate-first path).
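To keep the `--drop-existing` scope rules deterministic, they can be expressed as one pure function over the effective restore plan. This is a hedged sketch: `RestorePlan` and `schemasToDrop` are hypothetical names, not part of the plan's defined API.

```typescript
// Hypothetical shape of the effective restore plan computed before preflight.
type RestorePlan = {
  mode: "full" | "selective";
  targetSchemas: string[];    // manifest schemas (full) or --schemas (selective)
  restorePonderSync: boolean; // false when --without-ponder-sync
  bootstrapEnsnode: boolean;  // --bootstrap-ensnode (selective only)
  dropExisting: boolean;      // --drop-existing
};

// Sketch of the scope rules above: full drops everything being restored;
// selective drops only the named targets, plus ensnode when bootstrapping
// and ponder_sync only when it is actually being restored.
export function schemasToDrop(plan: RestorePlan): string[] {
  if (!plan.dropExisting) return [];
  const drops = [...plan.targetSchemas];
  if (plan.restorePonderSync) drops.push("ponder_sync");
  if (plan.mode === "full" || plan.bootstrapEnsnode) drops.push("ensnode");
  return drops;
}
```

A single function like this also makes the "preflight suppresses non-empty checks for schemas that will be dropped" rule easy to implement: preflight can subtract `schemasToDrop(plan)` from the set it inspects.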
**Surfacing errors:** Print distinct messages per failure class; include the `ENSDB_CLI_ERR_PREFLIGHT_*` identifier in the message (and optionally `process.exit` with dedicated codes, e.g. `2` / `3` / `4`, if the team wants scriptable CI — document in README).

-**Selective restore:** Preflight must ensure metadata upsert will not clobber other indexers: the `ensnode.metadata` row check above is mandatory before replaying filtered `ensnode_metadata.json`.
+**Selective restore:** Preflight must ensure metadata upsert will not clobber other indexers on shared DBs: the `ensnode.metadata` row check above is mandatory before replaying filtered `ensnode_metadata.json`. On **clean** DBs, preflight must ensure **`__drizzle_migrations` will exist** after the command (migrate-first completed, or `--bootstrap-ensnode` with dump on disk).

 ### Snapshot Format

@@ -123,11 +189,13 @@ Use `pg_dump` with `--format=directory` and `--jobs=N` for parallel dump/restore
 - Archive: `tar --zstd -cf <output>/<schema>.dump.tar.zst -C <tmpdir> <schema>.dumpdir`
 - Restore: unpack `<schema>.dump.tar.zst` to a temp directory, then run `pg_restore --format=directory --jobs=4 --schema=<schema> <tmpdir>/<schema>.dumpdir`

-The implementation should explicitly budget temporary disk usage for both the compressed archive and the unpacked directory during restore.
+The implementation should explicitly budget temporary disk usage for both the compressed archive and the unpacked directory during restore. To reduce peak disk usage, process schemas **sequentially** during `snapshot create`: dump one schema to a directory, archive it, delete the directory, then proceed to the next. During `snapshot restore`, similarly unpack and restore one archive at a time, deleting the unpacked directory after each `pg_restore` completes. On failure or interrupt, clean up temp directories (register a process exit handler / signal trap).
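The interrupt cleanup mentioned above needs some bookkeeping of which temp directories are still on disk. A minimal sketch (all names hypothetical) tracks live directories so exit/signal handlers can remove whatever the sequential pipeline has not yet deleted:

```typescript
// Minimal sketch of temp-directory tracking for the sequential pipeline.
// The CLI would wire cleanupAll() to process "exit" / "SIGINT" handlers and
// pass a real cleaner such as (d) => fs.rmSync(d, { recursive: true, force: true }).
export class TempDirRegistry {
  private live = new Set<string>();
  constructor(private cleaner: (dir: string) => void) {}

  track(dir: string): void {
    this.live.add(dir); // a schema's dump/unpack directory now exists on disk
  }

  done(dir: string): void {
    this.live.delete(dir); // schema finished; its directory was already deleted
  }

  cleanupAll(): void {
    for (const dir of this.live) this.cleaner(dir); // remove leftovers on failure/interrupt
    this.live.clear(); // idempotent: safe if both exit and signal handlers fire
  }
}
```

Because only one schema is in flight at a time, the registry normally holds at most one directory, which keeps the worst-case leftover disk usage bounded as well.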
+
+**Checksum verification:** Verify `checksums.sha256` at two points: (1) after `snapshot pull` completes (before returning success), and (2) at the start of `snapshot restore --input` before any `pg_restore`. If the snapshot was created locally via `snapshot create`, the restore verification catches corruption from disk issues.

 **Tooling prerequisites:** Archiving uses `tar` with zstd compression (`tar --zstd` or pipe to `zstd`). The Docker image and operator docs must include `tar`, `zstd`, and PostgreSQL client tools (`pg_dump`, `pg_restore`) compatible with the server major version.

-### S3 Storage Layout
+### S3-compatible Storage Layout

 Discovery via `ListObjects` on `{prefix}/` -- each snapshot is a prefix containing a `manifest.json` and per-schema dump files:

@@ -136,19 +204,20 @@ Discovery via `ListObjects` on `{prefix}/` -- each snapshot is a prefix containi
 {snapshot-id}/
   manifest.json              # snapshot metadata (all schemas, sizes, versions)
   {schema-name}.dump.tar.zst # archived pg_dump directory output (one per indexer schema)
-  ponder_sync.dump.tar.zst   # archived pg_dump of ponder_sync
-  ensnode.dump.tar.zst       # archived pg_dump of full ensnode schema
-  ensnode_metadata.json      # all ensnode.metadata rows (JSON; for manifest + selective metadata replay)
+  ponder_sync.dump.tar.zst   # archived dump of ponder_sync
+  ensnode.dump.tar.zst       # full pg_dump of schema ensnode (metadata + __drizzle_migrations + ...)
+  ensnode_metadata.json      # all ensnode.metadata rows (JSON; listing + selective replay; auxiliary on full restore)
   checksums.sha256           # integrity verification
 ```

-- `snapshot list` uses `ListObjectsV2` with delimiter `/` to enumerate snapshot prefixes, then fetches each `manifest.json` for metadata display.
-- `snapshot pull --schemas ...` downloads the selected indexer dump(s) + `ponder_sync.dump.tar.zst` + `ensnode_metadata.json`.
Downloading `ensnode.dump.tar.zst` is optional for selective restore if metadata replay from JSON is sufficient; include it when doing a full `ensnode` schema restore.
+- `snapshot list` uses `ListObjectsV2` with delimiter `/` to enumerate snapshot prefixes, then fetches each `manifest.json` **in parallel** (with a concurrency limit, e.g. 10) for metadata display. Supports `--limit <n>` (default: 20) to cap the number of snapshots shown and avoid slow listing when many snapshots exist. Results are sorted by `createdAt` descending (newest first).
+- `snapshot pull --schemas ...` downloads the selected indexer dump(s) + `ensnode_metadata.json` + `ensnode.dump.tar.zst` (always included; negligible size) and (by default) `ponder_sync.dump.tar.zst`. Add `--without-ponder-sync` for read-only consumers that do not plan to run ENSIndexer after restore.
+- `snapshot pull` with no `--schemas` downloads **all** artifacts including `ensnode.dump.tar.zst`.

 ### Technology

 - **CLI framework**: yargs (consistent with ENSRainbow's `apps/ensrainbow/src/cli.ts`)
-- **S3**: `@aws-sdk/client-s3` + `@aws-sdk/lib-storage` (multipart uploads for large files)
+- **S3-compatible storage**: `@aws-sdk/client-s3` + `@aws-sdk/lib-storage` (multipart uploads for large files). Uses the standard AWS SDK credential chain (env vars `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` / `AWS_REGION`, shared config files, IAM roles). No custom auth flags.
 - **Database**: `pg` for connection validation, shells out to `pg_dump`/`pg_restore` for actual operations
 - **Runtime**: tsx (consistent with other apps)
 - **Validation**: zod
@@ -176,45 +245,57 @@ ensdb-cli schema drop [--ensdb-url <url>] --schema <name> [--force]

 ### Snapshot Operations

 ```
-ensdb-cli snapshot create [--ensdb-url <url>] --output <dir> [--exclude-schemas <names>]
-  Export all discovered indexer schemas + ponder_sync + full ensnode schema + ensnode.metadata JSON.
-  Runs pg_dump with parallel jobs for each schema. Use --exclude-schemas to skip unrelated app schemas.
-
-ensdb-cli snapshot restore [--ensdb-url <url>] --input <dir> --schemas <names> [--drop-existing] [--force-or-confirm]
-  Restore selected schema(s) from a local snapshot into a fresh or isolated database.
-  Restores the selected indexer schema dump(s) + ponder_sync + filtered ensnode.metadata rows.
-  Runs preflight (see above) before pg_restore; fails if shared state or metadata conflicts unless --force-or-confirm.
+ensdb-cli snapshot create [--ensdb-url <url>] --output <dir> [--ignore-schemas <names>] [--jobs <n>]
+  Export all discovered indexer schemas + ponder_sync + full ensnode schema (ensnode.dump.tar.zst) + ensnode_metadata.json.
+  Runs pg_dump with parallel jobs per dumped schema. Use --ignore-schemas to skip unrelated app schemas.
+
+ensdb-cli snapshot restore [--ensdb-url <url>] --input <dir> [--drop-existing] [--skip-preflight] [--jobs <n>]
+  Full restore: all indexer dumps in the snapshot + ponder_sync + ensnode (from ensnode.dump.tar.zst).
+  Omit --schemas. Preflight requires a fresh/isolated target unless `--skip-preflight` is used, or `--drop-existing` is set for the schemas that will be replaced.
+  Unpacks archives, then pg_restore with parallel jobs. ensnode_metadata.json is not the source of truth for ensnode.
+
+ensdb-cli snapshot restore [--ensdb-url <url>] --input <dir> --schemas <names> [--bootstrap-ensnode] [--drop-existing] [--skip-preflight] [--jobs <n>]
+  Selective restore into a fresh or isolated database.
+  Restores the selected indexer schema dump(s) + ponder_sync when that artifact is present under --input.
+  **Clean DB:** either run migrations first, then apply filtered ensnode.metadata from JSON; or pass --bootstrap-ensnode (requires ensnode.dump.tar.zst in --input) to pg_restore ensnode (includes __drizzle_migrations) and prune metadata to --schemas.
+  **Shared ensnode:** omit --bootstrap-ensnode; upsert filtered metadata from JSON only.
+  Runs preflight before pg_restore; fails if shared state or metadata conflicts unless --skip-preflight.
   Fails if target indexer schema already exists unless --drop-existing is passed (after preflight).
   Unpacks `.dump.tar.zst` archives to temp storage, then runs pg_restore with parallel jobs.

-ensdb-cli snapshot restore [--ensdb-url <url>] --input <dir> --ponder-sync-only [--drop-existing] [--force-or-confirm]
-  Restore only ponder_sync (no indexer schemas, no ensnode.metadata).
-  Preflight still applies to non-empty ponder_sync unless --force-or-confirm.
+ensdb-cli snapshot restore [--ensdb-url <url>] --input <dir> --ponder-sync-only [--drop-existing] [--skip-preflight] [--jobs <n>]
+  Restore only ponder_sync (no indexer schemas, no ensnode dump, no ensnode.metadata changes).
+  Preflight still applies to non-empty ponder_sync unless --skip-preflight.
   Enables the developer workflow described in #833: quickly bootstrap a local ponder_sync so a new indexer can skip RPC re-fetching.

 ensdb-cli snapshot push --input <dir> --bucket <name> [--endpoint <url>] [--prefix <prefix>]
   Upload a local snapshot to S3-compatible storage. Uses multipart upload.

-ensdb-cli snapshot pull --snapshot-id <id> --output <dir> --bucket <name> [--endpoint <url>] [--prefix <prefix>] [--schemas <names>] [--with-ensnode-schema]
-  Download from S3. If --schemas specified, downloads those indexer dumps + ponder_sync + ensnode_metadata.json;
-  pass --with-ensnode-schema to also fetch ensnode.dump.tar.zst for a full ensnode pg_restore.
-  If --schemas omitted, downloads the full snapshot (all artifacts).
+ensdb-cli snapshot pull --snapshot-id <id> --output <dir> --bucket <name> [--endpoint <url>] [--prefix <prefix>] [--schemas <names>] [--without-ponder-sync]
+  Download from S3-compatible storage. If --schemas specified, downloads those indexer dumps + ensnode_metadata.json + ensnode.dump.tar.zst and (by default) ponder_sync.
+  pass --without-ponder-sync to skip ponder_sync (trade-offs: restored indexer may re-fetch RPC state).
+  If --schemas omitted, downloads the full snapshot. --prefix scopes all keys to `{prefix}/{snapshot-id}/...` (same as list/push/info/delete).
 ensdb-cli snapshot pull --snapshot-id <id> --output <dir> --bucket <bucket> [--endpoint <url>] [--prefix <prefix>] --ponder-sync-only
-  Download only ponder_sync.dump.tar.zst from a remote snapshot (#833).
+  Download only ponder_sync.dump.tar.zst from a remote snapshot in S3-compatible storage (#833).
   Skips all indexer schema dumps and ensnode_metadata.json.

-ensdb-cli snapshot list --bucket <bucket> [--endpoint <url>] [--prefix <prefix>]
-  List available snapshots from S3 with metadata summary (uses ListObjects + manifest.json).
+ensdb-cli snapshot list --bucket <bucket> [--endpoint <url>] [--prefix <prefix>] [--limit <n>]
+  List available snapshots from S3-compatible storage with metadata summary (uses ListObjects + manifest.json).
+  Default --limit 20, sorted newest first.

 ensdb-cli snapshot info --snapshot-id <id> --bucket <bucket> [--endpoint <url>] [--prefix <prefix>]
   Show detailed metadata for a specific remote snapshot (fetches and displays manifest.json under `{prefix}/{snapshot-id}/`).

 ensdb-cli snapshot delete --snapshot-id <id> --bucket <bucket> [--endpoint <url>] [--prefix <prefix>] [--force]
-  Delete a snapshot and all its artifacts from S3. Requires --force or interactive confirmation.
+  Delete a snapshot and all its artifacts from S3-compatible storage. Requires --force or interactive confirmation.
   Removes all objects under `{prefix}/{snapshot-id}/`.
+
+ensdb-cli snapshot verify --input <dir>
+  Validate a local snapshot before restore: check manifest shape/version and verify `checksums.sha256`.
+  Does not connect to PostgreSQL or modify data.
 ```

 ### Common Options

@@ -224,13 +305,32 @@ ensdb-cli snapshot delete --snapshot-id <id> --bucket <bucket> [--endpoint
 - `--bucket` / `ENSDB_SNAPSHOT_BUCKET` -- S3 bucket name
 - `--endpoint` / `ENSDB_SNAPSHOT_ENDPOINT` -- S3-compatible endpoint (for R2, MinIO)
 - `--prefix` / `ENSDB_SNAPSHOT_PREFIX` -- key prefix inside the bucket (default empty). All snapshot S3 commands (`push`, `pull`, `list`, `info`, `delete`) must resolve object keys as `{prefix}/{snapshot-id}/...` so behavior matches when omitted (empty prefix) or when a shared prefix is used.
-- `--force-or-confirm` -- snapshot restore only: skip preflight failures (non-empty `ponder_sync`, `ensnode.metadata` conflicts, unexpected schemas). Use only when the operator accepts overwriting shared state. +- `--skip-preflight` -- `snapshot restore` only: skip **all** preflight checks (non-empty `ponder_sync`, `ensnode.metadata` conflicts, unexpected schemas, PostgreSQL / Drizzle / ensdb / build-id compatibility). Dangerous; emits a stderr warning. v1 has no narrower “skip only version checks” flag — use this explicitly or fix the target DB first. - `--verbose` / `-v` -- detailed output ## Manifest Schema Each snapshot has a `manifest.json`. The CLI auto-populates `indexerConfig` by reading `ensindexer_public_config` from `ensnode.metadata` -- no manual input needed for namespace, plugins, or chain IDs. +**Manifest version check:** On any command that reads a manifest (`snapshot list`, `snapshot restore`, `snapshot info`, `snapshot pull`, `snapshot verify`), the CLI must check the `version` field and fail with a clear error (e.g. "manifest version 2 is not supported by this CLI; upgrade ensdb-cli") if it encounters a version it does not support. + +### Deriving capabilities for UI + +ENSDb should compute “what this snapshot enables” dynamically at display time from: +- the manifest’s artifact list (which dump files are present) +- each schema’s `indexerConfig` (plugins, namespace, `isSubgraphCompatible`, etc.) + +Define a single function (used by `snapshot list` / `snapshot info` output formatting) that implements this deterministic logic: + +`deriveCapabilities({ manifest, schemaName? 
}) -> { flags, intendedUseCases }` + +Example outputs (computed, not stored): +- `fastBootstrap: true` (if required artifacts are present to avoid full reindex) +- `includesPonderSync: true` (if `ponder_sync.dump.tar.zst` exists) +- `selectiveRestoreSupported: true` (if `ensnode_metadata.json` exists and schema dumps are per-schema) +- `includesFullEnsnodeSchema: true` (if `ensnode.dump.tar.zst` is present in the manifest) +- `intendedUseCases: ["subgraphCompat", "alpha", "v2", "ciSmokeTests"]` (derived from `indexerConfig`) + ```json { "version": 1, @@ -261,16 +361,21 @@ Each snapshot has a `manifest.json`. The CLI auto-populates `indexerConfig` by r ], "ponderSync": { "sizeBytes": 8000000000, - "dumpFile": "ponder_sync.dump.tar.zst" + "dumpFile": "ponder_sync.dump.tar.zst", + "chainIdsPresent": [1, 8453] }, - "ensnodeSchema": { - "sizeBytes": 1200000, - "dumpFile": "ensnode.dump.tar.zst" + "ensnode": { + "sizeBytes": 65536, + "dumpFile": "ensnode.dump.tar.zst", + "drizzleMigrations": { + "appliedTagsInOrder": ["0000_initial", "0001_..."] + } }, "metadata": { "file": "ensnode_metadata.json", "indexerSchemas": ["mainnetSchema1.9.0"] }, + "ignoredSchemas": [], "totalSizeBytes": 53000000000, "checksumFile": "checksums.sha256" } @@ -285,11 +390,18 @@ The `indexerConfig` is extracted from the three `ensnode.metadata` keys: This means `snapshot list` can show rich summaries like: ``` -ID Namespace Plugins Chains Created -mainnetSchema1.9.0-2026-04-06-abc123 mainnet subgraph 1 2026-04-06 -alphaSchema1.9.0-2026-04-06-def456 mainnet subgraph+basenames+6 6 2026-04-06 +Snapshot ID Schemas Total Size Created +ensdb-2026-04-06T120000Z-abc123 5 50 GB 2026-04-06 + mainnetSchema1.9.0 mainnet subgraph 1 chain + alphaSchema1.9.0 alpha subgraph+basenames+6 6 chains + sepoliaSchema1.9.0 sepolia subgraph 1 chain + ... +ensdb-2026-04-05T080000Z-def456 3 35 GB 2026-04-05 + ... ``` +Each row is a snapshot (not a schema). Schemas are listed as sub-entries. 
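As a concrete sketch of this contract, the TypeScript below shows one deterministic implementation of the flag derivation. The types and field names are assumptions modeled on the example manifest in this plan (not the actual `ensdb-cli` types), and `intendedUseCases` derivation from `indexerConfig` is left as a stub:

```typescript
// Hypothetical manifest shape, following the example manifest in this plan.
interface ManifestSketch {
  version: number;
  ponderSync?: { dumpFile?: string };
  ensnode?: { dumpFile?: string };
  metadata?: { file?: string; indexerSchemas?: string[] };
}

interface Capabilities {
  flags: {
    includesPonderSync: boolean;
    includesFullEnsnodeSchema: boolean;
    selectiveRestoreSupported: boolean;
    fastBootstrap: boolean;
  };
  intendedUseCases: string[];
}

// Computed at display time from the manifest; never stored in the manifest itself.
function deriveCapabilities({ manifest }: { manifest: ManifestSketch }): Capabilities {
  const includesPonderSync = Boolean(manifest.ponderSync?.dumpFile);
  const includesFullEnsnodeSchema = Boolean(manifest.ensnode?.dumpFile);
  const selectiveRestoreSupported =
    Boolean(manifest.metadata?.file) &&
    (manifest.metadata?.indexerSchemas?.length ?? 0) > 0;
  // "Fast bootstrap" = the artifacts needed to avoid a full reindex are present.
  const fastBootstrap = includesPonderSync && includesFullEnsnodeSchema;
  return {
    flags: {
      includesPonderSync,
      includesFullEnsnodeSchema,
      selectiveRestoreSupported,
      fastBootstrap,
    },
    // In the real implementation this would be derived from each schema's indexerConfig.
    intendedUseCases: [],
  };
}
```

Because the function is pure, `snapshot list` and `snapshot info` can share it and stay consistent by construction.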
+ ## Project Structure ``` @@ -306,16 +418,18 @@ apps/ensdb-cli/ snapshot-restore.ts # snapshot restore snapshot-push.ts # push to S3 snapshot-pull.ts # pull from S3 + snapshot-verify.ts # verify local snapshot manifest + checksums snapshot-list.ts # list remote snapshots snapshot-info.ts # remote snapshot info snapshot-delete.ts # delete remote snapshot prefix lib/ database.ts # pg connection, schema queries - preflight-restore.ts # fresh/isolated DB checks before pg_restore + preflight-restore.ts # fresh/isolated DB, conflicts, Drizzle/version/build compatibility before pg_restore pgdump.ts # pg_dump/pg_restore wrapper - s3.ts # S3 client, multipart upload/download + s3.ts # S3-compatible client, multipart upload/download manifest.ts # manifest read/write, validation snapshot.ts # snapshot directory management + checksum.ts # checksum generation and verification types.ts # shared types ``` @@ -331,18 +445,18 @@ apps/ensdb-cli/ ### Phase 2: Local Snapshot Create + Restore - Implement `pg_dump` wrapper with parallel jobs and progress reporting -- Implement `snapshot create` (dump all indexer schemas + ponder_sync + full metadata extraction) +- Implement `snapshot create` (dump all indexer schemas + ponder_sync + full `ensnode` schema dump + `ensnode_metadata.json` export) - Implement archive packaging and unpacking for directory-format dumps -- Implement `snapshot restore` (preflight in `preflight-restore.ts`, then pg_restore with parallel jobs) +- Implement `snapshot restore` (preflight in `preflight-restore.ts`, full + selective + `--bootstrap-ensnode` path, then pg_restore / metadata prune / JSON upsert) - Manifest generation and validation - Checksum generation and verification -### Phase 3: S3 Push + Pull + List + Delete +### Phase 3: S3-compatible Push + Pull + List + Delete -- S3 client with multipart upload support -- Shared helper: resolve `{prefix}/{snapshot-id}/` from `--prefix` / `ENSDB_SNAPSHOT_PREFIX` for every snapshot S3 command (`push`, `pull`, 
`list`, `info`, `delete`) +- S3-compatible client with multipart upload support +- Shared helper: resolve `{prefix}/{snapshot-id}/` from `--prefix` / `ENSDB_SNAPSHOT_PREFIX` for every snapshot S3-compatible command (`push`, `pull`, `list`, `info`, `delete`) - `snapshot push` with manifest and artifact upload only -- `snapshot pull` with integrity verification +- `snapshot pull` with integrity verification (optionally add `--resumable` + `.part` downloads + retries) - `snapshot list` and `snapshot info` for browsing remote snapshots - `snapshot delete` (list objects under prefix, batch delete, `--force` / confirmation) @@ -350,6 +464,7 @@ apps/ensdb-cli/ - Dockerfile (include `postgresql-client` for pg_dump/pg_restore) - Progress bars for large operations +- `snapshot verify --input ` command: verify local snapshot integrity (checksums) without restoring - Dry-run mode for destructive operations - Comprehensive error messages and recovery guidance - Documentation @@ -374,17 +489,20 @@ ensdb-cli snapshot pull \ --bucket $ENSDB_SNAPSHOT_BUCKET \ --output /tmp/snapshot -# 2. Restore into an isolated test database +# 2a. Restore into an isolated empty test database (bootstrap ensnode + Drizzle migrations from snapshot) ensdb-cli snapshot restore \ --ensdb-url $TEST_DB_URL \ --input /tmp/snapshot \ - --schemas mainnetSchema1.9.0 + --schemas mainnetSchema1.9.0 \ + --bootstrap-ensnode + +# 2b. Alternative: run ENSDb migrations against $TEST_DB_URL first, then restore without --bootstrap-ensnode # 3. Run smoke tests against the restored database ENSDB_URL=$TEST_DB_URL ENSINDEXER_SCHEMA_NAME=mainnetSchema1.9.0 pnpm test:smoke ``` -Each matrix entry pulls and restores a different schema, then runs ENSApi smoke tests against it. The selective pull avoids downloading the full 50-100GB snapshot for each matrix entry -- only the relevant schema dump plus `ponder_sync` are transferred. 
+Each matrix entry pulls and restores a different schema, then runs ENSApi smoke tests against it. The selective pull avoids downloading every indexer dump -- only the chosen schema dump, `ponder_sync`, `ensnode.dump.tar.zst` (always included; negligible size), and `ensnode_metadata.json` are transferred.

 The `snapshot list` and `snapshot info` commands can also be used in CI to discover the latest available snapshot ID before pulling.

@@ -419,13 +537,16 @@ This is explicitly **out of scope for v1** but the plan ensures the CLI's databa

 ## Resolved Decisions

-1. **Discovery**: No shared `index.json`. Use S3 `ListObjects` to discover snapshots by reading `manifest.json` from each snapshot prefix. Most robust -- no concurrent writer races, no stale index.
-2. **Snapshot granularity**: `snapshot create` dumps all discovered indexer deployment schemas + `ponder_sync` + full `ensnode` schema + `ensnode_metadata.json`, with optional `--exclude-schemas` for unrelated app schemas. `snapshot pull` and `snapshot restore` let the user select which indexer schema(s) they want from that full snapshot; full `ensnode` schema archive is optional on pull when metadata JSON replay is enough.
-3. **Restore safety**: `snapshot restore` assumes a fresh or isolated database; **preflight** enforces this (non-empty `ponder_sync`, conflicting `ensnode.metadata` rows, unexpected schemas) unless `--force-or-confirm` is passed.
-4. **Restore behavior**: Preflight runs first, then fail if a target indexer schema already exists unless `--drop-existing` is passed. `--drop-existing` does not bypass preflight; only `--force-or-confirm` does. Prevents accidental data loss while keeping an explicit escape hatch.
-5. **Retention policy**: `snapshot delete` command added for manual cleanup. `snapshot list` shows all snapshots; operators manage retention manually.
-6. **Snapshot IDs**: Snapshot IDs are auto-generated and immutable in v1 (no operator override).
-7.
**Streaming uploads**: v1 stays local-first (`snapshot create` then `snapshot push`). No streaming/pipe-to-S3 mode in v1. +1. **Discovery**: No shared `index.json`. Use S3-compatible `ListObjects` to discover snapshots by reading `manifest.json` from each snapshot prefix. Most robust -- no concurrent writer races, no stale index. +2. **Snapshot granularity**: `snapshot create` dumps all discovered indexer deployment schemas + `ponder_sync` + **full `ensnode` schema** (`ensnode.dump.tar.zst`) + `ensnode_metadata.json`, with optional `--ignore-schemas` for unrelated app schemas. **Full** `snapshot pull` / `snapshot restore` include the `ensnode` dump. **Selective** `pull` always includes `ensnode.dump.tar.zst` (negligible size). **Selective** `restore` either **upserts** metadata from JSON onto an existing migrated `ensnode`, or uses **`--bootstrap-ensnode`** to `pg_restore` the dump and prune metadata to `--schemas` so `__drizzle_migrations` matches the snapshot. +3. **Restore safety**: `snapshot restore` assumes a fresh or isolated database; **preflight** enforces this (non-empty `ponder_sync`, conflicting `ensnode.metadata` rows, unexpected schemas, version compatibility) unless **`--skip-preflight`** is passed. +4. **Restore behavior**: Preflight runs first, then fail if a target indexer schema already exists unless `--drop-existing` is passed. `--drop-existing` does not bypass preflight but **suppresses non-empty checks** for schemas it will drop; only **`--skip-preflight`** bypasses all checks. Prevents accidental data loss while keeping an explicit escape hatch. +5. **`--drop-existing` scope**: Full restore drops all indexer schemas + `ponder_sync` + `ensnode`. Selective restore drops named `--schemas` targets + `ponder_sync` (when being restored) + `ensnode` (when `--bootstrap-ensnode` is set). +6. **`ponder_sync` default**: Included by default for selective restore. Opt out with `--without-ponder-sync`. +7. 
**`--bootstrap-ensnode` + `--drop-existing`**: When both are set and `ensnode` already exists, `--drop-existing` drops `ensnode` before bootstrap restore.
+8. **Retention policy**: `snapshot delete` command added for manual cleanup. `snapshot list` shows all snapshots; operators manage retention manually.
+9. **Snapshot IDs**: Snapshot IDs are auto-generated and immutable in v1 (no operator override).
+10. **Streaming uploads**: v1 stays local-first (`snapshot create` then `snapshot push`). No streaming/pipe-to-S3 mode in v1.

 **Rationale (kept for future roadmap):**

@@ -441,4 +562,6 @@ This is explicitly **out of scope for v1** but the plan ensures the CLI's databa

 ## Open Questions for Stakeholders

 1. **Snapshot ID format**: Confirm the exact auto-generated format (e.g. `ensdb-YYYY-MM-DDTHHMMSSZ-` vs `{primarySchemaName}-...`). v1 does not allow overriding the generated ID.
-
+2. **Clean DB selective restore default for CI/docs:** Prefer **migrate-first** (requires running the repo migration command with a matching ENSDb version) or **`--bootstrap-ensnode`** (self-contained from snapshot; `ensnode.dump.tar.zst` is always present in pulled snapshots)?
+3. **ponder_sync chain IDs:** Is there a stable, canonical way to derive `ponderSync.chainIdsPresent` from the current `ponder_sync` schema (table/column to read), or should the CLI treat this as a purely best-effort hint with no guarantees?
+4. **pg_dump parallel jobs:** Is `pg_dump --format=directory --jobs=N` actually faster than single-threaded dump for our schemas? Each indexer schema has ~12 tables, so parallelism across tables within a single schema may yield limited benefit. Benchmark before committing to directory format as the only path -- `pg_dump --format=custom` (single file; gives up parallel dump, though `pg_restore --jobs` can still parallelize restore from custom-format archives) would simplify the archive/unpack pipeline significantly. If directory format is not measurably faster, consider switching to custom format for v1.
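Whichever format wins the benchmark in the last open question, the `pgdump.ts` wrapper can encode the format/jobs constraint directly, since `pg_dump --jobs` is only valid with directory-format output. A minimal sketch (the function name and options shape are assumptions; the flags used are standard `pg_dump` flags):

```typescript
// Hypothetical argument builder for the planned pgdump.ts wrapper.
interface DumpOptions {
  schema: string;
  outputPath: string;
  format: "directory" | "custom";
  jobs?: number; // parallel dump is only supported with directory format
}

function buildPgDumpArgs(opts: DumpOptions): string[] {
  const args = [
    "--format", opts.format,
    "--schema", opts.schema,
    "--file", opts.outputPath,
    "--no-owner",
  ];
  if (opts.jobs !== undefined) {
    if (opts.format !== "directory") {
      // pg_dump rejects --jobs for non-directory formats; fail early instead.
      throw new Error("pg_dump --jobs requires --format=directory");
    }
    args.push("--jobs", String(opts.jobs));
  }
  return args;
}
```

Centralizing the argument construction makes the directory-vs-custom decision a one-line change and keeps the invalid `--jobs` combination unrepresentable at the call site.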
From ff68be61fcb9b98629e0023795300028c48e739e Mon Sep 17 00:00:00 2001 From: djstrong Date: Mon, 13 Apr 2026 01:30:44 +0200 Subject: [PATCH 08/10] Expand ENSdb CLI plan to include new considerations for empty-database-only and full-snapshot-only restore options, simplifying the restore command and addressing trade-offs related to preflight checks and CI test matrix implications. --- .cursor/plans/ensdb_cli_tool_422abf99.plan.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.cursor/plans/ensdb_cli_tool_422abf99.plan.md b/.cursor/plans/ensdb_cli_tool_422abf99.plan.md index 94bbf42c6..5aa3da8d5 100644 --- a/.cursor/plans/ensdb_cli_tool_422abf99.plan.md +++ b/.cursor/plans/ensdb_cli_tool_422abf99.plan.md @@ -565,3 +565,5 @@ This is explicitly **out of scope for v1** but the plan ensures the CLI's databa 2. **Clean DB selective restore default for CI/docs:** Prefer **migrate-first** (requires running the repo migration command with a matching ENSDb version) or **`--bootstrap-ensnode`** (self-contained from snapshot; `ensnode.dump.tar.zst` is always present in pulled snapshots)? 3. **ponder_sync chain IDs:** Is there a stable, canonical way to derive `ponderSync.chainIdsPresent` from the current `ponder_sync` schema (table/column to read), or should the CLI treat this as a purely best-effort hint with no guarantees? 4. **pg_dump parallel jobs:** Is `pg_dump --format=directory --jobs=N` actually faster than single-threaded dump for our schemas? Each indexer schema has ~12 tables, so parallelism across tables within a single schema may yield limited benefit. Benchmark before committing to directory format as the only path — `pg_dump --format=custom` (single file, no parallel restore) would simplify the archive/unpack pipeline significantly. If directory format is not measurably faster, consider switching to custom format for v1. +5. 
**v1 scope: empty-database-only restore?** If `snapshot restore` only targets **empty databases**, the entire preflight matrix (non-empty `ponder_sync`, `ensnode` conflicts, unexpected schemas, `--drop-existing`, version/Drizzle/build compatibility checks) collapses to a single "is the DB empty?" check. This removes `--drop-existing`, `--skip-preflight`, and the preflight-aware `--drop-existing` suppression logic. Trade-off: operators who want to restore into an existing database would need to wipe it first (outside the CLI) or wait for a future version. +6. **v1 scope: full-snapshot-only restore (no `--schemas`)?** If selective restore is deferred to v2, the restore command becomes trivial: `pg_restore` every dump in the snapshot. This eliminates `--schemas`, `--bootstrap-ensnode`, the JSON upsert path, metadata pruning, and open question 2. `--ponder-sync-only` can remain as a simple special case for #833. Trade-off: CI test matrix (#1127) would need to restore full snapshots for each matrix entry (expensive) or create separate smaller snapshots per test config. If question 5 is also "yes," the restore command has **zero flags** (besides `--ensdb-url` and `--input`). From a9d28c1e249009014cf007af9c4aaf6424098aef Mon Sep 17 00:00:00 2001 From: djstrong Date: Mon, 20 Apr 2026 13:02:19 +0200 Subject: [PATCH 09/10] Refine ENSdb CLI plan to clarify snapshot creation and restoration processes, emphasizing whole-database snapshots and empty-database-only restores. Update documentation to include two-DB deployment guidance and enhance command descriptions for selective snapshot operations. 
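If question 5 resolves to empty-database-only, the entire preflight could reduce to a pure predicate over the schema names read from `pg_namespace`. A sketch under that assumption (which schemas count as "system", and whether an empty `public` is tolerated, are assumptions to confirm):

```typescript
// Hypothetical single preflight check for an empty-database-only restore.
// Caller is assumed to supply schema names from, e.g.,
//   SELECT nspname FROM pg_namespace;
// plus a flag for whether `public` contains any user objects.
const SYSTEM_SCHEMAS = new Set(["pg_catalog", "information_schema", "pg_toast", "public"]);

function isDatabaseEmpty(schemaNames: string[], publicHasObjects: boolean): boolean {
  const userSchemas = schemaNames.filter(
    (name) => !SYSTEM_SCHEMAS.has(name) && !name.startsWith("pg_"),
  );
  // An object-free `public` schema is tolerated: it exists in every fresh database.
  return userSchemas.length === 0 && !publicHasObjects;
}
```

If this check fails, the CLI would refuse with a single error message, replacing the whole `--drop-existing` / `--skip-preflight` matrix discussed above.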
--- .cursor/plans/ensdb_cli_tool_422abf99.plan.md | 428 +++++++++--------- 1 file changed, 219 insertions(+), 209 deletions(-) diff --git a/.cursor/plans/ensdb_cli_tool_422abf99.plan.md b/.cursor/plans/ensdb_cli_tool_422abf99.plan.md index 5aa3da8d5..e8ff986e7 100644 --- a/.cursor/plans/ensdb_cli_tool_422abf99.plan.md +++ b/.cursor/plans/ensdb_cli_tool_422abf99.plan.md @@ -1,6 +1,6 @@ --- name: ENSDb CLI Tool -overview: Create a new `apps/ensdb-cli` application that provides database inspection, schema management, and snapshot import/export/push/pull operations for ENSDb, targeting 50-100GB PostgreSQL databases with S3-compatible storage and safe restore flows for fresh or isolated databases. +overview: Create a new `apps/ensdb-cli` application that provides database inspection, schema management, and snapshot create / push / pull / restore operations for ENSDb. v1 is intentionally scoped to **whole-database snapshots** and **restore-into-empty-database only**, against a production deployment that splits indexers across two physically separate PostgreSQL databases (mainnet and testnets) to keep `ponder_sync` manageable. 
todos: - id: scaffold content: "Scaffold apps/ensdb-cli: package.json, tsconfig, vitest config, yargs entry point, workspace integration" @@ -15,10 +15,10 @@ todos: content: Implement pg_dump/pg_restore wrapper with parallel jobs and progress reporting status: pending - id: snapshot-create - content: "Implement snapshot create: dump indexer schemas + ponder_sync + ensnode + ensnode_metadata.json; manifest includes postgresVersion, ensnode.drizzleMigrations fingerprint, checksums" + content: "Implement snapshot create (whole DB only): dump every indexer schema + ponder_sync + full ensnode schema + ensnode_metadata.json; manifest enrichment + checksums" status: pending - id: snapshot-restore - content: "Implement snapshot restore: preflight (fresh DB, conflicts, ENSNODE_BOOTSTRAP_REQUIRED, version/Drizzle/build compatibility vs manifest), --skip-preflight escape hatch; full + selective + --bootstrap-ensnode paths" + content: "Implement snapshot restore (empty DB only): single freshness preflight, then full or selective indexer-schema restore (selective always restores ensnode + full ponder_sync, then prunes ensnode.metadata)" status: pending - id: s3-client content: Implement S3-compatible client layer with multipart upload/download support @@ -27,7 +27,7 @@ todos: content: "Implement snapshot push: upload snapshot artifacts and manifest to S3-compatible storage" status: pending - id: snapshot-pull - content: "Implement snapshot pull: download from S3-compatible storage, verify checksums" + content: "Implement snapshot pull: download from S3-compatible storage, verify checksums, support selective + ponder-sync-only modes" status: pending - id: snapshot-verify content: "Implement snapshot verify: validate a local snapshot manifest and checksums without restoring" @@ -42,7 +42,7 @@ todos: content: Create Dockerfile with postgresql-client for pg_dump/pg_restore status: pending - id: docs - content: Add documentation and README + content: Add documentation and README, 
including the two-DB (mainnet / testnets) deployment guidance status: pending isProject: false --- @@ -51,153 +51,137 @@ isProject: false ## Context -ENSNode production databases are 50-100GB PostgreSQL instances. Each chain deployment gets its own indexer schema following the naming convention `{deployment}Schema{version}`. Three schema types coexist in one database: +ENSNode production databases are large PostgreSQL instances. Each chain deployment gets its own indexer schema following the naming convention `{deployment}Schema{version}`. Three schema types coexist within a single ENSDb: -**Production database (7 schemas):** +- **Indexer schemas** (one per deployment, named via `ENSINDEXER_SCHEMA_NAME`): e.g. `mainnetSchema1.9.0`, `sepoliaSchema1.9.0`. +- **`ensnode`** -- application schema containing `metadata` (rows scoped by `ens_indexer_schema_name`) and `__drizzle_migrations` (Drizzle migration journal). Must be preserved on full restore. +- **`ponder_sync`** -- shared RPC cache and sync state, needed by every indexer that runs against the DB. -- **Indexer schemas (5):** - - `alphaSchema1.9.0` -- alpha deployment (all chains) - - `alphaSepoliaSchema1.9.0` -- alpha sepolia - - `mainnetSchema1.9.0` -- mainnet - - `sepoliaSchema1.9.0` -- sepolia - - `v2SepoliaSchema1.9.0` -- v2 sepolia -- **Shared schemas (2):** - - `ensnode` -- application schema: `metadata` table (rows scoped by `ens_indexer_schema_name`) + `__drizzle_migrations` (Drizzle migration journal). Full schema must be preserved on full restore. - - `ponder_sync` -- shared RPC cache and sync state (needed by every indexer) - -Schema names are set via `ENSINDEXER_SCHEMA_NAME` env var in the blue-green deploy workflow (`[.github/workflows/deploy_ensnode_blue_green.yml](.github/workflows/deploy_ensnode_blue_green.yml)`). Old schemas are orphaned on redeploy and must be dropped manually to reclaim space. Also, all orphaned records in ENSNode Metadata tables must be deleted manually. 
+Schema names are set via the blue-green deploy workflow ([`.github/workflows/deploy_ensnode_blue_green.yml`](.github/workflows/deploy_ensnode_blue_green.yml)). Old schemas are orphaned on redeploy and must be dropped manually to reclaim space, along with orphaned rows in `ensnode.metadata`. The goal is to enable fast ENSNode bootstrap (hours instead of 2-3 days) by snapshotting and restoring database state. -### Related Issues - -- [#833](https://github.com/namehash/ensnode/issues/833) -- Simplify downloading of `ponder_sync` for internal developers. The CLI's `snapshot pull` and `snapshot restore` commands directly address this by supporting selective download of just `ponder_sync` from a remote snapshot. -- [#1127](https://github.com/namehash/ensnode/issues/1127) -- Matrix ENSApi smoke tests across subgraph-compat, alpha-style, and v2 configs. The CLI's snapshot infrastructure enables setting up isolated test databases with specific indexer configurations for CI smoke testing. See "CI Test Matrix Support" section below. -- [#279](https://github.com/namehash/ensnode/issues/279) -- Count Unknown Names & Unknown Labels. A future roadmap extension: the CLI's database access and inspect infrastructure can be extended with an `analyze` command to compute analytical metrics over indexed data. See "Future Roadmap" section below. +### Production deployment: two physically separate databases -## Architecture Decisions - -### Snapshot UX principles - -The snapshot system should optimize for a fast, low-friction developer experience while staying operationally safe for 50–100GB databases. - -- **Restore modes**: keep the user-facing model simple. - - **Full** (omit `--schemas`): restore every indexer dump in the snapshot + `ponder_sync` + **`ensnode` via `pg_restore` from `ensnode.dump.tar.zst`** (includes `metadata` and `__drizzle_migrations`). 
Do **not** rebuild `ensnode` from JSON alone in this mode; `ensnode_metadata.json` is redundant for data integrity but still useful for listing and verification. - - `ponder_sync` only (`--ponder-sync-only`) - - **Selective**: chosen indexer schema(s) (`--schemas ...`) plus the matching rows from `ensnode.metadata` (replayed from `ensnode_metadata.json`); optional `ponder_sync`. Does **not** `pg_restore` the full `ensnode` dump (would clobber other indexers’ metadata). - - `ponder_sync` is **included by default** for selective restore. Pass **`--without-ponder-sync`** to opt out. - - - **Read-only consumers** (ENSApi serving, analytics, any app that will not run ENSIndexer): pass `--without-ponder-sync` — it is not needed. - - **Indexer operators** (restore then run ENSIndexer to keep the DB updated): keep `ponder_sync` (the default) — it preserves sync/RPC cache state and speeds up reaching a healthy following state. -- **Manifest-driven tooling**: `manifest.json` is the source of truth for: - - which artifacts exist under `{prefix}/{snapshot-id}/` - - per-artifact sizes and checksums - - metadata required to derive UI “capabilities” (via `deriveCapabilities(...)`, not stored in the manifest) -- **Resumable + retry downloads (roadmap)**: Large snapshot downloads should tolerate flaky networks. - - download to a `.part` file - - resume via HTTP range requests - - retry with backoff on failure +`ponder_sync` grows with the union of RPC caches for **all chains indexed across all schemas in the same DB**. A single shared DB containing both mainnet and testnet indexers ends up with a `ponder_sync` carrying both production L1/L2 history and many testnets, which makes snapshots large and slow. -S3-compatible storage supports range reads, so `snapshot pull` can add an optional resumable mode in a later iteration (v1 can stay simple). 
+To keep `ponder_sync` (and therefore snapshots) manageable, **production runs two physically separate PostgreSQL databases**, one per network family: -### Snapshot Composition +```mermaid +flowchart LR + subgraph mainnetDb [Mainnet ENSDb] + mainnetSchemas["mainnetSchema1.9.0\n(production L1 + L2 schemas)"] + mainnetEnsnode[ensnode] + mainnetPonderSync["ponder_sync\n(mainnet + L2 RPC cache)"] + end -A full snapshot contains **separate pg_dump archives** for: + subgraph testnetsDb [Testnets ENSDb] + testnetSchemas["alphaSepoliaSchema1.9.0\nsepoliaSchema1.9.0\nv2SepoliaSchema1.9.0"] + testnetsEnsnode[ensnode] + testnetsPonderSync["ponder_sync\n(sepolia + testnet L2 RPC cache)"] + end -1. **Every indexer deployment schema** currently present in the database (e.g. `mainnetSchema1.9.0`, `alphaSchema1.9.0`). The CLI discovers these by enumerating non-system schemas and **excluding** `ponder_sync`, `ensnode`, and PostgreSQL system schemas. If unrelated application schemas exist in the same database, add `--ignore-schemas` so they are not dumped by mistake. When `--ignore-schemas` is used, the manifest should record which schemas were **ignored** (so consumers can tell whether the source DB had additional schemas that were intentionally excluded). -2. The `ponder_sync` schema (full) -3. The **`ensnode` schema (full `pg_dump`)** → `ensnode.dump.tar.zst`, same directory-format + tar+zstd pipeline as other schemas. This captures **`metadata`, `__drizzle_migrations`,** and any other objects under `ensnode` so a **full** restore is faithful to the source DB. -4. **`ensnode_metadata.json`** -- export of all `ensnode.metadata` rows (JSON). Used for manifest enrichment, `snapshot list` summaries, and **selective** restore (replay only the rows for chosen indexer schema names). It is **not** a substitute for `ensnode.dump.tar.zst` on full restore. 
+ mainnetSnapshot["mainnet snapshot\ns3://.../mainnet/..."] --- mainnetDb + testnetsSnapshot["testnets snapshot\ns3://.../testnets/..."] --- testnetsDb +``` -**Optional manifest enrichment (best-effort):** +- **Mainnet ENSDb** holds production indexer schemas (e.g. `mainnetSchema1.9.0`) plus its own `ensnode` and `ponder_sync` (containing only mainnet + production L2 RPC caches). +- **Testnets ENSDb** holds testnet indexer schemas (`alphaSepoliaSchema1.9.0`, `sepoliaSchema1.9.0`, `v2SepoliaSchema1.9.0`) plus its own `ensnode` and `ponder_sync` (containing only testnet RPC caches). -- `ponderSync.chainIdsPresent` -- list of chain IDs observed in the `ponder_sync` RPC cache at snapshot time (if derivable cheaply and deterministically from the current `ponder_sync` schema). This should be **best-effort**: if the CLI can’t derive it reliably (schema differs, missing columns, etc.), omit the field rather than failing snapshot creation. +The CLI itself operates against **one database at a time** (whichever `--ensdb-url` / `ENSDB_URL` points at). The mainnet vs. testnets distinction is purely a deployment / operational convention encoded by the connection string and by storing snapshots under separate S3 prefixes (e.g. `s3://ensdb-snapshots/mainnet/...` vs `s3://ensdb-snapshots/testnets/...`). Snapshots are produced and consumed **per database** -- a "mainnet snapshot" only covers the mainnet DB, never the testnets DB. -**Full snapshot workflows:** +> Note: the home of multichain-but-mostly-mainnet schemas like `alphaSchema1.9.0` is intentionally not pinned in this plan -- it depends on which RPC chain caches it shares. The two-DB split applies regardless: each indexer schema lives in whichever DB carries the matching `ponder_sync`. -1. `snapshot pull` with no `--schemas` downloads all artifacts, including `ensnode.dump.tar.zst` and `ensnode_metadata.json`. -2. 
`snapshot restore` with no `--schemas` runs preflight for a full clone, then `pg_restore`s all indexer dumps, `ponder_sync`, and **`ensnode`** from `ensnode.dump.tar.zst`. JSON is not the source of truth for `ensnode` in this path. +### Related Issues -**Selective workflows:** +- [#833](https://github.com/namehash/ensnode/issues/833) -- Simplify downloading of `ponder_sync` for internal developers. The CLI's `snapshot pull --ponder-sync-only` and `snapshot restore --ponder-sync-only` directly address this by supporting selective download / restore of just `ponder_sync` from a remote snapshot. +- [#1127](https://github.com/namehash/ensnode/issues/1127) -- Matrix ENSApi smoke tests across subgraph-compat, alpha-style, and v2 configs. The CLI's snapshot infrastructure enables setting up isolated empty test databases with specific indexer configurations for CI smoke testing. See "CI Test Matrix Support" section below. +- [#279](https://github.com/namehash/ensnode/issues/279) -- Count Unknown Names & Unknown Labels. A future roadmap extension: the CLI's database access and inspect infrastructure can be extended with an `analyze` command to compute analytical metrics over indexed data. See "Future Roadmap" section below. -1. `snapshot pull --schemas ...` downloads the selected indexer archives + `ensnode_metadata.json` + `ensnode.dump.tar.zst` and (by default) `ponder_sync`. The `ensnode` dump is always included because it is very small (tens of KB). For a **clean** target database, selective restore can bootstrap Drizzle state from the snapshot (see below). -2. `snapshot restore --schemas ...` restores the selected indexer schema dumps and reconciles `ensnode.metadata`. On a **clean** DB this **must** also establish **`ensnode.__drizzle_migrations`** (JSON alone cannot do that). 
Two supported approaches (document which is default for CI): - - **Migrate-first:** Run ENSDb/ENSIndexer migrations against the empty database **before** `snapshot restore --schemas`, creating `ensnode` + `__drizzle_migrations`; then the CLI restores indexer dumps and **upserts** filtered rows from `ensnode_metadata.json`. - - **Bootstrap-from-snapshot:** Pass **`--bootstrap-ensnode`** (requires `ensnode.dump.tar.zst` under `--input`). The CLI `pg_restore`s the full `ensnode` schema (migrations + `metadata` as captured), **deletes** `ensnode.metadata` rows whose `ens_indexer_schema_name` is **not** in `--schemas`, then restores the selected indexer dumps. Optionally still apply JSON upsert for the target schemas to match checksums exactly. +## v1 Scope and Constraints -Because `ponder_sync` is shared state, `snapshot restore` is intended for **fresh or isolated target databases only**. The CLI enforces that with **preflight checks** (below) instead of relying on operator discipline alone. +To keep v1 small and safe, the CLI commits to two scope reductions: -**Implementation notes:** +1. **`snapshot create` is whole-DB only.** No per-schema / partial create. The CLI discovers every non-system schema and dumps them all alongside `ponder_sync` and `ensnode`. `--ignore-schemas` is still supported defensively (skip unrelated app schemas if any). +2. **`snapshot restore` only targets an empty database.** The CLI refuses to touch a database that already contains user objects. There is **no** `--drop-existing`, **no** `--skip-preflight`, **no** `--bootstrap-ensnode` flag in v1 -- they would all be no-ops or unsafe given the empty-DB constraint. -- **Full restore:** `pg_restore` the `ensnode` dump so `__drizzle_migrations` matches the snapshot source. If JSON and dump ever disagree, **the dump wins** (JSON is auxiliary). 
-- **Selective restore (shared `ensnode` already present):** Do not `pg_restore` `ensnode.dump.tar.zst` (would clobber `__drizzle_migrations` / other indexers' metadata). Replay only the relevant `ensnode.metadata` rows from `ensnode_metadata.json` after preflight passes. **Upsert semantics:** `INSERT ... ON CONFLICT (ens_indexer_schema_name, key) DO UPDATE SET value = EXCLUDED.value` (primary key from `ensdb-sdk`). -- **Selective restore (clean database):** You **must** get `__drizzle_migrations` from somewhere: either **migrate-first** (app creates `ensnode`) or **`--bootstrap-ensnode`** from `ensnode.dump.tar.zst` + metadata prune. Failing that, preflight should fail with a dedicated error (e.g. `ENSDB_CLI_ERR_PREFLIGHT_ENSNODE_BOOTSTRAP_REQUIRED`) if `ensnode` / `__drizzle_migrations` is missing when `--bootstrap-ensnode` was not used and migrations were not run. Exact rules should be validated against how ENSIndexer/ENSApi expect `ensnode` after a partial restore. +These two constraints eliminate large families of failure modes (Drizzle / version skew on an existing `ensnode`, metadata conflicts, partial overwrites of `ponder_sync`, scope of `--drop-existing`, version compatibility checks against live target state) and let v1 ship with very few flags. -### Preflight checks (`snapshot restore`) +Selective **restore** of a subset of indexer schemas from a whole-DB snapshot is still supported, because that is the workflow CI matrix tests (#1127) need. In selective restore, **`ponder_sync` is always fully restored** and `ensnode` is always restored from the snapshot dump (since the target DB is empty, there is nothing else to bootstrap migrations from). After restore, `ensnode.metadata` is **pruned** to keep only rows whose `ens_indexer_schema_name` is in the chosen schema set. 
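The post-restore prune described above can be sketched as a pure filter plus the equivalent SQL. This is a sketch, not the plan's implementation: the row shape and the names `pruneMetadataRows` / `PRUNE_SQL` are illustrative; only the table `ensnode.metadata` and the column `ens_indexer_schema_name` come from the plan itself.

```typescript
// Illustrative row shape for ensnode.metadata (columns beyond
// ens_indexer_schema_name are assumptions).
interface MetadataRow {
  ens_indexer_schema_name: string;
  key: string;
  value: unknown;
}

/** Returns the rows that survive the prune: schema name in the chosen set. */
function pruneMetadataRows(rows: MetadataRow[], schemas: string[]): MetadataRow[] {
  const keep = new Set(schemas);
  return rows.filter((row) => keep.has(row.ens_indexer_schema_name));
}

/** The single DELETE the CLI could run instead ($1 = text[] of chosen schema names). */
const PRUNE_SQL = `
  DELETE FROM ensnode.metadata
  WHERE NOT (ens_indexer_schema_name = ANY($1::text[]))
`;
```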
-Runs in the **restore command handler** immediately after validating CLI args, loading the manifest, and computing the **effective restore plan** (target schemas, whether `ponder_sync` will be restored, whether `ensnode` will be replaced, and whether `--drop-existing` applies). Preflight still runs **before** any destructive action and **before** any `pg_restore`. +## Architecture Decisions -**Checks (fail closed by default):** +### Snapshot UX principles -1. **ponder_sync non-empty:** If schema `ponder_sync` exists, check it **deterministically**: enumerate all base tables in `ponder_sync`; if any table contains at least one row, fail with identifier `ENSDB_CLI_ERR_PREFLIGHT_PONDER_SYNC_NONEMPTY`. If the schema exists but has no base tables, treat it as empty. Do **not** rely on `pg_stat_user_tables` estimates for this guardrail. -2. **ensnode / metadata conflicts:** - - **Full** restore (no `--schemas`): treat **`ensnode` as a unit**. If schema `ensnode` exists and has any user objects, fail with `ENSDB_CLI_ERR_PREFLIGHT_ENSNODE_METADATA_NONEMPTY` — **unless** `--drop-existing` is also set (preflight recognizes it will be dropped in the next step) **or** `--skip-preflight`. - - **Selective** restore (`--schemas`): if `ensnode.metadata` exists, fail if any row’s `ens_indexer_schema_name` is **not** in the target schema set (`ENSDB_CLI_ERR_PREFLIGHT_ENSNODE_METADATA_CONFLICT`). If **`ensnode` is absent** (or `__drizzle_migrations` missing / empty when required) and **`--bootstrap-ensnode` is not set**, fail with `ENSDB_CLI_ERR_PREFLIGHT_ENSNODE_BOOTSTRAP_REQUIRED` (message: run migrations first or pass `--bootstrap-ensnode`). If **`--bootstrap-ensnode` is set** but **`ensnode` already exists** with objects, fail unless **`--drop-existing`** is set (will drop `ensnode` before bootstrap restore). -3. **Unexpected non-system schemas / objects:** Enumerate schemas outside PostgreSQL system namespaces (`pg_*`, `information_schema`, etc.). 
For the intended restore set, fail if a **non-target indexer schema** already exists (tables present or schema present) (`ENSDB_CLI_ERR_PREFLIGHT_UNEXPECTED_SCHEMA`). Optionally extend with a stricter mode: fail if `public` (or other default) contains unexpected user tables — keep the rule deterministic in code and documented. -4. **Version / compatibility (target already has `ensnode` or you use migrate-first):** When restoring **into** a database that will keep an existing `ensnode` schema (selective restore **without** `--bootstrap-ensnode`, or any path where Drizzle state is not fully replaced by the snapshot’s `ensnode` dump), compare **snapshot manifest** (and optionally `ensnode_metadata.json`) to **live target** state. **Fail closed** if incompatible unless **`--skip-preflight`** is passed. +- **Snapshot create is always whole-DB.** The result is a directory of per-schema dumps + `ponder_sync` + the full `ensnode` schema dump + an `ensnode_metadata.json` export, plus `manifest.json` and `checksums.sha256`. +- **Snapshot restore always targets an empty database** and supports three modes: + - **Full restore** (no `--schemas`, no `--ponder-sync-only`): restore every dump in the snapshot. + - **Selective restore** (`--schemas A,B`): restore only the named indexer schema dumps; **also always** restore `ensnode` (so `__drizzle_migrations` exists) and **always fully** restore `ponder_sync`. After dumps come back, prune `ensnode.metadata` to rows in the chosen schema set. An optional `--without-ponder-sync` flag is provided for read-only consumers (e.g. ENSApi-only setups) that will not run ENSIndexer and do not need `ponder_sync`. + - **`ponder_sync`-only** (`--ponder-sync-only`): restore only `ponder_sync.dump.tar.zst`. Useful for #833 (developer wants to bootstrap `ponder_sync` and start indexing fresh schemas on top). 
+- **No `--drop-existing`, no `--skip-preflight`, no `--bootstrap-ensnode`.** Empty-DB-only means there is nothing to drop, no preflight to skip, and `ensnode` is always bootstrapped from the snapshot dump (because the DB is empty when restore starts). +- **Manifest-driven tooling**: `manifest.json` is the source of truth for: + - which artifacts exist under `{prefix}/{snapshot-id}/` + - per-artifact sizes and checksums + - metadata required to derive UI "capabilities" (via `deriveCapabilities(...)`, not stored in the manifest) +- **Resumable + retry downloads (roadmap)**: Large snapshot downloads should tolerate flaky networks. v1 can stay simple; a later iteration of `snapshot pull` can add `.part` files + HTTP range resume + retry-with-backoff. -### Version and build compatibility (`snapshot restore`) +### Snapshot composition -Restoring beside **existing** migration or metadata state is unsafe if versions diverge: the indexer schema dump may assume tables/columns from migration set **A** while the target’s `__drizzle_migrations` reflects set **B**, or `ensnode.metadata` may carry **versionInfo** / build identifiers that no longer match the app you intend to run. +A whole-DB snapshot directory contains: -**At `snapshot create`**, record enough in `manifest.json` to compare on restore (exact shape validated during implementation): +``` +{snapshot-id}/ + manifest.json + {indexerSchemaName}.dump.tar.zst # one per indexer schema discovered in the source DB + ponder_sync.dump.tar.zst + ensnode.dump.tar.zst # full pg_dump of ensnode (metadata + __drizzle_migrations + ...) + ensnode_metadata.json # all ensnode.metadata rows (JSON; auxiliary) + checksums.sha256 +``` -- **`postgresVersion`** (already planned) — target server should be **same major** (and ideally same minor) as the source; mismatch → `ENSDB_CLI_ERR_PREFLIGHT_PG_VERSION_MISMATCH`. -- **`ensnode.drizzleMigrations`** — fingerprint of source `ensnode.__drizzle_migrations` at snapshot time, e.g. 
ordered list of migration **tags** (or hashes) from Drizzle’s journal. Enables comparing to the **target** `__drizzle_migrations` **before** `pg_restore` when `ensnode` already exists. -- **`indexerConfig.versionInfo`** / **`ensdb_version` metadata** — ENSDb, Ponder, ENSIndexer semver (already in manifest enrichment). If the target `ensnode.metadata` (for the same `ens_indexer_schema_name`) or a CLI flag **`--expected-ensdb-version`** disagrees with the snapshot for the schemas being restored → `ENSDB_CLI_ERR_PREFLIGHT_ENSNODE_METADATA_VERSION_MISMATCH` (or split per product if needed). -- **Build / git / image id** — If `ensindexer_public_config` or other metadata embeds a **build id** or git SHA used operationally, treat it like semver: snapshot vs target mismatch is an **error** by default in v1 (operators can use `--skip-preflight` to bypass all checks). This keeps v1 simple and strict; a future `--allow-build-mismatch` flag can relax it if needed. +Composition rules: -**Rules of thumb:** +1. The CLI enumerates non-system schemas and **excludes** `ponder_sync`, `ensnode`, and any names listed in `--ignore-schemas`. Whatever remains is treated as an indexer schema and gets its own `.dump.tar.zst`. +2. `ponder_sync` is dumped in full to `ponder_sync.dump.tar.zst`. +3. `ensnode` is dumped in full to `ensnode.dump.tar.zst`. This is the **source of truth for `ensnode`** on every restore (full or selective). +4. `ensnode_metadata.json` is an export of all rows in `ensnode.metadata` (JSON). It is **auxiliary**: drives `snapshot list` / `snapshot info` summaries, populates manifest enrichment, and lets selective restore consumers see which schemas a snapshot covers without unpacking the dump. It is **not** the source of truth for `ensnode` on restore. +5. If `--ignore-schemas` was used, the manifest records the ignored names under `ignoredSchemas` so consumers can tell that the source DB had additional schemas excluded on purpose. 
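Composition rule 1 amounts to a small classification function. A sketch under stated assumptions: helper names are hypothetical, and treating `public` as a non-indexer schema is an assumption of this sketch (the plan only defines it as the "default" schema in the preflight section); the real discovery would query `pg_namespace` / `information_schema`.

```typescript
// Schemas that are never treated as indexer schemas (assumption: public
// is excluded alongside true system namespaces).
const NON_INDEXER_SCHEMAS = new Set(["information_schema", "public"]);

function isSystemSchema(name: string): boolean {
  return name.startsWith("pg_") || NON_INDEXER_SCHEMAS.has(name);
}

/**
 * Rule 1: everything that is not a system schema, not ponder_sync or
 * ensnode, and not listed in --ignore-schemas is an indexer schema and
 * gets its own .dump.tar.zst.
 */
function discoverIndexerSchemas(allSchemas: string[], ignoreSchemas: string[] = []): string[] {
  const excluded = new Set(["ponder_sync", "ensnode", ...ignoreSchemas]);
  return allSchemas.filter((s) => !isSystemSchema(s) && !excluded.has(s));
}
```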
-- **Full restore into an empty database:** After wipe + restore, `ensnode` comes entirely from the snapshot dump, so **Drizzle row mismatch on target** does not apply pre-restore. Still check **PostgreSQL major** compatibility with the dump (`pg_restore` / server). -- **Selective + `--bootstrap-ensnode`:** Replaces `ensnode` from the snapshot (after optional drop); fingerprint in manifest should **match** the restored dump (self-consistent). -- **Selective + migrate-first or shared `ensnode`:** Target **`__drizzle_migrations` must match** the snapshot’s `ensnode.drizzleMigrations` fingerprint (or target must be a strict superset with identical applied tags for shared migrations — pick one deterministic rule in code; **default strict equality** is simplest). Otherwise the restored indexer tables and live migration history can disagree. +**Optional manifest enrichment (best-effort):** -**`--drop-existing` scope:** +- `ponderSync.chainIdsPresent` -- list of chain IDs observed in the `ponder_sync` RPC cache at snapshot time, if derivable cheaply and deterministically. Best-effort: omit the field rather than failing snapshot creation if it cannot be derived reliably. In the two-DB deployment this lets a viewer see at a glance whether a snapshot is "mainnet-flavored" or "testnets-flavored". -- **Full** restore: drops **all schemas that will be restored** — every indexer schema in the manifest + `ponder_sync` + `ensnode`. Preflight recognizes this flag and suppresses "non-empty" checks for schemas that will be dropped. -- **Selective** restore: drops only the **named `--schemas`** targets. If `--bootstrap-ensnode` is also set, additionally drops `ensnode`. Drops `ponder_sync` only when `ponder_sync` is being restored (i.e. default behavior, not `--without-ponder-sync`). +### Empty-database preflight (`snapshot restore`) -**`--skip-preflight`:** skips **all** preflight checks (freshness, conflicts, version / Drizzle / build compatibility). 
Use only when the operator explicitly accepts overwriting shared state, clobbering metadata, and version skew. Log a clear **stderr warning** when this flag is used (no interactive confirmation — the flag name is the confirmation). `--drop-existing` does **not** bypass preflight; it only suppresses non-empty checks for schemas it will drop. `--skip-preflight` bypasses everything. +Run **once**, immediately after CLI args are validated and the manifest is loaded, **before** any `pg_restore`. -**Order of operations:** +**Definition of "empty"** (deterministic, fail-closed): -1. Preflight (unless skipped via **`--skip-preflight`**) -2. If `--drop-existing` is set and targets exist: **full** drops all indexer schemas + `ponder_sync` + `ensnode`; **selective** drops named `--schemas` + `ponder_sync` (when being restored) + `ensnode` (when `--bootstrap-ensnode` is set) -3. `pg_restore` for dumps (full restore includes `ensnode` from `ensnode.dump.tar.zst`). **Selective:** restore indexer dumps (+ optional `ponder_sync`); then either **`--bootstrap-ensnode`** (`pg_restore` `ensnode` + prune metadata) **or** JSON upsert only onto an `ensnode` that already has migrations (migrate-first path). +- No user schemas exist except: PostgreSQL system schemas (`pg_*`, `information_schema`) and the default `public` schema. +- The `public` schema, if present, must contain **zero user tables / views / sequences / functions**. +- Specifically: `ensnode`, `ponder_sync`, and any indexer schema (anything matching the discovery rules in "Snapshot composition") **must not exist**. -**Surfacing errors:** Print distinct messages per failure class; include the `ENSDB_CLI_ERR_PREFLIGHT_*` identifier in the message (and optionally `process.exit` with dedicated codes, e.g. `2` / `3` / `4`, if the team wants scriptable CI — document in README). +If any of these checks fail, `snapshot restore` aborts with `ENSDB_CLI_ERR_RESTORE_DB_NOT_EMPTY` and prints which schemas / objects were found. 
There is **no** `--skip-preflight` escape hatch in v1; operators who want to restore into a populated DB must wipe it first (e.g. `DROP DATABASE` + `CREATE DATABASE`, or use the existing `schema drop` command for each schema). -**Selective restore:** Preflight must ensure metadata upsert will not clobber other indexers on shared DBs: the `ensnode.metadata` row check above is mandatory before replaying filtered `ensnode_metadata.json`. On **clean** DBs, preflight must ensure **`__drizzle_migrations` will exist** after the command (migrate-first completed, or `--bootstrap-ensnode` with dump on disk). +This single check replaces the entire preflight matrix that would otherwise be needed for non-empty targets (non-empty `ponder_sync`, conflicting `ensnode.metadata` rows, unexpected schemas, version / Drizzle / build compatibility, scope-of-`--drop-existing`, etc.). All of those are deferred to the future roadmap (see "Future Roadmap"). -### Snapshot Format +### Snapshot format -Use `pg_dump` with `--format=directory` and `--jobs=N` for parallel dump/restore. This is the only format that supports parallelism, which is critical for 50-100GB databases. Each directory-format dump is then archived as a `.dump.tar.zst` file for storage and transfer, and unpacked to a temporary directory before restore. +Use `pg_dump` with `--format=directory` and `--jobs=N` for parallel dump/restore (subject to question 4 below -- benchmark before committing). Each directory-format dump is then archived as a `.dump.tar.zst` file for storage and transfer, and unpacked to a temporary directory before restore. 
- Dump: `pg_dump --format=directory --jobs=4 --schema=<schema> --file <output-dir>/<schema>.dumpdir`
- Archive: `tar --zstd -cf <output-dir>/<schema>.dump.tar.zst -C <output-dir> <schema>.dumpdir`
- Restore: unpack `<schema>.dump.tar.zst` to a temp directory, then run `pg_restore --format=directory --jobs=4 --schema=<schema> --dbname=<ensdb-url> <tmp>/<schema>.dumpdir` (parallel restore requires a direct database connection, hence `--dbname`)

-The implementation should explicitly budget temporary disk usage for both the compressed archive and the unpacked directory during restore. To reduce peak disk usage, process schemas **sequentially** during `snapshot create`: dump one schema to a directory, archive it, delete the directory, then proceed to the next. During `snapshot restore`, similarly unpack and restore one archive at a time, deleting the unpacked directory after each `pg_restore` completes. On failure or interrupt, clean up temp directories (register a process exit handler / signal trap).
+To reduce peak disk usage, process schemas **sequentially** during `snapshot create`: dump one schema to a directory, archive it, delete the directory, then proceed to the next. During `snapshot restore`, similarly unpack and restore one archive at a time, deleting the unpacked directory after each `pg_restore` completes. On failure or interrupt, clean up temp directories (register a process exit handler / signal trap).

**Checksum verification:** Verify `checksums.sha256` at two points: (1) after `snapshot pull` completes (before returning success), and (2) at the start of `snapshot restore --input` before any `pg_restore`. If the snapshot was created locally via `snapshot create`, the restore verification catches corruption from disk issues.

**Tooling prerequisites:** Archiving uses `tar` with zstd compression (`tar --zstd` or pipe to `zstd`). The Docker image and operator docs must include `tar`, `zstd`, and PostgreSQL client tools (`pg_dump`, `pg_restore`) compatible with the server major version.
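The dump-then-archive step above can be sketched as a pure argv builder (a sketch only: `buildDumpCommands` and the `$ENSDB_URL` placeholder are illustrative; spawning via `child_process`, progress reporting, and error handling are omitted):

```typescript
/**
 * Build the per-schema dump + archive command lines for `snapshot create`.
 * Returns argv arrays suitable for child_process.spawn without a shell.
 */
function buildDumpCommands(schema: string, workDir: string, jobs = 4, ensdbUrl = "$ENSDB_URL") {
  const dumpDirName = `${schema}.dumpdir`;
  return {
    // Directory-format dump of a single schema, parallelized across jobs.
    dump: [
      "pg_dump",
      `--dbname=${ensdbUrl}`,
      "--format=directory",
      `--jobs=${jobs}`,
      `--schema=${schema}`,
      `--file=${workDir}/${dumpDirName}`,
    ],
    // Archive the dump directory with zstd; -C keeps paths relative.
    archive: ["tar", "--zstd", "-cf", `${workDir}/${schema}.dump.tar.zst`, "-C", workDir, dumpDirName],
  };
}
```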
-### S3-compatible Storage Layout +### S3-compatible storage layout -Discovery via `ListObjects` on `{prefix}/` -- each snapshot is a prefix containing a `manifest.json` and per-schema dump files: +Discovery via `ListObjects` on `{prefix}/` -- each snapshot is a sub-prefix containing a `manifest.json` and per-schema dump files: ``` {prefix}/ @@ -206,17 +190,44 @@ Discovery via `ListObjects` on `{prefix}/` -- each snapshot is a prefix containi {schema-name}.dump.tar.zst # archived pg_dump directory output (one per indexer schema) ponder_sync.dump.tar.zst # archived dump of ponder_sync ensnode.dump.tar.zst # full pg_dump of schema ensnode (metadata + __drizzle_migrations + ...) - ensnode_metadata.json # all ensnode.metadata rows (JSON; listing + selective replay; auxiliary on full restore) + ensnode_metadata.json # all ensnode.metadata rows (JSON; listing + selective metadata pruning hint) checksums.sha256 # integrity verification ``` -- `snapshot list` uses `ListObjectsV2` with delimiter `/` to enumerate snapshot prefixes, then fetches each `manifest.json` **in parallel** (with a concurrency limit, e.g. 10) for metadata display. Supports `--limit ` (default: 20) to cap the number of snapshots shown and avoid slow listing when many snapshots exist. Results are sorted by `createdAt` descending (newest first). -- `snapshot pull --schemas ...` downloads the selected indexer dump(s) + `ensnode_metadata.json` + `ensnode.dump.tar.zst` (always included; negligible size) and (by default) `ponder_sync.dump.tar.zst`. Add `--without-ponder-sync` for read-only consumers that do not plan to run ENSIndexer after restore. -- `snapshot pull` with no `--schemas` downloads **all** artifacts including `ensnode.dump.tar.zst`. 
+In the production two-DB deployment, mainnet and testnets snapshots are kept under **separate prefixes** (or even separate buckets) so they list and manage independently:
+
+```
+s3://ensdb-snapshots/
+  mainnet/
+    ensdb-YYYY-MM-DDTHHMMSSZ-<short-id>/
+      manifest.json
+      mainnetSchema1.9.0.dump.tar.zst
+      ponder_sync.dump.tar.zst
+      ensnode.dump.tar.zst
+      ensnode_metadata.json
+      checksums.sha256
+  testnets/
+    ensdb-YYYY-MM-DDTHHMMSSZ-<short-id>/
+      manifest.json
+      alphaSepoliaSchema1.9.0.dump.tar.zst
+      sepoliaSchema1.9.0.dump.tar.zst
+      v2SepoliaSchema1.9.0.dump.tar.zst
+      ponder_sync.dump.tar.zst
+      ensnode.dump.tar.zst
+      ensnode_metadata.json
+      checksums.sha256
+```
+
+The CLI does not know about "mainnet" vs "testnets" -- it just respects whatever `--bucket` + `--prefix` (or `ENSDB_SNAPSHOT_BUCKET` + `ENSDB_SNAPSHOT_PREFIX`) it is given.
+
+- `snapshot list` uses `ListObjectsV2` with delimiter `/` to enumerate snapshot prefixes, then fetches each `manifest.json` **in parallel** (with a concurrency limit, e.g. 10) for metadata display. Supports `--limit <n>` (default: 20) to cap the number of snapshots shown. Results sorted by `createdAt` descending (newest first).
+- `snapshot pull` with no `--schemas` / `--ponder-sync-only` downloads **all** artifacts.
+- `snapshot pull --schemas ...` downloads the selected indexer dump(s) + `ensnode_metadata.json` + `ensnode.dump.tar.zst` (always included; negligible size) + `ponder_sync.dump.tar.zst` (default; opt out with `--without-ponder-sync`).
+- `snapshot pull --ponder-sync-only` downloads only `manifest.json` + `checksums.sha256` + `ponder_sync.dump.tar.zst`.

### Technology

-- **CLI framework**: yargs (consistent with ENSRainbow's `apps/ensrainbow/src/cli.ts`)
+- **CLI framework**: yargs (consistent with ENSRainbow's [`apps/ensrainbow/src/cli.ts`](apps/ensrainbow/src/cli.ts))
- **S3-compatible storage**: `@aws-sdk/client-s3` + `@aws-sdk/lib-storage` (multipart uploads for large files).
Uses the standard AWS SDK credential chain (env vars `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` / `AWS_REGION`, shared config files, IAM roles). No custom auth flags.
- **Database**: `pg` for connection validation, shells out to `pg_dump`/`pg_restore` for actual operations
- **Runtime**: tsx (consistent with other apps)
@@ -246,41 +257,42 @@ ensdb-cli schema drop [--ensdb-url <url>] --schema <name> [--force]
```
ensdb-cli snapshot create [--ensdb-url <url>] --output <dir> [--ignore-schemas <a,b,...>] [--jobs <n>]
- Export all discovered indexer schemas + ponder_sync + full ensnode schema (ensnode.dump.tar.zst) + ensnode_metadata.json.
- Runs pg_dump with parallel jobs per dumped schema. Use --ignore-schemas to skip unrelated app schemas.
-
-ensdb-cli snapshot restore [--ensdb-url <url>] --input <dir> [--drop-existing] [--skip-preflight] [--jobs <n>]
- Full restore: all indexer dumps in the snapshot + ponder_sync + ensnode (from ensnode.dump.tar.zst).
- Omit --schemas. Preflight requires a fresh/isolated target unless `--skip-preflight` is used, or `--drop-existing` is set for the schemas that will be replaced.
- Unpacks archives, then pg_restore with parallel jobs. ensnode_metadata.json is not the source of truth for ensnode.
-
-ensdb-cli snapshot restore [--ensdb-url <url>] --input <dir> --schemas <a,b,...> [--bootstrap-ensnode] [--drop-existing] [--skip-preflight] [--jobs <n>]
- Selective restore into a fresh or isolated database.
- Restores the selected indexer schema dump(s) + ponder_sync when that artifact is present under --input.
- **Clean DB:** either run migrations first, then apply filtered ensnode.metadata from JSON; or pass --bootstrap-ensnode (requires ensnode.dump.tar.zst in --input) to pg_restore ensnode (includes __drizzle_migrations) and prune metadata to --schemas.
- **Shared ensnode:** omit --bootstrap-ensnode; upsert filtered metadata from JSON only.
- Runs preflight before pg_restore; fails if shared state or metadata conflicts unless --skip-preflight.
- Fails if target indexer schema already exists unless --drop-existing is passed (after preflight).
- Unpacks `.dump.tar.zst` archives to temp storage, then runs pg_restore with parallel jobs.
-
-ensdb-cli snapshot restore [--ensdb-url <url>] --input <dir> --ponder-sync-only [--drop-existing] [--skip-preflight] [--jobs <n>]
- Restore only ponder_sync (no indexer schemas, no ensnode dump, no ensnode.metadata changes).
- Preflight still applies to non-empty ponder_sync unless --skip-preflight.
- Enables the developer workflow described in #833: quickly bootstrap a local ponder_sync
- so a new indexer can skip RPC re-fetching.
+ WHOLE-DB snapshot. v1 has no partial create mode.
+ Discovers every non-system schema in the source DB and dumps:
+ - each indexer schema -> {schema}.dump.tar.zst
+ - ponder_sync -> ponder_sync.dump.tar.zst
+ - ensnode -> ensnode.dump.tar.zst (includes metadata + __drizzle_migrations)
+ - ensnode.metadata -> ensnode_metadata.json (auxiliary)
+ Use --ignore-schemas to skip unrelated app schemas (recorded in manifest.ignoredSchemas).
+
+ensdb-cli snapshot restore [--ensdb-url <url>] --input <dir> [--jobs <n>]
+ FULL restore into an EMPTY database.
+ Restores every dump in the snapshot: all indexer schemas + ponder_sync + ensnode (from ensnode.dump.tar.zst).
+ Aborts with ENSDB_CLI_ERR_RESTORE_DB_NOT_EMPTY if the target DB is not empty.
+
+ensdb-cli snapshot restore [--ensdb-url <url>] --input <dir> --schemas <a,b,...> [--without-ponder-sync] [--jobs <n>]
+ SELECTIVE restore into an EMPTY database.
+ Restores: chosen indexer schemas + ensnode (always, from ensnode.dump.tar.zst) + ponder_sync (always, in full).
+ After restore, prunes ensnode.metadata to rows whose ens_indexer_schema_name is in --schemas.
+ --without-ponder-sync: opt out of restoring ponder_sync (read-only consumers like ENSApi-only setups).
+ Aborts if the target DB is not empty.
+
+ensdb-cli snapshot restore [--ensdb-url <url>] --input <dir> --ponder-sync-only [--jobs <n>]
+ Restore ONLY ponder_sync.dump.tar.zst into an EMPTY database (no indexer schemas, no ensnode).
+ Enables the developer workflow described in #833: bootstrap a local ponder_sync, then run ENSIndexer
+ with a fresh schema name on top.
+ Aborts if the target DB is not empty.

ensdb-cli snapshot push --input <dir> --bucket <bucket> [--endpoint <url>] [--prefix <prefix>]
 Upload a local snapshot to S3-compatible storage. Uses multipart upload.

ensdb-cli snapshot pull --snapshot-id <id> --output <dir> --bucket <bucket> [--endpoint <url>] [--prefix <prefix>] [--schemas <a,b,...>] [--without-ponder-sync]
- Download from S3-compatible storage. If --schemas specified, downloads those indexer dumps + ensnode_metadata.json + ensnode.dump.tar.zst and (by default) ponder_sync.
- pass --without-ponder-sync to skip ponder_sync (trade-offs: restored indexer may re-fetch RPC state).
- If --schemas omitted, downloads the full snapshot.
+ Download from S3-compatible storage. If --schemas is given, downloads those indexer dumps + ensnode_metadata.json + ensnode.dump.tar.zst (always included) + ponder_sync.dump.tar.zst (default; opt out with --without-ponder-sync).
+ If --schemas is omitted, downloads the full snapshot. --prefix scopes all keys to `{prefix}/{snapshot-id}/...` (same as list/push/info/delete).

ensdb-cli snapshot pull --snapshot-id <id> --output <dir> --bucket <bucket> [--endpoint <url>] [--prefix <prefix>] --ponder-sync-only
- Download only ponder_sync.dump.tar.zst from a remote snapshot in S3-compatible storage (#833).
- Skips all indexer schema dumps and ensnode_metadata.json.
+ Download only manifest.json + checksums.sha256 + ponder_sync.dump.tar.zst (#833).

ensdb-cli snapshot list --bucket <bucket> [--endpoint <url>] [--prefix <prefix>] [--limit <n>]
 List available snapshots from S3-compatible storage with metadata summary (uses ListObjects + manifest.json).
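The `{prefix}/{snapshot-id}/...` key rule shared by these commands can be sketched as a tiny helper (the name `resolveSnapshotKey` is hypothetical; the point is that an empty prefix must collapse cleanly with no leading slash):

```typescript
/**
 * Resolve an S3 object key for a snapshot artifact.
 * Keys always take the shape `{prefix}/{snapshot-id}/{artifact}`;
 * an empty prefix yields `{snapshot-id}/{artifact}`.
 */
function resolveSnapshotKey(prefix: string, snapshotId: string, artifact: string): string {
  const parts = [prefix.replace(/\/+$/, ""), snapshotId, artifact].filter((p) => p.length > 0);
  return parts.join("/");
}
```

Centralizing this in one helper (per the plan's "Shared helper" note in Phase 3) keeps `push`, `pull`, `list`, `info`, and `delete` consistent whether `--prefix` is omitted, bare, or has a trailing slash.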
@@ -300,31 +312,34 @@ ensdb-cli snapshot verify --input ### Common Options -- `--ensdb-url` / `ENSDB_URL` -- PostgreSQL connection string for the source/target ENSDb. Optional; defaults to `process.env.ENSDB_URL`. +- `--ensdb-url` / `ENSDB_URL` -- PostgreSQL connection string for the source/target ENSDb. Optional; defaults to `process.env.ENSDB_URL`. In the two-DB deployment, point at `ENSDB_URL_MAINNET` or `ENSDB_URL_TESTNETS` depending on which DB you are operating on. - `--jobs` / `-j` -- parallelism for pg_dump/pg_restore (default: 4) - `--bucket` / `ENSDB_SNAPSHOT_BUCKET` -- S3 bucket name - `--endpoint` / `ENSDB_SNAPSHOT_ENDPOINT` -- S3-compatible endpoint (for R2, MinIO) -- `--prefix` / `ENSDB_SNAPSHOT_PREFIX` -- key prefix inside the bucket (default empty). All snapshot S3 commands (`push`, `pull`, `list`, `info`, `delete`) must resolve object keys as `{prefix}/{snapshot-id}/...` so behavior matches when omitted (empty prefix) or when a shared prefix is used. -- `--skip-preflight` -- `snapshot restore` only: skip **all** preflight checks (non-empty `ponder_sync`, `ensnode.metadata` conflicts, unexpected schemas, PostgreSQL / Drizzle / ensdb / build-id compatibility). Dangerous; emits a stderr warning. v1 has no narrower “skip only version checks” flag — use this explicitly or fix the target DB first. +- `--prefix` / `ENSDB_SNAPSHOT_PREFIX` -- key prefix inside the bucket (default empty). All snapshot S3 commands (`push`, `pull`, `list`, `info`, `delete`) resolve object keys as `{prefix}/{snapshot-id}/...`. In production, set this to `mainnet` or `testnets` to keep the two DBs' snapshots separated under one bucket. - `--verbose` / `-v` -- detailed output ## Manifest Schema Each snapshot has a `manifest.json`. The CLI auto-populates `indexerConfig` by reading `ensindexer_public_config` from `ensnode.metadata` -- no manual input needed for namespace, plugins, or chain IDs. 
-**Manifest version check:** On any command that reads a manifest (`snapshot list`, `snapshot restore`, `snapshot info`, `snapshot pull`, `snapshot verify`), the CLI must check the `version` field and fail with a clear error (e.g. "manifest version 2 is not supported by this CLI; upgrade ensdb-cli") if it encounters a version it does not support.
+**Manifest version check:** On any command that reads a manifest (`snapshot list`, `snapshot restore`, `snapshot info`, `snapshot pull`, `snapshot verify`), the CLI checks the `version` field and fails with a clear error (e.g. "manifest version 2 is not supported by this CLI; upgrade ensdb-cli") if it encounters a version it does not support.
+
+`ensnode.drizzleMigrations` is recorded for **informational / debugging** purposes (and to feed `snapshot info` output). v1 does **not** compare it to a live target's `__drizzle_migrations` because restore always lands on an empty DB and Drizzle state always comes from the snapshot dump itself.
 
 ### Deriving capabilities for UI
 
-ENSDb should compute “what this snapshot enables” dynamically at display time from:
-- the manifest’s artifact list (which dump files are present)
-- each schema’s `indexerConfig` (plugins, namespace, `isSubgraphCompatible`, etc.)
+ENSDb should compute "what this snapshot enables" dynamically at display time from:
+
+- the manifest's artifact list (which dump files are present)
+- each schema's `indexerConfig` (plugins, namespace, `isSubgraphCompatible`, etc.)
 
 Define a single function (used by `snapshot list` / `snapshot info` output formatting) that implements this deterministic logic: `deriveCapabilities({ manifest, schemaName? }) -> { flags, intendedUseCases }`
 
 Example outputs (computed, not stored):
+
 - `fastBootstrap: true` (if required artifacts are present to avoid full reindex)
 - `includesPonderSync: true` (if `ponder_sync.dump.tar.zst` exists)
 - `selectiveRestoreSupported: true` (if `ensnode_metadata.json` exists and schema dumps are per-schema)
@@ -387,21 +402,6 @@ The `indexerConfig` is extracted from the three `ensnode.metadata` keys:
 - `ensindexer_public_config` -- namespace, plugins, chains, version info, label set, subgraph compatibility
 - `ensindexer_indexing_status` -- per-chain sync status (block numbers, timestamps, chain-following state)
 
-This means `snapshot list` can show rich summaries like:
-
-```
-Snapshot ID                       Schemas  Total Size  Created
-ensdb-2026-04-06T120000Z-abc123   5        50 GB       2026-04-06
-  mainnetSchema1.9.0   mainnet   subgraph               1 chain
-  alphaSchema1.9.0     alpha     subgraph+basenames+6   6 chains
-  sepoliaSchema1.9.0   sepolia   subgraph               1 chain
-  ...
-ensdb-2026-04-05T080000Z-def456   3        35 GB       2026-04-05
-  ...
-```
-
-Each row is a snapshot (not a schema). Schemas are listed as sub-entries.
-
 ## Project Structure
 
 ```
@@ -414,8 +414,8 @@ apps/ensdb-cli/
   commands/
     inspect.ts             # inspect command
     schema-drop.ts         # schema drop command
-    snapshot-create.ts     # snapshot create
-    snapshot-restore.ts    # snapshot restore
+    snapshot-create.ts     # snapshot create (whole-DB only)
+    snapshot-restore.ts    # snapshot restore (empty-DB only; full / selective / ponder-sync-only)
     snapshot-push.ts       # push to S3
     snapshot-pull.ts       # pull from S3
     snapshot-verify.ts     # verify local snapshot manifest + checksums
@@ -424,7 +424,7 @@ apps/ensdb-cli/
     snapshot-delete.ts     # delete remote snapshot prefix
   lib/
     database.ts            # pg connection, schema queries
-    preflight-restore.ts   # fresh/isolated DB, conflicts, Drizzle/version/build compatibility before pg_restore
+    preflight-restore.ts   # single "is the target database empty?" check
     pgdump.ts              # pg_dump/pg_restore wrapper
     s3.ts                  # S3-compatible client, multipart upload/download
     manifest.ts            # manifest read/write, validation
@@ -445,9 +445,10 @@ apps/ensdb-cli/
 ### Phase 2: Local Snapshot Create + Restore
 
 - Implement `pg_dump` wrapper with parallel jobs and progress reporting
-- Implement `snapshot create` (dump all indexer schemas + ponder_sync + full `ensnode` schema dump + `ensnode_metadata.json` export)
+- Implement `snapshot create` (whole-DB: all indexer schemas + ponder_sync + full `ensnode` schema dump + `ensnode_metadata.json` export)
 - Implement archive packaging and unpacking for directory-format dumps
-- Implement `snapshot restore` (preflight in `preflight-restore.ts`, full + selective + `--bootstrap-ensnode` path, then pg_restore / metadata prune / JSON upsert)
+- Implement empty-DB preflight (`preflight-restore.ts`)
+- Implement `snapshot restore` (full / selective / ponder-sync-only paths; selective performs `ensnode.metadata` prune after pg_restore)
 - Manifest generation and validation
 - Checksum generation and verification
@@ -456,61 +457,83 @@ apps/ensdb-cli/
 - S3-compatible client with multipart upload support
 - Shared helper: resolve `{prefix}/{snapshot-id}/` from `--prefix` / `ENSDB_SNAPSHOT_PREFIX` for every snapshot S3-compatible command (`push`, `pull`, `list`, `info`, `delete`)
 - `snapshot push` with manifest and artifact upload only
-- `snapshot pull` with integrity verification (optionally add `--resumable` + `.part` downloads + retries)
+- `snapshot pull` with integrity verification (optionally add `--resumable` + `.part` downloads + retries later)
 - `snapshot list` and `snapshot info` for browsing remote snapshots
 - `snapshot delete` (list objects under prefix, batch delete, `--force` / confirmation)
 
 ### Phase 4: Polish + Production Readiness
 
-- Dockerfile (include `postgresql-client` for pg_dump/pg_restore)
+- Dockerfile (include `postgresql-client` for pg_dump/pg_restore, plus `tar`/`zstd`)
 - Progress bars for large operations
 - `snapshot verify --input ` command: verify local snapshot integrity (checksums) without restoring
-- Dry-run mode for destructive operations
+- Dry-run mode for destructive operations (e.g. `snapshot delete --dry-run`)
 - Comprehensive error messages and recovery guidance
-- Documentation
+- Documentation: usage, plus the two-DB (mainnet / testnets) deployment recipe
 
 ## CI Test Matrix Support (#1127)
 
-The snapshot infrastructure directly enables the matrix smoke tests described in [#1127](https://github.com/namehash/ensnode/issues/1127). The production database contains indexer schemas with three distinct configurations that map to the test matrix:
+The snapshot infrastructure directly enables the matrix smoke tests described in [#1127](https://github.com/namehash/ensnode/issues/1127). A whole-DB snapshot per network family contains every indexer schema needed for the matrix entries against that family.
 
-- **Subgraph-compat**: `mainnetSchema1.9.0` (plugins: `["subgraph"]`, `isSubgraphCompatible: true`)
-- **Alpha-style**: `alphaSchema1.9.0` (plugins: `["subgraph","basenames","lineanames","threedns",...]`, `isSubgraphCompatible: false`)
-- **V2**: `v2SepoliaSchema1.9.0` (plugins: `["ensv2","protocol-acceleration"]`, `isSubgraphCompatible: false`)
-
-The manifest's `indexerConfig` on each schema entry includes `plugins`, `namespace`, `isSubgraphCompatible`, and `indexedChainIds`, which provides enough information for CI to select the correct schema for each test variant.
+The manifest's `indexerConfig` on each schema entry includes `plugins`, `namespace`, `isSubgraphCompatible`, and `indexedChainIds`, which is enough information for CI to select the correct schema for each test variant.
 
 **CI workflow pattern:**
 
-```
-# 1. Pull only the schema needed for this matrix entry
+```bash
+# 1. Pull only the indexer schema needed for this matrix entry from the relevant snapshot.
+# --without-ponder-sync because smoke tests only read; they do not run ENSIndexer.
 ensdb-cli snapshot pull \
-  --snapshot-id \
+  --snapshot-id \
   --schemas mainnetSchema1.9.0 \
+  --without-ponder-sync \
   --bucket $ENSDB_SNAPSHOT_BUCKET \
+  --prefix mainnet \
   --output /tmp/snapshot
 
-# 2a. Restore into an isolated empty test database (bootstrap ensnode + Drizzle migrations from snapshot)
+# 2. Restore into an isolated EMPTY test database. Selective restore implicitly
+# bootstraps ensnode (from ensnode.dump.tar.zst) and prunes ensnode.metadata
+# to the chosen schema. --without-ponder-sync skips ponder_sync since smoke
+# tests do not run the indexer.
 ensdb-cli snapshot restore \
   --ensdb-url $TEST_DB_URL \
   --input /tmp/snapshot \
   --schemas mainnetSchema1.9.0 \
-  --bootstrap-ensnode
+  --without-ponder-sync
 
-# 2b. Alternative: run ENSDb migrations against $TEST_DB_URL first, then restore without --bootstrap-ensnode
-
-# 3. Run smoke tests against the restored database
+# 3. Run smoke tests against the restored database.
 ENSDB_URL=$TEST_DB_URL ENSINDEXER_SCHEMA_NAME=mainnetSchema1.9.0 pnpm test:smoke
 ```
 
-Each matrix entry pulls and restores a different schema, then runs ENSApi smoke tests against it. The selective pull avoids downloading every indexer dump — only the chosen schema, `ponder_sync`, `ensnode.dump.tar.zst` (always included; negligible size), and `ensnode_metadata.json`.
+Each matrix entry pulls and restores a different schema (selective pull avoids downloading every indexer dump) into a fresh empty test DB.
 
-The `snapshot list` and `snapshot info` commands can also be used in CI to discover the latest available snapshot ID before pulling.
+The `snapshot list` and `snapshot info` commands can also be used in CI to discover the latest available snapshot ID for a given prefix before pulling.
 
 ## Future Roadmap
 
+### Restoring into a non-empty database
+
+v1 hard-requires an empty target database. A future version can lift this restriction by reintroducing the larger preflight matrix:
+
+- non-empty `ponder_sync` detection
+- conflicting `ensnode.metadata` row detection (selective restore)
+- unexpected non-target schemas
+- PostgreSQL major version compatibility
+- `ensnode.__drizzle_migrations` fingerprint comparison vs `manifest.ensnode.drizzleMigrations`
+- `ensnode.metadata` `versionInfo` / `ensdb_version` comparison vs the snapshot
+- escape hatches: `--drop-existing` (scoped to the schemas being restored), `--skip-preflight` (last-resort override with stderr warning), `--bootstrap-ensnode` (when migrating-first vs bootstrapping-from-dump becomes a meaningful choice again)
+
+The data captured in v1's manifest (`postgresVersion`, `ensnode.drizzleMigrations`, per-schema `versionInfo`) is already designed to feed those checks, so v2 will not have to evolve the manifest format.
+
+### Per-schema snapshot create
+
+v1 always creates whole-DB snapshots. A future `snapshot create --schemas A,B` mode could produce smaller artifacts when an operator wants to publish only one schema's dump. The two-DB production split already partially solves the size problem at the database level; per-schema snapshots can wait until there is concrete demand.
+
+### Streaming uploads
+
+v1 stays local-first (`snapshot create` then `snapshot push`). No streaming/pipe-to-S3 mode in v1. Rationale (kept for future roadmap): directory-format dumps are multi-file, so the standard pipeline writes each `.dump.tar.zst` to disk and uploads it. Streaming directly to S3 multipart is possible but adds more moving parts; defer it until needed.
+
 ### Analytical Queries (#279)
 
-[#279](https://github.com/namehash/ensnode/issues/279) requires counting Unknown Names and Unknown Labels by iterating through domain data. This will be a separate **`analyze`** command, not part of **`inspect`**.
+[#279](https://github.com/namehash/ensnode/issues/279) requires counting Unknown Names and Unknown Labels by iterating through domain data. This will be a separate **`analyze`** command, not part of `inspect`.
 
 **Why separate from `inspect`:**
 
@@ -531,39 +554,26 @@ ensdb-cli analyze unknown-labels [--ensdb-url ] --schema [--top-n 10
 Supports progress reporting for long-running scans.
 ```
 
-The snapshot create/restore workflow enables **offline analysis**: snapshot production, restore into an isolated database, run analysis without impacting production. The `ensindexer_public_config` metadata (available in manifests) identifies which schemas are subgraph-compatible, which is relevant to #279 since the metrics are anchored to the ENS Subgraph definition of Unknown Labels.
+The snapshot create/restore workflow enables **offline analysis**: snapshot production, restore into an isolated empty database, run analysis without impacting production. The `ensindexer_public_config` metadata (available in manifests) identifies which schemas are subgraph-compatible, which is relevant to #279 since the metrics are anchored to the ENS Subgraph definition of Unknown Labels.
 
 This is explicitly **out of scope for v1** but the plan ensures the CLI's database access layer (`lib/database.ts`, `@ensnode/ensdb-sdk` integration) is designed to support it.
 
 ## Resolved Decisions
 
-1. **Discovery**: No shared `index.json`. Use S3-compatible `ListObjects` to discover snapshots by reading `manifest.json` from each snapshot prefix. Most robust -- no concurrent writer races, no stale index.
-2. **Snapshot granularity**: `snapshot create` dumps all discovered indexer deployment schemas + `ponder_sync` + **full `ensnode` schema** (`ensnode.dump.tar.zst`) + `ensnode_metadata.json`, with optional `--ignore-schemas` for unrelated app schemas. **Full** `snapshot pull` / `snapshot restore` include the `ensnode` dump. **Selective** `pull` always includes `ensnode.dump.tar.zst` (negligible size). **Selective** `restore` either **upserts** metadata from JSON onto an existing migrated `ensnode`, or uses **`--bootstrap-ensnode`** to `pg_restore` the dump and prune metadata to `--schemas` so `__drizzle_migrations` matches the snapshot.
-3. **Restore safety**: `snapshot restore` assumes a fresh or isolated database; **preflight** enforces this (non-empty `ponder_sync`, conflicting `ensnode.metadata` rows, unexpected schemas, version compatibility) unless **`--skip-preflight`** is passed.
-4. **Restore behavior**: Preflight runs first, then fail if a target indexer schema already exists unless `--drop-existing` is passed. `--drop-existing` does not bypass preflight but **suppresses non-empty checks** for schemas it will drop; only **`--skip-preflight`** bypasses all checks. Prevents accidental data loss while keeping an explicit escape hatch.
-5. **`--drop-existing` scope**: Full restore drops all indexer schemas + `ponder_sync` + `ensnode`. Selective restore drops named `--schemas` targets + `ponder_sync` (when being restored) + `ensnode` (when `--bootstrap-ensnode` is set).
-6. **`ponder_sync` default**: Included by default for selective restore. Opt out with `--without-ponder-sync`.
-7. **`--bootstrap-ensnode` + `--drop-existing`**: When both are set and `ensnode` already exists, `--drop-existing` drops `ensnode` before bootstrap restore.
+1. **Production deployment uses two physically separate ENSDbs** -- one for mainnet, one for testnets -- to keep `ponder_sync` (and therefore each snapshot) small. The CLI itself operates on one DB at a time; mainnet vs. testnets is encoded by `--ensdb-url` and S3 `--prefix`.
+2. **`snapshot create` is whole-DB only in v1.** No partial / per-schema dumps. `--ignore-schemas` remains for defensively skipping unrelated app schemas, recorded in `manifest.ignoredSchemas`.
+3. **`snapshot restore` only targets an empty database in v1.** Single deterministic preflight: refuse to run if any user schemas / objects exist. No `--drop-existing`, no `--skip-preflight`, no `--bootstrap-ensnode` flags.
+4. **Selective restore is supported (and required for the CI matrix).** It always restores `ensnode` from the snapshot dump (only viable source of `__drizzle_migrations` in an empty DB) and **always fully restores `ponder_sync`**. Read-only consumers can opt out with `--without-ponder-sync`. After restore, `ensnode.metadata` is pruned to rows whose `ens_indexer_schema_name` is in `--schemas`.
+5. **`--ponder-sync-only` restore mode is supported** for the developer workflow in #833 (bootstrap a local `ponder_sync`, then run ENSIndexer fresh on top).
+6. **`ensnode.dump.tar.zst` is the source of truth for `ensnode` on every restore.** `ensnode_metadata.json` is auxiliary (drives listing, summaries, and lets selective restore know which schemas a snapshot covers without unpacking the dump).
+7. **Discovery**: No shared `index.json`. Use S3-compatible `ListObjects` to discover snapshots by reading `manifest.json` from each snapshot prefix. Most robust -- no concurrent writer races, no stale index.
 8. **Retention policy**: `snapshot delete` command added for manual cleanup. `snapshot list` shows all snapshots; operators manage retention manually.
-9. **Snapshot IDs**: Snapshot IDs are auto-generated and immutable in v1 (no operator override).
-10. **Streaming uploads**: v1 stays local-first (`snapshot create` then `snapshot push`). No streaming/pipe-to-S3 mode in v1.
-
-    **Rationale (kept for future roadmap):**
-
-    - **Why directory format complicates streaming:** `pg_dump --format=directory` writes many files under a tree; you normally archive that tree to a single blob (e.g. `.dump.tar.zst`) before upload. That implies at least one local staging step per schema unless you stream `tar` output directly to S3 multipart (possible but more moving parts).
-    - **Other `pg_dump` formats do not remove all constraints:**
-      - **Custom** (`pg_dump --format=custom` / `-Fc`): single-file output and can be streamed (e.g. piped into multipart upload). Loses parallel `pg_restore` compared to directory format unless you accept those trade-offs at 50-100GB scale.
-      - **Plain** (`pg_dump --format=plain`): single SQL stream; stream-friendly, but restore is typically slower and less suited to huge DBs than the directory workflow already chosen for this plan.
-    - **Checksums and manifest:** The plan includes `checksums.sha256` and a `manifest.json` with per-artifact sizes. For end-to-end streaming without a local file:
-      - Per-artifact SHA-256 can still be computed by hashing bytes as they pass through the upload pipeline (digest alongside the stream), then writing the digest into `checksums.sha256` and the manifest after that artifact finishes.
-      - Alternatively, S3 object checksums (multipart part ETags, or `ChecksumSHA256` on `PutObject` where supported) can supplement or replace client-side files, but the manifest must state what is verified (client hash vs object checksum).
-      - The full snapshot manifest cannot be finalized until all artifacts are complete, so uploads can be incremental, but manifest upload is always last (or use a two-phase manifest: provisional then final).
+9. **Snapshot IDs** are auto-generated and immutable in v1 (no operator override).
+10. **Streaming uploads** are deferred (see Future Roadmap).
 
 ## Open Questions for Stakeholders
 
 1. **Snapshot ID format**: Confirm the exact auto-generated format (e.g. `ensdb-YYYY-MM-DDTHHMMSSZ-` vs `{primarySchemaName}-...`). v1 does not allow overriding the generated ID.
-2. **Clean DB selective restore default for CI/docs:** Prefer **migrate-first** (requires running the repo migration command with a matching ENSDb version) or **`--bootstrap-ensnode`** (self-contained from snapshot; `ensnode.dump.tar.zst` is always present in pulled snapshots)?
-3. **ponder_sync chain IDs:** Is there a stable, canonical way to derive `ponderSync.chainIdsPresent` from the current `ponder_sync` schema (table/column to read), or should the CLI treat this as a purely best-effort hint with no guarantees?
-4. **pg_dump parallel jobs:** Is `pg_dump --format=directory --jobs=N` actually faster than single-threaded dump for our schemas? Each indexer schema has ~12 tables, so parallelism across tables within a single schema may yield limited benefit. Benchmark before committing to directory format as the only path — `pg_dump --format=custom` (single file, no parallel restore) would simplify the archive/unpack pipeline significantly. If directory format is not measurably faster, consider switching to custom format for v1.
-5. **v1 scope: empty-database-only restore?** If `snapshot restore` only targets **empty databases**, the entire preflight matrix (non-empty `ponder_sync`, `ensnode` conflicts, unexpected schemas, `--drop-existing`, version/Drizzle/build compatibility checks) collapses to a single "is the DB empty?" check. This removes `--drop-existing`, `--skip-preflight`, and the preflight-aware `--drop-existing` suppression logic. Trade-off: operators who want to restore into an existing database would need to wipe it first (outside the CLI) or wait for a future version.
-6. **v1 scope: full-snapshot-only restore (no `--schemas`)?** If selective restore is deferred to v2, the restore command becomes trivial: `pg_restore` every dump in the snapshot. This eliminates `--schemas`, `--bootstrap-ensnode`, the JSON upsert path, metadata pruning, and open question 2. `--ponder-sync-only` can remain as a simple special case for #833. Trade-off: CI test matrix (#1127) would need to restore full snapshots for each matrix entry (expensive) or create separate smaller snapshots per test config. If question 5 is also "yes," the restore command has **zero flags** (besides `--ensdb-url` and `--input`).
+2. **`ponder_sync` chain IDs:** Is there a stable, canonical way to derive `ponderSync.chainIdsPresent` from the current `ponder_sync` schema (table/column to read), or should the CLI treat this as a purely best-effort hint with no guarantees? In the two-DB deployment this would be a useful `snapshot info` signal ("mainnet-flavored" vs "testnets-flavored").
+3. **`pg_dump` parallel jobs:** Is `pg_dump --format=directory --jobs=N` actually faster than single-threaded dump for our schemas? Each indexer schema has roughly a dozen tables, so parallelism across tables within a single schema may yield limited benefit. Benchmark before committing to directory format as the only path -- `pg_dump --format=custom` (single file, no parallel restore) would simplify the archive/unpack pipeline significantly. If directory format is not measurably faster, consider switching to custom format for v1.
+4. **Where do multichain schemas like `alphaSchema1.9.0` live in the two-DB split?** `alphaSchema1.9.0` indexes mainnet + production L2s, so it would naturally share the mainnet DB's `ponder_sync`, but the placement (and whether alpha continues to exist as a separate deployment) is an operational decision outside the CLI's scope. The CLI works either way as long as each schema lives in the DB whose `ponder_sync` matches its chain set.

From c89ef3ea01dcd24b00fb407b6994d191c6b8a921 Mon Sep 17 00:00:00 2001
From: djstrong
Date: Mon, 20 Apr 2026 14:13:27 +0200
Subject: [PATCH 10/10] Clarify snapshot restore process in ENSdb CLI plan,
 specifying that `ponder_sync` is restored by default during selective
 restores. Update documentation to reflect changes in command behavior and
 enhance understanding of restoration options for empty databases.

---
 .cursor/plans/ensdb_cli_tool_422abf99.plan.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/.cursor/plans/ensdb_cli_tool_422abf99.plan.md b/.cursor/plans/ensdb_cli_tool_422abf99.plan.md
index e8ff986e7..7fa9b3d13 100644
--- a/.cursor/plans/ensdb_cli_tool_422abf99.plan.md
+++ b/.cursor/plans/ensdb_cli_tool_422abf99.plan.md
@@ -18,7 +18,7 @@ todos:
     content: "Implement snapshot create (whole DB only): dump every indexer schema + ponder_sync + full ensnode schema + ensnode_metadata.json; manifest enrichment + checksums"
     status: pending
   - id: snapshot-restore
-    content: "Implement snapshot restore (empty DB only): single freshness preflight, then full or selective indexer-schema restore (selective always restores ensnode + full ponder_sync, then prunes ensnode.metadata)"
+    content: "Implement snapshot restore (empty DB only): single freshness preflight, then full or selective indexer-schema restore (selective always restores ensnode; restores ponder_sync by default; then prunes ensnode.metadata)"
     status: pending
   - id: s3-client
     content: Implement S3-compatible client layer with multipart upload/download support
@@ -107,7 +107,7 @@ To keep v1 small and safe, the CLI commits to two scope reductions:
 
 These two constraints eliminate large families of failure modes (Drizzle / version skew on an existing `ensnode`, metadata conflicts, partial overwrites of `ponder_sync`, scope of `--drop-existing`, version compatibility checks against live target state) and let v1 ship with very few flags.
 
-Selective **restore** of a subset of indexer schemas from a whole-DB snapshot is still supported, because that is the workflow CI matrix tests (#1127) need. In selective restore, **`ponder_sync` is always fully restored** and `ensnode` is always restored from the snapshot dump (since the target DB is empty, there is nothing else to bootstrap migrations from). After restore, `ensnode.metadata` is **pruned** to keep only rows whose `ens_indexer_schema_name` is in the chosen schema set.
+Selective **restore** of a subset of indexer schemas from a whole-DB snapshot is still supported, because that is the workflow CI matrix tests (#1127) need. In selective restore, `ensnode` is always restored from the snapshot dump (since the target DB is empty, there is nothing else to bootstrap migrations from). `ponder_sync` is restored **by default** (in full) to preserve RPC cache / sync state; pass `--without-ponder-sync` for read-only consumers (e.g. ENSApi-only setups) that will not run ENSIndexer and do not need it. After restore, `ensnode.metadata` is **pruned** to keep only rows whose `ens_indexer_schema_name` is in the chosen schema set.
 
 ## Architecture Decisions
 
@@ -116,7 +116,7 @@ Selective **restore** of a subset of indexer schemas from a whole-DB snapshot is
 - **Snapshot create is always whole-DB.** The result is a directory of per-schema dumps + `ponder_sync` + the full `ensnode` schema dump + an `ensnode_metadata.json` export, plus `manifest.json` and `checksums.sha256`.
 - **Snapshot restore always targets an empty database** and supports three modes:
   - **Full restore** (no `--schemas`, no `--ponder-sync-only`): restore every dump in the snapshot.
-  - **Selective restore** (`--schemas A,B`): restore only the named indexer schema dumps; **also always** restore `ensnode` (so `__drizzle_migrations` exists) and **always fully** restore `ponder_sync`. After dumps come back, prune `ensnode.metadata` to rows in the chosen schema set. An optional `--without-ponder-sync` flag is provided for read-only consumers (e.g. ENSApi-only setups) that will not run ENSIndexer and do not need `ponder_sync`.
+  - **Selective restore** (`--schemas A,B`): restore only the named indexer schema dumps; **also always** restore `ensnode` (so `__drizzle_migrations` exists) and restore `ponder_sync` **by default** (in full). Pass `--without-ponder-sync` for read-only consumers (e.g. ENSApi-only setups) that will not run ENSIndexer and do not need `ponder_sync`. After dumps come back, prune `ensnode.metadata` to rows in the chosen schema set.
   - **`ponder_sync`-only** (`--ponder-sync-only`): restore only `ponder_sync.dump.tar.zst`. Useful for #833 (developer wants to bootstrap `ponder_sync` and start indexing fresh schemas on top).
 - **No `--drop-existing`, no `--skip-preflight`, no `--bootstrap-ensnode`.** Empty-DB-only means there is nothing to drop, no preflight to skip, and `ensnode` is always bootstrapped from the snapshot dump (because the DB is empty when restore starts).
 - **Manifest-driven tooling**: `manifest.json` is the source of truth for:
@@ -272,7 +272,7 @@ ensdb-cli snapshot restore [--ensdb-url ] --input [--jobs ]
 ensdb-cli snapshot restore [--ensdb-url ] --input --schemas [--without-ponder-sync] [--jobs ]
 
   SELECTIVE restore into an EMPTY database.
-  Restores: chosen indexer schemas + ensnode (always, from ensnode.dump.tar.zst) + ponder_sync (always, in full).
+  Restores: chosen indexer schemas + ensnode (always, from ensnode.dump.tar.zst) + ponder_sync (by default, in full).
   After restore, prunes ensnode.metadata to rows whose ens_indexer_schema_name is in --schemas.
   --without-ponder-sync: opt out of restoring ponder_sync (read-only consumers like ENSApi-only setups).
   Aborts if the target DB is not empty.
@@ -329,7 +329,7 @@ Each snapshot has a `manifest.json`. The CLI auto-populates `indexerConfig` by r
 
 ### Deriving capabilities for UI
 
-ENSDb should compute "what this snapshot enables" dynamically at display time from:
+`ensdb-cli` should compute "what this snapshot enables" dynamically at display time from:
 
 - the manifest's artifact list (which dump files are present)
 - each schema's `indexerConfig` (plugins, namespace, `isSubgraphCompatible`, etc.)
@@ -563,7 +563,7 @@ This is explicitly **out of scope for v1** but the plan ensures the CLI's databa
 1. **Production deployment uses two physically separate ENSDbs** -- one for mainnet, one for testnets -- to keep `ponder_sync` (and therefore each snapshot) small. The CLI itself operates on one DB at a time; mainnet vs. testnets is encoded by `--ensdb-url` and S3 `--prefix`.
 2. **`snapshot create` is whole-DB only in v1.** No partial / per-schema dumps. `--ignore-schemas` remains for defensively skipping unrelated app schemas, recorded in `manifest.ignoredSchemas`.
 3. **`snapshot restore` only targets an empty database in v1.** Single deterministic preflight: refuse to run if any user schemas / objects exist.
-4. **Selective restore is supported (and required for the CI matrix).** It always restores `ensnode` from the snapshot dump (only viable source of `__drizzle_migrations` in an empty DB) and **always fully restores `ponder_sync`**. Read-only consumers can opt out with `--without-ponder-sync`. After restore, `ensnode.metadata` is pruned to rows whose `ens_indexer_schema_name` is in `--schemas`.
+4. **Selective restore is supported (and required for the CI matrix).** It always restores `ensnode` from the snapshot dump (only viable source of `__drizzle_migrations` in an empty DB) and restores `ponder_sync` **by default** (in full). Read-only consumers can opt out with `--without-ponder-sync`. After restore, `ensnode.metadata` is pruned to rows whose `ens_indexer_schema_name` is in `--schemas`.
 5. **`--ponder-sync-only` restore mode is supported** for the developer workflow in #833 (bootstrap a local `ponder_sync`, then run ENSIndexer fresh on top).
 6. **`ensnode.dump.tar.zst` is the source of truth for `ensnode` on every restore.** `ensnode_metadata.json` is auxiliary (drives listing, summaries, and lets selective restore know which schemas a snapshot covers without unpacking the dump).
 7. **Discovery**: No shared `index.json`. Use S3-compatible `ListObjects` to discover snapshots by reading `manifest.json` from each snapshot prefix. Most robust -- no concurrent writer races, no stale index.
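
Editor's note on the discovery decision above: the `ListObjects`-based discovery reduces to a small pure step, turning a flat key listing into the set of snapshot IDs that have a `manifest.json`. A TypeScript sketch follows; the helper name and the `{prefix}/{snapshot-id}/manifest.json` layout come from this plan's conventions, and the code is illustrative, not an implemented API:

```typescript
// Illustrative sketch: given the flat object keys listed under an S3 prefix,
// return the snapshot IDs for which "{prefix}/{id}/manifest.json" exists.
// Prefixes without a manifest are treated as incomplete snapshots (push
// uploads manifest.json last) and skipped.
export function discoverSnapshotIds(keys: string[], prefix: string): string[] {
  const base = prefix.endsWith("/") ? prefix : `${prefix}/`;
  const ids = new Set<string>();
  for (const key of keys) {
    if (!key.startsWith(base)) continue;
    const rest = key.slice(base.length); // e.g. "ensdb-.../manifest.json"
    const slash = rest.indexOf("/");
    if (slash > 0 && rest.slice(slash + 1) === "manifest.json") {
      ids.add(rest.slice(0, slash));
    }
  }
  // IDs embed a sortable UTC timestamp (ensdb-YYYY-MM-DDTHHMMSSZ-...), so a
  // reverse lexicographic sort yields newest-first.
  return [...ids].sort().reverse();
}
```

`snapshot list` / `snapshot info` would then fetch and parse each discovered `manifest.json` (checking its `version` field first) before rendering summaries, which keeps the manifest the single source of truth with no shared index to go stale.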