diff --git a/docs/configuration/pgdog.toml/general.md b/docs/configuration/pgdog.toml/general.md
index 7ec007d..da6f325 100644
--- a/docs/configuration/pgdog.toml/general.md
+++ b/docs/configuration/pgdog.toml/general.md
@@ -479,6 +479,47 @@ Available options:
 
 Default: **`auto`**
 
+### `cutover_traffic_stop_threshold`
+
+Replication lag threshold (in bytes) at which PgDog will pause traffic automatically during a [traffic cutover](../../features/sharding/resharding/cutover.md#pause-queries).
+
+Default: **`1_000_000`** (1 MB)
+
+### `cutover_save_config`
+
+Save the swapped configuration to disk after a [traffic cutover](../../features/sharding/resharding/cutover.md#swap-the-configuration). When enabled, PgDog will back up both configuration files as `pgdog.bak.toml` and `users.bak.toml`, and write the new configuration to `pgdog.toml` and `users.toml`.
+
+Default: **`false`** (disabled)
+
+### `cutover_replication_lag_threshold`
+
+Replication lag (in bytes) at or below which PgDog will [swap the configuration](../../features/sharding/resharding/cutover.md#swap-the-configuration) during a cutover.
+
+Default: **`0`** (0 bytes)
+
+### `cutover_last_transaction_delay`
+
+Time (in milliseconds) that must elapse since the last transaction on any table in the publication before PgDog will [swap the configuration](../../features/sharding/resharding/cutover.md#swap-the-configuration) during a cutover.
+
+Default: **`1_000`** (1s)
+
+### `cutover_timeout`
+
+Maximum amount of time (in milliseconds) to wait for the [cutover thresholds](../../features/sharding/resharding/cutover.md#thresholds) to be met. If exceeded, PgDog will take the action specified by [`cutover_timeout_action`](#cutover_timeout_action).
+
+Default: **`30_000`** (30s)
+
+### `cutover_timeout_action`
+
+Action to take when [`cutover_timeout`](#cutover_timeout) is exceeded.
+
+Available options:
+
+- `abort` (default): abort the cutover and resume traffic on the source database
+- `cutover`: proceed with the cutover to the destination database
+
+Default: **`abort`**
+
 ## Logging
 
 ### `log_connections`
diff --git a/docs/features/sharding/resharding/cutover.md b/docs/features/sharding/resharding/cutover.md
index 6b479e9..a8cd718 100644
--- a/docs/features/sharding/resharding/cutover.md
+++ b/docs/features/sharding/resharding/cutover.md
@@ -48,6 +48,15 @@ In order for the traffic to be safely moved to the new, sharded database, it mus
 
 To suspend traffic, PgDog turns on [maintenance mode](../../../administration/maintenance_mode.md). This pauses all queries for all databases in the configuration until the maintenance mode is turned off. Clients will wait, with their queries buffered in their respective TCP connection streams. To the clients, it looks like the PgDog deployment is frozen and not responsive.
 
+The replication lag threshold at which PgDog will pause traffic automatically is configurable in [`pgdog.toml`](../../../configuration/pgdog.toml/general.md#cutover_traffic_stop_threshold):
+
+```toml
+[general]
+cutover_traffic_stop_threshold = 1_000_000 # 1 MB
+```
+
+By default, it's set to 1 MB, which is low enough that when traffic is paused, the two databases will synchronize very quickly.
+
 ### Synchronize databases
 
 With the traffic paused, the logical replication stream will drain any remaining transactions into the destination database, bringing the replication lag down to zero. At this point in the cutover process, the two databases are byte-for-byte identical and traffic can be safely moved to the destination database.
@@ -69,17 +78,46 @@ When enabled, PgDog will backup both configuration files, `pgdog.toml` as `pgdog
 !!! note "Multi-node deployments"
     If you're running more than one PgDog node, you should consider deploying our [Enterprise Edition](../../../enterprise_edition/index.md), which has support for saving the configuration files on multiple PgDog nodes at the same time.
+
+#### Thresholds
+
+Before swapping the configuration, PgDog waits for the two databases to be completely identical. The thresholds that control this are configurable:
+
+```toml
+[general]
+cutover_replication_lag_threshold = 0 # 0 bytes
+cutover_last_transaction_delay = 1_000 # 1 second
+```
+
+Due to vacuum activity and transactions affecting other tables not in the publication, the replication lag between the two databases may never reach zero. For this reason, PgDog provides two triggers for the configuration swap:
+
+1. Replication lag, set to 0 bytes by default
+2. Time since the last transaction executed on any table in the publication
+
+The latter is computed from messages received via the replication stream and is a reliable metric of database activity for the tables in the publication.
+
+##### Timeout
+
+If these thresholds are not hit within a reasonable amount of time, PgDog will abort the cutover and resume traffic on the source database. This behavior is configurable:
+
+```toml
+[general]
+cutover_timeout = 30_000 # 30 seconds
+cutover_timeout_action = "abort" # or "cutover"
+```
+
+If `cutover_timeout_action` is set to `"cutover"` instead, PgDog will switch traffic to the destination database. This is an acceptable course of action in environments where data integrity is not paramount, or where the operator is absolutely certain that both databases are identical.
 
 ### Reverse replication
 
 To allow for rollbacks in case of any issues, prior to allowing queries on the new database, PgDog creates logical replication streams from the new database back to the original database. This replicates any writes made to the new database back to the source, keeping the two databases in-sync until the operator is satisfied that the new database is performing adequately.
- Cross-shard queries
+ Reverse replication
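+
+Conceptually, this is equivalent to running native logical replication in the opposite direction, with no initial table copy, since the two databases are already identical at this point. A rough sketch using stock Postgres commands (the publication, subscription, and connection details below are hypothetical, for illustration only):
+
+```postgresql
+-- On the destination (new) database: publish the sharded tables
+CREATE PUBLICATION reverse_pub FOR ALL TABLES;
+
+-- On the original (source) database: subscribe without copying
+-- existing data, since both databases are already in-sync
+CREATE SUBSCRIPTION reverse_sub
+    CONNECTION 'host=new-db dbname=prod'
+    PUBLICATION reverse_pub
+    WITH (copy_data = false);
+```
+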
 The reverse replication is created while the queries to both databases are paused, so it doesn't require any additional data copying or synchronization.
 
 ### Resume queries
 
-With the reverse replication setup, it is now safe to move traffic to the destination (now source) database. PgDog does this by turning off [maintenance mode](../../../administration/maintenance_mode.md), and this step concludes the cutover. The entire process takes less than a second, typically, and allows PgDog to reshard Postgres databases without downtime.
+With the reverse replication set up, it is now safe to move traffic to the destination (now source) database. PgDog does this by turning off [maintenance mode](../../../administration/maintenance_mode.md), and this step concludes the cutover. The entire process typically takes less than a second, and allows PgDog to reshard Postgres databases without downtime.
diff --git a/docs/features/sharding/resharding/hash.md b/docs/features/sharding/resharding/hash.md
index 487397c..ce2a821 100644
--- a/docs/features/sharding/resharding/hash.md
+++ b/docs/features/sharding/resharding/hash.md
@@ -8,12 +8,41 @@ Moving data from the source to the destination database is done using logical re
 The underlying mechanism is very similar to Postgres [subscriptions](https://www.postgresql.org/docs/current/sql-createsubscription.html), with some improvements, and happens in two steps:
 
-1. Copy data in the [publication](schema.md#publication) to the destination database
-2. Stream row changes in real-time
+| Step | Description |
+|-|-|
+| [Copy data](#how-it-works) | Copy data from all tables in the [publication](schema.md#publication) to the destination database. |
+| [Stream row changes](#streaming-updates) | Stream row changes in real-time, keeping both source and destination databases in-sync. |
+
+Once the replication stream synchronizes the two database clusters, the data on the destination cluster will be identical to the source cluster with a few milliseconds of delay.
+
+## Performing the move
+
+Moving data can be done in one of two ways:
+
+1. Using an [admin database](../../../administration/index.md) command
+2. Using a CLI command
+
+### Admin database command
+
+The admin database provides a way to execute commands without having to spawn an independent PgDog process. To move data and replicate rows from the source database to the destination, you can run the following command:
+
+```
+COPY_DATA <source_database> <destination_database> <publication> [<replication_slot>];
+```
+
+This will spawn a background task that will copy all tables in the [publication](schema.md#publication) to the destination database, while redistributing rows between shards. Once the copy is complete, PgDog will proceed to stream row changes, keeping the two databases in close synchronization.
+
+#### Example
-Once the replication stream synchronizes the two database clusters, the data on the destination cluster will be identical, within a few milliseconds, to the source cluster.
-## CLI
+To copy data from database `"prod"` to database `"prod_sharded"`, using the `"all_tables"` publication, execute the following command:
-```
+COPY_DATA prod prod_sharded all_tables;
+```
+
+The name of the replication slot will be automatically generated.
+
+### CLI
 
 PgDog has a command line interface you can call by running it directly:
 
@@ -37,7 +66,7 @@ Required (*) and optional parameters for this command are as follows:
 
 ## How it works
 
-The first thing PgDog will do when data sync is started is create a replication slot on each primary database in the source cluster. This will prevent Postgres from removing the WAL, while we copy data for each table to the destination.
+The first thing PgDog will do when data sync is started is create a replication slot on each primary database in the source cluster. This will prevent Postgres from removing the WAL while PgDog copies data for each table to the destination.
 
 Next, each table will be copied, in parallel, to the destination database, using [sharded COPY](../cross-shard-queries/copy.md). Once that's done, table changes are synchronized, in real-time, with logical replication from the replication slot created earlier.
 
@@ -48,7 +77,7 @@ The whole process happens entirely online, and doesn't require database reboots
 
 PostgreSQL replication works on the basis of slots. They are virtual annotations in the Write-Ahead Log which prevent Postgres from recycling WAL segments and deleting the history of changes made to the database.
- Cross-shard queries
+ Replication slot
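+
+A replication slot and the WAL it retains can be inspected with standard SQL; for example (the slot name below is made up for illustration):
+
+```postgresql
+-- Create a logical replication slot with the built-in pgoutput plugin
+SELECT pg_create_logical_replication_slot('example_slot', 'pgoutput');
+
+-- Inspect existing slots and the WAL positions they hold on to
+SELECT slot_name, restart_lsn, confirmed_flush_lsn
+FROM pg_replication_slots;
+```
+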
 With logical replication, any client that speaks the protocol (like PgDog) can connect to the server and stream changes made to the database, starting at the position marked by the slot.
 
@@ -69,7 +98,7 @@ Tables are copied from source to destination database using standard PostgreSQL
 
 If you are running PostgreSQL 16 or later and have configured replicas on the source database, PgDog can copy multiple tables in parallel, dramatically accelerating this process.
- Cross-shard queries
+ Parallel table copy
 To set this up, make sure to add your read replicas to [`pgdog.toml`](../../../configuration/pgdog.toml/databases.md), for example:
 
@@ -92,6 +121,16 @@ PgDog will distribute the table copy load evenly between all replicas in the con
 
     To prevent the resharding process from impacting production queries, you can create a separate set of replicas just for resharding. Managed clouds (e.g., AWS RDS) make this especially easy, but require a warm-up period to fetch all the data from the backup snapshot, before they can read data at full speed of their storage volumes.
+
+    To make sure dedicated replicas are not used for read queries in production, you can configure PgDog to use them for resharding only:
+
+    ```toml
+    [[databases]]
+    name = "prod"
+    host = "10.0.0.1"
+    role = "replica"
+    resharding_only = true
+    ```
 
 #### Binary `COPY`
 
@@ -105,26 +144,79 @@ PgDog uses the binary `COPY` format to send and receive data, which has been sho
 
 Once all tables are copied and resharded on the destination database cluster, PgDog will begin streaming real-time row updates from the [replication slot](#replication-slot).
 
+#### Integer primary keys
+
+If your primary keys use the `INTEGER` data type (like in older Rails versions, for example), PgDog will automatically migrate them to use `BIGINT` during the resharding process. This is required because PgDog's [unique ID](../unique-ids.md) generation function, which replaces sequences in sharded databases, produces 64-bit integers.
+
+If this is the case, the binary `COPY` will not work, and you need to use the text `COPY` protocol instead. This can be configured in [`pgdog.toml`](../../../configuration/pgdog.toml/general.md#resharding_copy_format):
+
+```toml
+[general]
+resharding_copy_format = "text"
+```
+
+#### Monitoring progress
+
+The table copies can take some time, and PgDog provides a real-time view to monitor their progress in the admin database:
+
+=== "Admin command"
+    ```
+    SHOW TABLE_COPIES;
+    ```
+=== "Output"
+    ```
+    -[ RECORD 1 ]-------+--------------------------------------------------
+    schema              | public
+    table               | orders
+    status              | running
+    rows                | 845000
+    rows_human          | 845,000
+    bytes               | 124500000
+    bytes_human         | 118.73 MB
+    bytes_per_sec       | 4150000
+    bytes_per_sec_human | 3.96 MB
+    elapsed             | 00:00:30:000
+    elapsed_ms          | 30000
+    sql                 | COPY public.orders (id, user_id, total, created_at)
+    ```
+
 ### Streaming updates
 
-Once tables are copied over to the destination database, PgDog will stream any changes made to those tables from the [replication slot](#replication-slot) created previously. If it took a while to copy tables and the source database received a large volume of writes, this process could take some time.
+Once tables are copied over to the destination database, PgDog will stream any changes made to those tables from the [replication slot](#replication-slot) created in the previous step. If it took a while to copy tables and the source database received a large volume of writes, this process could take some time.
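+
+Since the remaining replication work is measured in bytes of WAL, it can also be computed directly on the source database; a sketch using built-in Postgres functions (assuming a single replication slot):
+
+```postgresql
+-- Bytes of WAL produced but not yet confirmed by the destination
+SELECT slot_name,
+       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
+FROM pg_replication_slots;
+```
+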
-You can check on the streaming process and estimate its ETA by querying the `pg_replication_slots` view on the __source__ database:
+You can check on the streaming process and estimate its ETA by querying the admin database:
 
-=== "Source database"
-    ```postgresql
-    SELECT confirmed_flush_lsn, pg_current_wal_lsn() FROM pg_replication_slots;
+=== "Admin command"
+    ```
+    SHOW REPLICATION_SLOTS;
+    ```
+=== "Output"
+    ```
+    -[ RECORD 1 ]-------+-------------------------------
+    host                | 10.0.1.5
+    port                | 5432
+    database_name       | prod
+    name                | pgdog_slot_0
+    lsn                 | 0/1A3B4C80
+    lag                 | 2.35 MB
+    lag_bytes           | 2465792
+    copy_data           | t
+    last_transaction    | 2026-03-03 22:22:08.341 UTC
+    last_transaction_ms | 1250
     ```
 
-| Column | Description |
-|-|-|
-| `confirmed_flush_lsn` | The transaction identifier that has been written to the destination database cluster. |
-| `pg_current_wal_lsn()` | Current position in the Write-Ahead Log for this database. |
+Some of this information can be obtained by querying the source database as well, for example:
+
+```postgresql
+SELECT
+    pg_replication_slots.*,
+    pg_current_wal_lsn()
+FROM pg_replication_slots;
+```
 
 The replication delay between the two database clusters is measured in bytes. When that number reaches zero, the two databases are byte-for-byte identical, and traffic can be [cut over](cutover.md) to the destination database.
 
 ## Next steps
 
 {{ next_steps_links([
-    ("Traffic cutover", "cutover.md", "Switch live traffic to the new shard configuration."),
+    ("Traffic cutover", "cutover.md", "Switch live traffic to the destination database."),
 ]) }}
diff --git a/mkdocs.yml b/mkdocs.yml
index 5154fa1..4808097 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -33,7 +33,6 @@ theme:
     - navigation.top
     - navigation.footer
     - search.suggest
-    - search.highlight
     - search.share
     - toc.follow