6 changes: 6 additions & 0 deletions src/current/molt/migration-strategy.md
@@ -47,6 +47,8 @@ It's important to fully [prepare the migration](#prepare-for-migration) in order

## Capacity planning

### Cluster sizing

To size the target CockroachDB cluster, consider your data volume and workload characteristics:

- What is the total size of the data you will migrate?
@@ -64,6 +66,10 @@ If you are migrating to a CockroachDB {{ site.data.products.core }} cluster:
- Refer to our [storage recommendations]({% link {{ site.current_cloud_version }}/recommended-production-settings.md %}#storage) to determine the amount of storage to provision on each node.
- For guidance on sizing for connection pools, see the CockroachDB {{ site.data.products.core }} [Production Checklist]({% link {{ site.current_cloud_version }}/recommended-production-settings.md %}#connection-pooling).

### Memory allocation

MOLT Fetch buffers data in memory regardless of the [data path]({% link molt/molt-fetch.md %}#data-path) used. For memory sizing requirements, refer to [Memory requirements]({% link molt/molt-fetch.md %}#memory-requirements).

## Application changes

As you develop your migration plan, consider the application changes that you will need to make. These may include the following:
32 changes: 17 additions & 15 deletions src/current/molt/molt-fetch.md
@@ -372,12 +372,6 @@ For example, if you are migrating to CockroachDB {{ site.data.products.cloud }},

`--direct-copy` specifies that MOLT Fetch should use `COPY FROM` to move the source data directly to CockroachDB without an intermediate store:

- Because the data is held in memory, the machine must have sufficient RAM for the data currently in flight:

~~~
average size of each row * --row-batch-size * --export-concurrency * --table-concurrency
~~~

- Direct copy does not support compression or [continuation](#fetch-continuation).
- The [`--use-copy`](#data-load-mode) flag is redundant with `--direct-copy`.

@@ -1095,23 +1089,31 @@ To verify that your connections and configuration work properly, run MOLT Fetch

- If a PostgreSQL database is set as a [source](#source-and-target-databases), ensure that [`idle_in_transaction_session_timeout`](https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-IDLE-IN-TRANSACTION-SESSION-TIMEOUT) on PostgreSQL is either disabled or set to a value longer than the duration of the [data export phase](#data-export-phase). Otherwise, the connection will be prematurely terminated. To estimate the time needed to export the PostgreSQL tables, you can perform a dry run and sum the value of [`molt_fetch_table_export_duration_ms`](#monitoring) for all exported tables.

### Optimize performance
### Memory requirements

- {% include molt/molt-drop-constraints-indexes.md %}
- MOLT Fetch buffers data in memory before flushing to files or the target database, regardless of which [data path](#data-path) you use. To prevent out-of-memory failures during [data export](#data-export-phase), estimate the amount of memory used to export tables:

~~~
average size of each row * --row-batch-size * --export-concurrency * --table-concurrency
~~~

- For PostgreSQL sources using [`--use-stats-based-sharding`](#global-flags), run [`ANALYZE`]({% link {{ site.current_cloud_version }}/create-statistics.md %}) on source tables before migration to ensure optimal shard distribution. This is especially important for large tables where even distribution can significantly improve export performance.
If you are exporting multiple tables concurrently (i.e., [`--table-concurrency`](#global-flags) is set higher than `1`), estimate the memory requirement using the table with the largest average row size. For details on how concurrency and sharding interact, refer to [Table sharding](#table-sharding).
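As a rough illustration of the preceding formula, the in-flight memory for an export can be estimated with a few lines of arithmetic. All numbers below are hypothetical placeholders, not recommended settings:

```python
def export_memory_bytes(avg_row_bytes, row_batch_size,
                        export_concurrency, table_concurrency):
    """Estimate peak memory buffered during data export.

    Mirrors the formula: average row size * --row-batch-size
    * --export-concurrency * --table-concurrency
    """
    return (avg_row_bytes * row_batch_size
            * export_concurrency * table_concurrency)

# Hypothetical workload: 512-byte average rows, 100,000-row batches,
# 4 export shards per table, 2 tables exported concurrently.
estimate = export_memory_bytes(512, 100_000, 4, 2)
print(f"{estimate / 2**30:.1f} GiB")  # ~0.4 GiB
```

Per the guidance above, when multiple tables export concurrently, use the largest average row size among them as `avg_row_bytes` to get a conservative estimate.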

- To prevent memory outages during `READ COMMITTED` [data export](#data-export-phase) of tables with large rows, estimate the amount of memory used to export a table:
Contributor Author: I got rid of the READ COMMITTED detail as it didn't seem that relevant. Let me know if I should add it back.

Reviewer: looks good without it!

Insufficient memory can cause degraded throughput, process crashes, or migration failures. To prevent memory issues:

~~~
--row-batch-size * --export-concurrency * average size of the table rows
~~~
- Allocate enough memory for MOLT Fetch, based on the preceding formula.
- Adjust [`--row-batch-size`](#global-flags), [`--export-concurrency`](#global-flags), or [`--table-concurrency`](#global-flags) if memory is constrained.
- Monitor memory utilization during the migration.
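When memory is constrained, the export formula can also be inverted to find the largest `--row-batch-size` that fits a given budget. This is a sketch; the budget and row size are hypothetical:

```python
def max_row_batch_size(memory_budget_bytes, avg_row_bytes,
                       export_concurrency, table_concurrency):
    """Largest --row-batch-size whose estimated buffered data
    stays within the given memory budget."""
    return memory_budget_bytes // (
        avg_row_bytes * export_concurrency * table_concurrency)

# Hypothetical: 8 GiB budget, 1 KiB average rows,
# 4 export shards, 1 table exported at a time.
print(max_row_batch_size(8 * 2**30, 1024, 4, 1))  # 2097152
```

The same function can be reused to check whether a proposed flag combination fits before starting a long-running migration.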

### Optimize performance

- {% include molt/molt-drop-constraints-indexes.md %}

If you are exporting more than one table at a time (i.e., [`--table-concurrency`](#global-flags) is set higher than `1`), add the estimated memory usage for the tables with the largest row sizes. Ensure that you have sufficient memory to run `molt fetch`, and adjust `--row-batch-size` accordingly. For details on how concurrency and sharding interact, refer to [Table sharding](#table-sharding).
- For PostgreSQL sources using [`--use-stats-based-sharding`](#global-flags), run [`ANALYZE`](https://www.postgresql.org/docs/current/sql-analyze.html) on source tables before migration to ensure optimal shard distribution. This is especially important for large tables where even distribution can significantly improve export performance.

- If a table in the source database is much larger than the other tables, [filter and export the largest table](#schema-and-table-selection) in its own `molt fetch` task. Repeat this for each of the largest tables. Then export the remaining tables in another task.

- Ensure that the machine running MOLT Fetch is large enough to handle the amount of data being migrated. Fetch performance can sometimes be limited by available resources, but should always be making progress. To identify possible resource constraints, observe the `molt_fetch_rows_exported` [metric](#monitoring) for decreases in the number of rows being processed. You can use the [sample Grafana dashboard](https://molt.cockroachdb.com/molt/cli/grafana_dashboard.json) to view metrics. For details on optimizing export performance through sharding, refer to [Table sharding](#table-sharding).
- Ensure that the machine running MOLT Fetch is equipped to handle the amount of data being migrated. Fetch performance can sometimes be limited by available resources, but should always make progress. To identify possible resource constraints, observe the `molt_fetch_rows_exported` [metric](#monitoring) for decreases in the number of rows being processed. You can use the [sample Grafana dashboard](https://molt.cockroachdb.com/molt/cli/grafana_dashboard.json) to view metrics. For details on optimizing export performance through sharding, refer to [Table sharding](#table-sharding).

### Import and continuation handling
