diff --git a/content/en/altinity-kb-useful-queries/detached-parts.md b/content/en/altinity-kb-useful-queries/detached-parts.md
index 269e7e6586..95b1a8132a 100644
--- a/content/en/altinity-kb-useful-queries/detached-parts.md
+++ b/content/en/altinity-kb-useful-queries/detached-parts.md
@@ -15,9 +15,11 @@ This article explains what detached parts are in ClickHouse® (why they appear,
 
 Detached parts act like the “Recycle Bin” in Windows. When ClickHouse® deems some data unneeded—often during internal reconciliations at server startup—it moves the data to the detached area instead of deleting it immediately.
 
-Recovery: If you’re missing data due to misconfiguration or an error (such as connecting to the wrong ZooKeeper), check the detached parts. The missing data might be recoverable through manual intervention.
+You can perform two main operations with detached parts:
 
-Cleanup: Otherwise, clean up the detached parts periodically to free disk space.
+- **Recovery**: If you’re missing data due to misconfiguration or an error (such as connecting to the wrong ZooKeeper), check the detached parts. The missing data might be recoverable through manual intervention.
+
+- **Cleanup**: Otherwise, clean up the detached parts periodically to free disk space.
 
 Regarding detached parts and the absence of an automatic cleanup feature within ClickHouse®: this was a deliberate decision, as there is a possibility that data may appear there due to a bug in ClickHouse®'s code, a hardware error (such as a memory error or disk failure), etc. In such cases, automatic cleanup is not desirable.
 
@@ -37,27 +39,38 @@ ClickHouse® users should monitor for detached parts and act quickly when they a
 
 Note on **unexpected** vs **ignored** (simple rule of thumb): **unexpected** is like a “we found this in the attic” tag, while **ignored** is like “we already replaced this, keep it aside.” In ReplicatedMergeTree startup sanity checks, parts that are unexpected relative to ZooKeeper are typically renamed to **ignored**. So a part found on disk but missing in ZooKeeper will usually appear as **ignored**, not **unexpected**, even though **unexpected** is a valid reason in the codebase.
 
-Important distinction for ReplicatedMergeTree: ClickHouse® tracks expected parts from ZooKeeper and unexpected parts found locally. Broken expected parts increment the `max_suspicious_broken_parts` counter (can block startup). Broken unexpected parts use a separate counter and do not block startup.
+Important distinction for ReplicatedMergeTree: ClickHouse® tracks expected parts from ZooKeeper and unexpected parts found locally:
+  - Broken expected parts increment the `max_suspicious_broken_parts` counter (can block startup).
+  - Broken unexpected parts use a separate counter and do not block startup.
 
-**Safe to delete (after validation):** **ignored**, **clone**.
+### Detailed actions based on the `reason` of detached parts:
 
-**Temporary - do not delete while in progress:** **attaching**, **deleting**, **tmp-fetch**.
-**Investigate before deleting:** **broken**, **broken-on-start**, **broken-from-backup**, **covered-by-broken**, **noquorum**, **merge-not-byte-identical**, **mutate-not-byte-identical**.
+- **Safe to delete (after validation):**
+  - ignored
+  - clone
 
-If the `system.part_log` table is enabled you can find some information there. Otherwise you will need to look in `clickhouse-server.log` for what happened when the parts were detached.
-If there is another way you could confirm that there is no data loss in the affected tables, you could simply delete all detached parts.
+- **Temporary, do not delete while in progress:**
+  - attaching
+  - deleting
+  - tmp-fetch
 
-Again, it is important to monitor for detached parts and act quickly when they appear. If `clickhouse-server.log` is lost it might be impossible to figure out what happened and why the parts were detached.
-You can use `system.asynchronous_metrics` or `system.detached_parts` for monitoring.
-```sql
-select metric from system.asynchronous_metrics where metric ilike '%detach%'
+- **Investigate before deleting:**
+  - broken
+  - broken-on-start
+  - broken-from-backup
+  - covered-by-broken
+  - noquorum
+  - merge-not-byte-identical
+  - mutate-not-byte-identical
 
-NumberOfDetachedByUserParts
-NumberOfDetachedParts
-```
-Here is a quick way to find out if you have detached parts along with the reason why.
+### Monitoring of detached parts
+
+Check `clickhouse-server.log` to see what happened when the parts were detached during startup. If `clickhouse-server.log` is lost, it might be impossible to figure out what happened and why the parts were detached.
+
+The `system.detached_parts` table also contains useful information:
+
 ```sql
 SELECT database, table, reason, count()
 FROM system.detached_parts
@@ -65,60 +78,74 @@ GROUP BY database, table, reason
 ORDER BY database ASC, table ASC, reason ASC
 ```
 
-### drop detached
-The DROP DETACHED command in ClickHouse® is used to remove parts or partitions that have previously been detached (i.e., moved to the detached directory and forgotten by the server). The syntax is:
+It is important to monitor for detached parts and act quickly when they appear. You can use `system.asynchronous_metrics` and `system.asynchronous_metric_log` to track some metrics.
 
-```
-ALTER TABLE table_name [ON CLUSTER cluster] DROP DETACHED PARTITION|PART ALL|partition_expr
+Use `system.asynchronous_metrics` for current values:
+
+```sql
+SELECT metric, value
+FROM system.asynchronous_metrics
+WHERE metric IN ('NumberOfDetachedParts', 'NumberOfDetachedByUserParts')
+ORDER BY metric;
 ```
 
-This command removes the specified part or all parts of the specified partition from the detached directory. For more details on how to specify the partition expression, see the documentation on how to set the partition expression DROP DETACHED PARTITION|PART.
+Use `system.asynchronous_metric_log` for history/trends:
 
-Note: You must have the allow_drop_detached setting enabled to use this command allow_drop_detached
+```sql
+SELECT
+    event_time,
+    metric,
+    value
+FROM system.asynchronous_metric_log
+WHERE metric IN ('NumberOfDetachedParts', 'NumberOfDetachedByUserParts')
+  AND event_time > now() - INTERVAL 24 HOUR
+ORDER BY event_time DESC, metric;
+```
 
-### drop all script
+### DROP DETACHED command
+The DROP DETACHED command in ClickHouse® is used to remove parts or partitions that have previously been detached (i.e., moved to the detached directory and forgotten by the server). The syntax is:
 
-Here is a query that can help with investigations. It looks for active parts containing the same data blocks as the detached parts. It
-generates commands to drop the detached parts.
+{{% alert title="Warning" color="warning" %}}
+Be careful before dropping any detached part or partition. Validate that data is no longer needed and keep a backup before running destructive commands.
+{{% /alert %}}
 
 ```sql
-with ['broken','unexpected','noquorum','ignored','broken-on-start','clone','attaching','deleting','tmp-fetch',
-      'covered-by-broken','merge-not-byte-identical','mutate-not-byte-identical','broken-from-backup'] as DETACH_REASONS
-select a.*,
-    concat('alter table ',database,'.',table,' drop detached part ''',a.name,''' settings allow_drop_detached=1;') as drop,
-    concat('sudo rm -r ',a.path) as rm
-from (select * replace(part[1] as partition_id, toInt64(part[2]) as min_block_number, toInt64(part[3]) as max_block_number),
-    arrayFilter(x -> x not in DETACH_REASONS, splitByChar('_',name)) as part
-from system.detached_parts) a
-left join (select database, table, partition_id, name, active, min_block_number, max_block_number from system.parts where active) b
-on a.database=b.database and a.table=b.table and a.partition_id=b.partition_id
-where a.min_block_number >= b.min_block_number
-  and a.max_block_number <= b.max_block_number
-order by table, min_block_number, max_block_number
-settings join_use_nulls=1
+ALTER TABLE table_name [ON CLUSTER cluster] DROP DETACHED PARTITION|PART ALL|partition_expr
 ```
 
-### Other reasons
+This command removes the specified part or all parts of the specified partition from the detached directory. For more details on how to specify the partition expression, see the documentation on how to set the partition expression DROP DETACHED PARTITION|PART.
 
-```
-broken
-unexpected
-ignored
-noquorum
-  merge-not-byte-identical
-  mutate-not-byte-identical
-broken-on-start
-broken-from-backup
-clone
-attaching
-deleting
-tmp-fetch
-  covered-by-broken
+Note: You must have the `allow_drop_detached` setting enabled to use this command.
+
+#### DROP ALL DML
+
+{{% alert title="Warning" color="warning" %}}
+Review generated `DROP DETACHED` commands carefully before executing them. They can cause data loss if used incorrectly. Ensure you have a valid backup before destructive operations.
+{{% /alert %}}
+
+Here is a query that can help with investigations. It looks for active parts containing the same data blocks as the detached parts and generates commands to drop the detached parts.
+
+```sql
+SELECT a.*,
+    concat('ALTER TABLE ',a.database,'.',a.table,' DROP DETACHED PART ''',a.name,''' SETTINGS allow_drop_detached=1;') AS drop,
+/* concat('sudo rm -r ',a.path) AS rm */
+FROM system.detached_parts AS a
+LEFT JOIN (
+    SELECT database, table, partition_id, name, active, min_block_number, max_block_number
+    FROM system.parts
+    WHERE active
+) b
+ON a.database = b.database AND a.table = b.table AND a.partition_id = b.partition_id
+WHERE a.min_block_number IS NOT NULL
+  AND a.max_block_number IS NOT NULL
+  AND a.min_block_number >= b.min_block_number
+  AND a.max_block_number <= b.max_block_number
+ORDER BY a.table, a.min_block_number, a.max_block_number
+SETTINGS join_use_nulls=1
 ```
 
-**covered-by-broken** means ClickHouse® detected a broken part during initialization of a replicated table and decided to refetch it from healthy replicas. The broken part is detached as `broken`, and if that part was a result of merge or mutation, all previous generations are marked `covered-by-broken`. Once the healthy final part is restored, you do not need the `covered-by-broken` parts.
-The list of DETACH_REASONS: https://github.com/ClickHouse/ClickHouse/blob/master/src/Storages/MergeTree/MergeTreePartInfo.h#L163
+The list of `DETACH_REASONS`: [MergeTreePartInfo.h#L163](https://github.com/ClickHouse/ClickHouse/blob/master/src/Storages/MergeTree/MergeTreePartInfo.h#L163)
 
 ### Further reading
 
@@ -128,16 +155,16 @@ Altinity blog: *Understanding Detached Parts in ClickHouse®* - https://altinity
 
 | Detached part type | Source code reference |
 | --- | --- |
-| `broken` | https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/StorageReplicatedMergeTree.cpp#L2306-L2334 |
-| `unexpected` | https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/MergeTree/MergeTreeData.cpp#L5389-L5393 |
-| `ignored` | https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/MergeTree/MergeTreeSettings.cpp#L507-L512 |
-| `noquorum` | https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/MergeTree/ReplicatedMergeTreeRestartingThread.cpp#L264-L284 |
-| `broken-on-start` | https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/MergeTree/MergeTreeData.cpp#L2301-L2399 |
-| `clone` | https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/StorageReplicatedMergeTree.cpp#L3510-L3518 |
-| `attaching` | https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/MergeTree/MergeTreeData.cpp#L7541-L7671 |
-| `deleting` | https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/MergeTree/MergeTreeData.cpp#L7541-L7583 |
-| `tmp-fetch` | https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/MergeTree/DataPartsExchange.cpp#L408-L413 |
-| `covered-by-broken` | https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/StorageReplicatedMergeTree.cpp#L4571-L4588 |
-| `merge-not-byte-identical` | https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/MergeTree/MergeFromLogEntryTask.cpp#L441-L443 |
-| `mutate-not-byte-identical` | https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/MergeTree/MutateFromLogEntryTask.cpp#L278-L280 |
-| `broken-from-backup` | https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/MergeTree/MergeTreeData.cpp#L6919-L6934 |
+| `broken` | [StorageReplicatedMergeTree.cpp](https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/StorageReplicatedMergeTree.cpp#L2306-L2334) |
+| `unexpected` | [MergeTreeData.cpp](https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/MergeTree/MergeTreeData.cpp#L5389-L5393) |
+| `ignored` | [MergeTreeSettings.cpp](https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/MergeTree/MergeTreeSettings.cpp#L507-L512) |
+| `noquorum` | [ReplicatedMergeTreeRestartingThread.cpp](https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/MergeTree/ReplicatedMergeTreeRestartingThread.cpp#L264-L284) |
+| `broken-on-start` | [MergeTreeData.cpp](https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/MergeTree/MergeTreeData.cpp#L2301-L2399) |
+| `clone` | [StorageReplicatedMergeTree.cpp](https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/StorageReplicatedMergeTree.cpp#L3510-L3518) |
+| `attaching` | [MergeTreeData.cpp](https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/MergeTree/MergeTreeData.cpp#L7541-L7671) |
+| `deleting` | [MergeTreeData.cpp](https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/MergeTree/MergeTreeData.cpp#L7541-L7583) |
+| `tmp-fetch` | [DataPartsExchange.cpp](https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/MergeTree/DataPartsExchange.cpp#L408-L413) |
+| `covered-by-broken` | [StorageReplicatedMergeTree.cpp](https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/StorageReplicatedMergeTree.cpp#L4571-L4588) |
+| `merge-not-byte-identical` | [MergeFromLogEntryTask.cpp](https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/MergeTree/MergeFromLogEntryTask.cpp#L441-L443) |
+| `mutate-not-byte-identical` | [MutateFromLogEntryTask.cpp](https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/MergeTree/MutateFromLogEntryTask.cpp#L278-L280) |
+| `broken-from-backup` | [MergeTreeData.cpp](https://github.com/ClickHouse/ClickHouse/blob/53e451c70f33f167efe57dbf455ff9776d6e880f/src/Storages/MergeTree/MergeTreeData.cpp#L6919-L6934) |