Skip to content

Fixing an extra FORMAT JSON in the query when attempting to insert a record#262

Closed
AlexandrFiner wants to merge 1 commit into
smi2:masterfrom
AlexandrFiner:master
Closed

Fixing an extra FORMAT JSON in the query when attempting to insert a record#262
AlexandrFiner wants to merge 1 commit into
smi2:masterfrom
AlexandrFiner:master

Conversation

@AlexandrFiner

Copy link
Copy Markdown

Summary

  • If the query contains the substring ON CLUSTER, an extra FORMAT JSON is added, which breaks the query

Problem

  • Checking for the presence of ON CLUSTER in a query does not take into account that it may be part of the data

Solution

  • A fast stripos() check is used as a first-level filter to detect the presence of ON CLUSTER in the SQL string.
  • preg_match() is only executed when the substring is present, providing a strict validation of the ON CLUSTER keyword.
  • Uses a regex to detect ON CLUSTER only as a SQL keyword, ignoring occurrences inside string literals.

@jspeedz

jspeedz commented May 19, 2026

Copy link
Copy Markdown
Contributor

I was working on something like this patch as a stab at getting this compatible with all possible statements. But I'm not convinced yet that this is the way to go or not. There are also compatibility issues with clickhouse 21 with my patch.

Index: src/Transport/Http.php
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/Transport/Http.php b/src/Transport/Http.php
--- a/src/Transport/Http.php	(revision dbdffa3c5149ecf10ad0e888d8411fafe8e822f5)
+++ b/src/Transport/Http.php	(date 1779216111407)
@@ -682,22 +682,47 @@
 
         $query = $this->prepareQuery($sql, $bindings);
 
-        if (strpos($sql, 'ON CLUSTER') === false) {
-            return $this->getRequestWrite($query, $querySettings);
-        }
-        if (
-            !str_starts_with($sql, 'CREATE')
-            && !str_starts_with($sql, 'DROP')
-            && !str_starts_with($sql, 'ALTER')
-            && !str_starts_with($sql, 'RENAME')
-            && !str_starts_with($sql, 'GRANT')
-        ) {
+        if ($this->supportsOutputFormat($sql)) {
             $query->setFormat('JSON');
         }
 
         return $this->getRequestWrite($query, $querySettings);
     }
 
+    private function supportsOutputFormat(string $sql): bool
+    {
+        $sql = ltrim($sql);
+
+        if (preg_match(
+            '/^(?:WITH\b.*\bSELECT\b|SELECT\b|SHOW\b|DESC(?:RIBE)?\b|EXISTS\b|CHECK\s+TABLE\b|EXPLAIN\b|'
+            . 'ATTACH\b|DETACH\b|RENAME\b|TRUNCATE\b|UNDROP\b|OPTIMIZE\b|WATCH\b|BACKUP\b|RESTORE\b|SNAPSHOT\b)/ius',
+            $sql
+        )) {
+            return true;
+        }
+
+        if (preg_match('/^KILL\s+QUERY\b/iu', $sql)) {
+            return true;
+        }
+
+        if (preg_match(
+            '/^CREATE\s+(?:(?:OR\s+REPLACE|IF\s+NOT\s+EXISTS)\s+)?'
+            . '(?:USER|ROLE|QUOTA|(?:ROW\s+)?POLICY|(?:SETTINGS\s+)?PROFILE|FUNCTION|NAMED\s+COLLECTION)\b/iu',
+            $sql
+        )) {
+            return false;
+        }
+
+        if (preg_match(
+            '/^(?:ALTER|DROP)\s+(?:USER|ROLE|QUOTA|(?:ROW\s+)?POLICY|(?:SETTINGS\s+)?PROFILE|FUNCTION|NAMED\s+COLLECTION)\b/iu',
+            $sql
+        )) {
+            return false;
+        }
+
+        return preg_match('/^(?:CREATE|ALTER|DROP)\b/iu', $sql) === 1;
+    }
+
     /**
      * @return bool
      * @throws \ClickHouseDB\Exception\TransportException

isublimity added a commit that referenced this pull request Jun 10, 2026
… JSON

Distributed DDL (... ON CLUSTER ...) returns a per-host status result set.
Previously the client appended "FORMAT JSON" to the SQL text, which caused
three problems:

- CREATE USER / GRANT / ALTER ON CLUSTER failed with SYNTAX_ERROR because
  ClickHouse does not accept a FORMAT clause on those statements (#241)
- After the FORMAT-keyword inversion, CREATE/ALTER ON CLUSTER results came
  back unformatted and Statement parsing threw "Can`t find meta" (#263)
- A bare "ON CLUSTER" substring inside INSERT data (e.g. 'REGION CLUSTER')
  was treated as the keyword, breaking the insert (#262)

Now prepareWrite() requests JSON output via the default_format=JSON query
setting (leaving the SQL untouched) and tags the request so the Statement
parses it. Detection uses \b word boundaries to match the real keyword only.

Testing:
- new tests/ClickHouse26/OnClusterTest.php (CREATE/ALTER/CREATE USER ON CLUSTER),
  skips gracefully without a Keeper/ZooKeeper backend
- embedded ClickHouse Keeper config for the test env so the single-node
  `default` cluster can run distributed DDL (docker-compose + CI)
- testInsertWithOnClusterInData on both CH21 and CH26

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@isublimity

Copy link
Copy Markdown
Contributor

Спасибо за разбор и тест-кейс с 'REGION CLUSTER' (regiON CLUSTER) — отличная находка! 🙏

Закрываю в пользу единого фикса в master (a2783e6, релиз 1.26.610), который покрывает этот случай вместе с #241 и #263. Корень всех трёх проблем — дописывание FORMAT JSON в текст SQL. Теперь формат запрашивается через настройку default_format=JSON, а SQL не модифицируется:

Ваш тест testInsertWithOnClusterInData добавлен (в обоих сьютах CH21/CH26).

@isublimity isublimity closed this Jun 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants