fix: cast enums to integers during qrep for mysql without binlog metadata …#4057
fix: cast enums to integers during qrep for mysql without binlog metadata …#4057
Conversation
|
|
||
| func buildSelectedColumns(cols []*protos.FieldDescription, exclude []string, isBinlogMetadataSupported bool) string { | ||
| columns := []string{} | ||
| selectAsterisk := true |
There was a problem hiding this comment.
This is not really necessary (we could always query columns explicitly).
But I decided that this way it will be consistent with the current behaviour (no transformations -> select *)
There was a problem hiding this comment.
Pull request overview
This PR addresses #3950 by making MySQL QRep snapshot output for ENUM columns consistent with CDC output on MySQL/MariaDB versions that don’t support binlog_row_metadata=full.
Changes:
- Add enum-to-integer casting during MySQL QRep snapshot when binlog row metadata isn’t supported.
- Add unit coverage for the column-selection logic used by QRep.
- Add an e2e test intended to validate snapshot/CDC enum consistency into ClickHouse.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| flow/connectors/mysql/qrep.go | Introduces buildSelectedColumns and uses it in QRep pulls; adds enum casting when binlog metadata is unsupported. |
| flow/connectors/mysql/mysql.go | Adds IsBinlogMetadataSupported helper to centralize version gating logic. |
| flow/connectors/mysql/cdc.go | Reuses IsBinlogMetadataSupported in CDC instead of duplicating version compare logic. |
| flow/connectors/mysql/qrep_test.go | Adds unit tests for buildSelectedColumns. |
| flow/e2e/clickhouse_mysql_test.go | Adds an e2e test for snapshot vs CDC enum consistency. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
❌ 4 Tests Failed:
View the top 3 failed test(s) by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
🔄 Flaky Test DetectedAnalysis: Both MySQL_CH and MySQL_CH_Cluster variants of Test_MySQL_Enum_Consistency timed out waiting for STATUS_SNAPSHOT to complete, a classic transient CI timing failure in end-to-end tests with external dependencies. ✅ Automatically retrying the workflow |
🔄 Flaky Test DetectedAnalysis: Test_MySQL_Enum_Consistency timed out waiting for STATUS_SNAPSHOT in the e2e MySQL→ClickHouse test suite, a transient timing failure not indicative of a code defect. ✅ Automatically retrying the workflow |
🔄 Flaky Test DetectedAnalysis: Both ✅ Automatically retrying the workflow |
handle unsigned enum in QValueFromMysqlFieldValue
🔄 Flaky Test DetectedAnalysis: Both Test_MySQL_Enum_Consistency tests failed with "UNEXPECTED STATUS TIMEOUT STATUS_SNAPSHOT", indicating the MySQL→ClickHouse snapshot phase timed out — a classic intermittent e2e timing failure under CI load rather than a code regression. ✅ Automatically retrying the workflow |
🔄 Flaky Test DetectedAnalysis: The e2e test suite timed out at exactly 900 seconds (the configured ✅ Automatically retrying the workflow |
🔄 Flaky Test DetectedAnalysis: Both ✅ Automatically retrying the workflow |
🔄 Flaky Test DetectedAnalysis: Test_Types_CH fails due to a 1-microsecond time precision mismatch (QValueTime 18h30m29.962609s vs .962608s) between PostgreSQL and ClickHouse time values, a sub-microsecond rounding sensitivity that is not deterministic across runs. ✅ Automatically retrying the workflow |
🔄 Flaky Test DetectedAnalysis: Test_Types_CH timed out after 200s in a WaitFor polling loop waiting for replicated data to match in ClickHouse, with no actual assertion mismatch — classic flaky behavior in timing-sensitive e2e replication tests under high concurrency. ✅ Automatically retrying the workflow |
🔄 Flaky Test DetectedAnalysis: TestApiMongo/TestFlowStatusUpdate failed with "Failed to connect temporal client: failed reaching server: context deadline exceeded" — a transient Temporal server connectivity timeout after the test suite ran for ~750 seconds under high concurrency, not a code regression. ✅ Automatically retrying the workflow |
🔄 Flaky Test DetectedAnalysis: The e2e test ✅ Automatically retrying the workflow |
🔄 Flaky Test DetectedAnalysis: Two e2e tests failed due to timing issues: a snapshot phase timeout in TestGenericBQ/Test_Simple_Flow and a race condition during teardown (active replication slot) in TestPeerFlowE2ETestSuitePG_CH/Test_Normalize_Metadata_With_Retry. ✅ Automatically retrying the workflow |
| v := fv.AsUint64() | ||
| switch qkind { | ||
| case types.QValueKindEnum: | ||
| return types.QValueEnum{Val: strconv.FormatUint(v, 10)}, nil |
There was a problem hiding this comment.
Right now the values would be converted to strings and we're really making them integers on purpose. Let's introduce a QValueKindUint16Enum and return it from QkindFromMysqlColumnType in type_conversion.go, propagating all the necessary information there.
If something's confusing, I gave it a shot on uint16enum branch, that should be complete (but you don't have to look there :)
…nums are streamed as integers)
🔄 Flaky Test DetectedAnalysis: The e2e ClickHouse test failed due to a 1-microsecond precision difference in a time value comparison (18h46m45.972254s vs 18h46m45.972253s), indicating a non-deterministic rounding issue during time type round-tripping rather than a real bug. ✅ Automatically retrying the workflow |
🔄 Possible Flaky TestAnalysis: The e2e integration test suite failed after 723 seconds (80% of the 900s timeout) against MariaDB, which is consistent with a timeout or race condition in a distributed test environment, but the truncated logs don't expose the specific failure message to confirm definitively. |
…s-mysql-without-metadata-v2
🔄 Flaky Test DetectedAnalysis: TestApiPg/TestResyncFailed failed during teardown due to a race condition where the peer-flow worker was still holding the PostgreSQL replication slot active when the test tried to drop it (SQLSTATE 55006). ✅ Automatically retrying the workflow |
🔄 Flaky Test DetectedAnalysis: TestGenericBQ/Test_Inheritance_Table_With_Dynamic_Setting timed out after 197s due to row count mismatch (expected 3, got 8) caused by concurrent test pollution in the shared BigQuery dataset under high parallelism on PG16. ✅ Automatically retrying the workflow |
ilidemi
left a comment
There was a problem hiding this comment.
Awesome progress, only last bit of testing left
| require.NoError(s.t, err) | ||
| require.Len(s.t, rows.Records, 2) | ||
| require.Equal(s.t, rows.Records[0][0].Value(), rows.Records[1][0].Value(), | ||
| "snapshot and CDC enum values should be consistent") |
There was a problem hiding this comment.
Add an assert for the actual value. Can do
if os.Getenv("CI_MYSQL_VERSION") == "mysql-pos" {
require.Equal(s.t, 1, rows.Records[0][0].Value())
} else {
require.Equal(s.t, 'active', rows.Records[0][0].Value())
}| RequireEnvCanceled(s.t, env) | ||
| } | ||
|
|
||
| func (s ClickHouseSuite) Test_MySQL_Enum_Consistency() { |
There was a problem hiding this comment.
Add a test for the previous version as well (existing mirrors/pipes). Example:
peerdb/flow/e2e/clickhouse_test.go
Line 1804 in 257cc9a
Test itself would be basically the same except for asserting the "active" and "1" depending on the row
…s-mysql-without-metadata-v2
- add test for older mirror versions
❌ Test FailureAnalysis: Real bug: the PR branch |
🔄 Flaky Test DetectedAnalysis: Both failures are timeout-based ("UNEXPECTED STATUS TIMEOUT STATUS_SNAPSHOT" and "UNEXPECTED TIMEOUT finish") in distributed integration tests involving Temporal workflows and ClickHouse, consistent with flaky timing-sensitive behavior rather than a code regression. ✅ Automatically retrying the workflow |
This PR addressed this issue #3950.
It converts enum string values to integers during QRep for MySQL versions without binlog metadata support.