[regression-test](streaming-job) add cdc cases for source/jdbc timezone and TIMESTAMP/timestamptz pk#63543

Open
JNSimba wants to merge 1 commit into apache:master from JNSimba:feat/streaming-mysql-source-timezone-case

Member

@JNSimba JNSimba commented May 22, 2026

What problem does this PR solve?

Add CDC streaming-job regression coverage for timezone behavior and TIMESTAMP / timestamptz chunk-key paths that the existing suites do not exercise.

Cases added

| Case | Tables | Guards |
| --- | --- | --- |
| `*_source_timezone` (mysql + pg) | TIMESTAMP/timestamptz multi-precision + DATETIME/timestamp + DATE/date (pg: + timetz) | source-side multi-tz INSERT (+08 / -05 / UTC), NULL, UPDATE crossing tz, epoch boundary; pg timetz column kept as a regression guard for the upstream JVM-tz handling |
| `*_jdbc_servertimezone` (mysql + pg) | TIMESTAMP/timestamptz + DATETIME/timestamp | recommended end-to-end config: align jdbc_url's serverTimezone/timezone with Doris session time_zone (read at runtime so it works on any default tz) |
| `*_timestamp_pk` (mysql + pg) | mysql TIMESTAMP(6) + composite (TIMESTAMP, id); pg timestamp(6) + timestamptz(6) + composite (timestamptz, id) | chunk-key split + INSERT/UPDATE/DELETE locating on TIMESTAMP/timestamptz PK; depends on #63471 for LocalDateTime / OffsetDateTime chunk-bound restore in AbstractCdcSourceReader.convertBound |

Release note

Add CDC streaming-job regression suites for source/jdbc timezone and TIMESTAMP / timestamptz chunk-key.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test
    • No need to test
  • Behavior changed:

    • No
    • Yes
  • Does this need documentation?

    • No
    • Yes

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Member Author

JNSimba commented May 22, 2026

run buildall

JNSimba requested a review from Copilot May 22, 2026 10:59

Member Author

JNSimba commented May 22, 2026

/review

Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Contributor

@github-actions github-actions Bot left a comment

I found test-stability issues that should be fixed before this regression-only PR is merged.

Critical checkpoint conclusions:

  • Goal/test proof: The PR aims to add CDC timezone and timestamp/timestamptz PK regression coverage. The coverage is useful, but two suites can fail for reasons unrelated to the intended behavior.
  • Scope/focus: The change is focused on regression tests and expected result files.
  • Concurrency/lifecycle/config/storage/transaction compatibility: Not applicable; this PR does not change production concurrency, lifecycle, config, persistence, or storage behavior.
  • Parallel code paths: MySQL and PostgreSQL variants are mostly covered in parallel; one timezone-output stability concern applies to both jdbc_servertimezone suites.
  • Test coverage/results: Coverage is broad, but test_streaming_postgres_job_source_timezone includes a timetz expectation while current CDC conversion still lacks explicit io.debezium.time.ZonedTime handling, and the jdbc timezone suites derive runtime timezone-dependent output while committing fixed +08:00 .out files.
  • Observability/performance: Not applicable for test-only changes beyond existing Awaitility diagnostics.

User focus: No additional user-provided review focus was supplied.

tstz0 timestamptz(0),
tstz3 timestamptz(3),
tstz6 timestamptz(6),
ttz time with time zone,

This adds time with time zone to a P0 regression suite, but the current CDC converter still does not have a specific branch for Debezium io.debezium.time.ZonedTime (the nearby comment also says the expected values assume an upstream-fix behavior). In the current code path, named schemas not matched by Time, Timestamp, ZonedTimestamp, etc. fall through to dbzObj.toString(), so this test can fail or encode the offset differently from the committed .out (+08 / -05). Please either add/land the actual ZonedTime conversion before enabling this assertion, or remove ttz from this P0 case until the behavior is implemented.
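
To illustrate the kind of explicit branch this comment asks for, here is a minimal Java sketch. It is not the Doris converter: the class `ZonedTimeSketch` and the method `convert` are hypothetical, the schema-name dispatch is simplified to a plain string comparison, and it assumes the Debezium value arrives as an ISO-8601 offset-time string. Which offset the string actually carries depends on the connector, so the input below is only an assumed example.

```java
import java.time.OffsetTime;
import java.time.format.DateTimeFormatter;

class ZonedTimeSketch {
    // Debezium's semantic schema name for time-with-zone values.
    static final String ZONED_TIME_SCHEMA = "io.debezium.time.ZonedTime";

    // Hypothetical helper: dispatch on the schema name and handle ZonedTime
    // explicitly instead of relying on the generic toString() fall-through.
    static String convert(String schemaName, Object dbzObj) {
        if (dbzObj == null) {
            return null;
        }
        if (ZONED_TIME_SCHEMA.equals(schemaName)) {
            // Assumption: the value is an ISO-8601 offset-time string.
            OffsetTime time = OffsetTime.parse(dbzObj.toString());
            return time.format(DateTimeFormatter.ISO_OFFSET_TIME);
        }
        // Unmatched schemas keep the generic fall-through.
        return dbzObj.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert(ZONED_TIME_SCHEMA, "10:00:00+08:00"));
        System.out.println(convert("some.other.Schema", 42));
    }
}
```

With a dedicated branch, the rendered offset format is decided in one place, so the committed .out can be checked against a defined behavior rather than against whatever `toString()` happens to produce.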

String driver_url = "https://${bucket}.${s3_endpoint}/regression/jdbc_driver/mysql-connector-j-8.4.0.jar"

// Read Doris session tz so the cdc job aligns with it.
def dorisTz = (sql "select @@time_zone")[0][0]

The test reads @@time_zone at runtime and uses it as serverTimezone, but the committed .out is fixed for Doris +08:00 (ts0 is expected as 2024-06-15T18:00). On any runner whose Doris session timezone is not +08:00, the CDC job will correctly render a different wall clock and this regression will fail. The PostgreSQL jdbc_servertimezone case has the same pattern. Please make the suite deterministic, for example by setting the Doris session timezone to the value used by the .out before reading it, or by using a fixed timezone in both the URL and expected output.
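
The failure mode can be reproduced outside Doris with a small Java sketch. The instant `2024-06-15T10:00:00Z` is an assumption, chosen only because it renders as the `2024-06-15T18:00` quoted above under +08:00; the class `WallClockSketch` and the method `render` are hypothetical names used for illustration.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

class WallClockSketch {
    // Render one absolute instant as the wall clock seen in a given zone.
    static LocalDateTime render(Instant instant, String zone) {
        return LocalDateTime.ofInstant(instant, ZoneId.of(zone));
    }

    public static void main(String[] args) {
        // Assumed instant, picked to match the +08:00 value in the .out file.
        Instant instant = Instant.parse("2024-06-15T10:00:00Z");
        System.out.println(render(instant, "+08:00")); // 2024-06-15T18:00
        System.out.println(render(instant, "-05:00")); // 2024-06-15T05:00
        System.out.println(render(instant, "UTC"));    // 2024-06-15T10:00
    }
}
```

The same stored instant yields three different wall-clock strings, which is why an expected-output file recorded under one session timezone cannot match a run under another unless the suite pins the timezone first.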
