Skip to content

server: check connection is available in SQLKiller (#60685)#68201

Open
ti-chi-bot wants to merge 2 commits intopingcap:release-8.5from
ti-chi-bot:cherry-pick-60685-to-release-8.5
Open

server: check connection is available in SQLKiller (#60685)#68201
ti-chi-bot wants to merge 2 commits intopingcap:release-8.5from
ti-chi-bot:cherry-pick-60685-to-release-8.5

Conversation

@ti-chi-bot
Copy link
Copy Markdown
Member

@ti-chi-bot ti-chi-bot commented May 7, 2026

This is an automated cherry-pick of #60685

What problem does this PR solve?

Issue Number: close #57531

Problem Summary:

What changed and how does it work?

Check List

Tests

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • Bug Fixes
    • Improved detection and handling of disconnected client connections so running queries are terminated promptly when clients drop, reducing stranded queries and freeing resources.
  • Tests
    • Added an end-to-end test that verifies killed connections cause long-running queries to stop.
  • Refactor
    • Enhanced connection liveness checks to make disconnection detection more robust.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot ti-chi-bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. type/cherry-pick-for-release-8.5 This PR is cherry-picked to release-8.5 from a source PR. labels May 7, 2026
@ti-chi-bot
Copy link
Copy Markdown
Member Author

@Defined2014 This PR has conflicts, I have hold it.
Please resolve them or ask others to resolve them, then comment /unhold to remove the hold label.

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 7, 2026

This cherry pick PR is for a release branch and has not yet been approved by triage owners.
Adding the do-not-merge/cherry-pick-not-approved label.

To merge this cherry pick:

  1. It must be LGTMed and approved by the reviewers firstly.
  2. For pull requests to TiDB-x branches, it must have no failed tests.
  3. AFTER it has lgtm and approved labels, please wait for the cherry-pick merging approval from triage owners.
Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 7, 2026

@ti-chi-bot: ## If you want to know how to resolve it, please read the guide in TiDB Dev Guide.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 7, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

Adds buffered-read connection liveness probing, exposes an IsConnectionAlive callback in SQLKiller that is periodically invoked to interrupt queries when the TCP connection is closed, wires the callback into statement execution paths, and adds an integration test plus helper updates to validate the behavior.

Changes

Connection Liveness Detection

Layer / File(s) Summary
BufferedReadConn Liveness Infrastructure
pkg/server/internal/util/buffered_read_conn.go
Adds mu *sync.Mutex, Peek(n int), and IsAlive() which uses a short read deadline and a 1-byte peek to infer TCP connection liveness.
SQLKiller Periodic Check
pkg/util/sqlkiller/sqlkiller.go, pkg/util/sqlkiller/BUILD.bazel
Adds atomic lastCheckTime and IsConnectionAlive function pointer fields, updates HandleSignal to periodically call the callback and set QueryInterrupted when dead; adds //pkg/util/intest to deps.
Server Handler Integration
pkg/server/conn.go, pkg/server/conn_stmt.go
Sets SQLKiller.IsConnectionAlive to a closure over cc.bufReadConn.IsAlive() during statement execution and defers clearing it alongside other SQLKiller session state.
Tests / Helpers
pkg/server/tests/commontest/tidb_test.go, pkg/util/deeptest/statictesthelper.go
Adds TestIssue57531 (extracts underlying net.Conn using unsafe, runs long sleep, closes TCP conn, asserts query is killed) and updates deep-clone helper to treat reflect.UnsafePointer like reflect.Ptr.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • pingcap/tidb#65499: Related — also implements BufferedReadConn.IsAlive and SQLKiller liveness checks.
  • pingcap/tidb#67285: Related — touches SQLKiller-driven stmt cancellation behavior and reset handling.

Suggested labels

cherry-pick-approved, ok-to-test

Suggested reviewers

  • wjhuang2016
  • nolouch
  • windtalker

Poem

🐰 I peek a single byte to see,

if silence means the client flees.
A mutex guards the furtive glance,
SQLKiller waits—then ends the dance.
Queries stop when sockets cease.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed Title clearly describes the main change: implementing connection availability checking in SQLKiller to handle client disconnections.
Description check ✅ Passed Description covers the issue reference, problem statement, and test approaches, though details on implementation could be more explicit.
Linked Issues check ✅ Passed Changes implement connection liveness detection to close server-side connections immediately after client disconnect, directly addressing issue #57531's requirement.
Out of Scope Changes check ✅ Passed All changes relate to implementing connection liveness detection for the SQLKiller feature; changes to deeptest helper appear minimal and support the main objective.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.1)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/util/deeptest/statictesthelper.go (1)

145-158: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

reflect.Value.Elem() panics for reflect.UnsafePointer — the else branch will crash at runtime.

Merging reflect.Ptr and reflect.UnsafePointer into a single case at line 145 creates a critical bug. The reflect.Value.Elem() method only accepts Ptr and Interface kinds; calling it on reflect.UnsafePointer causes an immediate panic. The else branch at lines 156–158 calls valA.Elem() and valB.Elem() unconditionally, so any unsafe.Pointer field not covered by WithPointerComparePath will panic when this code path executes.

(Note: IsNil() and Pointer() are both safe for UnsafePointer; only Elem() is the problem.)

Since unsafe.Pointer carries no type information about what it points to, recursive descent via Elem() is impossible anyway. Split the case and omit the Elem() call for UnsafePointer:

🐛 Proposed fix
-	case reflect.Ptr, reflect.UnsafePointer:
+	case reflect.Ptr:
 		if valA.IsNil() && valB.IsNil() {
 			return
 		}
 		// both of them are not nil
 		require.NotEqual(t, 0, valA.Pointer(), path+" should not be nil")
 		require.NotEqual(t, 0, valB.Pointer(), path+" should not be nil")
 
 		if h.shouldComparePointer(path) {
 			require.Equal(t, valA.Pointer(), valB.Pointer(), path+" should be the same")
 		} else {
 			require.NotEqual(t, valA.Pointer(), valB.Pointer(), path+" should be different")
 			h.assertDeepClonedEqual(t, valA.Elem(), valB.Elem(), path)
 		}
+	case reflect.UnsafePointer:
+		// unsafe.Pointer has no type info; we can only check nil-ness and pointer identity.
+		if valA.IsNil() && valB.IsNil() {
+			return
+		}
+		require.NotEqual(t, 0, valA.Pointer(), path+" should not be nil")
+		require.NotEqual(t, 0, valB.Pointer(), path+" should not be nil")
+		if h.shouldComparePointer(path) {
+			require.Equal(t, valA.Pointer(), valB.Pointer(), path+" should be the same")
+		} else {
+			require.NotEqual(t, valA.Pointer(), valB.Pointer(), path+" should be different")
+			// Cannot recurse into Elem() for unsafe.Pointer — no type info available.
+		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/util/deeptest/statictesthelper.go` around lines 145 - 158, The merged
reflect.Ptr and reflect.UnsafePointer case in statictesthelper.go causes a panic
because reflect.Value.Elem() is invalid for UnsafePointer; update the switch in
assertDeepClonedEqual so Ptr stays as-is (use valA.Elem()/valB.Elem() in the
else branch when handling reflect.Ptr and still call shouldComparePointer for
pointer identity checks), but handle reflect.UnsafePointer in a separate case
that only uses IsNil and Pointer() checks and does NOT call
valA.Elem()/valB.Elem(); adjust logic around shouldComparePointer(path) for
UnsafePointer to either assert equal pointers or assert they differ without
descending recursively.
pkg/server/tests/commontest/tidb_test.go (1)

3403-3772: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Resolve merge-conflict markers in test file before merge.

Line 3403 through Line 3772 still contains unresolved conflict markers, so the test package cannot compile.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/server/tests/commontest/tidb_test.go` around lines 3403 - 3772, The file
contains unresolved Git conflict markers (<<<<<<<, =======, >>>>>>>) around two
test blocks; remove the conflict markers and produce a single coherent version
containing the intended tests (either keep both
TestAuditPluginInfoForStarting/TestAuditPluginRetrying and TestIssue57531, or
the correct one per branch), making sure to preserve function bodies and unique
symbols (TestAuditPluginInfoForStarting, TestAuditPluginRetrying,
TestIssue57531, plus any uses of net, reflect, unsafe, plugin.LoadPluginForTest,
servertestkit.CreateTidbTestSuite). After resolving, run gofmt/goimports and go
test to ensure the package compiles and there are no missing imports or
duplicate definitions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/server/conn_stmt.go`:
- Around line 236-249: The file contains unresolved merge markers around the
code block in executePlanCacheStmt; remove the conflict markers (<<<<<<<,
=======, >>>>>>>) and keep the intended code that defines fn, uses
cc.bufReadConn to check IsAlive, stores the function into
cc.ctx.GetSessionVars().SQLKiller.IsConnectionAlive, defers storing nil, and
preserves the "//nolint:forcetypeassert" comment; ensure there are no duplicate
or partial copies of this block and that the closure variable names (fn,
cc.bufReadConn, cc.ctx.GetSessionVars().SQLKiller.IsConnectionAlive) remain
unchanged so the file compiles.

In `@pkg/server/internal/util/buffered_read_conn.go`:
- Around line 76-83: In IsAlive (or the method surrounding conn.Peek) change the
return semantics so that a successful conn.Peek (err == nil) yields 1
(alive/responsive), io.EOF and any non-timeout errors yield 0 (dead), and only
net.Error timeouts yield 1 (temporary/unconfirmed); specifically, after calling
conn.Peek(1) return 1 on nil err, return 0 for io.EOF or for errors that are not
net.Error timeouts, and return 1 only when err is a net.Error with
Timeout()==true.

---

Outside diff comments:
In `@pkg/server/tests/commontest/tidb_test.go`:
- Around line 3403-3772: The file contains unresolved Git conflict markers
(<<<<<<<, =======, >>>>>>>) around two test blocks; remove the conflict markers
and produce a single coherent version containing the intended tests (either keep
both TestAuditPluginInfoForStarting/TestAuditPluginRetrying and TestIssue57531,
or the correct one per branch), making sure to preserve function bodies and
unique symbols (TestAuditPluginInfoForStarting, TestAuditPluginRetrying,
TestIssue57531, plus any uses of net, reflect, unsafe, plugin.LoadPluginForTest,
servertestkit.CreateTidbTestSuite). After resolving, run gofmt/goimports and go
test to ensure the package compiles and there are no missing imports or
duplicate definitions.

In `@pkg/util/deeptest/statictesthelper.go`:
- Around line 145-158: The merged reflect.Ptr and reflect.UnsafePointer case in
statictesthelper.go causes a panic because reflect.Value.Elem() is invalid for
UnsafePointer; update the switch in assertDeepClonedEqual so Ptr stays as-is
(use valA.Elem()/valB.Elem() in the else branch when handling reflect.Ptr and
still call shouldComparePointer for pointer identity checks), but handle
reflect.UnsafePointer in a separate case that only uses IsNil and Pointer()
checks and does NOT call valA.Elem()/valB.Elem(); adjust logic around
shouldComparePointer(path) for UnsafePointer to either assert equal pointers or
assert they differ without descending recursively.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: cac22fb6-4180-4d7f-b166-a90450c95fc6

📥 Commits

Reviewing files that changed from the base of the PR and between a1f00ae and 18bfc8c.

📒 Files selected for processing (7)
  • pkg/server/conn.go
  • pkg/server/conn_stmt.go
  • pkg/server/internal/util/buffered_read_conn.go
  • pkg/server/tests/commontest/tidb_test.go
  • pkg/util/deeptest/statictesthelper.go
  • pkg/util/sqlkiller/BUILD.bazel
  • pkg/util/sqlkiller/sqlkiller.go

Comment thread pkg/server/conn_stmt.go Outdated
Comment thread pkg/server/internal/util/buffered_read_conn.go
Signed-off-by: King-Dylan <702299521@qq.com>
@ti-chi-bot ti-chi-bot Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 8, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/server/tests/commontest/tidb_test.go`:
- Around line 3729-3786: The test currently waits only for
processlistCount(dbt)==2 which can be true before the long-running query starts;
modify the setup so that before closing netConn you poll the processlist and
wait until the specific session shows a non-empty Info containing "sleep" (e.g.,
query text from SHOW PROCESSLIST or information_schema.processlist) for the
connection created by conn (use its connection id or other unique marker), then
close netConn and assert the goroutine tied to queryDone exits due to that
disconnect; update the require.Eventually call around processlistCount(dbt) to
instead check the presence of the "sleep" Info for the target session and keep
the later assertions (waiting on queryDone and verifying processlistCount
returns to 1) intact.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 059c85d1-1ee4-4408-9fe0-805462e7e57b

📥 Commits

Reviewing files that changed from the base of the PR and between 18bfc8c and 083ea93.

📒 Files selected for processing (4)
  • pkg/server/conn_stmt.go
  • pkg/server/internal/util/buffered_read_conn.go
  • pkg/server/tests/commontest/tidb_test.go
  • pkg/util/deeptest/statictesthelper.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • pkg/server/conn_stmt.go
  • pkg/util/deeptest/statictesthelper.go
  • pkg/server/internal/util/buffered_read_conn.go

Comment on lines +3729 to +3786
conn, err := dbt.GetDB().Conn(context.Background())
require.NoError(t, err)
defer func() {
_ = conn.Close()
}()

// get the TCP connection
err = conn.Raw(func(driverConn any) error {
v := reflect.ValueOf(driverConn)
if v.Kind() == reflect.Ptr {
v = v.Elem()
}
f := v.FieldByName("netConn")
if f.IsValid() && f.Type().Implements(reflect.TypeOf((*net.Conn)(nil)).Elem()) {
netConn = *(*net.Conn)(unsafe.Pointer(f.UnsafeAddr()))
}
return nil
})
require.NoError(t, err)
require.NotNil(t, netConn)

// execute `select sleep(300)`
queryDone := make(chan struct{})
go func() {
defer close(queryDone)
if i == 0 {
rows, err := conn.QueryContext(context.Background(), "select sleep(300)")
if err == nil {
_ = rows.Close()
}
} else {
stmt, err := conn.PrepareContext(context.Background(), "select sleep(?)")
if err == nil {
defer stmt.Close()
_, _ = stmt.Exec(300)
}
}
}()

// have two sessions
require.Eventually(t, func() bool {
return processlistCount(dbt) == 2
}, time.Second, time.Millisecond*10)

// close tcp connection
require.NoError(t, netConn.Close())

select {
case <-queryDone:
case <-time.After(time.Second * 3):
require.Fail(t, "query did not exit after closing the TCP connection")
}
_ = conn.Close()

// the `select sleep(300)` is killed
require.Eventually(t, func() bool {
return processlistCount(dbt) == 1
}, time.Second*3, time.Millisecond*10)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Wait for the sleep statement itself, not just a second session.

Line 3769 can become true immediately after Line 3729 opens conn, before select sleep(...) is actually running. That makes this a false-positive-prone regression test: it can pass by opening and closing an idle connection, without ever exercising the long-running query path. Please wait until the target connection shows a non-empty Info/select sleep... in SHOW PROCESSLIST (or information_schema.processlist) before closing the socket, and ideally assert that the goroutine exits because of that disconnect.

Suggested tightening
-			// have two sessions
-			require.Eventually(t, func() bool {
-				return processlistCount(dbt) == 2
-			}, time.Second, time.Millisecond*10)
+			hasRunningSleep := func() bool {
+				rs := dbt.MustQuery("select info from information_schema.processlist where info like 'select sleep%'")
+				defer func() {
+					require.NoError(t, rs.Close())
+				}()
+				ok := rs.Next()
+				require.NoError(t, rs.Err())
+				return ok
+			}
+			require.Eventually(t, hasRunningSleep, time.Second, time.Millisecond*10)

As per coding guidelines, test changes should stay minimal and deterministic.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/server/tests/commontest/tidb_test.go` around lines 3729 - 3786, The test
currently waits only for processlistCount(dbt)==2 which can be true before the
long-running query starts; modify the setup so that before closing netConn you
poll the processlist and wait until the specific session shows a non-empty Info
containing "sleep" (e.g., query text from SHOW PROCESSLIST or
information_schema.processlist) for the connection created by conn (use its
connection id or other unique marker), then close netConn and assert the
goroutine tied to queryDone exits due to that disconnect; update the
require.Eventually call around processlistCount(dbt) to instead check the
presence of the "sleep" Info for the target session and keep the later
assertions (waiting on queryDone and verifying processlistCount returns to 1)
intact.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 8, 2026

Codecov Report

❌ Patch coverage is 79.72973% with 15 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release-8.5@a1f00ae). Learn more about missing BASE report.

Additional details and impacted files
@@               Coverage Diff                @@
##             release-8.5     #68201   +/-   ##
================================================
  Coverage               ?   55.0256%           
================================================
  Files                  ?       1823           
  Lines                  ?     655402           
  Branches               ?          0           
================================================
  Hits                   ?     360639           
  Misses                 ?     267951           
  Partials               ?      26812           
Flag Coverage Δ
integration 38.2095% <54.8387%> (?)
unit 64.9684% <79.7297%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 52.8824% <0.0000%> (?)
parser ∅ <0.0000%> (?)
br 54.6208% <0.0000%> (?)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 8, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Defined2014

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels May 8, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 8, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-05-08 01:31:00.50222899 +0000 UTC m=+404133.375578962: ☑️ agreed by Defined2014.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved do-not-merge/cherry-pick-not-approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. type/cherry-pick-for-release-8.5 This PR is cherry-picked to release-8.5 from a source PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants