server: check connection is available in SQLKiller (#60685)#68201
server: check connection is available in SQLKiller (#60685)#68201ti-chi-bot wants to merge 2 commits intopingcap:release-8.5from
Conversation
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
|
@Defined2014 This PR has conflicts, I have hold it. |
|
This cherry pick PR is for a release branch and has not yet been approved by triage owners. To merge this cherry pick:
DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@ti-chi-bot: ## If you want to know how to resolve it, please read the guide in TiDB Dev Guide. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
📝 WalkthroughWalkthroughAdds buffered-read connection liveness probing, exposes an IsConnectionAlive callback in SQLKiller that is periodically invoked to interrupt queries when the TCP connection is closed, wires the callback into statement execution paths, and adds an integration test plus helper updates to validate the behavior. ChangesConnection Liveness Detection
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.1)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
pkg/util/deeptest/statictesthelper.go (1)
145-158:⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
reflect.Value.Elem()panics forreflect.UnsafePointer— the else branch will crash at runtime.Merging
reflect.Ptrandreflect.UnsafePointerinto a single case at line 145 creates a critical bug. Thereflect.Value.Elem()method only acceptsPtrandInterfacekinds; calling it onreflect.UnsafePointercauses an immediate panic. The else branch at lines 156–158 callsvalA.Elem()andvalB.Elem()unconditionally, so anyunsafe.Pointerfield not covered byWithPointerComparePathwill panic when this code path executes.(Note:
IsNil()andPointer()are both safe forUnsafePointer; onlyElem()is the problem.)Since
unsafe.Pointercarries no type information about what it points to, recursive descent viaElem()is impossible anyway. Split the case and omit theElem()call forUnsafePointer:🐛 Proposed fix
- case reflect.Ptr, reflect.UnsafePointer: + case reflect.Ptr: if valA.IsNil() && valB.IsNil() { return } // both of them are not nil require.NotEqual(t, 0, valA.Pointer(), path+" should not be nil") require.NotEqual(t, 0, valB.Pointer(), path+" should not be nil") if h.shouldComparePointer(path) { require.Equal(t, valA.Pointer(), valB.Pointer(), path+" should be the same") } else { require.NotEqual(t, valA.Pointer(), valB.Pointer(), path+" should be different") h.assertDeepClonedEqual(t, valA.Elem(), valB.Elem(), path) } + case reflect.UnsafePointer: + // unsafe.Pointer has no type info; we can only check nil-ness and pointer identity. + if valA.IsNil() && valB.IsNil() { + return + } + require.NotEqual(t, 0, valA.Pointer(), path+" should not be nil") + require.NotEqual(t, 0, valB.Pointer(), path+" should not be nil") + if h.shouldComparePointer(path) { + require.Equal(t, valA.Pointer(), valB.Pointer(), path+" should be the same") + } else { + require.NotEqual(t, valA.Pointer(), valB.Pointer(), path+" should be different") + // Cannot recurse into Elem() for unsafe.Pointer — no type info available. + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/util/deeptest/statictesthelper.go` around lines 145 - 158, The merged reflect.Ptr and reflect.UnsafePointer case in statictesthelper.go causes a panic because reflect.Value.Elem() is invalid for UnsafePointer; update the switch in assertDeepClonedEqual so Ptr stays as-is (use valA.Elem()/valB.Elem() in the else branch when handling reflect.Ptr and still call shouldComparePointer for pointer identity checks), but handle reflect.UnsafePointer in a separate case that only uses IsNil and Pointer() checks and does NOT call valA.Elem()/valB.Elem(); adjust logic around shouldComparePointer(path) for UnsafePointer to either assert equal pointers or assert they differ without descending recursively.pkg/server/tests/commontest/tidb_test.go (1)
3403-3772:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winResolve merge-conflict markers in test file before merge.
Line 3403 through Line 3772 still contains unresolved conflict markers, so the test package cannot compile.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/server/tests/commontest/tidb_test.go` around lines 3403 - 3772, The file contains unresolved Git conflict markers (<<<<<<<, =======, >>>>>>>) around two test blocks; remove the conflict markers and produce a single coherent version containing the intended tests (either keep both TestAuditPluginInfoForStarting/TestAuditPluginRetrying and TestIssue57531, or the correct one per branch), making sure to preserve function bodies and unique symbols (TestAuditPluginInfoForStarting, TestAuditPluginRetrying, TestIssue57531, plus any uses of net, reflect, unsafe, plugin.LoadPluginForTest, servertestkit.CreateTidbTestSuite). After resolving, run gofmt/goimports and go test to ensure the package compiles and there are no missing imports or duplicate definitions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/server/conn_stmt.go`:
- Around line 236-249: The file contains unresolved merge markers around the
code block in executePlanCacheStmt; remove the conflict markers (<<<<<<<,
=======, >>>>>>>) and keep the intended code that defines fn, uses
cc.bufReadConn to check IsAlive, stores the function into
cc.ctx.GetSessionVars().SQLKiller.IsConnectionAlive, defers storing nil, and
preserves the "//nolint:forcetypeassert" comment; ensure there are no duplicate
or partial copies of this block and that the closure variable names (fn,
cc.bufReadConn, cc.ctx.GetSessionVars().SQLKiller.IsConnectionAlive) remain
unchanged so the file compiles.
In `@pkg/server/internal/util/buffered_read_conn.go`:
- Around line 76-83: In IsAlive (or the method surrounding conn.Peek) change the
return semantics so that a successful conn.Peek (err == nil) yields 1
(alive/responsive), io.EOF and any non-timeout errors yield 0 (dead), and only
net.Error timeouts yield 1 (temporary/unconfirmed); specifically, after calling
conn.Peek(1) return 1 on nil err, return 0 for io.EOF or for errors that are not
net.Error timeouts, and return 1 only when err is a net.Error with
Timeout()==true.
---
Outside diff comments:
In `@pkg/server/tests/commontest/tidb_test.go`:
- Around line 3403-3772: The file contains unresolved Git conflict markers
(<<<<<<<, =======, >>>>>>>) around two test blocks; remove the conflict markers
and produce a single coherent version containing the intended tests (either keep
both TestAuditPluginInfoForStarting/TestAuditPluginRetrying and TestIssue57531,
or the correct one per branch), making sure to preserve function bodies and
unique symbols (TestAuditPluginInfoForStarting, TestAuditPluginRetrying,
TestIssue57531, plus any uses of net, reflect, unsafe, plugin.LoadPluginForTest,
servertestkit.CreateTidbTestSuite). After resolving, run gofmt/goimports and go
test to ensure the package compiles and there are no missing imports or
duplicate definitions.
In `@pkg/util/deeptest/statictesthelper.go`:
- Around line 145-158: The merged reflect.Ptr and reflect.UnsafePointer case in
statictesthelper.go causes a panic because reflect.Value.Elem() is invalid for
UnsafePointer; update the switch in assertDeepClonedEqual so Ptr stays as-is
(use valA.Elem()/valB.Elem() in the else branch when handling reflect.Ptr and
still call shouldComparePointer for pointer identity checks), but handle
reflect.UnsafePointer in a separate case that only uses IsNil and Pointer()
checks and does NOT call valA.Elem()/valB.Elem(); adjust logic around
shouldComparePointer(path) for UnsafePointer to either assert equal pointers or
assert they differ without descending recursively.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: cac22fb6-4180-4d7f-b166-a90450c95fc6
📒 Files selected for processing (7)
pkg/server/conn.gopkg/server/conn_stmt.gopkg/server/internal/util/buffered_read_conn.gopkg/server/tests/commontest/tidb_test.gopkg/util/deeptest/statictesthelper.gopkg/util/sqlkiller/BUILD.bazelpkg/util/sqlkiller/sqlkiller.go
Signed-off-by: King-Dylan <702299521@qq.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/server/tests/commontest/tidb_test.go`:
- Around line 3729-3786: The test currently waits only for
processlistCount(dbt)==2 which can be true before the long-running query starts;
modify the setup so that before closing netConn you poll the processlist and
wait until the specific session shows a non-empty Info containing "sleep" (e.g.,
query text from SHOW PROCESSLIST or information_schema.processlist) for the
connection created by conn (use its connection id or other unique marker), then
close netConn and assert the goroutine tied to queryDone exits due to that
disconnect; update the require.Eventually call around processlistCount(dbt) to
instead check the presence of the "sleep" Info for the target session and keep
the later assertions (waiting on queryDone and verifying processlistCount
returns to 1) intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 059c85d1-1ee4-4408-9fe0-805462e7e57b
📒 Files selected for processing (4)
pkg/server/conn_stmt.gopkg/server/internal/util/buffered_read_conn.gopkg/server/tests/commontest/tidb_test.gopkg/util/deeptest/statictesthelper.go
🚧 Files skipped from review as they are similar to previous changes (3)
- pkg/server/conn_stmt.go
- pkg/util/deeptest/statictesthelper.go
- pkg/server/internal/util/buffered_read_conn.go
| conn, err := dbt.GetDB().Conn(context.Background()) | ||
| require.NoError(t, err) | ||
| defer func() { | ||
| _ = conn.Close() | ||
| }() | ||
|
|
||
| // get the TCP connection | ||
| err = conn.Raw(func(driverConn any) error { | ||
| v := reflect.ValueOf(driverConn) | ||
| if v.Kind() == reflect.Ptr { | ||
| v = v.Elem() | ||
| } | ||
| f := v.FieldByName("netConn") | ||
| if f.IsValid() && f.Type().Implements(reflect.TypeOf((*net.Conn)(nil)).Elem()) { | ||
| netConn = *(*net.Conn)(unsafe.Pointer(f.UnsafeAddr())) | ||
| } | ||
| return nil | ||
| }) | ||
| require.NoError(t, err) | ||
| require.NotNil(t, netConn) | ||
|
|
||
| // execute `select sleep(300)` | ||
| queryDone := make(chan struct{}) | ||
| go func() { | ||
| defer close(queryDone) | ||
| if i == 0 { | ||
| rows, err := conn.QueryContext(context.Background(), "select sleep(300)") | ||
| if err == nil { | ||
| _ = rows.Close() | ||
| } | ||
| } else { | ||
| stmt, err := conn.PrepareContext(context.Background(), "select sleep(?)") | ||
| if err == nil { | ||
| defer stmt.Close() | ||
| _, _ = stmt.Exec(300) | ||
| } | ||
| } | ||
| }() | ||
|
|
||
| // have two sessions | ||
| require.Eventually(t, func() bool { | ||
| return processlistCount(dbt) == 2 | ||
| }, time.Second, time.Millisecond*10) | ||
|
|
||
| // close tcp connection | ||
| require.NoError(t, netConn.Close()) | ||
|
|
||
| select { | ||
| case <-queryDone: | ||
| case <-time.After(time.Second * 3): | ||
| require.Fail(t, "query did not exit after closing the TCP connection") | ||
| } | ||
| _ = conn.Close() | ||
|
|
||
| // the `select sleep(300)` is killed | ||
| require.Eventually(t, func() bool { | ||
| return processlistCount(dbt) == 1 | ||
| }, time.Second*3, time.Millisecond*10) |
There was a problem hiding this comment.
Wait for the sleep statement itself, not just a second session.
Line 3769 can become true immediately after Line 3729 opens conn, before select sleep(...) is actually running. That makes this a false-positive-prone regression test: it can pass by opening and closing an idle connection, without ever exercising the long-running query path. Please wait until the target connection shows a non-empty Info/select sleep... in SHOW PROCESSLIST (or information_schema.processlist) before closing the socket, and ideally assert that the goroutine exits because of that disconnect.
Suggested tightening
- // have two sessions
- require.Eventually(t, func() bool {
- return processlistCount(dbt) == 2
- }, time.Second, time.Millisecond*10)
+ hasRunningSleep := func() bool {
+ rs := dbt.MustQuery("select info from information_schema.processlist where info like 'select sleep%'")
+ defer func() {
+ require.NoError(t, rs.Close())
+ }()
+ ok := rs.Next()
+ require.NoError(t, rs.Err())
+ return ok
+ }
+ require.Eventually(t, hasRunningSleep, time.Second, time.Millisecond*10)As per coding guidelines, test changes should stay minimal and deterministic.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/server/tests/commontest/tidb_test.go` around lines 3729 - 3786, The test
currently waits only for processlistCount(dbt)==2 which can be true before the
long-running query starts; modify the setup so that before closing netConn you
poll the processlist and wait until the specific session shows a non-empty Info
containing "sleep" (e.g., query text from SHOW PROCESSLIST or
information_schema.processlist) for the connection created by conn (use its
connection id or other unique marker), then close netConn and assert the
goroutine tied to queryDone exits due to that disconnect; update the
require.Eventually call around processlistCount(dbt) to instead check the
presence of the "sleep" Info for the target session and keep the later
assertions (waiting on queryDone and verifying processlistCount returns to 1)
intact.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## release-8.5 #68201 +/- ##
================================================
Coverage ? 55.0256%
================================================
Files ? 1823
Lines ? 655402
Branches ? 0
================================================
Hits ? 360639
Misses ? 267951
Partials ? 26812
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Defined2014 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
This is an automated cherry-pick of #60685
What problem does this PR solve?
Issue Number: close #57531
Problem Summary:
What changed and how does it work?
Check List
Tests
As TiDB can't close connection immediately after killing client process #57531 metioned, run
select sleep(300)and kill the mysql-client. Then the sql can't found byshow processlist.Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit