expression, planner: allow more MATCH AGAINST boolean queries (#67242) | tidb-test=13ccf8de48e8db2290ff884598444d0508606bbf tiflash=feature-fts#67266
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: winoros The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
📝 WalkthroughWalkthroughModified MATCH...AGAINST boolean group rewriting to conditionally include SHOULD terms based on MUST term presence. Removed prior validation restrictions on SHOULD-with-MUST/MUST-NOT combinations and multiple SHOULD-only terms. Updated tests to verify successful plan generation for previously rejected multi-term queries. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## release-fts-202602 #67266 +/- ##
=======================================================
Coverage ? 75.1052%
=======================================================
Files ? 1936
Lines ? 556494
Branches ? 0
=======================================================
Hits ? 417956
Misses ? 138532
Partials ? 6
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
🧹 Nitpick comments (3)
pkg/planner/core/casetest/tici/tici_test.go (1)
265-267: Use a narrower negative match than plainworld.
CheckNotContain("world")is broader than the behavior under test and can start failing on unrelated explain-output changes. Matching the specific helper fragment instead, e.g.fts_match_word("world", test.t1.title), would keep this regression focused on the rewrite.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/planner/core/casetest/tici/tici_test.go` around lines 265 - 267, The negative assertion in the tk.MustQuery(...) call is too broad: replace CheckNotContain("world") with a narrower negative match targeting the specific explain helper fragment (e.g. CheckNotContain(`fts_match_word("world", test.t1.title)`)); update the assertion that follows the explain for the query using match(title) against ('+hello world' IN BOOLEAN MODE) so it only rejects the exact helper output rather than any occurrence of "world", keeping the test focused on the rewrite logic around tk.MustQuery and CheckNotContain for the t1.title FTS helper.pkg/expression/fts_helper.go (1)
334-367: Capture the SHOULD slice boundary from the built slice.
startIdx := len(group.Must) + len(group.MustNot)assumes every earlier clause contributes exactly one appended expression. That is true today, but it is a subtle invariant across recursive rewrites. Recording the boundary fromlen(searchFuncs)before appendinggroup.Shouldwould keep the compaction logic local to the actual appends and make this path less fragile if clause rewriting ever stops being 1:1.Suggested refactor
- if includeShouldInFilter { - for _, item := range group.Should { - f, err := rewriteBooleanClauseToFTSExpr(bctx, matchCols, item, parserType) - if err != nil { - return nil, err - } - searchFuncs = append(searchFuncs, f) - } - } - if includeShouldInFilter && len(group.Should) > 0 { - startIdx := len(group.Must) + len(group.MustNot) - endIdx := startIdx + len(group.Should) - orShould := ComposeDNFCondition(bctx, searchFuncs[startIdx:endIdx]...) - searchFuncs[startIdx] = orShould - searchFuncs = searchFuncs[:startIdx+1] + if includeShouldInFilter { + shouldStart := len(searchFuncs) + for _, item := range group.Should { + f, err := rewriteBooleanClauseToFTSExpr(bctx, matchCols, item, parserType) + if err != nil { + return nil, err + } + searchFuncs = append(searchFuncs, f) + } + if len(group.Should) > 0 { + orShould := ComposeDNFCondition(bctx, searchFuncs[shouldStart:]...) + searchFuncs[shouldStart] = orShould + searchFuncs = searchFuncs[:shouldStart+1] + } }As per coding guidelines, code SHOULD be self-documenting through clear naming and structure.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/expression/fts_helper.go` around lines 334 - 367, The code assumes earlier clauses append exactly one expression by computing startIdx := len(group.Must) + len(group.MustNot); instead capture the actual boundary before appending the SHOULD items: record startIdx := len(searchFuncs) right before the loop that appends group.Should, then append each rewritten expression via rewriteBooleanClauseToFTSExpr into searchFuncs and compute endIdx := len(searchFuncs) after the loop; pass searchFuncs[startIdx:endIdx] to ComposeDNFCondition and replace searchFuncs[startIdx] with the composed OR and truncate searchFuncs to startIdx+1—this keeps the boundary tied to the real appends and references symbols searchFuncs, group.Should, rewriteBooleanClauseToFTSExpr, includeShouldInFilter, and ComposeDNFCondition.pkg/expression/util_test.go (1)
799-818: Add one mixedSHOULD+MUST NOTregression here.These new assertions pin down pure
SHOULDandMUST+SHOULD, but not the new path where negative terms must survive SHOULD compaction. A case likehello -worldand/or+hello world -bananawould cover thestartIdx := len(group.Must)+len(group.MustNot)slice logic and catch accidental dropping ofMUST NOTfilters.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/expression/util_test.go` around lines 799 - 818, Add a regression test that covers mixed SHOULD + MUST NOT cases so negative terms survive SHOULD compaction: call RewriteMySQLMatchAgainstRecursively with buildMatchAgainst("hello -world") and with buildMatchAgainst("+hello world -banana"), require no error and hasMySQLMatchAgainst(expr) is false, then use collectFTSLeaves(expr) (and the same root/type checks as the other tests) to assert the returned []ftsLeaf contains entries for both positive terms (e.g., "hello", "world" or "hello","world") and the negative terms (e.g., "world", "banana") so that MUST NOT leaves are not dropped; reference functions RewriteMySQLMatchAgainstRecursively, buildMatchAgainst, hasMySQLMatchAgainst, collectFTSLeaves and the ftsLeaf struct when adding the assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/expression/fts_helper.go`:
- Around line 334-367: The code assumes earlier clauses append exactly one
expression by computing startIdx := len(group.Must) + len(group.MustNot);
instead capture the actual boundary before appending the SHOULD items: record
startIdx := len(searchFuncs) right before the loop that appends group.Should,
then append each rewritten expression via rewriteBooleanClauseToFTSExpr into
searchFuncs and compute endIdx := len(searchFuncs) after the loop; pass
searchFuncs[startIdx:endIdx] to ComposeDNFCondition and replace
searchFuncs[startIdx] with the composed OR and truncate searchFuncs to
startIdx+1—this keeps the boundary tied to the real appends and references
symbols searchFuncs, group.Should, rewriteBooleanClauseToFTSExpr,
includeShouldInFilter, and ComposeDNFCondition.
In `@pkg/expression/util_test.go`:
- Around line 799-818: Add a regression test that covers mixed SHOULD + MUST NOT
cases so negative terms survive SHOULD compaction: call
RewriteMySQLMatchAgainstRecursively with buildMatchAgainst("hello -world") and
with buildMatchAgainst("+hello world -banana"), require no error and
hasMySQLMatchAgainst(expr) is false, then use collectFTSLeaves(expr) (and the
same root/type checks as the other tests) to assert the returned []ftsLeaf
contains entries for both positive terms (e.g., "hello", "world" or
"hello","world") and the negative terms (e.g., "world", "banana") so that MUST
NOT leaves are not dropped; reference functions
RewriteMySQLMatchAgainstRecursively, buildMatchAgainst, hasMySQLMatchAgainst,
collectFTSLeaves and the ftsLeaf struct when adding the assertions.
In `@pkg/planner/core/casetest/tici/tici_test.go`:
- Around line 265-267: The negative assertion in the tk.MustQuery(...) call is
too broad: replace CheckNotContain("world") with a narrower negative match
targeting the specific explain helper fragment (e.g.
CheckNotContain(`fts_match_word("world", test.t1.title)`)); update the assertion
that follows the explain for the query using match(title) against ('+hello
world' IN BOOLEAN MODE) so it only rejects the exact helper output rather than
any occurrence of "world", keeping the test focused on the rewrite logic around
tk.MustQuery and CheckNotContain for the t1.title FTS helper.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: dd7ba501-25f5-4f2c-8f26-beaa7f725750
📒 Files selected for processing (3)
pkg/expression/fts_helper.gopkg/expression/util_test.gopkg/planner/core/casetest/tici/tici_test.go
|
/retest |
2 similar comments
|
/retest |
|
/retest |
This is an automated cherry-pick of #67242
What problem does this PR solve?
Issue Number: ref #65626
Problem Summary:
MATCH ... AGAINST (... IN BOOLEAN MODE)still rejected some valid boolean queries during TiCI rewrite. Queries likehello worldand+hello worldreturnedTiDB only supports multiple terms with +/- modifiers.What changed and how does it work?
Check List
Tests
go test -run TestRewriteMySQLMatchAgainst --tags=intest ./pkg/expressiongo test -run TestTiCIMatchAgainstValidation --tags=intest ./pkg/planner/core/casetest/ticiSide effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit
New Features
Bug Fixes