Skip to content

expression, planner: allow more MATCH AGAINST boolean queries (#67242) | tidb-test=13ccf8de48e8db2290ff884598444d0508606bbf tiflash=feature-fts#67266

Merged
winoros merged 1 commit intopingcap:release-fts-202602from
ti-chi-bot:cherry-pick-67242-to-release-fts-202602
Mar 25, 2026
Merged

expression, planner: allow more MATCH AGAINST boolean queries (#67242) | tidb-test=13ccf8de48e8db2290ff884598444d0508606bbf tiflash=feature-fts#67266
winoros merged 1 commit intopingcap:release-fts-202602from
ti-chi-bot:cherry-pick-67242-to-release-fts-202602

Conversation

@ti-chi-bot
Copy link
Copy Markdown
Member

@ti-chi-bot ti-chi-bot commented Mar 24, 2026

This is an automated cherry-pick of #67242

What problem does this PR solve?

Issue Number: ref #65626

Problem Summary:

MATCH ... AGAINST (... IN BOOLEAN MODE) still rejected some valid boolean queries during TiCI rewrite. Queries like hello world and +hello world returned TiDB only supports multiple terms with +/- modifiers.

What changed and how does it work?

  • Removed the extra BOOLEAN MODE validation that rejected multiple SHOULD clauses and mixed MUST/SHOULD clauses.
  • Rewrote pure SHOULD queries into OR-based FTS filters.
  • Kept MUST / MUST NOT filters when SHOULD terms appear in a no-score TiCI query, so optional terms no longer become mandatory filters.
  • Added regression coverage in expression rewrite tests and planner TiCI validation tests.

Check List

Tests

  • Unit test
    go test -run TestRewriteMySQLMatchAgainst --tags=intest ./pkg/expression
    go test -run TestTiCIMatchAgainstValidation --tags=intest ./pkg/planner/core/casetest/tici
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

Allow more valid MATCH ... AGAINST BOOLEAN MODE queries during TiCI rewrite, including multiple SHOULD terms and mixed MUST/SHOULD queries.

Summary by CodeRabbit

  • New Features

    • Enhanced fulltext search in BOOLEAN MODE to support more flexible queries with multiple unmodified search terms, eliminating the requirement for explicit +/- modifiers on all terms.
  • Bug Fixes

    • Simplified and relaxed validation logic for fulltext search boolean group handling, removing unnecessary restrictions on query term combinations.

@ti-chi-bot ti-chi-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/planner SIG: Planner size/M Denotes a PR that changes 30-99 lines, ignoring generated files. type/cherry-pick-for-release-fts-202602 labels Mar 24, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented Mar 24, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: winoros
Once this PR has been reviewed and has the lgtm label, please assign windtalker for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Mar 24, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented Mar 24, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-03-24 18:47:08.591399949 +0000 UTC m=+294024.627470209: ☑️ agreed by winoros.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 24, 2026

📝 Walkthrough

Walkthrough

Modified MATCH...AGAINST boolean group rewriting to conditionally include SHOULD terms based on MUST term presence. Removed prior validation restrictions on SHOULD-with-MUST/MUST-NOT combinations and multiple SHOULD-only terms. Updated tests to verify successful plan generation for previously rejected multi-term queries.

Changes

Cohort / File(s) Summary
FTS Boolean Group Rewriting Logic
pkg/expression/fts_helper.go
Modified rewriteBooleanGroupToFTSExpr to conditionally include SHOULD terms only when MUST terms are absent. Updated validateMySQLMatchAgainstBooleanGroup to remove TiDB limitation checks on SHOULD-MUST combinations and multiple SHOULD terms, now validating only non-nil input.
FTS Rewrite Test Expectations
pkg/expression/util_test.go
Updated TestRewriteMySQLMatchAgainst to assert successful rewriting (no errors) and correct FTS structure for multi-term patterns "hello world" and "+hello world", with proper leaf composition using FTSMatchWord and LogicOr nodes.
TiCI Plan Generation Tests
pkg/planner/core/casetest/tici/tici_test.go
Changed TestTiCIMatchAgainstValidation from expecting errors to verifying successful query plan generation for BOOLEAN MODE multi-term queries, validating fts_match_word composition and istrue_with_null guards.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • pingcap/tidb#67242: Makes identical code-level changes to rewriteBooleanGroupToFTSExpr and validateMySQLMatchAgainstBooleanGroup for SHOULD-only rewrite support.
  • pingcap/tidb#67128: Also modifies MATCH...AGAINST rewriting and validation logic in pkg/expression/fts_helper.go.
  • pingcap/tidb#67208: Modifies boolean-group to FTS rewrite logic in rewriteBooleanGroupToFTSExpr and related clause handling.

Suggested labels

approved, lgtm

Suggested reviewers

  • winoros
  • wshwsh12
  • hawkingrei

Poem

🐰 A boolean dream now takes its flight,
SHOULD terms dance when MUST's not in sight,
No more limits on "hello world"—
FTS queries bloom and gently swirl,
TiDB searches now run more right!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description check ✅ Passed The description follows the template structure with Issue Number, Problem Summary, What Changed, completed Check List (unit tests marked), Side effects evaluated, and a release note provided.
Title check ✅ Passed The title 'expression, planner: allow more MATCH AGAINST boolean queries' accurately reflects the main change: relaxing validation restrictions to support more valid boolean queries in MATCH AGAINST expressions.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 84.61538% with 2 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release-fts-202602@1a76dfb). Learn more about missing BASE report.

Additional details and impacted files
@@                   Coverage Diff                   @@
##             release-fts-202602     #67266   +/-   ##
=======================================================
  Coverage                      ?   75.1052%           
=======================================================
  Files                         ?       1936           
  Lines                         ?     556494           
  Branches                      ?          0           
=======================================================
  Hits                          ?     417956           
  Misses                        ?     138532           
  Partials                      ?          6           
Flag Coverage Δ
integration 39.6360% <0.0000%> (?)
unit 74.1821% <84.6153%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 56.7974% <0.0000%> (?)
parser ∅ <0.0000%> (?)
br 48.7657% <0.0000%> (?)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
pkg/planner/core/casetest/tici/tici_test.go (1)

265-267: Use a narrower negative match than plain world.

CheckNotContain("world") is broader than the behavior under test and can start failing on unrelated explain-output changes. Matching the specific helper fragment instead, e.g. fts_match_word("world", test.t1.title), would keep this regression focused on the rewrite.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/planner/core/casetest/tici/tici_test.go` around lines 265 - 267, The
negative assertion in the tk.MustQuery(...) call is too broad: replace
CheckNotContain("world") with a narrower negative match targeting the specific
explain helper fragment (e.g. CheckNotContain(`fts_match_word("world",
test.t1.title)`)); update the assertion that follows the explain for the query
using match(title) against ('+hello world' IN BOOLEAN MODE) so it only rejects
the exact helper output rather than any occurrence of "world", keeping the test
focused on the rewrite logic around tk.MustQuery and CheckNotContain for the
t1.title FTS helper.
pkg/expression/fts_helper.go (1)

334-367: Capture the SHOULD slice boundary from the built slice.

startIdx := len(group.Must) + len(group.MustNot) assumes every earlier clause contributes exactly one appended expression. That is true today, but it is a subtle invariant across recursive rewrites. Recording the boundary from len(searchFuncs) before appending group.Should would keep the compaction logic local to the actual appends and make this path less fragile if clause rewriting ever stops being 1:1.

Suggested refactor
-	if includeShouldInFilter {
-		for _, item := range group.Should {
-			f, err := rewriteBooleanClauseToFTSExpr(bctx, matchCols, item, parserType)
-			if err != nil {
-				return nil, err
-			}
-			searchFuncs = append(searchFuncs, f)
-		}
-	}
-	if includeShouldInFilter && len(group.Should) > 0 {
-		startIdx := len(group.Must) + len(group.MustNot)
-		endIdx := startIdx + len(group.Should)
-		orShould := ComposeDNFCondition(bctx, searchFuncs[startIdx:endIdx]...)
-		searchFuncs[startIdx] = orShould
-		searchFuncs = searchFuncs[:startIdx+1]
+	if includeShouldInFilter {
+		shouldStart := len(searchFuncs)
+		for _, item := range group.Should {
+			f, err := rewriteBooleanClauseToFTSExpr(bctx, matchCols, item, parserType)
+			if err != nil {
+				return nil, err
+			}
+			searchFuncs = append(searchFuncs, f)
+		}
+		if len(group.Should) > 0 {
+			orShould := ComposeDNFCondition(bctx, searchFuncs[shouldStart:]...)
+			searchFuncs[shouldStart] = orShould
+			searchFuncs = searchFuncs[:shouldStart+1]
+		}
 	}

As per coding guidelines, code SHOULD be self-documenting through clear naming and structure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/expression/fts_helper.go` around lines 334 - 367, The code assumes
earlier clauses append exactly one expression by computing startIdx :=
len(group.Must) + len(group.MustNot); instead capture the actual boundary before
appending the SHOULD items: record startIdx := len(searchFuncs) right before the
loop that appends group.Should, then append each rewritten expression via
rewriteBooleanClauseToFTSExpr into searchFuncs and compute endIdx :=
len(searchFuncs) after the loop; pass searchFuncs[startIdx:endIdx] to
ComposeDNFCondition and replace searchFuncs[startIdx] with the composed OR and
truncate searchFuncs to startIdx+1—this keeps the boundary tied to the real
appends and references symbols searchFuncs, group.Should,
rewriteBooleanClauseToFTSExpr, includeShouldInFilter, and ComposeDNFCondition.
pkg/expression/util_test.go (1)

799-818: Add one mixed SHOULD + MUST NOT regression here.

These new assertions pin down pure SHOULD and MUST+SHOULD, but not the new path where negative terms must survive SHOULD compaction. A case like hello -world and/or +hello world -banana would cover the startIdx := len(group.Must)+len(group.MustNot) slice logic and catch accidental dropping of MUST NOT filters.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/expression/util_test.go` around lines 799 - 818, Add a regression test
that covers mixed SHOULD + MUST NOT cases so negative terms survive SHOULD
compaction: call RewriteMySQLMatchAgainstRecursively with
buildMatchAgainst("hello -world") and with buildMatchAgainst("+hello world
-banana"), require no error and hasMySQLMatchAgainst(expr) is false, then use
collectFTSLeaves(expr) (and the same root/type checks as the other tests) to
assert the returned []ftsLeaf contains entries for both positive terms (e.g.,
"hello", "world" or "hello","world") and the negative terms (e.g., "world",
"banana") so that MUST NOT leaves are not dropped; reference functions
RewriteMySQLMatchAgainstRecursively, buildMatchAgainst, hasMySQLMatchAgainst,
collectFTSLeaves and the ftsLeaf struct when adding the assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/expression/fts_helper.go`:
- Around line 334-367: The code assumes earlier clauses append exactly one
expression by computing startIdx := len(group.Must) + len(group.MustNot);
instead capture the actual boundary before appending the SHOULD items: record
startIdx := len(searchFuncs) right before the loop that appends group.Should,
then append each rewritten expression via rewriteBooleanClauseToFTSExpr into
searchFuncs and compute endIdx := len(searchFuncs) after the loop; pass
searchFuncs[startIdx:endIdx] to ComposeDNFCondition and replace
searchFuncs[startIdx] with the composed OR and truncate searchFuncs to
startIdx+1—this keeps the boundary tied to the real appends and references
symbols searchFuncs, group.Should, rewriteBooleanClauseToFTSExpr,
includeShouldInFilter, and ComposeDNFCondition.

In `@pkg/expression/util_test.go`:
- Around line 799-818: Add a regression test that covers mixed SHOULD + MUST NOT
cases so negative terms survive SHOULD compaction: call
RewriteMySQLMatchAgainstRecursively with buildMatchAgainst("hello -world") and
with buildMatchAgainst("+hello world -banana"), require no error and
hasMySQLMatchAgainst(expr) is false, then use collectFTSLeaves(expr) (and the
same root/type checks as the other tests) to assert the returned []ftsLeaf
contains entries for both positive terms (e.g., "hello", "world" or
"hello","world") and the negative terms (e.g., "world", "banana") so that MUST
NOT leaves are not dropped; reference functions
RewriteMySQLMatchAgainstRecursively, buildMatchAgainst, hasMySQLMatchAgainst,
collectFTSLeaves and the ftsLeaf struct when adding the assertions.

In `@pkg/planner/core/casetest/tici/tici_test.go`:
- Around line 265-267: The negative assertion in the tk.MustQuery(...) call is
too broad: replace CheckNotContain("world") with a narrower negative match
targeting the specific explain helper fragment (e.g.
CheckNotContain(`fts_match_word("world", test.t1.title)`)); update the assertion
that follows the explain for the query using match(title) against ('+hello
world' IN BOOLEAN MODE) so it only rejects the exact helper output rather than
any occurrence of "world", keeping the test focused on the rewrite logic around
tk.MustQuery and CheckNotContain for the t1.title FTS helper.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: dd7ba501-25f5-4f2c-8f26-beaa7f725750

📥 Commits

Reviewing files that changed from the base of the PR and between 1a76dfb and d44c32f.

📒 Files selected for processing (3)
  • pkg/expression/fts_helper.go
  • pkg/expression/util_test.go
  • pkg/planner/core/casetest/tici/tici_test.go

@winoros winoros changed the title expression, planner: allow more MATCH AGAINST boolean queries | tidb-test=13ccf8de48e8db2290ff884598444d0508606bbf tiflash=feature-fts (#67242) expression, planner: allow more MATCH AGAINST boolean queries (#67242) | tidb-test=13ccf8de48e8db2290ff884598444d0508606bbf tiflash=feature-fts Mar 24, 2026
@winoros
Copy link
Copy Markdown
Member

winoros commented Mar 24, 2026

/retest

2 similar comments
@winoros
Copy link
Copy Markdown
Member

winoros commented Mar 25, 2026

/retest

@winoros
Copy link
Copy Markdown
Member

winoros commented Mar 25, 2026

/retest

@winoros winoros removed their assignment Mar 25, 2026
@winoros winoros merged commit 45244ec into pingcap:release-fts-202602 Mar 25, 2026
22 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-1-more-lgtm Indicates a PR needs 1 more LGTM. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/planner SIG: Planner size/M Denotes a PR that changes 30-99 lines, ignoring generated files. type/cherry-pick-for-release-fts-202602

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants