planner: rewrite FTS predicates to LIKE if no FTS index#65626
planner: rewrite FTS predicates to LIKE if no FTS index#65626terry1purcell wants to merge 22 commits intopingcap:masterfrom
Conversation
|
Hi @terry1purcell. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
There was a problem hiding this comment.
Pull request overview
This pull request adds functionality to rewrite MATCH...AGAINST (fulltext search) predicates to LIKE predicates when no fulltext index is available. This is controlled by a new session variable tidb_opt_fulltext_search_fallback which can be set to either 'like' (convert to LIKE predicates, default) or 'error' (throw an error if no fulltext index exists).
Changes:
- Added new session variable
tidb_opt_fulltext_search_fallbackto control fulltext search fallback behavior - Implemented conversion logic from MATCH...AGAINST to LIKE predicates supporting both natural language mode and Boolean mode
- Added unit and integration tests for the conversion logic
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/sessionctx/vardef/tidb_vars.go | Defines the new session variable constant and default value |
| pkg/sessionctx/variable/sysvar.go | Registers the new session variable with its configuration |
| pkg/sessionctx/variable/session.go | Adds the FulltextSearchFallback field to SessionVars struct |
| pkg/planner/core/fulltext_to_like.go | Implements the conversion logic from MATCH...AGAINST to LIKE predicates |
| pkg/planner/core/fulltext_to_like_test.go | Unit tests for parsing Boolean search strings |
| pkg/planner/core/expression_rewriter.go | Integrates MATCH...AGAINST handling into the expression rewriter |
| pkg/planner/core/BUILD.bazel | Adds new source and test files to the build configuration |
| tests/integrationtest/t/planner/core/fulltext_search.test | Integration test cases for the fulltext to LIKE conversion |
Codecov Report❌ Patch coverage is Please upload reports for the commit 569f3aa to get more accurate results. Additional details and impacted files@@ Coverage Diff @@
## master #65626 +/- ##
================================================
- Coverage 77.7722% 77.0191% -0.7532%
================================================
Files 1990 1946 -44
Lines 551424 544405 -7019
================================================
- Hits 428855 419296 -9559
- Misses 121649 125100 +3451
+ Partials 920 9 -911
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
| // perfectly enforce word-start boundaries. We use %term% which may produce false positives | ||
| // (matching mid-word like "reOptimizing"), but avoids false negatives. This is an acceptable | ||
| // limitation for a fallback implementation. | ||
| var pattern string | ||
| // Both prefix and general matches use %term% to find the term anywhere in text | ||
| pattern = "%" + escapedTerm + "%" |
There was a problem hiding this comment.
The comment explains that prefix matching (with * wildcard) uses the same %term% pattern as regular matching, making the isPrefixMatch parameter effectively unused. While this is documented as an acceptable limitation, the parameter serves no functional purpose in the current implementation and could be removed for clarity, or the implementation could be enhanced to use %term (without trailing %) for prefix matches to better approximate the intended behavior.
| // perfectly enforce word-start boundaries. We use %term% which may produce false positives | |
| // (matching mid-word like "reOptimizing"), but avoids false negatives. This is an acceptable | |
| // limitation for a fallback implementation. | |
| var pattern string | |
| // Both prefix and general matches use %term% to find the term anywhere in text | |
| pattern = "%" + escapedTerm + "%" | |
| // perfectly enforce word-start boundaries. We previously used %term% for both prefix and | |
| // general matches, which may produce false positives (matching mid-word like "reOptimizing"), | |
| // but avoids false negatives. Here we approximate prefix semantics by using a different | |
| // pattern when isPrefixMatch is true. | |
| var pattern string | |
| if isPrefixMatch { | |
| // For prefix matches, use %term (no trailing %) to better approximate the intended behavior. | |
| pattern = "%" + escapedTerm | |
| } else { | |
| // For general matches, keep using %term% to find the term anywhere in the text. | |
| pattern = "%" + escapedTerm + "%" | |
| } |
| @@ -0,0 +1,475 @@ | |||
| // Copyright 2025 PingCAP, Inc. | |||
There was a problem hiding this comment.
The PR description must include a valid issue number. Replace "close #xxx" with an actual issue number (e.g., "close #12345") to properly link this PR to its corresponding issue.
| # TiFlash, which is not available in the test setup. The code properly checks for fulltext | ||
| # indexes and returns an error when one is found (see expression_rewriter.go:matchAgainstToExpression). |
There was a problem hiding this comment.
The comment states "The code properly checks for fulltext indexes and returns an error when one is found" but this is incorrect. The actual code in expression_rewriter.go:matchAgainstToExpression contains a TODO comment explicitly stating that checking for fulltext indexes is NOT currently implemented. The comment should be updated to accurately reflect that this check is planned for the future but not yet implemented.
| # TiFlash, which is not available in the test setup. The code properly checks for fulltext | |
| # indexes and returns an error when one is found (see expression_rewriter.go:matchAgainstToExpression). | |
| # TiFlash, which is not available in the test setup. The code contains a TODO to add a check | |
| # for fulltext indexes and to return an error when one is found (see expression_rewriter.go:matchAgainstToExpression). |
| // 2. No stop word filtering - searches for all words regardless of length or commonness | ||
| // 3. No word length limits - MySQL ignores words shorter than ft_min_word_len (default 4) | ||
| // 4. No word boundaries - LIKE %term% matches substrings anywhere, not just complete words | ||
| // - Simple terms: "cat" matches "concatenate", "category", "application" |
There was a problem hiding this comment.
The documentation example is incorrect. The comment states that searching for "cat" would match "application", but "application" does not contain the substring "cat". The example should list words that actually contain "cat" as a substring, such as "concatenate", "category", "caterpillar", etc.
| // - Simple terms: "cat" matches "concatenate", "category", "application" | |
| // - Simple terms: "cat" matches "concatenate", "category", "caterpillar" |
|
@pantheon-bot Please take a look at the PR and do a comprehensive bug hunt. |
🐛 Bug Hunt Results - 2 Validated IssuesP0: Unbounded Predicate Explosion → OOM/DoSIssue: No limit on Root Cause:
Validation: Unit test confirmed 10,000 terms × 2 columns = 20,000 LIKE nodes with no cap. Impact: Single malicious/large query can exhaust memory and crash TiDB. Reproduction: SET @@tidb_opt_fulltext_search_fallback='like'; -- default
SELECT * FROM t WHERE MATCH(c1,c2)
AGAINST('...<insert many space-separated tokens>...');
-- Planning memory/CPU spikes proportional to term countFix: Add hard limits on P2: Plan Cache Doesn't Respect Session Variable ToggleIssue: Cache key missing Root Cause:
Impact: User contract violation - setting Reproduction: SET @@tidb_opt_fulltext_search_fallback='like';
PREPARE s FROM 'SELECT * WHERE MATCH(c) AGAINST("term")';
EXECUTE s; -- caches LIKE-rewritten plan
SET @@tidb_opt_fulltext_search_fallback='error';
EXECUTE s; -- Expected: error 1235, Actual: succeeds via cacheFix: Add Test: Extend All issues validated through adversarial verification with unit tests and technical analysis. |
📝 WalkthroughWalkthroughImplements a MATCH...AGAINST fallback mechanism that converts full-text search queries to equivalent LIKE predicates. A new session variable Changes
Sequence DiagramsequenceDiagram
participant Parser as SQL Parser
participant Rewriter as Expression Rewriter
participant SessionVars as Session Variables
participant FTSConverter as FTS Converter
participant ExprBuilder as Expression Builder
Parser->>Rewriter: Leave(*ast.MatchAgainst)
Rewriter->>SessionVars: Read FulltextSearchFallback
alt Fallback Mode = "error"
SessionVars-->>Rewriter: "error"
Rewriter-->>Parser: Error: not supported
else Fallback Mode = "like"
SessionVars-->>Rewriter: "like"
Rewriter->>FTSConverter: convertMatchAgainstToLike(columns, searchText, modifier)
FTSConverter->>FTSConverter: parseBooleanSearchString(searchText)
FTSConverter->>FTSConverter: Partition terms (required, excluded, optional)
FTSConverter->>ExprBuilder: buildLikePredicate for each column/term
ExprBuilder->>ExprBuilder: escapeLikePattern(term)
ExprBuilder-->>FTSConverter: LIKE expression
FTSConverter->>FTSConverter: Combine with CNF/DNF logic
FTSConverter-->>Rewriter: Combined expression
Rewriter->>Rewriter: Push result to ctxStack
Rewriter-->>Parser: Success
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.5.0)Command failed Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
tests/integrationtest/t/planner/core/fulltext_search.test (1)
113-117: Add a direct planner unit test for the “fulltext index exists” branch.The note already calls out this branch as untestable in integration env; covering it in planner unit tests would prevent silent regressions in that error path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integrationtest/t/planner/core/fulltext_search.test` around lines 113 - 117, Add a planner unit test that exercises the branch in expression_rewriter.go:matchAgainstToExpression which errors when a fulltext index is present: create a test (e.g., TestMatchAgainstReturnsErrorWhenFulltextIndexExists) in the planner/core unit tests that constructs a minimal schema/table descriptor containing a fulltext index, build a MATCH ... AGAINST expression against that table, call the planner's expression rewriter (or directly invoke matchAgainstToExpression) and assert that it returns the expected error; ensure you mock or construct any planner context/schema lookups the rewriter uses so the branch is hit deterministically.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/expression/integration_test/integration_test.go`:
- Around line 211-214: The test is using the indexed column title with
match(title), which can hit the FULLTEXT index and skip the fallback path;
change both occurrences of match(title) in the failing/fallback assertions to
use an unindexed column (e.g., match(body)) so the test exercises the "no FTS
index => fallback to LIKE / error" behavior; update the two statements that call
match(title) (the initial select and the MustContainErrMsg assertion) to
match(body) while leaving the tidb_opt_fulltext_search_fallback manipulation and
table t unchanged.
In `@pkg/planner/core/expression_rewriter.go`:
- Around line 2258-2280: Before converting MATCH...AGAINST to LIKE, add a guard
that checks for an existing FULLTEXT index on the referenced table/columns using
the planner metadata available on er (e.g., via er.planCtx and
er.planCtx.builder to access table schema/index info); if a FULLTEXT index is
found, do not perform the LIKE fallback — instead set er.err to
expression.ErrNotSupportedYet.GenWithStackByArgs(...) (or otherwise abort/reject
the rewrite) with a message indicating MATCH...AGAINST with a FULLTEXT index
cannot be rewritten to LIKE and that native support or an explicit fallback
setting is required; keep the existing fallbackMode logic but only apply the
LIKE rewrite when no FULLTEXT index is detected.
In `@pkg/planner/core/fulltext_to_like.go`:
- Around line 263-291: The boolean fallback currently lets excluded-only queries
match (e.g. `-banana`) and drops optional terms when excluded terms exist;
update the logic in fulltext_to_like.go so that: (1) if there are no positive
terms (len(required)+len(optional) == 0) but there are excluded terms, return
the constant false predicate (use the same Constant struct currently returned
when allPredicates==0); (2) when optional terms exist and required==0 but
excluded>0, do NOT ignore optionals — build the per-column LIKE predicates for
optionals (as in allOptionalPreds) and ensure the final result combines a DNF
for "at least one optional" together with the excluded-term NOT predicates in
allPredicates before calling expression.ComposeCNFCondition; reference
functions/vars: optional, required, excluded, buildLikePredicate,
allOptionalPreds, allPredicates, ComposeDNFCondition, ComposeCNFCondition, and
the Constant value used for false.
---
Nitpick comments:
In `@tests/integrationtest/t/planner/core/fulltext_search.test`:
- Around line 113-117: Add a planner unit test that exercises the branch in
expression_rewriter.go:matchAgainstToExpression which errors when a fulltext
index is present: create a test (e.g.,
TestMatchAgainstReturnsErrorWhenFulltextIndexExists) in the planner/core unit
tests that constructs a minimal schema/table descriptor containing a fulltext
index, build a MATCH ... AGAINST expression against that table, call the
planner's expression rewriter (or directly invoke matchAgainstToExpression) and
assert that it returns the expected error; ensure you mock or construct any
planner context/schema lookups the rewriter uses so the branch is hit
deterministically.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
pkg/expression/integration_test/integration_test.gopkg/planner/core/BUILD.bazelpkg/planner/core/expression_rewriter.gopkg/planner/core/fulltext_to_like.gopkg/planner/core/fulltext_to_like_test.gopkg/sessionctx/vardef/tidb_vars.gopkg/sessionctx/variable/session.gopkg/sessionctx/variable/sysvar.gotests/integrationtest/r/planner/core/fulltext_search.resulttests/integrationtest/t/planner/core/fulltext_search.test
|
/retest |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
…IKE rewrite The ftsMatchWordFunctionClass changes (variable arity, new error messages, null-check in evalReal) were copied from origin/feature/fts unnecessarily. They broke the existing TestFTSSyntax assertion that fts_match_word with 3 args fails with "Incorrect parameter count". The LIKE rewrite feature only requires FTSMysqlMatchAgainst; FTSMatchWord is left as-is on master. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… fallback P0: stmtctx: restore AlternativeLogicalPlanPreferCorrelate field that was accidentally replaced instead of added alongside FTSLikeFallback, causing a compile error in ResetAlternativeLogicalPlanSignals and MarkAlternativeLogicalPlanPreferCorrelate. P1: fulltext_to_like: all-negative boolean searches (e.g. "-a -b" with no required or optional terms) now return a constant FALSE instead of NOT(LIKE a) AND NOT(LIKE b). MySQL boolean fulltext returns an empty result set for all-negative queries; the old rewrite returned most rows. P1: expression_rewriter: call SetSkipPlanCache when performing the LIKE fallback rewrite. The rewrite bakes the search string into plan constants at build time; without this, a cached plan would reuse the first execution's LIKE patterns for subsequent executions with different search strings. P2: document in matchAgainstToExpression that the LIKE fallback applies in all expression contexts including SELECT/ORDER BY, where MySQL normally returns a float relevance score but the fallback returns 0 or 1. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
[FORMAT CHECKER NOTIFICATION] Notice: To remove the 📖 For more info, you can check the "Contribute Code" section in the development guide. |
|
@terry1purcell: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@terry1purcell: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What problem does this PR solve?
Issue Number: close #xxx
Problem Summary:
What changed and how does it work?
First iteration will convert a Full Text Search style predicate to an equivalent standard SQL predicate (typically a LIKE) - where possible - if no FTS index exists on that predicate.
In a future iteration - the goal is to allow both predicates to persist, and if a plan is chosen where that table is not sent to TiCI - then the LIKE predicate will execute on TiDB, and the FTS predicate will be removed.
Claude code review:
✅ Test Results
escapeLikePattern)
✅ Code Quality
- Phrase operators (+"phrase", -"phrase")
- Mixed operators and phrases
- Unclosed quotes with operators
- Empty words after operator removal
- Whitespace handling (tabs, newlines)
- Required terms: (col1 LIKE %term% OR col2 LIKE %term%)
- Excluded terms: NOT(col1 LIKE %term% OR col2 LIKE %term%)
- Optional terms: Only used when no required/excluded terms exist
- Proper CNF/DNF composition
allocation
indexes before fallback
✅ Documentation
The code includes comprehensive documentation explaining:
🎯 Final Verdict
After this thorough re-review and testing:
The implementation correctly converts MATCH...AGAINST to LIKE predicates when no fulltext index
exists, with appropriate warnings about semantic differences from MySQL's native full-text
search.
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit
New Features
tidb_opt_fulltext_search_fallbacksession variable (default: "like") to control how full-text search queries are handled—either converted to LIKE predicates or raise an error.Tests