feat(search): More precise Chinese resource search #2631

Open
Pigeon0v0 wants to merge 1 commit into dev from feat/comp-search

Conversation

Pigeon0v0 (Contributor) commented Mar 21, 2026

Some of the code comes from the mainline 2.12.3 update.

Summary by Sourcery

Improve Chinese resource search accuracy and search weighting while refining related metadata handling and logging.

New Features:

  • Introduce an alternative CurseForge search text specifically for Chinese searches to better align with CurseForge API behavior.

Bug Fixes:

  • Prevent duplicate Minecraft game versions from appearing in parsed CurseForge and Modrinth version lists.

Enhancements:

  • Refine Chinese search term extraction and weighting to derive more accurate English keywords for Mod search across platforms.
  • Generalize the weighted search source model to support multiple aliases per source and update all callers to use the new abstraction.
  • Adjust project scoring to better reward exact matches and normalize similarity contributions.
  • Improve keyword post-processing, including special handling for OptiForge and OptiFabric, and more robust filtering of noisy terms.
  • Enrich de-duplication logs with current result counts for easier debugging.
  • Skip downloading unnecessary Modrinth changelog data to reduce request payloads.

pcl-ce-automation bot added the labels 🛠️ 等待审查 (awaiting review: the pull request is complete and awaiting code review by a maintainer or owner) and size: L (PR size estimate: large) on Mar 21, 2026
sourcery-ai bot commented Mar 21, 2026

Reviewer's Guide

Refines the multilingual search system (especially for Chinese queries) across mods, favorites, local files, saves, and help pages by introducing a structured SearchSource abstraction with aliases, improving keyword extraction and weighting, adding CurseForge-specific search text handling, tuning scoring, and making a few related API and data-cleanup adjustments.

Sequence diagram for refined Chinese mod search with CurseForge-specific handling

sequenceDiagram
    actor User
    participant UI_SearchBox
    participant ModSearchService
    participant CurseForgeAPI
    participant ModrinthAPI

    User->>UI_SearchBox: Enter Chinese keywords
    UI_SearchBox->>ModSearchService: StartSearch(filter)

    ModSearchService->>ModSearchService: Build SearchEntry list
    ModSearchService->>ModSearchService: Search(entries, SearchText, 40, 0.2)
    ModSearchService->>ModSearchService: ExtractWords() per result
    ModSearchService->>ModSearchService: Aggregate WordWeights
    ModSearchService->>ModSearchService: Choose SearchText and CurseForgeAltSearchText
    ModSearchService->>ModSearchService: processKeywords(SearchText)
    ModSearchService->>ModSearchService: processKeywords(CurseForgeAltSearchText)

    ModSearchService->>CurseForgeAPI: GET /mods?searchFilter=CurseForgeAltSearchText
    ModSearchService->>ModrinthAPI: GET /search?query=SearchText

    CurseForgeAPI-->>ModSearchService: CurseForge results
    ModrinthAPI-->>ModSearchService: Modrinth results

    ModSearchService->>UI_SearchBox: Display merged search results
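
The middle steps of the sequence above (extract words, aggregate weights, choose the two search texts, dispatch to each platform) can be sketched roughly as follows. This is an illustrative Python sketch, not the PR's VB.NET code: the stopword list, the `EXACT_MATCH_WEIGHT` constant, the top-N rule in `pick_keywords`, and every function name here are assumptions. Only the `searchFilter` and `query` parameter names and the fallback from `CurseForgeAltSearchText` to `SearchText` come from the guide.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple
from urllib.parse import urlencode

STOPWORDS = {"the", "of", "and", "for", "mod"}  # illustrative, not the PR's list
EXACT_MATCH_WEIGHT = 1000.0  # assumed stand-in for the "very high weight"


def extract_words(text: str) -> List[str]:
    """Return lowercase English-like words, skipping stopwords and bare numbers."""
    words = re.findall(r"[A-Za-z][A-Za-z0-9']*", text)
    return [w.lower() for w in words if w.lower() not in STOPWORDS]


def aggregate_word_weights(results: Iterable[Tuple[str, float, bool]]) -> Dict[str, float]:
    """Sum a weight per word over (english_name, similarity, is_exact_match) results."""
    weights: Dict[str, float] = defaultdict(float)
    for name, similarity, is_exact in results:
        gain = EXACT_MATCH_WEIGHT if is_exact else similarity
        for word in set(extract_words(name)):
            weights[word] += gain
    return dict(weights)


def pick_keywords(weights: Dict[str, float], limit: int = 2) -> str:
    """Join the top-weighted words; ties are broken alphabetically (hypothetical rule)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return " ".join(word for word, _ in ranked[:limit])


def build_queries(search_text: str, curseforge_alt: Optional[str]) -> Dict[str, str]:
    """Build each platform's query string, falling back to search_text for CurseForge."""
    curseforge_text = curseforge_alt if curseforge_alt is not None else search_text
    return {
        "curseforge": urlencode({"searchFilter": curseforge_text}),
        "modrinth": urlencode({"query": search_text}),
    }
```

For example, feeding `aggregate_word_weights` the results `("Applied Energistics 2", 0.9, False)` and `("Applied Mekanistics", 0.4, False)` ranks `applied` above `energistics`, and `build_queries(text, None)` sends the same text to both platforms.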

Class diagram for updated search model with SearchSource and SearchEntry

classDiagram
    class SearchEntry_T_ {
        +T Item
        +List_SearchSource_ SearchSource
        +double Similarity
        +bool AbsoluteRight
    }

    class SearchSource {
        +string[] Aliases
        +double Weight
        +SearchSource(aliases string[], weight double)
        +SearchSource(text string, weight double)
    }

    class SearchModule {
        +double SearchSimilarityWeighted(source List_SearchSource_, query string)
        +List_SearchEntry_T_ Search(entries List_SearchEntry_T_, query string, maxBlurCount int, minBlurSimilarity double)
    }

    class CompSearchRequest {
        +string SearchText
        +string CurseForgeAltSearchText
    }

    SearchEntry_T_ --> SearchSource : uses *
    SearchModule --> SearchSource : weights
    SearchModule --> SearchEntry_T_ : evaluates
    SearchModule --> CompSearchRequest : fills
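
To make the class diagram concrete, here is a minimal Python sketch of a source with several aliases and a weighted similarity that takes the best alias per source. It is not the project's VB.NET implementation: `difflib.SequenceMatcher` is a stand-in for whatever similarity metric the launcher uses, the `normalize` helper is a hypothetical name for the "remove spaces, lowercase" step, and the early return for a zero total weight follows the review suggestion rather than anything confirmed to be in the PR.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Sequence


@dataclass
class SearchSource:
    aliases: Sequence[str]  # alternative texts for the same source
    weight: float           # how much this source counts in the final score


def normalize(text: str) -> str:
    """Remove spaces and lowercase, as the guide describes for aliases."""
    return text.replace(" ", "").lower()


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; the real metric is an assumption here."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def search_similarity_weighted(sources: List[SearchSource], query: str) -> float:
    """Weighted average of each source's best alias similarity."""
    total_weight = sum(source.weight for source in sources)
    if total_weight <= 0:
        # Guard suggested in review: avoid dividing by zero on empty input.
        return 0.0
    weighted_sum = sum(
        max((similarity(alias, query) for alias in source.aliases), default=0.0)
        * source.weight
        for source in sources
    )
    return weighted_sum / total_weight
```

With a source such as `SearchSource(["Just Enough Items", "jei"], 1.0)`, the query `JEI` matches the second alias exactly, so that source contributes its full weight regardless of how poorly the long name matches.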

File-Level Changes

Change Details Files
Introduce SearchSource abstraction and update weighted similarity/search logic to support multiple aliases per text source.
  • Change the type of SearchSource from List(Of KeyValuePair(Of String, Double)) to List(Of SearchSource) in SearchEntry and all call sites.
  • Implement SearchSource class with alias array and weight, plus constructors for text and alias arrays.
  • Update SearchSimilarityWeighted to use max similarity over aliases per source, weighted by source weight.
  • Normalize aliases (remove spaces, lowercase) before exact-part matching in the Search function.
Plain Craft Launcher 2/Modules/Base/ModBase.vb
Plain Craft Launcher 2/Pages/PageInstance/PageInstanceCompResource.xaml.vb
Plain Craft Launcher 2/Pages/PageInstance/PageInstanceSaves/PageInstanceSavesDatapack.xaml.vb
Plain Craft Launcher 2/Pages/PageTools/PageToolsHelp.xaml.vb
Plain Craft Launcher 2/Pages/PageDownload/PageDownloadCompFavorites.xaml.vb
Plain Craft Launcher 2/Pages/PageInstance/PageInstanceSaves.xaml.vb
Improve Chinese mod search by extracting and weighting candidate English keywords and handling CurseForge-specific search filter behavior.
  • Build SearchEntry sources for Chinese names using aliases (primary name and suffix/slug combination with different weights).
  • Increase Search() result window and adjust minimum similarity for Chinese search.
  • Extract English-like words from top search results, filter stopwords/numbers/special cases, and accumulate word weights based on similarity, giving very high weight to exact alias matches.
  • Derive Request.SearchText and a new Request.CurseForgeAltSearchText from weighted words, with different selection rules for exact vs fuzzy matches, and log the chosen keywords.
  • Introduce processKeywords helper to normalize/filter keywords, reuse for both SearchText and CurseForgeAltSearchText, and keep OptiForge/OptiFabric special-case handling.
  • Use CurseForgeAltSearchText when building CurseForge API searchFilter, falling back to SearchText when alternative text is null.
Plain Craft Launcher 2/Modules/Minecraft/ModComp.vb
Tune scoring and result handling for component searches and metadata.
  • Change similarity contribution to Scores so that absolute-right matches get a fixed strong bonus and relative scaling differs when top result is absolute-right.
  • Add an abort check before adding sorted Scores to Storage.Results to respect task cancellation.
  • Improve logging of result accumulation to include current Storage.Results count.
  • Deduplicate GameVersions lists from CurseForge and Modrinth before sorting and trimming.
  • Add include_changelog=false to Modrinth versions API request to reduce payload.
Plain Craft Launcher 2/Modules/Minecraft/ModComp.vb
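
The last two metadata changes in the table (deduplicating GameVersions before sorting and trimming, and adding include_changelog=false to the Modrinth versions request) might look like the Python sketch below. The function names, the numeric sort key, the trim limit, and the URL layout are assumptions for illustration; only the deduplicate-then-sort-then-trim order and the `include_changelog=false` parameter are taken from the guide.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlencode


def dedupe_game_versions(versions: Iterable[str]) -> List[str]:
    """Drop repeated versions while keeping first-seen order."""
    return list(dict.fromkeys(versions))


def version_key(version: str) -> Tuple[int, ...]:
    """Numeric sort key; non-numeric parts count as 0 (a simplification)."""
    return tuple(int(part) if part.isdigit() else 0 for part in version.split("."))


def clean_game_versions(versions: Iterable[str], limit: int = 3) -> List[str]:
    """Deduplicate first, then sort newest-first, then trim to `limit`."""
    unique = dedupe_game_versions(versions)
    unique.sort(key=version_key, reverse=True)
    return unique[:limit]


def modrinth_versions_url(project_id: str) -> str:
    """Build a versions URL that asks the API to leave changelogs out.

    The base URL and path are assumed here; check them against the API docs.
    """
    query = urlencode({"include_changelog": "false"})
    return f"https://api.modrinth.com/v2/project/{project_id}/version?{query}"
```

Deduplicating before trimming matters: if the list is trimmed first, repeated entries can crowd distinct versions out of the visible window.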

Possibly linked issues

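  • #(no number) (mod search optimization suggestion): This PR rewrites and enhances Chinese search together with the keyword weighting and matching logic, which directly improves mod search accuracy and corresponds to that suggestion.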


sourcery-ai bot left a comment

Hey - I've left some high-level feedback:

  • In Search, processedSources is now a List(Of String()) and appears to be unused after being built; if the intent is only to normalize Aliases in place, consider replacing the Select with a simple For Each over Entry.SearchSource and dropping processedSources altogether to avoid confusion and unnecessary allocations.
  • In the ExtractWords lambda, the condition If w.Split(" ").Count > 3 AndAlso w.Contains("ftb") Then Return False will never be true because w is already a single token (no spaces); consider removing or rewriting this check so the FTB special case actually works as intended.
  • The new SearchSimilarityWeighted implementation assumes totalWeight > 0; if a caller ever passes an empty or all‑zero SearchSource list this will cause a divide‑by‑zero, so it may be safer to early‑return 0 when totalWeight is 0.
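
The first two points can be illustrated with a short Python sketch (the project itself is VB.NET, so this only mirrors the logic described in the comments). `normalize_aliases_in_place` is a hypothetical name for the suggested plain loop, and `ftb_check_as_written` transcribes the quoted condition to show why it cannot fire on a single token.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SearchSource:
    aliases: List[str]
    weight: float


def normalize_aliases_in_place(sources: List[SearchSource]) -> None:
    """Plain loop that rewrites each alias list; no intermediate list is kept."""
    for source in sources:
        source.aliases = [alias.replace(" ", "").lower() for alias in source.aliases]


def ftb_check_as_written(token: str) -> bool:
    """Mirror of: If w.Split(" ").Count > 3 AndAlso w.Contains("ftb")."""
    return len(token.split(" ")) > 3 and "ftb" in token


def tokens_hitting_ftb_check(text: str) -> List[str]:
    """Split text into single tokens and return those the check would reject."""
    return [token for token in text.split(" ") if ftb_check_as_written(token)]
```

Because each token contains no spaces, splitting it on a space always yields exactly one part, so `tokens_hitting_ftb_check` returns an empty list for any input. If the intent is to skip long FTB pack names, the word count would need to be checked on the full name before it is tokenized.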
