feat: optimize the token_estimator.go hot path, eliminating per-call string rebuilds and linear scans #3428

Open
GYMmaorui wants to merge 1 commit into QuantumNous:main from GYMmaorui:main

GYMmaorui commented Mar 24, 2026

Summary

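  • The 90+ character string that isMathSymbol() rebuilt on every call, plus its O(n) linear scan, is replaced with a package-level precomputed map[rune]struct{} for O(1) lookups; the miss path is ~47x faster
  • isURLDelim() is likewise changed to a precomputed map lookup
  • Remove the pointless sync.RWMutex on multipliersMap (the map is never written after initialization, so the lock is pure overhead)
  • Simplify getMultipliers() to a direct map lookup plus fallback, removing the redundant switch
  • Add token_estimator_test.go: 7 functional correctness tests + 15 benchmarks (including old-vs-new comparisons)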

Background

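isMathSymbol() is called at high frequency in the main token-estimation loop: it runs once for every character that is not a letter, digit, CJK character, emoji, or space. The old implementation built a string of 90+ Unicode characters on every call and then ranged over it, comparing one rune at a time.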

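Under high concurrency, token estimation is triggered via two paths:

  1. Request-side estimation: can be disabled with the CountToken=false environment variable
  2. Response-side estimation (ResponseText2Usage): has no separate switch and is triggered automatically whenever the upstream does not return usage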

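The response side cannot be disabled through configuration, so it can only be addressed by optimizing the code. For a 10KB response containing 1000 symbol characters, the old implementation needs roughly 90,000 comparisons (1000 characters × 90 list entries) plus 1000 string constructions.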
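The comparison count above can be checked with a small sketch. The 90-rune list here is a hypothetical stand-in generated from the Unicode Mathematical Operators block (the PR's real list is elided), and `buildSymbols` and `countLinearComparisons` are illustrative names, not functions from the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// buildSymbols returns a hypothetical 90-rune list (U+2200..U+2259).
// It is NOT the symbol list used in service/token_estimator.go.
func buildSymbols() []rune {
	symbols := make([]rune, 0, 90)
	for r := rune(0x2200); r <= 0x2259; r++ {
		symbols = append(symbols, r)
	}
	return symbols
}

// countLinearComparisons scans the list for every rune in text and returns
// the total number of rune comparisons performed.
func countLinearComparisons(text string, symbols []rune) int {
	comparisons := 0
	for _, r := range text {
		for _, s := range symbols {
			comparisons++
			if s == r {
				break // a hit stops the scan early
			}
		}
	}
	return comparisons
}

func main() {
	symbols := buildSymbols()
	// 1000 '#' characters: each one misses and is compared against all 90 entries.
	text := strings.Repeat("#", 1000)
	fmt.Println(countLinearComparisons(text, symbols)) // prints 90000
}
```

With a map-based set, the same 1000 characters would cost 1000 hash lookups, independent of how many symbols the set holds.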

Changes

service/token_estimator.go

The string in isMathSymbol() is replaced with a package-level precomputed map:

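// mathSymbolSet is built once at package initialization, so isMathSymbol
// no longer rebuilds or scans the symbol list on every call.
var mathSymbolSet = func() map[rune]struct{} {
    mathSymbols := "∑∫∂√∞≤≥≠≈±×÷..."
    set := make(map[rune]struct{}, len([]rune(mathSymbols)))
    for _, r := range mathSymbols {
        set[r] = struct{}{}
    }
    return set
}()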

func isMathSymbol(r rune) bool {
    if _, ok := mathSymbolSet[r]; ok {
        return true
    }
    ...
}

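The same applies to isURLDelim().

Remove the pointless read-write lock (multipliersMap is read-only after initialization; a global search confirmed there are no writes):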

func getMultipliers(p Provider) multipliers {
    if m, ok := multipliersMap[p]; ok {
        return m
    }
    return multipliersMap[OpenAI]
}
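As a rough illustration of why the lock can go: in Go, concurrent reads of a map are safe as long as no goroutine writes to it. The sketch below assumes a simplified `Provider` type and a `multipliers` struct whose fields and values are invented for the example; only `getMultipliers`, `multipliersMap`, and `OpenAI` mirror names from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// Provider and multipliers are simplified stand-ins. The field names and
// values are assumptions, not the definitions in service/token_estimator.go.
type Provider string

const (
	OpenAI Provider = "openai"
	Other  Provider = "other" // hypothetical second provider
)

type multipliers struct {
	Word   float64
	Symbol float64
}

// multipliersMap is populated once at package initialization and never
// written afterwards, so concurrent readers need no lock.
var multipliersMap = map[Provider]multipliers{
	OpenAI: {Word: 1.0, Symbol: 0.5},
	Other:  {Word: 1.2, Symbol: 0.4},
}

// getMultipliers returns the entry for p, falling back to the OpenAI entry.
func getMultipliers(p Provider) multipliers {
	if m, ok := multipliersMap[p]; ok {
		return m
	}
	return multipliersMap[OpenAI]
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = getMultipliers(Other)     // known provider
			_ = getMultipliers("unknown") // falls back to OpenAI
		}()
	}
	wg.Wait()
	fmt.Println(getMultipliers("unknown") == getMultipliers(OpenAI)) // prints true
}
```

Running a program like this with `go run -race` is one way to confirm that the read-only access pattern reports no data race. If a write to the map were ever introduced, the lock (or another synchronization scheme) would be needed again.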

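service/token_estimator_test.go (new)

  • 7 functional tests: cover the correctness and determinism of isMathSymbol, isURLDelim, getMultipliers, EstimateToken, and EstimateTokenByModel
  • 15 benchmarks: include direct old-vs-new comparisons, plus end-to-end performance tests on different text types (English, math-symbol dense, mixed content, URLs, 10KB large text)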
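An old-vs-new comparison of the kind listed above might look roughly like the following. This is a standalone sketch, not the contents of token_estimator_test.go: it uses a shortened, hypothetical symbol list and `testing.Benchmark` so it can run as an ordinary program, whereas the PR's benchmarks presumably use `go test -bench`. With such a short list the timings will not match the figures reported in this PR.

```go
package main

import (
	"fmt"
	"testing"
)

// sampleSymbols is a shortened, hypothetical list; the real one has 90+ runes.
const sampleSymbols = "∑∫∂√∞≤≥≠≈±×÷"

// isMathSymbolLinear mimics the old approach: scan the list on every call.
func isMathSymbolLinear(r rune) bool {
	for _, s := range sampleSymbols {
		if s == r {
			return true
		}
	}
	return false
}

// sampleSet is built once, mirroring the precomputed-set approach.
var sampleSet = func() map[rune]struct{} {
	set := make(map[rune]struct{})
	for _, s := range sampleSymbols {
		set[s] = struct{}{}
	}
	return set
}()

// isMathSymbolSet mimics the new approach: a single map lookup.
func isMathSymbolSet(r rune) bool {
	_, ok := sampleSet[r]
	return ok
}

// sink stores results so the compiler cannot discard the calls.
var sink bool

func main() {
	linear := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = isMathSymbolLinear('a') // miss path
		}
	})
	set := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = isMathSymbolSet('a') // miss path
		}
	})
	fmt.Println("linear miss:", linear)
	fmt.Println("set miss:   ", set)
}
```

Benchmarking the miss path separately from the hit path matters here because a linear scan exits early on a hit but must walk the entire list on a miss.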

Benchmark

Test environment: Apple M4, Go 1.22

Core functions, old vs. new

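| Function | Scenario | Old | New | Speedup |
| --- | --- | --- | --- | --- |
| isMathSymbol | miss (most common path) | 168 ns/op | 3.4 ns/op | ~47x |
| isMathSymbol | hit | 2.8 ns/op | 3.5 ns/op | ~1x |
| isURLDelim | miss | 5.2 ns/op | 4.1 ns/op | ~1.3x |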

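The miss path is the most critical one: the vast majority of characters (English letters, spaces, CJK, and so on) are not math symbols, and every one of them takes the miss branch.

End-to-end EstimateToken (after optimization, 0 allocs/op)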

| Input text | Time |
| --- | --- |
| English, ~1KB | 6.8 μs |
| Math-symbol dense | 23 μs |
| Mixed content | 12 μs |
| URL dense | 9.4 μs |
| Large text, ~10KB | 38 μs |

Test plan

  • All 7 unit tests pass
  • All 15 benchmarks pass, 0 allocs/op
  • go build ./... compiles successfully
  • Determinism check: the same input repeated 10 times yields identical output
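The determinism check in the last item could be sketched as below. `estimate` is a toy stand-in (one token per four runes, rounded up) and not the real EstimateToken algorithm; `isDeterministic` is likewise a hypothetical helper name.

```go
package main

import "fmt"

// estimate is a toy stand-in for EstimateToken, NOT the real algorithm:
// it counts one token per four runes, rounded up.
func estimate(text string) int {
	n := len([]rune(text))
	return (n + 3) / 4
}

// isDeterministic calls f on the same input `runs` times and reports whether
// every result equals the first one.
func isDeterministic(f func(string) int, input string, runs int) bool {
	first := f(input)
	for i := 1; i < runs; i++ {
		if f(input) != first {
			return false
		}
	}
	return true
}

func main() {
	input := "∑ x ≤ 10, see https://example.com"
	fmt.Println(isDeterministic(estimate, input, 10)) // prints true
}
```

A check like this is a reasonable guard when switching to maps: Go randomizes map iteration order, but membership lookups, which are all the optimized functions rely on, return the same answer on every call.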

Summary by CodeRabbit

  • Performance Improvements

    • Optimized token estimation with improved character classification efficiency for faster processing.
  • Tests

    • Added comprehensive unit tests and benchmarks for token estimation utilities ensuring reliability.

coderabbitai bot commented Mar 24, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: aa31675b-c98e-46cb-9627-3831cbd0942c

📥 Commits

Reviewing files that changed from the base of the PR and between 9ae9040 and 2f59856.

📒 Files selected for processing (2)
  • service/token_estimator.go
  • service/token_estimator_test.go

Walkthrough

This pull request optimizes the token estimator module by removing synchronization overhead and replacing linear character-list scans with precomputed lookup sets for O(1) membership checks. It also adds comprehensive unit tests and performance benchmarks to validate the optimized behavior across various input types and model providers.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Token Estimator Optimization: service/token_estimator.go | Removed sync.RWMutex locking around multipliersMap access; replaced per-call linear scans through character lists with precomputed mathSymbolSet and urlDelimSet for O(1) lookups; updated isMathSymbol and isURLDelim to use set membership checks. |
| Test & Benchmark Coverage: service/token_estimator_test.go | Added comprehensive unit tests validating isMathSymbol, isURLDelim, getMultipliers, EstimateToken, and EstimateTokenByModel across multiple input types and providers; included determinism validation and performance benchmarks comparing optimized logic against prior implementations. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Locks unlocked and sets now fly,
O(1) lookups reach for the sky!
Tests abound like carrots in rows,
Faster tokens—watch performance grow! 🚀

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.93% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title is in Chinese and describes an optimization to token_estimator.go that removes string reconstruction and linear scans. It accurately reflects the main changes: performance optimization of hot paths by eliminating redundant operations. |


GYMmaorui (Author) commented

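new api can hit 100% CPU even under very light traffic, which triggers 503 rejections. We used pprof to export a CPU flame graph from a period when the CPU was saturated; Claude analyzed it, located the problem, and made the changes.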

GYMmaorui (Author) commented

Related issues:
#1217
#2290
