Skip to content

fix: 清理 stale channel affinity 并继续正常 fallback#3413

Open
constansino wants to merge 2 commits intoQuantumNous:mainfrom
constansino:fix-stale-channel-affinity-fallback
Open

fix: 清理 stale channel affinity 并继续正常 fallback#3413
constansino wants to merge 2 commits intoQuantumNous:mainfrom
constansino:fix-stale-channel-affinity-fallback

Conversation

@constansino
Copy link

@constansino constansino commented Mar 23, 2026

⚠️ 提交警告 / PR Warning

请注意: 请提供人工撰写的简洁摘要。包含大量 AI 灌水内容、逻辑混乱或无视模版的 PR 可能会被无视或直接关闭


💡 沟通提示 / Pre-submission

重大功能变更? 请先提交 Issue 交流,避免无效劳动。

📝 变更描述 / Description

当 channel affinity 命中的首选渠道已经失效、缺失,或已经不再匹配当前 group/model 时,原逻辑会被这条 stale affinity 卡住,导致请求不能继续正常选路。

这个 PR 做了两件事:

  1. 放弃 stale affinity 时,先删除当前 affinity 缓存,再继续走正常渠道选择。
  2. 同时把当前请求上下文里的 skip retry 状态清掉,避免 controller/relay.go 后续重试逻辑误继承旧 affinity 的 SkipRetryOnFailure

另外补了测试,覆盖 stale affinity 清理、auto group 不再匹配、以及清理后 ShouldSkipRetryAfterChannelAffinityFailure 应返回 false 的场景。

🚀 变更类型 / Type of change

  • 🐛 Bug 修复 (Bug fix)
  • ✨ 新功能 (New feature) - 重大特性建议先 Issue 沟通
  • ⚡ 性能优化 / 重构 (Refactor)
  • 📝 文档更新 (Documentation)

🔗 关联任务 / Related Issue

✅ 提交前检查项 / Checklist

  • 人工确认: 我已亲自撰写此描述,去除了 AI 原始输出的冗余。
  • 深度理解: 我已完全理解这些更改的工作原理及潜在影响。
  • 范围聚焦: 本 PR 未包含任何与当前任务无关的代码改动。
  • 本地验证: 已在本地运行并通过了测试或手动验证。
  • 安全合规: 代码中无敏感凭据,且符合项目代码规范。

📸 运行证明 / Proof of Work

$ go test ./service ./middleware ./controller
ok   github.com/QuantumNous/new-api/service      0.044s
ok   github.com/QuantumNous/new-api/middleware   0.028s
ok   github.com/QuantumNous/new-api/controller   0.029s

Summary by CodeRabbit

  • Bug Fixes

    • Improved error handling for preferred channel affinity cache lookups; affinity is now properly cleared when lookups fail instead of being ignored.
    • Fixed behavior for disabled preferred channels to ensure affinity is consistently cleared.
  • Refactor

    • Centralized preferred channel affinity validation logic for improved consistency.
  • Tests

    • Added unit tests for channel affinity validation scenarios.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 535f2d2b-24c4-4cd6-bee5-977333ecfd95

📥 Commits

Reviewing files that changed from the base of the PR and between b7b6f3a and 9713a50.

📒 Files selected for processing (2)
  • service/channel_affinity.go
  • service/channel_affinity_cache_test.go
✅ Files skipped from review due to trivial changes (1)
  • service/channel_affinity_cache_test.go

Walkthrough

Introduces a centralized preferred-channel selection helper, changes handling of stale or disabled preferred affinity to clear cache entries (instead of aborting), updates cache-lookup failure behavior to clear affinity, and adds utilities + tests for clearing affinity and recording stale-affinity metadata.

Changes

Cohort / File(s) Summary
Distributor logic
middleware/distributor.go
Added selectPreferredAffinityChannel(...) to centralize preferred-channel affinity validation and selection. Replaced inline handling with calls to the new helper. On cache lookup failure now clears affinity with a "lookup failed" reason. Disabled preferred channels are cleared (no abort).
Distributor tests
middleware/distributor_test.go
Added unit tests for selectPreferredAffinityChannel: clearing disabled channels, auto-group mismatch fallback, and successful auto-group match; asserts returns and Gin-context side effects.
Affinity service utilities
service/channel_affinity.go
Added ClearCurrentChannelAffinity(c, reason) bool to delete affinity cache keys and reset request-scoped retry flags, plus appendChannelAffinityClearedAdminInfo(...) to write stale-affinity metadata into the request context.
Affinity cache tests
service/channel_affinity_cache_test.go
Added TestClearCurrentChannelAffinity to verify cache deletion, retry-skip reset, and context log info populated with stale-affinity reason and metadata.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Distributor as Middleware/Distributor
    participant Selector as selectPreferredAffinityChannel
    participant Cache as ChannelAffinityCache
    participant Service as ClearCurrentChannelAffinity
    participant Fallback as ChannelSelectionFallback

    Client->>Distributor: request arrives (model, group, affinity key)
    Distributor->>Selector: check preferred affinity (channel)
    Selector->>Cache: CacheGetChannel(affinityKey)
    alt cache hit -> channel info available
        Selector-->>Selector: validate channel status & group match
        alt channel disabled or mismatch
            Selector->>Service: ClearCurrentChannelAffinity(reason)
            Service-->>Selector: deleted=true
            Selector->>Fallback: continue normal selection
            Fallback-->>Distributor: select alternate channel
        else matches
            Selector-->>Distributor: return preferred channel
        end
    else cache lookup error / miss
        Selector->>Service: ClearCurrentChannelAffinity("lookup failed")
        Service-->>Selector: deleted=true/false
        Selector->>Fallback: continue normal selection
    end
    Distributor->>Client: forward to chosen channel / respond
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I found a stale key in Redis late at night,
It blocked the channels and dimmed the light.
I cleared the cache with a hop and a cheer,
Backups now running, the path is clear.
Hooray — the rabbit fixed affinity right! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 7.69% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title is in Chinese and describes fixing stale channel affinity cleanup and normal fallback. It accurately summarizes the main change of the PR.
Linked Issues check ✅ Passed The PR comprehensively addresses all coding objectives from issue #3414: detects and clears stale affinity entries, resets skip-retry state, and includes test coverage for stale-affinity scenarios.
Out of Scope Changes check ✅ Passed All changes are directly scoped to the linked issue #3414. The modifications focus on handling stale channel affinity with new helper functions, tests, and control-flow updates without introducing unrelated functionality.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@service/channel_affinity.go`:
- Around line 663-677: Clear the leftover per-request retry override when
abandoning an affinity in ClearCurrentChannelAffinity: after successfully
deleting the cache key (cache.DeleteMany) also unset or reset the
request-context value that stores the skip-retry flag (the
ginKeyChannelAffinitySkipRetry / channelAffinityMeta.SkipRetry entry placed by
GetPreferredChannelByAffinity) so that
ShouldSkipRetryAfterChannelAffinityFailure does not inherit a stale SkipRetry
value; you can do this by removing the gin context key or setting it to the
default (false) before calling appendChannelAffinityClearedAdminInfo.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f6d26478-44d0-4120-aafa-03e4b9afb54b

📥 Commits

Reviewing files that changed from the base of the PR and between 9ae9040 and b7b6f3a.

📒 Files selected for processing (4)
  • middleware/distributor.go
  • middleware/distributor_test.go
  • service/channel_affinity.go
  • service/channel_affinity_cache_test.go

@constansino constansino changed the title fix: clear stale channel affinity before normal fallback fix: 清理 stale channel affinity 并继续正常 fallback Mar 23, 2026
@constansino
Copy link
Author

已根据 CodeRabbit 的评论补上最后一个修复:清理 stale affinity 时同步重置请求上下文里的 skip-retry 状态,避免后续 relay retry 误继承旧 affinity。另已把 issue / PR 标题和正文改成中文模板,当前关联 issue 为 #3414

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Codex/Claude 的 stale channel affinity 指向失效渠道时会阻断正常 fallback

1 participant