Skip to content

Improve responses stream startup recovery during temporary channel outages#3392

Open
MisonL wants to merge 15 commits intoQuantumNous:mainfrom
MisonL:feat/codex-graceful-channel-recovery
Open

Improve responses stream startup recovery during temporary channel outages#3392
MisonL wants to merge 15 commits intoQuantumNous:mainfrom
MisonL:feat/codex-graceful-channel-recovery

Conversation

@MisonL
Copy link

@MisonL MisonL commented Mar 22, 2026

📝 变更描述 / Description

  • /v1/responses 流式请求增加首包前的 bootstrap 恢复窗口。
  • 分发层在短时间内拿不到可用渠道时,不立即返回错误,而是在窗口内继续探测并发送 SSE ping 保活。
  • relay 层在首个真实 payload 之前遇到可重试错误时,会在同一窗口内继续恢复;超时后返回 SSE event: error,不再中途切回 JSON。
  • 增加单元测试与 httptest + gin + sqlite 集成测试覆盖真实链路。

🚀 变更类型 / Type of change

  • 🐛 Bug 修复 (Bug fix)
  • ✨ 新功能 (New feature)
  • ⚡ 性能优化 / 重构 (Refactor)
  • 📝 文档更新 (Documentation)

🔗 关联任务 / Related Issue

✅ 提交前检查项 / Checklist

  • 人工确认: 我已亲自撰写此描述,去除了 AI 原始输出的冗余。
  • 深度理解: 我已完全理解这些更改的工作原理及潜在影响。
  • 范围聚焦: 本 PR 未包含任何与当前任务无关的代码改动。
  • 本地验证: 已在本地运行并通过了测试或手动验证。
  • 安全合规: 代码中无敏感凭据,且符合项目代码规范。

📸 运行证明 / Proof of Work

  • go test ./...
  • go vet ./service ./middleware ./controller ./relay/helper
  • go test ./middleware -run 'TestDistributeResponsesBootstrap' -count=1 -v

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a Responses Bootstrap Recovery feature for /v1/responses streaming: request-scoped recovery state and timing, distributor channel-probing grace window with optional SSE pings, relay retry/recovery loop integration emitting OpenAI-style SSE errors when appropriate, configuration/UI/i18n, context key constant, and tests across service, middleware, relay helpers, model, and frontend.

Changes

Cohort / File(s) Summary
Core Recovery Service
service/responses_bootstrap_recovery.go, service/responses_bootstrap_recovery_test.go
New request-scoped recovery state/config, timing constants, state mutation/query helpers, continuation logic, probe/wait/ping scheduling, eligibility rules, and unit tests.
Distributor Middleware & Tests
middleware/distributor.go, middleware/distributor_bootstrap_integration_test.go, middleware/utils.go, middleware/utils_test.go
Initialize recovery state from requests, replace channel lookup with getChannelWithBootstrapGrace (poll/wait/probe with optional SSE : PING), emit OpenAI-style SSE errors when applicable, and add integration/unit tests for timing and SSE behavior.
Relay Controller & Helpers
controller/relay.go, relay/helper/common.go, relay/helper/stream_scanner.go, relay/helper/stream_scanner_test.go
Integrates recovery checks into Relay retry loop (deferred pre-billing, recovery-gated retries), adds helpers (releaseBootstrapRecoveryBilling, waitForResponsesBootstrapRecoveryProbe), SSE OpenAI error event helper, and streaming-timeout default with tests.
Settings, Config & Snapshot
setting/operation_setting/general_setting.go, setting/config/config.go, setting/config/config_test.go
Adds general-setting fields for bootstrap recovery (enable, grace/probe/ping intervals, retryable codes), default retryable list and snapshot getter, and ConfigManager.Read + cloning helper with tests.
Model/Channel & DTO
model/channel_satisfy.go, model/channel_satisfy_test.go, dto/channel_settings.go
Per-channel opt-in flag added to DTO; new HasResponsesBootstrapRecoveryEnabledChannel with DB/cache paths and tests.
Relay Info Billing Reset
relay/common/relay_info.go
Added RelayInfo.ResetBillingMetadata to clear/refund billing metadata and added doc comments for exported types.
Constants / Context Key
constant/context_key.go
Added exported context key ContextKeyResponsesBootstrapRecoveryState and minor formatting alignment.
Frontend UI & i18n
web/src/pages/Setting/Operation/SettingsGeneral.jsx, web/src/components/table/channels/modals/EditChannelModal.jsx, web/src/components/settings/OperationSetting.jsx, web/src/constants/channel.constants.js, web/src/i18n/locales/*
UI fields for recovery config, per-channel toggle, new channel-type constant set, and translations added/updated (en, fr, ja, ru, vi, zh-CN, zh-TW).
Common JSON utilities
common/json.go
Added DecodeJsonUseNumber and UnmarshalWithNumber to preserve numeric JSON values as json.Number.

Sequence Diagram

sequenceDiagram
    actor Client
    participant Distributor as Distributor (middleware)
    participant Service as Recovery Service
    participant ChannelStore as Channel Cache/DB
    participant Relay as Relay Controller
    participant SSE as SSE Helper

    Client->>Distributor: POST /v1/responses (stream: true)
    Distributor->>Service: EnsureResponsesBootstrapRecoveryState(c)
    Service-->>Distributor: state (enabled, deadline, intervals)

    Distributor->>ChannelStore: getChannelWithBootstrapGrace(model)
    ChannelStore-->>Distributor: no channel

    loop within grace window
        Distributor->>Service: NextResponsesBootstrapWait(state)
        Service-->>Distributor: waitDuration, sendPing
        alt sendPing
            Distributor->>SSE: SetEventStreamHeaders + PingData
            SSE-->>Client: ": PING"
        end
        Distributor->>ChannelStore: try get channel again
        ChannelStore-->>Distributor: channel available? / nil
    end

    alt channel found
        Distributor->>Relay: forward request (with channel)
        Relay->>Relay: processing (billing/retries, recovery checks)
        Relay-->>Client: success stream
    else grace expired or unrecoverable
        Relay->>Service: CanContinueResponsesBootstrapRecovery(err)
        Service-->>Relay: false
        Relay->>SSE: OpenAIErrorEvent(err)
        SSE-->>Client: "event: error" + data
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

ready to merge

Suggested reviewers

  • seefs001

"🐰
I nudge the stream to start,
I send a PING to wake the line,
Grace waits while channels rouse,
Then payload sings and errors bow—
A hopping cheer, the stream's just fine!"

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 51.46% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main feature: adding a bootstrap recovery mechanism for responses streaming during temporary channel outages.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
setting/operation_setting/general_setting.go (1)

39-48: Consider documenting the rationale for 401/403 in retryable codes.

Including 401 and 403 in ResponsesStreamBootstrapRetryableStatusCodes is non-obvious since these are typically permanent auth failures. In a multi-channel context, retrying makes sense if another channel might succeed, but this deserves a brief comment explaining the reasoning.

📝 Suggested documentation
 	ResponsesStreamBootstrapRetryableStatusCodes: []int{
+		// 401/403 included because in multi-channel scenarios, another channel may succeed
 		401,
 		403,
 		408,
 		429,
 		500,
 		502,
 		503,
 		504,
 	},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@setting/operation_setting/general_setting.go` around lines 39 - 48, Add a
brief explanatory comment above the ResponsesStreamBootstrapRetryableStatusCodes
constant that documents why 401 and 403 are considered retryable: explain that
in multi-channel delivery the error may be channel-specific (transient or
credential-scoped) and retrying allows fallback to other channels or refreshed
credentials, and note any assumptions (e.g., presence of per-channel
auth/rotation or downstream retry/fallback logic) so future readers understand
the non-obvious choice. Reference the
ResponsesStreamBootstrapRetryableStatusCodes symbol when adding the comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@controller/relay.go`:
- Around line 248-254: The handler currently pre-consumes quota too early
causing reserved quota to hang while in the bootstrap-recovery wait path; update
the flow around CanContinueResponsesBootstrapRecovery and
waitForResponsesBootstrapRecoveryProbe so PreConsumeBilling is not held while
sleeping: either move the PreConsumeBilling invocation so it runs immediately
before the actual dispatch attempt (after CanContinueResponsesBootstrapRecovery
returns true and any wait probe completes) or, if pre-consume must occur
earlier, explicitly call the matching release/refund routine before calling
waitForResponsesBootstrapRecoveryProbe and then reacquire/pre-consume again when
the probe completes; update references around PreConsumeBilling,
CanContinueResponsesBootstrapRecovery, waitForResponsesBootstrapRecoveryProbe
and newAPIError to ensure quota is refunded on each break/sleep path and
re-acquired only when about to dispatch.

In `@middleware/distributor_bootstrap_integration_test.go`:
- Around line 53-67: The cleanup currently only calls model.InitChannelCache()
when oldDB != nil causing stale channel cache if the original model.DB was nil;
change the t.Cleanup block to always clear or rebuild the channel cache after
restoring globals by calling model.InitChannelCache() unconditionally (remove
the oldDB != nil guard) or explicitly clear the cache via the same cache-reset
path used by InitChannelCache, ensuring model.DB, model.LOG_DB and related flags
are restored before invoking InitChannelCache to repopulate or clear cached
channels.

In `@service/responses_bootstrap_recovery.go`:
- Around line 141-159: Change the deadline boundary so both recovery checks
treat now==Deadline as expired: in CanContinueResponsesBootstrapRecovery replace
the time comparison (currently time.Now().After(state.Deadline)) with a
non-before check (e.g., !time.Now().Before(state.Deadline)) so that now ==
state.Deadline returns false; also review NextResponsesBootstrapWait to ensure
it uses the same boundary semantics with state.Deadline (use the same
!time.Now().Before logic or equivalent remaining <= 0 check consistently) so
both functions (CanContinueResponsesBootstrapRecovery and
NextResponsesBootstrapWait) agree on the cutoff.

---

Nitpick comments:
In `@setting/operation_setting/general_setting.go`:
- Around line 39-48: Add a brief explanatory comment above the
ResponsesStreamBootstrapRetryableStatusCodes constant that documents why 401 and
403 are considered retryable: explain that in multi-channel delivery the error
may be channel-specific (transient or credential-scoped) and retrying allows
fallback to other channels or refreshed credentials, and note any assumptions
(e.g., presence of per-channel auth/rotation or downstream retry/fallback logic)
so future readers understand the non-obvious choice. Reference the
ResponsesStreamBootstrapRetryableStatusCodes symbol when adding the comment.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f0a9146c-5253-4e6b-8d0c-db0ba63c4f78

📥 Commits

Reviewing files that changed from the base of the PR and between deff59a and b47970e.

📒 Files selected for processing (12)
  • constant/context_key.go
  • controller/relay.go
  • middleware/distributor.go
  • middleware/distributor_bootstrap_integration_test.go
  • middleware/utils.go
  • middleware/utils_test.go
  • relay/helper/common.go
  • relay/helper/stream_scanner.go
  • relay/helper/stream_scanner_test.go
  • service/responses_bootstrap_recovery.go
  • service/responses_bootstrap_recovery_test.go
  • setting/operation_setting/general_setting.go

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web/src/i18n/locales/ru.json`:
- Around line 3319-3329: The Russian locale file currently contains English
values for the new "Responses 流启动恢复" settings; replace the English strings with
proper Russian translations for the keys such as "启用 Responses 流启动恢复", "仅在
/v1/responses 流式请求首包前生效,用于短时故障恢复。", "启动恢复等待窗口(秒)", "在该时间窗口内持续探测可用渠道,超时后返回真实错误。",
"渠道探测间隔(毫秒)", "每次重新探测可用渠道之间的等待时间。", "SSE 保活间隔(秒)", "在等待恢复期间发送 :
PING,帮助客户端保持连接。", "可触发启动恢复的状态码", "填写 JSON 数组,例如 [401,403,429,500,502,503,504]。",
and the long description string — update their values to correct Russian text so
the UI shows Russian for these new settings.
- Around line 3318-3329: The JSON locale has several keys (e.g. "启用 Responses
流启动恢复", "仅在 /v1/responses 流式请求首包前生效,用于短时故障恢复。", "启动恢复等待窗口(秒)", etc.) that were
accidentally placed at the root because the "translation" object was closed too
early; move these keys back inside the existing "translation" object (i.e.,
remove the stray closing brace that ends "translation" before these entries) so
all new keys remain under "translation" and restore the proper JSON structure
for t('中文key') lookups.

In `@web/src/i18n/locales/zh-CN.json`:
- Around line 2960-2971: The new i18n keys were placed at the root because the
"translation" object was closed too early; reopen/move the closing brace so the
entries like "启用 Responses 流启动恢复", "启动恢复等待窗口(秒)", "渠道探测间隔(毫秒)", "SSE 保活间隔(秒)",
"可触发启动恢复的状态码" and the descriptive sentence are nested under the "translation"
namespace again (so they are available to useTranslation() / t('...')); ensure
the JSON structure has a single top-level "translation" object containing those
keys and a valid final closing brace for the file.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dae50967-78c2-493a-918a-9f292896fbe9

📥 Commits

Reviewing files that changed from the base of the PR and between b47970e and 78311bc.

📒 Files selected for processing (10)
  • docs/installation/responses-stream-bootstrap-recovery.md
  • web/src/components/settings/OperationSetting.jsx
  • web/src/i18n/locales/en.json
  • web/src/i18n/locales/fr.json
  • web/src/i18n/locales/ja.json
  • web/src/i18n/locales/ru.json
  • web/src/i18n/locales/vi.json
  • web/src/i18n/locales/zh-CN.json
  • web/src/i18n/locales/zh-TW.json
  • web/src/pages/Setting/Operation/SettingsGeneral.jsx
✅ Files skipped from review due to trivial changes (7)
  • web/src/components/settings/OperationSetting.jsx
  • web/src/i18n/locales/zh-TW.json
  • web/src/i18n/locales/ja.json
  • web/src/i18n/locales/en.json
  • web/src/i18n/locales/fr.json
  • web/src/i18n/locales/vi.json
  • docs/installation/responses-stream-bootstrap-recovery.md

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
setting/operation_setting/general_setting.go (1)

35-50: Define the retryable-status default in one place.

This list now exists both here and in service.GetResponsesBootstrapRecoveryConfig. If one side changes and the other does not, an explicit default config and an empty/invalid config will recover on different status codes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@setting/operation_setting/general_setting.go` around lines 35 - 50,
ResponsesStreamBootstrapRetryableStatusCodes is duplicated between this config
and service.GetResponsesBootstrapRecoveryConfig causing drift; centralize the
default by moving the slice into a single source of truth (e.g., define a
package-level constant or a shared function like
DefaultResponsesBootstrapRetryableStatusCodes) and have both general_setting.go
(the ResponsesStreamBootstrapRetryableStatusCodes field) and
service.GetResponsesBootstrapRecoveryConfig reference that single symbol so
changes only occur in one place.
controller/relay.go (1)

269-284: Centralize billing cleanup instead of mirroring it here.

This helper hard-codes a controller-side list of relayInfo fields that must stay in sync with whatever PreConsumeBilling populates. The next billing field added there can silently leak stale metadata into the next probe.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controller/relay.go` around lines 269 - 284, The function
releaseBootstrapRecoveryBilling currently duplicates field-by-field billing
cleanup, which risks falling out of sync with PreConsumeBilling; instead add a
single centralized cleanup method (either RelayInfo.ResetBillingMetadata() on
the relaycommon.RelayInfo type or a helper like
relaycommon.ClearBilling(relayInfo *RelayInfo)) that performs Refund(c), nils
the Billing pointer and clears all billing-related fields, update
releaseBootstrapRecoveryBilling to call that new method, and ensure
PreConsumeBilling and any other code paths rely on the same centralized
Reset/Clear function so new billing fields won’t be missed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@controller/relay.go`:
- Around line 242-253: processChannelError is currently called unconditionally
and can trigger auto-disable for status 401 during bootstrap recovery; before
calling processChannelError in the bootstrap retry loop, detect the bootstrap
grace window via service.CanContinueResponsesBootstrapRecovery(c, newAPIError)
(or an explicit bootstrap flag derived from relayInfo) and suppress
channelError.AutoBan (or clear the AutoBan flag) when inside that grace period
so 401 does not auto-disable the channel; keep the rest of the retry logic
intact and only alter the AutoBan behavior for bootstrap recovery rather than
changing AutomaticDisableStatusCodeRanges globally.
- Around line 253-255: The bootstrap recovery probe currently treats client
context cancellation as a server error; update
waitForResponsesBootstrapRecoveryProbe to differentiate cancellation from real
errors by returning a cancellation flag (e.g., change signature to return
(error, bool) or return a sentinel ErrClientCanceled) and ensure the caller
checks c.Request.Context().Err() / errors.Is(ctx.Err(), context.Canceled) or the
returned flag; if cancelled, do not set newAPIError or produce a 503—return
early and avoid treating cancelation as failure (mirror the graceful handling in
relay/helper/stream_scanner.go), and update the call site where you currently do
newAPIError = waitErr to handle the cancelled case separately.

---

Nitpick comments:
In `@controller/relay.go`:
- Around line 269-284: The function releaseBootstrapRecoveryBilling currently
duplicates field-by-field billing cleanup, which risks falling out of sync with
PreConsumeBilling; instead add a single centralized cleanup method (either
RelayInfo.ResetBillingMetadata() on the relaycommon.RelayInfo type or a helper
like relaycommon.ClearBilling(relayInfo *RelayInfo)) that performs Refund(c),
nils the Billing pointer and clears all billing-related fields, update
releaseBootstrapRecoveryBilling to call that new method, and ensure
PreConsumeBilling and any other code paths rely on the same centralized
Reset/Clear function so new billing fields won’t be missed.

In `@setting/operation_setting/general_setting.go`:
- Around line 35-50: ResponsesStreamBootstrapRetryableStatusCodes is duplicated
between this config and service.GetResponsesBootstrapRecoveryConfig causing
drift; centralize the default by moving the slice into a single source of truth
(e.g., define a package-level constant or a shared function like
DefaultResponsesBootstrapRetryableStatusCodes) and have both general_setting.go
(the ResponsesStreamBootstrapRetryableStatusCodes field) and
service.GetResponsesBootstrapRecoveryConfig reference that single symbol so
changes only occur in one place.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c930f281-f254-4fda-bdac-f6f97d55c4d5

📥 Commits

Reviewing files that changed from the base of the PR and between 78311bc and a5ccd27.

📒 Files selected for processing (5)
  • controller/relay.go
  • middleware/distributor_bootstrap_integration_test.go
  • service/responses_bootstrap_recovery.go
  • service/responses_bootstrap_recovery_test.go
  • setting/operation_setting/general_setting.go
✅ Files skipped from review due to trivial changes (3)
  • middleware/distributor_bootstrap_integration_test.go
  • service/responses_bootstrap_recovery_test.go
  • service/responses_bootstrap_recovery.go

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
service/responses_bootstrap_recovery.go (1)

163-171: Consider explicit handling of zero status codes for network-level errors.

When newAPIError.StatusCode is 0 (common for network timeouts, connection refused, DNS failures), the lookup at line 169 will return false since 0 is not in the retryable codes map. This is likely the desired behavior for true network failures during bootstrap, but worth documenting.

Per context snippet 4 (types/error.go:90-99), StatusCode defaults to 500 in NewError() but can be 0 for raw network errors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@service/responses_bootstrap_recovery.go` around lines 163 - 171, The code
should explicitly handle network-level errors where newAPIError.StatusCode == 0
instead of relying on the map lookup; update the function (the block using
types.IsChannelError, types.IsSkipRetryError and
GetResponsesBootstrapRecoveryConfig().RetryableStatusCodes) to add a clear
branch "if newAPIError.StatusCode == 0 { return false }" (optionally with a
short log message) before the map lookup so zero-status network failures are
deterministically treated as non-retryable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@service/responses_bootstrap_recovery.go`:
- Around line 163-171: The code should explicitly handle network-level errors
where newAPIError.StatusCode == 0 instead of relying on the map lookup; update
the function (the block using types.IsChannelError, types.IsSkipRetryError and
GetResponsesBootstrapRecoveryConfig().RetryableStatusCodes) to add a clear
branch "if newAPIError.StatusCode == 0 { return false }" (optionally with a
short log message) before the map lookup so zero-status network failures are
deterministically treated as non-retryable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 014eb939-84c5-49ac-96d1-88975b09d90f

📥 Commits

Reviewing files that changed from the base of the PR and between a5ccd27 and 2da87e3.

📒 Files selected for processing (5)
  • controller/relay.go
  • relay/common/relay_info.go
  • relay/helper/common.go
  • service/responses_bootstrap_recovery.go
  • setting/operation_setting/general_setting.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • setting/operation_setting/general_setting.go

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
service/responses_bootstrap_recovery.go (1)

169-173: Consider caching retryable status codes in state for consistency.

The ProbeInterval and PingInterval are stored in the state at creation time (lines 107-108), but RetryableStatusCodes is re-fetched from config on each call. This creates an inconsistency: intervals are frozen per-request while status codes can change mid-recovery.

If dynamic updates are intentional, consider documenting this. Otherwise, store the retryable codes in the state for consistency.

♻️ Suggested fix to cache status codes in state
 // ResponsesBootstrapRecoveryState tracks a single request's bootstrap recovery window.
 type ResponsesBootstrapRecoveryState struct {
 	Enabled              bool
 	StartedAt            time.Time
 	Deadline             time.Time
 	ProbeInterval        time.Duration
 	PingInterval         time.Duration
+	RetryableStatusCodes map[int]struct{}
 	LastPingAt           time.Time
 	HeadersSent          bool
 	PayloadStarted       bool
 	WaitAttempts         int
 	WaitDuration         time.Duration
 }

Then in EnsureResponsesBootstrapRecoveryState:

 	state := &ResponsesBootstrapRecoveryState{
-		Enabled:       true,
-		StartedAt:     now,
-		Deadline:      now.Add(cfg.GracePeriod),
-		ProbeInterval: cfg.ProbeInterval,
-		PingInterval:  cfg.PingInterval,
+		Enabled:              true,
+		StartedAt:            now,
+		Deadline:             now.Add(cfg.GracePeriod),
+		ProbeInterval:        cfg.ProbeInterval,
+		PingInterval:         cfg.PingInterval,
+		RetryableStatusCodes: cfg.RetryableStatusCodes,
 	}

And in CanContinueResponsesBootstrapRecovery:

-	_, ok = GetResponsesBootstrapRecoveryConfig().RetryableStatusCodes[newAPIError.StatusCode]
+	_, ok = state.RetryableStatusCodes[newAPIError.StatusCode]
 	return ok
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@service/responses_bootstrap_recovery.go` around lines 169 - 173, The code
currently reads RetryableStatusCodes from GetResponsesBootstrapRecoveryConfig()
on every call to CanContinueResponsesBootstrapRecovery while ProbeInterval and
PingInterval are captured in state at creation, causing inconsistent behavior;
update EnsureResponsesBootstrapRecoveryState to capture and store the current
RetryableStatusCodes in the state struct (alongside ProbeInterval and
PingInterval) and change CanContinueResponsesBootstrapRecovery to consult that
cached state field instead of calling GetResponsesBootstrapRecoveryConfig() each
time, keeping references to RetryableStatusCodes,
EnsureResponsesBootstrapRecoveryState and CanContinueResponsesBootstrapRecovery
consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@service/responses_bootstrap_recovery.go`:
- Around line 169-173: The code currently reads RetryableStatusCodes from
GetResponsesBootstrapRecoveryConfig() on every call to
CanContinueResponsesBootstrapRecovery while ProbeInterval and PingInterval are
captured in state at creation, causing inconsistent behavior; update
EnsureResponsesBootstrapRecoveryState to capture and store the current
RetryableStatusCodes in the state struct (alongside ProbeInterval and
PingInterval) and change CanContinueResponsesBootstrapRecovery to consult that
cached state field instead of calling GetResponsesBootstrapRecoveryConfig() each
time, keeping references to RetryableStatusCodes,
EnsureResponsesBootstrapRecoveryState and CanContinueResponsesBootstrapRecovery
consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ae96c3b0-a24e-4af5-beb7-a0130482ca2d

📥 Commits

Reviewing files that changed from the base of the PR and between 2da87e3 and d98ae8c.

📒 Files selected for processing (3)
  • relay/common/relay_info.go
  • service/responses_bootstrap_recovery.go
  • setting/operation_setting/general_setting.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • relay/common/relay_info.go

@MisonL
Copy link
Author

MisonL commented Mar 22, 2026

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 22, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
service/responses_bootstrap_recovery.go (1)

78-80: Tighten path matching to avoid false-positive prefixes

Line 79 currently matches any prefix starting with /v1/responses (e.g. /v1/responsesX). Consider exact path-segment matching.

Proposed refactor
 func IsResponsesBootstrapRecoveryPath(path string) bool {
-	return strings.HasPrefix(path, "/v1/responses") &&
-		!strings.HasPrefix(path, "/v1/responses/compact")
+	if path != "/v1/responses" && !strings.HasPrefix(path, "/v1/responses/") {
+		return false
+	}
+	return !strings.HasPrefix(path, "/v1/responses/compact")
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@service/responses_bootstrap_recovery.go` around lines 78 - 80, The current
IsResponsesBootstrapRecoveryPath function uses strings.HasPrefix(path,
"/v1/responses") which incorrectly matches paths like "/v1/responsesX"; change
the logic to only accept the exact segment "/v1/responses" or paths under that
segment (e.g. "/v1/responses/" prefix) while still excluding the
"/v1/responses/compact" subtree; update IsResponsesBootstrapRecoveryPath to
check for path == "/v1/responses" OR strings.HasPrefix(path, "/v1/responses/")
and then ensure it does not match "/v1/responses/compact" (or its subtree) to
avoid false positives.
service/responses_bootstrap_recovery_test.go (1)

18-37: Guard global settings mutation in tests to prevent future parallel-test flakes

This helper mutates shared global settings. If any package test uses t.Parallel(), these tests can interfere. Add a test mutex around override/restore.

Proposed refactor
 import (
 	"errors"
 	"net/http"
 	"net/http/httptest"
+	"sync"
 	"strings"
 	"testing"
 	"time"
@@
 )
 
+var responsesBootstrapTestSettingsMu sync.Mutex
+
 func withResponsesBootstrapRecoverySetting(t *testing.T) {
 	t.Helper()
+	responsesBootstrapTestSettingsMu.Lock()
 	settings := operation_setting.GetGeneralSetting()
@@
 	t.Cleanup(func() {
 		settings.ResponsesStreamBootstrapRecoveryEnabled = oldEnabled
 		settings.ResponsesStreamBootstrapGracePeriodSeconds = oldGrace
 		settings.ResponsesStreamBootstrapProbeIntervalMilliseconds = oldProbe
 		settings.ResponsesStreamBootstrapPingIntervalSeconds = oldPing
 		settings.ResponsesStreamBootstrapRetryableStatusCodes = oldCodes
+		responsesBootstrapTestSettingsMu.Unlock()
 	})
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@service/responses_bootstrap_recovery_test.go` around lines 18 - 37,
withResponsesBootstrapRecoverySetting mutates global settings from
operation_setting.GetGeneralSetting(), which can cause flakiness when tests run
in parallel; protect the override/restore by introducing a package-level
sync.Mutex (e.g., testSettingsMu) and acquire it at the start of
withResponsesBootstrapRecoverySetting and release it in the t.Cleanup restore
closure so the mutation and restoration of settings.ResponsesStreamBootstrap*
are serialized; reference the function name
withResponsesBootstrapRecoverySetting, the settings variable, and t.Cleanup when
making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@service/responses_bootstrap_recovery.go`:
- Around line 47-74: The code reads the shared mutable singleton via
operation_setting.GetGeneralSetting() without synchronization which risks data
races if settings ever change at runtime; replace the unsynchronized access by
adding a mutex-protected accessor in the operation_setting package (e.g., a
GetGeneralSettingCopy or GetGeneralSettingLocked) that returns an immutable copy
of the fields used by ResponsesBootstrapRecoveryConfig and use that accessor
here instead of GetGeneralSetting(); also ensure LoadFromDB (or any writer)
acquires the same mutex when mutating the singleton so readers and writers
synchronize properly (alternatively, if you prefer not to add locking, add a
clear comment near operation_setting.GetGeneralSetting/LoadFromDB and above
ResponsesBootstrapRecoveryConfig construction documenting that bootstrap
settings are intentionally immutable after startup and any future runtime
mutator must add synchronization).

---

Nitpick comments:
In `@service/responses_bootstrap_recovery_test.go`:
- Around line 18-37: withResponsesBootstrapRecoverySetting mutates global
settings from operation_setting.GetGeneralSetting(), which can cause flakiness
when tests run in parallel; protect the override/restore by introducing a
package-level sync.Mutex (e.g., testSettingsMu) and acquire it at the start of
withResponsesBootstrapRecoverySetting and release it in the t.Cleanup restore
closure so the mutation and restoration of settings.ResponsesStreamBootstrap*
are serialized; reference the function name
withResponsesBootstrapRecoverySetting, the settings variable, and t.Cleanup when
making the change.

In `@service/responses_bootstrap_recovery.go`:
- Around line 78-80: The current IsResponsesBootstrapRecoveryPath function uses
strings.HasPrefix(path, "/v1/responses") which incorrectly matches paths like
"/v1/responsesX"; change the logic to only accept the exact segment
"/v1/responses" or paths under that segment (e.g. "/v1/responses/" prefix) while
still excluding the "/v1/responses/compact" subtree; update
IsResponsesBootstrapRecoveryPath to check for path == "/v1/responses" OR
strings.HasPrefix(path, "/v1/responses/") and then ensure it does not match
"/v1/responses/compact" (or its subtree) to avoid false positives.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 76c2fb06-3904-4745-9293-6b4c96f5b958

📥 Commits

Reviewing files that changed from the base of the PR and between 2da87e3 and 33ae818.

📒 Files selected for processing (4)
  • relay/common/relay_info.go
  • service/responses_bootstrap_recovery.go
  • service/responses_bootstrap_recovery_test.go
  • setting/operation_setting/general_setting.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • relay/common/relay_info.go
  • setting/operation_setting/general_setting.go

@MisonL
Copy link
Author

MisonL commented Mar 22, 2026

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 22, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
setting/config/config.go (1)

42-45: Constrain callback behavior while read lock is held.

At Line 45, fn executes under RLock. If callback code (directly or indirectly) calls a write path like Register/LoadFromDB, it can deadlock on lock upgrade. Consider renaming to ReadLocked and documenting that callback paths must not acquire cm.mutex.Lock().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@setting/config/config.go` around lines 42 - 45, The Read method executes the
user callback fn while holding cm.mutex.RLock(), which allows the callback to
call write paths like Register or LoadFromDB and risk deadlock; change Read to
either (A) take and return a copy of cm.configs[name] then release the RLock
before invoking fn, or (B) rename the method to ReadLocked and update its doc
comment to explicitly state that callers must not call cm.mutex.Lock() or any
write paths (e.g., Register, LoadFromDB) from inside fn; apply the chosen
approach to the method named Read on ConfigManager and update any call sites or
documentation accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@setting/config/config.go`:
- Around line 42-45: The Read method executes the user callback fn while holding
cm.mutex.RLock(), which allows the callback to call write paths like Register or
LoadFromDB and risk deadlock; change Read to either (A) take and return a copy
of cm.configs[name] then release the RLock before invoking fn, or (B) rename the
method to ReadLocked and update its doc comment to explicitly state that callers
must not call cm.mutex.Lock() or any write paths (e.g., Register, LoadFromDB)
from inside fn; apply the chosen approach to the method named Read on
ConfigManager and update any call sites or documentation accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 054473d5-a9d0-4439-94b1-2a5a070af6b0

📥 Commits

Reviewing files that changed from the base of the PR and between 33ae818 and 20afa09.

📒 Files selected for processing (4)
  • service/responses_bootstrap_recovery.go
  • service/responses_bootstrap_recovery_test.go
  • setting/config/config.go
  • setting/operation_setting/general_setting.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • setting/operation_setting/general_setting.go

@MisonL
Copy link
Author

MisonL commented Mar 23, 2026

根据最新整理,这次不再保留单独的 docs/installation/responses-stream-bootstrap-recovery.md 文档,相关说明改为在 PR 讨论串中补充,避免在仓库里增加一篇孤立的安装文档。\n\n本次已同步删除该独立文档,功能行为和 UI 内说明保持不变。

@MisonL
Copy link
Author

MisonL commented Mar 23, 2026

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 23, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@MisonL
Copy link
Author

MisonL commented Mar 23, 2026

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 23, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@setting/config/config.go`:
- Around line 41-47: The Read method currently releases the read lock before
calling fn(config), exposing a live pointer in cm.configs to concurrent writers
(e.g., LoadFromDB -> updateConfigFromMap) and also panics if fn is nil; fix it
by taking a snapshot/copy of the config under lock and validating fn != nil
while still holding the read lock, then release the lock and invoke fn with the
snapshot. Specifically, in ConfigManager.Read: acquire cm.mutex.RLock(),
retrieve cfg := cm.configs[name], create a safe deep/shallow copy (depending on
config mutability) of cfg while holding the lock, release the lock, then if fn
!= nil call fn(snapshot); reference ConfigManager.Read, cm.configs, LoadFromDB
and updateConfigFromMap to ensure the copy prevents races.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ebfde3fd-0c1e-4ec9-8a51-19340236e847

📥 Commits

Reviewing files that changed from the base of the PR and between 20afa09 and fc99878.

📒 Files selected for processing (1)
  • setting/config/config.go

@MisonL
Copy link
Author

MisonL commented Mar 23, 2026

补充说明:本次在原有 Responses bootstrap recovery 的全局开关基础上,又补了一层“渠道级 opt-in”控制。

新增内容:

  • 渠道 other_settings 增加 responses_stream_bootstrap_recovery_enabled
  • 分发层只有在当前 group/model 下存在已 opt-in 的渠道时,才进入 bootstrap grace wait
  • relay 层只有在当前已选中渠道本身已 opt-in 时,才继续 bootstrap recovery
  • 渠道编辑页增加对应开关,运维设置页文案补充为“全局开启 + 渠道单独启用”两级控制
  • 补了 service/middleware 相关单测与集成测试,覆盖未 opt-in 不等待、已 opt-in 可恢复两条路径

本地已验证:

  • go test ./service ./middleware
  • prettier --check
  • 前端 vite build

这样可以避免把恢复窗口误作用到所有渠道,权限和行为边界会更清晰。

@MisonL
Copy link
Author

MisonL commented Mar 23, 2026

@coderabbitai review
Please review the latest commit 86e54ff, especially the channel opt-in gating for responses bootstrap recovery.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 23, 2026

@MisonL, I'll trigger a review of the latest commit 86e54ff, with a focus on the channel opt-in gating for responses bootstrap recovery — including the new responses_stream_bootstrap_recovery_enabled field in other_settings, the distribution-layer grace-wait opt-in guard, the relay-layer per-channel recovery continuation check, and the associated tests.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@model/channel_satisfy.go`:
- Around line 59-70: The loop over channelsIDM should skip disabled channels
before checking other settings; add an enabled-status check at the top of the
loop (e.g. if !channel.IsEnabled() { continue } or the equivalent field/getter
on the Channel type) before using
channel.GetOtherSettings().ResponsesStreamBootstrapRecoveryEnabled so that
channelMatchesAnyGroup and channelSupportsModel are only evaluated for enabled
channels.
- Around line 92-109: The DB fallback function
hasResponsesBootstrapRecoveryEnabledChannelDB currently loads all channels via
DB.Find(&channels) without filtering by status; update it to only consider
enabled channels by adding a WHERE clause to the DB query (e.g., filter on
channel.Status == <enabled value>) or, if you prefer not to change the query,
check channel.Status inside the for loop before evaluating GetOtherSettings and
other checks. Modify the DB.Find call for the channels slice or add a guard
referencing Channel.Status in hasResponsesBootstrapRecoveryEnabledChannelDB so
only enabled channels are considered when evaluating
channel.GetOtherSettings().ResponsesStreamBootstrapRecoveryEnabled and
channelSupportsModel.

In `@setting/config/config.go`:
- Around line 52-62: Replace direct encoding/json usage with the common JSON
wrappers: use common.Marshal instead of json.Marshal when creating raw, and use
common.Unmarshal (or a new common.UnmarshalWithNumber if numeric precision via
decoder.UseNumber is required) to decode into target; update the block where
raw, err := json.Marshal(config) and decoder.Decode(target) are used
(referencing variables raw, target, valueType, config and the decoder.UseNumber
behavior) to call common.Marshal and common.Unmarshal/UnmarshalWithNumber and
return config on error as before.

In `@web/src/components/table/channels/modals/EditChannelModal.jsx`:
- Around line 2968-2984: The switch for
responses_stream_bootstrap_recovery_enabled is currently rendered only when
inputs.type === 3 (Azure-only); move it out of that Azure-only branch and render
it for any channel that supports /v1/responses bootstrap recovery (e.g., include
constant.ChannelTypeOpenAI and any other types that support it) so OpenAI
channels can opt in too; update the render condition around the Form.Switch (and
keep using the same field name responses_stream_bootstrap_recovery_enabled and
handler handleChannelOtherSettingsChange) to check for a capability (e.g.,
inputs.type === constant.ChannelTypeOpenAI || inputs.type === <other supported
types> or a new helper like supportsResponsesBootstrap(inputs.type)) instead of
the Azure-only check.

In `@web/src/i18n/locales/ja.json`:
- Around line 3285-3298: The new Japanese translation entries (e.g., "启用
Responses 流启动恢复", "仅在 /v1/responses 流式请求首包前生效,用于短时故障恢复。", "启动恢复等待窗口(秒)", etc.)
were added at the JSON root instead of inside the existing "translation" object;
move all these keys and their values into the existing "translation" object so
i18next can resolve them, ensuring you merge without duplicating existing keys
and keep the file as valid JSON.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4c475562-37ef-4c64-8398-09c7176d67a2

📥 Commits

Reviewing files that changed from the base of the PR and between fc99878 and 86e54ff.

📒 Files selected for processing (16)
  • dto/channel_settings.go
  • middleware/distributor.go
  • middleware/distributor_bootstrap_integration_test.go
  • model/channel_satisfy.go
  • service/responses_bootstrap_recovery.go
  • service/responses_bootstrap_recovery_test.go
  • setting/config/config.go
  • web/src/components/table/channels/modals/EditChannelModal.jsx
  • web/src/i18n/locales/en.json
  • web/src/i18n/locales/fr.json
  • web/src/i18n/locales/ja.json
  • web/src/i18n/locales/ru.json
  • web/src/i18n/locales/vi.json
  • web/src/i18n/locales/zh-CN.json
  • web/src/i18n/locales/zh-TW.json
  • web/src/pages/Setting/Operation/SettingsGeneral.jsx
✅ Files skipped from review due to trivial changes (3)
  • web/src/i18n/locales/zh-CN.json
  • web/src/i18n/locales/fr.json
  • web/src/i18n/locales/vi.json
🚧 Files skipped from review as they are similar to previous changes (4)
  • middleware/distributor.go
  • web/src/i18n/locales/zh-TW.json
  • web/src/i18n/locales/en.json
  • web/src/i18n/locales/ru.json

@MisonL
Copy link
Author

MisonL commented Mar 23, 2026

@coderabbitai review

@MisonL
Copy link
Author

MisonL commented Mar 23, 2026

Addressed the latest CodeRabbit findings in 7da17cc.

Changes included:

  • skip disabled channels in responses bootstrap recovery checks for both cache and DB paths
  • switch config snapshot cloning to common JSON wrappers and add tests for detached snapshots
  • expose the channel opt-in switch to channel types that currently support responses bootstrap recovery instead of Azure-only
  • move the new Japanese translations back into the translation object
  • add a model test to lock the disabled-channel behavior

Validation:

  • go test ./model ./setting/... ./service/...
  • web: eslint on EditChannelModal.jsx
  • web: prettier --check on touched frontend files
  • web: bun run build

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 23, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
web/src/components/table/channels/modals/EditChannelModal.jsx (1)

947-963: ⚠️ Potential issue | 🟡 Minor

Missing default for responses_stream_bootstrap_recovery_enabled in legacy data handling.

For consistency with other boolean settings, add the default value for responses_stream_bootstrap_recovery_enabled in the else block that handles legacy channels without settings. Without this, the value will be undefined for such channels instead of false.

🛠️ Proposed fix
       } else {
         // 兼容历史数据:老渠道没有 settings 时,默认按 json 展示
         data.vertex_key_type = 'json';
         data.aws_key_type = 'ak_sk';
         data.is_enterprise_account = false;
         data.allow_service_tier = false;
         data.disable_store = false;
         data.allow_safety_identifier = false;
         data.allow_include_obfuscation = false;
         data.allow_inference_geo = false;
         data.claude_beta_query = false;
+        data.responses_stream_bootstrap_recovery_enabled = false;
         data.upstream_model_update_check_enabled = false;
         data.upstream_model_update_auto_sync_enabled = false;
         data.upstream_model_update_last_check_time = 0;
         data.upstream_model_update_last_detected_models = [];
         data.upstream_model_update_ignored_models = '';
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/components/table/channels/modals/EditChannelModal.jsx` around lines
947 - 963, In the legacy-channel fallback else block where the code sets
defaults on the data object (e.g., vertex_key_type, aws_key_type, etc.) ensure
you also set data.responses_stream_bootstrap_recovery_enabled = false so legacy
channels without settings get a boolean false instead of undefined; update the
else branch in EditChannelModal.jsx where the data object defaults are applied
to include this property.
♻️ Duplicate comments (1)
setting/config/config.go (1)

40-61: ⚠️ Potential issue | 🟠 Major

Avoid fallback to live config pointer when snapshot cloning fails.

On Line 52 and Line 57, clone failures return the original config; then Line 71 may execute fn with a live mutable pointer after unlock, reintroducing the concurrency hazard this change is meant to prevent.

🔧 Suggested fix
-func cloneConfigValue(config interface{}) interface{} {
+func cloneConfigValue(config interface{}) (interface{}, error) {
 	if config == nil {
-		return nil
+		return nil, nil
 	}

 	valueType := reflect.TypeOf(config)
 	if valueType.Kind() != reflect.Ptr || valueType.Elem().Kind() != reflect.Struct {
-		return config
+		return config, nil
 	}

 	raw, err := common.Marshal(config)
 	if err != nil {
-		return config
+		return nil, err
 	}

 	target := reflect.New(valueType.Elem()).Interface()
 	if err := common.UnmarshalWithNumber(raw, target); err != nil {
-		return config
+		return nil, err
 	}

-	return target
+	return target, nil
 }

 // Read executes fn with a snapshot of the named config after releasing the manager read lock.
 func (cm *ConfigManager) Read(name string, fn func(config interface{})) {
+	if fn == nil {
+		return
+	}
 	cm.mutex.RLock()
-	config := cloneConfigValue(cm.configs[name])
+	config, err := cloneConfigValue(cm.configs[name])
 	cm.mutex.RUnlock()
-	if fn == nil {
+	if err != nil {
+		common.SysError("failed to clone config " + name + ": " + err.Error())
 		return
 	}
 	fn(config)
 }

Also applies to: 64-71

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@setting/config/config.go` around lines 40 - 61, cloneConfigValue currently
returns the original live config pointer when JSON marshal/unmarshal fails
(calls to common.Marshal and common.UnmarshalWithNumber), which re-exposes the
mutable pointer; change the function so that on any error path (nil input aside)
it returns nil instead of returning config to avoid handing out the live pointer
— ensure the code paths after common.Marshal and common.UnmarshalWithNumber
return nil on error and that the successful path still returns the newly created
target (reflect.New(...).Interface()).
🧹 Nitpick comments (1)
model/channel_satisfy_test.go (1)

10-38: Consider extracting shared fixture setup/teardown for globals.

Both tests repeat the same global-state save/restore pattern. A small helper would reduce drift risk and keep future cases cleaner.

♻️ Optional refactor sketch
+func withResponsesBootstrapCacheFixture(
+	t *testing.T,
+	seed map[int]*Channel,
+	run func(),
+) {
+	originalMemoryCacheEnabled := common.MemoryCacheEnabled
+	originalChannelsIDM := channelsIDM
+
+	common.MemoryCacheEnabled = true
+	channelSyncLock.Lock()
+	channelsIDM = seed
+	channelSyncLock.Unlock()
+
+	t.Cleanup(func() {
+		common.MemoryCacheEnabled = originalMemoryCacheEnabled
+		channelSyncLock.Lock()
+		channelsIDM = originalChannelsIDM
+		channelSyncLock.Unlock()
+	})
+
+	run()
+}
+
 func TestHasResponsesBootstrapRecoveryEnabledChannelSkipsDisabledChannels(t *testing.T) {
-	originalMemoryCacheEnabled := common.MemoryCacheEnabled
-	originalChannelsIDM := channelsIDM
-	...
-	if HasResponsesBootstrapRecoveryEnabledChannel([]string{"default"}, "gpt-5") {
-		t.Fatal("disabled opted-in channel should not satisfy responses bootstrap recovery")
-	}
+	withResponsesBootstrapCacheFixture(t, map[int]*Channel{
+		1: {Id: 1, Status: common.ChannelStatusManuallyDisabled, Group: "default", Models: "gpt-5", OtherSettings: `{"responses_stream_bootstrap_recovery_enabled":true}`},
+		2: {Id: 2, Status: common.ChannelStatusEnabled, Group: "default", Models: "gpt-5", OtherSettings: `{"responses_stream_bootstrap_recovery_enabled":false}`},
+	}, func() {
+		if HasResponsesBootstrapRecoveryEnabledChannel([]string{"default"}, "gpt-5") {
+			t.Fatal("disabled opted-in channel should not satisfy responses bootstrap recovery")
+		}
+	})
 }

Also applies to: 46-67

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model/channel_satisfy_test.go` around lines 10 - 38, Extract the repeated
save/modify/restore logic into a small helper (e.g., withTempChannels or
setupTestChannels) that accepts t *testing.T and the temporary channels map;
inside the helper capture original values of common.MemoryCacheEnabled and
channelsIDM, set common.MemoryCacheEnabled = true, lock channelSyncLock and
assign channelsIDM to the provided map, unlock, then register a t.Cleanup
closure that restores common.MemoryCacheEnabled and (under channelSyncLock)
resets channelsIDM to the captured original; update the tests to call this
helper instead of duplicating the setup/teardown.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@web/src/components/table/channels/modals/EditChannelModal.jsx`:
- Around line 947-963: In the legacy-channel fallback else block where the code
sets defaults on the data object (e.g., vertex_key_type, aws_key_type, etc.)
ensure you also set data.responses_stream_bootstrap_recovery_enabled = false so
legacy channels without settings get a boolean false instead of undefined;
update the else branch in EditChannelModal.jsx where the data object defaults
are applied to include this property.

---

Duplicate comments:
In `@setting/config/config.go`:
- Around line 40-61: cloneConfigValue currently returns the original live config
pointer when JSON marshal/unmarshal fails (calls to common.Marshal and
common.UnmarshalWithNumber), which re-exposes the mutable pointer; change the
function so that on any error path (nil input aside) it returns nil instead of
returning config to avoid handing out the live pointer — ensure the code paths
after common.Marshal and common.UnmarshalWithNumber return nil on error and that
the successful path still returns the newly created target
(reflect.New(...).Interface()).

---

Nitpick comments:
In `@model/channel_satisfy_test.go`:
- Around line 10-38: Extract the repeated save/modify/restore logic into a small
helper (e.g., withTempChannels or setupTestChannels) that accepts t *testing.T
and the temporary channels map; inside the helper capture original values of
common.MemoryCacheEnabled and channelsIDM, set common.MemoryCacheEnabled = true,
lock channelSyncLock and assign channelsIDM to the provided map, unlock, then
register a t.Cleanup closure that restores common.MemoryCacheEnabled and (under
channelSyncLock) resets channelsIDM to the captured original; update the tests
to call this helper instead of duplicating the setup/teardown.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 74b3a192-22e9-4009-8d70-a5590f8bc152

📥 Commits

Reviewing files that changed from the base of the PR and between 86e54ff and 7da17cc.

📒 Files selected for processing (8)
  • common/json.go
  • model/channel_satisfy.go
  • model/channel_satisfy_test.go
  • setting/config/config.go
  • setting/config/config_test.go
  • web/src/components/table/channels/modals/EditChannelModal.jsx
  • web/src/constants/channel.constants.js
  • web/src/i18n/locales/ja.json
✅ Files skipped from review due to trivial changes (1)
  • web/src/i18n/locales/ja.json
🚧 Files skipped from review as they are similar to previous changes (1)
  • model/channel_satisfy.go

@MisonL
Copy link
Author

MisonL commented Mar 24, 2026

Follow-up fixes are in e8437fa.

Handled:

  • stop exposing the live config pointer when snapshot cloning fails; Read now logs and returns instead
  • set responses_stream_bootstrap_recovery_enabled = false in the legacy channel fallback path
  • extract the shared test fixture helper in model/channel_satisfy_test.go
  • add a config clone failure test

Validation:

  • go test ./model ./setting/... ./service/...
  • web: eslint on EditChannelModal.jsx
  • web: prettier --check on EditChannelModal.jsx
  • web: bun run build

@MisonL
Copy link
Author

MisonL commented Mar 24, 2026

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 24, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web/src/components/table/channels/modals/EditChannelModal.jsx`:
- Around line 1767-1768: The code unconditionally persists
settings.responses_stream_bootstrap_recovery_enabled from localInputs, which can
leave a stale opt-in for channels that don't support it; update the assignment
around settings.responses_stream_bootstrap_recovery_enabled to only set it from
localInputs.responses_stream_bootstrap_recovery_enabled when
supportsResponsesBootstrapRecovery(localInputs.type) returns true, and otherwise
set it to false (or delete it) so unsupported channel types cannot retain the
flag; locate the change near the existing assignment in EditChannelModal.jsx and
reference the supportsResponsesBootstrapRecovery helper to gate persistence.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9ab8047b-56cb-491e-9a8c-ec47efdf3cdd

📥 Commits

Reviewing files that changed from the base of the PR and between 7da17cc and e8437fa.

📒 Files selected for processing (4)
  • model/channel_satisfy_test.go
  • setting/config/config.go
  • setting/config/config_test.go
  • web/src/components/table/channels/modals/EditChannelModal.jsx
✅ Files skipped from review due to trivial changes (1)
  • model/channel_satisfy_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • setting/config/config.go
  • setting/config/config_test.go

@MisonL
Copy link
Author

MisonL commented Mar 24, 2026

CodeRabbit follow-up fixes

已补充处理这轮剩余评论:

  • 修正 ru.json 中 responses recovery 新增文案落在 translation 命名空间外的问题。
  • EditChannelModal 保存时按 supportsResponsesBootstrapRecovery(localInputs.type) 做 capability gate;不支持的渠道类型现在会强制落 false,避免保留陈旧 opt-in。

已验证:

  • python3 -m json.tool web/src/i18n/locales/ru.json
  • cd web && bun run build

commit: 82c190e3

@MisonL MisonL force-pushed the feat/codex-graceful-channel-recovery branch from 2a13c2d to 6c305a4 Compare March 26, 2026 09:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant