refactor(runtime): rebuild the acceptance mechanism #592
Merged
phantom5099 merged 25 commits into 1024XEngineer:main on May 10, 2026
# Conflicts: # internal/runtime/state.go
@xgopilot Fix the compile errors and add the missing fields.
Result: fixed the compile errors on the current branch and pushed the fix.
Changes:
Verification:
Commit:
Generated with [codeagent](https://github.com/qbox/codeagent) Co-authored-by: phantom5099 <245659304+phantom5099@users.noreply.github.com>
1. Original Problems
Before this rebuild, the Runtime acceptance pipeline had several structural problems.
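- Unstable source of acceptance criteria: the old logic relied on `InferTaskKind` to infer the task type from keywords in the user's raw input. For example, "create a file" was inferred as a write task and "take a look" as a read-only task. The model already declares its verification requirements during the Plan phase, but the acceptance pipeline did not treat that declaration as a first-class input.
- Insufficient coverage in the Decider decision tree: `decider/decide.go` kept runs going through hard-coded `RequiredNextActions`, which could express only a small set of actions. When a real task needed to run tests, inspect file contents, verify command results, or converge todos, the Decider could often offer only vague suggestions.
- Too many FinalIntercept side paths: state such as `finalInterceptStreak`, `pendingFinalProgress`, and `shouldPromotePendingFinalProgress` turned "should the run continue?" into implicit decisions scattered across several places. Reading files, searching, and tool calls could all influence the completion decision, which easily produced runs that stopped too early and were hard to explain.
- Progress wrongly took on completion decisions: the old `NoProgress`/`BusinessProgress` tried to judge whether there was "business progress", but that judgment overlapped with the Accept Gate, todos, facts, and the completion protocol, and it tended to misclassify normal exploration, repeated verification, or consecutive edits as stalls.

2. Current Approach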
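Core principle: acceptance criteria come from the structured Plan, and the Runtime performs only deterministic fact checks. Progress no longer judges business progress; it is responsible only for repeat-cycle detection.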
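To make this division of labour concrete, here is a minimal Go sketch of a per-turn decision that combines the completion protocol (2.3), the binary Accept Gate (2.2) and the repeat-cycle hard stop (2.5). Only the outcome strings come from this PR; the `Turn` and `Loop` types, the `MaxMissing` limit and the order in which the checks run are assumptions for illustration, not the Runtime's actual control flow.

```go
package main

import "fmt"

// Outcome strings are taken from the PR; everything else is a hypothetical sketch.
const (
	OutcomeContinue                = "continue"
	OutcomeAccepted                = "accepted"
	OutcomeAcceptCheckFailed       = "accept_check_failed"
	OutcomeMissingCompletionSignal = "missing_completion_signal"
	OutcomeRepeatCycle             = "repeat_cycle"
)

// Turn summarizes one model turn (hypothetical shape).
type Turn struct {
	CalledTools        bool // the model invoked at least one tool
	DeclaredCompletion bool // the model emitted a structured task_completion
	ThinkingOrMetadata bool // a thinking/metadata-only response
}

// Loop holds illustrative counters; MaxMissing is an assumed limit.
type Loop struct {
	MissingSignals int
	MaxMissing     int
}

// Decide maps a turn to an outcome and reports whether a
// completion-protocol reminder should be injected before the next turn.
func (l *Loop) Decide(t Turn, repeatHardStop, gatePassed bool) (string, bool) {
	switch {
	case repeatHardStop:
		return OutcomeRepeatCycle, false
	case t.DeclaredCompletion:
		// The Accept Gate is binary: all required checks pass, or the run fails.
		if gatePassed {
			return OutcomeAccepted, false
		}
		return OutcomeAcceptCheckFailed, false
	case t.CalledTools, t.ThinkingOrMetadata:
		// Still working, or a response that does not count as a missing signal.
		return OutcomeContinue, false
	}
	l.MissingSignals++
	if l.MissingSignals > l.MaxMissing {
		return OutcomeMissingCompletionSignal, false
	}
	// Stopped without task_completion: remind and run one more turn.
	return OutcomeContinue, true
}

func main() {
	l := &Loop{MaxMissing: 1}
	fmt.Println(l.Decide(Turn{}, false, false))
	fmt.Println(l.Decide(Turn{}, false, false))
	fmt.Println(l.Decide(Turn{DeclaredCompletion: true}, false, true))
}
```

Because the tool-call and Thinking/Metadata cases return before the counter is touched, those responses never count toward `missing_completion_signal` in this sketch.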
2.1 Structured Plan.Verify
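`PlanSpec.Verify` and `SummaryView.Verify` are upgraded from natural-language lists to `AcceptChecks`. The legacy `[]string` format is still accepted and is migrated to structured checks automatically during deserialization. The dedup key used by `AcceptChecks.Normalize()` includes `kind`, `target`, `match`, `params`, and `required`, so checks with different `content_contains` conditions, or checks that differ only in their required flag, are never merged.

2.2 Accept Gate: Binary Terminal State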
The Accept Gate reads only facts recorded during execution; it never re-runs commands.
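Supported check kinds:
- `output_only`
- `workspace_change`
- `command_success`
- `file_exists`
- `content_contains`
- `tool_fact`

There are only two possible outcomes: if every required check passes, the result is `accepted`; if any required check fails, the result is `accept_check_failed`. A rejected tool call does not by itself cause a failure, but if the rejected call was supposed to produce evidence for a required check and it is never re-run, the gate ultimately fails.

2.3 Completion Protocol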
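When the model stops calling tools without emitting a structured `task_completion`, the Runtime injects a completion-protocol reminder and continues for another turn instead of terminating immediately.
- Related stop reason: `missing_completion_signal`
- Thinking/Metadata responses do not count toward the missing-signal count

2.4 Todo Boundaries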
In Plan mode, todo ownership is determined by the structured Plan.
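Todo rules in Plan mode:
- `PlanSpec.Todos` are filled into the session todos
- `CurrentPlan=nil` or a completed plan does not inherit old todos
- `todo_not_found` returns a recovery suggestion plus the IDs of the currently active todos, helping the model correct the ID

2.5 Repeat Cycle Detector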
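`NoProgress` and `BusinessProgress` have been removed entirely. Progress now detects only repeat cycles: `RepeatCycleStreak` is incremented only when all three (tool signature, result, and sub-goal) repeat at the same time. Once the threshold is reached, the Runtime first injects `REMINDER_REPEAT_CYCLE`; only if the next turn still repeats does it hard-stop with `repeat_cycle`.

Reading files, grep, git status, test commands, passing verifications, successful writes, and todo status changes no longer take part in Progress counting. They are now handled by facts, the Accept Gate, todo convergence, and the completion protocol.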
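The two-stage repeat policy can be sketched as a small state machine: the streak grows only when tool signature, result and sub-goal all match the previous turn; the first time it reaches the threshold the detector asks for `REMINDER_REPEAT_CYCLE`; a further identical turn returns `repeat_cycle`. The `Signature` and `RepeatDetector` types and the threshold value are assumptions for illustration, not the code in `controlplane/progress.go`.

```go
package main

import "fmt"

// Signature is a hypothetical fingerprint of one turn: all three fields must repeat.
type Signature struct {
	Tool    string // tool name plus normalized arguments
	Result  string // e.g. a digest of the tool result
	SubGoal string // the sub-goal the model says it is pursuing
}

// RepeatDetector sketches the two-stage policy: remind first, hard-stop on the next repeat.
type RepeatDetector struct {
	Threshold int // assumed value for illustration
	streak    int
	reminded  bool
	last      *Signature
}

// Observe returns "", "REMINDER_REPEAT_CYCLE" or "repeat_cycle" for the latest turn.
func (d *RepeatDetector) Observe(s Signature) string {
	if d.last != nil && *d.last == s {
		d.streak++
	} else {
		// Any change in tool signature, result or sub-goal resets the cycle.
		d.streak, d.reminded = 0, false
	}
	d.last = &s
	switch {
	case d.streak < d.Threshold:
		return ""
	case d.reminded:
		return "repeat_cycle"
	default:
		d.reminded = true
		return "REMINDER_REPEAT_CYCLE"
	}
}

func main() {
	d := &RepeatDetector{Threshold: 2}
	s := Signature{Tool: `filesystem_grep{"pattern":"foo"}`, Result: "h1", SubGoal: "find caller"}
	for i := 0; i < 4; i++ {
		fmt.Printf("turn %d: %q\n", i, d.Observe(s))
	}
}
```

Nothing in this sketch depends on which tool ran; a turn only feeds the streak when all three fields repeat, which is the narrowing described above.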
2.6 Tool Timeouts and Search Stability
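Tool timeouts no longer depend solely on the global `tool_timeout_sec`.
- Tools covered: `filesystem_grep`, `filesystem_glob`, `codebase_search_text`, `codebase_search_symbol`
- The backoff key is aggregated by `tool + scope/dir`, so switching to a different search keyword does not lose the backoff state
- Build and cache directories: `.cache`, `.tmp`, `tmp`, `build`, `dist`, `out`, `target`, `coverage`, `.next`, `.nuxt`, `.turbo`, `.parcel-cache`, `.vite`, `vendor`, `bin`, `obj`

2.7 Configuration Migration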
The config upgrade flow removes deprecated fields so that old configs do not fail at startup under strict YAML parsing.
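A strict YAML decoder (for example, one configured to reject unknown fields) fails on keys that the config struct no longer declares, so a migration step has to remove them before strict decoding runs. The sketch below works on a config tree already decoded into generic maps. `stripDeprecated` and `deprecatedRuntimeKeys` are hypothetical names, and since only `runtime.max_no_progress_streak` is confirmed by this PR as a removed setting, the key list is an assumption.

```go
package main

import "fmt"

// deprecatedRuntimeKeys lists settings under "runtime" that the migration drops.
// Assumption: only max_no_progress_streak is included here.
var deprecatedRuntimeKeys = []string{"max_no_progress_streak"}

// stripDeprecated removes deprecated keys from a decoded config tree in place and
// reports whether anything changed, so the caller knows whether to rewrite the file.
func stripDeprecated(cfg map[string]any) bool {
	runtime, ok := cfg["runtime"].(map[string]any)
	if !ok {
		return false // no runtime section, nothing to migrate
	}
	changed := false
	for _, k := range deprecatedRuntimeKeys {
		if _, present := runtime[k]; present {
			delete(runtime, k)
			changed = true
		}
	}
	return changed
}

func main() {
	cfg := map[string]any{
		"runtime": map[string]any{"max_no_progress_streak": 5, "example_setting": true},
	}
	fmt.Println(stripDeprecated(cfg), cfg)
}
```

Returning whether anything changed lets the caller rewrite the file only when needed, and running the step a second time is a no-op.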
Fields involved:
- `runtime.max_no_progress_streak`
- `runtime.max_repeat_cycle_streak`

3. Removed Code
- `internal/runtime/decider/`
- `internal/runtime/acceptance/`
- `internal/runtime/final_acceptance.go`
- `internal/runtime/acceptance_service.go`
- `internal/runtime/acceptance_events.go`
- `internal/runtime/before_completion_orchestrator.go`
- `internal/runtime/task_kind.go`
- `internal/runtime/verify/git_diff.go`
- The `NoProgress`/`BusinessProgress` system

4. Additions and Core Changes
- `internal/runtime/acceptgate/`
- `internal/runtime/acceptgate_runtime.go`
- `internal/session/plan.go`: `AcceptCheck`/`AcceptChecks`, compatible with the legacy verify format
- `internal/runtime/run.go`
- `internal/runtime/controlplane/progress.go`
- `internal/runtime/controlplane/stop_reason.go`: real termination reasons such as `missing_completion_signal`, `accept_check_failed`, and `repeat_cycle`
- `internal/runtime/permission.go`
- `internal/repository/path.go`
- `internal/tools/filesystem/helpers.go`
- `internal/runtime/todo_bootstrap.go`
- `internal/config/context_budget_migration.go`: cleans up the `runtime.max_no_progress_streak` setting
- `internal/tui/core/app/update.go`

5. Multi-Client Adaptation
Gateway
The Gateway main path needs no additional business logic.
`gateway_runtime_bridge.go` has been adapted to the slimmed-down `DecisionSnapshot`: `Status`, `StopReason`, `Summary`, `Details`.

TUI
The TUI has been adapted:
- Structured display of `AcceptanceDecidedPayload.Summary/Results`
- Clearing the stale todo panel on `todo_not_found` conflicts

Web
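The Web protocol types still keep backward-compatible fields, but the core changes in this PR are mainly in Runtime/TUI. If the Web client consumes the legacy fields of `ProgressScore` directly, a follow-up needs to remove the display logic for `has_business_progress`, `no_progress_streak`, evidence, and similar fields, and show only the repeat-cycle fields.

6. Verification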
The following targeted tests were run against the current changes:
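A full-repository `go test ./...` run still shows two local/environment failures unrelated to this PR:
- `internal/runner` `TestCapSignerHelpers`: Windows path assertion difference
- `internal/tools/codebase` `TestCodebaseCommonHelpers`: `EvalSymlinks(root)` fails with "Access is denied"

7. Expected Benefits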
Explainable acceptance: every failure maps to a specific required check or todo convergence state, with no more reliance on TaskKind inference.
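Converged completion decisions: the Completion Protocol handles "the model stopped but did not declare completion", while the Accept Gate handles "the model declared completion; are the acceptance criteria met?". The two responsibilities are kept separate.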
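Narrower, more stable loop protection: Progress no longer tries to understand business progress; it only catches genuinely repeated tool signatures, results, and sub-goals, which lowers the chance of false stops.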
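Upgradable old configs: after `max_no_progress_streak` is removed, existing user configs are cleaned up automatically by the migration, so they no longer fail at startup under strict YAML parsing.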