Skip to content

[fix][evaluation] persist records after eval target timeout#532

Merged
caijialin0626 merged 10 commits into
mainfrom
codex/coze-video-timeout-record
May 25, 2026
Merged

[fix][evaluation] persist records after eval target timeout#532
caijialin0626 merged 10 commits into
mainfrom
codex/coze-video-timeout-record

Conversation

@caijialin0626
Copy link
Copy Markdown
Collaborator

Summary

Fix evaluation failure persistence when the parent execution context is already canceled.

  • Persist eval target records with a short detached context derived via context.WithoutCancel(ctx), so target_record can still be written after an invoke timeout or parent context cancellation.
  • Persist turn and item run logs with the same detached short-timeout pattern, so timed-out synchronous target execution is reflected as failed instead of staying in processing.
  • Keep target record diagnostics available through the persisted record: logid, trace_id, and converted run error.
  • Remove temporary Coze video timeout reproduction hooks and E2E marker logs from the final code.

Root Cause

The synchronous target execution path used the same parent context for both invoking the target and writing the resulting failure records. When the parent context was canceled before persistence, record creation and item/turn log updates could be skipped or fail through canceled context propagation, leaving experiment items stuck in processing without a target record log ID.

Validation

  • go test ./modules/evaluation/domain/service -run 'TestEvalTargetServiceImpl_ExecuteTarget_PersistsFailRecordAfterContextCanceled|Test_ExptItemEvalCtxExecutor_(CompleteSetItemRun|storeTurnRunResult)' -count=1
  • go test ./modules/evaluation/domain/service -count=1
  • Reproduced the stuck-processing case with a temporary 100ms outer timeout in PPE.
  • Verified the fix in PPE: experiment 7590096110691533570 failed with persisted target_record, logid, trace_id, and error.
  • Ran normal regression experiment 7590096105115854594, result success with target output fields actual_output and trajectory.

Deployment Note

The final branch was deployed through Bits dev self-test task 2401388 for CN. stone.cozeloop.evaluation succeeded with commercial dependency commit 7cf57812f18ab36d71c191b69160cb68b14c7da3, which points to this backend pseudo-version.

@caijialin0626 caijialin0626 changed the title [codex] persist evaluation failure records after context cancellation [fix][evaluation] persist records after eval target timeout May 25, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 25, 2026

Codecov Report

❌ Patch coverage is 86.66667% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...es/evaluation/domain/service/expt_run_item_impl.go 77.77% 0 Missing and 2 partials ⚠️

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #532      +/-   ##
==========================================
+ Coverage   77.29%   77.32%   +0.02%     
==========================================
  Files         657      657              
  Lines       73820    73826       +6     
==========================================
+ Hits        57061    57085      +24     
+ Misses      13365    13349      -16     
+ Partials     3394     3392       -2     
Flag Coverage Δ
unittests 77.32% <86.66%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...d/modules/evaluation/domain/service/target_impl.go 80.58% <100.00%> (+0.71%) ⬆️
...es/evaluation/domain/service/expt_run_item_impl.go 77.32% <77.77%> (+11.25%) ⬆️

... and 2 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5d6b2ef...785879a. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread backend/modules/evaluation/domain/service/target_impl.go
@caijialin0626 caijialin0626 marked this pull request as ready for review May 25, 2026 12:48
@caijialin0626 caijialin0626 merged commit cd86741 into main May 25, 2026
17 of 18 checks passed
@caijialin0626 caijialin0626 deleted the codex/coze-video-timeout-record branch May 25, 2026 12:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants