[feat][evaluation] cozeclaw evaluation solution by HearyShen · Pull Request #517 · coze-dev/coze-loop

HearyShen · 2026-05-11T09:24:50Z

What type of PR is this?

Check the PR title

This PR title matches the format: [<type>][<scope>] <description>. For example: [fix][backend] flaky fix
The description of this PR title is user-oriented and clear enough for others to understand.
Add documentation if the current PR requires user awareness at the usage level.
This PR is written in English. PRs not in English will not be reviewed.

(Optional) Translate the PR title into Chinese

(Optional) More detailed description for this PR (en: English/zh: Chinese)

en:
zh(optional):

(Optional) Which issue(s) this PR fixes

Implement a ShouldSkipEvaluator method that lets evaluators skip execution based on input data. This includes adding the interface method, default implementations for the prompt and code evaluators, and integration into the evaluation workflow. The feature optimizes evaluation by skipping unnecessary evaluator runs.

…and create a record when skipped. Modify the ShouldSkip methods to return EvaluatorOutputData instead of EvaluatorRecord. Update ShouldSkipEvaluator to create a record with the output data when skipped. Adjust tests and mocks to reflect the new behavior.

Add mock expectation for ShouldSkipEvaluator in all test cases to ensure proper test coverage of evaluator skipping logic

Add error handling for ShouldSkipEvaluator call and log warning when skip check fails

codecov · 2026-05-11T09:52:04Z

Codecov Report

❌ Patch coverage is 90.56604% with 20 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ion/domain/service/expt_run_scheduler_mode_impl.go | 60.00% | 4 Missing and 4 partials ⚠️ |
| ...aluation/domain/service/expt_run_item_turn_impl.go | 62.50% | 4 Missing and 2 partials ⚠️ |
| ...on/domain/service/expt_run_scheduler_event_impl.go | 90.90% | 2 Missing and 2 partials ⚠️ |
| ...luation/domain/service/expt_run_item_event_impl.go | 93.75% | 2 Missing ⚠️ |

@@            Coverage Diff             @@
##             main     #517      +/-   ##
==========================================
+ Coverage   77.47%   77.51%   +0.03%     
==========================================
  Files         660      660              
  Lines       74230    74418     +188     
==========================================
+ Hits        57511    57686     +175     
- Misses      13318    13324       +6     
- Partials     3401     3408       +7

| Flag | Coverage Δ |
|---|---|
| unittests | `77.51% <90.56%> (+0.03%)` ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| ...valuation/application/convertor/experiment/expt.go | `81.45% <100.00%> (+0.04%)` ⬆️ |
| ...uation/application/convertor/experiment/openapi.go | `88.53% <100.00%> (+0.20%)` ⬆️ |
| backend/modules/evaluation/domain/entity/expt.go | `97.89% <ø> (ø)` |
| ...odules/evaluation/domain/service/evaluator_impl.go | `83.67% <100.00%> (+1.03%)` ⬆️ |
| ...ation/domain/service/evaluator_source_code_impl.go | `84.38% <100.00%> (+0.04%)` ⬆️ |
| ...ion/domain/service/evaluator_source_prompt_impl.go | `75.66% <100.00%> (+0.09%)` ⬆️ |
| ...luation/domain/service/expt_run_item_event_impl.go | `73.25% <93.75%> (+2.20%)` ⬆️ |
| ...on/domain/service/expt_run_scheduler_event_impl.go | `76.95% <90.90%> (+2.47%)` ⬆️ |
| ...aluation/domain/service/expt_run_item_turn_impl.go | `86.40% <62.50%> (-0.83%)` ⬇️ |
| ...ion/domain/service/expt_run_scheduler_mode_impl.go | `85.85% <60.00%> (-0.56%)` ⬇️ |

... and 2 files with indirect coverage changes

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f84e2b7...87ba99f.

…nto feat/skip_evaluator

…522)
* feat(evaluation): generalize evaluator skip rule to intercept rule. Rename ShouldSkip to ShouldIntercept across interfaces, implementations, and tests. Add a RunStatus field to EvaluatorOutputData so that intercept rules can control both the score and the status of evaluator records.
* refactor(evaluation): rename skip to intercepted in ShouldIntercept semantics. Align variable names and comments with the intercept semantics instead of the legacy skip terminology.
* refactor: move RunStatus from EvaluatorOutputData to the ShouldIntercept return value. Align the ShouldIntercept return style with the Run method: status is now an independent return value instead of being embedded in the DO struct.

CLAassistant · 2026-05-14T12:47:55Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ HearyShen
✅ tpfz
❌ xiaowuxiaowu1

HearyShen added 7 commits May 7, 2026 00:06

fix(test): correct indentation in test function for malicious patterns

f2bb9a5

test(evaluation): add ShouldSkipEvaluator mock to test cases

aa384ad

Add mock expectation for ShouldSkipEvaluator in all test cases to ensure proper test coverage of evaluator skipping logic

Merge branch 'main' into feat/skip_evaluator

d66dbba

fix(evaluation): handle evaluator skip error and log warning

4b5ef43

Add error handling for ShouldSkipEvaluator call and log warning when skip check fails

Merge branch 'main' into feat/skip_evaluator

b14e326

tpfz and others added 10 commits May 11, 2026 18:50

fix runlog

eba3dbd

Merge branch 'feat/skip_evaluator' of github.com:coze-dev/coze-loop i…

7ce4fa6

…nto feat/skip_evaluator

fix run log

21b38d8

fix run log

aa619a0

fix retry

0ab7757

fix retry

9a10811

Add offline_expt_analysis_status field

ba1f03c

Add missing enum values

723cd25

Merge branch 'main' into feat/coze/2.5

3a8c513

tpfz and others added 7 commits May 15, 2026 12:46

Fix retry blocking

4e26d09

Optimize evaluator record query

fb6e80d

Merge branch 'main' into feat/coze/2.5

741b649

Merge branch 'main' into feat/coze/2.5

e036870

git merge main

ba58969

fix GitHub PR lint and unit test issues

e210803

Merge branch 'main' into feat/coze/2.5

87ba99f

HearyShen wants to merge 24 commits into main from feat/coze/2.5


4 participants
