
[feat][evaluation] cozeclaw evaluation solution #517

Open
HearyShen wants to merge 24 commits into main from feat/coze/2.5


@HearyShen (Collaborator)

What type of PR is this?

Check the PR title

  • This PR title matches the format: [<type>][<scope>] <description>. For example: [fix][backend] flaky fix
  • The description of this PR title is user-oriented and clear enough for others to understand.
  • Add documentation if the current PR requires user awareness at the usage level.
  • This PR is written in English. PRs not in English will not be reviewed.

(Optional) Translate the PR title into Chinese

(Optional) More detailed description for this PR (en: English / zh: Chinese)

en:
zh(optional):

(Optional) Which issue(s) this PR fixes

HearyShen added 7 commits May 7, 2026 00:06
Implement ShouldSkipEvaluator method to allow evaluators to skip execution based on input data. This includes adding the interface method, default implementations for prompt and code evaluators, and integration into the evaluation workflow. The feature helps optimize evaluation by skipping unnecessary runs.
… and create record when skipped

Modify ShouldSkip methods to return EvaluatorOutputData instead of EvaluatorRecord
Update ShouldSkipEvaluator to create a record with output data when skipped
Adjust tests and mocks to reflect the new behavior
Add mock expectation for ShouldSkipEvaluator in all test cases to ensure proper test coverage of evaluator skipping logic
Add error handling for ShouldSkipEvaluator call and log warning when skip check fails
@codecov

codecov Bot commented May 11, 2026

Codecov Report

❌ Patch coverage is 90.56604% with 20 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ion/domain/service/expt_run_scheduler_mode_impl.go | 60.00% | 4 Missing and 4 partials ⚠️ |
| ...aluation/domain/service/expt_run_item_turn_impl.go | 62.50% | 4 Missing and 2 partials ⚠️ |
| ...on/domain/service/expt_run_scheduler_event_impl.go | 90.90% | 2 Missing and 2 partials ⚠️ |
| ...luation/domain/service/expt_run_item_event_impl.go | 93.75% | 2 Missing ⚠️ |
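The "20 lines" in the headline are the sum of the Missing and partial counts above (8 + 6 + 4 + 2). Assuming Codecov counts a partially covered line as not covered, the reported 90.56604% implies a patch of 212 measured lines, 192 of them hit; the 212 is inferred, not stated in the report. A small Go sketch of that arithmetic, with hypothetical function names:

```go
package main

import "math"

// notCovered sums the per-file Missing and partial counts; partially covered
// lines are assumed to count against coverage.
func notCovered(missing, partials []int) int {
	n := 0
	for _, m := range missing {
		n += m
	}
	for _, p := range partials {
		n += p
	}
	return n
}

// patchCoverage returns covered/total as a percentage.
func patchCoverage(total, uncovered int) float64 {
	return float64(total-uncovered) / float64(total) * 100
}

// inferTotal recovers the patch size from a reported percentage and the
// uncovered count: total = uncovered / (1 - pct/100), rounded to a whole line.
func inferTotal(pct float64, uncovered int) int {
	return int(math.Round(float64(uncovered) / (1 - pct/100)))
}
```

For example, `notCovered([]int{4, 4, 2, 2}, []int{4, 2, 2, 0})` gives 20, and `inferTotal(90.56604, 20)` gives 212.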


@@            Coverage Diff             @@
##             main     #517      +/-   ##
==========================================
+ Coverage   77.47%   77.51%   +0.03%     
==========================================
  Files         660      660              
  Lines       74230    74418     +188     
==========================================
+ Hits        57511    57686     +175     
- Misses      13318    13324       +6     
- Partials     3401     3408       +7     
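The diff rows are internally consistent: Hits + Misses + Partials = Lines in both columns, and Coverage is Hits / Lines. Assuming Codecov's default of two-decimal precision with rounding down, the sketch below reproduces 77.47%, 77.51% and the +0.03% change. Note that subtracting the displayed values would give +0.04%, so the delta appears to be taken from the unrounded percentages. Function names are hypothetical.

```go
package main

import "math"

// floor2 truncates a percentage to two decimals (assumed "round down").
// Only used here for non-negative values.
func floor2(pct float64) float64 {
	return math.Floor(pct*100) / 100
}

// rawCoverage is hits/lines as a percentage; misses and partials both count
// against coverage because lines = hits + misses + partials.
func rawCoverage(hits, lines int) float64 {
	return float64(hits) / float64(lines) * 100
}

// consistent checks that a column of the coverage diff adds up.
func consistent(hits, misses, partials, lines int) bool {
	return hits+misses+partials == lines
}
```

With these helpers, `floor2(rawCoverage(57686, 74418))` gives 77.51, and `floor2(rawCoverage(57686, 74418) - rawCoverage(57511, 74230))` gives 0.03.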
| Flag | Coverage Δ |
|---|---|
| unittests | 77.51% <90.56%> (+0.03%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| ...valuation/application/convertor/experiment/expt.go | 81.45% <100.00%> (+0.04%) ⬆️ |
| ...uation/application/convertor/experiment/openapi.go | 88.53% <100.00%> (+0.20%) ⬆️ |
| backend/modules/evaluation/domain/entity/expt.go | 97.89% <ø> (ø) |
| ...odules/evaluation/domain/service/evaluator_impl.go | 83.67% <100.00%> (+1.03%) ⬆️ |
| ...ation/domain/service/evaluator_source_code_impl.go | 84.38% <100.00%> (+0.04%) ⬆️ |
| ...ion/domain/service/evaluator_source_prompt_impl.go | 75.66% <100.00%> (+0.09%) ⬆️ |
| ...luation/domain/service/expt_run_item_event_impl.go | 73.25% <93.75%> (+2.20%) ⬆️ |
| ...on/domain/service/expt_run_scheduler_event_impl.go | 76.95% <90.90%> (+2.47%) ⬆️ |
| ...aluation/domain/service/expt_run_item_turn_impl.go | 86.40% <62.50%> (-0.83%) ⬇️ |
| ...ion/domain/service/expt_run_scheduler_mode_impl.go | 85.85% <60.00%> (-0.56%) ⬇️ |

... and 2 files with indirect coverage changes



Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f84e2b7...87ba99f.


tpfz and others added 10 commits May 11, 2026 18:50
…522)

* feat(evaluation): generalize evaluator skip rule to intercept rule

Rename ShouldSkip to ShouldIntercept across interfaces, implementations,
and tests. Add RunStatus field to EvaluatorOutputData so that intercept
rules can control both score and status of evaluator records.

* refactor(evaluation): rename skip to intercepted in ShouldIntercept semantics

Align variable names and comments with the intercept semantics instead
of the legacy skip terminology.

* refactor: move RunStatus from EvaluatorOutputData to ShouldIntercept return value

Align ShouldIntercept return style with Run method - status is now an
independent return value instead of being embedded in the DO struct.
@CLAassistant

CLAassistant commented May 14, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ HearyShen
✅ tpfz
❌ xiaowuxiaowu1



4 participants