fix(eval): allow invocation-level rubrics #6016

Open
JaeCoding wants to merge 1 commit into google:main from JaeCoding:fix-per-invocation-rubrics
@JaeCoding JaeCoding commented Jun 8, 2026

Please ensure you have read the contribution guide before creating a pull request.

Link to Issue or Description of Change

1. Link to an existing issue (if applicable):

Problem:

rubric_based_tool_use_quality_v1 fails during evaluator construction when rubrics are provided only at the invocation level. RubricBasedEvaluator.__init__ requires criterion-level rubrics via assert self._criterion.rubrics, even though invocation-level rubrics are supported by the eval data model and the tool-use evaluator already merges invocation rubrics before formatting the judge prompt.

This prevents valid evalsets from using per-invocation tool-use rubrics and raises:

AssertionError: Rubrics are required.

There is also a secondary CLI printing issue: rubric score formatting accesses metric_result.criterion.rubrics directly, without handling the case where it is missing or empty.

Solution:

Relax the constructor-time requirement for criterion-level rubrics and initialize missing criterion rubrics to an empty list. The effective rubric list is still built from criterion-level and invocation-level rubrics before prompt formatting. If no effective rubrics are available after merging/filtering, prompt formatting still fails with a clear ValueError.
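The merge-then-validate flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual ADK code: `Rubric`, `Criterion`, `SketchEvaluator`, and `effective_rubrics` are hypothetical stand-ins, and the error message is invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rubric:
  # Hypothetical stand-in for the real rubric model.
  rubric_id: str
  text: str


@dataclass
class Criterion:
  # Criterion-level rubrics may be omitted entirely.
  rubrics: Optional[list[Rubric]] = None


class SketchEvaluator:
  """Illustrative only; not the actual RubricBasedEvaluator."""

  def __init__(self, criterion: Criterion):
    self._criterion = criterion
    # Previously: assert self._criterion.rubrics, "Rubrics are required."
    # Now: tolerate missing criterion rubrics and validate after merging.
    if self._criterion.rubrics is None:
      self._criterion.rubrics = []

  def effective_rubrics(
      self, invocation_rubrics: Optional[list[Rubric]] = None
  ) -> list[Rubric]:
    # Combine criterion-level and invocation-level rubrics.
    merged = list(self._criterion.rubrics) + list(invocation_rubrics or [])
    if not merged:
      # Hypothetical message; the real ValueError text may differ.
      raise ValueError("No rubrics available for this invocation.")
    return merged
```

With this shape, constructing `SketchEvaluator(Criterion())` succeeds, and the failure only surfaces if an invocation also supplies no rubrics.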

Also guard CLI pretty printing so missing or empty criterion rubrics do not raise when displaying rubric scores; in that case the output falls back to the rubric id.
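The fallback behavior for printing might look something like the sketch below. The helper name `rubric_label` and the dict-based rubric shape are assumptions made for illustration; the real code in cli_eval.py works with the project's own model classes.

```python
from typing import Optional


def rubric_label(
    rubric_id: str, criterion_rubrics: Optional[list[dict]]
) -> str:
  """Returns the rubric text if known at the criterion level, else the id."""
  # Treat None and an empty list the same way: nothing to look up.
  for rubric in criterion_rubrics or []:
    if rubric.get("rubric_id") == rubric_id:
      return rubric.get("text", rubric_id)
  # Invocation-only rubrics are not in the criterion list, so show the id.
  return rubric_id
```

The point of the guard is that an absent criterion-level list degrades to a less descriptive label rather than an exception.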

Testing Plan

Unit Tests:

  • I have added or updated unit tests for my change.
  • All targeted unit tests pass locally.

Commands run:

uv run pytest tests/unittests/evaluation/test_rubric_based_tool_use_quality_v1.py tests/unittests/evaluation/test_rubric_based_evaluator.py tests/unittests/evaluation/test_rubric_based_final_response_quality_v1.py tests/unittests/cli/utils/test_cli_eval_pretty_print.py -q
# 43 passed, 58 warnings

uv run pyink --check src/google/adk/evaluation/rubric_based_evaluator.py src/google/adk/evaluation/rubric_based_tool_use_quality_v1.py src/google/adk/evaluation/rubric_based_final_response_quality_v1.py src/google/adk/cli/cli_eval.py tests/unittests/evaluation/test_rubric_based_tool_use_quality_v1.py tests/unittests/evaluation/test_rubric_based_evaluator.py tests/unittests/cli/utils/test_cli_eval_pretty_print.py
# 7 files would be left unchanged

I also attempted the local pre-push hook, which runs the full pytest -q suite, but my local environment did not have the complete set of optional test dependencies installed, so collection failed on unrelated missing packages such as a2a, dotenv, and requests. The targeted tests for this change pass locally.

Manual End-to-End (E2E) Tests:

Manual scenario from #4926:

  1. Use an evalset with an invocation-level rubric of type TOOL_USE_QUALITY.
  2. Use an eval config with rubric_based_tool_use_quality_v1 and no criterion-level rubrics.
  3. Run adk eval.

Expected result: evaluator construction succeeds, invocation rubrics are included in the judge prompt, and result printing does not crash when criterion-level rubrics are empty.

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and targeted unit tests pass locally with my changes.
  • I have manually tested my changes end-to-end.
  • Any dependent changes have been merged and published in downstream modules. N/A - no dependent changes.

Additional context

Configurations with no effective rubrics still fail, but the validation now happens at the point where both criterion-level and invocation-level rubrics have been considered.

google-cla Bot commented Jun 8, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

adk-bot added the eval label ([Component] This issue is related to evaluation) on Jun 8, 2026

adk-bot (Collaborator) commented Jun 8, 2026

Response from ADK Triaging Agent

Hello @JaeCoding, thank you for creating this pull request!

We noticed that the Contributor License Agreement (CLA) check is currently failing for this PR. According to our contribution guidelines, all contributions must be accompanied by a signed CLA before we can review and accept them.

Could you please visit https://cla.developers.google.com/ to sign or verify your agreement? Once signed, the status check should update automatically.

Thank you for your understanding and for helping to improve the ADK!

Development

Successfully merging this pull request may close these issues.

rubric_based_tool_use_quality_v1 fails with AssertionError when rubrics are defined per-invocation only
