fix(eval): allow invocation-level rubrics #6016
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Response from ADK Triaging Agent
Hello @JaeCoding, thank you for creating this pull request! We noticed that the Contributor License Agreement (CLA) check is currently failing for this PR. According to our contribution guidelines, all contributions must be accompanied by a signed CLA before we can review and accept them. Could you please visit https://cla.developers.google.com/ to sign or verify your agreement? Once signed, the status check should update automatically. Thank you for your understanding and for helping to improve the ADK!
Please ensure you have read the contribution guide before creating a pull request.
Link to Issue or Description of Change
1. Link to an existing issue (if applicable):
Problem:
`rubric_based_tool_use_quality_v1` fails during evaluator construction when rubrics are provided only at the invocation level. `RubricBasedEvaluator.__init__` requires criterion-level rubrics via `assert self._criterion.rubrics`, even though invocation-level rubrics are supported by the eval data model and the tool-use evaluator already merges invocation rubrics before formatting the judge prompt.
This prevents valid evalsets from using per-invocation tool-use rubrics and raises:
AssertionError: Rubrics are required.
There is also a secondary CLI printing issue where rubric score formatting directly accesses `metric_result.criterion.rubrics`.
Solution:
Relax the constructor-time requirement for criterion-level rubrics and initialize missing criterion rubrics to an empty list. The effective rubric list is still built from criterion-level and invocation-level rubrics before prompt formatting. If no effective rubrics are available after merging/filtering, prompt formatting still fails with a clear `ValueError`.
Also guard CLI pretty printing so missing or empty criterion rubrics do not raise when displaying rubric scores; in that case the output falls back to the rubric id.
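The CLI guard described above can be illustrated with a minimal sketch. This is not the ADK code: `Rubric`, `RubricScore`, and `format_rubric_scores` are hypothetical stand-ins, and the output format is an assumption chosen only to show the fallback to the rubric id.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rubric:
    """Hypothetical stand-in for a rubric; not the ADK class."""
    rubric_id: str
    text: str


@dataclass
class RubricScore:
    """Hypothetical stand-in for a per-rubric score; not the ADK class."""
    rubric_id: str
    score: Optional[float]


def format_rubric_scores(scores, criterion_rubrics=None):
    """Render one line per rubric score, tolerating missing criterion rubrics."""
    # Treat None and [] the same way: there is no criterion-level text to look up.
    text_by_id = {r.rubric_id: r.text for r in (criterion_rubrics or [])}
    lines = []
    for s in scores:
        # Fall back to the rubric id when no criterion-level text is known.
        label = text_by_id.get(s.rubric_id, s.rubric_id)
        lines.append(f"{label}: {s.score}")
    return lines
```

With `criterion_rubrics=None` or an empty list, every line is labeled by its rubric id instead of raising on the missing attribute.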
Testing Plan
Unit Tests:
Commands run:
I also attempted the local pre-push hook, which runs the full `pytest -q`, but my local environment did not have the full optional test dependency set installed and failed during collection on unrelated missing packages such as `a2a`, `dotenv`, and `requests`. The targeted tests for this change pass locally.
Manual End-to-End (E2E) Tests:
Manual scenario from #4926:
TOOL_USE_QUALITY.
rubric_based_tool_use_quality_v1 and no criterion-level rubrics.
adk eval.
Expected result: evaluator construction succeeds, invocation rubrics are included in the judge prompt, and result printing does not crash when criterion-level rubrics are empty.
Checklist
Additional context
This keeps invalid configurations without any effective rubrics failing, but moves the validation to the point where criterion-level and invocation-level rubrics have both been considered.
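As a rough sketch of the deferred validation, under stated assumptions: the `Rubric` dataclass, the `effective_rubrics` helper, and the rule of filtering invocation-level rubrics by `type` are all hypothetical and only illustrate the idea; the actual evaluator code in this PR may merge and filter differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rubric:
    """Hypothetical stand-in for a rubric; not the ADK class."""
    rubric_id: str
    text: str
    type: Optional[str] = None


def effective_rubrics(criterion_rubrics, invocation_rubrics, rubric_type=None):
    """Merge both rubric sources, then validate the merged result."""
    # Missing criterion rubrics behave like an empty list (no constructor assert).
    merged = list(criterion_rubrics or [])
    for rubric in invocation_rubrics or []:
        # Assumed filter: keep only invocation rubrics of the requested type.
        if rubric_type is None or rubric.type == rubric_type:
            merged.append(rubric)
    # Validation happens here, after both sources have been considered.
    if not merged:
        raise ValueError("No rubrics available after merging and filtering.")
    return merged
```

A configuration with only invocation-level rubrics passes, while one with no rubrics from either source still fails, just later and with a `ValueError` rather than an `AssertionError`.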