feat(python-sdk): conventionality evaluator #38
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new “Conventionality” evaluator to the Python SDK, backed by shared TOML settings and a generated, import-time settings module, plus extensive unit tests around evaluator inputs/settings loading and BaseEvaluator behavior.
Changes:
- Added conventionality evaluator settings TOML and generated Python settings module.
- Implemented
ConventionalityEvaluator(+ input/output schemas) and exported it from the SDK public API. - Added/updated unit tests covering settings loading, evaluator schemas, evaluator behavior, and BaseEvaluator internals; added Makefile targets for settings generation checks.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| sdks/settings/conventionality/settings.toml | New canonical settings (metadata, prompts, model config) for conventionality. |
| sdks/python/src/learning_commons_evaluators/settings/_generated_conventionality_settings.py | Generated settings module consumed at runtime by the evaluator. |
| sdks/python/src/learning_commons_evaluators/schemas/conventionality.py | Adds conventionality settings/output Pydantic models. |
| sdks/python/src/learning_commons_evaluators/schemas/init.py | Exposes conventionality schemas via learning_commons_evaluators.schemas. |
| sdks/python/src/learning_commons_evaluators/evaluators/conventionality.py | Implements ConventionalityEvaluator + typed ConventionalityEvaluationInput. |
| sdks/python/src/learning_commons_evaluators/evaluators/init.py | Exports conventionality evaluator/input from the evaluators package. |
| sdks/python/src/learning_commons_evaluators/init.py | Exposes conventionality evaluator/input/schemas from the root package API. |
| scripts/generate_settings.py | New generator for _generated_*_settings.py from sdks/settings/**/settings.toml. |
| sdks/python/Makefile | Adds generate-settings / check-generated targets and wires check-generated into verify. |
| sdks/python/tests/test_package_imports.py | Ensures conventionality evaluator is importable from the root package. |
| sdks/python/tests/settings/test_load_settings.py | New tests for TOML settings loader + helpers and shared_settings_root(). |
| sdks/python/tests/schemas/test_evaluator_schemas.py | New tests for EvaluationInput coercion/validation/metadata behaviors. |
| sdks/python/tests/evaluators/test_conventionality.py | New tests for conventionality evaluator wiring and output typing. |
| sdks/python/tests/evaluators/test_base.py | New tests for BaseEvaluator telemetry branching, step execution, chain execution, errors, token usage. |
| sdks/python/tests/conftest.py | Updates fixtures to use real ConventionalityEvaluationSettings instead of a stub. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| chain_inputs=prompt_inputs, | ||
| parser_output_type=ConventionalityOutput, | ||
| ) | ||
| assert isinstance(conventionality_output, ConventionalityOutput) |
There was a problem hiding this comment.
P0 - Similar to above, we should raise an explicit EvaluatorError here
There was a problem hiding this comment.
No longer need to assert. execute_prompt_chain_step returns are typed.
| ] | ||
| ).partial(format_instructions=parser.get_format_instructions()) | ||
| conventionality_output = self.execute_prompt_chain_step( | ||
| step_name="main", |
There was a problem hiding this comment.
P0 -
| step_name="main", | |
| step_name="conventionality_evaluation", |
| 'text': TextInputSpec(name='text', min_text_length=10, max_text_length=10000), | ||
| 'grade': GradeInputSpec( | ||
| name='grade', | ||
| allowed_grades=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…anch when I split the PR using AI
Aligns with TextInputField.input_metadata() returning len() as int. Co-authored-by: Cursor <cursoragent@cursor.com>
…basic_conventionality
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…d in later PR. Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…basic_conventionality
adnanrhussain
left a comment
There was a problem hiding this comment.
lgtm, thank you for iterating
| ("system", prompts["system_prompt"]), | ||
| ("human", prompts["human_prompt"]), | ||
| ] | ||
| ).partial(format_instructions=parser.get_format_instructions()) |
There was a problem hiding this comment.
P2 - For later, we need to move to with_structured_output
…basic_conventionality
… test (#39) * feat: contract test scaffold and conventionality contract test * chore: fix build issues * ci: fixing build * chore: moved capture script to scripts folder within python sdk * Align conventionality_evaluator notebook with main Co-authored-by: Cursor <cursoragent@cursor.com> * chore: addressing PR comments * feat(python-sdk): vocabulary evaluator (#36) * feat: vocabulary evaluator * chore: update vocabulary settings to use instead of for prompt settings * chore: fix capture and contract tests * chore: vocabulary settings are required * feat: eval instance settings overrides * chore: addressing PR comments * chore: restore vocabulary notebook * feat: base eval support for json normalizers * chore: cleaner implementation of vocab * chore: same step name as typescript sdk + edge case unit test --------- Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
Jira:
Implementation of the conventionality evaluator in the Python sdk
Test Plan