feat(python-sdk): conventionality evaluator by czi-fsisenda · Pull Request #38 · learning-commons-org/evaluators

czi-fsisenda · 2026-04-30T14:16:11Z

Summary

Jira:

Implementation of the conventionality evaluator in the Python SDK:

creates the settings file for conventionality
generates Python settings from the settings file
implements the conventionality evaluator
adds unit tests

Test Plan

Wrote automated tests
Manually tested my changes, and here are the details:

Copilot

Pull request overview

Adds a new “Conventionality” evaluator to the Python SDK, backed by shared TOML settings and a generated, import-time settings module, plus extensive unit tests around evaluator inputs/settings loading and BaseEvaluator behavior.

Changes:

Added conventionality evaluator settings TOML and generated Python settings module.
Implemented ConventionalityEvaluator (+ input/output schemas) and exported it from the SDK public API.
Added/updated unit tests covering settings loading, evaluator schemas, evaluator behavior, and BaseEvaluator internals; added Makefile targets for settings generation checks.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| sdks/settings/conventionality/settings.toml | New canonical settings (metadata, prompts, model config) for conventionality. |
| sdks/python/src/learning_commons_evaluators/settings/_generated_conventionality_settings.py | Generated settings module consumed at runtime by the evaluator. |
| sdks/python/src/learning_commons_evaluators/schemas/conventionality.py | Adds conventionality settings/output Pydantic models. |
| sdks/python/src/learning_commons_evaluators/schemas/__init__.py | Exposes conventionality schemas via `learning_commons_evaluators.schemas`. |
| sdks/python/src/learning_commons_evaluators/evaluators/conventionality.py | Implements `ConventionalityEvaluator` + typed `ConventionalityEvaluationInput`. |
| sdks/python/src/learning_commons_evaluators/evaluators/__init__.py | Exports conventionality evaluator/input from the evaluators package. |
| sdks/python/src/learning_commons_evaluators/__init__.py | Exposes conventionality evaluator/input/schemas from the root package API. |
| scripts/generate_settings.py | New generator for `_generated__settings.py` from `sdks/settings/*/settings.toml`. |
| sdks/python/Makefile | Adds `generate-settings` / `check-generated` targets and wires `check-generated` into `verify`. |
| sdks/python/tests/test_package_imports.py | Ensures conventionality evaluator is importable from the root package. |
| sdks/python/tests/settings/test_load_settings.py | New tests for TOML settings loader + helpers and `shared_settings_root()`. |
| sdks/python/tests/schemas/test_evaluator_schemas.py | New tests for `EvaluationInput` coercion/validation/metadata behaviors. |
| sdks/python/tests/evaluators/test_conventionality.py | New tests for conventionality evaluator wiring and output typing. |
| sdks/python/tests/evaluators/test_base.py | New tests for BaseEvaluator telemetry branching, step execution, chain execution, errors, token usage. |
| sdks/python/tests/conftest.py | Updates fixtures to use real `ConventionalityEvaluationSettings` instead of a stub. |

adnanrhussain · 2026-05-04T19:06:43Z

+            chain_inputs=prompt_inputs,
+            parser_output_type=ConventionalityOutput,
+        )
+        assert isinstance(conventionality_output, ConventionalityOutput)


P0 - Similar to above, we should raise an explicit `EvaluatorError` here.

The assert is no longer needed: `execute_prompt_chain_step` return values are typed.
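
For context on the original request: `assert` statements are skipped when Python runs with `-O`, so a runtime type check that must always fire is usually written as an explicit raise. The thread above resolved this differently (typed return values), so the following is only a sketch of the pattern the reviewer asked for. `EvaluatorError` and `ConventionalityOutput` here are stand-in classes defined for the example, and `require_type` and the `rating` field are hypothetical names, not part of the SDK.

```python
from dataclasses import dataclass
from typing import Any, TypeVar

T = TypeVar("T")


class EvaluatorError(Exception):
    """Stand-in for the SDK's error type (assumed for this sketch)."""


@dataclass
class ConventionalityOutput:
    """Stand-in output model; the single field is hypothetical."""

    rating: str


def require_type(value: Any, expected: type[T], step_name: str) -> T:
    """Return value unchanged if it has the expected type, else raise EvaluatorError."""
    if not isinstance(value, expected):
        raise EvaluatorError(
            f"step {step_name!r} returned {type(value).__name__}, "
            f"expected {expected.__name__}"
        )
    return value
```

Unlike an `assert`, this check survives optimized runs and produces an error type that callers can catch specifically.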

adnanrhussain · 2026-05-04T19:38:47Z

+            ]
+        ).partial(format_instructions=parser.get_format_instructions())
+        conventionality_output = self.execute_prompt_chain_step(
+            step_name="main",


P0 - Suggested change:

-            step_name="main",
+            step_name="conventionality_evaluation",

adnanrhussain · 2026-05-04T19:41:35Z

+        'text': TextInputSpec(name='text', min_text_length=10, max_text_length=10000),
+        'grade': GradeInputSpec(
+            name='grade',
+            allowed_grades=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],


P0 - Grades 3-12 only
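
A minimal sketch of what restricting the grade input to 3-12 could look like. `GradeSpec` below is a hypothetical stand-in written for this example; it is not the SDK's `GradeInputSpec`, whose real fields and validation behavior are not shown in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradeSpec:
    """Hypothetical grade input spec; defaults to grades 3 through 12 inclusive."""

    name: str
    allowed_grades: tuple[int, ...] = tuple(range(3, 13))

    def validate(self, grade: int) -> int:
        """Return the grade if it is allowed, otherwise raise ValueError."""
        if grade not in self.allowed_grades:
            raise ValueError(
                f"{self.name}={grade!r} is not supported; "
                f"allowed grades are {self.allowed_grades[0]}-{self.allowed_grades[-1]}"
            )
        return grade
```

With this default, grades 0, 1 and 2 from the generated list above would be rejected before any prompt is built.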

Copilot

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

adnanrhussain

lgtm, thank you for iterating

adnanrhussain · 2026-05-13T17:23:03Z

+                ("system", prompts["system_prompt"]),
+                ("human", prompts["human_prompt"]),
+            ]
+        ).partial(format_instructions=parser.get_format_instructions())


P2 - For later, we need to move to with_structured_output

… test (#39)

* feat: contract test scaffold and conventionality contract test
* chore: fix build issues
* ci: fixing build
* chore: moved capture script to scripts folder within python sdk
* Align conventionality_evaluator notebook with main Co-authored-by: Cursor <cursoragent@cursor.com>
* chore: addressing PR comments
* feat(python-sdk): vocabulary evaluator (#36)
* feat: vocabulary evaluator
* chore: update vocabulary settings to use instead of for prompt settings
* chore: fix capture and contract tests
* chore: vocabulary settings are required
* feat: eval instance settings overrides
* chore: addressing PR comments
* chore: restore vocabulary notebook
* feat: base eval support for json normalizers
* chore: cleaner implementation of vocab
* chore: same step name as typescript sdk + edge case unit test

---------

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(python-sdk): python SDK scaffold
* feat(python-sdk): conventionality evaluator (#38)
* feat(python-sdk): contract test scaffold and conventionality contract test (#39)
* feat(python-sdk): vocabulary evaluator (#36)

feat: conventionality evaluator

fa41738

czi-fsisenda changed the base branch from main to fsisenda/sdk_python_scaffold April 30, 2026 14:20

czi-fsisenda requested review from adnanrhussain, Copilot and georgemelvin April 30, 2026 14:21

Copilot started reviewing on behalf of czi-fsisenda April 30, 2026 14:22 View session

Copilot AI reviewed Apr 30, 2026

adnanrhussain requested changes May 4, 2026

czi-fsisenda and others added 19 commits May 12, 2026 12:16

chore: eval version as string

e2849a7

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

test: generalize min text length test description

07a755e

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

chore: PR comments - config validation todo, input_metadata only in logs

cd7867b

chore: execute_step implementation without unsafe casts

673857e

test: unit tests for base evaluator. somehow didn't make into this branch when I split the PR using AI

bf67996

chore: textLength as int

f9fb6d5

feat: strip white space by default for text inputs

354ced2

chore: PR comments TODOs, remove redundant fields from PromptProviderConfig

7d14607

chore: introduce TelemetryConfig class.

bfcd4c8

chore: remove custom LLM endpoints support for now.

2317005

chore: errors TODOs

6d69909

chore: simplified load_settings and formatting

16e2432

chore: update make

1052031

ci: fix CI?

9b7429a

chore: first pass addressing PR comments

1c6ee38

test: expect integer textLength in input_metadata assertions

b264db5

Aligns with TextInputField.input_metadata() returning len() as int. Co-authored-by: Cursor <cursoragent@cursor.com>

Merge branch 'fsisenda/sdk_python_scaffold' into fsisenda/sdk_python_basic_conventionality

c1985f9

chore: updating generate_settings to be more general + misc PR updates

23f3def

chore: simplified and generalized generate_settings

93ecf53

czi-fsisenda requested a review from Copilot May 12, 2026 23:03

Copilot started reviewing on behalf of czi-fsisenda May 12, 2026 23:04 View session

Copilot AI reviewed May 12, 2026

Comment thread sdks/python/Makefile Outdated

Comment thread sdks/python/src/learning_commons_evaluators/schemas/conventionality.py

Comment thread scripts/generate_settings.py Outdated

czi-fsisenda and others added 2 commits May 12, 2026 16:18

chore: missing contract files is just a warning for now

f8a592e

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

chore: removing unused directory from Makefile. Might be re-introduced in later PR.

5a8a8d2

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

czi-fsisenda requested a review from adnanrhussain May 12, 2026 23:55

czi-fsisenda and others added 5 commits May 12, 2026 17:36

Merge branch 'fsisenda/sdk_python_scaffold' into fsisenda/sdk_python_basic_conventionality

764dd6a

chore: generalized test_package_imports

1007197

chore: generalized conftest

ee9133e

chore: generalized test_evaluator_schemas

15f22f3

chore: move generate_settings script into sdks/python directory

3ae2f5b

adnanrhussain approved these changes May 13, 2026

czi-fsisenda and others added 2 commits May 13, 2026 19:34

Merge branch 'fsisenda/sdk_python_scaffold' into fsisenda/sdk_python_basic_conventionality

fa641fa

czi-fsisenda merged commit d8db3cd into fsisenda/sdk_python_scaffold May 14, 2026
4 checks passed

czi-fsisenda deleted the fsisenda/sdk_python_basic_conventionality branch May 14, 2026 09:22

3 participants