
feat(python-sdk): conventionality evaluator #38

Merged
czi-fsisenda merged 29 commits into fsisenda/sdk_python_scaffold from fsisenda/sdk_python_basic_conventionality on May 14, 2026

@czi-fsisenda
Contributor

Summary

Jira:

Implements the conventionality evaluator in the Python SDK:

  • Creates a settings file for conventionality
  • Generates Python settings from the settings file
  • Implements the conventionality evaluator
  • Adds unit tests

Test Plan

  • Wrote automated tests
  • Manually tested my changes, and here are the details:

@czi-fsisenda changed the base branch from main to fsisenda/sdk_python_scaffold on April 30, 2026 14:20

Copilot AI left a comment


Pull request overview

Adds a new “Conventionality” evaluator to the Python SDK, backed by shared TOML settings and a generated settings module loaded at import time, plus extensive unit tests covering evaluator inputs, settings loading, and BaseEvaluator behavior.

Changes:

  • Added conventionality evaluator settings TOML and generated Python settings module.
  • Implemented ConventionalityEvaluator (+ input/output schemas) and exported it from the SDK public API.
  • Added/updated unit tests covering settings loading, evaluator schemas, evaluator behavior, and BaseEvaluator internals; added Makefile targets for settings generation checks.
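The Makefile's check-generated target is wired into verify, but its exact commands aren't shown here. One common way to implement such a check, sketched below under that assumption, is to regenerate the file in memory and compare it with the committed copy, printing a diff on mismatch. check_generated is a hypothetical helper name.

```python
import difflib
import sys


def check_generated(expected: str, on_disk: str, path: str) -> bool:
    """Return True if the committed generated file matches a fresh render.

    On mismatch, write a unified diff to stderr so CI logs show the drift.
    """
    if expected == on_disk:
        return True
    diff = difflib.unified_diff(
        on_disk.splitlines(keepends=True),
        expected.splitlines(keepends=True),
        fromfile=f"{path} (committed)",
        tofile=f"{path} (regenerated)",
    )
    sys.stderr.writelines(diff)
    return False
```

A wrapper script could call this for each generated file and exit non-zero on any False, which is what makes a verify step fail when someone edits settings.toml without regenerating.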

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| `sdks/settings/conventionality/settings.toml` | New canonical settings (metadata, prompts, model config) for conventionality. |
| `sdks/python/src/learning_commons_evaluators/settings/_generated_conventionality_settings.py` | Generated settings module consumed at runtime by the evaluator. |
| `sdks/python/src/learning_commons_evaluators/schemas/conventionality.py` | Adds conventionality settings/output Pydantic models. |
| `sdks/python/src/learning_commons_evaluators/schemas/__init__.py` | Exposes conventionality schemas via learning_commons_evaluators.schemas. |
| `sdks/python/src/learning_commons_evaluators/evaluators/conventionality.py` | Implements ConventionalityEvaluator + typed ConventionalityEvaluationInput. |
| `sdks/python/src/learning_commons_evaluators/evaluators/__init__.py` | Exports conventionality evaluator/input from the evaluators package. |
| `sdks/python/src/learning_commons_evaluators/__init__.py` | Exposes conventionality evaluator/input/schemas from the root package API. |
| `scripts/generate_settings.py` | New generator for `_generated_*_settings.py` from `sdks/settings/**/settings.toml`. |
| `sdks/python/Makefile` | Adds generate-settings / check-generated targets and wires check-generated into verify. |
| `sdks/python/tests/test_package_imports.py` | Ensures conventionality evaluator is importable from the root package. |
| `sdks/python/tests/settings/test_load_settings.py` | New tests for TOML settings loader + helpers and shared_settings_root(). |
| `sdks/python/tests/schemas/test_evaluator_schemas.py` | New tests for EvaluationInput coercion/validation/metadata behaviors. |
| `sdks/python/tests/evaluators/test_conventionality.py` | New tests for conventionality evaluator wiring and output typing. |
| `sdks/python/tests/evaluators/test_base.py` | New tests for BaseEvaluator telemetry branching, step execution, chain execution, errors, token usage. |
| `sdks/python/tests/conftest.py` | Updates fixtures to use real ConventionalityEvaluationSettings instead of a stub. |


Comment thread sdks/settings/conventionality/settings.toml Outdated
Comment thread scripts/generate_settings.py Outdated
Comment thread sdks/python/tests/evaluators/test_conventionality.py Outdated
Comment thread sdks/python/tests/evaluators/test_base.py Outdated
Comment thread sdks/python/src/learning_commons_evaluators/evaluators/conventionality.py Outdated
Comment thread scripts/generate_settings.py Outdated
Comment thread sdks/python/Makefile
Comment thread sdks/python/tests/schemas/test_evaluator_schemas.py Outdated
Comment thread sdks/python/tests/evaluators/test_base.py Outdated
Comment thread sdks/python/src/learning_commons_evaluators/evaluators/conventionality.py Outdated
chain_inputs=prompt_inputs,
parser_output_type=ConventionalityOutput,
)
assert isinstance(conventionality_output, ConventionalityOutput)
Contributor


P0: Similar to above, we should raise an explicit EvaluatorError here instead of using assert.
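The reviewer asks for an explicit EvaluatorError instead of assert. A minimal sketch of that pattern follows; EvaluatorError and ConventionalityOutput here are local stand-ins rather than the SDK's actual classes, and ensure_output_type is a hypothetical helper.

```python
from typing import TypeVar

T = TypeVar("T")


class EvaluatorError(Exception):
    """Local stand-in for the SDK's EvaluatorError."""


class ConventionalityOutput:
    """Local stand-in for the SDK's Pydantic output model."""


def ensure_output_type(result: object, expected: type[T]) -> T:
    # An explicit raise survives `python -O` (which strips assert
    # statements) and gives callers a domain-specific exception to catch.
    if not isinstance(result, expected):
        raise EvaluatorError(
            f"Expected {expected.__name__}, got {type(result).__name__}"
        )
    return result
```

Because assert statements are removed when Python runs with -O, a check that must always hold in production is safer as an explicit raise.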

Contributor Author


The assert is no longer needed: execute_prompt_chain_step return values are now typed.
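The author's reply is that execute_prompt_chain_step now returns typed results. The sketch below shows one way a step helper can be generic over its output type, so a type checker infers the result type from parser_output_type. The helper name execute_step, its run parameter, and the ConventionalityOutput fields are assumptions; the SDK's real signature may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable, TypeVar

T = TypeVar("T")


@dataclass
class ConventionalityOutput:
    # Hypothetical fields; the real Pydantic model may differ.
    score: int
    reasoning: str


def execute_step(
    step_name: str,
    run: Callable[[dict[str, Any]], dict[str, Any]],
    chain_inputs: dict[str, Any],
    parser_output_type: type[T],
) -> T:
    """Run one step and build its result as parser_output_type.

    Because the return type is T, a type checker infers the concrete
    output type from the class passed in, so callers need no assert.
    """
    raw = run(chain_inputs)
    try:
        return parser_output_type(**raw)
    except TypeError as exc:
        # Missing or unexpected fields: report which step produced them.
        raise ValueError(
            f"step {step_name!r}: output does not match "
            f"{parser_output_type.__name__}"
        ) from exc
```

Static typing alone checks nothing at runtime; the runtime guarantee in this sketch comes from the helper constructing the declared type itself, so a mismatched payload fails inside the step rather than downstream.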

]
).partial(format_instructions=parser.get_format_instructions())
conventionality_output = self.execute_prompt_chain_step(
step_name="main",
Contributor


P0: Rename the step.

Suggested change
-step_name="main",
+step_name="conventionality_evaluation",

'text': TextInputSpec(name='text', min_text_length=10, max_text_length=10000),
'grade': GradeInputSpec(
name='grade',
allowed_grades=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
Contributor


P0: Only grades 3 through 12 should be allowed.
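The reviewer asks to restrict allowed_grades to 3 through 12. A minimal sketch follows, using a hypothetical GradeInputSpec dataclass with a hypothetical validate method rather than the SDK's actual class.

```python
from dataclasses import dataclass, field


@dataclass
class GradeInputSpec:
    # Hypothetical stand-in; the SDK's GradeInputSpec may differ.
    name: str
    allowed_grades: list[int] = field(default_factory=list)

    def validate(self, grade: int) -> int:
        """Return the grade unchanged, or raise if it is not allowed."""
        if grade not in self.allowed_grades:
            raise ValueError(
                f"{self.name} must be one of {self.allowed_grades}, got {grade}"
            )
        return grade


# Grades 3 through 12 inclusive; range() excludes its stop value.
grade_spec = GradeInputSpec(name="grade", allowed_grades=list(range(3, 13)))
```

Deriving the list from range(3, 13) avoids hand-typing the grades and makes the inclusive upper bound explicit.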


Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

Comment thread sdks/python/Makefile Outdated
Comment thread scripts/generate_settings.py Outdated
czi-fsisenda and others added 2 commits May 12, 2026 16:18
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…d in later PR.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Contributor

@adnanrhussain left a comment


lgtm, thank you for iterating

("system", prompts["system_prompt"]),
("human", prompts["human_prompt"]),
]
).partial(format_instructions=parser.get_format_instructions())
Contributor


P2: Later, we should move to with_structured_output instead of injecting parser format instructions into the prompt.

czi-fsisenda and others added 2 commits May 13, 2026 19:34
… test (#39)

* feat: contract test scaffold and conventionality contract test

* chore: fix build issues

* ci: fixing build

* chore: moved capture script to scripts folder within python sdk

* Align conventionality_evaluator notebook with main

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore: addressing PR comments

* feat(python-sdk): vocabulary evaluator (#36)

* feat: vocabulary evaluator

* chore: update vocabulary settings to use  instead of  for prompt settings

* chore: fix capture and contract tests

* chore: vocabulary settings are required

* feat: eval instance settings overrides

* chore: addressing PR comments

* chore: restore vocabulary notebook

* feat: base eval support for json normalizers

* chore: cleaner implementation of vocab

* chore: same step name as typescript sdk + edge case unit test

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
@czi-fsisenda merged commit d8db3cd into fsisenda/sdk_python_scaffold May 14, 2026
4 checks passed
@czi-fsisenda deleted the fsisenda/sdk_python_basic_conventionality branch May 14, 2026 09:22
czi-fsisenda added a commit that referenced this pull request May 14, 2026
* feat(python-sdk): python SDK scaffold
* feat(python-sdk): conventionality evaluator  (#38)
* feat(python-sdk): contract test scaffold and conventionality contract test (#39)
* feat(python-sdk): vocabulary evaluator (#36)

Labels

None yet

Projects

None yet


3 participants