feat(python-sdk): contract test scaffold and conventionality contract test #39
Pull request overview
Adds contract-test infrastructure to the Python SDK and seeds it with an initial Conventionality evaluator contract artifact + test, ensuring evaluator behavior matches the reference notebook and that bundled artifacts stay synced with canonical settings.
Changes:
- Introduces `contracts.toml` artifacts for the Conventionality evaluator (canonical under `sdks/settings/` plus a bundled copy under the Python package).
- Adds a contract-test loader + harness and a Conventionality contract test that asserts prompt fidelity and result mapping.
- Adds Makefile targets and a sync-guard test to keep bundled contract artifacts byte-identical to the canonical source.
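
The sync guard described in the last bullet could look roughly like the sketch below. This is an illustration only, not the test from this PR: the helper name `artifacts_in_sync` is hypothetical, and the real test may locate the two files differently.

```python
from pathlib import Path


def artifacts_in_sync(canonical: Path, bundled: Path) -> bool:
    """Return True only when both files exist and are byte-identical.

    Hypothetical helper: comparing raw bytes (not parsed TOML) means that
    whitespace or line-ending drift also counts as out of sync.
    """
    if not (canonical.is_file() and bundled.is_file()):
        return False
    return canonical.read_bytes() == bundled.read_bytes()
```

A test could then assert `artifacts_in_sync(canonical_path, bundled_path)` for each evaluator's `contracts.toml`, with both paths supplied by the test.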
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `sdks/settings/conventionality/contracts.toml` | Adds canonical Conventionality contract artifact captured from the notebook. |
| `sdks/python/src/learning_commons_evaluators/settings/conventionality/contracts.toml` | Adds bundled package copy of the Conventionality contract artifact for installed-package testing. |
| `sdks/python/tests/settings/test_load_settings.py` | Adds bundled-artifact presence check and a canonical-vs-bundled sync guard. |
| `sdks/python/tests/contract_tests/loader.py` | Adds TOML-backed contract case model + loader resolving via the package settings root. |
| `sdks/python/tests/contract_tests/harness.py` | Adds provider-mocking harness that captures prompt requests and asserts contract fidelity. |
| `sdks/python/tests/contract_tests/conventionality.py` | Adds Conventionality case loader and notebook→SDK expected-result mapper. |
| `sdks/python/tests/contract_tests/test_conventionality.py` | Adds the initial Conventionality contract test for the “turnip” case. |
| `sdks/python/tests/contract_tests/__init__.py` | Defines the contract-tests package and documents the contract-test approach. |
| `sdks/python/Makefile` | Adds build/check-build and contract-test/sync targets for artifact maintenance. |
| `evals/conventionality_evaluator.ipynb` | Updates the notebook to capture LLM calls and print a contracts.toml block. |
| `evals/capture.py` | Adds notebook utilities for capturing prompt/response snapshots and emitting TOML artifacts. |
    temperature: float
    llm_response: str

    def is_populated(self) -> bool:
P1 - Currently unused. Is this used in a downstream PR and in tests?
Updated. It is now used to check whether the test artifact still has placeholders.
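
For readers following this thread, a placeholder check of this kind might look like the sketch below. Only the field names `temperature` and `llm_response` and the method name `is_populated` come from the diff above; the class name `ContractCase`, the `from_mapping` constructor, and the `<PLACEHOLDER>` marker are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Assumed marker; the real artifact may use a different placeholder convention.
PLACEHOLDER = "<PLACEHOLDER>"


@dataclass(frozen=True)
class ContractCase:
    """Hypothetical contract case holding values captured from the notebook."""

    temperature: float
    llm_response: str

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "ContractCase":
        # `raw` stands in for one parsed table from a contracts.toml file.
        return cls(
            temperature=float(raw["temperature"]),
            llm_response=str(raw["llm_response"]),
        )

    def is_populated(self) -> bool:
        """True once the response is non-empty and contains no placeholder."""
        text = self.llm_response.strip()
        return bool(text) and PLACEHOLDER not in text
```

A contract test could call `is_populated()` first and fail with a clear message when the artifact has not yet been captured from the notebook.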
…a/sdk_python_contract_tests
Co-authored-by: Cursor <cursoragent@cursor.com>
    def __exit__(self, *args: Any) -> None:
        if self._patch is not None:
            self._patch.stop()
P1 - In case the test author forgets to call assert_prompt_step, perhaps __exit__ could compare the prompt_steps and _captured counts as an exit validation.
These tests can get better. In an earlier iteration, I missed an assert. It's a decent start that we can build on.
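
One way the exit validation suggested above might be sketched is shown below. This is a toy stand-in, not the harness in this PR: `PromptHarness`, `capture`, and `_asserted` are hypothetical names, and it tracks asserted step indices rather than comparing the `prompt_steps` and `_captured` counts directly.

```python
from typing import Any, List, Set


class PromptHarness:
    """Toy harness: records captured prompts and which ones were asserted."""

    def __init__(self) -> None:
        self._captured: List[str] = []  # prompts recorded by the mocked provider
        self._asserted: Set[int] = set()  # step indices the test has checked

    def __enter__(self) -> "PromptHarness":
        return self

    def capture(self, prompt: str) -> None:
        # In a real harness the patched provider would call this.
        self._captured.append(prompt)

    def assert_prompt_step(self, index: int, expected: str) -> None:
        assert self._captured[index] == expected, f"prompt mismatch at step {index}"
        self._asserted.add(index)

    def __exit__(self, exc_type: Any, exc: Any, tb: Any) -> None:
        # Validate only on a clean exit so an earlier failure is not masked.
        if exc_type is None:
            missing = set(range(len(self._captured))) - self._asserted
            if missing:
                raise AssertionError(f"unasserted prompt steps: {sorted(missing)}")
```

With this shape, a test that captures a prompt but never asserts it fails when the `with` block exits, which is the kind of missed assert mentioned in the reply above.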
…a/sdk_python_contract_tests
* feat: vocabulary evaluator
* chore: update vocabulary settings to use instead of for prompt settings
* chore: fix capture and contract tests
* chore: vocabulary settings are required
* feat: eval instance settings overrides
* chore: addressing PR comments
* chore: restore vocabulary notebook
* feat: base eval support for json normalizers
* chore: cleaner implementation of vocab
* chore: same step name as typescript sdk + edge case unit test
f34050e into fsisenda/sdk_python_basic_conventionality
Summary
Jira:
Contract tests for evaluators in the Python SDK
Test Plan