feat(python-sdk)!: async-first evaluators with evaluate_sync wrapper #63
Merged
- Make evaluate, evaluate_impl, execute_step, and execute_prompt_chain_step async; add evaluate_sync via asyncio.run for sync callers.
- Use ainvoke for LangChain prompt chains.
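A minimal sketch of the async-first pattern described above, assuming a simplified base class. SketchEvaluator and WordCountEvaluator are hypothetical names; only evaluate, evaluate_impl, and evaluate_sync come from the PR description, and their actual signatures in base.py may differ.

```python
import asyncio
from abc import ABC, abstractmethod


class SketchEvaluator(ABC):
    """Hypothetical stand-in for BaseEvaluator; not the SDK's actual class."""

    async def evaluate(self, input, evaluation_settings=None):
        # Primary async entrypoint: delegates to the subclass implementation.
        return await self.evaluate_impl(input, evaluation_settings)

    @abstractmethod
    async def evaluate_impl(self, input, evaluation_settings):
        ...

    def evaluate_sync(self, input, evaluation_settings=None):
        # asyncio.run starts a fresh event loop, so it raises RuntimeError
        # when the calling thread already has a running loop.
        return asyncio.run(self.evaluate(input, evaluation_settings))


class WordCountEvaluator(SketchEvaluator):
    """Hypothetical evaluator used only to exercise the sketch."""

    async def evaluate_impl(self, input, evaluation_settings):
        await asyncio.sleep(0)  # placeholder for an awaited prompt step
        return {"word_count": len(input.split())}


result = WordCountEvaluator().evaluate_sync("async first evaluators")
```

Callers that already run inside an event loop (for example, an async web handler) would await evaluate directly instead of calling evaluate_sync.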
Pull request overview
This PR makes the Python SDK evaluator execution async-first by converting BaseEvaluator.evaluate, prompt-chain execution, and evaluator implementations to async, while adding evaluate_sync for synchronous callers.
Changes:
- Converted base evaluator flow and prompt-chain execution to async/await.
- Updated built-in vocabulary and conventionality evaluators to await prompt steps.
- Updated tests and README examples to use evaluate_sync.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| sdks/python/src/learning_commons_evaluators/evaluators/base.py | Introduces async evaluation and sync wrapper. |
| sdks/python/src/learning_commons_evaluators/evaluators/conventionality.py | Converts conventionality implementation to async prompt execution. |
| sdks/python/src/learning_commons_evaluators/evaluators/vocabulary.py | Converts vocabulary implementation and helper chain to async prompt execution. |
| sdks/python/README.md | Updates usage examples for evaluate_sync. |
| sdks/python/tests/evaluators/test_base.py | Updates base evaluator tests for async helpers and sync wrapper. |
| sdks/python/tests/evaluators/test_conventionality.py | Updates evaluator calls to evaluate_sync. |
| sdks/python/tests/evaluators/test_vocabulary.py | Updates evaluator calls to evaluate_sync. |
| sdks/python/tests/contract_tests/harness.py | Updates harness usage example. |
| sdks/python/tests/contract_tests/test_conventionality.py | Updates contract evaluator call to evaluate_sync. |
| sdks/python/tests/contract_tests/test_vocabulary.py | Updates contract evaluator calls to evaluate_sync. |
Comments suppressed due to low confidence (1)
sdks/python/src/learning_commons_evaluators/evaluators/base.py:84
- The new async public entrypoint is only exercised indirectly through evaluate_sync; there is no test that calls `await evaluator.evaluate(...)` from an existing event loop. Because this PR makes evaluate the primary async API, add a direct async test so regressions in the awaited API are caught independently of the sync wrapper.
async def evaluate(
    self,
    input: InputT,
    evaluation_settings: SettingsT | None = None,
) -> OutputT:
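One way the suggested direct async test could look, as a sketch only. It uses the standard library's unittest.IsolatedAsyncioTestCase so the test body runs inside an event loop; the project's own test framework and fixtures are not shown in this PR, and EchoEvaluator is a hypothetical stand-in for a real evaluator.

```python
import asyncio
import unittest


class EchoEvaluator:
    """Hypothetical evaluator used only for this sketch."""

    async def evaluate(self, input, evaluation_settings=None):
        await asyncio.sleep(0)  # placeholder for awaited prompt steps
        return {"input": input, "settings": evaluation_settings}


class DirectAsyncEvaluateTest(unittest.IsolatedAsyncioTestCase):
    async def test_evaluate_can_be_awaited(self):
        # The test method runs inside an event loop, so evaluate is awaited
        # directly rather than going through evaluate_sync.
        result = await EchoEvaluator().evaluate("some text")
        self.assertEqual(result, {"input": "some text", "settings": None})
```

A test shaped like this fails if evaluate stops being awaitable, even when evaluate_sync continues to pass.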
jecortez approved these changes on May 15, 2026
Summary:
async-first evaluators with evaluate_sync wrapper
Test Plan: