feat(python-sdk)!: error handling improvements#65
Conversation
There was a problem hiding this comment.
Pull request overview
This PR revamps Python SDK error handling with a renamed input-validation error, sanitized provider/output error wrapping, retryability signals, and updated documentation/tests.
Changes:
- Replaces
ValidationError/EvaluatorTimeoutErrorpublic API names withInputValidationError/RequestTimeoutError. - Adds structured API error context, output-validation wrapping, metadata-safe error formatting, and Pydantic error sanitization helpers.
- Updates tests and README to cover the new hierarchy and behavior.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
sdks/python/src/learning_commons_evaluators/schemas/errors.py |
Defines the new error hierarchy, wrapping, retryability, and sanitization helpers. |
sdks/python/src/learning_commons_evaluators/evaluators/base.py |
Uses sanitized metadata errors and wraps LLM output parsing/schema failures. |
sdks/python/src/learning_commons_evaluators/schemas/evaluator.py |
Updates input validation aggregation to use InputValidationError. |
sdks/python/src/learning_commons_evaluators/schemas/common_inputs.py |
Raises InputValidationError from common input fields. |
sdks/python/src/learning_commons_evaluators/schemas/__init__.py |
Updates schema-level error re-export. |
sdks/python/src/learning_commons_evaluators/errors.py |
Updates package error re-exports. |
sdks/python/src/learning_commons_evaluators/__init__.py |
Updates top-level public exports. |
sdks/python/README.md |
Documents the new hierarchy, retry behavior, and sanitized metadata behavior. |
sdks/python/tests/schemas/test_errors.py |
Expands coverage for error hierarchy, wrapping, and sanitization helpers. |
sdks/python/tests/schemas/test_evaluator_schemas.py |
Updates evaluator schema validation tests for InputValidationError. |
sdks/python/tests/schemas/test_common_inputs.py |
Updates common input validation tests for InputValidationError. |
sdks/python/tests/evaluators/test_vocabulary.py |
Updates vocabulary evaluator validation expectations. |
sdks/python/tests/evaluators/test_base.py |
Updates base evaluator tests for sanitized metadata and output-validation wrapping. |
Comments suppressed due to low confidence (2)
sdks/python/src/learning_commons_evaluators/schemas/errors.py:452
- This final fallback makes provider exceptions with only message-based network/timeout signals (for example
ECONNREFUSED,ENOTFOUND, orrequest timed out) non-retryable genericAPIErrors. The previous implementation routed those to network/timeout errors, and the new docs still describeNetworkErroras covering connection refused/DNS failures. Consider preserving a bounded keyword fallback while keeping the returned SDK message sanitized.
return APIError(default_message, retryable=False, **common)
sdks/python/src/learning_commons_evaluators/schemas/errors.py:448
- HTTP 408 responses are not handled as request timeouts here, so a provider exception with
status_code=408becomes a non-retryable genericAPIErrorbecause it is below the 5xx threshold. SinceRequestTimeoutErroritself defaults to status 408 and is documented as retryable, route status 408 toRequestTimeoutErrorbefore the generic status-code branch.
if status_code is not None:
return APIError(
f"API request failed (HTTP {status_code})",
status_code=status_code,
retryable=status_code >= _RETRYABLE_STATUS_MIN,
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Honour subclass class-level default unless caller explicitly overrides. | ||
| if retryable is not None: | ||
| self.retryable = retryable |
|
|
||
| ## Error handling | ||
|
|
||
| Every exception the SDK raises inherits from `EvaluatorError`. Failures inside LLM prompt steps are wrapped at the boundary so callers see a predictable, sanitized hierarchy instead of raw LangChain, OpenAI, Anthropic, or HTTP-client exceptions. |
| """Strip raw input values from a list of :meth:`pydantic.ValidationError.errors` entries. | ||
|
|
||
| Pydantic's per-error dicts include an ``input`` key containing the raw | ||
| value that failed validation. For LLM outputs that value may echo prompt | ||
| content, so we drop it before the error reaches logs or telemetry. The | ||
| remaining keys (``loc``, ``type``, ``msg``, ``ctx`` if present) describe | ||
| the schema and the failure mode but not the offending content. | ||
| """ | ||
| return [{key: value for key, value in error.items() if key != "input"} for error in errors] | ||
|
|
||
|
|
| raise | ||
| except PydanticValidationError: | ||
| raise | ||
| except (PydanticValidationError, OutputParserException) as e: |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (2)
sdks/python/src/learning_commons_evaluators/schemas/errors.py:516
- This timeout check applies the message-regex fallback even when a structured non-timeout HTTP status is present. For example, a provider error with
status_code=400or401whose message mentions an invalid timeout setting would be classified as retryableRequestTimeoutErrorbefore the 4xx/auth handling below, contrary to the documented preference for structured attributes over message parsing. The message-only timeout fallback should be limited to status-less errors (or 408 specifically).
if _is_timeout(error, status_code):
return RequestTimeoutError(status_code=status_code, **common)
sdks/python/src/learning_commons_evaluators/schemas/errors.py:519
- The network detector also runs its message-pattern fallback even when a structured HTTP status code was extracted. A provider 4xx/404 response whose message contains phrases like "could not resolve" (for example, an invalid model/resource message) will be returned as retryable
NetworkErrorbefore the model-not-found/auth/rate-limit/status handling below. As with timeouts, the message fallback should not override a known HTTP status.
if _is_network(error):
return NetworkError(**common)
| clean: dict[str, Any] = { | ||
| key: value for key, value in error.items() if key not in {"input", "ctx"} | ||
| } |
| - `model` on `APIError` — the model ID requested. | ||
| - `response_body` on `APIError` — decoded response body. Opt-in for debugging; may contain echoed prompt content, so treat as sensitive. | ||
| - `request_id` on `APIError` — provider request ID, useful for support escalation. | ||
| - `validation_errors` on `OutputValidationError` — per-field details from Pydantic's `errors()` API with raw input values stripped (safe for logs). |
| Prefers the structured attribute that openai-python and anthropic-python | ||
| both expose (``.status_code``). Falls back to ``.status`` (used by some | ||
| httpx-based clients) and finally to a regex on the message — narrowed to | ||
| HTTP-shaped 4xx/5xx codes near the start of the message to reduce false | ||
| positives like model names or token counts. | ||
| """ | ||
| for attr in ("status_code", "status"): | ||
| value = getattr(error, attr, None) | ||
| if isinstance(value, int): | ||
| return value |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (2)
sdks/python/src/learning_commons_evaluators/schemas/errors.py:519
- The message-pattern fallback still runs even when
_provider_status_code()extracted a structured HTTP status. Because_is_timeout()and_is_network()both inspectstr(error)unconditionally, a 400/401/404 provider response whose body mentions "timeout" or "connection refused" will be reclassified asRequestTimeoutError/NetworkErrorbefore the status-code routing below. This contradicts the documented structured-signal precedence; guard the message-based checks so they only apply whenstatus_code is None(while preserving typed timeout/network exceptions if intended).
if _is_timeout(error, status_code):
return RequestTimeoutError(status_code=status_code, **common)
if _is_network(error):
return NetworkError(**common)
sdks/python/src/learning_commons_evaluators/schemas/errors.py:355
- The
.textfallback is outside the best-efforttry, so response objects whosetextproperty raises (for example, an unread/streaming HTTP response) can makewrap_provider_error()raise a new exception while trying to wrap the original provider failure. Keep the text fallback inside a guarded block, similar to the.json()call, so error wrapping never masks the original error.
json_fn = getattr(response, "json", None)
if callable(json_fn):
try:
return json_fn()
except Exception: # noqa: BLE001 — body is best-effort debug data
pass
return getattr(response, "text", None)
| clean: dict[str, Any] = { | ||
| key: value for key, value in error.items() if key not in {"input", "ctx"} | ||
| } | ||
| ctx = error.get("ctx") | ||
| if isinstance(ctx, Mapping): | ||
| safe_ctx = {k: v for k, v in ctx.items() if _is_safe_ctx_value(v)} | ||
| if safe_ctx: | ||
| clean["ctx"] = safe_ctx | ||
| sanitized.append(clean) |
| # First entry: ctx had only an unsafe value, so the whole ctx key is gone. | ||
| assert sanitized[0] == { | ||
| "loc": ("answer",), | ||
| "type": "value_error", | ||
| "msg": "Value error, rejected text: <sensitive prompt echo>", | ||
| } |
| but custom validators can attach *anything* there, including exception | ||
| objects whose ``str()`` may echo the offending input. We restrict to | ||
| JSON-primitive values and tuples/lists of the same. | ||
| """ | ||
| if value is None or isinstance(value, (bool, int, float, str)): |
| SDK errors (:class:`EvaluatorError` subclasses) already carry sanitized | ||
| messages, so include both the class name and the message for debuggability. | ||
| Any other exception is rendered as just its class name — the message may | ||
| contain raw values (e.g. an :class:`AttributeError` echoing a field name | ||
| from user input) and isn't safe to put in telemetry. | ||
| """ | ||
| if isinstance(error, EvaluatorError): | ||
| return f"{type(error).__name__}: {error}" |
| *, | ||
| default_message: str = "API request failed", |
| def __init__( | ||
| self, | ||
| message: str, | ||
| *, | ||
| status_code: int | None = None, | ||
| retryable: bool = False, | ||
| code: str | None = None, | ||
| ): | ||
| super().__init__(message, code) | ||
| retryable: bool | None = None, | ||
| provider: LLMProvider | None = None, |
Summary
InputValidationError,RequestTimeoutError) so they do not collide with builtin or Pydantic names, and documents a singleretryablesignal onEvaluatorErrorfor client retry logic.status_code,response_body,request_id,provider,model), HTTP routing (including 408 →RequestTimeoutError),OutputValidationErrorfor bad LLM output, andsanitize_pydantic_errorsfor safe telemetry.wrap_provider_errorapplies narrow timeout/network message patterns only when no HTTP status was extracted, so 4xx responses are not reclassified from body wording alone (aligns with the documented preference for structured signals over message parsing).make lint-fixand broad README + unit test coverage for the new hierarchy.Test plan
tests/schemas/test_errors.py,tests/evaluators/test_base.py, schema and vocabulary tests updated for the new types)TestWrapProviderErrorincluding new cases for structured status vs. message-pattern precedence