Skip to content

feat(python-sdk)!: error handling improvements#65

Draft
czi-fsisenda wants to merge 8 commits into
mainfrom
fsisenda/sdk_python_errors
Draft

feat(python-sdk)!: error handling improvements#65
czi-fsisenda wants to merge 8 commits into
mainfrom
fsisenda/sdk_python_errors

Conversation

@czi-fsisenda
Copy link
Copy Markdown
Contributor

@czi-fsisenda czi-fsisenda commented May 16, 2026

Summary

  • Renames public validation and timeout types (InputValidationError, RequestTimeoutError) so they do not collide with builtin or Pydantic names, and documents a single retryable signal on EvaluatorError for client retry logic.
  • Expands provider error wrapping: sanitized SDK messages, structured fields (status_code, response_body, request_id, provider, model), HTTP routing (including 408 → RequestTimeoutError), OutputValidationError for bad LLM output, and sanitize_pydantic_errors for safe telemetry.
  • wrap_provider_error applies narrow timeout/network message patterns only when no HTTP status was extracted, so 4xx responses are not reclassified from body wording alone (aligns with the documented preference for structured signals over message parsing).
  • Adds make lint-fix and broad README + unit test coverage for the new hierarchy.

Test plan

  • Unit tests (tests/schemas/test_errors.py, tests/evaluators/test_base.py, schema and vocabulary tests updated for the new types)
  • Verified TestWrapProviderError including new cases for structured status vs. message-pattern precedence

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR revamps Python SDK error handling with a renamed input-validation error, sanitized provider/output error wrapping, retryability signals, and updated documentation/tests.

Changes:

  • Replaces ValidationError/EvaluatorTimeoutError public API names with InputValidationError/RequestTimeoutError.
  • Adds structured API error context, output-validation wrapping, metadata-safe error formatting, and Pydantic error sanitization helpers.
  • Updates tests and README to cover the new hierarchy and behavior.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
sdks/python/src/learning_commons_evaluators/schemas/errors.py Defines the new error hierarchy, wrapping, retryability, and sanitization helpers.
sdks/python/src/learning_commons_evaluators/evaluators/base.py Uses sanitized metadata errors and wraps LLM output parsing/schema failures.
sdks/python/src/learning_commons_evaluators/schemas/evaluator.py Updates input validation aggregation to use InputValidationError.
sdks/python/src/learning_commons_evaluators/schemas/common_inputs.py Raises InputValidationError from common input fields.
sdks/python/src/learning_commons_evaluators/schemas/__init__.py Updates schema-level error re-export.
sdks/python/src/learning_commons_evaluators/errors.py Updates package error re-exports.
sdks/python/src/learning_commons_evaluators/__init__.py Updates top-level public exports.
sdks/python/README.md Documents the new hierarchy, retry behavior, and sanitized metadata behavior.
sdks/python/tests/schemas/test_errors.py Expands coverage for error hierarchy, wrapping, and sanitization helpers.
sdks/python/tests/schemas/test_evaluator_schemas.py Updates evaluator schema validation tests for InputValidationError.
sdks/python/tests/schemas/test_common_inputs.py Updates common input validation tests for InputValidationError.
sdks/python/tests/evaluators/test_vocabulary.py Updates vocabulary evaluator validation expectations.
sdks/python/tests/evaluators/test_base.py Updates base evaluator tests for sanitized metadata and output-validation wrapping.
Comments suppressed due to low confidence (2)

sdks/python/src/learning_commons_evaluators/schemas/errors.py:452

  • This final fallback makes provider exceptions with only message-based network/timeout signals (for example ECONNREFUSED, ENOTFOUND, or request timed out) non-retryable generic APIErrors. The previous implementation routed those to network/timeout errors, and the new docs still describe NetworkError as covering connection refused/DNS failures. Consider preserving a bounded keyword fallback while keeping the returned SDK message sanitized.
    return APIError(default_message, retryable=False, **common)

sdks/python/src/learning_commons_evaluators/schemas/errors.py:448

  • HTTP 408 responses are not handled as request timeouts here, so a provider exception with status_code=408 becomes a non-retryable generic APIError because it is below the 5xx threshold. Since RequestTimeoutError itself defaults to status 408 and is documented as retryable, route status 408 to RequestTimeoutError before the generic status-code branch.
    if status_code is not None:
        return APIError(
            f"API request failed (HTTP {status_code})",
            status_code=status_code,
            retryable=status_code >= _RETRYABLE_STATUS_MIN,

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread sdks/python/src/learning_commons_evaluators/schemas/errors.py
Comment on lines +112 to +114
# Honour subclass class-level default unless caller explicitly overrides.
if retryable is not None:
self.retryable = retryable
Comment thread sdks/python/README.md Outdated

## Error handling

Every exception the SDK raises inherits from `EvaluatorError`. Failures inside LLM prompt steps are wrapped at the boundary so callers see a predictable, sanitized hierarchy instead of raw LangChain, OpenAI, Anthropic, or HTTP-client exceptions.
Comment on lines +230 to +240
"""Strip raw input values from a list of :meth:`pydantic.ValidationError.errors` entries.

Pydantic's per-error dicts include an ``input`` key containing the raw
value that failed validation. For LLM outputs that value may echo prompt
content, so we drop it before the error reaches logs or telemetry. The
remaining keys (``loc``, ``type``, ``msg``, ``ctx`` if present) describe
the schema and the failure mode but not the offending content.
"""
return [{key: value for key, value in error.items() if key != "input"} for error in errors]


raise
except PydanticValidationError:
raise
except (PydanticValidationError, OutputParserException) as e:
Comment thread sdks/python/README.md
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (2)

sdks/python/src/learning_commons_evaluators/schemas/errors.py:516

  • This timeout check applies the message-regex fallback even when a structured non-timeout HTTP status is present. For example, a provider error with status_code=400 or 401 whose message mentions an invalid timeout setting would be classified as retryable RequestTimeoutError before the 4xx/auth handling below, contrary to the documented preference for structured attributes over message parsing. The message-only timeout fallback should be limited to status-less errors (or 408 specifically).
    if _is_timeout(error, status_code):
        return RequestTimeoutError(status_code=status_code, **common)

sdks/python/src/learning_commons_evaluators/schemas/errors.py:519

  • The network detector also runs its message-pattern fallback even when a structured HTTP status code was extracted. A provider 4xx/404 response whose message contains phrases like "could not resolve" (for example, an invalid model/resource message) will be returned as retryable NetworkError before the model-not-found/auth/rate-limit/status handling below. As with timeouts, the message fallback should not override a known HTTP status.
    if _is_network(error):
        return NetworkError(**common)

Comment on lines +286 to +288
clean: dict[str, Any] = {
key: value for key, value in error.items() if key not in {"input", "ctx"}
}
Comment thread sdks/python/README.md Outdated
- `model` on `APIError` — the model ID requested.
- `response_body` on `APIError` — decoded response body. Opt-in for debugging; may contain echoed prompt content, so treat as sensitive.
- `request_id` on `APIError` — provider request ID, useful for support escalation.
- `validation_errors` on `OutputValidationError` — per-field details from Pydantic's `errors()` API with raw input values stripped (safe for logs).
Comment on lines +324 to +333
Prefers the structured attribute that openai-python and anthropic-python
both expose (``.status_code``). Falls back to ``.status`` (used by some
httpx-based clients) and finally to a regex on the message — narrowed to
HTTP-shaped 4xx/5xx codes near the start of the message to reduce false
positives like model names or token counts.
"""
for attr in ("status_code", "status"):
value = getattr(error, attr, None)
if isinstance(value, int):
return value
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (2)

sdks/python/src/learning_commons_evaluators/schemas/errors.py:519

  • The message-pattern fallback still runs even when _provider_status_code() extracted a structured HTTP status. Because _is_timeout() and _is_network() both inspect str(error) unconditionally, a 400/401/404 provider response whose body mentions "timeout" or "connection refused" will be reclassified as RequestTimeoutError/NetworkError before the status-code routing below. This contradicts the documented structured-signal precedence; guard the message-based checks so they only apply when status_code is None (while preserving typed timeout/network exceptions if intended).
    if _is_timeout(error, status_code):
        return RequestTimeoutError(status_code=status_code, **common)

    if _is_network(error):
        return NetworkError(**common)

sdks/python/src/learning_commons_evaluators/schemas/errors.py:355

  • The .text fallback is outside the best-effort try, so response objects whose text property raises (for example, an unread/streaming HTTP response) can make wrap_provider_error() raise a new exception while trying to wrap the original provider failure. Keep the text fallback inside a guarded block, similar to the .json() call, so error wrapping never masks the original error.
    json_fn = getattr(response, "json", None)
    if callable(json_fn):
        try:
            return json_fn()
        except Exception:  # noqa: BLE001 — body is best-effort debug data
            pass
    return getattr(response, "text", None)

Comment on lines +286 to +294
clean: dict[str, Any] = {
key: value for key, value in error.items() if key not in {"input", "ctx"}
}
ctx = error.get("ctx")
if isinstance(ctx, Mapping):
safe_ctx = {k: v for k, v in ctx.items() if _is_safe_ctx_value(v)}
if safe_ctx:
clean["ctx"] = safe_ctx
sanitized.append(clean)
Comment on lines +461 to +466
# First entry: ctx had only an unsafe value, so the whole ctx key is gone.
assert sanitized[0] == {
"loc": ("answer",),
"type": "value_error",
"msg": "Value error, rejected text: <sensitive prompt echo>",
}
@czi-fsisenda czi-fsisenda marked this pull request as draft May 18, 2026 15:54
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

Comment on lines +262 to +266
but custom validators can attach *anything* there, including exception
objects whose ``str()`` may echo the offending input. We restrict to
JSON-primitive values and tuples/lists of the same.
"""
if value is None or isinstance(value, (bool, int, float, str)):
Comment on lines +307 to +314
SDK errors (:class:`EvaluatorError` subclasses) already carry sanitized
messages, so include both the class name and the message for debuggability.
Any other exception is rendered as just its class name — the message may
contain raw values (e.g. an :class:`AttributeError` echoing a field name
from user input) and isn't safe to put in telemetry.
"""
if isinstance(error, EvaluatorError):
return f"{type(error).__name__}: {error}"
Comment on lines +485 to +486
*,
default_message: str = "API request failed",
Comment on lines 107 to +113
def __init__(
self,
message: str,
*,
status_code: int | None = None,
retryable: bool = False,
code: str | None = None,
):
super().__init__(message, code)
retryable: bool | None = None,
provider: LLMProvider | None = None,
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants