Bump azure-ai-evaluation from 1.15.0 to 1.16.8 #73

Open
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/azure-ai-evaluation-1.16.8
Conversation

dependabot[bot] commented on behalf of github on May 20, 2026

Bumps azure-ai-evaluation from 1.15.0 to 1.16.8.

Release notes

Sourced from azure-ai-evaluation's releases.

azure-ai-evaluation_1.16.8

1.16.8 (2026-05-19)

Features Added

  • App Insights logging now forwards arbitrary evaluator-specific keys from each event's properties payload as a single gen_ai.evaluation.properties JSON attribute (carried inside internal_properties). Previously only the four red-team keys (attack_success, attack_technique, attack_complexity, attack_success_threshold) were forwarded; structured outputs such as rubric dimension_scores were silently dropped. Payloads larger than 7500 characters are replaced with a valid JSON marker ({"truncated": true, "original_size_bytes": <n>}) so consumers can always json.loads the value. Non-dict properties payloads are now safely ignored instead of raising in the red-team forwarder.
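The truncation rule described above can be sketched as follows. This is a minimal illustration, not the SDK's implementation: the function name `serialize_properties` and the constant `MAX_PROPERTIES_CHARS` are hypothetical, and measuring `original_size_bytes` as the UTF-8 length of the serialized payload is an assumption, since the release note does not say how the size is computed.

```python
import json

# Limit quoted in the release note; the constant name is hypothetical.
MAX_PROPERTIES_CHARS = 7500


def serialize_properties(properties):
    """Return a JSON string for the properties attribute, or None for non-dict input."""
    if not isinstance(properties, dict):
        # Non-dict payloads are ignored rather than raising.
        return None
    # default=str keeps serialization from failing on non-JSON values (assumption).
    payload = json.dumps(properties, default=str)
    if len(payload) > MAX_PROPERTIES_CHARS:
        # Swap the oversized payload for a small marker that is still valid JSON.
        marker = {
            "truncated": True,
            "original_size_bytes": len(payload.encode("utf-8")),  # assumed measure
        }
        return json.dumps(marker)
    return payload
```

Because both branches return valid JSON, a consumer can call `json.loads` on the attribute and then check for the `truncated` key before reading evaluator-specific fields such as `dimension_scores`.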

azure-ai-evaluation_1.16.7

1.16.7 (2026-05-07)

Features Added

  • Added extra_headers keyword argument to RaiServiceEvaluatorBase (and all content safety evaluators) to allow passing custom HTTP headers to all backend RAI service calls. SDK-owned headers (Authorization, User-Agent, Content-Type, aml-user-token, x-ms-client-request-id) cannot be overridden by extra_headers.

  • Added status field ("completed", "error", "skipped") on evaluation result items to indicate evaluator execution outcome.

  • Added skipped and errored counts to result_counts and per_testing_criteria_results in AOAI evaluation summaries.

  • Added skipped to ResultCount and skipped/errored to PerTestingCriteriaResult typed contracts.
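The `extra_headers` rule in the first item above (custom headers are passed through, SDK-owned headers cannot be overridden) can be pictured with a small merge helper. This is a sketch under stated assumptions: `merge_headers` and `PROTECTED_HEADERS` are hypothetical names, the case-insensitive comparison is an assumption based on HTTP header names being case-insensitive, and silently skipping a protected header (rather than raising) is also assumed.

```python
# Header names the release note lists as SDK-owned, lowercased for comparison.
PROTECTED_HEADERS = {
    "authorization",
    "user-agent",
    "content-type",
    "aml-user-token",
    "x-ms-client-request-id",
}


def merge_headers(sdk_headers, extra_headers=None):
    """Merge caller-supplied headers into SDK headers without overriding protected ones."""
    merged = dict(sdk_headers)
    for name, value in (extra_headers or {}).items():
        if name.lower() in PROTECTED_HEADERS:
            # Assumed behavior: drop the caller's value and keep the SDK's.
            continue
        merged[name] = value
    return merged
```

With this shape, a caller-supplied `authorization` entry is discarded while a custom entry such as `x-custom-trace` is forwarded unchanged.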

Bugs Fixed

  • _TaskNavigationEfficiencyEvaluator now accepts JSON-stringified response and ground_truth inputs (e.g., from data pipelines that serialize list/tuple inputs to strings). String inputs are parsed as JSON; on parse failure the original value is preserved so downstream validation surfaces the error as before.
  • Fixed error blame attribution in _get_single_run_results to perform a case-insensitive comparison when checking the AOAI error code for UserError, ensuring failed evaluation runs are correctly classified as user errors regardless of server-side casing.
  • Fixed deflection_rate evaluator showing incorrect pass/fail labels where all results were labeled "pass" regardless of the actual score. The inverse metric adjustment was overriding the evaluator's correct string labels, remapping every result to "pass".
  • Fixed evaluate() raising EvaluationException: (InternalError) unhashable type: 'list' when an evaluator emitted a list value under a _result-suffixed column. Binary aggregation now skips such columns with a warning instead of aborting the entire run.
  • Fixed task_adherence red team scoring by adding scenario=redteam to the RAI scorer evaluation payload, ensuring the server-side score mapping correctly routes to Direct mapping for attack success determination.
  • Fixed row classification double-counting in _calculate_aoai_evaluation_summary where errored rows were counted separately and could also be counted as passed/failed. Rows are now classified into mutually exclusive buckets with priority: passed > failed > errored > skipped.
  • Fixed row classification where rows with empty or missing results lists were incorrectly counted as "passed" (the condition passed_count == len(results) - error_count evaluated 0 == 0 as True).
  • Fixed _get_metric_result prefix matching where shorter metric names (e.g., xpia) could match before longer, more-specific ones (e.g., xpia_manipulated_content). Now sorts by length descending for correct longest-prefix matching.
  • Fixed non-dict _properties values from evaluators causing downstream issues. Values that are not dicts are now logged and dropped gracefully.
  • Fixed filename length error in _inline_image by catching OSError/ValueError during local path resolution and falling back to returning a text chunk instead of throwing.
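Two of the fixes above are easy to picture in code: parsing JSON-stringified inputs while preserving the original value on failure, and longest-prefix matching of metric names. The sketch below is illustrative only. The helper names `coerce_json_input` and `match_metric` are hypothetical and do not correspond to the SDK's private functions.

```python
import json


def coerce_json_input(value):
    """Parse a JSON string; return the value unchanged if it is not a string or does not parse."""
    if not isinstance(value, str):
        return value
    try:
        return json.loads(value)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        # Keep the original so downstream validation reports the problem.
        return value


def match_metric(column, metric_names):
    """Return the longest metric name that is a prefix of `column`, or None."""
    # Sorting by length descending makes the most specific name win.
    for name in sorted(metric_names, key=len, reverse=True):
        if column.startswith(name):
            return name
    return None
```

Without the descending sort, a column beginning with `xpia_manipulated_content` could be claimed by the shorter `xpia` name, depending on iteration order.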

Other Changes

  • Moved token usage attributes (gen_ai.evaluation.usage.input_tokens, gen_ai.evaluation.usage.output_tokens) from standard App Insights event attributes into the internal_properties JSON bag to align with internal telemetry conventions.

azure-ai-evaluation_1.16.6

1.16.6 (2026-04-27)

Bugs Fixed

  • Fixed evaluation token usage not being emitted in the genai evaluation event, causing token consumption metrics to be missing from telemetry.
  • Fixed multi-turn red team attacks (RedTeamingAttack-based strategies like MultiTurn) failing silently with PyRIT 0.11. Two bugs were patched at the SDK level: (1) RedTeamingAttack._setup_async raised RuntimeError: Conversation already exists because it seeded prepended conversation messages before calling set_system_prompt; now patched per-instance on the adversarial chat target to tolerate existing conversation history. (2) RedTeamingAttack._generate_next_prompt_async returned context.next_message without calling .duplicate_message(), causing sqlite3.IntegrityError: UNIQUE constraint failed: PromptMemoryEntries.id on the second turn; now patched at module load with an idempotent wrapper that duplicates the message before returning.
  • Fixed sensitive_data_leakage red team attacks producing 100% false-pass rates. _extract_context_items in the Foundry execution path only handled list or dict shapes for messages[0].context; pre-curated SDL attack objectives store the document text as a str with sibling context_type/tool_name fields, so the document was silently dropped and a fallback synthesized a context item from the user prompt. The agent never received the sensitive document content and could not leak it, causing the evaluator to score every attempt as a pass. Added str handling (both message-level and top-level), normalized raw string entries inside list-shaped context, and gated the context_type fallback so it only runs when no usable context was extracted (including the context: null case).
Commits
  • adcf0f3 Bump azure-ai-evaluation version to 1.16.8
  • dfa2b30 Move evaluator properties changelog entry to 1.16.8 section
  • 1aa3506 Run black on _evaluate.py to satisfy CI lint
  • 42c68b2 Address Copilot review feedback
  • b3dfccb azure-ai-evaluation: forward evaluator properties to App Insights
  • e2cb236 Set CHANGELOG date to 2026-05-07 and bump version to 1.16.7
  • fd63edb Fix TaskNavigationEfficiencyEvaluator threshold defaulting to 3.0 for binary ...
  • 6a0a5aa Standradize Task Navigation Efficiency Output (#46474)
  • fd3c19d Accept JSON string inputs in TaskNavigationEfficiencyEvaluator (#46760)
  • 369603e fix filename length limit error (#46771)
  • Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [azure-ai-evaluation](https://github.com/Azure/azure-sdk-for-python) from 1.15.0 to 1.16.8.
- [Release notes](https://github.com/Azure/azure-sdk-for-python/releases)
- [Commits](Azure/azure-sdk-for-python@azure-ai-evaluation_1.15.0...azure-ai-evaluation_1.16.8)

---
updated-dependencies:
- dependency-name: azure-ai-evaluation
  dependency-version: 1.16.8
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] added the dependencies and python labels on May 20, 2026
Copilot AI review requested due to automatic review settings May 20, 2026 04:40
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
