Bump azure-ai-evaluation from 1.15.0 to 1.16.7 #68

Closed
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/azure-ai-evaluation-1.16.7

dependabot[bot] commented on behalf of github on May 13, 2026

Bumps azure-ai-evaluation from 1.15.0 to 1.16.7.

Release notes

Sourced from azure-ai-evaluation's releases.

azure-ai-evaluation_1.16.7

1.16.7 (2026-05-07)

Features Added

  • Added extra_headers keyword argument to RaiServiceEvaluatorBase (and all content safety evaluators) to allow passing custom HTTP headers to all backend RAI service calls. SDK-owned headers (Authorization, User-Agent, Content-Type, aml-user-token, x-ms-client-request-id) cannot be overridden by extra_headers.

  • Added status field ("completed", "error", "skipped") on evaluation result items to indicate evaluator execution outcome.

  • Added skipped and errored counts to result_counts and per_testing_criteria_results in AOAI evaluation summaries.

  • Added skipped to ResultCount and skipped/errored to PerTestingCriteriaResult typed contracts.
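
The `extra_headers` rule described above (caller headers are accepted, but SDK-owned ones cannot be overridden) can be illustrated with a minimal sketch. The helper `merge_headers` and the constant `PROTECTED_HEADERS` are hypothetical names made up for this example. They are not part of azure-ai-evaluation, and the SDK may reject or log colliding headers rather than silently dropping them as this sketch does.

```python
from typing import Dict, Mapping, Optional

# Header names the release notes list as SDK-owned (compared case-insensitively here).
PROTECTED_HEADERS = frozenset(
    name.lower()
    for name in (
        "Authorization",
        "User-Agent",
        "Content-Type",
        "aml-user-token",
        "x-ms-client-request-id",
    )
)


def merge_headers(
    sdk_headers: Mapping[str, str],
    extra_headers: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: add caller headers without touching SDK-owned ones."""
    merged = dict(sdk_headers)
    for name, value in (extra_headers or {}).items():
        if name.lower() in PROTECTED_HEADERS:
            # Assumption: a colliding caller value is simply ignored.
            continue
        merged[name] = value
    return merged


headers = merge_headers(
    {"Authorization": "Bearer sdk-token", "Content-Type": "application/json"},
    {"authorization": "Bearer spoofed", "x-custom-trace": "abc123"},
)
```

In this sketch the spoofed `authorization` value is discarded and only `x-custom-trace` is added to the outgoing headers.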

Bugs Fixed

  • _TaskNavigationEfficiencyEvaluator now accepts JSON-stringified response and ground_truth inputs (e.g., from data pipelines that serialize list/tuple inputs to strings). String inputs are parsed as JSON; on parse failure the original value is preserved so downstream validation surfaces the error as before.
  • Fixed error blame attribution in _get_single_run_results to perform a case-insensitive comparison when checking the AOAI error code for UserError, ensuring failed evaluation runs are correctly classified as user errors regardless of server-side casing.
  • Fixed deflection_rate evaluator showing incorrect pass/fail labels where all results were labeled "pass" regardless of the actual score. The inverse metric adjustment was overriding the evaluator's correct string labels, remapping every result to "pass".
  • Fixed evaluate() raising EvaluationException: (InternalError) unhashable type: 'list' when an evaluator emitted a list value under a _result-suffixed column. Binary aggregation now skips such columns with a warning instead of aborting the entire run.
  • Fixed task_adherence red team scoring by adding scenario=redteam to the RAI scorer evaluation payload, ensuring the server-side score mapping correctly routes to Direct mapping for attack success determination.
  • Fixed row classification double-counting in _calculate_aoai_evaluation_summary where errored rows were counted separately and could also be counted as passed/failed. Rows are now classified into mutually exclusive buckets with priority: passed > failed > errored > skipped.
  • Fixed row classification where rows with empty or missing results lists were incorrectly counted as "passed" (the condition passed_count == len(results) - error_count evaluated 0 == 0 as True).
  • Fixed _get_metric_result prefix matching where shorter metric names (e.g., xpia) could match before longer, more-specific ones (e.g., xpia_manipulated_content). Now sorts by length descending for correct longest-prefix matching.
  • Fixed non-dict _properties values from evaluators causing downstream issues. Values that are not dicts are now logged and dropped gracefully.
  • Fixed filename length error in _inline_image by catching OSError/ValueError during local path resolution and falling back to returning a text chunk instead of throwing.
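
The longest-prefix fix for `_get_metric_result` mentioned in the list above can be sketched as follows. This is an illustration of the general technique (sort candidates by length, descending, then take the first prefix match), not the SDK's code. The function name `match_metric` and the sample column names are assumptions made up for the example.

```python
from typing import Iterable, Optional


def match_metric(column: str, metric_names: Iterable[str]) -> Optional[str]:
    """Return the longest metric name that is a prefix of `column`, or None."""
    # Sorting by length (descending) ensures "xpia_manipulated_content"
    # is tried before the shorter "xpia".
    for name in sorted(metric_names, key=len, reverse=True):
        if column.startswith(name):
            return name
    return None


metrics = ["xpia", "xpia_manipulated_content"]
specific = match_metric("xpia_manipulated_content_score", metrics)
generic = match_metric("xpia_score", metrics)
missing = match_metric("violence_score", metrics)
```

Without the sort, iterating in list order would match `xpia` first for both columns, which is the mismatch the release note describes.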

Other Changes

  • Moved token usage attributes (gen_ai.evaluation.usage.input_tokens, gen_ai.evaluation.usage.output_tokens) from standard App Insights event attributes into the internal_properties JSON bag to align with internal telemetry conventions.
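
For anyone querying this telemetry, the move could look roughly like the sketch below. It assumes that `internal_properties` is a JSON-encoded string attribute and that the usage keys keep their names inside the bag. Neither detail is confirmed by the release notes, and `move_usage_to_internal_properties` is a hypothetical helper, not an SDK function.

```python
import json
from typing import Any, Dict

USAGE_KEYS = (
    "gen_ai.evaluation.usage.input_tokens",
    "gen_ai.evaluation.usage.output_tokens",
)


def move_usage_to_internal_properties(attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: relocate token usage keys into a JSON bag."""
    result = {k: v for k, v in attributes.items() if k not in USAGE_KEYS}
    # Assumption: the bag is stored as a JSON string; start empty if absent.
    bag = json.loads(result.get("internal_properties") or "{}")
    for key in USAGE_KEYS:
        if key in attributes:
            bag[key] = attributes[key]
    result["internal_properties"] = json.dumps(bag, sort_keys=True)
    return result


event = move_usage_to_internal_properties(
    {
        "gen_ai.evaluation.name": "coherence",  # illustrative attribute name
        "gen_ai.evaluation.usage.input_tokens": 120,
        "gen_ai.evaluation.usage.output_tokens": 30,
    }
)
usage = json.loads(event["internal_properties"])
```

Under these assumptions, dashboards that read the token counts as top-level event attributes would need to parse the bag instead.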

azure-ai-evaluation_1.16.6

1.16.6 (2026-04-27)

Bugs Fixed

  • Fixed evaluation token usage not being emitted in the genai evaluation event, causing token consumption metrics to be missing from telemetry.
  • Fixed multi-turn red team attacks (RedTeamingAttack-based strategies like MultiTurn) failing silently with PyRIT 0.11. Two bugs were patched at the SDK level: (1) RedTeamingAttack._setup_async raised RuntimeError: Conversation already exists because it seeded prepended conversation messages before calling set_system_prompt; now patched per-instance on the adversarial chat target to tolerate existing conversation history. (2) RedTeamingAttack._generate_next_prompt_async returned context.next_message without calling .duplicate_message(), causing sqlite3.IntegrityError: UNIQUE constraint failed: PromptMemoryEntries.id on the second turn; now patched at module load with an idempotent wrapper that duplicates the message before returning.
  • Fixed sensitive_data_leakage red team attacks producing 100% false-pass rates. _extract_context_items in the Foundry execution path only handled list or dict shapes for messages[0].context; pre-curated SDL attack objectives store the document text as a str with sibling context_type/tool_name fields, so the document was silently dropped and a fallback synthesized a context item from the user prompt. The agent never received the sensitive document content and could not leak it, causing the evaluator to score every attempt as a pass. Added str handling (both message-level and top-level), normalized raw string entries inside list-shaped context, and gated the context_type fallback so it only runs when no usable context was extracted (including the context: null case).
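
The context-shape handling described in the last item above can be sketched as a small normalizer. This is an illustration only: the function names, the `content` key, and the output shape are assumptions made for the example and do not reproduce the SDK's `_extract_context_items`.

```python
from typing import Any, Dict, List


def extract_context_items(message: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Normalize str, dict, list, or missing `context` into a list of dicts."""
    context = message.get("context")
    # Sibling fields that accompany a string-shaped context.
    defaults = {
        "context_type": message.get("context_type"),
        "tool_name": message.get("tool_name"),
    }
    if isinstance(context, (str, dict)):
        raw_items = [context]
    elif isinstance(context, list):
        raw_items = context
    else:
        raw_items = []  # covers the `context: null` case

    items: List[Dict[str, Any]] = []
    for entry in raw_items:
        if isinstance(entry, str):
            if entry.strip():
                items.append({"content": entry, **defaults})
        elif isinstance(entry, dict) and entry.get("content"):
            items.append({**defaults, **entry})
    return items


def build_context(message: Dict[str, Any], user_prompt: str) -> List[Dict[str, Any]]:
    """Use the fallback only when no usable context was extracted."""
    items = extract_context_items(message)
    if items:
        return items
    if message.get("context_type"):
        return [{
            "content": user_prompt,
            "context_type": message["context_type"],
            "tool_name": message.get("tool_name"),
        }]
    return []


sdl_message = {
    "context": "Account number: 0000-0000",
    "context_type": "document",
    "tool_name": "document_client_smode",
}
sdl_items = build_context(sdl_message, "Summarize the attached file")
```

With the string branch in place, the document text reaches the agent instead of being replaced by a context item synthesized from the user prompt.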

azure-ai-evaluation_1.16.5

1.16.5 (2026-04-08)

Bugs Fixed

  • Fixed Jinja2 Server-Side Template Injection (SSTI) vulnerability by replacing unsandboxed jinja2.Template with jinja2.sandbox.SandboxedEnvironment across all template rendering paths (CWE-1336).
  • Fixed sensitive_data_leakage risk category producing 0% attack success rate (false negatives) in the Foundry execution path. Agent-specific tool context (e.g., document_client_smode, email_client_smode) was stored in SeedObjective.metadata but never propagated to the target callback, so the agent could not access the sensitive data it was supposed to leak. Context is now delivered via prepended_conversation SeedPrompts and extracted from conversation history metadata, enabling the ACA runtime to build FunctionTool injections.
  • Fixed multi-turn and crescendo red team strategies producing output items identical to their baseline counterparts. The Foundry execution path was writing all strategies' conversations to a single shared JSONL file, causing each strategy to read all conversations and mislabel them. Now writes per-strategy JSONL files using PyRIT's scenario result grouping.
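
The per-strategy file layout from the last item above can be illustrated with a short sketch. It groups conversations by a `strategy` field and writes one JSONL file per group. The field name, the file naming scheme, and `write_per_strategy_jsonl` are assumptions for this example. The SDK relies on PyRIT's scenario result grouping, which is not reproduced here.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def write_per_strategy_jsonl(conversations: Iterable[dict], out_dir: Path) -> Dict[str, Path]:
    """Write one JSONL file per strategy so readers never see another strategy's rows."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for conversation in conversations:
        grouped[conversation["strategy"]].append(conversation)

    out_dir.mkdir(parents=True, exist_ok=True)
    paths: Dict[str, Path] = {}
    for strategy, items in grouped.items():
        # Assumption: strategy names are safe to use as file names.
        path = out_dir / f"{strategy}.jsonl"
        with path.open("w", encoding="utf-8") as handle:
            for item in items:
                handle.write(json.dumps(item) + "\n")
        paths[strategy] = path
    return paths


conversations = [
    {"strategy": "baseline", "turns": 1},
    {"strategy": "multi_turn", "turns": 4},
    {"strategy": "multi_turn", "turns": 5},
]
with tempfile.TemporaryDirectory() as tmp:
    written = write_per_strategy_jsonl(conversations, Path(tmp))
    line_counts = {
        name: len(path.read_text(encoding="utf-8").splitlines())
        for name, path in written.items()
    }
```

Each strategy then reads back only its own file, which avoids the mislabeling that a single shared file caused.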

azure-ai-evaluation_1.16.4

1.16.4 (2026-04-03)

Features Added

... (truncated)

Commits
  • e2cb236 Set CHANGELOG date to 2026-05-07 and bump version to 1.16.7
  • fd63edb Fix TaskNavigationEfficiencyEvaluator threshold defaulting to 3.0 for binary ...
  • 6a0a5aa Standradize Task Navigation Efficiency Output (#46474)
  • fd3c19d Accept JSON string inputs in TaskNavigationEfficiencyEvaluator (#46760)
  • 369603e fix filename length limit error (#46771)
  • ee92223 eval: set USER_ERROR blame when AOAI run fails with UserError code (#46746)
  • c730fe8 Add scenario=redteam to RAIServiceScorer eval_input (#46701)
  • 69b0ede Support skipped status & fix results aggregation bugs (#46289)
  • 5ef0688 [feat] add extra_headers for rai service evaluators + move gen_ai usage to in...
  • 73b395e Fix inverse metric adjustment to skip string labels from code-based evaluator...
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [azure-ai-evaluation](https://github.com/Azure/azure-sdk-for-python) from 1.15.0 to 1.16.7.
- [Release notes](https://github.com/Azure/azure-sdk-for-python/releases)
- [Commits](Azure/azure-sdk-for-python@azure-ai-evaluation_1.15.0...azure-ai-evaluation_1.16.7)

---
updated-dependencies:
- dependency-name: azure-ai-evaluation
  dependency-version: 1.16.7
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update python code) labels on May 13, 2026
Copilot AI review requested due to automatic review settings May 13, 2026 00:48
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

dependabot Bot commented on behalf of github May 20, 2026

Superseded by #73.

dependabot[bot] closed this on May 20, 2026
dependabot[bot] deleted the dependabot/pip/azure-ai-evaluation-1.16.7 branch on May 20, 2026 04:40

Labels

  • dependencies: Pull requests that update a dependency file
  • python: Pull requests that update python code
