This PR (Bug 2) → failed cases don't crash during score aggregation
Bug 3 becomes a safety net (defensive None check before rounding)
How Often This Issue Occurs
Always (100%) when:
GEPA optimization is run with eval sets containing failed cases
AND any of those failed cases are sampled during the optimization loop
AND the initial baseline eval completes (Bug 1 doesn't prevent baseline eval from completing with some fixes)
In production environments with transient failures (rate limits, timeouts, network errors), this is a regular occurrence.
Local Workaround
Monkeypatch the optimizer to handle missing scores:
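```python
# Import path assumed from the file path cited under Root Cause Analysis.
from google.adk.optimization.gepa_root_agent_prompt_optimizer import GepaRootAgentPromptOptimizer

# Monkey-patch before calling the optimizer
original_evaluate = GepaRootAgentPromptOptimizer.evaluate


def patched_evaluate(self, batch):
    result = original_evaluate(self, batch)
    # Ensure all batch examples have scores
    for example_id in batch:
        if example_id not in result.scores:
            result.scores[example_id] = 0.0
    return result


GepaRootAgentPromptOptimizer.evaluate = patched_evaluate
```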
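The workaround's behavior can be sanity-checked without ADK installed. The sketch below is only an illustration: `FakeResult`, `FakeOptimizer`, and `patch_missing_scores` are hypothetical stand-ins invented for this example, not part of ADK, and the real optimizer's result type may differ.

```python
from dataclasses import dataclass, field


@dataclass
class FakeResult:
    """Hypothetical stand-in for the optimizer's evaluation result."""
    scores: dict = field(default_factory=dict)


class FakeOptimizer:
    """Hypothetical stand-in for GepaRootAgentPromptOptimizer."""

    def evaluate(self, batch):
        # Simulate failed cases: ids starting with "fail" get no score entry.
        return FakeResult(
            scores={ex: 1.0 for ex in batch if not ex.startswith("fail")}
        )


def patch_missing_scores(optimizer_cls, default=0.0):
    """Wrap optimizer_cls.evaluate so every batch example has a score."""
    original_evaluate = optimizer_cls.evaluate

    def patched_evaluate(self, batch):
        result = original_evaluate(self, batch)
        for example_id in batch:
            # Only fills in entries that are missing; existing scores are kept.
            result.scores.setdefault(example_id, default)
        return result

    optimizer_cls.evaluate = patched_evaluate


patch_missing_scores(FakeOptimizer)
result = FakeOptimizer().evaluate(["case_1", "fail_2", "case_3"])
print(result.scores)
```

After patching, the failed case `fail_2` carries the default score of `0.0`, while the scores of the passing cases are unchanged.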
Testing
The fix should be validated with:
Unit test covering batch evaluation with missing score entry
Integration test: GEPA optimization with mixed passing/failing eval cases
Regression test: Verify scoring identical when all cases pass
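A minimal sketch of the unit and regression tests listed above, assuming the fix reduces to a `.get()` lookup with a `0.0` default. `aggregate_scores` is a hypothetical helper standing in for the score lookup in the adapter; the real code at `gepa_root_agent_prompt_optimizer.py:150` may be structured differently.

```python
def aggregate_scores(batch, scores, default=0.0):
    """Return one score per batch example, using `default` for missing entries."""
    return [scores.get(example_id, default) for example_id in batch]


def test_missing_score_entry_defaults_to_zero():
    batch = ["a", "b", "c"]
    scores = {"a": 1.0, "c": 0.5}  # "b" failed and has no entry
    assert aggregate_scores(batch, scores) == [1.0, 0.0, 0.5]


def test_scoring_identical_when_all_cases_pass():
    batch = ["a", "b"]
    scores = {"a": 1.0, "b": 0.25}
    # Direct indexing and .get() must agree when nothing is missing.
    assert aggregate_scores(batch, scores) == [scores[e] for e in batch]


def test_direct_indexing_raises_keyerror():
    # Documents the failure mode: direct indexing fails on the missing entry.
    scores = {"a": 1.0}
    try:
        [scores[e] for e in ["a", "b"]]
    except KeyError:
        return
    raise AssertionError("expected KeyError for the missing entry")
```

The functions use plain `assert` statements, so they can be collected by pytest or called directly.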
Additional Context
This bug is part of a broader resilience issue in GEPA optimization. The three cascading bugs prevent optimization from completing when ANY eval case fails, even transiently. Together they block production usage of GEPA with real-world eval sets.
The fixes are minimal (1-2 lines each) and maintain backward compatibility while enabling graceful degradation.
🔴 Required Information
Describe the Bug
When using GEPA optimization (AgentOptimizer.optimize()) with evaluation sets that include failed cases (e.g., due to inference failures, user simulator errors, or API timeouts), the GEPA adapter crashes with a KeyError in gepa_root_agent_prompt_optimizer.py:150.
Failed eval cases don't populate score entries in the result.scores dictionary, but the GEPA adapter assumes all batch examples have corresponding scores. This causes the optimization loop to crash mid-evaluation.
This is Part 2 of a three-part error cascade:
1. inference_result.inferences = None → TypeError when iterating
2. result.scores dict → KeyError when accessing scores
3. score = None → TypeError when rounding (will be mitigated by Bug 1 fix)
See related issues: #5876, #5115, #5403, and PR #5878.
Steps to Reproduce
conversation_scenario (user simulation) with edge cases that cause "LLM returned only thinking tokens"
AgentOptimizer.optimize() with this eval set
Expected Behavior
GEPA optimization should gracefully handle failed eval cases by:
Observed Behavior
Optimization terminates prematurely, blocking the ability to use GEPA with any eval sets containing transient failures.
Environment Details
Model Information
🟡 Optional Information
Root Cause Analysis
File: google/adk/optimization/gepa_root_agent_prompt_optimizer.py:150
Current Code:
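The Problem:
Failed eval cases never get an entry in result.scores
They are still in the batch list being iterated
Direct indexing with [example_id] raises KeyError
Note: This manifests AFTER Bug 1 (PR #5878) is fixed, because until then the Bug 1 TypeError is raised first in the cascade and the result.scores lookup is never reached.
Suggested Fix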
Use .get() with a conservative default of 0.0 for failed cases:
Rationale:
.get() call with default value
Relationship to Other Issues
Fix order: