From e744de771aa6f3a03b906160581fe5a96100b1f4 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Tue, 14 Apr 2026 03:58:43 +0000
Subject: [PATCH 1/2] Initial plan

From 9f2481ceebf659c7923b76d815c26da8132de769 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Tue, 14 Apr 2026 04:34:34 +0000
Subject: [PATCH 2/2] fix: reduce agentic-workflows test scope and strengthen safe-output instructions in Agent Persona Explorer (#25231)

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/3c2b2c4d-a7b1-45b3-9855-08116477b367
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
---
 .github/workflows/agent-persona-explorer.md | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/.github/workflows/agent-persona-explorer.md b/.github/workflows/agent-persona-explorer.md
index dac4c5363d7..fffd81fd3b6 100644
--- a/.github/workflows/agent-persona-explorer.md
+++ b/.github/workflows/agent-persona-explorer.md
@@ -73,7 +73,7 @@ Store all scenarios in cache memory.
 
 ## Phase 3: Test Agent Responses (15 minutes)
 
-**Token Budget Optimization**: Test a **representative subset of 6-8 scenarios** (not all scenarios) to reduce token consumption while maintaining quality insights.
+**Token Budget Optimization**: Test a **representative subset of 3-4 scenarios** (not all scenarios) to reduce token consumption and ensure budget remains for Phase 5 publishing.
 
 For each selected scenario, invoke the "agentic-workflows" custom agent tool and:
 
@@ -99,6 +99,7 @@ For each selected scenario, invoke the "agentic-workflows" custom agent tool and
 - You are ONLY testing the agent's responses, NOT creating actual workflows
 - **Keep responses focused and concise** - summarize findings instead of verbose descriptions
 - Aim for quality over quantity - fewer well-analyzed scenarios are better than many shallow ones
+- **If any tool call fails, record the error briefly and move on to the next scenario** - do NOT retry or get stuck
 
 ## Phase 4: Analyze Results (4 minutes)
 
@@ -124,7 +125,9 @@ Review all captured responses and identify:
 
 ## Phase 5: Document and Publish Findings (1 minute)
 
-Create a GitHub discussion with a **concise** summary report. Use the `create discussion` safe-output to publish your findings.
+**MANDATORY OUTPUT**: Regardless of how many phases completed successfully, you MUST call either the `create discussion` or the `noop` safe-output tool before finishing. Failing to call a safe-output tool is the most common cause of workflow failures.
+
+Create a GitHub discussion with a **concise** summary report. Use the `create discussion` safe-output to publish your findings. Even if only 1-2 scenarios were tested, create the discussion with partial results.
 
 **Discussion title**: "Agent Persona Exploration - [DATE]" (e.g., "Agent Persona Exploration - 2024-01-16")
 
@@ -221,15 +224,18 @@ Example:
 ## Success Criteria
 
 Your effectiveness is measured by:
+- **Safe output**: ALWAYS call either `create discussion` or `noop` — this is the most critical requirement
 - **Efficiency**: Complete analysis within token budget (timeout: 180 minutes, concise outputs)
-- **Quality over quantity**: Test 6-8 representative scenarios thoroughly rather than all scenarios superficially
+- **Quality over quantity**: Test 3-4 representative scenarios thoroughly rather than many scenarios superficially
 - **Actionable insights**: Provide 3-5 concrete, implementable recommendations
 - **Concise documentation**: Report under 1000 words with progressive disclosure
 - **Consistency**: Maintain objective, research-focused methodology
 
 Execute all phases systematically and maintain an objective, research-focused approach to understanding the agentic-workflows custom agent's capabilities and limitations.
 
-**Important**: If no action is needed after completing your analysis, you **MUST** call the `noop` safe-output tool with a brief explanation. Failing to call any safe-output tool is the most common cause of safe-output workflow failures.
+**CRITICAL**: You MUST call a safe-output tool before finishing. Choose one:
+1. Call `create discussion` to publish findings (preferred — even partial results are valuable)
+2. Call `noop` if you were completely unable to gather any data
 
 ```json
 {"noop": {"message": "No action needed: [brief explanation of what was analyzed and why]"}}