fix: reduce agentic-workflows test scope and strengthen safe-output instructions in Agent Persona Explorer #26152
| Change | Diff line change |
|---|---|
| | @@ -73,7 +73,7 @@ Store all scenarios in cache memory. |
| | ## Phase 3: Test Agent Responses (15 minutes) |
| - | **Token Budget Optimization**: Test a **representative subset of 6-8 scenarios** (not all scenarios) to reduce token consumption while maintaining quality insights. |
| + | **Token Budget Optimization**: Test a **representative subset of 3-4 scenarios** (not all scenarios) to reduce token consumption and ensure budget remains for Phase 5 publishing. |
| | For each selected scenario, invoke the "agentic-workflows" custom agent tool and: |
| | @@ -99,6 +99,7 @@ For each selected scenario, invoke the "agentic-workflows" custom agent tool and |
| | - You are ONLY testing the agent's responses, NOT creating actual workflows |
| | - **Keep responses focused and concise** - summarize findings instead of verbose descriptions |
| | - Aim for quality over quantity - fewer well-analyzed scenarios are better than many shallow ones |
| + | - **If any tool call fails, record the error briefly and move on to the next scenario** - do NOT retry or get stuck |
| | ## Phase 4: Analyze Results (4 minutes) |
| | @@ -124,7 +125,9 @@ Review all captured responses and identify: |
| | ## Phase 5: Document and Publish Findings (1 minute) |
| - | Create a GitHub discussion with a **concise** summary report. Use the `create discussion` safe-output to publish your findings. |
| + | **MANDATORY OUTPUT**: Regardless of how many phases completed successfully, you MUST call either the `create discussion` or the `noop` safe-output tool before finishing. Failing to call a safe-output tool is the most common cause of workflow failures. |
| + | Create a GitHub discussion with a **concise** summary report. Use the `create discussion` safe-output to publish your findings. Even if only 1-2 scenarios were tested, create the discussion with partial results. |

Comment on lines +128 to +130

| Change | Suggested change |
|---|---|
| - | **MANDATORY OUTPUT**: Regardless of how many phases completed successfully, you MUST call either the `create discussion` or the `noop` safe-output tool before finishing. Failing to call a safe-output tool is the most common cause of workflow failures. |
| - | Create a GitHub discussion with a **concise** summary report. Use the `create discussion` safe-output to publish your findings. Even if only 1-2 scenarios were tested, create the discussion with partial results. |
| + | **MANDATORY OUTPUT**: Regardless of how many phases completed successfully, you MUST call either the `create-discussion` or the `noop` safe-output tool before finishing. Failing to call a safe-output tool is the most common cause of workflow failures. |
| + | Create a GitHub discussion with a **concise** summary report. Use the `create-discussion` safe-output to publish your findings. Even if only 1-2 scenarios were tested, create the discussion with partial results. |

There are still references to the old scenario-count guidance ("6-8") elsewhere in this document (e.g., the front-matter guardrail comment and the report template line "Scenarios Tested: [count - should be 6-8]"). Since Phase 3 and Success Criteria are now 3-4 scenarios, these stale references can confuse the agent and undermine the token-budget goal; update the remaining "6-8" text to match the new 3-4 scope.
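
One way to catch the leftover references mentioned above is a quick text scan before merging. The following is only a rough sketch: the helper name `find_stale_ranges`, the regular expression, and the two-line sample are illustrative assumptions and are not part of this PR or the repository's tooling.

```python
import re

# Hypothetical pattern: matches the old "6-8" scenario range as a standalone token.
STALE_RANGE = re.compile(r"\b6\s*-\s*8\b")


def find_stale_ranges(text: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) pairs that still mention the old 6-8 range."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if STALE_RANGE.search(line)
    ]


# Illustrative sample only: the first line uses the new scope, the second is the
# report template line quoted in the review comment.
sample = """\
**Token Budget Optimization**: Test a **representative subset of 3-4 scenarios**
Scenarios Tested: [count - should be 6-8]
"""

for number, line in find_stale_ranges(sample):
    print(f"line {number}: {line}")
```

Running the same function over the full workflow document would list each line that still needs updating to the 3-4 scope.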