feat: add workflow hardening investigation workflow #712
Open: khaliqgant wants to merge 1 commit into main from feat/workflow-hardening
+184
−0
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,46 @@ | ||
| # PLAN — workflow hardening and diagnosis | ||
| | ||
| ## Goal | ||
| Create a workflow that identifies, reproduces, and helps iron out workflow execution problems discovered during real runs. | ||
| | ||
| ## Problems to target | ||
| 1. Agent planning fragility | ||
| - Claude plan steps can fail, idle, or return low-quality output. | ||
| - Workflows should support deterministic plan docs or strict validation gates. | ||
| | ||
| 2. Active checkout vs hard-coded path issues | ||
| - Agents/workflow steps must operate against the current checkout/worktree, not fixed absolute repo paths. | ||
| | ||
| 3. Missing workflow assets | ||
| - Plan docs and helper files must be present and validated early. | ||
| | ||
| 4. Opaque validation/build phases | ||
| - Large monolithic rebuild steps hide the real failing sub-step. | ||
| - Steps should be split for observability. | ||
| | ||
| 5. Environment drift / local state problems | ||
| - stale `.agent-relay/` | ||
| - PATH shadowing | ||
| - tracked `.trajectories` causing false dirty states | ||
| - SSH/fetch issues that affect reruns | ||
| | ||
| 6. Build-tooling assumptions | ||
| - package builds that rely on ambient tool resolution instead of deterministic invocation | ||
| | ||
| ## Desired outcome | ||
| A workflow that: | ||
| - uses Claude for plan/research | ||
| - uses Codex for implementation | ||
| - records environment diagnostics up front | ||
| - validates required workflow assets before agent work begins | ||
| - verifies the active checkout/worktree path before implementation | ||
| - splits build/validation into explicit steps | ||
| - produces review output with actionable distinctions: | ||
| - workflow flaw | ||
| - repo/tooling flaw | ||
| - environment-specific issue | ||
| | ||
| ## Acceptance criteria | ||
| - Workflow file added to repo | ||
| - Supporting deterministic plan/research doc added | ||
| - New PR opened |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,138 @@ | ||
| import { workflow } from '@agent-relay/sdk/workflows'; | ||
| import { ClaudeModels } from '@agent-relay/sdk'; | ||
| | ||
| await workflow('workflow-hardening-investigation') | ||
| .description('Diagnose and harden workflow execution issues across planning, checkout scoping, environment drift, and validation/build observability.') | ||
| .pattern('dag') | ||
| .channel('wf-workflow-hardening') | ||
| .maxConcurrency(3) | ||
| .timeout(3600000) | ||
| | ||
| .agent('planner', { | ||
| cli: 'claude', | ||
| preset: 'lead', | ||
| role: 'Workflow planning and failure-analysis researcher', | ||
| model: ClaudeModels.SONNET, | ||
| retries: 2, | ||
| }) | ||
| .agent('implementer', { | ||
| cli: 'codex', | ||
| preset: 'worker', | ||
| role: 'Workflow hardening implementer', | ||
| retries: 2, | ||
| }) | ||
| .agent('reviewer', { | ||
| cli: 'codex', | ||
| preset: 'reviewer', | ||
| role: 'Workflow hardening reviewer', | ||
| retries: 1, | ||
| }) | ||
| | ||
| .step('capture-env', { | ||
| type: 'deterministic', | ||
| command: ` | ||
| set -e | ||
| echo 'PWD='$PWD | ||
| echo 'PATH='$PATH | ||
| echo 'agent-relay versions:' | ||
| which -a agent-relay || true | ||
| agent-relay --version || true | ||
| echo 'git branch:' | ||
| git rev-parse --abbrev-ref HEAD | ||
| echo 'dirty:' | ||
| git status --short || true | ||
| echo 'has .agent-relay?' | ||
| [ -d .agent-relay ] && echo yes || echo no | ||
| echo 'has .trajectories?' | ||
| [ -d .trajectories ] && echo yes || echo no | ||
| `, | ||
| captureOutput: true, | ||
| failOnError: true, | ||
| }) | ||
| | ||
| .step('read-plan-doc', { | ||
| type: 'deterministic', | ||
| command: 'cat workflows/PLAN-workflow-hardening.md', | ||
| captureOutput: true, | ||
| failOnError: true, | ||
| }) | ||
| | ||
| .step('plan', { | ||
| agent: 'planner', | ||
| dependsOn: ['capture-env', 'read-plan-doc'], | ||
| task: `Create a concise workflow-hardening plan for this repo. | ||
| | ||
| Plan doc: | ||
| {{steps.read-plan-doc.output}} | ||
| | ||
| Current environment: | ||
| {{steps.capture-env.output}} | ||
| | ||
| Return sections: | ||
| 1. WORKFLOW_FLAWS | ||
| 2. ENVIRONMENT_SPECIFIC_ISSUES | ||
| 3. REPO_TOOLING_ISSUES | ||
| 4. IMPLEMENTATION_PLAN | ||
| 5. VALIDATION_PLAN | ||
| | ||
| End with PLAN_COMPLETE.`, | ||
| verification: { type: 'output_contains', value: 'PLAN_COMPLETE' }, | ||
| retries: 2, | ||
| }) | ||
| | ||
| .step('implement', { | ||
| agent: 'implementer', | ||
| dependsOn: ['plan'], | ||
| task: `Implement the workflow hardening plan in the current checkout/worktree. | ||
| | ||
| Plan: | ||
| {{steps.plan.output}} | ||
| | ||
| Requirements: | ||
| - keep edits focused on workflow reliability, diagnostics, and validation clarity | ||
| - prefer current-checkout semantics over hard-coded paths | ||
| - add/adjust files needed to make workflow runs easier to debug and more deterministic | ||
| - write code/files to disk | ||
| - end by printing CHANGES_COMPLETE`, | ||
| verification: { type: 'exit_code' }, | ||
| retries: 2, | ||
| }) | ||
| | ||
| .step('verify-diff', { | ||
| type: 'deterministic', | ||
| dependsOn: ['implement'], | ||
| command: ` | ||
| set -e | ||
| if git diff --quiet; then | ||
| echo NO_CHANGES_DETECTED | ||
| exit 1 | ||
| fi | ||
| git diff --stat | ||
| `, | ||
| captureOutput: true, | ||
| failOnError: true, | ||
| }) | ||
| | ||
| .step('review', { | ||
| agent: 'reviewer', | ||
| dependsOn: ['plan', 'verify-diff'], | ||
| task: `Review the workflow hardening changes. | ||
| | ||
| Plan: | ||
| {{steps.plan.output}} | ||
| | ||
| Diff summary: | ||
| {{steps.verify-diff.output}} | ||
| | ||
| Return: | ||
| - PASS_FAIL | ||
| - what workflow flaws were addressed | ||
| - what environment-specific issues remain out of scope | ||
| - what repo/tooling follow-ups still remain | ||
| | ||
| End with REVIEW_COMPLETE.`, | ||
| verification: { type: 'output_contains', value: 'REVIEW_COMPLETE' }, | ||
| retries: 1, | ||
| }) | ||
| | ||
| .run({ cwd: process.cwd() }); | ||
🟡
`verify-diff` uses `git diff --quiet`, which cannot detect new (untracked) files created by the implementer agent
The `verify-diff` step uses `git diff --quiet` to confirm the implementer made changes, but this command only compares tracked files in the working tree against the index. It exits 0 (no changes) when only new untracked files exist, as confirmed by a live test. The implementer's task explicitly says "add/adjust files needed", making new-file-only output a likely scenario. When that happens, the step prints `NO_CHANGES_DETECTED` and exits 1, failing the workflow even though the agent did produce output. The same limitation applies to `git diff --stat` on line 111, meaning the review step would also receive an incomplete or empty diff summary. A more robust check would use `git status --porcelain` (which detects untracked files and staged changes) and `git status --short` for the summary passed to the reviewer. The existing `add-swift-sdk.ts` workflow at `workflows/ci/add-swift-sdk.ts:513` already uses the more thorough `git diff --quiet HEAD` variant, though even that doesn't cover untracked files.
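A minimal sketch of the suggested check, assuming the step has the raw text of `git status --porcelain` (v1 format, two status characters, a space, then the path) available as a string. The names `ChangeSummary`, `summarizePorcelain`, and `hasChanges` are hypothetical and not part of this PR or of the `@agent-relay/sdk` API; the point is only that untracked (`??`) entries count as changes alongside staged and unstaged ones.

```typescript
// Hypothetical helper (not in the PR): classify `git status --porcelain` v1 output.
// Each entry looks like "XY <path>", where X is the index status and Y the worktree status.
interface ChangeSummary {
  staged: string[];
  unstaged: string[];
  untracked: string[];
}

function summarizePorcelain(output: string): ChangeSummary {
  const summary: ChangeSummary = { staged: [], unstaged: [], untracked: [] };
  // Do not trim the whole output: a leading space is a meaningful status column.
  for (const line of output.split('\n')) {
    if (line.length < 4) continue; // blank or malformed line
    const indexStatus = line.charAt(0);
    const worktreeStatus = line.charAt(1);
    const path = line.slice(3); // renames stay verbatim as "old -> new"
    if (indexStatus === '?' && worktreeStatus === '?') {
      summary.untracked.push(path); // new files that `git diff --quiet` would miss
      continue;
    }
    if (indexStatus === '!') continue; // ignored entries (only present with --ignored)
    if (indexStatus !== ' ') summary.staged.push(path);
    if (worktreeStatus !== ' ') summary.unstaged.push(path);
  }
  return summary;
}

function hasChanges(summary: ChangeSummary): boolean {
  return summary.staged.length + summary.unstaged.length + summary.untracked.length > 0;
}

// Example: a run where the implementer only added a new file.
const example = summarizePorcelain('?? workflows/new-file.ts\n');
console.log(hasChanges(example)); // true
```

In a shell-based step the same idea reduces to failing when `git status --porcelain` prints nothing. The classification above is only needed if the reviewer should see staged, unstaged, and untracked paths separately.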