test(uipath-test): coverage for testcases steps + execution rerun/wait/report [TMHUB-32229]#1761

Open

ganeshborle wants to merge 1 commit into main from test/uipath-test-steps-execution-coverage

@ganeshborle ganeshborle commented Jun 30, 2026

Contributor

What

Adds 4 coder-eval task YAMLs under tests/tasks/uipath-test/ covering recently shipped uip tm commands that previously had zero test coverage.

Jira: TMHUB-32229 (epic CA-5 — Coding Agents for Test · Sprint S198)

| Task | Tier | New commands covered |
| --- | --- | --- |
| testcase_steps_authoring_smoke | smoke | uip tm testcases steps add / list / get |
| testcase_steps_refine_integration | integration | steps update / move / delete |
| execution_rerun_failed_integration | integration | executions get-stats / retry |
| release_signoff_wait_report_e2e | e2e | uip tm wait / report get |

The uip tm testcases steps subgroup shipped in cli/main via PR #2625 (so the @alpha CLI the smoke runner builds has it).

No overlap

Verified (script) against all existing uipath-test tasks: each new/primary command is asserted in exactly one task. Shared supporting commands (e.g. executions list, testsets run, testcases create) are left to their existing owners and intentionally not re-asserted here. The only command shared across tasks is uip login status — the universal Critical-Rule-#1 precondition present in every uipath-test task.
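A minimal sketch of what such an overlap check could look like, assuming the asserted verbs per task have already been extracted (here they are hard-coded from the task table in this description). The author's actual script is not shown in this PR, and the function name `overlapping_verbs` is hypothetical:

```python
from collections import defaultdict

# Primary verbs asserted per task, taken from the task table in this PR.
ASSERTED = {
    "testcase_steps_authoring_smoke": ["steps add", "steps list", "steps get"],
    "testcase_steps_refine_integration": ["steps update", "steps move", "steps delete"],
    "execution_rerun_failed_integration": ["executions get-stats", "executions retry"],
    "release_signoff_wait_report_e2e": ["wait", "report get"],
}


def overlapping_verbs(asserted):
    """Return {verb: [tasks]} for every verb asserted by more than one task."""
    owners = defaultdict(list)
    for task, verbs in asserted.items():
        for verb in verbs:
            owners[verb].append(task)
    return {verb: tasks for verb, tasks in owners.items() if len(tasks) > 1}


# An empty result means each primary verb has exactly one owning task.
print(overlapping_verbs(ASSERTED))
```

A real check would also need to read the existing uipath-test task files rather than a hard-coded mapping.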

Validation

All four run green locally via coder-eval against the alpha tenant:

  • testcase_steps_authoring_smoke: 4/4 (1.000)
  • testcase_steps_refine_integration: 4/4 (1.000)
  • execution_rerun_failed_integration: 3/3 (1.000)
  • release_signoff_wait_report_e2e: 3/3 (1.000)

Live-TM latency headroom: the slow multi-step tasks carry explicit run_limits (turn_timeout/task_timeout) so they don't false-fail on tenant 504/latency under the experiment defaults.

CI

smoke-skills.yml runs --tags smoke, so testcase_steps_authoring_smoke executes on this PR's smoke gate. The integration/e2e tasks run via the nightly/dashboard suites (not the PR smoke gate) — validated locally as above.

🤖 Generated with Claude Code

@github-actions

github-actions Bot commented Jun 30, 2026

Contributor

Claude finished @ganeshborle's task in 3m 7s —— View job


Coder-eval task lint (advisory)

4 task YAMLs changed; verdict counts: 0 Critical, 0 High, 0 Medium, 4 Low, 0 OK.

Rubric: .claude/commands/lint-task.md. This check is advisory and never blocks merge.

Evidence of passing run

✅ Author confirms passing run: "All four run green locally via coder-eval against the alpha tenant: testcase_steps_authoring_smoke — 4/4 (1.000) … release_signoff_wait_report_e2e — 3/3 (1.000)"

Per-task lint

tests/tasks/uipath-test/testcase_steps_authoring_smoke.yaml — verdict: Low (theme-captured; see Theme 1)

tests/tasks/uipath-test/testcase_steps_refine_integration.yaml — verdict: Low (theme-captured; see Theme 1)

tests/tasks/uipath-test/execution_rerun_failed_integration.yaml — verdict: Low (theme-captured; see Theme 1)

tests/tasks/uipath-test/release_signoff_wait_report_e2e.yaml — verdict: Low (theme-captured; see Theme 1)

Within-PR duplicates

No duplicate clusters detected. Each task covers a materially distinct verb family (steps add/list/get, steps update/move/delete, executions get-stats/retry, wait/report get).

Themes

  • Theme 1 (Medium): All 4 tasks rely exclusively on command_executed with no output validation. Every success criterion (aside from the shared uip login status check) is a command_executed pattern. The patterns are precise (lookaheads for required flags like --test-case-id, --step-id, --execution-id, --output json), and the commands target a live tenant, so they're not trivially gameable. However, none of the tasks include a json_check, run_command, or file_contains criterion that validates the content of the CLI output — e.g., confirming that steps list returns the expected step descriptions, that get-stats reports a non-zero count, or that report get output contains execution metadata. Suggested fix: for each task, consider adding one json_check or run_command criterion that parses the output of the primary verb (e.g., pipe steps list --output json through jq and assert the step count or a description substring). This is most impactful on the e2e task (release_signoff_wait_report_e2e), where the higher tier warrants stronger correctness signals.
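As a rough illustration of the suggested output-content check, the sketch below parses a captured `steps list --output json` payload and asserts on step descriptions. The JSON shape (an array of objects with a `description` field) and the helper names are assumptions made for illustration, not the CLI's documented schema:

```python
import json


def step_descriptions(raw_output: str) -> list[str]:
    """Extract step descriptions from captured JSON output.

    Assumes (hypothetically) a JSON array of step objects, each carrying a
    "description" field. Adjust to the real payload before using.
    """
    steps = json.loads(raw_output)
    return [str(step.get("description", "")) for step in steps]


def has_step(raw_output: str, expected_substring: str) -> bool:
    """True if any step description contains the expected substring."""
    return any(expected_substring in d for d in step_descriptions(raw_output))


# Stand-in for captured CLI output; the real payload may differ.
sample = json.dumps(
    [
        {"description": "Open the payments screen"},
        {"description": "Enter valid card details"},
    ]
)

print(has_step(sample, "payments screen"))
```

A check like this only adds signal if the grader can reach the agent's dynamically created test case, which is the practical obstacle for live, self-cleaning tasks.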

Conclusion

⚠ 4 task(s) have issues, max severity Medium (via Theme 1; per-task verdicts downgraded to Low). Advisory only — not blocking merge. The command patterns are well-crafted and the verb partitioning across tasks is clean — adding even one output-content assertion per task would close the gap.


@github-actions

github-actions Bot commented Jun 30, 2026

Contributor

PR Review: test(uipath-test) coverage for testcases steps + execution rerun/wait/report

  • Gather context (review instructions, project conventions, existing PR comments)
  • Read full diff and all changed files
  • Apply review criteria from .github/code_review.md
  • Check test task structure and conventions
  • Post review findings

View job run

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3ee20733c4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tests/tasks/uipath-test/testcase_steps_authoring_smoke.yaml Outdated
@github-actions

github-actions Bot commented Jun 30, 2026

Contributor

PR Review: test(uipath-test) coverage for testcases steps + execution rerun/wait/report

  • Gather context (review instructions, project conventions, existing PR comments)
  • Read full diff and all changed files
  • Apply review criteria from .github/code_review.md
  • Check test task structure and conventions
  • Post review findings

View job run

@ganeshborle

Contributor Author

Thanks — re: Theme 1 (output-content assertions). Acknowledged; keeping the command_executed-only criteria deliberately, for three reasons:

  1. House style + non-gameable here. All existing uipath-test tasks grade via command_executed, and these patterns use multi-flag lookaheads (--test-case-id, --step-id, --target-position \d, --execution-id, --output json) against a live tenant — the command genuinely mutates Test Manager state, so it can't be satisfied without really invoking the verb.
  2. Output-content checks are impractical for these tasks. The graded resources are created by the agent with dynamic, non-deterministic IDs and are torn down in post_run. A run_command/json_check would need the agent's dynamic test-case/execution IDs (not available to a post-hoc shell check), and the only way to feed them in is to have the agent write its results to a file — which trips the rubric's own self-report anti-pattern.
  3. Content is already gated in CI. The smoke job enforces the LLM-reviewer ≥ 0.7 threshold, which validates that the agent produced the right output/behavior qualitatively — covering the content dimension the deterministic criteria intentionally leave to the reviewer.

Net: the advisory is well-taken, but a deterministic output assertion would be either fragile or self-report-based here. Happy to revisit if there's a preferred pattern for live, self-cleaning TM tasks.

@ganeshborle

Contributor Author

@claude please review this PR — prior automated review run was cancelled by the workflow's concurrency group (not by findings). All other checks (smoke/T1, lint, schema) are green.

@github-actions

github-actions Bot commented Jun 30, 2026

Contributor

Claude finished @ganeshborle's task in 5m 10s —— View job


PR Review: test(uipath-test) coverage for testcases steps + execution rerun/wait/report

  • Gather context (review instructions, project conventions, existing PR comments)
  • Read full diff and all changed files
  • Apply review criteria from .github/code_review.md
  • Check test task structure and conventions
  • Post review findings

Summary

Adds 4 coder-eval task YAMLs covering the uip tm testcases steps family (add/list/get/update/move/delete), execution triage (get-stats/retry), and the synchronous wait-and-report loop (wait/report get). Verb partitioning across tasks is clean — each primary command is asserted in exactly one task, with explicit NOTE comments explaining intentional non-assertion of supporting commands. Prompts are goal-oriented and defer CLI teaching to the skill.

Change-by-Change Review

1. tests/tasks/uipath-test/testcase_steps_authoring_smoke.yaml

Severity: Medium (sandbox override — see below)

Well-structured smoke test. The regex patterns use multi-flag lookaheads (--test-case-id, --description, --step-id, --output json) that are precise and not trivially gameable against a live tenant. The post_run cleanup reads testcase-key.txt and deletes the seeded test case — solid self-cleaning pattern.

Issue — sandbox.driver: tempdir (line 23–24): This task explicitly sets driver: tempdir, overriding the experiment default. Every existing uipath-test task (14 of them) omits the sandbox block entirely and inherits from the experiment. Under smoke.yaml, the experiment default is driver: docker with skills-image:latest (which has @uipath/cli baked in). This override was already flagged by the Codex automated review. Since this is the only smoke-tagged task among the four, it's the one most likely to surface a problem in CI — the smoke workflow builds the Docker image with the CLI but does not install uip on the host.

2. tests/tasks/uipath-test/testcase_steps_refine_integration.yaml

Severity: Medium (same sandbox override)

Good integration coverage of the edit verbs (update, move, delete). The --target-position \d check for steps move and the (--yes|-y) alternation for steps delete are well-crafted. Same sandbox.driver: tempdir concern as above (line 27–28) — under nightly.yaml this would also bypass Docker.

3. tests/tasks/uipath-test/execution_rerun_failed_integration.yaml

Severity: Low

Clean integration test for the rerun surface (get-stats, retry). The get-stats criterion correctly requires --output json; the retry criterion omits it (acceptable — retry is a mutation, not a query whose output is parsed).

Minor — description overstates coverage (line 6): The description field lists executions run alongside get-stats and retry as covered verbs, but no criterion asserts executions run. The later text (line 12–13) clarifies the distinction between run and retry, but the parenthetical on line 6 reads as if all three are exercised. Consider dropping executions run from the parenthetical to match what's actually graded.

Same sandbox.driver: tempdir concern (line 25–26).

4. tests/tasks/uipath-test/release_signoff_wait_report_e2e.yaml

Severity: Low

The wait --timeout bounded-wait pattern is a good addition — the lookahead (?=.*--timeout\s+\d) ensures the agent doesn't issue an unbounded wait that hangs the turn. The report get criterion correctly requires --output json.
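A small sketch of how the quoted lookahead behaves. Only `(?=.*--timeout\s+\d)` comes from the review; the `uip\s+tm\s+wait\b` tail, the helper name, and the sample commands are assumptions for illustration, and the real criterion may differ:

```python
import re

# Lookahead quoted in the review, followed by an assumed command tail.
BOUNDED_WAIT = re.compile(r"(?=.*--timeout\s+\d)uip\s+tm\s+wait\b")


def is_bounded_wait(command: str) -> bool:
    """True if a `uip tm wait` call carries a numeric --timeout value."""
    return BOUNDED_WAIT.search(command) is not None


# Hypothetical invocations; flag values are placeholders.
print(is_bounded_wait("uip tm wait --timeout 600 --output json"))  # bounded
print(is_bounded_wait("uip tm wait --output json"))                # unbounded
```

Because `.` does not match a newline by default, this lookahead only sees flags on the same line as the `wait` call.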

Same sandbox.driver: tempdir concern (line 26–27).

What's Missing

  • No sandbox.driver removal: All 4 tasks should drop the explicit sandbox: driver: tempdir block and inherit from the experiment, matching every other uipath-test task. If tempdir is genuinely needed (e.g., post_run cleanup can't run inside Docker), that rationale should be documented in a comment.

Area Ratings

| Area | Status | Notes |
| --- | --- | --- |
| Frontmatter | N/A | Test tasks, not skill files |
| E2E Tests | OK | Smoke + e2e + 2 integration (full tier coverage) |
| Task Structure | Issue | sandbox.driver: tempdir overrides experiment default in all 4 tasks |
| Repo Hygiene | OK | CODEOWNERS present, no secrets, no cross-skill deps |

Issues for Manual Review

  1. sandbox.driver: tempdir CI behavior — The author reports CI passed ("smoke/T1 green"), but the override contradicts the experiment's driver: docker. Verify whether coder-eval's task-level sandbox merge actually replaces the experiment driver, or if the experiment driver takes precedence. If the task override does win, clarify why tempdir is needed (e.g., post_run shell commands that require host-level access) and whether uip is available on the CI host outside Docker.
  2. executions run in description: execution_rerun_failed_integration.yaml line 6 lists executions run as a covered verb but no criterion asserts it. Cosmetic, but could mislead coverage audits.

Conclusion

The tasks are well-crafted — precise regex patterns, clean verb partitioning, goal-oriented prompts, and proper self-cleaning post_run hooks. The one substantive concern is the sandbox.driver: tempdir override present in all 4 tasks, which breaks from the established convention of inheriting the driver from the experiment config. This was already flagged by the Codex review (comment). Recommend either removing the sandbox: block from all 4 tasks (to match the other 14 uipath-test tasks) or adding a comment explaining why tempdir is required.

@github-actions

github-actions Bot commented Jun 30, 2026

Contributor

Claude finished @ganeshborle's task in 3m 51s —— View job


PR Review: test(uipath-test) coverage for testcases steps + execution rerun/wait/report

  • Gather context (review instructions, project conventions, existing PR comments)
  • Read full diff and all changed files
  • Apply review criteria from .github/code_review.md
  • Check test task structure and conventions
  • Post review findings

Summary

Adds 4 coder-eval task YAMLs covering uip tm testcases steps (add/list/get/update/move/delete), execution triage (get-stats/retry), and the synchronous wait-and-report loop (wait/report get). Verb partitioning is clean — each primary command is asserted in exactly one task, with explicit NOTE comments documenting intentional non-assertion of supporting commands.

Change-by-Change Review

1. tests/tasks/uipath-test/testcase_steps_authoring_smoke.yaml

Severity: OK

Well-structured smoke test. The steps add multi-flag lookahead (--test-case-id, --description) is precise. The steps list pattern includes an alternation for list-steps (:72), which is a nice future-proof touch. post_run cleanup reads testcase-key.txt and deletes the seeded test case — solid self-cleaning pattern. Prompt is goal-oriented and defers CLI teaching to the skill.

2. tests/tasks/uipath-test/testcase_steps_refine_integration.yaml

Severity: OK

Good coverage of edit verbs. The --target-position \d check for steps move (:83) and the (--yes|-y) alternation for steps delete (:90) are precise. The prompt clearly distinguishes the three review actions (fix expected result, reorder, remove) without hand-holding CLI flags. Same solid post_run cleanup.

3. tests/tasks/uipath-test/execution_rerun_failed_integration.yaml

Severity: Low

Clean integration test for the rerun surface. The get-stats criterion correctly requires --output json; retry omits it (acceptable — mutation, not a query whose output is parsed).

Minor: description overstates coverage (line 6). The parenthetical lists executions run alongside get-stats and retry as covered verbs, but no criterion asserts executions run. The later description body (lines 12–13) explains the distinction between run and retry, which is valuable context, but the parenthetical reads as if all three are exercised. Consider dropping executions run from the parenthetical to match what's actually graded.

4. tests/tasks/uipath-test/release_signoff_wait_report_e2e.yaml

Severity: OK

The bounded wait --timeout pattern (:57, (?=.*--timeout\s+\d)) is a good addition, ensuring the agent doesn't issue an unbounded wait that hangs the turn. The report get criterion correctly requires --output json. Generous run_limits (turn_timeout: 1500, task_timeout: 1800) appropriately account for TM latency on a live tenant.

What's Missing

Nothing identified. Tier coverage is complete (1 smoke + 2 integration + 1 e2e). CODEOWNERS has entries for both /skills/uipath-test/ and /tests/tasks/uipath-test/. No secrets or personal paths. No cross-skill dependencies.

Area Ratings

| Area | Status | Notes |
| --- | --- | --- |
| Frontmatter | N/A | Test tasks, not skill files |
| E2E Tests | OK | Full tier coverage (smoke + 2 integration + e2e), clean verb partitioning, all 4 validated green |
| Task Structure | OK | Tags, task_id, weights, prompts, post_run all follow conventions |
| Repo Hygiene | OK | CODEOWNERS present, no secrets, changes scoped to test tasks only |

Issues for Manual Review

  1. sandbox.driver: tempdir — convention break, not a functional issue. All 4 new tasks explicitly set driver: tempdir, while all 14 existing uipath-test tasks inherit from the experiment (which defaults to driver: docker). The prior Codex review (comment) claimed this would cause uip: command not found, but this is incorrect — the GH smoke runner installs @uipath/cli globally on the host before any task runs (tests/README.md:25), and driver: tempdir is widely used across the repo (uipath-troubleshoot, uipath-ixp, uipath-review, uipath-agents, uipath-mcp-servers, uipath-maestro-bpmn — including smoke-tagged tasks like uipath-maestro-bpmn/smoke/registry_discovery.yaml). The author confirms CI passed. Still, it's a divergence from the uipath-test-specific convention — a one-line comment explaining why tempdir is preferred here (e.g., post_run host access, no benefit from Docker isolation for live-tenant tasks) would prevent the same question on future PRs.
  2. executions run in description (Low). See Change 3 above — cosmetic, but may mislead coverage audits that scrape description fields.

Conclusion

This PR is well-crafted. Precise regex patterns, clean verb partitioning with explicit ownership comments, goal-oriented prompts, appropriate run_limits for live-tenant latency, and proper self-cleaning post_run hooks. The only actionable finding is the minor executions run mention in the description parenthetical (Low). The sandbox.driver: tempdir override works correctly and is widely used across the repo, though a rationale comment would be helpful for consistency within the uipath-test task set.

Recommendation: approve. The Low finding is non-blocking.

@ganeshborle

Contributor Author

Addressed the review findings in eb5101f3:

  • Dropped executions run from the execution_rerun_failed_integration description parenthetical (no criterion asserts it; the run-vs-retry distinction stays in the body).
  • Documented the driver: tempdir choice with a one-line rationale comment on all four tasks (post_run host cleanup for the seeding tasks; live-tenant + standalone-plan validity for the others; host uip+env-auth confirmed by the green smoke run).

Both were Low/cosmetic; logic and criteria are unchanged. Smoke (T1) + lint re-running on the new commit.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eb5101f3d8

Comment thread tests/tasks/uipath-test/testcase_steps_authoring_smoke.yaml Outdated
Comment thread tests/tasks/uipath-test/testcase_steps_refine_integration.yaml Outdated
@ganeshborle ganeshborle force-pushed the test/uipath-test-steps-execution-coverage branch from eb5101f to ac03f8d Compare June 30, 2026 10:45

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ac03f8d944

Comment thread tests/tasks/uipath-test/release_signoff_wait_report_e2e.yaml Outdated
Comment thread tests/tasks/uipath-test/testcase_steps_authoring_smoke.yaml Outdated
@ganeshborle ganeshborle self-assigned this Jun 30, 2026
@ganeshborle ganeshborle added the uipath-test UiPath skill area: uipath-test label Jun 30, 2026
@ganeshborle ganeshborle force-pushed the test/uipath-test-steps-execution-coverage branch from ac03f8d to 990a7d0 Compare June 30, 2026 14:02

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 990a7d01de

Comment thread tests/tasks/uipath-test/testcase_steps_authoring_smoke.yaml Outdated
Comment thread tests/tasks/uipath-test/testcase_steps_refine_integration.yaml Outdated
Comment thread tests/tasks/uipath-test/release_signoff_wait_report_e2e.yaml Outdated
Comment thread tests/tasks/uipath-test/execution_rerun_failed_integration.yaml Outdated
@ganeshborle ganeshborle force-pushed the test/uipath-test-steps-execution-coverage branch from 990a7d0 to 1117c43 Compare June 30, 2026 15:59

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1117c4386f

Comment thread tests/tasks/uipath-test/testcase_steps_refine_integration.yaml Outdated
Comment thread tests/tasks/uipath-test/execution_rerun_failed_integration.yaml Outdated
Comment thread tests/tasks/uipath-test/release_signoff_wait_report_e2e.yaml Outdated
@ganeshborle ganeshborle force-pushed the test/uipath-test-steps-execution-coverage branch from 1117c43 to 6228db9 Compare June 30, 2026 16:41
@ganeshborle

Contributor Author

Pushed 6228db9d — every uip command criterion across all four tasks now enforces --output json (Critical Rule #3) and project-scoping (--project-key / --test-set-key) per SKILL.md signatures. Re-validated green via coder-eval: T1 5/5, T2 6/6, T3 3/3 (T4's wait+json confirmed against the captured command). This should close the contract-consistency thread.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6228db9d3d

Comment thread tests/tasks/uipath-test/manual_test_session_integration.yaml Outdated
Comment thread tests/tasks/uipath-test/manual_test_session_integration.yaml Outdated
Comment thread tests/tasks/uipath-test/manual_test_session_integration.yaml Outdated
@ganeshborle ganeshborle force-pushed the test/uipath-test-steps-execution-coverage branch from 6228db9 to 02e2691 Compare June 30, 2026 17:08
@ganeshborle

Contributor Author

Scope correction (02e2691b): two files had been inadvertently swept into the branch by a glob git add: manual_test_session_integration.yaml (untracked) and integration_developer_workflow_impact.yaml (pre-existing local modification). Both are separate WIP, not part of this PR, and have been removed; the branch is back to exactly the 4 new task files. The three latest review comments target manual_test_session_integration.yaml, which is no longer in this PR; those will be handled wherever that file actually lands.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 02e2691b20

Comment thread tests/tasks/uipath-test/testcase_steps_authoring_smoke.yaml Outdated
Comment thread tests/tasks/uipath-test/testcase_steps_refine_integration.yaml Outdated
@ganeshborle ganeshborle force-pushed the test/uipath-test-steps-execution-coverage branch from 02e2691 to 2a66f9e Compare June 30, 2026 17:36

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2a66f9efc2

Comment thread tests/tasks/uipath-test/testcase_steps_refine_integration.yaml Outdated
Comment thread tests/tasks/uipath-test/testcase_steps_refine_integration.yaml Outdated
Comment thread tests/tasks/uipath-test/release_signoff_wait_report_e2e.yaml
Comment thread tests/tasks/uipath-test/testcase_steps_authoring_smoke.yaml Outdated
…t/report [TMHUB-32229]

Add 3 coder-eval tasks covering uip tm commands with no prior coverage:
- testcase_steps_lifecycle_integration (integration): full testcases steps
  lifecycle in one session — flag-mode add (one at a time) + JSON-mode add
  (one shot) + list + get + update + move + delete + case cleanup
- execution_rerun_failed_integration (integration): executions get-stats / retry
- release_signoff_wait_report_e2e (e2e): testsets run -> tm wait -> report get

Every uip-command criterion enforces --output json (Critical Rule #3) and
project-scoping per SKILL.md signatures; graded cleanup deletes seeded cases so
no tenant junk is left. Prompts read like a real user, no skill-load hint.
Validated green via coder-eval (steps 10/10, rerun 3/3, sign-off 3/3).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@ganeshborle ganeshborle force-pushed the test/uipath-test-steps-execution-coverage branch from 2a66f9e to 85f5d0a Compare July 4, 2026 18:28

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 85f5d0a942

- type: command_executed
  description: "Agent added a step individually via flag mode (`steps add ... --description`)"
  tool_name: "Bash"
  command_pattern: '(?=[\s\S]*--project-key\s+\S)(?=[\s\S]*--test-case-id\s+\S)(?=[\s\S]*--description\s+\S)(?=[\s\S]*--output\s+json)uip\s+tm\s+testcases?\s+steps\s+add\b'

P2 Badge Keep flag checks on the target command

When the agent batches multiple uip calls in one Bash block, the [\s\S]* lookaheads in this pattern can be satisfied by flags on later commands before the regex matches a bare steps add; for example, an add missing --project-key/--output json followed by a scoped steps list still passes. The same structure is used throughout the new command criteria, so these evals can accept the unscoped or non-JSON calls they are intended to catch; constrain the flag checks to the same shell command/line or use order-insensitive alternatives that stop at command separators.
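The behavior described here can be reproduced with the quoted pattern. The sketch below assumes the grader applies `command_pattern` as a regex search over the whole Bash block (an assumption about coder-eval, not confirmed in this thread). The sample commands and the values `TC-1` and `DEMO` are hypothetical, and `SCOPED` is one possible tightening, not the repository's fix:

```python
import re

# Pattern copied from the criterion under review.
ORIGINAL = re.compile(
    r"(?=[\s\S]*--project-key\s+\S)(?=[\s\S]*--test-case-id\s+\S)"
    r"(?=[\s\S]*--description\s+\S)(?=[\s\S]*--output\s+json)"
    r"uip\s+tm\s+testcases?\s+steps\s+add\b"
)

# Possible alternative: lookaheads stop at newlines and shell separators,
# so every required flag must sit on the same command as `steps add`.
SCOPED = re.compile(
    r"(?=[^\n;&|]*--project-key\s+\S)(?=[^\n;&|]*--test-case-id\s+\S)"
    r"(?=[^\n;&|]*--description\s+\S)(?=[^\n;&|]*--output\s+json)"
    r"uip\s+tm\s+testcases?\s+steps\s+add\b"
)

# An unscoped, non-JSON add followed by a fully scoped list in one block.
BATCHED = (
    'uip tm testcases steps add --test-case-id TC-1 --description "Open the payments screen"\n'
    "uip tm testcases steps list --project-key DEMO --test-case-id TC-1 --output json"
)

# A correctly scoped add on a single line.
GOOD = (
    "uip tm testcases steps add --project-key DEMO --test-case-id TC-1 "
    '--description "Open the payments screen" --output json'
)

print(bool(ORIGINAL.search(BATCHED)))  # flags borrowed from the later command
print(bool(SCOPED.search(BATCHED)))    # flags must be on the add itself
print(bool(SCOPED.search(GOOD)))
```

The scoped variant has its own limits: a quoted description containing `;`, `&`, or `|` before a required flag, or a command split with backslash line continuations, would be rejected even though the call is valid.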

Comment on lines +57 to +58
- Move "Enter valid card details" so it runs right after "Open the payments
screen".

P2 Badge Make the move step change the order

At this point the initial adds already leave the steps ordered as Open the payments screen followed by Enter valid card details, so asking to move that step right after Open is a no-op. A passing run can issue a move that does not change the list and still get full credit, which means this lifecycle task no longer validates real reordering or catches target-position mistakes; ask for a step to move to a different position or assert the final order.
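To make the no-op concrete, here is a pure-Python model of a reorder. It does not call the CLI; it assumes 1-based target positions, and the third step name is a hypothetical addition for illustration:

```python
def move_step(steps: list[str], step: str, target_position: int) -> list[str]:
    """Return a new list with `step` moved to a 1-based `target_position`."""
    remaining = [s for s in steps if s != step]
    remaining.insert(target_position - 1, step)
    return remaining


# Order left by the initial adds; "Confirm the payment" is hypothetical.
initial = [
    "Open the payments screen",
    "Enter valid card details",
    "Confirm the payment",
]

# Moving the card-details step "right after Open" targets position 2,
# which is where it already sits.
after_prompted_move = move_step(initial, "Enter valid card details", 2)
print(after_prompted_move == initial)

# A move that actually changes the order, and a final-order assertion.
after_real_move = move_step(initial, "Enter valid card details", 3)
print(after_real_move)
```

Asserting on the final order, as in the second case, is what would let the task catch a wrong target position.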
