Skip to content

fix(deploy): surface CloudFormation rollback events in the deploy TUI (#1610)#1662

Open
aidandaly24 wants to merge 3 commits into
aws:mainfrom
aidandaly24:fix/1610
Open

fix(deploy): surface CloudFormation rollback events in the deploy TUI (#1610)#1662
aidandaly24 wants to merge 3 commits into
aws:mainfrom
aidandaly24:fix/1610

Conversation

@aidandaly24

Copy link
Copy Markdown
Contributor

Description

#1610

During a agentcore deploy (bare TUI path, isInteractive:false) where CloudFormation enters ROLLBACK_IN_PROGRESS and then the terminal ROLLBACK_FAILED state, the TUI keeps showing the frozen "Deploying to AWS" banner with a static/bouncing progress bar for minutes, giving the impression that the deploy has hung. It is not a permanent hang: once CFN reaches the terminal ROLLBACK_FAILED state, toolkit-lib throws, the CLI catches it, marks the step errored, and exits. The user just gets no feedback that a rollback is happening, then the screen looks stuck until the (possibly long) rollback completes/fails.

Fix

see the_fix field

This fix was produced by an automated triage+fix run and validated locally (build + unit suite + symptom reproduction). Opened as a draft for CI and human review; will be bug-bashed before it's marked ready.

Related Issue

Closes #1610

Documentation PR

N/A — bug fix; no devguide change required.

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • Other (please describe):

Testing

How have you tested the change? (full validation in PR comments after bug bash)

  • I ran npm run test:unit and npm run test:integ
  • I ran npm run typecheck
  • I ran npm run lint
  • If I modified src/assets/, I ran npm run test:update-snapshots and committed the updated snapshots — N/A, no src/assets/ changes

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the
terms of your choice.

@github-actions github-actions Bot added the size/s PR size: S label Jun 29, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 29, 2026
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 29, 2026
@agentcore-devx-automation

Copy link
Copy Markdown
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 29, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Package Tarball

aws-agentcore-0.21.0.tgz

How to install

gh release download pr-1662-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.21.0.tgz

@agentcore-cli-automation agentcore-cli-automation left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, focused fix — adding the rollback statuses to the parser is the right call and the test is clean and uses a realistic message shape (no excessive mocking).

One concern about how *_ROLLBACK_COMPLETE will render: getStatusColor colors any status ending in _COMPLETE green. That means a CloudFormation::Stack ROLLBACK_COMPLETE (or UPDATE_ROLLBACK_COMPLETE) line will render in green, which signals success even though semantically the deploy has actually failed and CFN just finished tearing back the changes. In a frame full of green/cyan resource lines, a user could easily glance and think the deploy succeeded — especially since the terminal Deploy to AWS Failed banner only appears once toolkit-lib throws, which may be after ROLLBACK_COMPLETE has been sitting on screen for a while.

See inline comment for options.

| 'DELETE_FAILED'
| 'ROLLBACK_IN_PROGRESS'
| 'ROLLBACK_COMPLETE'
| 'ROLLBACK_FAILED'

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heads-up: getStatusColor uses suffix matching, so the new ROLLBACK_COMPLETE and UPDATE_ROLLBACK_COMPLETE statuses will render green — same as CREATE_COMPLETE. That's misleading because in CFN those statuses indicate a failed deploy whose rollback finished cleanly, not success.

A few options:

  1. Special-case rollback statuses in getStatusColor so that *ROLLBACK_COMPLETE is yellow/red and *ROLLBACK_IN_PROGRESS is yellow (signalling "recovering from failure").
  2. Check for ROLLBACK in the status name first and color all rollback states yellow/red regardless of suffix.
  3. Leave it as-is if you think the outer ✗ Deploy to AWS Failed banner provides enough context — but note that banner only appears once toolkit-lib throws, which can be well after ROLLBACK_COMPLETE lands on screen.

Option 1 or 2 would give users a clearer signal during the (potentially multi-minute) rollback window that #1610 is specifically about.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in 3fdae14. Went with option 2 (the cleanest): getStatusColor now checks for ROLLBACK in the status name first, so all rollback states are colored regardless of suffix:

if (status.includes('ROLLBACK')) return status.endsWith('_FAILED') ? 'red' : 'yellow';
  • ROLLBACK_FAILED / UPDATE_ROLLBACK_FAILED → red (worst case)
  • ROLLBACK_COMPLETE / UPDATE_ROLLBACK_COMPLETE / *ROLLBACK_IN_PROGRESS → yellow ("recovering from failure")

This means ROLLBACK_COMPLETE no longer renders green (the misleading-success case), and the user gets a yellow signal throughout the multi-minute rollback window — exactly the #1610 scenario — instead of waiting for the outer ✗ Deploy to AWS Failed banner.

Added tests under a new status color coding block asserting that CREATE_COMPLETE stays green while ROLLBACK_COMPLETE / UPDATE_ROLLBACK_COMPLETE are NOT green (yellow), in-progress rollbacks are yellow, and ROLLBACK_FAILED is red.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 29, 2026
ROLLBACK_COMPLETE / UPDATE_ROLLBACK_COMPLETE indicate a failed deploy
whose rollback finished cleanly, not a successful deploy. The previous
suffix match colored any *_COMPLETE green, misleading users during the
rollback window (aws#1610). Check for ROLLBACK first: failed rollbacks are
red, in-progress/completed rollbacks are yellow (recovering from
failure). Adds tests asserting ROLLBACK_COMPLETE does not render green.
@github-actions github-actions Bot added size/s PR size: S agentcore-harness-reviewing AgentCore Harness review in progress and removed size/s PR size: S labels Jun 29, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 29, 2026
@agentcore-devx-automation

Copy link
Copy Markdown
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 29, 2026

@agentcore-cli-automation agentcore-cli-automation left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to merge. Focused, well-scoped bug fix:

  • The new rollback statuses are added in the correct order in the regex alternation (UPDATE_ROLLBACK_* before UPDATE_*), so UPDATE_ROLLBACK_IN_PROGRESS won't be truncated to UPDATE_IN_PROGRESS.
  • The previously-flagged color-coding concern (rollback statuses incorrectly rendering green via the _COMPLETE suffix match) has been addressed in 3fdae14ROLLBACK_COMPLETE / UPDATE_ROLLBACK_COMPLETE now correctly render yellow, and *_ROLLBACK_FAILED renders red.
  • Tests use ink-testing-library against the real component and assert against actual rendered output (including ANSI codes via FORCE_COLOR). No excessive mocking — vi.hoisted only sets an env var, which is the right primitive for forcing chalk's color support.
  • Stack-level rollback events (AWS::CloudFormation::Stack) match the existing (AWS::\S+) resource regex, so they'll flow through the UI as expected.
  • No telemetry needed — this is a pure UI/display bug fix, not a new feature.

Nothing blocking from my side.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 29, 2026
@aidandaly24 aidandaly24 marked this pull request as ready for review June 29, 2026 15:47
@aidandaly24 aidandaly24 requested a review from a team June 29, 2026 15:47

@notgitika notgitika left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks good to me but I think we must align as a team on when we use yellow vs red on rollbacks.

*/
function getStatusColor(status: ResourceStatus): string | undefined {
if (status.includes('ROLLBACK')) return status.endsWith('_FAILED') ? 'red' : 'yellow';
if (status.endsWith('_COMPLETE')) return 'green';

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this mean that UPDATE_ROLLBACK_COMPLETE would also render green? doesn't that mean that the deploy failed?


const color = colorOf(lastFrame()!, 'UPDATE_ROLLBACK_COMPLETE');
expect(color).not.toBe(ANSI.green);
expect(color).toBe(ANSI.yellow);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't update rollback complete be red (since it indicates a failed deploy but a successful cleanup)?

expect(colorOf(lastFrame()!, 'ROLLBACK_IN_PROGRESS')).toBe(ANSI.yellow);
});

it('renders ROLLBACK_FAILED red (worst case)', () => {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think that any "rollback" which indicates that certain resources were not deployed should be red. because yellow seems more like a warning or just a statement. what do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/s PR size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

agentcore deploy gets stuck when stack goes into ROLLBACK_FAILED

3 participants