ci: standardize tag management strategy #308

Open
Evrard-Nil wants to merge 4 commits into main from feat/standardize-tag-management

@Evrard-Nil

Contributor

Summary

Aligns chat-api with the tagging strategy already in use by cloud-api and compose-manager.

Before: main pushed :latest; no promotion pipeline; no rollback mechanism; manual semver tags triggered versioned Docker builds.

After:

  • build.yml: main pushes :staging + immutable staging-YYYYMMDD-<sha> stamp tag (exact copy via skopeo from the OCI archive, same as compose-manager).
  • promote.yml (new): manually promotes :staging → :prod + prod-YYYYMMDD-<sha> + :latest. Verifies cosign signature before touching :prod. Serialized with prod-image-mutation concurrency group. Creates a GitHub release for audit trail.
  • rollback.yml (new): rolls :prod back to any prod-YYYYMMDD-<sha> tag (auto-resolves most recent different digest if no target given). Verifies cosign. Creates prod-rollback-YYYYMMDD-HHMMSS-from-<sha> audit tag + GitHub release.
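
For illustration, the stamp-tag naming described above could be composed roughly as follows. This is a minimal sketch, not the workflow's actual step: the helper names `stamp_tag` and `is_stamp_tag` are hypothetical, and the UTC date and 7-character short sha are assumptions.

```shell
# Sketch only: stamp_tag and is_stamp_tag are hypothetical helper names.

# Compose "<channel>-YYYYMMDD-<short sha>".
#   $1 = channel (staging or prod)
#   $2 = full or short commit sha
#   $3 = optional date as YYYYMMDD (assumption: defaults to today's UTC date)
stamp_tag() {
  local channel="$1" sha="$2" day="${3:-$(date -u +%Y%m%d)}"
  # %.7s truncates the sha to its first 7 characters.
  printf '%s-%s-%.7s\n' "$channel" "$day" "$sha"
}

# Succeeds only for tags of the form staging-YYYYMMDD-<7 hex> or prod-YYYYMMDD-<7 hex>.
is_stamp_tag() {
  printf '%s\n' "$1" | grep -Eq '^(staging|prod)-[0-9]{8}-[0-9a-f]{7}$'
}
```

In a workflow step this might be called as `stamp_tag staging "$GITHUB_SHA"`; passing the date explicitly keeps the result reproducible when a job runs across midnight.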

:latest is kept as a compatibility alias on promotion — the existing ansible-playbooks deploy_chat_api_prod.yml playbook that uses :latest continues to work unchanged.

Test plan

  • Merge to main → confirm build pushes nearaidev/private-chat:staging and staging-YYYYMMDD-<sha> (check build summary)
  • Run Promote to Production with confirm=promote → confirm :prod, prod-YYYYMMDD-<sha>, :latest all resolve to same digest; GitHub release created
  • Run Rollback Production Image with confirm=ROLLBACK (no target) → confirm auto-resolves previous tag, :prod updated, audit tag and release created

🤖 Generated with Claude Code

Align chat-api with the cloud-api / compose-manager tagging scheme:

- build.yml: main branch now pushes :staging (was :latest) + immutable
  staging-YYYYMMDD-<sha> stamp tag for traceability and rollback history
- promote.yml: new workflow — manually promotes :staging → :prod +
  prod-YYYYMMDD-<sha> + :latest; verifies cosign signature; creates
  GitHub release; serialized via prod-image-mutation concurrency group
- rollback.yml: new workflow — rolls :prod back to any prod-* stamped
  tag (auto-resolves most recent if not specified); creates audit tag
  prod-rollback-YYYYMMDD-HHMMSS-from-<sha>; verifies cosign; same
  concurrency group as promote
Evrard-Nil requested a review from lloydmak99 June 15, 2026 12:57
@gemini-code-assist

Contributor

Note

Gemini is unable to generate a review for this pull request due to the file types involved not being currently supported.

Evrard-Nil requested a review from PierreLeGuen June 15, 2026 12:57
@claude

claude Bot commented Jun 15, 2026

PR Review: ci: standardize tag management strategy

Solid, well-structured change — confirmation gates, prod-image-mutation concurrency serialization, cosign verification before any prod mutation, digest re-tagging via skopeo copy, and post-mutation digest verification are all done right. A few issues worth addressing before merge.

⚠️ Critical — Unauthenticated Docker Hub API calls will fail for a private repo (promote.yml, rollback.yml)

The image is ${{ vars.DOCKER_REGISTRY_USER }}/private-chat. The skopeo calls work because docker/login-action writes credentials that skopeo reads — but the Hub tag-listing uses raw curl with NO auth:

RESP=$(curl -fsS --max-time 30 --retry 2 --retry-delay 2 "$PAGE_URL")

For a private Docker Hub repo the tags endpoint returns 401, curl -f exits non-zero, and under set -euo pipefail the whole step dies.

  • promote.yml: this lookup only derives the cosmetic SHORT_SHA for the stamped tag name. A digest-derived fallback exists, but the hard curl failure aborts the job BEFORE the fallback runs — so promotion is blocked entirely over a cosmetic detail.
  • rollback.yml: auto-resolve (empty target_tag) breaks the same way. Explicit target_tag rollback skips the loop and is unaffected.

Fix: authenticate the Hub API (JWT via /v2/users/login with the registry creds), or make curl non-fatal so promotion falls through to the digest-derived suffix, e.g. RESP=$(curl ... "$PAGE_URL") || break. Please confirm whether private-chat is actually private — if public, this is a non-issue.
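
A rough sketch of the non-fatal variant, assuming the tag listing contains a name of the form staging-YYYYMMDD-<sha>. The function names `fetch_tags_page`, `digest_suffix`, `extract_short_sha` and `resolve_short_sha` are hypothetical and not taken from the workflows; the sed-based parsing stands in for whatever lookup the real step performs.

```shell
set -eu

# Hypothetical wrapper around the unauthenticated Hub call quoted above.
fetch_tags_page() {
  curl -fsS --max-time 30 --retry 2 --retry-delay 2 "$1"
}

# First 7 hex characters of a digest such as "sha256:0123abcd...".
digest_suffix() {
  local hex="${1#sha256:}"
  printf '%.7s\n' "$hex"
}

# Assumption: the response mentions a tag like staging-YYYYMMDD-<7 hex>.
# The second sed keeps only the first match and reads all input,
# so the producer never sees a closed pipe.
extract_short_sha() {
  printf '%s\n' "$1" \
    | sed -n 's/.*staging-[0-9]\{8\}-\([0-9a-f]\{7\}\).*/\1/p' \
    | sed -n '1p'
}

# Prints the short sha from the tag listing, or the digest-derived
# suffix when the lookup fails or finds nothing.
#   $1 = tag listing URL, $2 = image digest
resolve_short_sha() {
  local page_url="$1" digest="$2" resp short=""
  # Running the fetch as an `if` condition keeps `set -e` from aborting the step.
  if resp=$(fetch_tags_page "$page_url"); then
    short=$(extract_short_sha "$resp")
  fi
  if [ -z "$short" ]; then
    short=$(digest_suffix "$digest")
  fi
  printf '%s\n' "$short"
}
```

With this shape a 401 from the Hub API only costs sha traceability in the tag name; the promotion itself still proceeds.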

Minor / suggestions

  • jq ... | head -n1 under pipefail (promote.yml resolve step): if jq outputs more than head consumes, jq can take SIGPIPE (141) and fail the pipeline. The select filter yields one line so it likely will not trigger, but jq -r first(...) or || true is more robust.
  • Staging deploy tag: build.yml now publishes :staging instead of :latest. The dispatched deploy-chat-api-stg job (in cvm-ansible-playbooks) must reference :staging (or a stamped tag), not :latest — :latest no longer moves on main pushes. Confirm that playbook is aligned.
  • id-token: write in promote.yml job: cosign verify (not sign) does not need an OIDC token. Harmless, but can be dropped.

Otherwise the digest-pinned re-tagging, cosign identity pinning to build.yml@refs/heads/main, and the audit-tag + release trail are well done.

Verdict: ⚠️ issues found (see above).

- build.yml: remove auto-generated staging-YYYYMMDD-<sha> stamp tag
- promote.yml: replace :staging source with explicit `tag` input
  (e.g. "0.2.13" or "v0.2.13"); validates the tag exists on Docker Hub
  before proceeding; promotes that digest to :prod + :latest; no
  auto-generated prod-YYYYMMDD-<sha> tag
- rollback.yml: make target_tag required; validate it exists on Docker
  Hub; remove auto-resolved prod-* tags and auto-created audit tags;
  rolls :prod + :latest to the user-specified tag digest
- cosign verification now accepts both refs/heads/main and refs/tags/v*
  since version tags are built from a tag ref, not main
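
As an illustration of the last point, an identity pattern that accepts both ref types might look like the sketch below. The repository slug `example-org/chat-api` and the helper `identity_allowed` are hypothetical, and the exact pattern used in the workflows may differ.

```shell
# Hypothetical repository slug; substitute the real owner/repo.
REPO="example-org/chat-api"

# Accept signatures produced by build.yml when run from main or from a v* tag.
IDENTITY_RE="^https://github\.com/${REPO}/\.github/workflows/build\.yml@refs/(heads/main|tags/v[^/]+)$"

# Local check of a certificate identity string against the pattern.
# The same pattern could be handed to cosign, for example:
#   cosign verify \
#     --certificate-identity-regexp "$IDENTITY_RE" \
#     --certificate-oidc-issuer https://token.actions.githubusercontent.com \
#     "$IMAGE@$DIGEST"
identity_allowed() {
  printf '%s\n' "$1" | grep -Eq "$IDENTITY_RE"
}
```

Anchoring the pattern with `^` and `$` matters here: without the anchors, a ref such as refs/heads/main-experiment or a different workflow file could also satisfy the check.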

@PierreLeGuen PierreLeGuen left a comment

Contributor

The tag standardization (main→:staging, manual promote→:prod+:latest, rollback reverts both) is internally consistent: workflows parse cleanly, prod mutations are gated by cosign verification on a main-signed digest, and the shared concurrency group serializes promote vs rollback.

One thing to confirm before relying on staging deploys:

  • .github/workflows/build.yml:40 — main now pushes :staging instead of :latest, but this same workflow still dispatches the deploy-chat-api-stg deploy. That downstream playbook lives in the external cvm-ansible-playbooks repo and currently resolves :latest (chat_api_stg.yaml:17). If it isn't updated to pull :staging, staging will freeze on the last promoted digest until a manual promote. Can't be verified from this repo — worth confirming the staging playbook references :staging (covered by the test plan's first checkbox).

Optional follow-up:

  • .github/workflows/promote.yml:55 / rollback.yml:69 — when no matching source tag is found, the short-sha falls back to a digest-derived 7-char suffix that still satisfies the [0-9a-f]{7} regex but no longer maps to a real git sha. Behavior is correct; only the audit trail loses sha traceability in that edge case.

Checks: all three workflow YAMLs parse cleanly (actionlint / python3 yaml.safe_load, only SC2129 style warnings); grep found no stale :latest references; no promote/rollback tag re-triggers build.yml. Cargo build/tests skipped — CI-only changes.

@lloydmak99 lloydmak99 left a comment

Requesting changes. The workflow implementation does not yet match the rollout strategy described in the PR, and it breaks the current staging deploy path:

  1. .github/workflows/build.yml:40 changes main builds from latest to staging, but the downstream staging playbook currently resolves Docker Hub tag private-chat:latest before deploying. After this merges, the repository dispatch at the end of the build would redeploy the old latest digest instead of the image just built from main. Please update the downstream staging deploy to consume staging, or keep publishing the compatibility tag until that is changed.

  2. .github/workflows/build.yml:40/:60 only publishes one mutable tag for main. The PR body promises :staging plus an immutable staging-YYYYMMDD-<sha> stamp tag, but no step creates or copies that stamp tag. Please add the immutable staging tag creation so promotions and audit trails have a stable source.

  3. .github/workflows/promote.yml:98 only copies the selected digest to :prod. The PR body says promotion should also create a prod-YYYYMMDD-<sha> tag and update :latest as a compatibility alias. Without those, rollback/audit cannot target promoted stamps and the existing prod deployment path that uses latest will not track promotions.

  4. .github/workflows/rollback.yml:11 requires a target tag and .github/workflows/rollback.yml:102 only retags :prod. The PR body says rollback can auto-resolve the previous prod digest, creates an audit tag, and creates a GitHub release. Please either implement those behaviors or update the workflow contract/body before merging.

Validation run locally:

  • git diff --check 06325a39ffccbbd4c5568dc71b434c7f726bed98...HEAD
  • Verified /Users/lloyd/Documents/Repos/cvm-ansible-playbooks/playbooks/update/chat_api_stg.yaml currently queries Docker Hub private-chat:latest.
