feat(langsmith): add pre-upgrade backfill check Job #591

Draft
Bagatur (baskaryan) wants to merge 2 commits into main from bagatur/backfill-pre-upgrade-check

@baskaryan Bagatur (baskaryan) commented Feb 23, 2026

Description

Adds a pre-upgrade Kubernetes Job that runs before Alembic migrations on every helm upgrade. It blocks self-hosted customers
from upgrading to a version where backfill code has been deleted until the backfill has completed on their installation.

Companion to langchainplus PR #18179 which adds the script and
CI enforcement.

New files

  • charts/langsmith/templates/backend/backfill-check.yaml — a Helm pre-upgrade Job (ArgoCD PreSync) that runs
    backfill_check_entrypoint.sh from the backend image before any schema migrations are applied.

Key design decisions

| Decision | Rationale |
| --- | --- |
| helm.sh/hook: pre-upgrade only (not pre-install) | Fresh installs have no backfills to wait for |
| backoffLimit: 0 | No retries: the operator fixes the issue (waits for the backfill to complete), then re-runs the upgrade |
| hook-delete-policy: before-hook-creation | Each upgrade attempt gets a fresh Job; a stale success never silently bypasses a new backfill |
| argocd.argoproj.io/hook: PreSync + BeforeHookCreation | Same semantics for ArgoCD-managed self-hosted deployments |
| LANGSMITH_APP_VERSION injected from {{ .Chart.AppVersion }} | Script uses this to determine which registry entries are active; no per-environment config needed |
| Enabled by default | Protection on by default; set backend.backfillCheck.enabled: false to bypass |
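
To make the version gating concrete, here is a minimal Python sketch of the decision the check is described as making: a backfill blocks the upgrade only if its required version is less than or equal to the chart's appVersion and it is still in the 'running' state. This is not the actual script from the companion PR. The `RequiredBackfill` type, the `blocking_backfills` function, the shape of the status mapping, and the assumption that versions are plain dotted integers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Mapping, Tuple


@dataclass(frozen=True)
class RequiredBackfill:
    """Hypothetical registry entry; the real REQUIRED_BACKFILLS format is not shown in this PR."""
    name: str
    required_version: str  # the entry is enforced once appVersion reaches this version


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a dotted numeric version such as '0.13.2' (assumed format) into a tuple."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def blocking_backfills(
    required: Iterable[RequiredBackfill],
    statuses: Mapping[str, str],
    app_version: str,
) -> List[str]:
    """Return names of active backfills that are still in the 'running' state."""
    target = parse_version(app_version)
    return [
        backfill.name
        for backfill in required
        # Only entries whose required version <= appVersion are enforced.
        if parse_version(backfill.required_version) <= target
        and statuses.get(backfill.name) == "running"
    ]
```

An entrypoint built on this would read LANGSMITH_APP_VERSION from the environment and exit non-zero when the returned list is non-empty, which is what causes the hook Job, and therefore the upgrade, to fail.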

Test Plan

  • helm template the chart and verify the backfill-check Job renders with the correct hook annotations
  • helm upgrade --dry-run against a cluster where backfill_jobs has a running row — confirm the pre-upgrade Job
    fails and the upgrade is blocked
  • helm upgrade --dry-run with all backfills complete — confirm it proceeds to migrations
  • Set backend.backfillCheck.enabled: false and confirm the Job is not rendered
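
The first test-plan item could be automated along these lines. This is a rough sketch, not part of the PR: it assumes the rendered Job has already been parsed into a dict (for example from `helm template` output using a YAML parser, which is left out here), and the `annotation_problems` helper is hypothetical. The Helm annotation keys come from the table above; the ArgoCD delete-policy key is taken from my reading of the ArgoCD hook documentation rather than from this PR.

```python
from typing import Dict, List

# Expected values, taken from the design-decision table in this PR.
EXPECTED_ANNOTATIONS: Dict[str, str] = {
    "helm.sh/hook": "pre-upgrade",  # exact match, so pre-install is rejected
    "helm.sh/hook-delete-policy": "before-hook-creation",
    "argocd.argoproj.io/hook": "PreSync",
    # Assumed key name for the ArgoCD delete policy.
    "argocd.argoproj.io/hook-delete-policy": "BeforeHookCreation",
}


def annotation_problems(manifest: dict) -> List[str]:
    """Return human-readable problems found in a rendered Job manifest."""
    problems: List[str] = []
    if manifest.get("kind") != "Job":
        problems.append(f"kind: expected 'Job', got {manifest.get('kind')!r}")

    annotations = (manifest.get("metadata") or {}).get("annotations") or {}
    for key, want in EXPECTED_ANNOTATIONS.items():
        got = annotations.get(key)
        if got != want:
            problems.append(f"{key}: expected {want!r}, got {got!r}")

    # backoffLimit: 0 means a failed check is never retried automatically.
    backoff_limit = (manifest.get("spec") or {}).get("backoffLimit")
    if backoff_limit != 0:
        problems.append(f"spec.backoffLimit: expected 0, got {backoff_limit!r}")
    return problems
```

An empty list means the Job renders with the hook settings described above; any entries name the field that differs.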

Introduces a Helm pre-upgrade hook Job (backfill-check.yaml) that runs
the backfill_check_entrypoint.sh script from the backend image before
any Alembic migrations are applied on upgrade.

If any asynq backfill is still in 'running' state the Job exits non-zero,
causing the Helm upgrade (and ArgoCD PreSync) to fail with a clear message
instructing the operator to remain on the current version until the
backfills complete.

Key design decisions:
- pre-upgrade only (not pre-install) — backfills don't exist on fresh installs
- backoffLimit: 0 — no retries; fix the underlying issue, then re-upgrade
- helm.sh/hook-delete-policy: before-hook-creation — each upgrade attempt
  gets a fresh Job so stale results from a previous attempt don't interfere
- Enabled by default; set backend.backfillCheck.enabled: false to bypass

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@baskaryan Bagatur (baskaryan) marked this pull request as draft February 23, 2026 22:29
The check script uses LANGSMITH_APP_VERSION to determine which entries in REQUIRED_BACKFILLS
are active for the current upgrade (only those whose required version <=
the chart's appVersion are enforced).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@baskaryan
Author

no need to land till v14 release
