Skip to content

Add external Karpenter guard and upgrade command to kubectl-datadog#2890

Draft
L3n41c wants to merge 4 commits intomainfrom
lenaic/add-upgrade-command
Draft

Add external Karpenter guard and upgrade command to kubectl-datadog#2890
L3n41c wants to merge 4 commits intomainfrom
lenaic/add-upgrade-command

Conversation

@L3n41c
Copy link
Copy Markdown
Member

@L3n41c L3n41c commented Apr 10, 2026

What does this PR do?

Two additions to kubectl datadog autoscaling cluster:

  1. External Karpenter guard in install: Detects an active Karpenter on the cluster via webhook configurations and validates Helm release ownership. Blocks installation when an external (non-plugin) Karpenter is found, while still allowing idempotent re-installs of our own.

  2. upgrade subcommand: Updates a Karpenter previously installed by the plugin. Auto-detects the installation namespace and previous parameters (cluster name) from the existing Helm release.

Motivation

  • Prevent install from deploying a second Karpenter controller on top of an externally-installed one.
  • Allow users to upgrade Karpenter version without having to uninstall/reinstall.

Additional Notes

Detection strategy:

  • Webhook-based detection (ValidatingWebhookConfiguration / MutatingWebhookConfiguration) identifies active Karpenter installations and their namespace
  • Helm release ownership check (additionalLabels["app.kubernetes.io/managed-by"] == "kubectl-datadog") distinguishes our installation from external ones
  • Helm secret scanning as a fallback when webhooks are absent (e.g. pods crashed)

Upgrade command:

  • No --karpenter-namespace flag — namespace is auto-detected
  • Cluster name is auto-detected from existing Helm release values
  • Helm values are reset to plugin defaults on each upgrade
  • Shared functions exported from install package to avoid duplication

Minimum Agent Versions

N/A — this is a kubectl plugin change only.

Describe your test plan

  • Unit tests for webhook detection (DetectActiveKarpenter) with various webhook configurations
  • Unit tests for Helm secret scanning (FindKarpenterHelmRelease) with various secret states
  • Unit tests for upgrade command validation and config extraction
  • go build, go test, and go vet all pass

Checklist

  • PR has at least one valid label: bug, enhancement, refactoring, documentation, tooling, and/or dependencies
  • PR has a milestone or the qa/skip-qa label
  • All commits are signed (see: signing commits)

🤖 Generated with Claude Code

Add a guard in `kubectl datadog autoscaling cluster install` to detect
and block installation when an external Karpenter is already active on
the cluster, using webhook-based detection with Helm release ownership
validation. This prevents accidentally deploying a second Karpenter
controller while still allowing idempotent re-installs of our own.

Add `kubectl datadog autoscaling cluster upgrade` to update a Karpenter
that was previously installed by the plugin. The command auto-detects
the installation namespace and previous parameters (cluster name) from
the existing Helm release, avoiding the need to re-specify them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@L3n41c L3n41c added the enhancement New feature or request label Apr 10, 2026
L3n41c and others added 2 commits April 10, 2026 18:34
Avoid variable shadowing of `err` in the upgrade run() method
by using plain assignments instead of if-init statements.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use explicit var declaration and assignment instead of short variable
declaration to avoid shadowing the outer err variable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@L3n41c
Copy link
Copy Markdown
Member Author

L3n41c commented Apr 10, 2026

@codex review

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 10, 2026

Codecov Report

❌ Patch coverage is 19.23077% with 189 lines in your changes missing coverage. Please review.
✅ Project coverage is 39.90%. Comparing base (5adfc81) to head (8f871c2).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...ctl-datadog/autoscaling/cluster/upgrade/upgrade.go 10.00% 108 Missing ⚠️
...ctl-datadog/autoscaling/cluster/install/install.go 0.00% 53 Missing ⚠️
...tl-datadog/autoscaling/cluster/common/helm/helm.go 0.00% 21 Missing ⚠️
...atadog/autoscaling/cluster/common/k8s/karpenter.go 84.61% 3 Missing and 3 partials ⚠️
cmd/kubectl-datadog/autoscaling/cluster/cluster.go 0.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2890      +/-   ##
==========================================
- Coverage   40.06%   39.90%   -0.17%     
==========================================
  Files         319      321       +2     
  Lines       28039    28265     +226     
==========================================
+ Hits        11233    11278      +45     
- Misses      15983    16161     +178     
- Partials      823      826       +3     
Flag Coverage Δ
unittests 39.90% <19.23%> (-0.17%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
cmd/kubectl-datadog/autoscaling/cluster/cluster.go 0.00% <0.00%> (ø)
...atadog/autoscaling/cluster/common/k8s/karpenter.go 84.61% <84.61%> (ø)
...tl-datadog/autoscaling/cluster/common/helm/helm.go 0.00% <0.00%> (ø)
...ctl-datadog/autoscaling/cluster/install/install.go 12.76% <0.00%> (-2.43%) ⬇️
...ctl-datadog/autoscaling/cluster/upgrade/upgrade.go 10.00% <10.00%> (ø)

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5adfc81...8f871c2. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@datadog-prod-us1-3
Copy link
Copy Markdown

datadog-prod-us1-3 bot commented Apr 10, 2026

🎯 Code Coverage (details)
Patch Coverage: 18.52%
Overall Coverage: 39.98%

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 8f871c2 | Docs | Datadog PR Page | Was this helpful? React with 👍/👎 or give us feedback!

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f15baa9e41

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

}

// Check for existing Karpenter installation
if found, webhookNamespace, err := commonk8s.DetectActiveKarpenter(ctx, o.Clientset); err != nil {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Add Helm-release fallback before proceeding with install

The new install guard only checks DetectActiveKarpenter webhooks, so when Karpenter is installed but webhook configs are absent (for example a crashed/disabled controller or manually removed webhook objects), this branch falls through and installation proceeds in the default namespace, creating a second Karpenter deployment instead of blocking or reusing the existing one. This is a production-impacting regression in the external-installation guard, especially since the commit already introduced FindKarpenterHelmRelease for exactly this fallback path.

Useful? React with 👍 / 👎.

When DetectActiveKarpenter finds no webhooks, fall back to
FindKarpenterHelmRelease to detect orphaned Helm releases
(e.g. when Karpenter pods have crashed but the release still
exists). This prevents creating a duplicate installation.

The upgrade command already used this fallback; this aligns
the install command's detection logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@L3n41c
Copy link
Copy Markdown
Member Author

L3n41c commented Apr 10, 2026

Addressed @chatgpt-codex-connector P1 feedback in commit 8f871c2: added FindKarpenterHelmRelease as a fallback in the install path when DetectActiveKarpenter finds no webhooks. This mirrors the approach already used in the upgrade command.

@L3n41c L3n41c added this to the v1.27.0 milestone Apr 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants