Skip to content

feat(catalogs): expose catalog controller metrics#1787

Open
abhijith-darshan wants to merge 1 commit intomainfrom
feat/catalog_metrics
Open

feat(catalogs): expose catalog controller metrics#1787
abhijith-darshan wants to merge 1 commit intomainfrom
feat/catalog_metrics

Conversation

@abhijith-darshan
Copy link
Contributor

@abhijith-darshan abhijith-darshan commented Feb 7, 2026

Description

expose Catalog Controller metrics

What type of PR is this? (check all applicable)

  • 🍕 Feature
  • 🐛 Bug Fix
  • 📝 Documentation Update
  • 🎨 Style
  • 🧑‍💻 Code Refactor
  • 🔥 Performance Improvements
  • ✅ Test
  • 🤖 Build
  • 🔁 CI
  • 📦 Chore (Release)
  • ⏩ Revert

Related Tickets & Documents

Added tests?

  • 👍 yes
  • 🙅 no, because they aren't needed
  • 🙋 no, because I need help
  • Separate ticket for tests # (issue/pr)

Added to documentation?

  • 📜 README.md
  • 🤝 Documentation pages updated
  • 🙅 no documentation needed
  • (if applicable) generated OpenAPI docs for CRD changes

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes

Summary by CodeRabbit

  • New Features
    • Prometheus catalog readiness metrics added with labels for catalog, organization, owner and readiness reason (SecretMissing, AuthError, Unknown).
  • Improvements
    • Readiness reason detection enhanced (prefers SecretMissing over AuthError) and metric updates occur after reconciliation; obsolete label values are cleared.
  • Bug Fixes
    • Catalog readiness metrics are removed when a catalog is deleted.

@github-actions github-actions bot added the size/L label Feb 7, 2026
@abhijith-darshan abhijith-darshan marked this pull request as ready for review February 10, 2026 14:30
@abhijith-darshan abhijith-darshan requested a review from a team as a code owner February 10, 2026 14:30
Copy link
Contributor

@IvoGoman IvoGoman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be cool if this also added a metric to the PluginDefinition regarding it's ready status.
e.g. greenhouse_plugindefintion_ready{"plugindefinition", "organization", "catalog", "reason"} & one for the clusterplugindefinition

Where catalog is filled when available. That way we could have a overall perses dashboard that shows PluginDefinitions managed by Catalogs as well.

@abhijith-darshan abhijith-darshan force-pushed the feat/catalog_metrics branch 2 times, most recently from 305d140 to a1e62d7 Compare February 18, 2026 10:46
@abhijith-darshan abhijith-darshan force-pushed the feat/catalog_metrics branch 2 times, most recently from 980455f to 7c3d60f Compare February 23, 2026 13:37
@coderabbitai
Copy link

coderabbitai bot commented Mar 3, 2026

📝 Walkthrough

Walkthrough

Adds a Prometheus readiness metric for Catalogs, computes a readiness reason (SecretMissing, AuthError, Unknown/OK) from inventory and Ready condition, exposes update/delete helpers, and integrates metric updates into the catalog reconciler (deferred update on create, delete on removal).

Changes

Cohort / File(s) Summary
Metrics subsystem
internal/controller/catalog/metrics.go
Adds a GaugeVec (greenhouse_catalog_ready) with labels (catalog, organization, owned_by, reason); registers it; implements updateCatalogReadyMetric, deleteCatalogReadyMetric, and determineReason to compute ownership, derive readiness reason from inventory and Ready condition, clear stale reasons, and set/remove metric series.
Controller integration
internal/controller/catalog/catalog_controller.go
Updates EnsureDeleted and EnsureCreated signatures to accept a named obj parameter and cast to *greenhousev1alpha1.Catalog; defers updateCatalogReadyMetric(catalog) in EnsureCreated; calls deleteCatalogReadyMetric in deletion flow; adds //nolint:unparam to verifyStatus.
Minor whitespace
internal/controller/catalog/source.go
Removes an extraneous empty line in objectReadiness (no behavioral change).

Sequence Diagram(s)

sequenceDiagram
    participant Controller
    participant Reconciler
    participant Metrics as "Metrics Package"
    participant Prometheus as "Prometheus Registry"

    rect rgba(200,230,255,0.5)
    Controller->>Reconciler: EnsureCreated(obj)
    Reconciler->>Reconciler: reconciliation logic (update resources)
    Reconciler-->>Controller: reconciliation result
    end

    rect rgba(200,255,200,0.5)
    Reconciler->>Metrics: updateCatalogReadyMetric(catalog) [deferred]
    Metrics->>Metrics: determineReason(catalog)
    Metrics->>Prometheus: readyGauge.With(labels).Set(1 or 0)
    end

    rect rgba(255,230,200,0.5)
    Controller->>Reconciler: EnsureDeleted(obj)
    Reconciler->>Metrics: deleteCatalogReadyMetric(catalog)
    Metrics->>Prometheus: readyGauge.DeleteLabelValues(...)
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰
I hopped through code to plant a gauge,
I sniffed the secrets on each page.
If auth is foul or keys are missed,
I mark the reason on the list.
Tiny rabbit metric, neat and sage.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(catalogs): expose catalog controller metrics' clearly and specifically summarizes the main change, follows conventional commit format, and accurately reflects the changeset.
Description check ✅ Passed The PR description includes all required sections: description, PR type selection (Feature), related tickets (#1646), test status, documentation status, and a completed checklist.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/catalog_metrics

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
internal/controller/catalog/catalog_controller.go (1)

379-379: Remove unnecessary unparam suppression.

Line 379 suppresses unparam, but verifyStatus parameters are actively used. Keeping this suppression lowers lint signal quality.

Suggested cleanup
-//nolint:unparam
 func (r *CatalogReconciler) verifyStatus(ctx context.Context, catalog *greenhousev1alpha1.Catalog, allErrors []catalogError) (ctrl.Result, lifecycle.ReconcileResult, error) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/catalog/catalog_controller.go` at line 379, Remove the
unnecessary nolint:unparam directive on the verifyStatus declaration: the
function's parameters are used so the suppression should be deleted. Locate the
verifyStatus function in catalog_controller.go (the line with
"//nolint:unparam") and delete that nolint comment so the linter can properly
report real unparam issues elsewhere.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/controller/catalog/catalog_controller.go`:
- Around line 215-217: Ensure the catalog readiness metric is cleaned up when a
Catalog CR is deleted by adding metric removal logic to EnsureDeleted: use the
same identifying fields used in UpdateCatalogReadyMetric (e.g., catalog.Name and
catalog.Namespace from the Catalog object) to remove the
greenhouse_catalog_ready series. Either call a new helper
DeleteCatalogReadyMetric(catalog) or extend UpdateCatalogReadyMetric to accept a
deletion flag, and invoke that from EnsureDeleted in the CatalogReconciler to
guarantee the metric is removed when the CR is gone.

In `@internal/controller/catalog/metrics.go`:
- Around line 46-51: The code sets reason = "OK" for ready catalogs but can also
emit "OK" for NOT-ready catalogs when determineReason(catalog) returns "OK";
update both places that compute the reason (the blocks using isReady and
determineReason) so that when isReady is false you use determineReason(catalog)
and if that returns "OK" replace it with a distinct not-ready marker (e.g.,
"UNKNOWN_NOT_READY" or empty string) instead of "OK" so not-ready catalogs never
report "OK". Ensure the same change is applied to the duplicate logic later in
the file.

---

Nitpick comments:
In `@internal/controller/catalog/catalog_controller.go`:
- Line 379: Remove the unnecessary nolint:unparam directive on the verifyStatus
declaration: the function's parameters are used so the suppression should be
deleted. Locate the verifyStatus function in catalog_controller.go (the line
with "//nolint:unparam") and delete that nolint comment so the linter can
properly report real unparam issues elsewhere.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 67bbde6 and 591fb59.

📒 Files selected for processing (3)
  • internal/controller/catalog/catalog_controller.go
  • internal/controller/catalog/metrics.go
  • internal/controller/catalog/source.go
💤 Files with no reviewable changes (1)
  • internal/controller/catalog/source.go

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
internal/controller/catalog/metrics.go (1)

102-110: ⚠️ Potential issue | 🟠 Major

Not-ready catalogs can still be emitted as reason="OK"

On Line 109, fallback returns "OK" even when the catalog is not ready. Since Line 50 uses this for non-ready states, you can emit value=0 with reason="OK", which breaks reason-based alerts/dashboards.

Suggested fix
- // Reason values: OK (ready), AuthError (auth issues), SecretMissing (missing secrets)
+ // Reason values: OK (ready), AuthError (auth issues), SecretMissing (missing secrets), OtherError (non-auth readiness failures)

- for _, r := range []string{"OK", "AuthError", "SecretMissing"} {
+ for _, r := range []string{"OK", "AuthError", "SecretMissing", "OtherError"} {

- return "OK"
+ return "OtherError"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/catalog/metrics.go` around lines 102 - 110, The current
fallback returns "OK" even for non-ready catalogs (the block with returns for
hasMissingSecret and hasAuthError falls through to return "OK"), which causes
non-ready metrics to be emitted with reason="OK"; update the logic so that if
the catalog is not ready you return a distinct reason (e.g. "NotReady" or
"Initializing") instead of "OK" — change the final return that currently returns
"OK" to return a non-ready reason when readiness is false (use the same
readiness check used where value=0 is emitted), keeping the existing
hasMissingSecret/hasAuthError branches intact (refer to the hasMissingSecret,
hasAuthError variables and the return "OK" in this function).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@internal/controller/catalog/metrics.go`:
- Around line 102-110: The current fallback returns "OK" even for non-ready
catalogs (the block with returns for hasMissingSecret and hasAuthError falls
through to return "OK"), which causes non-ready metrics to be emitted with
reason="OK"; update the logic so that if the catalog is not ready you return a
distinct reason (e.g. "NotReady" or "Initializing") instead of "OK" — change the
final return that currently returns "OK" to return a non-ready reason when
readiness is false (use the same readiness check used where value=0 is emitted),
keeping the existing hasMissingSecret/hasAuthError branches intact (refer to the
hasMissingSecret, hasAuthError variables and the return "OK" in this function).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 591fb59 and 01a9da9.

📒 Files selected for processing (3)
  • internal/controller/catalog/catalog_controller.go
  • internal/controller/catalog/metrics.go
  • internal/controller/catalog/source.go
💤 Files with no reviewable changes (1)
  • internal/controller/catalog/source.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/controller/catalog/catalog_controller.go

IvoGoman
IvoGoman previously approved these changes Mar 3, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/controller/catalog/metrics.go`:
- Around line 44-47: The cleanup logic only deletes series for the current
ownedBy label value, leaving historical series behind; update the cleanup calls
that build label matchers (the code that computes ownedBy from catalog.Labels
and the subsequent DeleteSeries/cleanup calls) to remove/omit the owned_by label
from the matcher so deletion targets all series for the catalog+namespace
regardless of owned_by (i.e., build matchers using catalog.Name and
catalog.Namespace only, or explicitly delete across all owned_by values), and
apply the same change to the other occurrences referenced (the blocks around the
ownedBy usage at the other locations).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 93268c46-9614-4fa1-b188-07efe0a09e5a

📥 Commits

Reviewing files that changed from the base of the PR and between 01a9da9 and 9fad9d8.

📒 Files selected for processing (3)
  • internal/controller/catalog/catalog_controller.go
  • internal/controller/catalog/metrics.go
  • internal/controller/catalog/source.go
💤 Files with no reviewable changes (1)
  • internal/controller/catalog/source.go

Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>

(chore): set secret missing and repo auth metrics

Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>

(chore): set catalog ready metric

Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>

(chore): add license header

Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>

(chore): update catalog object metrics func name

Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>

(chore): use simple if

Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>

(chore): suppress unparam lint issue

Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>

(chore): consolidate metrics with reason grouping

Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>

(chore): handle clean up of metrics when resource is deleted

Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>

(chore): remove lint overrides

Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
internal/controller/catalog/metrics.go (2)

44-47: ⚠️ Potential issue | 🟠 Major

Clean up previous owned_by label values too.

The cleanup only targets the current owned_by value. If that label changes, the old time series is left behind, so this metric can report the same catalog under multiple owners until the process restarts. Please either track the previous owner per catalog or delete series without matching on owned_by.

Also applies to: 61-64, 76-85

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/catalog/metrics.go` around lines 44 - 47, The cleanup
currently only deletes time series matching the current ownedBy value (ownedBy
:= catalog.Labels[greenhouseapis.LabelKeyOwnedBy]), leaving old owner series
behind; update the deletion logic so it does not include the owned_by label in
the Prometheus/TS deletion matcher (i.e., match on the catalog identity fields
only, not owned_by) or alternatively track the previous owner per catalog and
delete by that previous value — apply this change to all similar cleanup blocks
that reference ownedBy (the blocks around the ownedBy assignment and the cleanup
code at the other two places noted).

101-106: ⚠️ Potential issue | 🟡 Minor

Drop the bare "auth" substring match here.

source.Message is freeform text, so matching any "auth" substring is too broad and can misclassify non-auth failures as AuthError. Keep this to specific phrases only, or switch to a structured reason once that data is available.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/catalog/metrics.go` around lines 101 - 106, The substring
check for "auth" is too broad and should be removed from the Auth error
detection condition; in the authentication detection block that examines the
variable msg (the strings.Contains checks that determine an AuthError), delete
the strings.Contains(msg, "auth") clause and leave the specific checks (e.g.,
"authentication", "unauthorized", "forbidden", "credentials"); ensure the
conditional still maps to AuthError correctly and run/update any tests
referencing this behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@internal/controller/catalog/metrics.go`:
- Around line 44-47: The cleanup currently only deletes time series matching the
current ownedBy value (ownedBy :=
catalog.Labels[greenhouseapis.LabelKeyOwnedBy]), leaving old owner series
behind; update the deletion logic so it does not include the owned_by label in
the Prometheus/TS deletion matcher (i.e., match on the catalog identity fields
only, not owned_by) or alternatively track the previous owner per catalog and
delete by that previous value — apply this change to all similar cleanup blocks
that reference ownedBy (the blocks around the ownedBy assignment and the cleanup
code at the other two places noted).
- Around line 101-106: The substring check for "auth" is too broad and should be
removed from the Auth error detection condition; in the authentication detection
block that examines the variable msg (the strings.Contains checks that determine
an AuthError), delete the strings.Contains(msg, "auth") clause and leave the
specific checks (e.g., "authentication", "unauthorized", "forbidden",
"credentials"); ensure the conditional still maps to AuthError correctly and
run/update any tests referencing this behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1a18cf51-3948-449e-b65b-8f5f4f3c913b

📥 Commits

Reviewing files that changed from the base of the PR and between 9fad9d8 and e812997.

📒 Files selected for processing (3)
  • internal/controller/catalog/catalog_controller.go
  • internal/controller/catalog/metrics.go
  • internal/controller/catalog/source.go
💤 Files with no reviewable changes (1)
  • internal/controller/catalog/source.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/controller/catalog/catalog_controller.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants