Skip to content

feat: remove instance name from metrics module#112

Merged
LakshanSS merged 3 commits into
openchoreo:mainfrom
chalindukodikara:issue-3299_improve_metrics_module
May 16, 2026
Merged

feat: remove instance name from metrics module#112
LakshanSS merged 3 commits into
openchoreo:mainfrom
chalindukodikara:issue-3299_improve_metrics_module

Conversation

@chalindukodikara
Copy link
Copy Markdown
Contributor

@chalindukodikara chalindukodikara commented May 16, 2026

Purpose

$subject

Approach

Summarize the solution and implementation details.

Related Issues

Include any related issues that are resolved by this PR.

Checklist

  • Tests added or updated (unit, integration, etc.)
  • Samples updated (if applicable)

Remarks

Add any additional context, known issues, or TODOs related to this PR.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added multi-cluster deployment support with unified metric namespace and log group configuration across clusters.
    • Enhanced EKS Pod Identity authentication with streamlined installation guidance.
  • Documentation

    • Refreshed installation, authentication, and IAM permissions documentation.
    • Expanded webhook and EventBridge configuration guidance with enhanced security recommendations.
    • Clarified multi-cluster mapping and credential injection workflows.
  • Chores

    • Simplified configuration by removing the instanceName requirement.
    • Updated metric dimensions and collector telemetry settings.

Review Change Stack

Signed-off-by: Chalindu Kodikara <chalindumkodikara@gmail.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 16, 2026

Warning

Rate limit exceeded

@chalindukodikara has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 19 minutes and 49 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fa3aee4c-6514-4cd4-9c4b-9830b6d59600

📥 Commits

Reviewing files that changed from the base of the PR and between f8ca92b and f28b087.

📒 Files selected for processing (2)
  • observability-metrics-cloudwatch/README.md
  • observability-metrics-cloudwatch/helm/Chart.yaml
📝 Walkthrough

Walkthrough

This PR removes the per-cluster instanceName concept and pivots the observability module to use shared CloudWatch EMF log groups/metric namespaces across multiple clusters. Configuration now requires AWS_REGION instead of INSTANCE_NAME, Helm templates remove the instanceName helper and InstanceName metric dimension, and installation documentation is reorganized to emphasize EKS Pod Identity and expand webhook/EventBridge setup guidance.

Changes

Remove instanceName across configuration, code, and templates

Layer / File(s) Summary
Configuration and core type definitions
internal/config.go, internal/cloudwatchmetrics/client.go
Remove InstanceName field from exported Config struct, switch environment validation from INSTANCE_NAME to required AWS_REGION, remove DimensionInstanceName constant, remove instanceName member from internal Client, and update NewClient and NewClientWithAWS constructors.
Helm helpers, values, and template updates
helm/templates/_helpers.tpl, helm/values.yaml, helm/templates/adapter/deployment.yaml, helm/templates/adot-collector/env-configmap.yaml
Remove metrics-cloudwatch.instanceName helper definition and validation, update logGroup helper to default to fixed path /aws/openchoreo/metrics, remove resource/instance processor and InstanceName from EMF metric dimensions in ADOT values, replace INSTANCE_NAME ConfigMap entry with AWS_REGION, and adjust adapter deployment SERVER_PORT value quoting.
Metric dimensions and query logic
internal/cloudwatchmetrics/queries.go, internal/cloudwatchmetrics/queries_test.go, internal/cloudwatchmetrics/alarms_test.go
Remove DimensionInstanceName from buildScopeDimensions to return only ComponentUID, EnvironmentUID, and Namespace dimensions; update all dimension assertion tests across scope, ordering, and alarm tests.
Test configuration and setup updates
internal/config_test.go, internal/cloudwatchmetrics/client_test.go
Remove INSTANCE_NAME from requiredEnv and all test case initializations, add direct AWS_REGION requirement validation, and update newTestClient helper to use empty Config{}.
Application entry point cleanup
main.go
Remove instanceName attribute from structured configuration log and stop passing InstanceName field when constructing the CloudWatch metrics client.

Documentation updates for multi-cluster and deployment flows

Layer / File(s) Summary
Module overview and architecture
README.md
Update module description to emphasize Pod Identity/static credentials without instanceName, clarify that all clusters write to the same shared EMF log group/metric namespace, and revise deployment topology guidance to remove instanceName from cross-cluster consistency requirements.
Prerequisites and IAM permissions guidance
README.md
Add AWS prerequisites subsection, clarify Pod Identity vs static-credentials models, update IAM policy guidance for separate vs combined policies, narrow CloudWatch Logs resource ARNs from /aws/openchoreo/*/metrics:* to /aws/openchoreo/metrics:*, and add notes about alarm tagging behavior and EventBridge constraints.
EKS Pod Identity installation and setup
README.md
Remove INSTANCE_NAME export from shared-values step, clarify Pod Identity credential injection timing, update install command snippets, expand Pod Identity association instructions for single-cluster and multi-cluster topologies with creation steps (console + CLI), and add workload restart and retention hook re-trigger guidance.
Non-EKS static credentials installation
README.md
Reorganize non-EKS path into numbered steps, document chart-managed Kubernetes Secret creation for credentials, clarify that OpenTelemetry collector reads the same Secret via adotcollector.extraEnvsFrom, and note that workload restarts are not required because credentials are injected during install.
Webhook endpoint exposure and EventBridge configuration
README.md
Add "expose only the webhook endpoint" guidance with explicit do-not-expose list mentioning Gateway API HTTPRoute, provide step-by-step EventBridge connection/API destination/rule setup with constrained alarm-name prefix pattern, update token/header requirements, and add test-delivery verification steps.
Webhook secrets and troubleshooting
README.md
Warn that inline secrets are visible in Helm values, introduce Kubernetes Secret reference option with sharedSecret="" clearing warning, expand troubleshooting logs checklist with symptom matrix, update retention Job rerun workflow with corrected Helm upgrade flags (--reuse-values), and update configuration reference table with metrics.logGroup description and log-group-override footnote.

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • akila-i

🐰 A cluster that stands alone no more,
Now sharing metrics through the ADOT door,
With Pod Identity and region so true,
The metrics flow seamlessly through and through!
InstanceName gone, the architecture's clear,
Multi-cluster observability without fear! 🔍✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is incomplete; it only contains template placeholders with a subject field filled, missing Purpose, Approach, Related Issues, and Remarks sections. Fill in all required sections: Purpose (explain why instance name was removed), Approach (summarize implementation across code and docs), Related Issues (reference issue #3299 if applicable), and Remarks with any relevant context.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: removing the instance name concept from the metrics module across code and documentation.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@observability-metrics-cloudwatch/README.md`:
- Around line 476-478: The helm upgrade command shown (helm upgrade
observability-metrics-cloudwatch
oci://ghcr.io/openchoreo/helm-charts/observability-metrics-cloudwatch
--namespace "$NS" --reuse-values) must pin the chart version to avoid an
unintended chart upgrade; update both occurrences (the one at the shown snippet
and the other at lines ~839-841) to include a --version flag (e.g. --version
"$CHART_VERSION" or a literal semantic version) and document or source the
CHART_VERSION variable so recovery flows only re-run hooks/jobs rather than
upgrade the release.
- Around line 450-464: Update the restart instructions to be topology-aware by
checking for each workload before restarting: for the adapter, test existence of
the Deployment named metrics-adapter-cloudwatch (e.g. kubectl -n "$NS" get
deployment metrics-adapter-cloudwatch) and only run kubectl -n "$NS" rollout
restart deployment/metrics-adapter-cloudwatch if it exists; likewise check for
the DaemonSet observability-metrics-cloudwatch-adotcollector-agent before
running kubectl -n "$NS" rollout restart
daemonset/observability-metrics-cloudwatch-adotcollector-agent. This keeps
observability-plane-only or data-plane-only installs from attempting to restart
absent resources.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e646892a-3d30-46e7-bef2-80043b950254

📥 Commits

Reviewing files that changed from the base of the PR and between 2d4390b and f8ca92b.

📒 Files selected for processing (13)
  • observability-metrics-cloudwatch/README.md
  • observability-metrics-cloudwatch/helm/templates/_helpers.tpl
  • observability-metrics-cloudwatch/helm/templates/adapter/deployment.yaml
  • observability-metrics-cloudwatch/helm/templates/adot-collector/env-configmap.yaml
  • observability-metrics-cloudwatch/helm/values.yaml
  • observability-metrics-cloudwatch/internal/cloudwatchmetrics/alarms_test.go
  • observability-metrics-cloudwatch/internal/cloudwatchmetrics/client.go
  • observability-metrics-cloudwatch/internal/cloudwatchmetrics/client_test.go
  • observability-metrics-cloudwatch/internal/cloudwatchmetrics/queries.go
  • observability-metrics-cloudwatch/internal/cloudwatchmetrics/queries_test.go
  • observability-metrics-cloudwatch/internal/config.go
  • observability-metrics-cloudwatch/internal/config_test.go
  • observability-metrics-cloudwatch/main.go
💤 Files with no reviewable changes (5)
  • observability-metrics-cloudwatch/helm/templates/adapter/deployment.yaml
  • observability-metrics-cloudwatch/internal/config.go
  • observability-metrics-cloudwatch/internal/cloudwatchmetrics/alarms_test.go
  • observability-metrics-cloudwatch/main.go
  • observability-metrics-cloudwatch/internal/cloudwatchmetrics/client.go

Comment thread observability-metrics-cloudwatch/README.md
Comment thread observability-metrics-cloudwatch/README.md Outdated
Signed-off-by: Chalindu Kodikara <chalindumkodikara@gmail.com>
Signed-off-by: Chalindu Kodikara <chalindumkodikara@gmail.com>
nilushancosta
nilushancosta previously approved these changes May 16, 2026
@nilushancosta nilushancosta dismissed their stale review May 16, 2026 06:22

need to review the readme

Copy link
Copy Markdown
Contributor

@LakshanSS LakshanSS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@LakshanSS LakshanSS merged commit fe64eb8 into openchoreo:main May 16, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants